RE: Steem Sincerity - Working with SteemPlus to crowdsource spam classification training data

fraenk (63) in #steemdev • 6 years ago (edited)

Something needs to be done to seriously improve that classification.

Bots that give three times as many upvotes as comments (and we are talking about just a few hundred) are classified as top-500 spammers with a 1.0 spammer score (yes, I am talking about my poor @cuddlekitten), while actual spambots that leave thousands of identical comments get a 0.42 human score (see @tomole444).

For the time being, maybe you could put a huge warning sticker on the API, as it's still pretty damn inaccurate. A lot of people are already embracing the API, and yes, it is a great step forward for the Steem ecosystem... but with such inaccuracies it could be quite damaging to accounts that are wrongly labeled.

andybets (62) 6 years ago

Thanks for the feedback. I'm working on improving the classification, and I have described the limitations of the software in my posts on the subject. I agree that the classification for tomole444 is not correct, and I also disagree with the scores received by cuddlekitten.

fraenk (63) 6 years ago

I think the classification should be weighted more strongly towards the total number of comments and the downvote ratio.

I'll be curious to see how this progresses, but I do think it should be used with caution as it stands. Now that this is already available to "end-users", the labels will be taken as true beyond doubt, and cuddlekitten has already been made aware of her new spammer label by a few confused cuddle friends.
