Steemit has a love/hate relationship with bots. There are good bots like @cheetah that try to make Steemit more enjoyable for everyone, and evil bots like @rickydevil that downvote en masse.
NOTE: I DON'T HAVE ANYTHING AGAINST BOTS. THIS WAS JUST A FUN, INTELLECTUAL CHALLENGE!
In my previous post I presented some graphs that showed the hourly distribution of upvoting by whales for each day of the week.
The graphs for the known bots (example above) were clearly different from those of regular users, so I wondered: is it possible to identify bots using Python?
Needle in a haystack?
Instead of looking at the upvote distribution for a single user over a long time period, which is dreadfully slow, I compared the upvote counts of all users and used the median absolute deviation to find outliers. This can be run against much smaller datasets, making it much faster.
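As a rough illustration of the idea (a minimal sketch, not the code from the repository), the snippet below scores each voter by how far their upvote count sits from the median, measured in units of the median absolute deviation. The function name `mad_scores`, the cut-off of 5 and the sample counts are all assumptions made for the example.

```python
from statistics import median


def mad_scores(counts):
    """Return {voter: |count - median| / MAD} for a dict of upvote counts."""
    values = list(counts.values())
    med = median(values)
    # Median absolute deviation: the median distance from the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the voters share the same count, so the score
        # is undefined; report no scores rather than divide by zero.
        return {}
    return {voter: abs(v - med) / mad for voter, v in counts.items()}


# Hypothetical upvote counts for a short window of blocks.
counts = {"alice": 1, "bob": 2, "carol": 3, "dave": 4, "erin": 5, "botty": 40}

THRESHOLD = 5  # assumed cut-off, not taken from the original code
scores = mad_scores(counts)
outliers = sorted(voter for voter, s in scores.items() if s > THRESHOLD)
print(outliers)
```

Because the median and the MAD barely move when a few extreme values are present, the heavy voter stands out clearly even in a tiny sample.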
Running this for all blocks in a fifteen-minute window (approximately 300 blocks), I got the following results:
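Tallying upvotes per voter over a block range could look something like the sketch below. It assumes the blocks have already been fetched and flattened into a list of `(op_type, payload)` pairs, where a vote payload carries `voter` and `weight` fields; that input shape and the helper name `count_upvotes` are assumptions for illustration, not the repository's actual data format. With Steem's three-second block interval, 300 blocks cover about fifteen minutes.

```python
from collections import Counter

BLOCK_INTERVAL_SECONDS = 3  # Steem produces a block roughly every 3 seconds


def count_upvotes(operations):
    """Count upvotes per voter from (op_type, payload) pairs."""
    counts = Counter()
    for op_type, payload in operations:
        # Only vote operations with a positive weight are upvotes.
        if op_type == "vote" and payload.get("weight", 0) > 0:
            counts[payload["voter"]] += 1
    return counts


# Hypothetical operations pulled from a handful of blocks.
operations = [
    ("vote", {"voter": "alice", "author": "bob", "weight": 10000}),
    ("vote", {"voter": "alice", "author": "carol", "weight": 5000}),
    ("vote", {"voter": "dave", "author": "bob", "weight": -10000}),  # downvote
    ("comment", {"author": "bob"}),
    ("vote", {"voter": "erin", "author": "bob", "weight": 10000}),
]

counts = count_upvotes(operations)
window_minutes = 300 * BLOCK_INTERVAL_SECONDS / 60
print(dict(counts), window_minutes)
```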
Outliers (candidate bots) are shown in red. The top graph shows the outliers based on a median absolute deviation computed over all samples; the bottom graph is based on a median absolute deviation that excludes all upvote counts under the mean. This removes low-value outliers, improving the results slightly.
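The two variants described above can be sketched as follows, again as an illustration under assumptions rather than the repository's code: `flag_outliers` is a hypothetical helper, the cut-off of 5 is assumed, and the counts are made up. The only difference between the two runs is that the second computes the median and the deviation from counts at or above the mean.

```python
from statistics import mean, median


def flag_outliers(counts, threshold=5.0, drop_below_mean=False):
    """Flag voters whose count deviates from the median by > threshold MADs."""
    if drop_below_mean:
        avg = mean(counts.values())
        # Keep only voters at or above the mean so the many low-count
        # users no longer pull the median and MAD down.
        counts = {v: c for v, c in counts.items() if c >= avg}
    values = list(counts.values())
    med = median(values)
    mad = median(abs(c - med) for c in values)
    if mad == 0:
        return []  # spread is degenerate; nothing can be scored
    return sorted(v for v, c in counts.items() if abs(c - med) / mad > threshold)


# Hypothetical counts: mostly casual voters, a few heavy ones.
counts = {
    "u1": 1, "u2": 1, "u3": 2, "u4": 2, "u5": 3, "u6": 3, "u7": 4,
    "heavy1": 13, "heavy2": 14, "extreme": 41,
}

print(flag_outliers(counts))                        # all samples
print(flag_outliers(counts, drop_below_mean=True))  # counts under the mean dropped
```

In this toy data the first run flags all three heavy voters, while the second flags only the most extreme one, which mirrors the way the stricter variant returns a shorter list.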
There were 886 upvotes by 351 users; the data is skewed to the right. This is what you'd expect if most users are human: they read some articles and upvote, then get on with other things.
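The right skew can be checked with a quick calculation. The mean follows directly from the totals above; the `skewness` helper and the sample counts are hypothetical, included only to show what a right-skewed tally looks like numerically.

```python
from statistics import mean, median, pstdev

# Mean upvotes per user from the totals reported above.
mean_upvotes = 886 / 351
print(round(mean_upvotes, 2))  # 2.52


def skewness(values):
    """Population (moment) skewness; positive means a longer right tail."""
    mu = mean(values)
    sigma = pstdev(values)
    return sum((v - mu) ** 3 for v in values) / (len(values) * sigma ** 3)


# Hypothetical tally: many users with one or two upvotes, one heavy voter.
sample = [1, 1, 1, 1, 2, 2, 3, 4, 30]
print(mean(sample) > median(sample), skewness(sample) > 0)
```

A mean well above the median, together with positive skewness, is the numerical signature of a few heavy voters sitting in a long right tail.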
So how many bots did it find? In total, 11 (potential) bots were found when upvote counts under the mean were included; only two when they were excluded:
Getting stats from block 3989048 to block 3989348
             count   mad  mad2
voter
activcat        30  14.0  13.0
boy             13   5.5   4.5
bue             13   5.5   4.5
bue-witness     13   5.5   4.5
bunny           13   5.5   4.5
daniel.pan      13   5.5   4.5
healthcare      13   5.5   4.5
helen.tan       14   6.0   5.0
mini            13   5.5   4.5
moon            14   6.0   5.0
vukasin         41  19.5  18.5
11 (potential) bots found
          count   mad  mad2
voter
activcat     30  14.0  13.0
vukasin      41  19.5  18.5
2 (potential) bots found
Are they really bots?
Verification isn't easy. Some users upvote often by hand to try to get as many curation rewards as possible, and these lie on the cut-off point. Looking at their activity on steemd.com, they all have similar profiles that are dominated by upvotes:
Perhaps the most interesting result is what happens when you look at the distribution graphs for these users using the code from the previous post.
The two users found by removing the upvote counts below the mean don't behave like bots at all:
But the other nine users, found only when upvote counts below the mean were included, look an awful lot like bots - THEY RARELY SLEEP (not all graphs are shown)!
And these bot lookalikes also appear to be voting in an identical manner. Are they owned by the same user? Are they coincidentally just watching the same set of authors, waiting to upvote? And what the hell is going on during the weekend?
It's also possible that the first two users are relatively new bots that haven't done enough upvoting yet to look like bots. Only time will tell.
How many bots are there?
Now this is the burning question! Based on this data, which is a really, really small sample and doesn't tell the whole picture by any stretch of the imagination, about 3% of the users (11 of 351) were identified as potential bots.
I would love to see some proper statistics done on the blockchain to find this out!
Show me the code
You can find the code for everything in the jonblack/steem-data repository. This is much easier than posting it here. Fork to your heart's content!
Like my post? Don't forget to follow me!
Big shout out to @klye for the banner image. Fantastic stuff!