Steemit/Steem: Reputation Voting Analysis

This is an analysis of the voting patterns associated with the 'Reputation' number given to each account on the Steem Blockchain.

The aim of this analysis is to find out if there are any patterns in the voting 'style' of each Reputation number, with a focus on the following:

  • Do reputations like to vote for the same reputations?

  • If so, to what extent, and is this the same across all reputations assessed?

  • With 'self-voting' excluded, do the same patterns (if any) take shape?


title.png


Background

The data collected for this analysis covers the 7 days up to the time of script execution: 5,087,078 votes, before the further data removal described in the points below.

The reputation range has been limited to between 40 and 78 - 78 being the highest reputation of an account at this time.

The reputation numbers used are taken from the 'accounts' table.

No account names are listed in this analysis.

Every Bid-bot listed at https://steembottracker.com/ was removed from the original data set.
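As a rough illustration of the clean-up steps listed above, here is a minimal Python sketch. The original filtering was done in SQL (shown only as an image), so everything here is an assumption: the `Vote` record layout, the `clean_votes` helper, the choice to apply the 40-78 range to both voter and author, and the choice to drop a vote when either side is a bid-bot.

```python
from typing import Iterable, List, NamedTuple, Set


class Vote(NamedTuple):
    # Hypothetical record layout, not the SteemSQL schema.
    voter: str        # account casting the vote
    author: str       # account receiving the vote
    voter_rep: int    # voter's reputation number
    author_rep: int   # author's reputation number
    weight: float     # vote weight (units are an assumption)


MIN_REP, MAX_REP = 40, 78  # range stated in the Background


def clean_votes(votes: Iterable[Vote], bid_bots: Set[str]) -> List[Vote]:
    """Drop bid-bot votes and votes outside the reputation range."""
    kept = []
    for v in votes:
        # Assumption: a vote is removed if either side is a bid-bot.
        if v.voter in bid_bots or v.author in bid_bots:
            continue
        # Assumption: the range applies to both voter and author.
        if not (MIN_REP <= v.voter_rep <= MAX_REP):
            continue
        if not (MIN_REP <= v.author_rep <= MAX_REP):
            continue
        kept.append(v)
    return kept


sample = [
    Vote("alice", "bob", 64, 64, 100.0),
    Vote("some_bot", "bob", 70, 64, 100.0),  # dropped: voter is a bid-bot
    Vote("carol", "dave", 35, 50, 50.0),     # dropped: voter below 40
    Vote("erin", "erin", 55, 55, 25.0),      # kept: self-votes handled later
]
cleaned = clean_votes(sample, bid_bots={"some_bot"})
print([v.voter for v in cleaned])
```

Self-votes are deliberately left in at this stage, since the analysis looks at the data both with and without them.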


Sourcing the data

The following query was used to obtain the base data to be analysed:

image.png

This data was stored locally to perform additional queries as below:

image.png


Presentation

The first table and chart show the total vote weight as a %, given to equal reputations.

e.g. Of all the accounts at 64 reputation, 15% of the total vote weight was given to accounts with a reputation of 64.
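To make the calculation concrete, here is a small Python sketch of the 'same reputation %' figure. It is not the query used for the tables; the `(voter_rep, author_rep, weight)` tuple layout and the `same_rep_share` name are made up for illustration, and it assumes all weights are non-negative.

```python
from collections import defaultdict


def same_rep_share(votes):
    """Return {voter_rep: % of that reputation's vote weight sent to the same reputation}.

    votes: iterable of (voter_rep, author_rep, weight) tuples (hypothetical layout).
    """
    total = defaultdict(float)  # all weight cast by each voter reputation
    same = defaultdict(float)   # weight cast to authors of the same reputation
    for voter_rep, author_rep, weight in votes:
        total[voter_rep] += weight
        if voter_rep == author_rep:
            same[voter_rep] += weight
    return {
        rep: 100.0 * same[rep] / total[rep]
        for rep in total
        if total[rep] > 0
    }


votes = [
    (64, 64, 15.0), (64, 60, 50.0), (64, 70, 35.0),
    (50, 50, 1.0), (50, 51, 3.0),
]
shares = same_rep_share(votes)
print(shares)
```

With this toy data, reputation 64 sends 15 of its 100 weight units to other 64s, which matches the shape of the 15% example above.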

1.png

With an average around 16% across all the reputations, 75, 76, 77 stand out from this figure.

Reputations 75 and 77 have the lowest weight % going to the same reputation, while Reputation 76 has the highest of all the reputations analysed at almost 48%.


Next, the same table format, but this time including the reputation that received the most vote weight by each reputation.

Table 1 - Votes to Self included

image.png

As we can see, the table is identical bar one row.

All reputations except for 77 give their highest total vote weight % to their own reputation.
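The 'reputation that received the most vote weight' column can be sketched in Python along the following lines. Again this is only an illustration under assumptions: the tuple layout and the `top_recipient` helper are hypothetical, weights are assumed non-negative, and ties are broken by whichever reputation was seen first.

```python
from collections import defaultdict


def top_recipient(votes):
    """Return {voter_rep: (most-voted author_rep, % of the voter_rep's total weight)}.

    votes: iterable of (voter_rep, author_rep, weight) tuples (hypothetical layout).
    """
    # matrix[voter_rep][author_rep] = summed weight from one reputation to another
    matrix = defaultdict(lambda: defaultdict(float))
    for voter_rep, author_rep, weight in votes:
        matrix[voter_rep][author_rep] += weight

    result = {}
    for voter_rep, row in matrix.items():
        total = sum(row.values())
        if total <= 0:
            continue  # nothing to rank for this reputation
        best_rep = max(row, key=row.get)  # ties: first reputation seen wins
        result[voter_rep] = (best_rep, 100.0 * row[best_rep] / total)
    return result


votes = [
    (64, 64, 60.0), (64, 70, 40.0),
    (77, 65, 40.0), (77, 77, 10.0), (77, 60, 50.0),
]
top = top_recipient(votes)
print(top)
```

In this toy data, reputation 64 votes mostly for itself, while reputation 77 sends most of its weight to a different reputation, which is the kind of row that differs between the two tables.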


This time, let's take a look at the same dataset, and exclude the votes that are to the same account: Voter <> Author.

Table 2 - Votes to Self excluded

image.png

image.png

Now we have a much lower 'average % of vote to the same reputation'.

With self-votes included this was 16.2%; with self-votes excluded, it is 6.1%.

The highest vote weight % given to a single reputation comes from Rep 76, which distributes 31% of its total vote weight to reputation 74.

And, without votes to self, only 4 reputations (43, 51, 54, 60) give the highest vote weight % to the same reputation.
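The comparison between the two tables can be reproduced in outline with the Python sketch below. It is not the author's query: the record layout and the `summarise` helper are hypothetical, and the 'average' is assumed to be a simple unweighted mean of the per-reputation percentages, which may differ from how the figures above were calculated.

```python
from collections import defaultdict


def summarise(votes, exclude_self):
    """Return (average same-reputation %, number of reputations whose top recipient is themselves).

    votes: iterable of (voter, author, voter_rep, author_rep, weight) (hypothetical layout).
    """
    matrix = defaultdict(lambda: defaultdict(float))
    for voter, author, voter_rep, author_rep, weight in votes:
        if exclude_self and voter == author:
            continue  # Voter <> Author filter
        matrix[voter_rep][author_rep] += weight

    same_pct = {}
    same_is_top = 0
    for rep, row in matrix.items():
        total = sum(row.values())
        if total <= 0:
            continue
        same_pct[rep] = 100.0 * row.get(rep, 0.0) / total
        if max(row, key=row.get) == rep:
            same_is_top += 1

    # Assumption: unweighted mean across reputations.
    average = sum(same_pct.values()) / len(same_pct) if same_pct else 0.0
    return average, same_is_top


votes = [
    ("a", "a", 60, 60, 60.0),  # self-vote
    ("a", "b", 60, 60, 10.0),
    ("a", "c", 60, 65, 30.0),
    ("d", "d", 65, 65, 70.0),  # self-vote
    ("d", "a", 65, 60, 30.0),
]
with_self = summarise(votes, exclude_self=False)
without_self = summarise(votes, exclude_self=True)
print(with_self, without_self)
```

Running both variants on the same records shows the effect described above: once self-votes are filtered out, both the average same-reputation % and the count of reputations favouring themselves drop.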


Analysis Summary

Looking at Table 1, and leaving aside any further analysis of sock puppet accounts (alternate accounts with the same owner), it looks fairly clear that many users vote for themselves. Only reputation 77 distributes a higher vote weight % to a reputation not equal to its own.

Only 3 accounts have reputation 77 at the time of this analysis, and a certain 'Daddy Chilli' has 0.0% votes to self in the past 7 days. A round of applause from the analyst today.

The highest %'s by a distance in Table 1 fall under reputations 76 and 78. There are 6 accounts with a reputation of 76, and 1 account with a reputation of 78. It is safe to say there is an above average amount of self-voting within these reputation levels.

Table 2 is worth further discussion. It is possible to see where the reputations (or accounts) choose to send their votes when not voting for their own content.

Reputations 40-44 all vote for reputations in the same range. This could be due to small accounts helping each other, but another reason could be that these reputations are sock puppet accounts voting (with low SP) for each other. As there is no option to filter content by reputation, and the higher reputations usually sit at the top of any given tag/link, the second reason seems the more likely option.

Reputations 45 and 46 both vote for reputation 58 the most. This stands out in the dataset but the analyst has no obvious conclusions as to why. Anyone?

Reputations 50-59, as with 40-44, all vote with the most weight to reputations in the same range. It is less likely that these accounts are sock puppets, and possible that in this Reputation range, communities and friendships are starting to form.

Reputations 60-70, as above, all vote within the same range, apart from Rep 68, which votes with the most weight for Reputation 74. This is another anomaly in the dataset that is difficult to explain without further analysis into account details.

Reputations 71, 72, 73, 75, 77, and 78 all vote with the most weight for Reputations in the 60's. This makes sense as there are many more 'established' user accounts in this range. Apart from Rep 78, the top weight % is fairly low at around 6-7%, suggesting a reasonable spread of votes.

78, the highest Reputation account and the only one at this level, gives a particularly high vote weight % to reputation 63. This stands out. Combined with the 27% of vote weight going to its own reputation (taken from Table 1), and knowing this is the only account at 78, it looks like this account does not spread its vote weight around too much.

Reputations 74 and 76 both vote for reputations in the 70's. The stand out figure is the vote weight % of reputation 76, which gives 31% of its total vote weight to reputation 74.

As there are fewer accounts up at these levels, it is 'strange' to see reputations in the 70's giving their largest % of vote weight to 70+ reputations, particularly when self-votes are removed.


Summary

Across the total Reputation dataset, voting for the same reputation (and for self) is common.

When self-voting data is removed, the number of reputations giving their highest vote weight % to their own reputation falls from 37 (out of 38) to 4, and the average same-reputation % falls from 16.2% to 6.1%.

Suspicious voting patterns relating to account Reputation appear at the very bottom of the sample, and at the top.


Tools used to gather this data and compile report

The data is sourced from SteemSQL - A publicly available SQL database with all the blockchain data held within.

The SQL queries to extract the data have been produced in both SQL Server Personal Edition and LINQPAD 5.

The charts used to present the data were produced using MS Excel.

This data was compiled on the 4th March 2018 at 8pm (UTC).

I am part of a Steemit Business Intelligence community. We all post under the tag #blockchainbi. If you have analysis you would like to be carried out on utopian-io/Steem data, please do contact me or any of the #blockchainbi team and we will do our best to help you.


Thanks

Asher @abh12345



Posted on Utopian.io - Rewarding Open Source Contributors
