HF20 Exploratory Data Analysis on Proposed PayOut Changes

The upcoming HF20 proposes changes to how pay-outs are rewarded.  The proposed changes will affect posts that receive votes in the first 15 minutes.

Repository

https://github.com/steemit/steem 

Currently, rewards from votes cast in the first 30 minutes are split between author and curator on a curve.  The curator's share increases from 0% at 0 minutes to 100% from 30 minutes onwards.  Under the proposed changes, for votes made in the first 15 minutes the author will no longer get these rewards; instead they will be redistributed between all posts with pay-outs.  The reason given for this change is to discourage self-voting.

This has led to a post by @tcpolymath and a response by @davemccoy, where the discussion turns to how the rewards are redistributed.  The concern raised in these conversations is that the changes will redistribute rewards to those who are already earning a lot, so the rich will just get richer.

I would suggest that you read both posts and the conversations that take place in the comments:

/@tcpolymath/this-post-will-exist-in-fifteen-minutes

/@davemccoy/new-ad-slogan-steemit-where-the-rich-get-richer

It is because of these posts that I decided to take a look at the data.

Aim of Analysis

This is an exploratory analysis.  The aim was to get an indication of the level of votes placed on posts within the first 15 and 30 minutes, to see how much of a problem early voting is.  I also wanted to try to establish who votes early and get an idea of how much this drains from the rewards pool.  From there I wanted to use this information to draw a conclusion on whether HF20 will make the rich richer or have any impact at all.

As this was an exploratory analysis, only a small sample of data has been taken.  Exploratory analysis allows you to use visualizations to get an understanding of the data.  It does not aim to give a detailed review or accurate calculations.  Exploratory analyses are often used to spot things in the data that would require further analysis.

I have taken data for 2nd June, and the sample is 130K posts that received 468K votes.

So as not to distract from the findings, details of the data queries can be found at the bottom of this report. However, within the analysis you will find the steps and methodology taken.

The Analysis

First, I wanted to look at the distribution of vote timings in periods of 30 minutes. To do this I took the vote time less the post time to get the duration and grouped this into bins of 30 minutes.
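The binning step described above can be sketched in Python. This is only an illustration, not the Power BI/DAX used for the charts: the function name `delay_bin` and the sample timestamps are made up for the example.

```python
from collections import Counter
from datetime import datetime

BIN_MINUTES = 30  # bin width used for the first set of charts


def delay_bin(post_time: datetime, vote_time: datetime, width: int = BIN_MINUTES) -> int:
    """Return the start (in minutes) of the bin that the vote delay falls into."""
    delay_minutes = (vote_time - post_time).total_seconds() / 60
    return int(delay_minutes // width) * width


# Made-up (post time, vote time) pairs, for illustration only.
posted = datetime(2018, 6, 2, 12, 0)
sample = [
    (posted, datetime(2018, 6, 2, 12, 10)),  # 10 min after posting -> bin 0
    (posted, datetime(2018, 6, 2, 12, 29)),  # 29 min -> bin 0
    (posted, datetime(2018, 6, 2, 12, 30)),  # exactly 30 min -> bin 30
    (posted, datetime(2018, 6, 2, 13, 5)),   # 65 min -> bin 60
]

# Number of votes per 30 minute bin.
histogram = Counter(delay_bin(p, v) for p, v in sample)
```

Passing `width=15` gives the 15 minute bins used later in the analysis.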

Below you can see a heavily skewed graph, as 223K votes were cast in the first 30 minutes after posting.  The skew is so great that the chart is hardly readable.

 

So, to get a better understanding, I created a pie chart showing the % of votes made in each 30 minute period after the post was made.  The first surprise was a limitation in Power BI: it seems that 48 data points in a pie chart require too many colours, and some brackets, such as 30 to 60 min, show up white.

However, what is clear from this is that 47.74% of votes are placed within the first 30 minutes of posting.

 

What about self-votes, what % of self-votes are cast in the first 30 min?

 

Wow, almost 84% of self-votes are made in the first 30 min.  That is rather interesting and sheds a little light on why Steemit Inc. have proposed these changes.

But the changes only apply to votes made in the first 15 minutes, so to gauge this I changed the bin size for grouping from 30 to 15.  Again, the histogram was too skewed to read properly, so I prepared pie charts, which still had the limitation above but gave me a clearer picture.

27% of votes were cast in the first 15 min

 

And if we look at self-votes 78% of self-votes are made in the first 15 min.
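As a rough Python sketch of how these percentages can be derived (the actual figures came from Power BI), assume each vote is a record with a voter, the post's author and the delay in minutes. The field names and the sample records below are hypothetical; a self-vote is taken to be a vote where voter and author are the same account.

```python
def early_share(votes, cutoff_minutes=15, self_only=False):
    """Fraction of votes cast before the cutoff.

    With self_only=True, only votes where the voter is also the author are counted.
    """
    pool = [v for v in votes if not self_only or v["voter"] == v["author"]]
    if not pool:
        return 0.0
    early = sum(1 for v in pool if v["delay_minutes"] < cutoff_minutes)
    return early / len(pool)


# Made-up vote records, for illustration only.
votes = [
    {"voter": "alice", "author": "alice", "delay_minutes": 1},   # early self-vote
    {"voter": "bob", "author": "alice", "delay_minutes": 20},
    {"voter": "bob", "author": "bob", "delay_minutes": 40},      # late self-vote
    {"voter": "carol", "author": "bob", "delay_minutes": 5},
]

share_all = early_share(votes)                   # share of all votes in the first 15 min
share_self = early_share(votes, self_only=True)  # share of self-votes in the first 15 min
```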

 

Next, I wanted to see who votes in the first 15 min and with what weight.  First, I sorted the data by the number of votes in the first 15 minutes:

 

Then by the Steem Power controlled by the account:

 

But this does not really give me an indication of the effect on the vote values or rewards pool and from the data taken it would not be possible to calculate the actual worth of each vote. So, I decided to try and work out an approximate value for all votes given in the first 15 min.

As I have the number of votes in the first 15 min, the controlling Steem Power for each of the accounts and the average weight, I can use these to calculate the effective Steem Power for the votes cast in the first 15 min.

Effective Steem Power = number of votes * Steem Power * average weight

Using this effective Steem Power, I can now plot a pie chart and see who contributes the most in vote value on votes made in the first 15 minutes.

It is worth pointing out that the value of a vote depends on many factors, including the voting power.  Without the voting power at the time of each vote, any calculations from here are very general.
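A minimal Python sketch of the effective Steem Power calculation and each account's share of the total follows. It is an illustration only: the account names and numbers are invented, and the real calculation was done with DAX in Power BI.

```python
def effective_sp(num_votes, steem_power, avg_weight):
    """Effective Steem Power = number of votes * Steem Power * average weight."""
    return num_votes * steem_power * avg_weight


def contribution_shares(accounts):
    """Map each account to its fraction of the total effective Steem Power."""
    totals = {name: effective_sp(*stats) for name, stats in accounts.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: value / grand_total for name, value in totals.items()}


# Hypothetical accounts: (votes in first 15 min, Steem Power, average weight as a fraction).
accounts = {
    "whale": (5, 1000.0, 1.0),   # few votes, a lot of SP, full weight
    "minnow": (10, 50.0, 0.5),   # more votes, little SP, half weight
}

shares = contribution_shares(accounts)
```

In this made-up example the account with fewer votes still dominates, because its Steem Power and weight are much higher. That is the reason for weighting by Steem Power rather than counting votes alone.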

If ranchorelaxo contributed 4.94% of the SP used for votes in the first 15 min, and this account voted 5 times in the first 15 min at 100% power with a current vote worth of $103, then we can say that 4.94% of the vote value in the first 15 min was worth $515.  On that basis, 100% would be worth about $10,400.
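The arithmetic behind this estimate, written out in Python using only the figures quoted above (the variable names are just for the example):

```python
votes_cast = 5              # votes by the account in the first 15 min
vote_worth_usd = 103.0      # current worth of one 100% vote
share_of_early_sp = 0.0494  # the account's share of SP used in early votes

# Value of this account's early votes.
account_value = votes_cast * vote_worth_usd

# Scale up: if that value is 4.94% of the total, what is 100%?
estimated_total = account_value / share_of_early_sp

print(f"account: ${account_value:,.0f}, estimated total: ${estimated_total:,.0f}")
```

The scaled-up figure comes out a little above $10,400, which is why the daily total is treated as a rounded approximation.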

It has been said that this will be redistributed between posts that have no votes in 15 min.  From the sample data there were 130K posts, of which 29K had a vote in the first 15 minutes.  That is 22% of all posts.

Here are the top authors in May 

Making an assumption that the top authors will not vote in the first 15 minutes we can see from the % column how much of the distribution each author would receive.


However, as it currently stands many of the top paid authors have votes in the first 15 minutes. 

Conclusion

27% of all votes are cast in the first 15 minutes, and 78% of self-votes are also made in this time.  It would make sense for a change in the code to penalise these early votes so that the system does not favour the self-voter.

However, I noted from looking at the data that there are many bots involved in the early voting, including paid voting bots.  It would be easy for the bot owners to change their code so that the vote takes place after the first 15 minutes.

What we could see happen is some of the older bots not being updated and fading out, but to be honest I don’t see much of this happening.

I would say a considerable number of the votes given in the first 15 min are also auto votes, for which the settings can be changed if and when the HF takes place.

We are currently looking at around $10.5K a day in direct vote worth, so about $315K a month. Based on May's posts, this would be about 10% of total post pay-outs.  Given that many of the votes are auto/bot votes and easily changed, this will leave the manual voter who is not aware of the changes.  Steemit could easily implement something in the UI that warns people who try to vote before the 15 min mark of the consequences, with the aim of education.

The concern in the posts from @tcpolymath and @davemccoy that the rich will get richer is not really substantiated. Looking at the data, it is the early votes of the rich that have the most impact on the values in the first place. If these accounts were to continue to vote within 15 min after the HF, then it is the poor that would benefit.

Given that most of the bots and auto votes will be updated and their votes moved past the 15 min cut-off, I don’t see this HF change making any real impact on either side of the fence.

An exploratory analysis would normally highlight things to investigate further. In this case I would not take the analysis any further, because the main players involved will quickly change their voting when HF20 comes in, so further analysis of this historical data does not make much sense.

Data Queries

I used Power BI to connect to SteemSQL using the M language.  The following code was used to extract and transform the data.

Votes query

let
    // Compare on the date part only; the timestamp column also carries a time of day.
    Source = Sql.Database("vip.steemsql.com", "DBsteem", [Query="select *#(lf)from txvotes [NOLOCK]#(lf)where CONVERT(DATE, timestamp) = '2018-07-02'"]),
    #"Split Column by Delimiter" = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"timestamp", type text}}, "en-IE"), "timestamp", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"timestamp.1", "timestamp.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"timestamp.1", type date}, {"timestamp.2", type time}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"timestamp.1", "date"}, {"timestamp.2", "time"}}),
    #"Added Custom" = Table.AddColumn(#"Renamed Columns", "% weight", each [weight]/10000),
    #"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"% weight", Percentage.Type}})
in
    #"Changed Type1"

Posts query

let
    // Compare on the date part only; the created column also carries a time of day.
    Source = Sql.Database("vip.steemsql.com", "DBsteem", [Query="select author, permlink, created#(lf)from comments [NOLOCK]#(lf)where CONVERT(DATE, created) = '2018-07-02'"]),
    #"Split Column by Delimiter" = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"created", type text}}, "en-IE"), "created", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"created.1", "created.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"created.1", type date}, {"created.2", type time}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"created.1", "date"}, {"created.2", "time"}}),
    #"Removed Duplicates" = Table.Distinct(#"Renamed Columns", {"permlink"})
in
    #"Removed Duplicates"

To get the SP controlled by each account, I connected to the accounts table using:

let
    Source = Sql.Database("vip.steemsql.com", "DBsteem", [Query="select name, vesting_shares, delegated_vesting_shares, received_vesting_shares#(lf)from accounts [NOLOCK]#(lf)"]),
    #"Replaced Value" = Table.ReplaceValue(Source,"VESTS","",Replacer.ReplaceText,{"vesting_shares", "delegated_vesting_shares", "received_vesting_shares"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Replaced Value",{{"vesting_shares", type number}, {"delegated_vesting_shares", type number}, {"received_vesting_shares", type number}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "total vests", each [vesting_shares]-[delegated_vesting_shares]+[received_vesting_shares]),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "SteemPower", each [total vests]*.000492)
in
    #"Added Custom1"
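For readers who do not use Power BI, the same calculation can be sketched in Python. This is only an illustration of what the M query above does: the function names are made up, and the 0.000492 STEEM-per-VEST factor is simply the one used in the query, which changes over time.

```python
STEEM_PER_VEST = 0.000492  # conversion factor taken from the query above


def parse_vests(value: str) -> float:
    """Turn a string such as '1000.000000 VESTS' into a number."""
    return float(value.replace("VESTS", "").strip())


def steem_power(vesting: str, delegated: str, received: str) -> float:
    """SP controlled = (own vests - delegated out + received) * STEEM per VEST."""
    total_vests = parse_vests(vesting) - parse_vests(delegated) + parse_vests(received)
    return total_vests * STEEM_PER_VEST


# Made-up account values, for illustration only.
sp = steem_power("1000000.000000 VESTS", "200000.000000 VESTS", "100000.000000 VESTS")
```

Vests delegated out are subtracted and vests received are added, because the aim is the Steem Power the account controls when voting, not the amount it owns.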

From here I created relationships in the model and used DAX for calculations.
