Steem Analytics - Distribution of Earnings


SteemAnalytics.png

This post is an investigative analysis. It forms part of a series in which I attempt to paint a broad picture of the Steem economy.


Introduction

I have been planning a series of analyses for a while. The aims are to paint a broad picture of the Steem economy, looking at where we are now and at where Steem might be headed.

In this first analysis I consider the distribution of earnings.

  • How much do users typically earn through posting and commenting on the Steem blockchain?
  • Is there an even distribution of rewards? Are most payouts taken by high earning accounts? Or do low earners collectively take home the largest piece of the pie?
  • How do curation and benefactor rewards impact the picture?

0. Data

I have based the earnings analysis on data from the first two weeks of September (September 1 - September 14 inclusive).

The value of post rewards is heavily influenced by the price of Steem. Given the high volatility of the crypto markets, different data periods will produce significantly different total rewards. However, in this analysis I am mainly interested in the comparative spread of earnings, which should be more stable. As a check that the data period is representative I have repeated the study with data from the first two weeks of August.

I decided to use posts and comments created in the 14 day period rather than posts and comments paid out over the 14 days, i.e. earnings accrued by activity rather than earnings accrued by payment. I believe that either approach would be appropriate.

All earnings amounts are expressed in STU throughout this analysis. Payouts in Vests, Steem and SBD are converted to STU using factors derived empirically for each hour of the 14 day period.
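As a rough illustration, an hourly conversion of this kind might look something like the sketch below. The factor table, its values and the function names are assumptions made up for the example; they are not taken from block.ops and the rates are not real.

```python
from datetime import datetime

# Hypothetical conversion factors (STU per unit of currency), one entry per hour.
# The numbers are invented for illustration only.
HOURLY_FACTORS = {
    datetime(2018, 9, 1, 0): {"STEEM": 0.90, "SBD": 1.00, "VESTS": 0.00045},
    datetime(2018, 9, 1, 1): {"STEEM": 0.92, "SBD": 1.01, "VESTS": 0.00046},
}

def to_stu(amount, currency, timestamp, factors=HOURLY_FACTORS):
    """Convert one amount to STU using the factor for the hour of the payout."""
    hour = timestamp.replace(minute=0, second=0, microsecond=0)
    return amount * factors[hour][currency]

def payout_in_stu(payout, timestamp, factors=HOURLY_FACTORS):
    """Convert a payout given as {currency: amount} to a single STU figure."""
    return sum(to_stu(amount, currency, timestamp, factors)
               for currency, amount in payout.items())

# Example: a payout of 2 SBD and 1 STEEM made at 00:59 on September 1.
example = payout_in_stu({"SBD": 2.0, "STEEM": 1.0}, datetime(2018, 9, 1, 0, 59))
```

Truncating the timestamp to the hour keeps the lookup simple; a missing hour raises a KeyError rather than silently using a stale rate.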

1. Distribution of earnings - Author earnings

The first task was to produce a distribution of author earnings, i.e. earnings from posting and commenting. This was achieved by summing the author earnings for each user over the 14 days and then grouping users into buckets: users that earn less than $1, users that earn $1 or more but less than $2, and so forth. The number of users in each bucket provides the distribution.
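The summing and bucketing can be sketched in a few lines. This is a minimal illustration, assuming the data is available as (author, amount in STU) pairs; the record layout and function names are hypothetical and this is not the block.ops code.

```python
from collections import Counter, defaultdict

def total_by_user(records):
    """Sum author earnings (in STU) per user from (author, amount) pairs."""
    totals = defaultdict(float)
    for author, amount in records:
        totals[author] += amount
    return dict(totals)

def bucket_counts(totals, width=50.0, cap=1500.0):
    """Count users per bucket, keyed by the lower bound of each bucket.

    Everything at or above `cap` falls into a single open-ended top bucket.
    """
    counts = Counter()
    for total in totals.values():
        lower = min((total // width) * width, cap)
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Invented sample records, for illustration only.
records = [("alice", 30.0), ("alice", 40.0), ("bob", 0.5), ("carol", 2000.0)]
distribution = bucket_counts(total_by_user(records))
```

Passing width=1.0 gives the $1 buckets and width=50.0 with cap=1500.0 gives the $50 buckets with a $1500+ top bucket.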

However, it is difficult to get a clear visualisation of this distribution of author earnings. The distribution is broadly (negative) exponential, with the vast majority of users at the lower end of the scale, and with a long tail.

Using buckets of $50 (<$50 earned over the 14 days, <$100, ... up to $1500+) produces the following chart under a linear y-axis scale:

authorEarningsLinSep.png

Not particularly informative. Switching to a log scale for the y-axis makes the data visible but is not intuitive to read:

authorEarningsLogSep.png

Using small buckets (<$1, <$2 and so on) reduces the volume of the first bucket but only to a limited extent. The full chart on this basis would require a vast number of buckets (here are the first 50):

authorEarnings1BucketSep.png

Finally I have decided to create bespoke buckets that I feel are intuitive and best illustrate the distribution of earnings. The buckets are as follows:

  • Less than $1 in total over the 14 days;
  • Less than $1 per day (on average, i.e. $1-$14 in total);
  • $1 - $5 per day (on average, i.e. $14 - $70 in total); and
  • More than $5 per day (on average, i.e. $70+ in total).
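The four bespoke buckets above could be assigned along these lines. This is only a sketch: the labels and function names are my own, and the treatment of amounts that fall exactly on a boundary is an assumption.

```python
from collections import Counter

def bespoke_bucket(total, days=14):
    """Assign a total earned over `days` days to one of the four bespoke buckets."""
    if total < 1:
        return "under $1 in total"
    if total < 1 * days:       # under $1 per day on average
        return "under $1 per day"
    if total < 5 * days:       # $1 - $5 per day on average
        return "$1 - $5 per day"
    return "over $5 per day"   # $70 or more over 14 days

def bespoke_counts(totals, days=14):
    """Count users in each bespoke bucket from a {user: total} mapping."""
    return Counter(bespoke_bucket(total, days) for total in totals.values())
```

Comparing totals against multiples of the number of days, rather than dividing first, avoids small floating-point surprises at the boundaries.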

authorEarningsBespoke.png

September 1-14

Already we have some interesting information. As can be seen, 71% of users earned less than $1 in author rewards in total over the 14 days. Another 19% earned less than $1 per day on average. 7% of users earned $1 - $5 per day on average and only 3% of users earned $5 per day or more (i.e. $70 or more over the 14 days).

To check that this data period was representative I repeated the exercise for the first two weeks of August, the prior month. The distribution, which is broadly similar, is shown below:

authorEarningsBespokeAug.png

August 1-14

2. Spread of earnings - Author earnings

The above analysis shows how many users earn at each earnings level (or bucket) over the 14 day period. But I was also interested in how the overall earnings were distributed across the different earning levels. Are most rewards distributed to the high earners? Or do lower earners take home the largest piece of the pie?

To generate this earnings spread distribution I replaced the count of users within each bucket with the sum of author earnings from all users in each bucket. The chart based on $50 buckets is as follows:

authorearningsSumSep.png
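Replacing the count with a sum is a small change to the bucketing step. A minimal sketch, assuming a {user: total earnings in STU} mapping as input (the names and sample numbers are hypothetical):

```python
from collections import defaultdict

def bucket_sums(totals, width=50.0, cap=1500.0):
    """Sum earnings per bucket, keyed by the lower bound of each bucket.

    Users at or above `cap` are pooled into one open-ended top bucket.
    """
    sums = defaultdict(float)
    for total in totals.values():
        lower = min((total // width) * width, cap)
        sums[lower] += total   # add the earnings, not 1 as in the user count
    return dict(sorted(sums.items()))

# Invented totals, for illustration only.
sample = {"a": 10.0, "b": 20.0, "c": 120.0, "d": 1600.0, "e": 2400.0}
spread = bucket_sums(sample)
```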

Being inquisitive by nature I had a look through the users in the $1500+ bucket (users earning in excess of $100 per day on average). After a brief investigation it became clear that I needed to remove the impact of voting bots.

3. Removing the impact of voting bots

As most readers will be aware, a fairly significant proportion of upvotes on Steem are currently purchased from voting bots. The use of voting bots can exaggerate a user's earnings as measured by post payout information, since the votes need to be paid for, reducing the user's net earnings. I decided that this element needed to be removed before I progressed any further.

In order to remove the impact of voting bots I used the following steps:

  • Creation of a list of voting bots;
  • Capture of all votes from each voting bot made on the posts included in the above author rewards analysis;
  • Calculation of the value of each vote;
  • Aggregation of these vote values by author;
  • Merging of the array of earnings by author and the array of voting bot deductions by author;
  • Creation of the earnings distribution from the new merged array of net author earnings.
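The steps above can be sketched roughly as follows. The bot names, the vote record layout and the function names are all assumptions for the example, and the value of each vote is taken as already calculated in STU.

```python
from collections import defaultdict

# Hypothetical list of voting bot accounts.
VOTING_BOTS = {"examplebot1", "examplebot2"}

def bot_deductions(votes, bots=VOTING_BOTS):
    """Aggregate the value of voting bot votes by the author who received them."""
    deductions = defaultdict(float)
    for vote in votes:
        if vote["voter"] in bots:
            deductions[vote["author"]] += vote["value_stu"]
    return dict(deductions)

def net_earnings(gross_by_author, deductions_by_author):
    """Merge gross earnings and bot deductions into net earnings by author."""
    return {author: gross - deductions_by_author.get(author, 0.0)
            for author, gross in gross_by_author.items()}

# Invented votes and earnings, for illustration only.
votes = [
    {"voter": "examplebot1", "author": "alice", "value_stu": 40.0},
    {"voter": "someuser", "author": "alice", "value_stu": 5.0},
    {"voter": "examplebot2", "author": "alice", "value_stu": 10.0},
]
net = net_earnings({"alice": 80.0, "bob": 3.0}, bot_deductions(votes))
```

The net earnings mapping can then be fed into the same bucketing as before to produce the restated distribution.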

To illustrate the impact here is the earnings distribution from section 2 restated with the voting bot votes removed:

authorbidbotearningssumSep.png

There are a fair number of limitations here:

  • A more interesting approach might have been to deduct the amounts paid for each voting bot upvote and thus include voting bot profits and losses in the earnings distribution. However, I was limited here by difficulties with the fx rates needed to convert the Steem and SBD transfers into STU, the unit used for the rewards.
  • I only excluded bid-bots but did not adjust for voting bots such as minnowbooster.
  • I probably should have added the voting bot deductions to the relevant vote-bot accounts in the earnings distribution. This would produce a more complete earnings distribution.

Plenty to work on in future!

4. Spread of earnings - bespoke buckets

Back to our bespoke buckets! We can now compare the count of users in each bucket with the sum of earnings in each bucket:

authorcountvearningsv2.png

It's an interesting chart with some tasty soundbites:

  • 2% of accounts (970 users) earn 57% of author rewards.
  • 9% of accounts (3907 users) earn 85% of author rewards.
  • 72% of accounts took home 1.5% of author rewards between them.
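Percentages like these come from dividing each bucket's user count, and each bucket's earnings sum, by the respective overall total. A small sketch of that arithmetic, using invented numbers rather than the study's data:

```python
def percentage_shares(values_by_bucket):
    """Express each bucket's value as a percentage of the overall total."""
    grand_total = sum(values_by_bucket.values())
    if grand_total == 0:
        raise ValueError("cannot compute shares of an empty distribution")
    return {bucket: 100.0 * value / grand_total
            for bucket, value in values_by_bucket.items()}

# Invented counts and sums per bucket, for illustration only.
user_counts = {"low": 900, "middle": 80, "high": 20}
earnings_sums = {"low": 500.0, "middle": 1500.0, "high": 8000.0}

account_share = percentage_shares(user_counts)     # share of accounts per bucket
reward_share = percentage_shares(earnings_sums)    # share of rewards per bucket
```

Setting the two results side by side gives statements of the form "x% of accounts earn y% of author rewards".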

5. Curation and Benefactor rewards

Finally, how do curation and benefactor rewards impact the picture?

The distribution of curation rewards was included in the overall distribution by:

  • Taking each vote on posts included within the author earnings analysis (so the analysis considers a set of posts in completeness rather than votes made within the two week period - this felt like a more solid approach);
  • Capturing all votes with curation rewards (in Vests);
  • Translating the Vests rewards to STU;
  • Aggregating the curation rewards by user;
  • Merging of the array of curation rewards earnings by author and the array of author rewards earnings;
  • Creation of the earnings distribution from the new merged array.

Benefactor rewards were included using a broadly similar approach.
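The merge step differs slightly from the voting bot deduction, since curators and benefactors need not have any author earnings of their own. A minimal sketch, assuming each reward type has already been aggregated into a {user: amount in STU} mapping (the names and numbers are hypothetical):

```python
from collections import defaultdict

def merge_earnings(*sources):
    """Combine several {user: amount} mappings, keeping users found in any of them."""
    combined = defaultdict(float)
    for source in sources:
        for user, amount in source.items():
            combined[user] += amount
    return dict(combined)

# Invented reward totals, for illustration only.
author_rewards = {"alice": 10.0, "bob": 2.0}
curation_rewards = {"bob": 1.0, "carol": 4.0}
benefactor_rewards = {"dave": 0.5}

combined = merge_earnings(author_rewards, curation_rewards, benefactor_rewards)
```

Here carol and dave appear in the combined distribution even though they earned no author rewards in the sample.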

I have produced charts of the $50 bucket distribution:

50bucketscombinedSep.png

And the bespoke buckets distribution:

combinedbucketsSep.png

As can be seen, both curation rewards and benefactor rewards increase the proportion of rewards heading to high earning accounts.

6. Conclusions

In response to the original questions:

How much do users typically earn through posting and commenting on the Steem blockchain?
The majority of accounts earn very little. However it is too early to draw much in the way of conclusions from this data. Are these accounts new users? Are they bots? How much did they post? Are they really dedicated users gaining no rewards? Only a more in-depth study of these accounts would provide these answers.

Is there an even distribution of rewards? Are most payouts taken by high earning accounts? Or do low earners collectively take home the largest piece of the pie?
A small number of accounts, approximately 1000, claimed the majority of rewards (57% of net author rewards, or 67% once curation and benefactor rewards are included). It looks fairly safe to conclude that most payouts are taken by a small number of high earning accounts.

How do curation and benefactor rewards impact the picture?
Curation and benefactor rewards both skew the distribution towards high-earning accounts. This is an unexpected conclusion.


Next steps

In the next installment of this investigative series I will look at how earnings impact user retention and consider how many users the Steem blockchain can support.


Tools and Scripts

gears_blockops_green.jpg

I used the block.ops analysis system to produce this study. Block.ops is an open-source analysis tool designed for heavy-duty analyses of the Steem blockchain data.

You can find the repository for block.ops here:
https://github.com/miniature-tiger/block.ops

The study can be recreated by:

  • Loading the data for the relevant time period into block.ops.
  • Using the earningsdistribution command from the command line, for example:
    $ node blockOps earningsdistribution "2018-09-01" "2018-09-15"

Block.ops stores all posts and comments from the period in a MongoDB collection. The "earningsdistribution" command runs aggregation queries to summarise the results, then post-processes them for export to CSV. Payout amounts are converted to STU using hourly fx factors derived from actual posts. I used the Numbers spreadsheet application on macOS for the chart illustrations. Eventually I will build my own charts for use with block.ops.


Relevant Links and Resources

Links are provided in the text.


Repository

https://github.com/steemit/steem

This analysis is of data from the Steem blockchain which is an open source project.


Thanks for reading!
