Towards a reputation system suitable for SteemSTEM

Although several reputation systems for Steem have been proposed (some of them very recently), I believe that none of them is suitable as a reputation metric for our upcoming SteemSTEM app (to be officially released very soon).


[Credits: @hightouch]

Therefore, I decided to build one myself. Here I share the code and its ingredients, and would love to read comments or suggestions for potential improvements. The code is open (available from this GitHub repository) and can be used by anyone freely.

What will SteemSTEM do with such reputation indicators for now? Well, I don’t know (yet!), but it was fun to develop ;)

A good reputation for the SteemSTEM community must in my opinion include two ingredients, an authorship component and an engagement component, contributing in equal parts. Indeed, we absolutely need both authors who provide amazing contributions and readers who question authors and sustain worthwhile discussions. I suspect that holds true for any community, by the way.


AUTHORSHIP INDICATOR


The authorship metric is built from a few key principles.

Each SteemSTEM vote on a post at x% gives x reputation points at the time of the vote. The SteemSTEM curation team scours Steem to find the best STEM content. As all these blogs contributed to what SteemSTEM has become today, it makes sense they all enter any given reputation indicator.

It would be weird to assign a large reputation score to someone who contributed a lot two years ago but then left Steem. However, the reputation of that person should not be negligible either. After all, this person has left a trace in our memories. For this reason, I decided to introduce reputation points whose value varies with time. After a given time (the authorship point half-life), one point loses half its value. After twice that time, the remaining half point is worth only a quarter of a point, and so on: a simple exponential decay.
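The decay described above can be sketched in a few lines of Python. This is only an illustration and not the code from the repository: the function name `decayed_value` is hypothetical, and the half-life is the 3.5 months (counted as 30-day months) adopted later in this post.

```python
def decayed_value(points, age, half_life):
    """Value today of `points` earned `age` seconds ago (exponential decay)."""
    return points * 0.5 ** (age / half_life)


# 3.5 months expressed in seconds, assuming 30-day months
half_life = 3.5 * 30 * 24 * 3600.0

# One point after 0, 1, 2 and 3 half-lives
for n in range(4):
    print(n, decayed_value(1.0, n * half_life, half_life))
```

Each additional half-life halves whatever value is left, so a point never quite reaches zero.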

As SteemSTEM strives to push for quality as much as possible, we prefer someone writing one excellent post a week (thus supported very strongly) over someone writing five good posts a week (thus supported moderately five times). For this reason, the reputation score today (i.e. accounting for the fact that each gained point has lost value with time) is divided by the square root of the number of posts. The square root tames the effect when a very large number of posts is reached.
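To see what the square root does, here is a minimal sketch comparing one strongly supported post with five moderately supported ones. The function name and the data layout (one `(vote_percent, vote_time)` pair per supported post) are assumptions made for the illustration, as is the choice of five votes at 20% so that both authors collect the same total vote weight.

```python
import math


def authorship_raw_score(votes, now, half_life):
    """Decayed points summed over all votes, divided by sqrt(number of posts).

    votes: list of (vote_percent, vote_time) pairs, one per supported post
           (hypothetical layout, times in seconds).
    """
    if not votes:
        return 0.0
    total = sum(pct * 0.5 ** ((now - t) / half_life) for pct, t in votes)
    return total / math.sqrt(len(votes))


now = 0.0
half_life = 3.5 * 30 * 24 * 3600.0

one_strong = [(100.0, now)]         # one post voted at 100%
five_moderate = [(20.0, now)] * 5   # five posts voted at 20% each

print(authorship_raw_score(one_strong, now, half_life))
print(authorship_raw_score(five_moderate, now, half_life))
```

With the same total vote weight, the author of the single post ends up with a score larger by a factor of sqrt(5), roughly 2.2.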

Finally, we may want to remove individuals from the algorithm, like the team, blacklisted people, bots, etc. Moreover, the total amount of reputation points is fixed to a given value so that each score is renormalized at the end of the day.
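The exclusion and renormalization step could look like the sketch below. The function name `normalize` and the account names are made up for the example, and I assume here that excluded accounts are dropped before the normalization so that they take no share of the total.

```python
def normalize(raw_scores, excluded=(), total=1000.0):
    """Drop excluded accounts, then rescale so that the scores sum to `total`."""
    excluded = set(excluded)
    kept = {name: s for name, s in raw_scores.items() if name not in excluded}
    norm = sum(kept.values())
    if norm == 0:
        return {name: 0.0 for name in kept}
    return {name: total * s / norm for name, s in kept.items()}


# Hypothetical raw scores
raw = {'alice': 30.0, 'bob': 10.0, 'somebot': 60.0}
scores = normalize(raw, excluded=['somebot'])
print(scores)
```

Since the total is fixed, a score only has a meaning relative to the others: if everybody gains points at the same rate, nobody's normalized score changes.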


RESULTS FROM THE STEEMSTEM AUTHORSHIP INDICATOR


I adopted an authorship half-life of 3.5 months and excluded all team members (management and curators), bots and blacklisted authors from the run. The total number of available authorship reputation points is normalized to 1000.

The top 30 most reputed SteemSTEM authors of all time, out of 2662 authors, are (with their score):

  1 abigail-dantes            6.305
  2 chloroform                6.244
  3 egotheist                 5.918
  4 scienceblocks             5.884
  5 steemit-italia            5.691
  6 lordneroo                 5.632
  7 zen-art                   5.216
  8 nonzerosum                5.089
  9 highonthehog              4.931
 10 conficker                 4.902
 11 nikolanikola              4.893
 12 effofex                   4.808
 13 anaestrada12              4.804
 14 tomastonyperez            4.555
 15 deathbatter               4.466
 16 samminator                4.456
 17 hidden84                  4.385
 18 agmoore                   4.361
 19 romulexx                  4.357
 20 jfermin70                 4.266
 21 answerswithjoe            4.159
 22 dysfunctional             4.093
 23 elvigia                   4.047
 24 n4zrizulkafli             4.021
 25 alexander.alexis          4.017
 26 dedicatedguy              3.993
 27 lupafilotaxia             3.958
 28 anasav                    3.909
 29 scienceangel              3.856
 30 irelandscape              3.809

The code was run on Sep 24th at 10:11:25 AM.


ENGAGEMENT INDICATOR


Here, I track every single comment to any SteemSTEM-supported post and give reputation points to the comment author.

First, if the comment is shorter than N characters, it is considered spammy and no point is given. Moreover, if the comment has been posted more than W weeks after the SteemSTEM vote, no point is given either. I want meaningful comments that help illustrate that supported posts are interesting during the time in which they are hot or trending (on the #steemstem tag).
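These two cuts can be written as a small filter. This is a sketch with hypothetical names; the defaults are the values adopted later in this post (N=100 characters, W=2 weeks), and I assume that a comment posted before the SteemSTEM vote is not rejected by the time cut, which the description above does not specify.

```python
def comment_is_eligible(body, comment_time, vote_time,
                        min_length=100, time_limit=14 * 24 * 3600.0):
    """Return True if a comment can earn engagement points.

    Times are in seconds. Comments posted before the vote pass the
    time cut (an assumption of this sketch).
    """
    if len(body) < min_length:
        return False  # too short: considered spammy
    if comment_time - vote_time > time_limit:
        return False  # posted too long after the SteemSTEM vote
    return True


day = 24 * 3600.0
print(comment_is_eligible('Nice post!', 1 * day, 0.0))
print(comment_is_eligible('x' * 250, 1 * day, 0.0))
print(comment_is_eligible('x' * 250, 20 * day, 0.0))
```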

If non-zero, the score is given by the square root of the comment length. Once again, the square root makes a large difference between smallish and average comments, but tames the difference once a given length is crossed. This is the only way I have found so far to deduce the score, and I am only partially satisfied with it. But at least, it provides some level of quantification of the engagement of the readers.
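A few numbers make the effect of the square root concrete. The helper below is a hypothetical illustration (with the N=100 cut used later in this post as default), not the repository code.

```python
import math


def comment_points(length, min_length=100):
    """Points earned by a comment of `length` characters (0 if too short)."""
    if length < min_length:
        return 0.0
    return math.sqrt(length)


for length in (50, 100, 400, 1600, 6400):
    print(length, comment_points(length))
```

A comment 16 times longer than another one only earns 4 times more points, so writing walls of text is not the fastest way to climb the ranking.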

As with the authorship indicator, any earned engagement point loses value with time, the score today is divided by the square root of the number of comments and some individuals can be removed from the algorithm.

The final score is normalized as in the authorship case, the total number of available points being fixed to a given value (taken to be the same, as engagement and authorship are considered equally important).


RESULTS FROM THE STEEMSTEM ENGAGEMENT INDICATOR


I adopted an engagement half-life of 1.75 months, and excluded comments whose length is smaller than 100 characters (N=100). I fixed W to 2 weeks. I excluded all team members (management and curators), bots and blacklisted authors from the run. The total number of available engagement points is 1000.

The top 30 most engaging SteemSTEM comment authors of all time, out of 23134 comment authors, are (with their score):

  1 erh.germany               2.886
  2 agmoore                   2.764
  3 steemit-italia            2.623
  4 amestyj                   2.621
  5 abigail-dantes            2.357
  6 scienceblocks             2.175
  7 fran.frey                 2.149
  8 insight-out               2.114
  9 rudyardcatling            2.096
 10 lupafilotaxia             2.079
 11 dedicatedguy              2.031
 12 samminator                1.979
 13 tsoldovieri               1.950
 14 alexander.alexis          1.921
 15 cyprianj                  1.853
 16 herbayomi                 1.847
 17 tomastonyperez            1.833
 18 jamalgayoni               1.756
 19 steepup                   1.726
 20 alexdory                  1.682
 21 kimberlylane              1.678
 22 synick                    1.665
 23 olamseu                   1.656
 24 emperorhassy              1.628
 25 lucylin                   1.625
 26 osariemen                 1.611
 27 ied                       1.576
 28 egotheist                 1.575
 29 delpilar                  1.526
 30 chireerocks               1.487

The code was run on Sep 24th at 10:11:25 AM.


FINAL REPUTATION INDICATOR


The final reputation is given by the average of the two metrics above. The top 25 (with their scores) are:

  1 abigail-dantes            4.331
  2 steemit-italia            4.157
  3 scienceblocks             4.029
  4 egotheist                 3.746
  5 chloroform                3.579
  6 agmoore                   3.563
  7 lordneroo                 3.489
  8 nonzerosum                3.247
  9 erh.germany               3.245
 10 samminator                3.217
 11 tomastonyperez            3.194
 12 conficker                 3.151
 13 effofex                   3.115
 14 lupafilotaxia             3.019
 15 dedicatedguy              3.012
 16 alexander.alexis          2.969
 17 anaestrada12              2.959
 18 nikolanikola              2.911
 19 cyprianj                  2.795
 20 tsoldovieri               2.734
 21 zen-art                   2.723
 22 jfermin70                 2.605
 23 alexdory                  2.598
 24 highonthehog              2.576
 25 amestyj                   2.572

The code was run on Sep 24th at 10:11:25 AM.
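The averaging behind the final indicator can be checked by hand with a short sketch. The function name is hypothetical, the input numbers are taken from the two tables above, and I assume that someone absent from one of the two indicators counts as 0 there.

```python
def final_reputation(authorship, engagement):
    """Average of the two indicators; a missing entry counts as 0 (assumption)."""
    names = set(authorship) | set(engagement)
    return {name: 0.5 * (authorship.get(name, 0.0) + engagement.get(name, 0.0))
            for name in names}


# Scores copied from the authorship and engagement tables
authorship = {'abigail-dantes': 6.305, 'steemit-italia': 5.691}
engagement = {'abigail-dantes': 2.357, 'steemit-italia': 2.623}

final = final_reputation(authorship, engagement)
for name in sorted(final, key=final.get, reverse=True):
    print(name, round(final[name], 3))
```

This reproduces the first two entries of the final table (4.331 and 4.157).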


MORE ABOUT THE CODE


The code can be obtained from the following GitHub repository. It is programmed in Python 3 and requires steem-python.

I am not happy with the way the engagement indicator is computed, because I need to get the information on each post separately, which takes an enormous amount of time. For this reason, the information is saved into a file once the SteemSTEM upvote on a post is older than two weeks (as any later comment would bring 0 points). This requires removing the ‘null’ author from the algorithm, as it is used to mark posts without a single comment.
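The caching idea can be sketched as follows. Everything here is hypothetical (the function names, the tuple layout of a comment and the JSON line format); the actual layout of the saved file is defined in the repository and may well differ. The sketch only shows when a post becomes final and why a ‘null’ placeholder is needed.

```python
import json

TIME_LIMIT = 14 * 24 * 3600.0  # two weeks, in seconds


def is_final(vote_time, now, time_limit=TIME_LIMIT):
    """True once no new comment on the post can earn points any more."""
    return now - vote_time > time_limit


def cache_lines(post_id, vote_time, comments, now):
    """Lines to append to the cache file for one post.

    comments: list of (author, length, comment_time) tuples (assumed layout).
    Returns [] while the post is still inside the two-week window.
    """
    if not is_final(vote_time, now):
        return []
    if not comments:
        # Placeholder so that a post without comments is still marked as treated
        comments = [('null', 0, vote_time)]
    return [json.dumps({'post': post_id, 'author': author,
                        'length': length, 'time': time})
            for author, length, time in comments]


print(cache_lines('some-post', 0.0, [], TIME_LIMIT + 1.0))
```

The ‘null’ entries only exist for bookkeeping, which is why that author has to be excluded before ranking.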

To run it, it is sufficient to complete the setup part of the code,

## Setup
half_life_vote     = 3.5*30*24*3600.     # 3.5 months - authorship point half-life
half_life_comment  = 1.75*30*24*3600.    # 1.75 months - engagement point half-life
comment_timelimit  = 14*24*3600.         # 2 weeks - the W number
comment_spam_limit = 100                 # minimum number of characters for a comment to be valid (N)
comment_filename   = 'comments_data.txt' # where to save the treated comments
load_backup        = True                # use the file with the saved comments
normalized_rep     = 1000                # score normalization

## Exclusions
team      = [ 'null' ]                   # always keep 'null' here (marks posts without comments)
bots      = [ ]
blacklist = [ ]

and execute the program.
