Re-Thinking Curation

Post curation is broken. I think a lot of new users can agree with me on that. A single upvote by a whale can make your post more visible than if it were to receive 1,000 upvotes by active community members at a penny each. Votes can be bought and sold, and the visibility of a post has very little to do with the quality of the content in the post itself, and a lot to do with the poster's connections to powerful people and their ability to use bots effectively.

The biggest problem with the current curation model is that it's not based on viewers. It doesn't matter how many viewers actually enjoyed the content. The only thing that matters is the influence of the accounts that voted on the content, which might have very little to do with how interesting the content actually is.

Ideally, only four numbers should matter when it comes to content curation (and none of those numbers are the SP of the person casting the vote). In fact, the four numbers that matter the most are right in front of us on every post:

  • How many times has a post been seen?
  • How many upvotes has it gained?
  • How many downvotes has it gained?
  • How many comments does it have?

Using these four numbers, it should be possible to evaluate the quality of a post. If something is getting hundreds of views but has a very low vote and comment rate, odds are that it's low quality content. If something has very few views, but almost every viewer has commented and upvoted it, odds are that it's quality content.

Let's come up with a better way to calculate what posts deserve to be visible on our SteemIt feeds:

V = Post views
U = Upvotes
D = Downvotes
C = Number of users who have commented
A = Age of post

Q1, Q2, Q3, etc = Constants: Some numbers that we will have to choose - More on these later

My initial thought for fair curation is that it should be based on a very simple model: The more upvotes that a post has relative to its views, the better it is.

(U-D)/V

This is nice and simple. Let's take two example posts, A and B. Post A has 200 views, 24 upvotes and 3 downvotes. Post B has 20 views and 3 upvotes. Under the current system, post A is likely to be more popular because those 24 upvotes likely had more influence than the 3 upvotes on post B.

Under my proposed formula, instead we get this:

Post A Popularity: (24-3)/200 = 0.105
Post B Popularity: (3-0)/20 = 0.15

Given that a larger percentage of post B's viewers enjoyed it, we should be giving post B more attention on the platform than post A.
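For anyone who wants to play with the numbers, here is a minimal Python sketch of the (U-D)/V formula. The function name `popularity` and the choice to score an unseen post as zero are my own assumptions for illustration, not anything SteemIt actually implements.

```python
def popularity(upvotes: int, downvotes: int, views: int) -> float:
    """Net upvotes per view: (U - D) / V."""
    if views <= 0:
        # Assumption: a post nobody has seen yet scores zero
        # instead of triggering a division by zero.
        return 0.0
    return (upvotes - downvotes) / views


# The two example posts from above.
post_a = popularity(upvotes=24, downvotes=3, views=200)
post_b = popularity(upvotes=3, downvotes=0, views=20)

# Post B ranks higher because a larger share of its viewers upvoted it.
print("B outranks A:", post_b > post_a)
```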

Of course this is not ideal. What if post A is a controversial subject? Maybe it inspired a lot of back and forth debate and things got heated, leading to some down-voting.

We should probably build comments into our formula.

(1+C)(U-D)/V

That's a bit better. Every post's popularity is now multiplied by one plus the number of users who have commented on it. Unfortunately this unfairly penalizes new posts with very few comments, while giving a massive boost to posts with heavy comments, even if they weren't rated very favorably.

A better approach would be to use some constant, Q1, to determine how much we should weight comments in comparison to how much we should weight the value of the initial post. Q1's value is a bit arbitrary, and would have to be played with. Q1=10 might be a good starting value.

(Q1+C)(U-D)/V

Back to my example with posts A and B, let's assume A has 15 comments and B has 1 comment.

Post A Popularity: (10+15)(24-3)/200 = 2.625
Post B Popularity: (10+1)(3-0)/20 = 1.65

That looks a bit better. Post B is doing well. It has been rated overwhelmingly well by people who looked at it and definitely deserves attention. Post A is controversial, but has inspired a lot of talk by a lot of different people and definitely deserves some more attention based on the fact that good discussions are happening.

It could be argued that this system isn't weighing downvotes heavily enough. There's an easy fix for that. Add another constant, Q2, to weigh the downvotes more heavily as needed.

(Q1+C)(U-D*Q2)/V

Great!
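Here is a rough Python sketch of the full formula, (Q1+C)(U-D*Q2)/V, so the constants can be played with. The function and parameter names are hypothetical, Q1=10 is the starting value suggested above, and Q2=1 is my assumed default because it reproduces the earlier calculations (15 commenters on post A, 1 on post B).

```python
def popularity(views: int, upvotes: int, downvotes: int, commenters: int,
               q1: float = 10.0, q2: float = 1.0) -> float:
    """(Q1 + C) * (U - D * Q2) / V, with Q1 and Q2 as tunable constants."""
    if views <= 0:
        # Assumption: unseen posts score zero rather than dividing by zero.
        return 0.0
    return (q1 + commenters) * (upvotes - downvotes * q2) / views


# With Q2 = 1 this matches the worked numbers for posts A and B.
post_a = popularity(views=200, upvotes=24, downvotes=3, commenters=15)
post_b = popularity(views=20, upvotes=3, downvotes=0, commenters=1)

# Doubling the downvote weight only hurts post A, the one with downvotes.
post_a_strict = popularity(views=200, upvotes=24, downvotes=3,
                           commenters=15, q2=2.0)

print(post_a, post_b, post_a_strict)
```

With Q2=2, post A works out to (10+15)(24-6)/200 = 2.25, while post B is unchanged because it has no downvotes.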

The nice thing about this formula is that it's self-correcting. Let's say some stupid post somehow gets a lot of discussion and upvotes early on. It rockets to the top of the charts. Now people can downvote it, and it will quickly disappear. Or even if they don't bother to downvote it, just looking at the post, deciding it's not interesting and choosing not to interact will increase the value of V, causing the post to lose some rating every time it's viewed and not interacted with. Each additional view has less of an impact on posts that already have a large V value, so failing to interact with a post that already has 1000 views will be far less impactful than failing to interact with a post that only has 10 views. This means that good quality posts with heavy interactions and upvotes will tend to stay at the top.
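To put a number on that self-correction, here is a small Python sketch that measures how much rating a post loses from one extra view with no interaction. The helper name `drop_from_one_idle_view` and the sample vote counts are made up for illustration, and Q1=10, Q2=1 are assumed values.

```python
def popularity(views, upvotes, downvotes, commenters, q1=10.0, q2=1.0):
    """(Q1 + C) * (U - D * Q2) / V."""
    return (q1 + commenters) * (upvotes - downvotes * q2) / views


def drop_from_one_idle_view(views, upvotes, downvotes, commenters):
    """Fraction of its rating a post loses when one more person
    views it without voting or commenting."""
    before = popularity(views, upvotes, downvotes, commenters)
    after = popularity(views + 1, upvotes, downvotes, commenters)
    return (before - after) / before


# Hypothetical posts with the same upvote rate but very different view counts.
small_post = drop_from_one_idle_view(views=10, upvotes=5, downvotes=0, commenters=2)
large_post = drop_from_one_idle_view(views=1000, upvotes=500, downvotes=0, commenters=2)

print(f"10-view post loses {small_post:.1%} of its rating")
print(f"1000-view post loses {large_post:.1%} of its rating")
```

Because only V changes, the fractional loss is always 1/(V+1): about 9% for the 10-view post and about 0.1% for the 1000-view post.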

The last factor to consider is the age of the post, A. This really isn't a big issue, but there are some minor problems that could occur such as a post being pushed to the very top of the popularity charts because it has 3 views, 3 upvotes and a comment. While it would quickly self-correct within a few views, this would lead to unnecessary churn in the top posts with a lot of posts cycling in and out very quickly.

As such, it would be good to somehow incorporate the age of the post, A, into the formula. I'm not exactly sure how this part should go, but I suspect that we would want posts to have some minimum age (say, 1 hour) and some maximum age (7 days) to prevent new content from churning the feed 24/7, and to prevent old content from never disappearing.
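One possible way to sketch the age idea in Python is as a simple eligibility window applied before ranking. This is only one interpretation: the 1 hour and 7 day limits come from the paragraph above, but the names `is_rankable` and `rank_feed`, the hard cutoff (rather than some gradual decay), and the sample posts are all assumptions.

```python
from datetime import timedelta

MIN_AGE = timedelta(hours=1)  # assumed minimum age before a post can chart
MAX_AGE = timedelta(days=7)   # assumed maximum age before a post drops off


def is_rankable(age: timedelta) -> bool:
    """A post is eligible only while its age is inside the window."""
    return MIN_AGE <= age <= MAX_AGE


def rank_feed(posts):
    """Take (name, score, age) tuples and return the names of eligible
    posts, highest score first."""
    eligible = [p for p in posts if is_rankable(p[2])]
    return [name for name, score, age in
            sorted(eligible, key=lambda p: p[1], reverse=True)]


# Hypothetical feed: "fresh" has a huge score from 3 views, but is too new.
feed = [
    ("fresh", 11.0, timedelta(minutes=5)),
    ("steady", 2.625, timedelta(days=2)),
    ("solid", 1.65, timedelta(hours=3)),
    ("stale", 5.0, timedelta(days=9)),
]
print(rank_feed(feed))
```

Here "fresh" and "stale" are left out of the ranking entirely, so a brand-new post with a handful of views can't churn the top of the feed.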

Would this work?

No.

This is an idealistic formula. It assumes that all voters are well-intentioned and, more importantly, that all actors are acting in the best interests of the platform. In a world where there aren't bots, this is a beautiful way of curating content. Content with a high percentage of upvotes becomes more visible. Content with a high percentage of downvotes becomes less visible. Content with a lot of views but very few votes becomes less visible. And so on.

Sadly, the real world of SteemIt has bots. So many bots. The instant this formula is implemented, bots would be gaming the system. We'd see services with hundreds of 0SP bots upvoting and downvoting posts in no time at all. Until the bot problems are solved and the community finds a solution to all of the automated voting that's going on, it's very difficult to solve the curation problems.

I present this article not as a solution for present-day problems, but as a future consideration for how curation should work in an ideal environment.

If you enjoy my posts, don't forget to upvote and

Thanks for reading,
-Matt
