Build Your Own ChinaDaily-Style Ranking Analyzer: Find Out Where You Rank in the Chinese Community


Motivation

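Every day I see ChinaDaily's data analysis reports, but a newcomer like me never makes the list, and there is no telling how long I would have to wait to learn where I rank in the Chinese community. I suspect plenty of friends in the Chinese community would also like to know their rank, even if it is somewhere in the hundreds.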

Goal

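ChinaDaily publishes three author rankings, sorted by Rep, by estimated account value, and alphabetically.
The goal of this article is that, once you have read it, you can build a small ranking analysis program with those same features yourself.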

Preparation

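Data analysis starts with data. On an open platform like a blockchain, getting network-wide data on Steem authors is easy, and there are many ready-made options, such as SteemSQL, Steemd, SteemData, and the Steem API. For convenience we use SteemData here. SteemData is a fully open Steem blockchain database built on MongoDB. Besides author information, it also holds every post, vote, and more across the network, and it provides good APIs and SDKs for accessing the data, which makes it very convenient.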

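With the remote database chosen, we want a UI to browse the data for comparison and debugging. Here we use the free, open-source RoboMongo, which runs on all platforms and installs in one click.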

After installing RoboMongo, you can create a connection to SteemData directly:

Host: mongo1.steemdata.com 
Port: 27017 
Database: SteemData 
Username: steemit 
Password: steemit 

Once connected, information on all 331,351 authors on the network is at your fingertips. Pretty great, right?

That completes the preparation. Now the real fun begins.

Approach

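Our goal is to find the users who were active in the Chinese community in the past 7 days, then sort them by rep, by estimated account value, and by first letter. Let's think through each problem in turn.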
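Problem zero is finding the users active in the Chinese community in the past 7 days. This is easy: query the Posts collection for all posts created in the last 7 days that carry the tag cn, and take the set of their authors; we define that set as "users active in the Chinese community in the past 7 days". Then query the Accounts collection with this set of usernames, which gives us all the data we need to process.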

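There are many finer points worth discussing here. For example, if a post has the cn tag but not as its first tag, does its author count as active in the Chinese community? Questions like this are also easy to settle, so they are outside the scope of this article; leave a comment if you are interested.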
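Problem zero can be tried out without a live database. The snippet below is a minimal, hypothetical sketch: `sevenDayWindow` follows the same idea as `findAuthorByPosts` in app.js (both ends of the window truncated to UTC midnight, start 7 days before end), `buildPostsQuery` returns the same query document app.js sends to MongoDB, and `activeAuthors` is an in-memory stand-in for that query plus the author counting done with `authorObjects`. The helper names and the sample posts are made up; the post objects are assumed to carry `author`, `tags`, and `created` fields like the documents in the Posts collection.

```javascript
// Hypothetical helpers illustrating "problem zero"; no database needed.

// [start, end) window: end is today's 00:00 UTC, start is 7 days earlier.
function sevenDayWindow(now) {
  const end = new Date(now.toISOString().slice(0, 10)); // date-only string parses as UTC
  const start = new Date(end.getTime() - 7 * 24 * 60 * 60 * 1000);
  return { start, end };
}

// The MongoDB query document that selects cn-tagged posts in the window.
function buildPostsQuery(start, end) {
  return { tags: "cn", created: { $gte: start, $lt: end } };
}

// In-memory equivalent of the query plus the counting loop in app.js:
// returns an object mapping author name -> number of cn posts in the window.
function activeAuthors(posts, start, end) {
  const counts = {};
  for (const post of posts) {
    const inWindow = post.created >= start && post.created < end;
    if (inWindow && post.tags.includes("cn")) {
      counts[post.author] = (counts[post.author] || 0) + 1;
    }
  }
  return counts;
}

const { start, end } = sevenDayWindow(new Date("2017-08-30T11:53:54Z"));
console.log(start.toISOString(), end.toISOString());
// 2017-08-23T00:00:00.000Z 2017-08-30T00:00:00.000Z

const posts = [
  { author: "alice", tags: ["cn", "life"], created: new Date("2017-08-25T08:00:00Z") },
  { author: "alice", tags: ["cn"], created: new Date("2017-08-29T23:59:00Z") },
  { author: "bob", tags: ["photography"], created: new Date("2017-08-26T10:00:00Z") },
  { author: "carol", tags: ["cn"], created: new Date("2017-08-30T01:00:00Z") }, // after the window
];
console.log(activeAuthors(posts, start, end)); // { alice: 2 }
```

Because MongoDB matches `tags: "cn"` against any element of the `tags` array, the in-memory version uses `includes`; requiring `cn` to be the first tag would instead check `post.tags[0] === "cn"`.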

Problem one is sorting by rep. This is simple: the Accounts collection already has a rep field, so we just output the accounts in descending order of it.
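A minimal sketch of the rep ranking, assuming each account object has the `name` and `rep` fields read from the Accounts collection. `topByRep` is a hypothetical helper and the sample accounts are made up; unlike an index loop, `slice` cannot run past the end of the array when fewer accounts than the limit come back.

```javascript
// Hypothetical helper: top `limit` accounts by reputation, highest first.
// Copies the array so the caller's order is left untouched.
function topByRep(accounts, limit) {
  return [...accounts]
    .sort((a, b) => b.rep - a.rep) // descending, like sortByRep in app.js
    .slice(0, limit); // stops early if there are fewer than `limit` accounts
}

const accounts = [
  { name: "alice", rep: 55.1 },
  { name: "bob", rep: 67.4 },
  { name: "carol", rep: 61.0 },
];
console.log(topByRep(accounts, 2).map((a) => a.name)); // [ 'bob', 'carol' ]
```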
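Problem two is computing the estimated account value. This isn't hard either; we just need to know how it is calculated. After some googling and asking around, I found the following formula: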

Value = (SP + Steem) * medianPrice + SBD

Here, medianPrice can be obtained from the Steem API, or looked up manually on steemd (search the page for the keyword "feed price"). The other inputs, SP, Steem, and SBD, can all be read from the Accounts collection.
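As a worked check of the formula, the sketch below recomputes the @davidding row of the wealth ranking shown in the results (SP 103658.425, STEEM 1.31, SBD 508.173, median price 1.470): (103658.425 + 1.31) × 1.470 + 508.173 ≈ 152887.98. The function names are hypothetical; the field names follow the Accounts documents used in app.js (`sp`, `balances.total.STEEM`, `balances.total.SBD`). The `"1.470 SBD"` string is an assumed example of the price field; app.js likewise takes the part before the first space.

```javascript
// Value = (SP + Steem) * medianPrice + SBD

// Numeric part of a price string such as "1.470 SBD" (format assumed).
function parseMedianPrice(base) {
  return parseFloat(base.split(" ")[0]);
}

// Estimated account value of one Accounts document.
function accountValue(account, medianPrice) {
  const steem = account.balances.total.STEEM;
  const sbd = account.balances.total.SBD;
  return (account.sp + steem) * medianPrice + sbd;
}

const medianPrice = parseMedianPrice("1.470 SBD");
const davidding = { sp: 103658.425, balances: { total: { STEEM: 1.31, SBD: 508.173 } } };
console.log(accountValue(davidding, medianPrice).toFixed(2)); // 152887.98
```

Parsing the price into a number explicitly avoids relying on JavaScript's implicit string-to-number coercion inside the multiplication.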
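Problem three (alphabetical order) is not really a problem, so to save space I won't discuss it here; the code below includes a sortByAlphabeta comparator for it.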
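For reference, a short hypothetical sketch of that third ranking: it uses the same case-insensitive comparison rule as `sortByAlphabeta` and emits `rank|@name` lines in the same row style the report tables use. `byName` and `alphabeticalRows` are illustrative names, not part of app.js.

```javascript
// Case-insensitive name comparison, same rule as sortByAlphabeta in app.js.
function byName(a, b) {
  const nameA = a.name.toLowerCase();
  const nameB = b.name.toLowerCase();
  if (nameA < nameB) return -1;
  if (nameA > nameB) return 1;
  return 0;
}

// "rank|@name" lines, sorted alphabetically, without mutating the input.
function alphabeticalRows(accounts) {
  return [...accounts]
    .sort(byName)
    .map((a, i) => `${i + 1}|@${a.name}`)
    .join("\n");
}

console.log(alphabeticalRows([{ name: "oflyhigh" }, { name: "ace108" }, { name: "deanliu" }]));
// 1|@ace108
// 2|@deanliu
// 3|@oflyhigh
```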

With the approach and the key ingredients in place, let's write the code.

Code

app.js

var MongoClient = require('mongodb').MongoClient, assert = require('assert');
var steem = require('steem');
var fs = require('fs');

// Connect DB
var url = 'mongodb://steemit:steemit@mongo1.steemdata.com:27017/SteemData';

var postsCount = 0;
var authorsCount = 0;
var authorObjects = {};
var medianPrice = 1;
var start = new Date();
var end = new Date();

var basepath = "/Users/bullda/Workspace/SteemBot/";

createOutputFile();

steem.api.getCurrentMedianHistoryPrice(function(err, result) {
    if (!err) {
        medianPrice = (result['base'].split(" "))[0];
        console.log("==== medianPrice ==== " + medianPrice);
        updateOutputFile("$medianPrice$", medianPrice);

        // Connect to the db
        MongoClient.connect(url, function(err, db) {
            assert.equal(null, err);
            console.log("Connected successfully to server");

            findAuthorByPosts(db, function(docs) {
                // get posts count
                postsCount = docs.length;
                console.log("==== postsCount ==== " + postsCount);
                updateOutputFile("$postsCount$", postsCount);

                // authors count & authors set
                for (var i = 0; i < docs.length; i++) {
                    authorObjects[docs[i]['author']] = 1 + (authorObjects[docs[i]['author']] || 0);
                }
                authorsCount = Object.keys(authorObjects).length;
                console.log("==== authorsCount ==== " + authorsCount);
                updateOutputFile("$authorsCount$", authorsCount);

                // authors details
                getAuthorsDetail(db, function(docs) {
                    // sort by value
                    var topValue = "";
                    var topValueLimit = 20;    // get top 20 value, you can change it as you wish

                    docs.sort(sortByValue);

                    // Math.min guards against having fewer authors than the limit
                    for (var i = 0; i < Math.min(topValueLimit, docs.length); i++) {
                        var name = docs[i]['name'];
                        var value = ((docs[i]['sp'] + docs[i]['balances']['total']['STEEM']) * medianPrice + docs[i]['balances']['total']['SBD']).toFixed(2);
                        var steemValue = docs[i]['balances']['total']['STEEM'];
                        var SPValue = docs[i]['sp'];
                        var SBDValue = docs[i]['balances']['total']['SBD'];
                        
                        topValue += ((i + 1) + "|@" + name + "|" + value + "|" + steemValue + "|" + SPValue + "|" + SBDValue + "\n");
                    }
                    console.log("==== topValue ==== " + topValue);
                    updateOutputFile("$topValue$", topValue);

                    // sort by rep
                    var topRep = "";
                    var topRepLimit = 20; // get top 20 rep, you can change it as you wish

                    docs.sort(sortByRep);

                    // Math.min guards against having fewer authors than the limit
                    for (var i = 0; i < Math.min(topRepLimit, docs.length); i++) {
                        var name = docs[i]['name'];
                        var rep = docs[i]['rep'];
                        var vp = (docs[i]['voting_power'] / 100).toFixed(2);
                        var online = daysBetween(new Date(), new Date(docs[i]['created']));
                        var value = ((docs[i]['sp'] + docs[i]['balances']['total']['STEEM']) * medianPrice + docs[i]['balances']['total']['SBD']).toFixed(2);
                        
                        topRep += ((i + 1) + "|@" + name + "|" + rep + "|" + vp + "|" + online + "|" + value + "\n");
                    }
                    console.log("==== topRep ==== " + topRep);
                    updateOutputFile("$topRep$", topRep);

                    db.close();    
                });
            });
        });
    };
});

function findAuthorByPosts(db, callback) {
    var collection = db.collection('Posts');

    start.setDate(start.getDate() - 7);
    start = new Date(start.toISOString().slice(0, 10));
    console.log("==== start ==== " + start.toISOString());
    updateOutputFile("$start$", start.toISOString().slice(0, 19));

    end = new Date(end.toISOString().slice(0, 10));
    console.log("==== end ==== " + end.toISOString());
    updateOutputFile("$end$", end.toISOString().slice(0, 19));
    updateOutputFile("$print$", end.toISOString().slice(0, 19));

    var query = { "tags": "cn", "created": {$gte: start, $lt: end} };
    var fetchDataLimit = 5000;

    collection.find(query).skip(0).limit(fetchDataLimit).toArray(function(err, docs) {
        assert.equal(err, null);
        callback(docs);
    });
}

function getAuthorsDetail(db, callback) {
    var collection = db.collection('Accounts');

    var query = {"name": {$in: Object.keys(authorObjects)}};
    var fetchDataLimit = 5000;

    collection.find(query).skip(0).limit(fetchDataLimit).toArray(function(err, docs) {
        assert.equal(err, null);
        callback(docs);
    });
}

function daysBetween(date1, date2) {
    // The number of milliseconds in one day
    var ONE_DAY = 1000 * 60 * 60 * 24

    // Convert both dates to milliseconds
    var date1_ms = date1.getTime()
    var date2_ms = date2.getTime()

    // Calculate the difference in milliseconds
    var difference_ms = Math.abs(date1_ms - date2_ms)

    // Convert back to days and return
    return Math.round(difference_ms / ONE_DAY)
}

function createOutputFile() {
    var from = basepath + "cn_report_template.md";
    var to = basepath + "cn_" + end.toISOString().slice(0, 10);
    var content = fs.readFileSync(from, 'utf8');
    fs.writeFileSync(to, content);
}

function updateOutputFile(key, value) {
    var filepath = basepath + "cn_" + end.toISOString().slice(0, 10);
    var content = fs.readFileSync(filepath, 'utf8');
    content = content.replace(key, value);
    fs.writeFileSync(filepath, content);
    console.log("==== saved ==== " + key);
}

function sortByRep(authorA, authorB) {
    return (authorB['rep'] - authorA['rep']);
}

function sortByValue(authorA, authorB) {
    var valueA = (authorA['sp'] + authorA['balances']['total']['STEEM']) * medianPrice + authorA['balances']['total']['SBD'];
    var valueB = (authorB['sp'] + authorB['balances']['total']['STEEM']) * medianPrice + authorB['balances']['total']['SBD'];

    return (valueB - valueA);
}

function sortByAlphabeta(authorA, authorB) {
    var nameA = authorA['name'].toLowerCase();
    var nameB = authorB['name'].toLowerCase();

    if (nameA < nameB) {
        return -1;
    } else if (nameA > nameB) {
        return 1;
    } else {
        return 0;
    }
}
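app.js fills the report by calling `updateOutputFile` once per placeholder, and each call re-reads and rewrites the whole file. `String.prototype.replace` with a string pattern also replaces only the first occurrence and still interprets special sequences such as `$&` in the replacement text. The sketch below is a hypothetical alternative, not what app.js does: it fills every placeholder in memory with `split`/`join`, which replaces all occurrences literally, and writes the file once. `fillTemplate` and `writeReport` are illustrative names.

```javascript
const fs = require("fs");

// Replace every occurrence of each key with its value, treating both literally.
function fillTemplate(template, values) {
  let content = template;
  for (const [key, value] of Object.entries(values)) {
    content = content.split(key).join(String(value));
  }
  return content;
}

// One read of the template, one write of the finished report.
function writeReport(templatePath, outputPath, values) {
  const template = fs.readFileSync(templatePath, "utf8");
  fs.writeFileSync(outputPath, fillTemplate(template, values));
}

const filled = fillTemplate("Posts: `$postsCount$`, posting users: `$authorsCount$`", {
  $postsCount$: 1525,
  $authorsCount$: 477,
});
console.log(filled); // Posts: `1525`, posting users: `477`
```

With this approach the callbacks would only collect values (for example into an object keyed by `"$topValue$"`, `"$topRep$"`, and so on) and call `writeReport` once at the end.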

cn_report_template.md

# Information

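Data source: https://steemdata.com/
Generated at: `$print$(UTC)`
Time range: `$start$(UTC)` to `$end$(UTC)`
User scope: users who posted in the Chinese community in the past 7 days (posts tagged `cn`)
Statistics: `$postsCount$` posts, `$authorsCount$` posting users
Median price: $`$medianPrice$` / Steem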

# Wealth Ranking (sorted by Estimated Account Value)
Rank.|ID|Value|STEEM|SP|SBD
----|----|----|----|----|----
$topValue$

# Reputation Ranking (sorted by Reputation score)
Rank.|ID|Rep|VP(%)|Online|Value
----|----|----|----|----|----
$topRep$

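# Notes
* **SP**: Steem Power; the more you hold, the bigger the rewards from your upvotes and the greater your influence on other people's post payouts
* **Value**: Estimated Account Value, the estimated value of the user's account
* **Rep**: Reputation, the score Steemit uses to measure a user's reputation
* **VP**: Voting Power, the user's remaining voting power; full is 100%, it drops faster the more you vote, and it recovers slowly over time
* **Online**: Online days, the number of days since the user registered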

Results

Information

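Data source: https://steemdata.com/
Generated at: 2017-08-30T00:00:00(UTC)
Time range: 2017-08-23T00:00:00(UTC) to 2017-08-30T00:00:00(UTC)
User scope: users who posted in the Chinese community in the past 7 days (posts tagged cn)
Statistics: 1525 posts, 477 posting users
Median price: $1.470 / Steem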

Wealth Ranking (sorted by Estimated Account Value)

Rank.|ID|Value|STEEM|SP|SBD
----|----|----|----|----|----
1|@davidding|152887.98|1.31|103658.425|508.173
2|@czechglobalhosts|139632.74|0|94976.316|17.559
3|@nextgen622|46068.22|1316.06|30020.918|2.861
4|@rea|43086.56|0|29129.515|266.175
5|@lawrenceho84|25689.53|0|17438.947|54.273
6|@ace108|23363.75|1147.748|12455.833|3366.483
7|@oflyhigh|23066.66|18.13|15375.857|437.503
8|@bullionstackers|21950.25|10.217|13861.403|1558.971
9|@deanliu|20345.52|199.286|12068.598|2311.729
10|@arcange|19033.81|0|12948.169|0
11|@helene|18683.94|21.145|11977.389|1046.098
12|@chhaylin|17609.60|0|11969.56|14.344
13|@isaaclab|17047.10|504.001|10780.412|459.014
14|@lemooljiang|12346.67|22.952|8298.25|114.504
15|@joythewanderer|12055.16|0|7441.168|1116.645
16|@skt1|11495.57|0|7681.671|203.517
17|@rivalhw|11392.27|0|6532.065|1790.136
18|@shieha|10773.01|0|7304.605|35.24
19|@myfirst|9460.14|0.45|6361.966|107.389
20|@chinadaily|9440.92|488.567|5805.647|188.422

Reputation Ranking (sorted by Reputation score)

Rank.|ID|Rep|VP(%)|Online|Value
----|----|----|----|----|----
1|@chinadaily|73.54|89.71|396|9440.92
2|@myfirst|72.23|31.95|403|9460.14
3|@elfkitchen|72.21|90.98|401|4625.23
4|@oflyhigh|72.17|77.70|401|23066.66
5|@birds90|71.73|81.65|358|4708.97
6|@ace108|71.51|53.52|409|23363.75
7|@deanliu|70.93|27.34|412|20345.52
8|@helene|70.34|48.00|396|18683.94
9|@shieha|69.97|64.09|383|10773.01
10|@rivalhw|69.94|68.50|398|11392.27
11|@lemooljiang|69.91|73.14|410|12346.67
12|@bullionstackers|69.68|82.12|408|21950.25
13|@rea|68.36|96.06|412|43086.56
14|@cnfund|67.88|94.12|379|6427.19
15|@arcange|67.63|56.85|413|19033.81
16|@germanlifestyle|67.58|96.80|267|4691.74
17|@jademont|67.44|86.01|472|7781.51
18|@jubi|66.79|30.61|235|3552.68
19|@justyy|66.75|63.71|374|5151.56
20|@curiesea|66.72|38.84|345|1568.75

Notes

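  • SP: Steem Power; the more you hold, the bigger the rewards from your upvotes and the greater your influence on other people's post payouts
  • Value: Estimated Account Value, the estimated value of the user's account
  • Rep: Reputation, the score Steemit uses to measure a user's reputation
  • VP: Voting Power, the user's remaining voting power; full is 100%, it drops faster the more you vote, and it recovers slowly over time
  • Online: Online days, the number of days since the user registered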

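The layout has been adjusted slightly: first-level headings became second-level headings.
For space reasons, the alphabetical ranking of all users has been left out of the results.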
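If you want to find your own rank (if you are outside the top 20), increase the limits in the code (topValueLimit and topRepLimit). If you have any questions, leave a comment and I will reply.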
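Instead of printing a longer table, you could also look your account up in the sorted list directly. `rankOf` below is a hypothetical helper, sketched under the assumption that each account object has a `name` field; the sample data are made up.

```javascript
// Hypothetical helper: 1-based rank of `name` after sorting with `compare`,
// or -1 if the account is not in the list.
function rankOf(accounts, name, compare) {
  const sorted = [...accounts].sort(compare);
  const index = sorted.findIndex((a) => a.name === name);
  return index === -1 ? -1 : index + 1;
}

const byRepDesc = (a, b) => b.rep - a.rep; // same order as sortByRep in app.js

const sample = [
  { name: "alice", rep: 48.2 },
  { name: "bob", rep: 61.7 },
  { name: "carol", rep: 55.0 },
];
console.log(rankOf(sample, "carol", byRepDesc)); // 2
console.log(rankOf(sample, "dave", byRepDesc)); // -1
```

Inside the getAuthorsDetail callback in app.js, a call such as `rankOf(docs, "yourname", sortByValue)` should give your position in the wealth ranking, since sortByValue already reads the global medianPrice; this wiring is untested.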

Wrapping Up

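The code still has plenty of room for optimization. Since this was a practice project, I didn't dig into the algorithms or the IO; the article only walks through the implementation.
If you found this article helpful for sparking ideas, please follow and upvote. Thanks :)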
