Code to Analyze the Top 100 Followed Accounts

I've been wanting to play with the blockchain data for over a month now, and last night I finally carved out some time to get to know the API a little better. Big thanks to @jesta, who helped me figure out how to chain calls together when accessing plugin APIs like the follow_api.

Here's the basic idea:

curl https://node.steem.ws -d '{"jsonrpc":"2.0","method":"call", "params":[1, "get_api_by_name", ["follow_api"]],"id":0}'
{"id":0,"result":3}
curl https://node.steem.ws -d '{"jsonrpc":"2.0","method":"call", "params":[3, "get_followers", ["lukestokes","",1]],"id":0}'
{"id":0,"result":[{"id":"8.6.3206","follower":"aaronwebb","following":"lukestokes","what":["blog"]}]}

The first call gives you an identifier for the follow_api plugin, which is 3 in this case.

The second call uses that identifier to then make a call to get_followers. The tricky part is, you can only get 100 at a time and the second parameter (the "" string in my example) isn't a number, it's an account name. You have to keep track of the last account you saw when paginating through the data.

I know this code is really rough, since I put it together in just one night, but I wanted to share it with you anyway because I had fun building it. It's written in PHP and will hopefully give you some ideas as well. I'll be adding it to php-steem-tools on GitHub shortly.

Simple wrapper to curl for making API calls

function call($method, $params) {
    global $debug;
    $request = getRequest($method, $params);
    $response = curl($request);
    if (array_key_exists('error', $response)) {
        var_dump($response['error']);
        die();
    }
    return $response['result'];
}

function getRequest($method, $params) {
    global $debug;
    $request = array(
        "jsonrpc" => "2.0",
        "method" => $method,
        "params" => $params,
        "id" => 0
        );
    $request_json = json_encode($request);

    if ($debug) { print $request_json . "\n"; }

    return $request_json;
}

function curl($data) {
    global $debug;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'https://node.steem.ws');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    $result = curl_exec($ch);

    if ($debug) { print $result . "\n"; }

    $result = json_decode($result, true);

    return $result;
}

Now we can just call call instead of having to create strings like {"jsonrpc":"2.0","method":"call", "params"...
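To make the chaining concrete, here's a rough sketch of the two request bodies that end up going over the wire. buildCallPayload is a hypothetical helper name I'm using only for illustration (it isn't part of the script), and the id 3 is just the value from my example response above, so a different node may well hand back a different number.

```php
<?php
// Hypothetical helper (not part of the script above): builds the JSON body
// for a chained "call" request against a given API id.
function buildCallPayload($api_id, $api_method, array $args) {
    return json_encode(array(
        "jsonrpc" => "2.0",
        "method"  => "call",
        "params"  => array($api_id, $api_method, $args),
        "id"      => 0,
    ));
}

// Step 1: ask API 1 for the identifier of the follow_api plugin.
$lookup = buildCallPayload(1, "get_api_by_name", array("follow_api"));

// Step 2: use that identifier (assumed to be 3 here, as in the example
// response above) to call get_followers.
$followers = buildCallPayload(3, "get_followers", array("lukestokes", "", 1));

print $lookup . "\n";
print $followers . "\n";
```

Nothing here touches the network. It only shows the shape of the strings that call() saves us from writing by hand.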

Simple helper method for getting the follow_api identifier

function getFollowAPIID() {
    return getAPIID('follow_api');
}

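function getAPIID($api_name) {
    global $apis;
    // Make sure the in-memory cache exists before we look anything up.
    if (!is_array($apis)) {
        $apis = array();
    }
    if (array_key_exists($api_name, $apis)) {
        return $apis[$api_name];
    }
    // Ask the node for the identifier of the requested API and cache it.
    $response = call('call', array(1,'get_api_by_name',array($api_name)));
    $apis[$api_name] = $response;
    return $response;
}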

Yes, yes, I know, I'm using globals and that's BAD. This is just a quickie script to get the data I want. Note the use of caching in memory to avoid unnecessary calls to the API.
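If the globals bother you, the same in-memory caching can be done with a static variable. This is only a sketch under my own assumptions: getCachedAPIID and the $resolver callback are made-up names, and the resolver stands in for the real network lookup so the example runs offline.

```php
<?php
// Hypothetical sketch: cache API ids in a static array instead of a global.
// $resolver is a stand-in for the real lookup against the node.
function getCachedAPIID($api_name, callable $resolver) {
    static $cache = array();
    if (!array_key_exists($api_name, $cache)) {
        $cache[$api_name] = $resolver($api_name);
    }
    return $cache[$api_name];
}

// Fake resolver that counts how often it is actually invoked.
$lookups = 0;
$fake_resolver = function ($api_name) use (&$lookups) {
    $lookups++;
    return 3; // pretend the node answered with id 3
};

$first  = getCachedAPIID('follow_api', $fake_resolver);
$second = getCachedAPIID('follow_api', $fake_resolver);

print "id=$first, lookups=$lookups\n";
```

The second request is served from the cache, so the fake resolver only runs once.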

Get followers and follower count

function getFollowerCount($account) {
    $followers = getFollowers($account);
    return count($followers);
}

function getFollowers($account, $start = '') {
    $limit = 100;
    $followers = array();
    $followers = call('call', array(getFollowAPIID(),'get_followers',array($account,$start,$limit)));
    if (count($followers) == $limit) {
        $last_account = $followers[$limit-1];
        $more_followers = getFollowers($account, $last_account['follower']);
        array_pop($followers);
        $followers = array_merge($followers, $more_followers);
    }
    return $followers;
}

This call requires some recursion because the limit is 100. If we go beyond that limit, we call it again using the last account name we saw as the start. We have to pop it off our list so it doesn't get included twice.
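The same pagination can also be written as a loop, which avoids deep recursion for accounts with a lot of followers. Here's a sketch that runs against fake data: collectFollowers and $fetch_page are hypothetical names, and the fake fetcher assumes the start name is inclusive and results come back sorted, which matches what I saw from get_followers but isn't something I've confirmed in the API docs.

```php
<?php
// Hypothetical iterative version of the pagination. $fetch_page stands in
// for the get_followers call: it takes (start, limit) and returns up to
// $limit rows beginning at $start (inclusive). Assumes $limit >= 2.
function collectFollowers(callable $fetch_page, $limit = 100) {
    $followers = array();
    $start = '';
    while (true) {
        $page = $fetch_page($start, $limit);
        if (count($page) < $limit) {
            // Short page: this is the last one, so keep everything.
            return array_merge($followers, $page);
        }
        // Full page: the last row becomes the next start, so drop it here
        // to avoid counting it twice.
        $last = array_pop($page);
        $start = $last['follower'];
        $followers = array_merge($followers, $page);
    }
}

// Fake data source: 7 sorted follower names, served 3 at a time.
$names = array('alice', 'bob', 'carol', 'dave', 'erin', 'frank', 'grace');
$fake_fetch = function ($start, $limit) use ($names) {
    $rows = array();
    foreach ($names as $name) {
        if (strcmp($name, $start) >= 0 && count($rows) < $limit) {
            $rows[] = array('follower' => $name);
        }
    }
    return $rows;
};

$all = collectFollowers($fake_fetch, 3);
print count($all) . " followers collected\n";
```

With a page size of 3, the seven fake followers take four requests, and each name shows up exactly once in the result.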

Get all accounts on Steemit

function getAllAccounts() {
    $all_accounts = @file_get_contents('all_accounts.txt');
    if ($all_accounts) {
        $all_accounts = unserialize($all_accounts);
        print "Found " . count($all_accounts) . " accounts.\n";
        return $all_accounts;
    }
    $all_accounts = call('lookup_accounts', array('*',-1));
    print "Queried for " . count($all_accounts) . " accounts.\n";
    file_put_contents('all_accounts.txt',serialize($all_accounts));
    return $all_accounts;
}

This might actually exploit a bug because the code comments say the limit there should be 1000, but I was able to pass in -1 and get all 54k+ accounts at once. I store this to a file so I won't have to fetch them again while running this script.

Get accounts with at least one post

function getAccountsWithPosts($all_accounts, $start, $batch_size) {
    $some_accounts = array_slice($all_accounts,$start,$batch_size);
    $accounts_with_info = call('get_accounts', array($some_accounts));
    $active_accounts = filterForActiveAccounts($accounts_with_info);
    return $active_accounts;
}

function filterForActiveAccounts($accounts) {
    $filtered_accounts = array();
    foreach($accounts as $account) {
        if ($account['post_count'] > 0) {
            $filtered_accounts[] = $account['name'];
        }
    }
    return $filtered_accounts;
}

When I first started working on this, I realized how many accounts are just miner accounts with no activity at all. Checking their follower count was a huge waste of time so I wrote this to filter them out in batches. I get all the details for a batch of accounts, and then only return the account names of the ones who have posted before.

Save accounts with at least one post to a file

function getAllAccountsWithPosts() {
    $all_accounts = getAllAccounts();
    $total = count($all_accounts);
    // Resume from the saved offset; a missing file casts to 0.
    $start = (int) @file_get_contents('getAllAccountsWithPosts_start.txt');
    $batch_size = 100;
    while ($start < $total) {
        // array_slice() inside getAccountsWithPosts() shortens the final batch for us.
        $filtered_accounts = getAccountsWithPosts($all_accounts, $start, $batch_size);
        print '.';
        foreach ($filtered_accounts as $filtered_account) {
            file_put_contents('getAllAccountsWithPosts_accounts.txt', $filtered_account . "\n", FILE_APPEND);
        }
        // Record the next offset only after this batch has been written out.
        $start += $batch_size;
        file_put_contents('getAllAccountsWithPosts_start.txt', $start);
    }
}

This is where it gets a little fun. I built this script to dump data to two files: getAllAccountsWithPosts_start.txt, which keeps track of where we are in the process so we can stop it at any time and pick up where we left off, and getAllAccountsWithPosts_accounts.txt, which stores the names of the accounts that have at least one post.
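Here's a stripped-down sketch of that stop-and-resume pattern, using temp files and plain strings instead of real account data. processInBatches, its parameters, and the $max_batches cap (which just simulates interrupting the script) are all made up for illustration.

```php
<?php
// Hypothetical sketch of the checkpoint pattern. $state_file holds the next
// offset to process; $out_file collects results. $max_batches lets us
// simulate stopping the script early.
function processInBatches(array $items, $batch_size, $state_file, $out_file, $max_batches) {
    // A missing or empty state file casts to offset 0.
    $start = (int) @file_get_contents($state_file);
    $done = 0;
    while ($start < count($items) && $done < $max_batches) {
        $batch = array_slice($items, $start, $batch_size);
        file_put_contents($out_file, implode("\n", $batch) . "\n", FILE_APPEND);
        // Save the next offset only after the batch has been written.
        $start += $batch_size;
        file_put_contents($state_file, $start);
        $done++;
    }
    return $start;
}

$state_file = tempnam(sys_get_temp_dir(), 'state');
$out_file   = tempnam(sys_get_temp_dir(), 'out');
$items = array('a', 'b', 'c', 'd', 'e', 'f', 'g');

// First run: "interrupted" after a single batch of 3.
$after_first = processInBatches($items, 3, $state_file, $out_file, 1);

// Second run: picks up at the saved offset and finishes the rest.
$after_second = processInBatches($items, 3, $state_file, $out_file, 10);

$lines = file($out_file, FILE_IGNORE_NEW_LINES);
print implode(',', $lines) . "\n";

unlink($state_file);
unlink($out_file);
```

Writing the offset after the batch means an interrupted run may redo one batch, but it should never skip one.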

Save follower counts to a file

function saveFollowerCounts() {
    $min_threshold = 0;
    $follower_counts = array();
    // $all_accounts = getAllAccounts();
    $all_accounts = @file('getAllAccountsWithPosts_accounts.txt');
    if (!$all_accounts) {
        getAllAccountsWithPosts();
        $all_accounts = file('getAllAccountsWithPosts_accounts.txt');
    }
    $start = @file_get_contents('saveFollowerCounts_start.txt');
    if (!$start) {
        $start = 0;
        file_put_contents('saveFollowerCounts_start.txt',$start);
    }
    print "Starting at $start\n";
    for ($i = $start; $i<count($all_accounts); $i++) {
        $account = trim($all_accounts[$i]);
        if ($i % 100 == 0) {
            print $i;
        }
        print '.';
        $follower_count = getFollowerCount($account);
        file_put_contents('saveFollowerCounts_start.txt',$i);
        if ($follower_count > $min_threshold) {
            file_put_contents('saveFollowerCounts_counts.txt', $account . ',' . $follower_count . "\n", FILE_APPEND);
        }
    }
}

This method follows a pattern similar to getAllAccountsWithPosts. It can be stopped and restarted, and it will continue where it left off.

Finally, we print out the results

function printTopFollowed() {
    $number_of_top_accounts_to_show = 100;
    $accounts = array();
    $file = fopen("saveFollowerCounts_counts.txt","r");
    // fgetcsv() returns false at end of file, so the loop stops cleanly.
    while (($line = fgetcsv($file)) !== false) {
        if (count($line) < 2) { continue; } // skip blank lines
        $accounts[trim($line[0])] = (int) trim($line[1]);
    }
    fclose($file);

    // Sort by follower count, highest first, keeping the account names as keys.
    arsort($accounts);

    $header = "|    |           Account|    Number of Followers   | \n";
    $header .= "|:--:|:----------------:|:------------------------:|\n";

    print "\n## TOP $number_of_top_accounts_to_show USERS BY FOLLOWER COUNT\n\n";
    print $header;
    $count = 0;
    foreach ($accounts as $account => $follower_count) {
        $count++;
        if ($count > $number_of_top_accounts_to_show) {
            break;
        }
        print '| ' . $count . ' |' . sprintf('%15s','@'.$account) . ': | ' . $follower_count . " |\n";
    }
}
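The ranking step is easy to try without any of the files. This sketch uses made-up sample lines in the same account,count format; topFollowed is a hypothetical helper name, not something from the script.

```php
<?php
// Hypothetical sketch: rank "account,count" lines and keep the top N.
function topFollowed(array $lines, $n) {
    $counts = array();
    foreach ($lines as $line) {
        $fields = explode(',', trim($line));
        if (count($fields) < 2) {
            continue; // skip blank or malformed lines
        }
        // A later line for the same account overwrites the earlier one.
        $counts[$fields[0]] = (int) $fields[1];
    }
    // Sort by follower count, highest first, keeping account names as keys.
    arsort($counts);
    return array_slice($counts, 0, $n, true);
}

// Made-up sample data, including a blank line and a repeated account.
$sample = array("alice,12\n", "bob,40\n", "\n", "carol,7\n", "bob,41\n");
$top = topFollowed($sample, 2);

foreach ($top as $account => $follower_count) {
    print '@' . $account . ': ' . $follower_count . "\n";
}
```

Because the array is keyed by account name, an account that was written twice (say, after a restart) only counts once.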

Pretty cool, right? I had so much fun putting this together.

None of this would be possible without the excellent work of @xeroc and @jesta to put together https://steem.ws/ which you can read more about here.

If you enjoy this stuff, follow my blog for more.

The Top 100 Followed Accounts
