SteemData 1.2 is here ∙ Raised $5,120 of $5,000 ∙ Now on GitHub

SteemData 1.2

I've decided to ship early, and not wait until SteemData 2.0. The main reason is that I'd like to push out all the breaking changes now, to reduce the amount of pain in the future.

Features in 1.2

Fast updates and eventual consistency

Before 1.2, I would run a handful of workers in a loop and scrape account-related updates one by one. Steem now has over 120,000 accounts, and this approach doesn't scale. It also meant that an account could only be updated once every few hours, so some of the data was stale.

I have solved this problem by switching to an asynchronous, event-based model (powered by Celery, with RabbitMQ as the message broker), where posts, accounts and their virtual operations are updated shortly after new blocks become available.

I have repurposed the old worker model as a fail-safe: if the event-based approach ever fails in a way that would cause data loss, the background worker will back-fill the missing data afterwards.
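
To make the idea concrete, here is a minimal sketch of event-driven updates with a back-fill pass. It is not the SteemData code: a standard-library queue stands in for Celery and RabbitMQ, and every name in it (chain, store, on_new_block, drain_events, backfill) is hypothetical.

```python
import queue

# Hypothetical in-memory stand-ins, for illustration only:
# `chain` is the source of truth, `store` is the database copy,
# `events` plays the role of the message queue.
chain = {}
store = {}
events = queue.Queue()


def on_new_block(block):
    """Apply a block to the chain state and enqueue one event per changed account."""
    for account, balance in block["changes"].items():
        chain[account] = balance
        events.put(account)


def drain_events(drop=frozenset()):
    """Process queued events; accounts listed in `drop` simulate lost updates."""
    while not events.empty():
        account = events.get()
        if account in drop:
            continue  # simulated failure: this update never reaches the store
        store[account] = chain[account]


def backfill():
    """Fail-safe pass: repair every account whose stored copy differs from the chain."""
    repaired = [a for a in sorted(chain) if store.get(a) != chain[a]]
    for account in repaired:
        store[account] = chain[account]
    return repaired


on_new_block({"changes": {"alice": 10, "bob": 5}})
drain_events(drop={"bob"})   # the event for "bob" is lost
repaired = backfill()        # the slow worker catches what the events missed
print(repaired)
```

The event path keeps data fresh in the common case, while the slower full scan only has to guarantee that nothing stays missing for long.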

Structural Changes and Types

This release contains a handful of design improvements and changes, which are not backwards compatible. I do not expect any major breaking changes for 2.0.
Also, the typing support has been improved greatly.

Historic Prices

I've added hourly snapshots for STEEM, implied SBD and Bitcoin prices.
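
As a rough illustration of what an hourly snapshot means, the sketch below keeps one price record per hour by truncating each timestamp to the start of its hour. The function names, the field name and the numbers are placeholders made up for the example, not SteemData's schema or real prices.

```python
from datetime import datetime, timezone


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def record_snapshot(snapshots, ts, prices):
    """Keep one snapshot per hour; a later sample in the same hour overwrites."""
    snapshots[hour_bucket(ts)] = dict(prices)
    return snapshots


# Arbitrary date and placeholder values, used only to exercise the functions.
snapshots = {}
record_snapshot(snapshots, datetime(2000, 1, 1, 10, 5, tzinfo=timezone.utc), {"price": 1.0})
record_snapshot(snapshots, datetime(2000, 1, 1, 10, 50, tzinfo=timezone.utc), {"price": 1.5})
record_snapshot(snapshots, datetime(2000, 1, 1, 11, 1, tzinfo=timezone.utc), {"price": 2.0})
```

Keying on the truncated hour makes repeated writes idempotent, so re-running a snapshot job within the same hour does not create duplicates.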

Performance Improvements

The new MongoDB deployment uses the WiredTiger storage engine.

I have reworked the indexes on all collections, which yields a 2-10x query performance improvement for most historic queries.

SteemData is now also hosted on a more expensive server, powered by an Intel i7-6700K with 64GB RAM. The hardware upgrade should yield a performance gain of over 2x.

Open Source

All of the code powering SteemData is now available on GitHub, and is licensed under the highly permissive MIT license.

steemdata-node

If you're looking for a Docker-based, easy-to-use steemd RPC deployment, this is it.
It comes with all blockchain plugins enabled, the latest seed node list, and an automatic blockchain snapshot download on first run for quick syncing times (thanks to @gtg).

steemdata-mongo

This repo contains all the code responsible for syncing the STEEM blockchain with MongoDB.

steemdata

This is a core library for working with STEEM blockchain data. It is database agnostic (could be used for SQL or any other database in the future).

steemdata.com

Right now, the website only hosts basic instructions and stats.

Eventually, I would like to build:

  • an API for 3rd party apps
  • blockchain explorer
  • steemle inspired charts and analytics

TODO (until next release)

  • Integrate Comments
  • Add Relationships via HRefs
  • Create Sample Notebooks
  • Documentation!

Now that a stable base is in place, I'd like to work on making this project more useful and friendly to people who can benefit from it. If you're a developer, please talk to me (I am @furion on steemit.chat).

Upgrade Now

The old version of SteemData will be shutting down on Feb 10th. Please upgrade to SteemData 1.2; see steemdata.com for connection info.

Crowdfunding

We have raised $5,120 of the $5,000 goal so far. Big thanks to @cass for making this project possible.

Supporters
@cass          $4,900
@fabien        $100
@abit          $100
@tuck-fheman   $20

Donations should be sent to @steemdata. The list of friendly donors will be published and updated here, as well as in future announcements.


If you'd like to support my work, feel free to vote @furion for witness.
