I’m @vandeberg, the Senior Blockchain Engineer at Steemit, and today I want to talk a little bit about MIRA, the software solution to Steem’s hardware scalability challenge that I have been developing for the last few months. We’ve just announced that it has begun its soft roll-out on Steemit’s nodes. You can find that announcement on @steemitblog. You can also watch me, or listen to me, talk about MIRA on the latest episode of the Steemit podcast below.
Thanks to MIRA, we are seeing dramatic reductions in the cost of running Steem nodes. The savings come from migrating the Steem blockchain database (steemd) from expensive, high-performance hardware to low-cost, run-of-the-mill hardware like network-attached SSDs or even old-school spinning disk drives!
Database Replacement
MIRA is a complete replacement for the backend database, built on a technology called RocksDB. RocksDB was developed by Facebook to power their Feed, which has to load data very rapidly in order to provide a pleasant user experience. Leveraging RocksDB allows us to run the Steem blockchain much more cost-effectively and put our hardware scaling challenges to rest once and for all.
Commodity Hardware
When Steem first launched, the database was stored on more affordable hardware. But as the database grew in size, those storage media had difficulty keeping pace with the level of engagement that was being demanded of the protocol. To address those issues, we developed innovative software solutions that migrated the database to higher-performance storage media in order to ensure a consistent user experience.
This was a good temporary solution, but as Steem continued to grow, the cost of running the blockchain on state-of-the-art storage media started to become financially burdensome for those who run Steem nodes, like app developers and Witnesses (i.e. block producers). MIRA resets the hardware requirements for Steem to what they were when Steem first launched, without negatively affecting performance.
Costs Decreasing Over Time
Thanks to MIRA, the technology is now in place to allow Steem to scale much more efficiently into the future, ideally on a course that will see hardware outpacing the requirements of Steem. That means that despite the fact that Steem will continue to grow, it might actually get cheaper to run Steem over time, as long as hardware improvements continue at their current rate. This is important because the largest cost for blockchains (especially blockchains that produce blocks very rapidly) is going to be disk space, but enabling the data to be stored on a slow disk should neutralize that cost.
What is RocksDB?
RocksDB is a fork of LevelDB, a key-value store originally developed at Google. Many blockchains use LevelDB to run on disk, but that’s because they aren’t nearly as fast as Steem, which has 3 second block times. RocksDB adds many more layers of caching algorithms that make it a lot more efficient than LevelDB, while also providing interfaces that made it much easier for us to integrate it into the existing Steem codebase. All of that is important for ensuring that a blockchain as high-performance as Steem can be run on cheap hardware and not just exotic hardware like NVMe drives.
But this is not a problem that will be unique to Steem. As other blockchains like Ethereum seek to shrink their block times, quick data access will become all the more important, at which point they will need to look at better scaling solutions for their backend database. Luckily for those teams, since MIRA will be Open Source, as much as 90% of their work will already be done for them.
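To give a feel for why caching lets a fast blockchain tolerate a slow disk, here is a minimal sketch of a least-recently-used (LRU) read cache sitting in front of a slow store. It illustrates the general idea only and is not RocksDB's implementation or API. The class name `CachedStore`, its methods, and the use of `std::map` as a stand-in for the disk are all assumptions made for this example.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: a small LRU cache in front of a "slow" store.
// std::map stands in for the disk; nothing here is RocksDB code.
class CachedStore {
public:
    explicit CachedStore(std::size_t capacity) : capacity_(capacity) {}

    // Write through to the slow store and keep the value hot in the cache.
    void put(const std::string& key, const std::string& value) {
        disk_[key] = value;
        remember(key, value);
    }

    // Serve from the cache when possible; otherwise pay for a disk read.
    bool get(const std::string& key, std::string& out) {
        auto hit = pos_.find(key);
        if (hit != pos_.end()) {
            out = hit->second->second;
            lru_.splice(lru_.begin(), lru_, hit->second);  // mark as most recent
            return true;
        }
        ++disk_reads_;  // every cache miss costs one lookup on the slow store
        auto it = disk_.find(key);
        if (it == disk_.end()) return false;
        out = it->second;
        remember(key, out);
        return true;
    }

    std::size_t disk_reads() const { return disk_reads_; }

private:
    using Entry = std::pair<std::string, std::string>;

    // Put the entry at the front of the list and evict the oldest if full.
    void remember(const std::string& key, const std::string& value) {
        auto hit = pos_.find(key);
        if (hit != pos_.end()) lru_.erase(hit->second);
        lru_.emplace_front(key, value);
        pos_[key] = lru_.begin();
        if (lru_.size() > capacity_) {
            pos_.erase(lru_.back().first);
            lru_.pop_back();
        }
    }

    std::size_t capacity_;
    std::size_t disk_reads_ = 0;
    std::map<std::string, std::string> disk_;
    std::list<Entry> lru_;
    std::unordered_map<std::string, std::list<Entry>::iterator> pos_;
};
```

As long as the data a node touches most often fits in the cache, most reads never reach the slow medium, which is the intuition behind running on cheaper disks.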
Why an Adapter?
Projects like Hyperledger use RocksDB directly, which raises the question of why we did so much work to build yet another piece of software. One of the things the adapter (i.e. MIRA) accomplishes is reducing programmer error. We have interfaces that are well defined, that work, and that are really easy to develop on. A big part of the reason Steem’s initial development took only 3 months can be attributed to those interfaces: they have really good C++ bindings that hide all the database management and allow us to develop code very quickly. If we were to build directly on RocksDB, all of the Steem code would have to be rewritten. That would have been very error prone, and a refactor of that scale, including the testing needed to ensure there are no bugs, would have taken months, or even years.
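The adapter idea can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not MIRA's actual code: `ByteStore` is a hypothetical stand-in for a key-value store (it is not the RocksDB API), and `Account` and `AccountTable` are invented names. The point is that application code works with typed objects while serialization and key management stay hidden behind the interface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for an on-disk key-value store.
// It only mimics the get/put shape of one; it is NOT the RocksDB API.
class ByteStore {
public:
    void put(const std::string& key, const std::string& value) { data_[key] = value; }

    bool get(const std::string& key, std::string& out) const {
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<std::string, std::string> data_;
};

// Example application object (invented for this sketch, not a Steem type).
// Assumption: names contain no whitespace, which keeps serialization trivial.
struct Account {
    std::uint64_t id;
    std::string name;
    std::int64_t balance;
};

// The adapter: callers see typed objects, never raw keys or bytes.
class AccountTable {
public:
    explicit AccountTable(ByteStore& store) : store_(store) {}

    void emplace(const Account& a) { store_.put(key_for(a.id), serialize(a)); }

    bool find(std::uint64_t id, Account& out) const {
        std::string raw;
        if (!store_.get(key_for(id), raw)) return false;
        out = deserialize(raw);
        return true;
    }

    // Read, modify, and write back in one call, so callers cannot forget a step.
    template <typename Fn>
    bool modify(std::uint64_t id, Fn&& fn) {
        Account a{};
        if (!find(id, a)) return false;
        fn(a);
        assert(a.id == id && "modify must not change the primary key");
        emplace(a);
        return true;
    }

private:
    static std::string key_for(std::uint64_t id) { return "account/" + std::to_string(id); }

    static std::string serialize(const Account& a) {
        std::ostringstream out;
        out << a.id << ' ' << a.balance << ' ' << a.name;
        return out.str();
    }

    static Account deserialize(const std::string& raw) {
        std::istringstream in(raw);
        Account a{};
        in >> a.id >> a.balance >> a.name;
        return a;
    }

    ByteStore& store_;
};
```

With something like this in place, a call such as `accounts.modify(1, [](Account& a) { a.balance += 50; });` reads the same whether the objects live in memory or on disk, which is why existing code would not need to be rewritten when the backend changes.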
Non-Steem Use
One of the great things about MIRA is that it will be fully Open Source. Because MIRA is also blockchain agnostic, another team that wanted to use it would only need to do a little work to integrate it into their solution. It can be used for many different applications. I know that if I were developing an app in C++ and I needed the type of database scaling solution that MIRA provides, I would not hesitate to use it. Doing so would probably save me 90% of my development time, in part because we put a lot of effort into building good interfaces that make it easy to work with MIRA.
The point of MIRA was to wrap RocksDB in an interface that matches the one provided by Boost multi-index containers. Boost is a far more widely used library and is close to a standard in C++ applications. That’s because the people who maintain Boost are some of the most talented C++ developers in the world. Without their amazing work, MIRA wouldn’t even be possible. Most of our work was digging into the Boost multi-index containers, looking at their implementation, and swapping out code where needed so that it interfaces with RocksDB instead of the in-memory object structures that Boost multi-index containers normally use. So if any project wants to move a Boost-based database from memory to disk, MIRA can help.
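To make the multi-index idea concrete, here is a rough sketch of how one set of records can be looked up by two different keys when the backing store is a single sorted key-value map. It is a simplified illustration under assumptions, not how MIRA or Boost is implemented: `Post`, `PostIndex`, and the `by_id/` and `by_author/` key prefixes are invented for this example, and `std::map` stands in for an ordered on-disk store.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Invented record type for this sketch.
struct Post {
    std::uint64_t id;
    std::string author;
    std::string title;
};

// Two indices over the same records, each kept as its own sorted key range.
// Assumption: author names contain neither '/' nor '\n'.
class PostIndex {
public:
    // Returns false if a post with the same id already exists.
    bool insert(const Post& p) {
        assert(p.author.find('/') == std::string::npos);
        assert(p.author.find('\n') == std::string::npos);
        const std::string pk = primary_key(p.id);
        if (kv_.count(pk) != 0) return false;
        kv_[pk] = p.author + '\n' + p.title;     // primary index holds the record
        kv_[author_key(p.author, p.id)] = "";    // secondary index holds only a key
        return true;
    }

    bool find_by_id(std::uint64_t id, Post& out) const {
        auto it = kv_.find(primary_key(id));
        if (it == kv_.end()) return false;
        const std::string::size_type split = it->second.find('\n');
        out.id = id;
        out.author = it->second.substr(0, split);
        out.title = it->second.substr(split + 1);
        return true;
    }

    // Range scan over the secondary index; results come back ordered by id.
    std::vector<Post> find_by_author(const std::string& author) const {
        std::vector<Post> result;
        const std::string prefix = "by_author/" + author + "/";
        for (auto it = kv_.lower_bound(prefix);
             it != kv_.end() && it->first.compare(0, prefix.size(), prefix) == 0;
             ++it) {
            Post p{};
            if (find_by_id(std::stoull(it->first.substr(prefix.size())), p)) {
                result.push_back(p);
            }
        }
        return result;
    }

private:
    // Zero-pad ids so that string order matches numeric order.
    static std::string pad(std::uint64_t id) {
        const std::string s = std::to_string(id);
        return std::string(20 - s.size(), '0') + s;
    }
    static std::string primary_key(std::uint64_t id) { return "by_id/" + pad(id); }
    static std::string author_key(const std::string& author, std::uint64_t id) {
        return "by_author/" + author + "/" + pad(id);
    }

    std::map<std::string, std::string> kv_;
};
```

A caller only ever asks for "the post with this id" or "all posts by this author". Where the bytes live, and how the keys are laid out, is a detail of the container, which is the property that makes it possible to swap memory for disk underneath existing code.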
The Joy of Open Source Development
The joy of working in Open Source is that we get to borrow great ideas from great engineers, add on our own ideas, and let the rest of the world use all of it. The code is just sitting there, waiting for people to use it. It would be awesome if other projects use MIRA, but even if they don’t, it fills a critical need within our ecosystem. If the Steem source code is the only code that uses MIRA, that will mean that everyone who runs a Steem node will be using MIRA to make their lives easier and reduce their costs. And that’s still a pretty big deal.