[steemtools] automatic failover for witness nodes

I was told "code or it didn't happen" when my node gracefully failed over about an hour ago, so here it is. I'm a bit tired, so I apologize if there are gaps in the documentation.

https://github.com/aaroncox/witness-failover

This script is something I've been tinkering with over the past few weeks as part of my infrastructure as a witness. If you're planning on using it, I'm willing to answer questions, but I'm not willing to set it up for you. You need a basic understanding of infrastructure, Docker, and witness management to use this code effectively. I also make no claims that this will work in the future or on your system, so please use it responsibly!

Or just use it to learn from, and build your own tools based on its principles.

witness node configuration

This setup requires three systems, as described below:

  • Witness Node #1, with Signing Private Key #1
  • Witness Node #2, with Signing Private Key #2
  • This script, running someplace besides the witness nodes, potentially a workstation.

In this scenario, we're going to assume your account currently has Signing Key #1 active.

To enable the failover trap:

  • Edit the .env file and fill in your witness account, your active WIF private key*, and all of your preferred witness properties.
  • Put the public signing key of Witness Node #2 into the configuration as steem_backup.
  • Check how many total_missed blocks you currently have, add 1 or 2 to that number, and put the result in the .env file as threshold.
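The configuration steps above can be sketched roughly as follows. This is a hypothetical illustration, not the repository's code: only steem_backup and threshold come from this post, while the witness and wif key names are assumptions, so check the actual .env template in the repo for the real keys. It parses simple KEY=VALUE lines and computes a threshold one or two blocks above the current total_missed count.

```python
from dataclasses import dataclass


@dataclass
class FailoverConfig:
    # Hypothetical fields: "witness" and "wif" are assumed key names;
    # "steem_backup" and "threshold" are the names used in the post.
    witness: str
    wif: str
    steem_backup: str
    threshold: int


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, as many .env files quote their values.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def suggest_threshold(current_total_missed: int, margin: int = 1) -> int:
    """Arm the trap a block or two above the current total_missed count."""
    if margin < 1:
        raise ValueError("margin must be at least 1, or the trap fires immediately")
    return current_total_missed + margin


def load_config(text: str) -> FailoverConfig:
    v = parse_env(text)
    return FailoverConfig(
        witness=v["witness"],
        wif=v["wif"],
        steem_backup=v["steem_backup"],
        threshold=int(v["threshold"]),
    )
```

A margin of at least 1 matters: setting threshold equal to the current total_missed would trigger the failover on the very first check.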

Note: Yes, this requires your private active key, which means you need to ensure this system is secure. That is a good reason to use a second account for witnessing, or to make sure all of your liquid funds are locked in savings. (Originally I had linked from here to a post I made about feed_publish operations and permissions. I was tired; that had nothing to do with this.)

piston + steemtools + twilio

Once configured and running, this script performs the following actions every interval:

  • Every 30 seconds, checks the defined witness to see how many total_missed blocks have been reported.
  • If total_missed >= threshold (threshold is a number you set), the script triggers.
  • Once triggered, it uses @furion's steemtools, built on @xeroc's piston, to issue a witness_update transaction to change your signing_key to a different key.
  • It then uses Twilio (a paid SMS service) to send an SMS letting you know the trap has triggered. If you don't want to set up a Twilio account, you can edit the code to remove that requirement. I'm very happily paying $0.0075 USD for it to notify me of issues once in a blue moon :)
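The polling loop described in the list above might look something like this minimal sketch. The function and parameter names are hypothetical, and rather than guessing at steemtools or piston internals, the three side effects (reading total_missed, switching the signing key, sending the SMS) are passed in as plain callables. The sleep function is injectable too, so the loop can be exercised without actually waiting 30 seconds.

```python
import time
from typing import Callable


def run_trap(
    get_total_missed: Callable[[], int],
    switch_signing_key: Callable[[str], None],
    notify: Callable[[str], None],
    threshold: int,
    backup_key: str,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until total_missed reaches threshold, fail over once, then return.

    All callables are hypothetical stand-ins: in the real script the key
    switch is a witness_update broadcast and notify is a Twilio SMS.
    """
    while True:
        missed = get_total_missed()
        if missed >= threshold:
            switch_signing_key(backup_key)
            notify(f"Failover triggered: total_missed={missed}, switched to {backup_key}")
            return missed  # one-shot: the trap does not re-arm itself
        sleep(interval)
```

If you do keep the SMS step, `notify` could wrap the twilio package's `Client(account_sid, auth_token).messages.create(body=..., from_=..., to=...)`; dropping it is just a matter of passing a function that logs instead.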

Once the witness_update transaction is broadcast, your account's signing key changes to the new key (that of Witness Node #2 in this example), which automatically fails your witness over to the backup server.
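For reference, here is a hedged sketch of what the witness_update operation payload looks like. The field names follow Steem's witness_update operation as I understand it (owner, url, block_signing_key, props, fee), but treat this as an assumption and verify against the Steem protocol docs. The sketch only builds the operation; signing and broadcasting it with your active key is the part steemtools/piston handle. Because the operation carries the full props, it is presumably why the .env needs your preferred witness properties: they are re-sent along with the new key.

```python
def build_witness_update(
    owner: str,
    url: str,
    new_signing_key: str,
    props: dict,
    fee: str = "0.000 STEEM",
) -> list:
    """Return a ["witness_update", {...}] operation that swaps the signing key.

    Field layout is assumed from Steem's witness_update operation; this
    function does not sign or broadcast anything.
    """
    return [
        "witness_update",
        {
            "owner": owner,
            "url": url,
            "block_signing_key": new_signing_key,  # e.g. Witness Node #2's public key
            "props": {
                "account_creation_fee": props["account_creation_fee"],
                "maximum_block_size": int(props["maximum_block_size"]),
                "sbd_interest_rate": int(props["sbd_interest_rate"]),
            },
            "fee": fee,
        },
    ]
```

The values you would pass in (account, URL, key, property values) are placeholders here; use your own witness settings.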

It's a one-shot failover script that terminates itself

Think of it like a mousetrap: you start it running, and once it's triggered, you have to go in, reset the variables, and start it again. It is not meant to be completely automated, though it could be taken that far over time. I haven't spent much time in that direction, since this works for now.
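The manual reset could be modeled like this. It is a hypothetical sketch, not something the script does for you: it assumes that after the trap fires, Witness Node #2's key is active and, once Witness Node #1 is healthy again, Node #1's key becomes the new backup. The threshold is re-armed just above the freshly checked total_missed count.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrapState:
    # Hypothetical state the operator resets by hand after each trigger.
    active_key: str  # signing key currently set on-chain
    backup_key: str  # key the trap will switch to next time
    threshold: int


def rearm(state: TrapState, current_total_missed: int, margin: int = 1) -> TrapState:
    """Swap primary/backup roles and set a fresh threshold after a failover."""
    return replace(
        state,
        active_key=state.backup_key,
        backup_key=state.active_key,
        threshold=current_total_missed + margin,
    )
```

Keeping the state frozen and returning a new copy mirrors the one-shot design: nothing changes until you deliberately re-arm and restart the script.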

I'm releasing this code under The Unlicense, so you're free to do whatever you want with it. I also make no claims that it actually works, as there are many dependencies that could break along the way. If you're embarking on the "Adventure of Automatic Failover" for witnessing, please make sure you know what you're doing :)
