Bash Script to Monitor Your Witness Node

Don't miss another block.

I'm three days into being a backup witness for Steemit, and I'm still geeking out about it! Thank you so much for your support. I'm currently in position 40!

Right now in that position, I get to witness a block about every 40 minutes or so.

One thing I've noticed is that due to network issues or the steemd program crashing, it's possible to miss your moment when your node is assigned to witness a block and add it to the blockchain. This is what the "missed blocks" number is all about, and it's an important thing to consider when voting for a witness. You can view the https://steemit.chat/channel/witness-blocks channel to see how often this happens.

As a backup witness, missing a block is a real bummer because depending on where you are on the list, that may have been your only chance all day. With that in mind, I wrote up a script tonight that will send a text message to my phone if my witness node isn't working as expected.

This script assumes @someguy123's Docker setup, which gives output like this when you run ./run.sh logs:

1530159ms th_a       application.cpp:499           handle_block         ] Got 9 transactions on block 12487122 by anyx -- latency: 159 ms
1533043ms th_a       application.cpp:499           handle_block         ] Got 9 transactions on block 12487123 by blocktrades -- latency: 43 ms
1536043ms th_a       application.cpp:499           handle_block         ] Got 6 transactions on block 12487124 by klye -- latency: 43 ms
1539033ms th_a       application.cpp:499           handle_block         ] Got 3 transactions on block 12487125 by bhuz -- latency: 33 ms
1542021ms th_a       application.cpp:499           handle_block         ] Got 6 transactions on block 12487126 by witness.svk -- latency: 21 ms
1545019ms th_a       application.cpp:499           handle_block         ] Got 5 transactions on block 12487127 by pfunk -- latency: 19 ms



I take the last line of the log, pull out the block number, and save it to a file. The next time the script runs, it compares the latest block number (or random error output, if that's what it happens to be) to the previously saved one and makes sure the block numbers are increasing.
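
To illustrate the parsing step on its own, here is a minimal Python sketch (Python 3 assumed). The name `extract_block_number` is made up for this example and is not part of the script. It pulls the block number out of a handle_block line with a regular expression instead of counting fields, and returns `None` for anything that doesn't match.

```python
import re

# Matches the "... on block 12487127 by pfunk ..." part of a handle_block line.
BLOCK_RE = re.compile(r"on block (\d+) by ")


def extract_block_number(log_line):
    """Return the block number in a handle_block log line, or None if absent."""
    match = BLOCK_RE.search(log_line)
    return int(match.group(1)) if match else None


sample = (
    "1545019ms th_a       application.cpp:499           handle_block         ] "
    "Got 5 transactions on block 12487127 by pfunk -- latency: 19 ms"
)
print(extract_block_number(sample))          # 12487127
print(extract_block_number("random error"))  # None
```

Matching on the text around the number means the result doesn't depend on the column layout staying the same, though it still assumes the wording shown in the sample output above.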

Here's the bash script (call it check_status.sh):

#!/usr/bin/env bash
if [ -f notification_sent.txt ]; then
    exit 1
fi

if [ ! -f last_block.txt ]; then
    echo "1" > last_block.txt
    exit 1
fi

last_saved_block=$(< last_block.txt)
latest_log_entry=$(./run.sh logs | tail -n 1)

if [[ $latest_log_entry == *"Generated block"* ]]; then
    echo "We just mined a block. Exiting."
    exit 1
fi

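# Field 11 of a handle_block log line is the block number.
# Quote the variable so the shell doesn't glob-expand anything in the log line.
latest_block=$(echo "$latest_log_entry" | awk '{print $11;}')
# to test, uncomment this next line:
#latest_block=""
if [[ "$latest_block" -gt "$last_saved_block" ]]; then
  echo "We're good. Latest block is... $latest_block"
  echo "$latest_block" > last_block.txt
  exit 1
fi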

# Houston, we have a problem!

echo "There seems to be a problem."
echo "Last saved block: $last_saved_block"
echo "Latest block: $latest_block"

python sendnotice.py

echo -e "Something went wrong. Here's what we got for the latest block: $latest_block \r\n" > notification_sent.txt
./run.sh logs >> notification_sent.txt



The first time it runs, it creates a file called last_block.txt holding a placeholder value of 1, which the following run overwrites with the latest block.

The next time, it compares the latest block number with the previously saved block number. If it's a greater number, we're good to go.

If it's not, or if there's some other strange output (you can uncomment the #latest_block="" line to test this yourself), it will run the Python script to send a text message via SMTP and create a file, notification_sent.txt, to ensure it doesn't continue sending notifications after the initial text message.
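
The flow described above can be sketched as a single function. This is a rough Python 3 illustration, not a replacement for the bash script: `check_progress` and its return values are names invented for the example, it leaves out the "Generated block" check, and it only writes the flag file instead of sending anything.

```python
import os
import tempfile


def check_progress(state_dir, latest_block):
    """Mirror the script's flow: return 'muted', 'first_run', 'ok', or 'alert'."""
    flag = os.path.join(state_dir, "notification_sent.txt")
    saved = os.path.join(state_dir, "last_block.txt")

    # An alert already went out; stay quiet until someone deletes the flag file.
    if os.path.exists(flag):
        return "muted"

    # First run: write a placeholder so the next run has something to compare to.
    if not os.path.exists(saved):
        with open(saved, "w") as f:
            f.write("1")
        return "first_run"

    with open(saved) as f:
        last_saved = int(f.read().strip())

    # Healthy case: the chain has advanced since the previous run.
    if isinstance(latest_block, int) and latest_block > last_saved:
        with open(saved, "w") as f:
            f.write(str(latest_block))
        return "ok"

    # No progress, or no usable block number: record it and signal an alert.
    with open(flag, "w") as f:
        f.write("latest block seen: %r\n" % (latest_block,))
    return "alert"


state = tempfile.mkdtemp()
print(check_progress(state, 12487127))  # first_run
print(check_progress(state, 12487127))  # ok
print(check_progress(state, 12487127))  # alert (block number did not increase)
print(check_progress(state, 12487128))  # muted
```

The last call shows the trade-off of the flag file: once an alert has fired, even a healthy reading is ignored until notification_sent.txt is removed by hand.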

Here's what's in the sendnotice.py:

# Import smtplib for the actual sending function
import smtplib

# Import the email modules we'll need
from email.mime.text import MIMEText

msg = MIMEText("Check your Steemit Witness Node!")
msg['Subject'] = "Witness Node ERROR!"
msg['From'] = "YOURFROMEMAILHERE"
msg['To'] = "***********@txt.att.net"
s = smtplib.SMTP_SSL('smtp.gmail.com')
s.login('*************@gmail.com','*********APP PASSWORD**********')
s.sendmail(msg['From'], [msg['To']], msg.as_string())
s.quit()



The box I'm using has Python 2.7.12 installed, so adjust this as needed for your version of Python.

If you have sendmail installed, just use that, or you can use any other SMTP server. If you have a Gmail account, you can set up an application-specific password and send via Gmail.

Things to change:

  • Replace YOURFROMEMAILHERE with your email address.
  • Replace **********@txt.att.net with your 10-digit phone number and the domain for your phone carrier's email-to-text service. You can google around for more on that.
  • Replace ************@gmail.com with your SMTP login.
  • Replace *********APP PASSWORD********** with your SMTP password.
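
If you'd rather not hardcode those four values in the file, one option is to read them from environment variables. The sketch below assumes Python 3, and the `WITNESS_*` variable names and both function names are hypothetical, chosen only for this example. It builds the same message as sendnotice.py but stops short of sending it.

```python
import os
from email.mime.text import MIMEText

# Hypothetical variable names; pick whatever suits your setup.
SETTING_NAMES = ("WITNESS_FROM", "WITNESS_TO", "WITNESS_SMTP_USER", "WITNESS_SMTP_PASS")


def load_settings(env=os.environ):
    """Return the four settings as a dict, or raise KeyError listing what's missing."""
    missing = [name for name in SETTING_NAMES if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {name: env[name] for name in SETTING_NAMES}


def build_notice(sender, recipient):
    """Build the alert message with the same subject and body as sendnotice.py."""
    msg = MIMEText("Check your Steemit Witness Node!")
    msg["Subject"] = "Witness Node ERROR!"
    msg["From"] = sender
    msg["To"] = recipient
    return msg


# Demo with placeholder values instead of the real environment.
demo_env = {
    "WITNESS_FROM": "you@example.com",
    "WITNESS_TO": "5555555555@txt.att.net",
    "WITNESS_SMTP_USER": "you@example.com",
    "WITNESS_SMTP_PASS": "app-password-goes-here",
}
cfg = load_settings(demo_env)
notice = build_notice(cfg["WITNESS_FROM"], cfg["WITNESS_TO"])
print(notice["To"])  # 5555555555@txt.att.net
```

Keep in mind that cron runs jobs with a minimal environment, so the variables would need to be set in the crontab itself or in a wrapper script.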

Next, set up a cron job via crontab -e to run this script every minute:

* * * * * cd /home/luke/steem-docker; ./check_status.sh

When testing, it worked just as expected.

Why did I do this?

There are already great tools out there for failing over to a backup witness node if your main node starts missing blocks. Backup witnesses, though, usually aren't running multiple nodes because of the infrastructure costs involved. This script attempts to find problems and alert the owner before a block is missed so they can proactively deal with the issue.

I like the idea of knowing right away if a system I'm responsible for isn't working as it should. Over the last 10 years of running FoxyCart, we've had some rough times on hosting environments that required constant monitoring and often we knew about the outage before the datacenter did.

I hope my fellow witnesses find this script useful to continue providing excellent service for the Steemit blockchain.

Steem On!


Update: I got my first false positive about two hours ago.

I must have been pretty tired because I slept through the notification. Thankfully, it was a false positive because my node is running just fine, but it did highlight the need for more information. I made a slight change to the script to not just touch the notification file, but to put some useful information in it for debugging:

echo -e "Something went wrong. Here's what we got for the latest block: $latest_block \r\n" > notification_sent.txt
./run.sh logs >> notification_sent.txt



Update 2: With the extra logging, I found out the false positive was because the log format changes when you mine a block! :) Added this to the script. There may be other log messages I'm not aware of yet, but for now this seems to work.

if [[ $latest_log_entry == *"Generated block"* ]]; then
    echo "We just mined a block. Exiting."
    exit 1
fi
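
A different way to cope with log lines in unfamiliar formats, offered as an alternative and not what the script above does: instead of trusting only the very last line, walk the most recent lines backwards and use the newest one that contains a block number. This is a minimal Python 3 sketch. The `newest_block` name and the `lookback` parameter are invented for the example, and the regular expression assumes the handle_block wording shown in the sample output earlier.

```python
import re

# Matches the "... on block 12487127 by pfunk ..." part of a handle_block line.
BLOCK_RE = re.compile(r"on block (\d+) by ")


def newest_block(log_lines, lookback=20):
    """Return the block number from the most recent matching line, or None."""
    for line in reversed(log_lines[-lookback:]):
        match = BLOCK_RE.search(line)
        if match:
            return int(match.group(1))
    return None


lines = [
    "1542021ms th_a application.cpp:499 handle_block ] Got 6 transactions on block 12487126 by witness.svk -- latency: 21 ms",
    "1545019ms th_a application.cpp:499 handle_block ] Got 5 transactions on block 12487127 by pfunk -- latency: 19 ms",
    "some other log message in a format we have not seen before",
]
print(newest_block(lines))  # 12487127
```

With this approach an unexpected final line no longer looks like a failure by itself. The node only looks stuck when none of the recent lines carries a higher block number than the saved one.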

I have to pinch myself to make sure I'm not dreaming... there we are at position 40!

If you're a regular reader of my blog (thank you!), don't worry, we'll get back to interesting discussions soon. I've got some ideas about "screen time" for kids and dopamine which I'm looking forward to discussing with you all.


Luke Stokes is a father, husband, business owner, programmer, voluntaryist, and blockchain enthusiast. He wants to help create a world we all want to live in.

I'm a Witness! Please vote for @lukestokes.mhth
