MySBDS - Steem Blockchain Data Service in MySQL

about 6 years ago (Edited)

MySBDS - Steem Blockchain Data Service in MySQL

What is the project about?

The Steem Blockchain Data Service is an open source project managed by Steemit that allows you to read the Steem blockchain and store it in a local MySQL database. The project opens with an obvious Notice:

This is prerelease software, not yet suitable for production use. Proceed at your own risk.

I'm working under this warning to get SBDS operational in the most stable way possible today, while watching and supporting the project as it moves forward. At this point, I've spent 100 hours (that I tracked, so more really) digging into SBDS and working to understand it. I'm not gonna give myself the Complete Understanding Badge yet, but I'm on my way.

MySBDS.com

Today, MySBDS.com and blervin/mysbds exist to provide three key things:

Proven deployment scripts to get up and running with SBDS quickly
Documentation of the challenges identified thus far (both on Steemit.com) and on MySBDS.com
A public endpoint offering an always up-to-date full mysqldump of the entire Steem blockchain

I've dumped a pretty significant amount of time and energy into SBDS and I'm barely scratching the surface so I want to share everything I've found (today and ongoing) to get a dialog going with everyone else who's looking to get a local MySQL database running. And also to provide this public documentation of implementation results for the team working on updating the code for steemit/sbds.

My near-term goal is to get a public instance up and running for those who aren't quite ready to deploy and manage an entire environment themselves, but still wanna dig into the blockchain data.

Technology Stack

Everything outlined here is deployed in docker on CentOS 7.4 using common bash commands.

I will specifically outline a DigitalOcean deployment along with some useful functionality available there.

From what I see around here, it looks like Ubuntu is pretty popular so I'm planning to put together another version for that as well.

Roadmap

In order of complexity (and likely implementation):

Prepare MySBDS.com to highlight current scripts, documentation, and the most relevant Steemit posts about SBDS
Develop an Ubuntu deployment script
Document and outline challenges with deployments
Document my monitoring infrastructure
Deploy a public MySQL server
Explore interest in fully managed deployments or even replication
Continue performance monitoring to identify refinements and share those insights

How to contribute?

Comment here and any other posts about SBDS, catch me on steemit.chat, or just let me know how you can help make the blockchain easily accessible in MySQL!

Deploy MySBDS

Overview

I was excited to play with the SteemSQL service that was originally public, so when it recently became a paid service a little beyond my budget I began to look for alternatives and I realized I could easily maintain my own copy of the blockchain.

At the time of this writing, the size of the MySQL database of the entire blockchain is 403GB, and growing rapidly. Because of this growth, I wanted to look at ways to deploy this on flexible infrastructure and I found a solution with Block Storage at DigitalOcean. This allows us to deploy storage at any size for $0.10/GB per month and increase that size at any time later, however, storage can never be decreased.

We'll walk through the steps to create a script that will deploy a cloud server at DigitalOcean running the Steem Blockchain Data Service and MySQL in docker.

Requirements

You will need an account at DigitalOcean to deploy a cloud server and related storage. Also, it is presumed that you are comfortable in the shell and understand how to connect to the server via SSH and perform basic systems admin work on a Linux platform.

Initial Setup

We need to ensure we deploy our droplet in a region that supports block storage. The supported regions for volumes are:

NYC1
NYC3
SFO2
FRA1
SGP1
TOR1
BLR1
LON1
AMS3

You'll need to consider the size of the droplet you want to deploy, or the real question, whether you want to download the full mysqldump of the blockchain. I have written the low_mem.sh and high_mem.sh for both scenarios.

If you want to just get an instance up and slowly download everything, you can deploy a droplet with 4GB of memory and use the low_mem.sh script.

To download and restore the entire blockchain takes a fair amount of resources so the high_mem.sh script is written for 16GB of memory.

I'd give an estimate for times, but in my testing it varies a great deal depending on a lot of factors, including the node we're connecting to grab blocks. In general, it is always many hours but with the low_mem.sh deployment it may easily become a few days before it completes.

Prepare Script

Start by opening your favorite text editor such as Sublime Text or just Notepad and grab either the high_mem.sh or low_mem.sh script.

The first modification is to change the mysql_password in your copy of the script. Make that change and save your copy of the script. We'll modify this a bit more shortly.

mysql_password="mystrongcomplexpassword"

Create Droplet and Volume

We start by clicking the Create button and selecting Droplets

Then, we select CentOS and 7.4 x64 from the drop-down:

As we look to Choose a size you'll notice there are Standard Droplets and Optimized Droplets available. We can later expand our instance, however that expansion is only within this droplet type. You cannot deploy a Standard and later convert it to an Optimized instance, or vice versa.

Next, click Add Volume to add block storage.

Given the current size of the Steem blockchain, I recommend using at least 500GB, especially if you plan to download the blockchain as you'll need the extra space.

Next, we'll pick a region.

As you'll see, only the supported regions are selectable with all others grayed out.

Everything here around the volume name presumes a lot and those presumptions may be wrong. By default, DigitalOcean names a Volume as volume-{region}-{volume_index} where the index is essentially the count of Volumes you have in that region. Our approach here works in most scenarios, but if you have issues mounting your volume just look to the actual name displayed in DigitalOcean once it is created.

That said, we look to the following line in our script setting the volume_name variable:

volume_name="volume-tor1-01"

There are two parts to this name, the region (tor1 here) and the index of the volume (01 in this case) for that region.

If you do not have any volumes in the region you are using, this should deploy with the 01 index. If you have existing volumes in that region, it will increment to the next number. Update the volume_name accordingly.

Next, we check the User data box and paste in our modified script:

Now, we can scroll down and click the Create button.

The page will refresh and bring us back to the list of droplets and you'll see this one building. Once it's complete, the IP address will appear and you can click the Copy link to grab that and paste into your SSH client such as Putty.

Known Issues

One thing I discovered quickly is that some of the sizes set for database fields are too small and we will see Data too long for column 'raw' errors.

Fortunately, the community stepped in and there is a pull request at github that fixes everything. My scripts here manually checkout that pull request rather than the master branch.

I know, I saw the signs that advised against hiking off-trail, but c'mon, this is an adventure so let's explore and see what we find! Dry humor aside, this branch has proven reliable in my testing so I'm comfortable with it but please advise of any issues you encounter.

The SBDS project still references steemd.steemit.com as the RPC node it connects to, but this was discontinued on January 6, 2018 so I've got a little sed in here to update that to api.steemit.com

Confirm Script Execution

The deploy will take a few minutes and you should check for the install email from DigitalOcean to get the password. Once you are logged in, check the script execution status by tailing the log we created.

tail -fn5 /var/log/mysbds_install.log

Somewhere between a few hours and a few days, you'll finally see:

END install

A simple way to check progress

mysql_ip=`docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql`
mysql -h $mysql_ip -p -e "SELECT max(block_num) AS lastblock FROM sbds_core_blocks" steem

And you'll see something like this, with a much smaller block number.

+-----------+
| lastblock |
+-----------+
|  20065928 |
+-----------+

Then, just sit back patiently until everything fully syncs up.

You can watch the head_block_number shown at https://api.steemjs.com/getDynamicGlobalProperties to see how far along you are.

Conclusion

I plan to continue working with SBDS and updating this project with new scripts, tricks, and tools that I discover along the way.

I'm not seeing a whole lot of posts about SBDS so I'm hoping to open the discussion so we can all work together to get this running well. If you're already using SBDS or I just opened this door for you please comment and share your experiences! I'm gonna document everything I personally discover as well as any insights shared by the community.

My goal is to make it as simple as possible to push the blockchain to a MySQL database and I feel like I'm off to a good start.

Posted on Utopian.io - Rewarding Open Source Contributors

utopian-io mysql steemdev database linux

0.000

39 comments

@ipromote 64

about 6 years ago

You got a 3.45% upvote from @ipromote courtesy of @blervin!
If you believe this post is spam or abuse, please report it to our Discord #abuse channel.

If you want to support our Curation Digest or our Spam & Abuse prevention efforts, please vote @themarkymark as witness.

0.000

@mercurybot 51

about 6 years ago

You got a 15.38% upvote from @mercurybot courtesy of @blervin!

0.000

@upmewhale 65

about 6 years ago

You got a 4.62% upvote from @upmewhale courtesy of @blervin!

0.000

@upmyvote 56

about 6 years ago

You got a 6.06% upvote from @upmyvote courtesy of @blervin!
If you believe this post is spam or abuse, please report it to our Discord #abuse channel.

If you want to support our Curation Digest or our Spam & Abuse prevention efforts, please vote @themarkymark as witness.

0.000

@buildawhale 84

about 6 years ago

You got a 2.01% upvote from @buildawhale courtesy of @blervin!
If you believe this post is spam or abuse, please report it to our Discord #abuse channel.

If you want to support our Curation Digest or our Spam & Abuse prevention efforts, please vote @themarkymark as witness.

0.000

@boomerang 53

about 6 years ago

This post has received a 5.08 % upvote from @boomerang thanks to: @blervin

0.000

@justyy 74

about 6 years ago (Edited)

Your contribution cannot be approved because it does not follow Utopian Rules

Hello, thanks for your post, however, we believe the SBDS is open source and your contribution is two scripts (configuration).

Trivial code snippets, example code or simple templates will not be accepted.

You can contact us on Discord.
[utopian-moderator]

0.000

@twodollars 53

about 6 years ago

Thanks for putting this information together. I've been wanting to get the steem blockchain in mysql for quite a while, and I've got a droplet on digitalocean running your script as I type. I'll be sure to let you know how it goes, but I might not be able to experiment with things again until tomorrow night. I just wanted to quickly say thanks.

0.000