MySBDS - Steem Blockchain Data Service in MySQL


What is the project about?

The Steem Blockchain Data Service is an open source project managed by Steemit that allows you to read the Steem blockchain and store it in a local MySQL database. The project opens with an obvious Notice:

This is prerelease software, not yet suitable for production use. Proceed at your own risk.

I'm working under this warning to get SBDS operational in the most stable way possible today, while watching and supporting the project as it moves forward. At this point, I've spent 100 hours (that I tracked, so realistically more) digging into SBDS and working to understand it. I'm not gonna give myself the Complete Understanding Badge yet, but I'm on my way.

MySBDS.com

Today, MySBDS.com and blervin/mysbds exist to provide three key things:

  • Proven deployment scripts to get up and running with SBDS quickly
  • Documentation of the challenges identified thus far (both on Steemit.com and on MySBDS.com)
  • A public endpoint offering an always up-to-date full mysqldump of the entire Steem blockchain

I've dumped a significant amount of time and energy into SBDS, and I'm barely scratching the surface. I want to share everything I've found (today and ongoing) to get a dialog going with everyone else who's looking to get a local MySQL database running, and to provide public documentation of implementation results for the team working on updating the code at steemit/sbds.

My near-term goal is to get a public instance up and running for those who aren't quite ready to deploy and manage an entire environment themselves, but still wanna dig into the blockchain data.

Technology Stack

Everything outlined here is deployed in docker on CentOS 7.4 using common bash commands.

I will specifically outline a DigitalOcean deployment along with some useful functionality available there.

From what I see around here, it looks like Ubuntu is pretty popular so I'm planning to put together another version for that as well.

Roadmap

In order of complexity (and likely implementation):

  • Prepare MySBDS.com to highlight current scripts, documentation, and the most relevant Steemit posts about SBDS
  • Develop an Ubuntu deployment script
  • Document and outline challenges with deployments
  • Document my monitoring infrastructure
  • Deploy a public MySQL server
  • Explore interest in fully managed deployments or even replication
  • Continue performance monitoring to identify refinements and share those insights

How to contribute?

Comment here and on any other posts about SBDS, catch me on steemit.chat, or just let me know how you can help make the blockchain easily accessible in MySQL!

Deploy MySBDS

Overview

I was excited to play with the SteemSQL service while it was public, so when it recently became a paid service a little beyond my budget, I began to look for alternatives and realized I could easily maintain my own copy of the blockchain.

At the time of this writing, the size of the MySQL database of the entire blockchain is 403GB, and growing rapidly. Because of this growth, I wanted to look at ways to deploy this on flexible infrastructure, and I found a solution with Block Storage at DigitalOcean. This allows us to deploy storage at any size for $0.10/GB per month and increase that size at any time later; however, storage can never be decreased.

We'll walk through the steps to create a script that will deploy a cloud server at DigitalOcean running the Steem Blockchain Data Service and MySQL in docker.

Requirements

You will need an account at DigitalOcean to deploy a cloud server and related storage. Also, it is presumed that you are comfortable in the shell and understand how to connect to the server via SSH and perform basic systems admin work on a Linux platform.

Initial Setup

We need to ensure we deploy our droplet in a region that supports block storage. The supported regions for volumes are:

  • NYC1
  • NYC3
  • SFO2
  • FRA1
  • SGP1
  • TOR1
  • BLR1
  • LON1
  • AMS3

You'll need to consider the size of the droplet you want to deploy, or really, whether you want to download the full mysqldump of the blockchain. I have written low_mem.sh and high_mem.sh scripts for the two scenarios.

If you want to just get an instance up and slowly download everything, you can deploy a droplet with 4GB of memory and use the low_mem.sh script.

To download and restore the entire blockchain takes a fair amount of resources so the high_mem.sh script is written for 16GB of memory.

I'd give an estimate for times, but in my testing it varies a great deal depending on a lot of factors, including the node we pull blocks from. In general, it is always many hours, but with the low_mem.sh deployment it may easily take a few days to complete.

Prepare Script

Start by opening your favorite text editor such as Sublime Text or just Notepad and grab either the high_mem.sh or low_mem.sh script.

The first modification is to change the mysql_password in your copy of the script. Make that change and save your copy of the script. We'll modify this a bit more shortly.

mysql_password="mystrongcomplexpassword"
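If you'd rather generate a random password than invent one, something like this works on most Linux systems (the length and character set here are just my own choices):

```shell
# Generate a 24-character alphanumeric password from /dev/urandom.
# Paste the result into the mysql_password line of your script copy.
mysql_password=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 24)
echo "$mysql_password"
```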

Create Droplet and Volume

We start by clicking the Create button and selecting Droplets

create.png

Then, we select CentOS and 7.4 x64 from the drop-down:

centos.png

As we look to Choose a size, you'll notice there are Standard Droplets and Optimized Droplets available. We can later expand our instance; however, that expansion is only within the same droplet type. You cannot deploy a Standard droplet and later convert it to an Optimized instance, or vice versa.

choose_a_size.png

Next, click Add Volume to add block storage.

add_storage.png

Given the current size of the Steem blockchain, I recommend using at least 500GB, especially if you plan to download the blockchain as you'll need the extra space.

block_storage.png

Next, we'll pick a region.

region.png

As you'll see, only the supported regions are selectable with all others grayed out.

Everything here around the volume name presumes a lot and those presumptions may be wrong. By default, DigitalOcean names a Volume as volume-{region}-{volume_index} where the index is essentially the count of Volumes you have in that region. Our approach here works in most scenarios, but if you have issues mounting your volume just look to the actual name displayed in DigitalOcean once it is created.

That said, we look to the following line in our script setting the volume_name variable:

volume_name="volume-tor1-01"

There are two parts to this name, the region (tor1 here) and the index of the volume (01 in this case) for that region.

If you do not have any volumes in the region you are using, this should deploy with the 01 index. If you have existing volumes in that region, it will increment to the next number. Update the volume_name accordingly.
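Putting that naming convention into shell terms (the region and index here are examples; use your own values):

```shell
# DigitalOcean's default volume naming: volume-{region}-{volume_index}
region="tor1"       # the region slug you selected
volume_index="01"   # increment if you already have volumes in this region
volume_name="volume-${region}-${volume_index}"
echo "$volume_name" # volume-tor1-01
```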

Next, we check the User data box and paste in our modified script:

additional_options.png

Now, we can scroll down and click the Create button.

finalize.png

The page will refresh and bring us back to the list of droplets, where you'll see this one building. Once it's complete, the IP address will appear and you can click the Copy link to grab it and paste into your SSH client such as PuTTY.

Known Issues

One thing I discovered quickly is that some of the sizes set for database fields are too small and we will see Data too long for column 'raw' errors.

Fortunately, the community stepped in and there is a pull request on GitHub that fixes everything. My scripts here check out that pull request rather than the master branch.

I know, I saw the signs that advised against hiking off-trail, but c'mon, this is an adventure so let's explore and see what we find! Dry humor aside, this branch has proven reliable in my testing so I'm comfortable with it but please advise of any issues you encounter.

The SBDS project still references steemd.steemit.com as the RPC node it connects to, but this was discontinued on January 6, 2018, so I've got a little sed in here to update that to api.steemit.com.
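The substitution itself is a one-liner. Here it is demonstrated on a stand-in config line, since the actual file and setting name inside the SBDS checkout may differ from this sketch:

```shell
# Stand-in for the config line that references the old RPC node.
echo 'rpc_url = "https://steemd.steemit.com"' > /tmp/sbds_rpc_demo.cfg
# Swap the discontinued node for the current public endpoint.
sed -i 's|steemd\.steemit\.com|api.steemit.com|g' /tmp/sbds_rpc_demo.cfg
cat /tmp/sbds_rpc_demo.cfg   # rpc_url = "https://api.steemit.com"
```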

Confirm Script Execution

The deploy will take a few minutes and you should check for the install email from DigitalOcean to get the password. Once you are logged in, check the script execution status by tailing the log we created.

tail -fn5 /var/log/mysbds_install.log

Somewhere between a few hours and a few days, you'll finally see:

END install
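If you'd rather poll for completion than sit on the tail, a grep does the job. The log content below is simulated for the demo; on the droplet, point grep at /var/log/mysbds_install.log:

```shell
# Simulated install log for demonstration purposes.
printf 'install steps...\nEND install\n' > /tmp/mysbds_demo.log
# The exit status of grep -q tells you whether the marker line has appeared yet.
grep -q 'END install' /tmp/mysbds_demo.log && echo "install complete"
```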

A simple way to check progress:

mysql_ip=`docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql`
mysql -h $mysql_ip -p -e "SELECT max(block_num) AS lastblock FROM sbds_core_blocks" steem

And you'll see something like this, with a much smaller block number.

+-----------+
| lastblock |
+-----------+
|  20065928 |
+-----------+

Then, just sit back patiently until everything fully syncs up.

You can watch the head_block_number shown at https://api.steemjs.com/getDynamicGlobalProperties to see how far along you are.
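If you want to script that comparison, the head_block_number can be pulled out of the JSON with plain grep. The response below is a trimmed sample for demonstration; in practice you'd pipe the output of curl -s https://api.steemjs.com/getDynamicGlobalProperties through the same filters:

```shell
# Trimmed sample of a getDynamicGlobalProperties response.
sample='{"head_block_number":20065948,"time":"2018-03-07T00:00:00"}'
# Isolate the key/value pair, then keep only the number after the colon.
head_block=$(echo "$sample" | grep -o '"head_block_number":[0-9]*' | cut -d: -f2)
echo "$head_block"   # 20065948
```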

Conclusion

I plan to continue working with SBDS and updating this project with new scripts, tricks, and tools that I discover along the way.

I'm not seeing a whole lot of posts about SBDS so I'm hoping to open the discussion so we can all work together to get this running well. If you're already using SBDS or I just opened this door for you please comment and share your experiences! I'm gonna document everything I personally discover as well as any insights shared by the community.

My goal is to make it as simple as possible to push the blockchain to a MySQL database and I feel like I'm off to a good start.



Posted on Utopian.io - Rewarding Open Source Contributors



39 comments

Your contribution cannot be approved because it does not follow Utopian Rules

Hello, thanks for your post, however, we believe the SBDS is open source and your contribution is two scripts (configuration).

Trivial code snippets, example code or simple templates will not be accepted.

You can contact us on Discord.
[utopian-moderator]


Thanks for putting this information together. I've been wanting to get the steem blockchain in mysql for quite a while, and I've got a droplet on digitalocean running your script as I type. I'll be sure to let you know how it goes, but I might not be able to experiment with things again until tomorrow night. I just wanted to quickly say thanks.


Awesome, please follow up and let me know how it goes!


I'm currently on the step "build/run sdbs" and I see the mount is using 437G of space so things seem to be working. I'm having trouble figuring out how to access the data in mysql. I've tried setting up phpmyadmin, but I can only get it working on localhost and I can't seem to get it working using the network IP. What program are you using to access the mysql data?


Here's a reliable tool you can use to connect: https://www.heidisql.com/download.php


I'm trying to find a way to connect to it using the same droplet on digitalocean. Maybe the problem is that I installed mysql on the droplet to be able to run phpmyadmin. For example, when I type the following command "mysql -u root -p" I can login with nothing as a password and then at the mysql prompt I can type "show databases;" and the only databases shown are information_schema, mysql, and performance_schema, and there is no reference to the steem mysql database. When I vi to /etc/my.cnf I see that the current data directory is "/var/lib/mysql" but I think the data your script generated is found in "/mnt/volume-tor1-01/volumes". I'll try to play around with it more later tonight. My goal is to be able to connect to the steem mysql data with php because that is the language I am most familiar with.


Yeah, installing mysql locally is fine, but it might make things a little confusing since you now have two separate mysql instances running. This should clarify things.

To connect from the droplet itself, we need to look to the docker networking to find the IP the mysql container is running at by running this command:

docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql

For example, I get 172.17.0.2 as the IP. Then, you can connect using the IP returned from that command:

mysql -h 172.17.0.2 -u root -p

If you are connecting from any remote system, you can just use the public IP of this droplet with the normal port 3306.


Thanks for all your help. Since the commands you provided didn't work with my dual-mysql-trying-to-get-phpmyadmin-to-work setup, I destroyed the droplet I had been working on and started over (more than once, because I forgot to increment the volume_name; thanks for your note about that), and now it seems to be working for me like it is for you. I'm pretty excited because it looks like the hurdles have been removed and I can play around with the steem blockchain in mysql. Note: I tried using steemsql.com a while back but couldn't get it working with php in linux. Also, with the help of @eonwarped simplifying some GitHub instructions and pasting some commands for me, I was able to get a condenser working, which was neat to see. My main motivation for setting up the condenser was that I thought it would give me the steem blockchain in mysql, but I couldn't get mysql working with the condenser, and when I saw the size of the condenser installation I doubted it had all the blockchain data anyway. In short, I really appreciate the work you did here.


Sorry you had to start over, but I'm glad you're up and running again. The learning curve here is a little steep, but I've run into most of the problems you'll see, so don't hesitate to follow up if you have any other issues.


I'm running into an issue on my fresh install. The command "mysql -h 172.17.0.2 -u root -p" used to work for me during the download or restore phase, but now that I am in the "build/run sdbs" phase, that same mysql command produces the error, "ERROR 2003 (HY000): Can't connect to MySQL server on '172.17.0.2' (113)". I also tried connecting to the database remotely using the droplet IP (instead of 172.17.0.2), and I got a similar error except it mentioned (111) instead of (113). Do I just need to wait for the install script to finish or is there something else I need to do?


Check docker to make sure mysql is running:

docker ps

If not, you can try starting it again:

docker start steem_mysql

And if you're still having troubles, you can take a look at logs:

docker logs steem_mysql

This assumes you used the name steem_mysql as specified in my script, change as necessary.


Thanks for helping me with the basics of docker. The "docker ps" showed that steem_mysql wasn't running so I ran "docker start steem_mysql" and then I saw that it was running, but I still couldn't connect so I ran the "docker logs steem_mysql" command and noticed a bunch of recent messages about doing a recovery, but the recovery seemed to finish in about three minutes and after that everything seems to work as expected :)


Awesome, glad you got it going!

This all underscores how many 'grey areas' I left ambiguous in here, so I'm putting together a more thorough tutorial that'll go into docker and other details a bit more.


Sorry to keep troubling you. I'm noticing that steem_mysql keeps stopping and gives the following type of error: "Aborted connection 5 to db: 'steem' user: 'root' host: '172.17.0.3' (Got an error reading communication packets)". I keep restarting steem_mysql (at least 10 times, probably closer to 20) and it was stuck on a lastblock of 20080434 for a long time, then I checked again and the lastblock jumped to 20192434. Is having to restart steem_mysql this many times normal?


No worries, it's good to document these issues on the blockchain so others can learn too!

Interestingly, this is one issue I have not yet run into. For me, steem_mysql has been running flawlessly, it's the steem_sbds container that has stopped and had issues so far.

My general thought is this may be a system-level issue. How much memory do you have? I've tested this on a 4GB instance and can say it definitely does NOT work with less, at least not reliably.

Also, you mentioned installing other tools so these may also have an impact.

Something to check is to dump all the logs to local files so you can parse and review more easily. Docker makes it a little weird, so use this technique:

docker logs steem_mysql > mysql.log 2>&1
docker logs steem_sbds > sbds.log 2>&1

I didn't install any other tools in this particular setup (I destroyed the server with the other tools), and I think I adhered very closely to the instructions you provided. I'm using the 16GB digitalocean server and 500GB volume that were recommended in the instructions, and I used the high memory script.


Alright, with that deploy, the database restore does hammer mysql pretty hard, so that's probably the main issue here.

Again, check the docker logs to see if anything obvious pops out there. Beyond that, watch mysql to see what it's doing by running this command:

mysqladmin -h 172.17.0.2 -u root -p processlist

While the restore is running, you should see 'INSERT INTO...' in the process list for many hours.

If mysql dies, you shouldn't need to start all the way over. Assuming that you still have all the .gz files in the dump/ folder you can run this:

mysql_password="mystrongcomplexpassword"
volume_name="volume-tor1-01"
mysql_ip=`docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql`
for i in /mnt/$volume_name/dump/*.gz ; do gunzip < $i | mysql -h $mysql_ip -p$mysql_password steem ; done

Obviously, update the password and volume. This is just part of the script that calls the import.


I haven't been too suspicious of the database restores because I actually did the restore twice (once on the server I destroyed and once on the current server); both times took about the same amount of time (maybe around 8 hours) and resulted in about the same amount of space used on /mnt. So before investigating the database restore, I wanted to try something based on what you said about having no issues with steem_mysql stopping and having tested on 4GB, and I'm now trying things with your low memory script's mysql settings. More specifically, I removed both the docker steem_mysql and steem_sbds and then ran lines 38 and 59 of your low memory script to get them both going again, and I found I also had to rename a directory with a really long name inside the "volumes" directory so the system would continue using the existing mysql data instead of populating mysql from scratch. So far things seem to be working, as steem_mysql hasn't stopped in over an hour.


I'm glad it's coming together! It really is a lot of learning little pieces of the puzzle like this to understand everything. I'm still getting that tutorial together to walk through all of the steps like this in a little more detail.

One tip: renaming the volume could cause issues with docker (though it sounds like it worked here), so here's a trick in case it comes up again.

Inside the volume folder you'll see another folder named _data containing all of the databases. Make sure you docker stop steem_mysql first so nothing is running, then move your _data folder with all the existing data to overwrite the _data folder in the running container. You can't just replace the steem/ folder, as this is InnoDB and needs the ib... files as well.

Then, just docker start steem_mysql and it'll see the database.


It still seems to be working, and the lastblock seems to stay very close to the current head_block_number.

Thanks for the tip about the _data folder. I was definitely not confident about what I did, but it is good to know the proper way to do things going forward.

Note: What I did fortunately had the same effect as overwriting the _data folder in the running container. There was /old64characterdirectory/_data/ and /new64characterdirectory/_data/, and since the old one was about 360G but only the new one was being written to, I stopped both steem_mysql and steem_sbds, removed the new one, and renamed the old one to the new one's name, which was a roundabout way of overwriting the _data folder. My approach could have been problematic, though, if there was anything other than the _data folder in the running container.


Awesome! To test, I run this at least once a day:

SELECT MAX(block_num) AS lastblock FROM sbds_core_blocks

And check against the head_block_number here: https://api.steemjs.com/getDynamicGlobalProperties

I have fallen a bit behind a few times and as far as I can tell it's the public node we're connecting to that's the issue. Even so, stopping and restarting steem_sbds should get you back on track eventually if you see you're falling behind.


Yes, I've been checking the max block_num against the head_block_number throughout the day, and for the past 24 hours it seems to be keeping up well with about a 20 block lag which seems reasonable.

As a test I reverted back to the high memory settings (line 38 of your high memory script) to see if steem_mysql would keep stopping like it was doing before. So far it has been working well for the past hour, and I plan to continue testing tomorrow to see if I need to do any restarts. I'm thinking the high memory settings may only run into issues when it is way behind on the block_num, because maybe they cause the system to try to catch up too fast. It will be interesting to see if steem_mysql has any stops tomorrow.


I've pretty consistently stayed within 20 blocks as well, which is one minute and more than reasonable.

You're right, the settings you changed are exactly what to tweak. I've been working on a new 'tutorial' post the past couple of days after gleaning a bunch of insights from our discussion.

I'm gonna put something together that goes through each piece of my scripts in more detail, including these mysql settings.


The high memory settings stopped working around noon today, so it made it through last night and this morning and then failed. I started things up again tonight and noticed the server load go up to 18 in about 10 or 15 minutes, with kswapd0 being the top process, which I think is related to virtual memory.

I briefly looked for a way to change the mysql settings in docker without having to do the thing where I do a docker stop, rm, run, but I couldn't find a way.

At some point tomorrow I plan to go back to the low memory settings to see if they continue to work as well as they did yesterday.


This is good to hear. I only ran the high memory instance for a couple of days. I thought I had proved it enough, but obviously there's more to learn there.

As far as changing mysql settings in docker, I can see the settings with docker inspect but cannot find a way to set them.


I was wondering about the possibility of zipping and tarring up the _data directory to populate sbds instead of using mysqldumps. For example, instead of just offering latest.tar for download on your website, would it also work to offer latest_data.tar as a download? There is probably a reason mysqldumps are what people typically do, but I'm wondering if this is a special case where the _data directory might be a faster solution: the database restore took around 8 hours on a 6-virtual-core machine with 16GB, and downloading and overwriting the _data directory could be much faster.


Haha, the .tar download itself was a whole other battle for me.

The short answer is the ibdata... and ib_logfile... files are required as well as the steem/ folder, but those files contain data about all the databases on that server. Even so, this approach should work if you back up the entire _data folder, but nothing short of that. It would almost assuredly go faster than dealing with the dump.
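Here's a toy run of that idea with stand-in paths, just to show the mechanics; on a real server you'd docker stop steem_mysql first and point tar at the Docker volume's actual _data directory:

```shell
# Build a stand-in _data directory holding the pieces InnoDB needs together:
# the steem/ database folder plus the ibdata/ib_logfile files.
rm -rf /tmp/mysbds_src && mkdir -p /tmp/mysbds_src/_data/steem
echo "stub" > /tmp/mysbds_src/_data/ibdata1
echo "stub" > /tmp/mysbds_src/_data/steem/sbds_core_blocks.ibd
# Archive the whole _data folder, never just steem/ on its own.
tar -C /tmp/mysbds_src -cf /tmp/latest_data.tar _data
# Restoring is a straight extract over the target volume.
rm -rf /tmp/mysbds_dst && mkdir -p /tmp/mysbds_dst
tar -C /tmp/mysbds_dst -xf /tmp/latest_data.tar
ls /tmp/mysbds_dst/_data   # ibdata1  steem
```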

Creating my .tar is no walk in the park either. I'm looping through the tables and doing a mysqldump one at a time and appending each of them to the .tar file. This database is so huge a standard mysqldump of the entire thing never finished once in my testing.


It is difficult and expensive to work with such large databases, which got me thinking about a way to potentially reduce the size of the database by more than 50% without too much of a sacrifice. I noticed that the sbds_core_blocks table accounts for around two-thirds of the entire database size, and I'm guessing the mediumtext column named "raw" (probably the raw blockchain data) accounts for the vast majority of that space. It got me wondering if it might be possible to simply replace the "raw" entries with a single character or something to save space. For example, I think doing this might enable the entire database to fit comfortably on a single $40/month droplet with room to spare.


I've been asking myself this exact same question, and your approach just may work. I'll definitely post if I find some answers.


Another thing that may be worth considering is that the comments table is the second biggest in storage space, and it may be worthwhile to save space by removing older edited versions of the "body" mediumtext entries. For example, sometimes I edit a big long post a few times to correct typos; this type of editing may be fairly common and could reduce the database size a lot.


Yeah, great point! The body is fully duplicated on edits so that really could save some significant space.


Hi, is the project being continued? People hosting sbds have problems maintaining it; it breaks all the time.


The project is continuing, but I ran into exactly the problems you're describing.

There has been a lot of work on SBDS over the last few months and there is an open issue to deploy an update, which I hope happens soon.

Basically, I'm dead in the water like everyone else until this gets fixed, but once it does I'll be back online!
