Beem Basics: Downloading a Day of Steem Blockchain Data

in #beem, 5 years ago


Image: @creativista

Beem is a Python library for the Steem blockchain. This post shows how to 'download' all transfer operations from one day of the blockchain:

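import sys
import shelve
from datetime import datetime, timedelta

from beem.blockchain import Blockchain
from beem.utils import addTzInfo

# Time window to download: one full day, in UTC
start = addTzInfo(datetime(2019, 12, 19))
stop = start + timedelta(days=1)

# Connect to a Steem node and map the timestamps to block numbers
b = Blockchain()
startblk = b.get_estimated_block_num(start)
stopblk = b.get_estimated_block_num(stop)

# Stream all transfer ops in that block range, 50 blocks per API call
ops = []
for op in b.stream(start=startblk, stop=stopblk, max_batch_size=50,
                   opNames=['transfer']):
    sys.stdout.write("%s\r" % (op['timestamp']))  # progress indicator
    sys.stdout.flush()  # "\r" does not flush stdout on its own
    ops.append(op)

# Store the result in a shelve file for later/offline analysis
with shelve.open("transfers-%04d-%02d-%02d.shelf" %
                 (start.year, start.month, start.day)) as s:
    s['ops'] = ops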

Let's go through this in more detail:

  • Imports: The Blockchain class is the most important part here. sys is used for status output and shelve to store the results in a file; datetime and beem.utils handle the start/stop dates.
  • start and stop define the time window for the blockchain data. Since beem works with timezone-aware timestamps, the addTzInfo helper attaches a timezone to the plain datetime objects. The timestamps are in UTC.
  • b = Blockchain() creates a Blockchain instance. This is where the connection to one of the Steem nodes is set up.
  • Requesting data from the blockchain is based on block numbers. The get_estimated_block_num function translates the start/stop timestamps into (estimated) block numbers.
  • b.stream() then fetches all operations in the blocks from startblk to stopblk from the Steem node. max_batch_size=50 instructs beem to bundle 50 block requests into one API call, which is much faster than fetching each block individually. opNames filters for the types of blockchain operations we're interested in: transfer in this case, but you can put any other op type there, or leave it out to get all (non-virtual) ops.
  • sys.stdout.write() just prints status information on how far the script has processed the data.
  • I'm capturing all ops in a list and saving them to a shelve file for later/offline analysis.
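
For the later/offline analysis step, here is a minimal sketch that reads the shelf back and sums the transferred amounts per asset. It assumes each stored op has 'from' and 'amount' keys, with the amount as a legacy string such as "1.000 STEEM"; depending on the node, beem may return amounts in another format, so treat that parsing as an assumption. The helpers load_ops and summarize_transfers are hypothetical names, not part of beem.

```python
import shelve
from collections import Counter, defaultdict
from decimal import Decimal


def load_ops(filename):
    """Read back the op list stored under the 'ops' key (hypothetical helper)."""
    with shelve.open(filename, flag="r") as s:
        return s["ops"]


def summarize_transfers(ops):
    """Sum transferred amounts per asset symbol and count transfers per sender.

    Assumption: op["amount"] is a legacy string like "1.000 STEEM".
    """
    totals = defaultdict(Decimal)  # asset symbol -> summed amount
    senders = Counter()            # account name -> number of transfers sent
    for op in ops:
        value, symbol = op["amount"].split()
        totals[symbol] += Decimal(value)  # Decimal avoids float rounding
        senders[op["from"]] += 1
    return dict(totals), senders
```

For example, `totals, senders = summarize_transfers(load_ops("transfers-2019-12-19.shelf"))` gives the per-asset volume for that day, and `senders.most_common(10)` lists the ten most active senders.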

Depending on your connection to a Steem node, this script might take a few minutes for a full day.
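
To get a feeling for the workload and why batching matters, here is a rough estimate of the number of API calls for one day. It assumes Steem's 3-second block interval (so at most 28,800 blocks per day) and that each batched call covers max_batch_size blocks; estimate_calls is a hypothetical helper for illustration only.

```python
import math

BLOCK_INTERVAL_SECONDS = 3  # assumption: Steem targets one block every 3 seconds


def estimate_calls(days=1, max_batch_size=50):
    """Return (blocks, api_calls) for streaming `days` whole days of blocks."""
    blocks = days * 24 * 60 * 60 // BLOCK_INTERVAL_SECONDS
    calls = math.ceil(blocks / max_batch_size)  # one call per batch of blocks
    return blocks, calls


print(estimate_calls())                  # (28800, 576)
print(estimate_calls(max_batch_size=1))  # (28800, 28800)
```

Missed blocks and the estimated start/stop block numbers make the real count slightly different, but the order of magnitude (hundreds of calls instead of tens of thousands) shows why max_batch_size makes such a difference.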


Any questions, remarks or things you'd like to see done with beem? Leave a comment!


Congratulations @stmdev! You have completed the following achievement on the Steem blockchain and have been rewarded with new badge(s):

You distributed more than 145000 upvotes. Your next target is to reach 150000 upvotes.

You can view your badges on your Steem Board and compare to others on the Steem Ranking.
If you no longer want to receive notifications, reply to this comment with the word STOP.

To support your work, I also upvoted your post!

Vote for @Steemitboard as a witness to get one more award and increased upvotes!

Glad to see you writing again. I am just learning to code.

Ah, I think I remember you from the steemdevs Discord; you're one of @simplegame's game accounts, aren't you? Feel free to ask if you have any questions :)

Thanks for the awesome description of what your code is doing. I am teaching myself Python, and one of my goals is to write code that extracts personal STEEM stats, i.e., posts, comments and other information that I can get from services like steemworld, but customised for myself and on a single page. The beem docs are currently still a little confusing for a noob like myself, so posts like this are very helpful.

I'm glad if it is of any help. The beem docs can indeed be confusing when starting with beem, but they really contain a lot of information.

It is. I am going through some online courses to get more familiar with the workings of Python. That will help me follow your code better, and then I will hopefully make better sense of the beem documentation.