HAF-powered Hivemind dev environment setup guide (according to my findings, it may or may not be the best way)

Hello !

If you're an astute reader, you probably noticed this title is similar to my previous post here: https://peakd.com/hivemind/@howo/hivemind-dev-environment-setup-guide-according-to-my-findings-it-may-or-may-not-be-the-best-way

And for good reason: this is pretty much an updated version of that guide. Hivemind was recently updated to use HAF as a backend instead of querying hived directly. This drastically changes the way you need to set up your local environment, and after spending a weekend figuring it out, I thought I'd dump my findings here.

Disclaimer: a bunch of things here are not optimal/secure and should only be used in a local testing environment. It's more of an "if it works, it works" setup.

I'm using Ubuntu 22.04 LTS at the time of writing.

Pre-requisites

Let's start by updating our machine and installing some dependencies:

sudo apt update && sudo apt upgrade

sudo apt-get install -y \
    autoconf \
    automake \
    autotools-dev \
    build-essential \
    cmake \
    doxygen \
    git \
    libboost-all-dev \
    libyajl-dev \
    libreadline-dev \
    libssl-dev \
    libtool \
    liblz4-tool \
    ncurses-dev \
    python3 \
    python3-dev \
    python3-jinja2 \
    python3-pip \
    libgflags-dev \
    libsnappy-dev \
    zlib1g-dev \
    libbz2-dev \
    liblz4-dev \
    libzstd-dev \
    ninja-build \
    libpqxx-dev \
    postgresql-server-dev-14 \
    liburing-dev

PostgreSQL

In the last post I was using Docker, which was very convenient, but because the HAF scripts install extensions and often assume a local install, I suggest relying on a local install instead.

Then we install postgres:

sudo apt -y install postgresql-14
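Optional sanity check (this assumes the stock Ubuntu packaging, where pg_lsclusters ships with postgresql-common): the 14/main cluster should show up as online.

pg_lsclusters
psql --version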

Disabling passwords (bad idea)

I disable password checks because a bunch of the scripts use the --no-password option. This is where I went nuclear and disabled all password checks, which is very bad. There is a better way to do this, but I couldn't be bothered to spend the time since it's only for my local dev needs. I cannot stress this enough: DO NOT do this on a production server.

Remove the password for postgres:
sudo -i -u postgres psql -c 'alter role postgres password null;'

Then edit this file, which determines the authentication method, and set all the lines to trust:

sudo nano /etc/postgresql/14/main/pg_hba.conf

At the end, the file should look like this:

local   all             postgres                                trust

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust

If PostgreSQL isn't running on 5432:

I don't know if it's because I tinkered with postgres ports in the past, but I had to set the port back to 5432:

sudo nano /etc/postgresql/14/main/postgresql.conf

There is a line with port=; if it's not set to 5432, set it to that value.
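A quick way to check the current value without opening the file (plain grep, nothing HAF-specific):

grep -n '^port' /etc/postgresql/14/main/postgresql.conf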

Restart postgres with the new config:

sudo systemctl restart postgresql.service

If something is wrong, you can check the logs here:
cat /var/log/postgresql/postgresql-14-main.log
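Optional check that the trust setup actually works: both of these should return a row without ever prompting for a password (first over the Unix socket, then over TCP).

sudo -u postgres psql -c 'SELECT 1;'
psql -U postgres -h 127.0.0.1 -c 'SELECT 1;'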

HAF / Hived / Hivemind

Now that we are done, it's time to set up the actual stack. First, download hivemind and enter the directory:

git clone https://gitlab.syncad.com/hive/hivemind.git
cd hivemind

Something to understand is that HAF is registered in hivemind as a git submodule. It's important to download HAF via the git submodule, because a HAF version downloaded any other way may not be compatible with hivemind.

Similarly, if we go one level deeper, hived is a git submodule of HAF, so for the same reason you should build it from the HAF repository you got from the submodule.

If that was confusing, here's a pruned version of the directory tree:

hivemind
├── haf
│   ├── hive

Now let's pull all the directories with a single command:

git submodule update --init --recursive
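You can confirm the submodules were checked out at the commits hivemind pins with:

git submodule status --recursive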

Now let's go into the haf dir, create a build directory to hold all the binaries we are going to build, and build:

cd haf
mkdir build
cd build

And now we prep the compilation files:
cmake -DPOSTGRES_INSTALLATION_DIR=/usr/lib/postgresql/14/bin -DCMAKE_BUILD_TYPE=Release .. -GNinja

And compile !

ninja

Note: if your computer doesn't have enough RAM, it may be a good idea to reduce the number of threads with the -j option, e.g. for 4 threads: ninja -j4

Once it's done, you can add the newly compiled extensions to postgres:

sudo ninja install
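Optional check: the HAF extension should now be visible to postgres. I believe it's named hive_fork_manager, but treat the exact name as an assumption; the LIKE filter below is just a sanity query against the standard pg_available_extensions view.

sudo -u postgres psql -c "SELECT name, default_version FROM pg_available_extensions WHERE name LIKE 'hive%';"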

Prepping postgres for HAF

The setup script is going to execute scripts as the postgres user, but postgres does not have access to the directory. This is another one of those "don't do this" moments, but for the same reasons I didn't want to spend the time figuring out a cleaner way.

Give postgres recursive access to the hivemind directory:
sudo setfacl -R -m u:postgres:rwx /home/howo/hivemind/
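You can verify the ACL actually landed with getfacl (it ships in the same acl package as setfacl; adjust the path to wherever you cloned hivemind):

getfacl /home/howo/hivemind/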

Finally, we are ready to execute the scripts.

Go out of the haf build directory into haf's scripts dir and execute these scripts to set up the database:

cd ../scripts
sudo chown -R postgres:postgres /home/howo/projects/hivemind/haf/scripts/haf_database_store
sudo chmod -R 700 /home/howo/projects/hivemind/haf/scripts/haf_database_store
sudo ./setup_postgres.sh
sudo ./setup_db.sh --haf-db-admin=postgres
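At this point a HAF database should exist. Assuming the script's default database name is haf_block_log (which matches the psql-url and connection strings used later in this guide), a quick check:

sudo -u postgres psql -l | grep haf_block_log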

Finally, it's time to set up hived and fill the DB. Go out of scripts into haf's directory and create a directory to store the block_log, etc.

First we'll download the 5-million-block block_log from @gtg:

cd ..
mkdir hive_data
cd hive_data
mkdir blockchain
cd blockchain
wget -O block_log https://gtg.openhive.network/get/blockchain/block_log.5M
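The file is large (on the order of a couple of GB), so make sure the download actually completed; a truncated block_log will make the replay fail partway through:

ls -lh block_log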

Now we add the configuration. Note that I have added extra settings at the end to enable the sql_serializer plugin.

cd ..
nano config.ini

This is the config I use; feel free to edit it for your needs:

log-appender = {"appender":"stderr","stream":"std_error"}
log-logger = {"name":"default","level":"info","appender":"stderr"}
backtrace = yes
plugin = webserver p2p json_rpc
plugin = database_api
# condenser_api enabled per abw request
plugin = condenser_api
plugin = block_api

# market_history enabled per abw request
plugin = market_history
plugin = market_history_api

plugin = account_history_rocksdb
plugin = account_history_api

# gandalf enabled transaction status
plugin = transaction_status
plugin = transaction_status_api

# gandalf enabled account by key
plugin = account_by_key
plugin = account_by_key_api

# and few apis
plugin = block_api network_broadcast_api rc_api

history-disable-pruning = 1
account-history-rocksdb-path = "blockchain/account-history-rocksdb-storage"

# shared-file-dir = "/run/hive"
shared-file-size = 20G
shared-file-full-threshold = 9500
shared-file-scale-rate = 1000

flush-state-interval = 0

market-history-bucket-size = [15,60,300,3600,86400]
market-history-buckets-per-size = 5760

p2p-endpoint = 0.0.0.0:2001
p2p-seed-node =
# gtg.openhive.network:2001

transaction-status-block-depth = 64000
transaction-status-track-after-block = 42000000

webserver-http-endpoint = 0.0.0.0:8091
webserver-ws-endpoint = 0.0.0.0:8090

webserver-thread-pool-size = 8
plugin = sql_serializer
psql-url = dbname=haf_block_log user=postgres hostaddr=127.0.0.1 port=5432
psql-index-threshold = 1000000
psql-operations-threads-number = 5
psql-transactions-threads-number = 2
psql-account-operations-threads-number = 2
psql-enable-account-operations-dump = true
psql-force-open-inconsistent = false
psql-livesync-threshold = 100000

Now it's time to fill the HAF database !

cd ..
./build/hive/programs/hived/hived -d hive_data --replay-blockchain --stop-replay-at-block 5000000 --exit-after-replay --force-replay --psql-index-threshold 65432

This should run for a little while (10 minutes to a few hours, depending on how good your computer is). Once it's done, you have a full HAF database ready to be used by hivemind !
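If you want to peek at the replay progress from another terminal, you can query the HAF database directly. Assuming the usual HAF schema (a hive.blocks table with a num column; treat the exact names as an assumption), something like:

psql -U postgres -d haf_block_log -c 'SELECT max(num) FROM hive.blocks;'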

Hivemind

Let's go up one level, back to the hivemind directory, and into hivemind's scripts directory:

cd ../scripts

from there we can setup the database with the scripts provided:

./setup_postgres.sh --host=localhost
./setup_db.sh --postgres-url=postgresql://postgres@localhost:5432/haf_block_log
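To confirm hivemind's schema landed in the HAF database, you can list the schemas; I believe hivemind creates its own schema (named something like hivemind_app, but treat the exact name as an assumption):

psql -U postgres -d haf_block_log -c '\dn'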

And now hivemind can index things !

Install hivemind dependencies:

You can find all the dependencies in setup.cfg. You can use tox to run it, but I prefer using local Python, so I just install them directly (see the pip command after the list). As of the time of writing, the dependencies are as follows:

    aiopg == 1.3.4
    jsonrpcserver == 4.2.0
    simplejson == 3.17.6
    aiohttp == 3.8.1
    certifi == 2022.6.15
    sqlalchemy == 1.4.39
    funcy == 1.17
    ujson == 5.4.0
    urllib3 == 1.26.10
    psycopg2-binary==2.9.3
    aiocache == 0.11.1
    configargparse == 1.5.3
    diff-match-patch == 20200713
    prometheus-client == 0.14.1
    psutil == 5.9.1
    atomic == 0.7.3
    cffi == 1.14.5
    gitpython == 3.1.27
    pytz == 2022.1
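If you want to install them all in one go with pip (same pins as above):

pip3 install \
    aiopg==1.3.4 jsonrpcserver==4.2.0 simplejson==3.17.6 aiohttp==3.8.1 \
    certifi==2022.6.15 sqlalchemy==1.4.39 funcy==1.17 ujson==5.4.0 \
    urllib3==1.26.10 psycopg2-binary==2.9.3 aiocache==0.11.1 \
    configargparse==1.5.3 diff-match-patch==20200713 prometheus-client==0.14.1 \
    psutil==5.9.1 atomic==0.7.3 cffi==1.14.5 gitpython==3.1.27 pytz==2022.1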

Optional step: mocks

If you want to add mocks, there is a script that will populate them in HAF:

MOCK_BLOCK_DATA_PATH=./hivemind/mock_data/block_data
MOCK_VOPS_DATA_PATH=./hivemind/mock_data/vops_data
export PYTHONPATH=$PYTHONPATH:/home/howo/hivemind
python3 ./hive/indexer/mocking/populate_haf_with_mocked_data.py --database-url postgresql://postgres:root@localhost:5432/haf_block_log

Ready to sync !

Finally, we are ready to sync ! Update test-max-block depending on your mocks.

python3 ./hivemind/hive/cli.py --database-url postgresql://postgres:@localhost:5432/haf_block_log --test-max-block=5000024

Now this will take anywhere from 30 minutes to a few hours. It's the longest part of the process, but once you're synced, you're ready to go :)

I could not get hived to stop pulling blocks after the 5m block. Is the --stop-replay-at-block=5000000 switch working for you? I see you don't have the equals sign in the command in your post.

Yep, there is no =. Also, you need to use gtg's 5M block_log.