techcoderx.com node is HAF ready and how you can run one too



After 6 days of cleaning the server, moving files and replaying, techcoderx.com is now running HAF-enabled hived v1.26 along with HAfAH for the account history APIs. Both are used to sync Hivemind on the latest stable release, which is currently not HAF-based.

[Screenshot: sql_serializer plugin in action]

Many have probably asked how to run a node like this. As I'm writing this, the docs for these components are scattered all over the place, which makes them hard to follow. So I will document the whole process I went through here for anyone to follow along.

Certain parts of this guide may change over time since this isn't a final release yet, but most of it should be valid for at least the next 7 days from the publication of this post (Hive posts are rarely viewed past the post payout anyways).

2024 update

Most of this guide is out-of-date except for the hardware requirements and conclusion, which remain largely the same. Most HAF deployments today should use the Docker Compose setup that runs all the required components of a fully featured HAF node.

The guide below documents the manual approach and is kept mostly for reference.

Hardware requirements

  • At least 6 core CPU, preferably scores >1200 single-core and >6000 multi-core on Geekbench 5
  • 64GB RAM, up from the originally recommended 32GB (128GB or more works better due to ZFS caching)
  • 2x 2TB NVMe SSD in RAID 0 for Postgres DB. I will explain why later.
  • 1TB SATA/NVMe SSD as boot drive and to store block_log file
  • No GPU is required
  • 20Mbps bandwidth; more is required for a high-traffic node.

If running on a residential internet connection behind CGNAT, a nearby VPS is required to expose a public node. Anything $5/month or less should work as long as it has sufficient bandwidth. Alternatively, glorified reverse proxies (e.g. Cloudflare Tunnel) could also work if you accept the risk.

Dependency setup

For this guide, we will be using Ubuntu Server 22.04 LTS which is what I used on techcoderx.com node.

Essentials

First of all, install some essential packages (including ZFS):

sudo apt-get update && sudo apt-get upgrade
sudo apt-get install wget curl iotop iftop htop jq zfsutils-linux

hived and Postgresql

sudo apt-get install -y \
    autoconf \
    automake \
    cmake \
    g++ \
    git \
    zlib1g-dev \
    libbz2-dev \
    libsnappy-dev \
    libssl-dev \
    libtool \
    ninja-build \
    pkg-config \
    doxygen \
    libncurses5-dev \
    libreadline-dev \
    perl \
    python3 \
    python3-jinja2 \
    libboost-all-dev \
    libpqxx-dev \
    postgresql \
    postgresql-server-dev-14

ZFS pool setup

Running the node without compression will fill up the drives very quickly and you will need a lot more NVMe storage, so it is much more practical to place the postgresql database in a ZFS pool with lz4 compression enabled.

Ideally ZFS pools would consist of at least 2 drives, hence the dual NVMe requirement listed above. ZFS will also handle striping the drives to mimic a RAID 0 setup.

A single-drive ZFS pool is totally possible and it is currently what I'm using, but the performance of a single PCIe 3.0 NVMe SSD isn't enough to sync both hived and Hivemind at reasonable speeds (but can still catch up with the network). PCIe Gen 4 might work better, but I would say it is more cost effective over time to go for a RAID 0 setup.

If you only have one 2TB NVMe SSD like me, you can probably get away with using a very recent Hivemind dump (no more than 1 month old) for the next 6-12 months.

I have used this tutorial to set this up, but to summarize in a few commands:

List all disk identifiers

To create any ZFS pools, you need to know the identifiers for the drives to be used first.

$ lsblk

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0    62M  1 loop /snap/core20/1581
loop1         7:1    0    62M  1 loop /snap/core20/1587
loop2         7:2    0  79.9M  1 loop /snap/lxd/22923
loop3         7:3    0  44.7M  1 loop /snap/snapd/15534
loop4         7:4    0    47M  1 loop /snap/snapd/16292
nvme0n1     259:0    0 465.8G  0 disk 
├─nvme0n1p1 259:1    0     1G  0 part /boot/efi
└─nvme0n1p2 259:2    0 464.7G  0 part /
nvme1n1     259:3    0   1.9T  0 disk 
├─nvme1n1p1 259:4    0   1.9T  0 part 
└─nvme1n1p9 259:5    0     8M  0 part 

In my example above, I'm using /dev/nvme1n1 for my single drive ZFS pool. Yours may vary, take note of the identifiers of each drive to be used in the pool.

Create pool

Command:

sudo zpool create -f <pool_name> <disk_id1> <disk_id2> ...

Example for single-drive pool:

sudo zpool create -f haf /dev/nvme1n1

Example for two-drive pool:

sudo zpool create -f haf /dev/nvme1n1 /dev/nvme2n1

Get pool status

sudo zpool status

Enable compression

sudo zfs set compression=on <pool_name>
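
To confirm compression is enabled and see how well it performs once data lands on the pool, the standard ZFS properties can be queried (using the same pool name as above):

sudo zfs get compression,compressratio <pool_name>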

Set ARC RAM usage limit

Only needed for 32GB RAM systems. In this command, we are setting it to 8GB.

echo "options zfs zfs_arc_max=8589934592" | sudo tee -a /etc/modprobe.d/zfs.conf
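
The modprobe option only takes effect once the zfs module is reloaded (e.g. after a reboot). If you want to apply the limit immediately, the live parameter can usually be set through sysfs as well; the value below is assumed to be the same 8GB as above:

echo 8589934592 | sudo tee /sys/module/zfs/parameters/zfs_arc_max
cat /sys/module/zfs/parameters/zfs_arc_max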

You should now have the pool mounted at /<pool_name>. For more useful commands, refer to my hiverpc-resources notes on ZFS.

Postgresql setup

By default, the PostgreSQL database is stored on the boot drive. We want it to be in the ZFS pool created previously.

Create the required directory with appropriate permissions (assuming haf is the pool name):

cd /haf
sudo mkdir -p db/14/main
sudo chown -R postgres:postgres db
sudo chmod -R 700 db

Set the data directory in /etc/postgresql/14/main/postgresql.conf:

data_directory = '/haf/db/14/main'

Then tune the database in the same config file (the values below are for 64GB RAM; scale according to what you have):

effective_cache_size = 48GB
shared_buffers = 16GB
work_mem = 512MB
synchronous_commit = off
checkpoint_timeout = 20min
max_wal_size = 4GB
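
As a rough illustration of that scaling (my own estimate from doubling the 64GB values above, not values from the original setup), a 128GB machine might use something like:

effective_cache_size = 96GB
shared_buffers = 32GB
work_mem = 1GB
synchronous_commit = off
checkpoint_timeout = 20min
max_wal_size = 4GB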

Restart postgresql to reflect the changes.

sudo service postgresql restart
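
To verify that the new data directory and tuning values are in effect, you can query the running server (a quick sanity check, not part of the original steps):

sudo -u postgres psql -c "SHOW data_directory;"
sudo -u postgres psql -c "SHOW shared_buffers;"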

Compile hived

git clone https://gitlab.syncad.com/hive/haf
cd haf
git submodule update --init --recursive
cd hive
git checkout <latest_develop_commit_hash>
cd ..
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja
sudo ninja install

sudo ninja install installs the required extensions for the sql_serializer plugin to work with postgresql. To install hived and cli_wallet, move the executables found at hive/programs/hived/hived and hive/programs/cli_wallet/cli_wallet respectively (within the build directory) to /usr/bin.
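
For example, from within the build directory, something like the following copies them into place (paths as described above):

sudo cp hive/programs/hived/hived /usr/bin/hived
sudo cp hive/programs/cli_wallet/cli_wallet /usr/bin/cli_wallet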

Setup HAF db

Enter the psql shell as postgres user:

sudo -i -u postgres psql

Create the db with permissions and extensions:

-- Roles
CREATE ROLE hived_group WITH NOLOGIN;
CREATE ROLE hive_applications_group WITH NOLOGIN;
CREATE ROLE hived LOGIN PASSWORD 'hivedpass' INHERIT IN ROLE hived_group;
CREATE ROLE application LOGIN PASSWORD 'applicationpass' INHERIT IN ROLE hive_applications_group;
CREATE ROLE haf_app_admin WITH LOGIN PASSWORD 'hafapppass' CREATEROLE INHERIT IN ROLE hive_applications_group;

-- Set password if needed
ALTER ROLE role_name WITH PASSWORD 'rolepass';

-- Database used by hived
CREATE DATABASE block_log;

-- Use database
\c block_log

-- sql_serializer plugin on db
CREATE EXTENSION hive_fork_manager CASCADE;

Authentication

By default, PostgreSQL allows only peer authentication for local connections. You may enable password authentication by adding the following line to /etc/postgresql/14/main/pg_hba.conf:

local all all peer # after this line
local all all md5 # add this
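
After editing pg_hba.conf, reload PostgreSQL so the new authentication rule takes effect:

sudo service postgresql reload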

For more details, refer to the documentation here.

Block Log

As of writing this, there wasn't a compressed block_log file available to download, so if you don't have one already you would need to fetch an up-to-date uncompressed block_log and compress it yourself (see below):

Update (17 Oct 2022): The compressed block_log is now available for download.

wget -c https://gtg.openhive.network/get/blockchain/compressed/block_log

Compress block_log

To generate a compressed block_log from an uncompressed version, use the compress_block_log tool found in hive/programs/util/compress_block_log in the build directory.

./compress_block_log -j $(nproc) -i /src/uncompressed/blocklog/folder -o /dest/blocklog/folder

Create ramdisk

98% of the time, the replay is bottlenecked by disk I/O if the shared_memory.bin file is located on disk. Moving it to a ramdisk reduces SSD wear and significantly improves performance, hence the memory requirements above. Replace 25G with the desired size of the ramdisk.

cd ~
mkdir ramdisk
sudo mount -t tmpfs -o rw,size=25G tmpfs ~/ramdisk
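
Note that a tmpfs mount does not survive a reboot. If you want it recreated automatically, an /etc/fstab entry along these lines should work (path and size assumed to match the commands above):

tmpfs /home/<username>/ramdisk tmpfs rw,size=25G 0 0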

Configure hived

Generate a hived config file if not already:

hived --dump-config

By default, the data directory is ~/.hived but you may also store it in another directory if you wish.

mkdir -p ~/.hived/blockchain

Move the resulting compressed block_log and block_log.artifacts files to the blockchain folder of the data directory.

Then edit the config.ini file in the data directory as follows (find each key and modify its value, adding it if it is not there):

Plugins

# Basic
plugin = witness webserver p2p json_rpc database_api network_broadcast_api condenser_api block_api rc_api

# Reputation
plugin = reputation reputation_api

# Get accounts/witness
plugin = account_by_key account_by_key_api

# Internal market
plugin = market_history market_history_api

# Transaction status
plugin = transaction_status transaction_status_api

# State snapshot
plugin = state_snapshot

# SQL serializer
plugin = sql_serializer

# Wallet bridge API
plugin = wallet_bridge_api

SQL serializer

psql-url = dbname=block_log user=postgres password=pass hostaddr=127.0.0.1 port=5432
psql-index-threshold = 1000000
psql-operations-threads-number = 5
psql-transactions-threads-number = 2
psql-account-operations-threads-number = 2
psql-enable-account-operations-dump = true
psql-force-open-inconsistent = false
psql-livesync-threshold = 10000

See sql_serializer plugin documentation on details of each parameter and configuring relevant filters if desired.

Shared file directory

Specify path to folder containing shared_memory.bin:

shared-file-dir = "/home/<username>/ramdisk"

Shared file size

At least 24GB is needed (the file is currently around 22GB); this value will increase over time.

shared-file-size = 24G

Endpoints

Configure the IP and port of the endpoints to be made available for seed and API calls.

p2p-endpoint = 0.0.0.0:2001
webserver-http-endpoint = 0.0.0.0:8091

Replay!

Finally we are ready to replay the blockchain. Start a screen or tmux session and run the following:

hived --replay-blockchain

If your hived data directory is not the default ~/.hived, specify it in the above command with -d /path/to/data/dir.
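
For example, inside a screen session with an explicit data directory (path illustrative):

screen -S hived
hived -d ~/.hived --replay-blockchain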

For HAF nodes, this process is split into 5 stages:

  1. Normal replay. hived reads the blockchain and rebuilds chain state.
  2. Initial network sync, which fetches the latest blocks so far from the network.
  3. Indexes and foreign keys creation.
  4. Normal network sync to catch up after stage 3 above.
  5. Live sync, where your node is fully synced. You should see something like the first screenshot of this post above in your logs at this point.

This entire process took under 48 hours for me to replay 66.4M Hive blocks, ~5-10% faster than the previous version.

HAfAH

Setup required permissions

Run the following in a psql console on the block_log database so that the setup scripts can run successfully:

GRANT CREATE ON DATABASE block_log TO haf_app_admin;

Installation

git clone https://gitlab.syncad.com/hive/HAfAH hafah
cd hafah
git submodule update --init
./run.sh setup-postgrest
./run.sh setup <server_port> postgresql://haf_app_admin:<password>@127.0.0.1:5432/block_log

Notice we are not cloning the submodules recursively as the haf repo without its submodules is the only dependency for the setup scripts.

Start HAfAH server

Replace <server_port> with the port you wish to use. I prefer to use the next one available, that is 8093.

./run.sh start <server_port> postgresql://haf_app_admin:<password>@127.0.0.1:5432/block_log

Test API call

To check that it works, try out an API call to the HAfAH server.

curl -s -H "Content-Type: application/json" --data '{"jsonrpc":"2.0", "method":"condenser_api.get_ops_in_block", "params":[60000000,false], "id":1}' http://localhost:<server_port> | jq

Hivemind

Valid as of v1.27.3 HAF-based Hivemind release.

Setup hivemind db

Run in psql shell in block_log database:

\c block_log
CREATE EXTENSION IF NOT EXISTS intarray;
CREATE ROLE hivemind_app WITH LOGIN PASSWORD 'hivemindpass' CREATEROLE INHERIT IN ROLE hive_applications_group;
GRANT CREATE ON DATABASE block_log TO hivemind_app;

Installation

python3 -m pip install --upgrade pip setuptools wheel
git clone --recurse-submodules https://gitlab.syncad.com/hive/hivemind.git
cd hivemind
python3 -m pip install --no-cache-dir --verbose --user . 2>&1 | tee pip_install.log

Configure

At the end of the ~/.bashrc file, define the env vars:

export DATABASE_URL=postgresql://hivemind_app:hivemindpass@localhost:5432/block_log
export HTTP_SERVER_PORT=8092

Where hivemind_app and hivemindpass are the username and password set during role creation above.
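
Reload the shell configuration (or open a new shell) so the variables are picked up before syncing:

source ~/.bashrc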

Restore dump

Update (4 Jan 2023): As of now there is no HAF Hivemind dump available, so this might not be relevant; you will have to sync it from scratch.

Download a copy of the dump here, and restore it:

pg_restore -j $(nproc) -d hive hivemind.dump

Run the upgrade script after restoring:

cd hive/db/sql_scripts
./db_upgrade.sh <username> hive

Sync the db

In screen or tmux session:

hive sync

On my single-drive ZFS pool, disk I/O was the bottleneck about 90% of the time; it took me over 2 hours to sync 6 days of blocks, and I left it running overnight to build the indexes. hived struggled a bit but still managed to not go out of sync.

Start server

In screen or tmux session:

hive server

Reverse proxy

At this point, you should have the required pieces to serve a public Hive node. To put them together into a single endpoint, we need a reverse proxy such as jussi, rpc-proxy or drone.

If your node is on residential internet behind CGNAT, set up the VPN and forward the required ports (the hived and HAF app server ports if the reverse proxy runs on the VPN server, otherwise just the reverse proxy port) using iptables rules like so:

iptables -t nat -A PREROUTING -p tcp --dport <port> -j DNAT --to-destination 10.8.0.2:<port>

Replace <port> with the port to be forwarded and 10.8.0.2 with the address of the Wireguard peer to forward the port to. Repeat this command for every port to be forwarded.
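
For reference, a minimal WireGuard configuration on the home node side could look like the sketch below; the keys, the VPS address and the 10.8.0.x subnet are placeholders matching the example above, not values from my actual setup:

# /etc/wireguard/wg0.conf on the node behind CGNAT
[Interface]
Address = 10.8.0.2/24
PrivateKey = <node_private_key>

[Peer]
PublicKey = <vps_public_key>
Endpoint = <vps_public_ip>:51820
AllowedIPs = 10.8.0.0/24
PersistentKeepalive = 25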

You may also have to set up ufw rules to allow traffic to/from the Wireguard network interface:

sudo ufw route allow in on wg0 out on eth0
sudo ufw route allow in on eth0 out on wg0

Replace eth0 with the host network interface and wg0 with the Wireguard network interface (of the peer on the proxy server).

For those who need a jussi config, you can find them here. Replace 10.8.0.2 with the actual URL to the respective services.

The easiest way to run jussi is through docker:

git clone https://gitlab.syncad.com/hive/jussi
cd jussi
docker build -t="$USER/jussi:$(git rev-parse --abbrev-ref HEAD)" .
docker run -itp 8094:8080 -d --name jussi --env JUSSI_UPSTREAM_CONFIG_FILE=/app/config.json -v /home/username/jussi/jussi_config.json:/app/config.json "$USER/jussi:$(git rev-parse --abbrev-ref HEAD)"

Replace /home/username/jussi/jussi_config.json with path to jussi config file.

At this point, you should now have a Hive API node accessible at port 8094 which can be put behind nginx and be accessible by the public.
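
A minimal nginx server block for that last step could look something like this (the domain is a placeholder, and TLS termination, e.g. via certbot, is left out):

server {
    listen 80;
    server_name api.example.com;

    location / {
        proxy_pass http://127.0.0.1:8094;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}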

Conclusion

As of now, it is good to see that running such a Hive node is totally possible with mainstream hardware (e.g. my barebones B350 motherboard) today. The disk requirements may look very demanding on their own, but they are nothing compared to the node requirements of popular blockchains such as BSC or Solana.

Even though the data is now stored in compressed format, the actual disk usage after compression remains about the same as the previous node setup, at 1.9TB total. Details below.

As for RAM usage, hived has consumed less memory than before, as seen in the shared_memory.bin file being 2GB smaller.

Server resource statistics

These stats are as of 29 July 2022. For the latest stats, please refer to the latest witness update logs.

hived (v1.25, all plugins, as of block 66365717)

block_log file size: 620 GB
shared_memory.bin file size: 22 GB
account-history-rocksdb-storage folder size: 755 GB
blockchain folder size: 1,398 GB

hived (v1.26, d821f1aa, all plugins)

block_log file size (compressed): 317 GB
block_log.artifacts file size: 1.5 GB
shared_memory.bin file size: 20 GB
blockchain folder size: 338 GB

HAF db

Output of SELECT pg_size_pretty( pg_database_size('block_log') );
Database size: 2.9 TB
Compressed: 1.2 TB

hivemind (v1.25.3, a68d8dd4)

Output of SELECT pg_size_pretty( pg_database_size('hive') );
Database size: 528 GB
Compressed: 204 GB
RAM usage: 4.2 GB
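
The "Compressed" figures above reflect the space actually consumed on the ZFS pool after lz4 compression. Assuming the pool is named haf as in this guide, they can be checked with:

sudo zfs list haf
sudo zfs get compressratio haf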


Changelog

10 Oct 2022: Added sample Postgresql config
17 Oct 2022: Added download link to compressed block_log, update RAM requirements, update Hivemind dump restore instructions
21 Oct 2022: Added Postgresql authentication configuration
24 Oct 2022: Remove redundant permissions as per here.
26 Oct 2022: Update hive.db_head_state troubleshooting for Hivemind v1.26.1
3 Nov 2022: Remove references of moving shared_memory.bin back into disk.
2 Jan 2023: Update hived compile to use ninja instead of make.
4 Jan 2023: Update Hivemind setup to be HAF-based.
5 Jan 2023: Include instructions for running Python HAfAH in virtual environment.
7 Jan 2023: Added PostgREST HAfAH setup instructions.
25 Jan 2023: Reorder HAfAH setup instructions such that PostgREST is the first.
24 Dec 2023: Added note on the new Docker Compose deployment, removed Python HAfAH setup instructions and replaced OpenVPN guide link with Wireguard install script link.


