Single-Machine Deployment

StreamingFast Firehose NEAR Setup

This document shows how to launch a Firehose on NEAR instance on a single machine that serves everything. Installing Firehose involves a few fairly simple tasks: obtaining specific binaries and completing some configuration steps.

Running on a single machine is quick and easy and can work fairly well for your own needs. For a production-grade setup, especially one shared across many users, we highly recommend running each Firehose component in its own container, with shared storage for file access properly set up between them. This enables horizontal scaling with fine control over which component is scaled out.

To bootstrap our instance, we are going to start Firehose on NEAR from a snapshot provided by the NEAR Foundation. This makes our Firehose installation serve fairly recent blocks from the NEAR network. A caveat of using a snapshot like this is that historical blocks, i.e. blocks that were produced before the snapshot was taken, will not be available. To get access to historical blocks, see the Backfill section below.

It's important to understand that we are going to run a NEAR full node (a.k.a. NEAR RPC node). Operating a blockchain full node is a complex task and requires a powerful machine with fast disk(s). The firenear binary is going to launch near-firehose-indexer, which is a thin wrapper around neard that outputs Firehose Instrumentation Logs for NEAR. The near-firehose-indexer process acts just like neard would, synchronizing with the network. How to properly and efficiently operate a NEAR full node is not the responsibility of Firehose. If you have problems syncing or a slow ingestion rate, you should first look at the official Full Node documentation and seek help with neard in mind.

Requirements

This tutorial has been tested on an Ubuntu 22.04 machine, and you will need to compile near-firehose-indexer manually.

Hardware requirements should follow the NEAR full node requirements found at https://near-nodes.io/rpc/hardware-rpc. Firehose also requires extra disk space for the Firehose NEAR blocks it produces, for the index files used by filtered block streams and, if enabled, for Substreams. The exact space usage is hard to predict, especially for Substreams, which depends heavily on usage. Firehose NEAR Mainnet blocks weigh ~600 GiB, in addition to the space taken by the NEAR node itself, so a minimum of 2 TiB is recommended.

Installation

firenear

First install the firenear binary:

# Use the correct binary for your platform: `linux_x86_64` in the command below can be replaced by `darwin_x86_64` or `darwin_arm64`
TAG=$(curl -s https://api.github.com/repos/streamingfast/firehose-near/releases/latest | grep "tag_name" | cut -f 4 -d '"')
LINK=$(curl -s https://api.github.com/repos/streamingfast/firehose-near/releases/latest | grep -Eo "https://.*linux_x86_64.tar.gz")
curl -L "$LINK" | tar zxf -

# Copy the result so it is available system-wide (writing to /usr/local/bin may require root privileges)
cp firenear "/usr/local/bin/firenear-$TAG"
ln -fs "/usr/local/bin/firenear-$TAG" /usr/local/bin/firenear

The commands above download the latest firehose-near tarball, extract it in the current folder, copy the binary to /usr/local/bin under a versioned name and create a symlink to it, which keeps track of the installed version.
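Because the install step keeps the release tag in the file name, the symlink itself tells you which version is active. The helper below is a minimal sketch of that idea; the function name is made up for illustration, and it assumes the "firenear-<tag>" naming produced by the commands above.

```shell
# Print the release tag that the firenear symlink currently points at.
# Hypothetical helper: assumes the "firenear-<tag>" naming used by the
# install commands above. Defaults to /usr/local/bin/firenear.
active_firenear_version() {
  local link="${1:-/usr/local/bin/firenear}" target name
  target=$(readlink "$link") || return 1   # fails if $link is not a symlink
  name=$(basename "$target")
  case "$name" in
    firenear-*) printf '%s\n' "${name#firenear-}" ;;
    *) return 1 ;;                         # unexpected naming, give up
  esac
}
```

Calling `active_firenear_version` with no argument inspects /usr/local/bin/firenear; pass another path to inspect a different symlink.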

And validate that everything is working as expected:

It should print:

Firehose Instrumented Node Binary

The second step is to obtain the Firehose instrumented node binary. In the case of NEAR, this is near-firehose-indexer. This binary is actually a NEAR Indexer binary: essentially, it is neard configured to index blocks and transactions as they are synced by the node and to emit Firehose logs from them.

To avoid any compatibility issues, we are going to compile the binary directly on the machine that will execute the binary.

It's important that you pick the currently active version to sync with the network; the NEAR latest stable releases page lists the most recent version needed to sync with Mainnet. As new versions of neard are published, we will make new versions of near-firehose-indexer available, so be sure to subscribe to near-firehose-indexer releases to be informed when a new release is out.

Install required build dependencies:

Install and configure rustup:

Follow the instructions and don't forget to run source "$HOME/.cargo/env" at the end to make the binaries available.

Now let's clone near-firehose-indexer and check out the correct version. In this tutorial we are going to use 1.30.1-fire because it's the latest version at the time of writing; be sure to use the correct version (the latest tag can be found with curl -s https://api.github.com/repos/streamingfast/near-firehose-indexer/releases/latest | grep tag_name):

Then let's compile it:

This will take several minutes to complete, depending on your machine size. To finish, let's copy the binary somewhere it will be available:

Finally, let's verify that it worked correctly:

It should print:

If you see 1.27.0 printed while you downloaded 1.30.1, this is just a mistake in this release, where the version was not updated properly.

Snapshot

We are now going to download a NEAR Mainnet snapshot to our local disk. Instructions for downloading a NEAR snapshot for Mainnet or Testnet are given at https://near-nodes.io/intro/node-data-snapshots. We are going to use s5cmd because it improves download speed considerably. We give the instructions only briefly here; please refer to that page for further details.

The NEAR snapshot should be put under a folder named data within the NEAR home directory. In our instructions, our NEAR home directory will be at /data/node so we are going to transfer all NEAR snapshot data under /data/node/data. Feel free to adjust those paths to your own setup.

Running

Now that we have our NEAR full node snapshot at /data/node/data, we need to set up the configuration files required to run the node. Essentially, we are following the https://near-nodes.io/rpc/run-rpc-node-without-nearup#mainnet instructions; please refer to them for further details.

The files that are needed:

  • config.json

  • genesis.json

  • node_key.json

These files should be put in the NEAR home directory, which is /data/node in our case. The config.json and genesis.json files are provided by NEAR directly:

The config.json already comes with pre-defined boot nodes as well as the tracked shards that we are interested in. Remember that these files come from NEAR directly, so please refer to the NEAR documentation for further details about the meaning of those config values.

We will now generate a unique node_key.json for this node. For this, we are going to use firenear tools generate-node-key. In the https://near-nodes.io/rpc/run-rpc-node-without-nearup#mainnet instructions, the node_key.json file is generated by running neard init, which essentially generates an ED25519 public/secret key pair and serializes it to a JSON file. We provide firenear tools generate-node-key as a convenience to avoid downloading yet another binary.
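Before launching the node, it can help to confirm that the NEAR home directory has the layout described so far: the three JSON files plus the data folder holding the snapshot. The check below is a small sketch; the function name is hypothetical, and /data/node is only the default used in this tutorial.

```shell
# Verify that a NEAR home directory contains the files this tutorial expects.
# Hypothetical helper; returns 0 when everything is present, 1 otherwise.
check_near_home() {
  local home="${1:-/data/node}" missing=0 f
  for f in config.json genesis.json node_key.json; do
    if [ ! -f "$home/$f" ]; then
      echo "missing file: $home/$f" >&2
      missing=1
    fi
  done
  # The snapshot is expected under <home>/data
  if [ ! -d "$home/data" ]; then
    echo "missing snapshot directory: $home/data" >&2
    missing=1
  fi
  return "$missing"
}
```

Run `check_near_home /data/node` (or your own path); anything missing is reported on stderr.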

Now, we are going to sanity check that everything is in order by running near-firehose-indexer. This sanity check also lets us see which block the snapshot is currently at, which is required later when we start the firenear binary.

If everything works properly, you should see something like:

As soon as you see some output, you can press Ctrl-C, as we are now going to restart everything through firenear directly, which launches near-firehose-indexer as a sub-process and manages it.

Before continuing, however, find the first line of the form INFO stats: #84349610 Waiting for peers and note the block number that you see; in our case it's 84349610. We will use this value shortly to compute the first streamable block of our setup for firehose-near.
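If you saved the output of the sanity-check run to a file, the block number can be pulled out mechanically instead of by eye. The following is a minimal sketch that relies only on the "INFO stats: #<number>" shape quoted above; the function name is made up for illustration.

```shell
# Read node logs on stdin and print the block number found on the
# first "INFO stats: #<number>" line (hypothetical helper).
first_stats_block() {
  sed -n 's/.*INFO stats: #\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example with the log line quoted in this tutorial:
printf '%s\n' 'Feb 03 02:54:02.982 INFO stats: #84349610 Waiting for peers 2 peers' | first_stats_block
```

With the sample line from this tutorial, the helper prints 84349610.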

We will now create a config file for firehose-near; some of the configuration values are explained within the file itself. Let's create a file /data/firehose.yaml with the following content:

The configuration file can also be turned entirely into flags passed to the binary directly, if you prefer. The args should be joined together with , and passed to firenear start directly, while each configuration value should be prefixed with -- and passed as a flag.
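The mapping from config file to command line can be sketched as a small helper that only builds the command string. The function name is hypothetical, the `--key=value` spelling is an assumption about how values are passed, and the component and flag names in the example call are illustrative placeholders rather than a verified configuration; use the names from your own config file.

```shell
# Build a "firenear start" command line from a comma-separated args list
# and key=value configuration entries (hypothetical helper).
# It only prints the command; it does not run anything.
firenear_start_command() {
  local cmd="firenear start $1" kv
  shift
  for kv in "$@"; do
    cmd="$cmd --$kv"      # each configuration value becomes a --flag
  done
  printf '%s\n' "$cmd"
}

# Placeholder names, for illustration only:
firenear_start_command "merger,firehose" "common-first-streamable-block=84349610"
```

Values containing spaces would need extra quoting, which this sketch does not attempt.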

Let's now start the firenear stack:

If everything is working, you should see logs like this:

Logs that have the date format Feb 03 02:54:00.930 come from the NEAR node directly and not from firenear. The logs with the date format 2023-02-03T02:54:12.537Z are those from firenear. You will see a bunch of logs like Feb 03 02:56:41.062 INFO network: Error connecting to addr=142.132.150.14:24567 err=Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }; these simply state that neard tried to connect to a remote node but the connection was refused.
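Since the two log sources differ only in their timestamp format, they can be separated with plain pattern matching. The sketch below assumes every line starts with one of the two timestamp shapes shown above; both function names are made up for illustration.

```shell
# Classify a log line by its timestamp format (hypothetical helper):
#   "Feb 03 02:54:00.930 ..."      -> near
#   "2023-02-03T02:54:12.537Z ..." -> firenear
log_source() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T*) echo firenear ;;
    [A-Z][a-z][a-z]\ [0-9][0-9]\ [0-9][0-9]:*)    echo near ;;
    *)                                            echo unknown ;;
  esac
}

# Keep only the stdin lines coming from the given source (near | firenear).
filter_log_source() {
  local want="$1" line
  while IFS= read -r line; do
    if [ "$(log_source "$line")" = "$want" ]; then
      printf '%s\n' "$line"
    fi
  done
}
```

For example, piping a saved log file through `filter_log_source firenear` hides the NEAR node's own output, including the connection-refused noise.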

Logs like 2023-02-03T02:56:40.594Z INFO (merger) reading from blocks store: file does not (yet?) exist, retrying in {"filename": "/data/storage/merged-blocks/0084350700.dbin.zst", "base_filename": "0084350700", "retry_delay": "4s"} are also normal: we have not yet produced this file because the node is still catching up.
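The file name in that log can be related back to a block number. The sketch below assumes that each merged bundle covers 100 consecutive blocks and is named after its first block, zero-padded to 10 digits, which matches the 0084350700.dbin.zst example above; treat the bundle size as an assumption and override it if your setup differs. The function name is hypothetical.

```shell
# Print the merged-bundle file name expected to contain a given block
# (hypothetical helper). Pass the block as a plain decimal number without
# leading zeros; the optional second argument is the bundle size.
merged_bundle_for_block() {
  local block="$1" size="${2:-100}"
  # Round down to the bundle boundary, then zero-pad to 10 digits
  printf '%010d.dbin.zst\n' $(( block / size * size ))
}

merged_bundle_for_block 84349610
```

Under these assumptions, block 84349610 falls in 0084349600.dbin.zst.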

Now, the node is waiting for peers so that it can download the missing block headers and replay the blocks. Once synchronization starts properly, you will see files being produced in /data/storage/one-blocks and in /data/storage/merged-blocks. It can take many minutes (even dozens of minutes) before you connect to good peers that provide good data; this is outside of Firehose's control. Read the NEAR documentation for ways to improve P2P connectivity for your node.

If you have been stuck for a long time on Feb 03 02:54:02.982 INFO stats: #84349610 Waiting for peers 2 peers ⬇ 12.8 kB/s ⬆ 1.58 kB/s 0.00 bps 0 gas/s CPU: 112%, Mem: 497 MB, you may want to press Ctrl-C and start again; sometimes this helps in getting better peers.

Once peering is good, there is still some wait time to download missing headers and state; how long depends on how old the snapshot you started from is. You should see logs like:

These logs give some information about the completion rate. Now it's time to wait: monitor the /data/storage/merged-blocks/ folder until at least one merged bundle is produced, then continue with the next steps. This step can be quite long depending on the peering and the machine used, as well as external network factors.
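The waiting can be automated with a small polling loop. This is a minimal sketch that assumes merged bundles are files ending in .dbin.zst, as in the merger log shown earlier; the function names and the default 30-second delay are arbitrary choices.

```shell
# Count merged bundles (*.dbin.zst files) in a directory (hypothetical helper).
count_merged_bundles() {
  local dir="${1:-/data/storage/merged-blocks}"
  find "$dir" -type f -name '*.dbin.zst' 2>/dev/null | wc -l | tr -d ' '
}

# Block until at least one merged bundle exists, polling every <delay> seconds.
wait_for_merged_bundle() {
  local dir="${1:-/data/storage/merged-blocks}" delay="${2:-30}"
  while [ "$(count_merged_bundles "$dir")" -eq 0 ]; do
    sleep "$delay"
  done
  echo "merged bundle found in $dir"
}
```

Run `wait_for_merged_bundle` in a second terminal while firenear is syncing; it returns as soon as the first bundle appears.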

Operator Notes

NEAR switches to a fast replay mode when the node is too far behind the canonical head of the network. This happens if your node is more than 1 epoch (or 2 epochs, it's not 100% clear) behind the rest of the network: the node does not process the blocks in between and instead "jumps" to a recent block by downloading a state snapshot. Firehose cannot work if blocks are missing, so the NEAR node needs to synchronize continuously with the network for Firehose to generate blocks properly and never create a hole. If your node is down for too long, you will need to fill the hole somehow; see the Backfill section below for details.
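A hole of this kind shows up as a missing file in the merged-blocks folder. The sketch below lists such gaps, assuming bundles are named after their first block (10 digits plus .dbin.zst, as in the merger log shown earlier) and cover 100 blocks each; the function name is made up, and the bundle size is an assumption you can override.

```shell
# Report missing merged bundles between the first and last bundle present
# (hypothetical helper). Prints one "gap: <first missing> to <last missing>"
# line per hole and nothing when the sequence is contiguous.
find_bundle_gaps() {
  local dir="${1:-/data/storage/merged-blocks}" size="${2:-100}" f
  for f in "$dir"/*.dbin.zst; do
    [ -e "$f" ] || continue            # the glob matched nothing
    basename "$f" .dbin.zst
  done | sort -n | awk -v size="$size" '
    /^[0-9]+$/ {
      n = $1 + 0                       # "0084350700" -> 84350700
      if (seen && n != prev + size)
        printf "gap: %010d to %010d\n", prev + size, n - size
      prev = n; seen = 1
    }'
}
```

The check only covers the range between the oldest and newest bundle on disk; it cannot tell you about blocks older than the snapshot you started from.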

Verifying

To verify that everything is good, we are going to install grpcurl, a curl-like command line tool for the gRPC protocol:

Let's now perform a Firehose stream blocks request:

If you see some output, your instance is working as expected. It will now continue to synchronize with the network.

If instead you see something like Failed to dial target host "localhost:9000": dial tcp 127.0.0.1:9000: connect: connection refused, it means firenear has not produced a single block yet. Wait until blocks are present in /data/storage/merged-blocks and then try again.

Backfill

Now that you are synchronizing blocks live with the network, you need to backfill blocks that you did not process so far. This can be achieved in two ways:

  • Download existing blocks from a trusted provider

  • Configure an archive node instead of a full node, and replay blocks you are missing

Download existing blocks

You can reach out to us on Discord to discuss exchanging our NEAR blocks. Note that you will need to pay the egress cost associated with the transfer.

Archive node

Now that you know how to sync Firehose for NEAR, you can repeat a procedure similar to this tutorial but with an archive node instead. Archive nodes are able to replay all blocks from genesis, so when you start with an archive node, you can specify the start block you want.

Docker Images

Docker images are available and come in two flavors: one that contains only firenear, and another, which we call the bundled image, that contains firenear as well as near-firehose-indexer, which is essentially the neard codebase wrapped in the NEAR indexer framework.

Both kinds of image are pushed to the repository ghcr.io/streamingfast/firehose-near; the image's tag can be used to determine which one it is:

  • The image containing only firenear has a single version in its tag, like ghcr.io/streamingfast/firehose-near:v1.0.0

  • The bundled image, containing firenear and near-firehose-indexer, has two versions in its tag separated by a dash - character, like ghcr.io/streamingfast/firehose-near:v1.0.0-1.30.1-fire, which means that firehose-near version v1.0.0 is bundled with near-firehose-indexer version 1.30.1-fire.
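The tag convention above can be decoded with plain parameter expansion. This is a small sketch with a made-up function name; it assumes the firenear version itself contains no dash, so a pre-release tag such as a hypothetical v1.0.0-rc.1 would be misread as a bundled image.

```shell
# Describe a firehose-near image tag (hypothetical helper).
# Accepts either a bare tag or a full image reference.
describe_image_tag() {
  local tag="${1##*:}"                 # drop "repository:" if present
  case "$tag" in
    *-*) echo "bundled firenear=${tag%%-*} indexer=${tag#*-}" ;;
    *)   echo "firenear-only firenear=$tag" ;;
  esac
}

describe_image_tag ghcr.io/streamingfast/firehose-near:v1.0.0-1.30.1-fire
```

For the bundled example from this section, the helper reports firenear v1.0.0 and indexer 1.30.1-fire.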
