Skip to content

Latest commit

 

History

History
261 lines (180 loc) · 12.1 KB

readme.md

File metadata and controls

261 lines (180 loc) · 12.1 KB

Introduction

This guide aims to demonstrate the setup of multiple post services using go-spacemesh. It assumes familiarity with the setup and employs a standalone network for easy illustration.

Using a similar setup for a production environment is feasible, but that's not the purpose of this guide.

We will create a simple network topology for the post services and then construct a directed acyclic graph (DAG) to manage the post services.

1:n post service guide with a standalone network

Prerequisites

  1. A Linux or MacOS system. This guide has not been tested on Windows.
  2. go-spacemesh version 1.4.0-alpha.3 or later, unzipped and located in the ./go-spacemesh directory beside the config.json file. This file is the configuration for the node in the standalone setup and is tailored for this demonstration.
  3. postcli version 0.7.1 or later in the ./postcli directory.
  4. dagu version 1.12.9 or later available in the ./dagu directory.

Standalone network

We will operate a network with epochs lasting 15 minutes, layers that last for 1 minute, with a poet cycle gap of 5 minutes, and a phase shift of 10 minutes.

Please note that in standalone mode, the poet works within the same process as the node, which might lead to occasional 100% usage of one CPU core.

The only parameter not pre-set in the config is the genesis time. Ideally, set this time a little into the future to allow for preparation of the post services for the first epoch. For this guide, we will set it as 2024-03-08T14:30:00Z

The setup

Dagu setup

As we touched on in the introduction, we will use DAGU to coordinate the post proving processes.

  1. Start dagu with the following command while in the dagu directory:
./dagu server -d dags

By default, the Dagu UI will be accessible at http://localhost:8080

Post initialization and node setup

  1. Start the node using the following command:
./go-spacemesh -c config.json --preset=standalone --genesis-time=2024-03-08T14:30:00Z --grpc-json-listener 127.0.0.1:10095 -d ../node_data | tee -a node.log
  1. Go to DAGU and execute the init DAG. It will set up the post data with postcli, retrieve the goldenATX from the node API, and copy the identity.key from each post data folder to the node data directory.
  2. Confirm the initialization was successful by checking that the DAGU status for that DAG is finished.
  3. Stop the node. This is necessary because the node does not reload keys from the disk by design.
  4. Remove the local.key file from the node data directory. The setup of multiple post services requires the deletion of the node's local.key file. The node will not start if this file is present.
  5. Restart the node with the same command as in step 1 and keep it running.

Setup description

In the init.yaml DAG, you'll notice we have set up 10 post directories. We've chosen this naming scheme to resemble a realistic scenario.

post/diskA_post1
├── identity.key
├── postdata_0.bin
└── postdata_metadata.json
... (similar structure for other post directories)

Imagine each disk* as a separate physical disk and each _post* as a separate post service operating on that disk.

We aim for a configuration where no more than one post service is proven at a time per disk and, at the same time, we are comfortable running all necessary post proving processes on different disks.

The DAG is a straightforward orchestration of post proving processes. It will execute post proving processes in the order we want, based on the dependencies we define.

Warning

This setup is meant for demonstration purposes only. There are other ways to achieve similar results, and this particular setup should not be employed in a production environment.

Starting the post services

You can now go to DAGU and run the proving DAG. This will begin the initial post proving processes for each post service.

Upon successful execution, you'll see the resulting DAG as such: Proving DAG

The visualization clearly displays the dependencies and the sequence of the post proving processes.

Each post proving process represents an individual post service that is started when needed and stopped after fulfilling its purpose.

This demo is equipped with a wait_for_cg DAG that essentially waits for the poet's cycle gap to open (in a very naive manner; definitely not intended for production use) and then automatically triggers the proving DAG. The wait_for_cg DAG functions only when the command ./dagu scheduler -d ./dags runs alongside dagu server -d ./dags. If you prefer not to run the scheduler, you can safely manually run the wait_for_cg DAG, or run the proving DAG when the cycle gap is available.

If you keep the wait_for_cg DAG running (and dagu scheduler) the post services will continue to prove every epoch.

API and more

On the node side on grpc-post-listener you will find an additional method: spacemesh.v1.PostInfoService.PostStates which returns the state of the post services.

Wherever the service is IDLE it means that the post service is not needed by the node at the moment. The PROVING state means that the NODE expects post service to be proving. Which means:

  • Whenever the state is IDLE you can shut down the given post service, the node does not need it anymore or yet. Nothing bad will happen if the post service remains connected to the node, but it's a waste of resources.
  • When the state is PROVING you should run the post service. However, it's important to understand that you're free to orchestrate the post services as you wish. The moment you connect the post service when the node expects it to be PROVING the post service will start the proving process (with some small delay because of API calls etc).

Sample output:

grpcurl --plaintext localhost:10094 spacemesh.v1.PostInfoService.PostStates
{
  "states": [
    {
      "id": "rHQgcnqCBZE/gzjqqIpOtwVfYvrhkqP0toL4NcpDMzY=",
      "state": "IDLE",
      "name": "diskC_post3.key"
    },
    {
      "id": "nZW66vTRVDDD0CCChSSvNWWC7GArbFZPYc9Mm1EVwh4=",
      "state": "PROVING",
      "name": "diskA_post1.key"
    },
    {
      "id": "MkID511oMCESXfJSPiNNXdwUFn3xieAM8/fMKZJlXuY=",
      "state": "PROVING",
      "name": "diskB_post1.key"
    },
    {
      "id": "f6Y9rsTegmzFw1uu5cxwN62gzw0iurPhZvZVlHBRl+8=",
      "state": "PROVING",
      "name": "diskC_post1.key"
    },
    {
      "id": "8i7+8B3NoZ9pblqW4va6s9JPbupkJ03sA22Wi9eFJMY=",
      "state": "IDLE",
      "name": "diskC_post2.key"
    },
    {
      "id": "yPnJ+19oeqkBie7pD/flLxAYjwcPsjJwq4plDp/BaBs=",
      "state": "IDLE",
      "name": "diskB_post2.key"
    },
    {
      "id": "thyo/dndJANWWiVX6t9XrTrZb4D2KVUiconDkJOvsAc=",
      "state": "PROVING",
      "name": "diskA_post2.key"
    },
    {
      "id": "YQsPFR/rQD37omE9qXDAEQVrRBNRoSFgbMhnKFOF9NQ=",
      "state": "IDLE",
      "name": "diskD_post1.key"
    },
    {
      "id": "7kBK/uJcLUO8JnYTzoxBTYIhKksgep1GtQrTHGX2AxQ=",
      "state": "IDLE",
      "name": "diskE_post1.key"
    },
    {
      "id": "OMcoQmeBVpWqBgzW0UQPno377ymvunqmoVnpZdtoOCA=",
      "state": "IDLE",
      "name": "diskB_post3.key"
    }
  ]
}

You can also see in Events: grpcurl -plaintext localhost:10093 spacemesh.v1.AdminService.EventsStream

{
  "timestamp": "2024-03-11T15:48:16.759101Z",
  "help": "Node finished PoST execution using PoET challenge.",
  "postComplete": {
    "challenge": "zsw7v26gmJMUqfUpPUEAxAsDPO0cHtbYSnV8iAX2lBA=",
    "smesher": "nZW66vTRVDDD0CCChSSvNWWC7GArbFZPYc9Mm1EVwh4="
  }
}

As you can see smesher here points to the id behind the post service that was used to prove the PoST.

Please note that each of the post services exposes its own API (--operator-address) which can be used to see the state of post-service itself:

# Not doing anything
❯ curl http://localhost:50051/status
"Idle"

# Proving
❯ curl http://localhost:50051/status
{"Proving":{"nonces":{"start":0,"end":128},"position":0}}

# Proving, read some data already
❯ curl http://localhost:50051/status
{"Proving":{"nonces":{"start":0,"end":128},"position":10000}}

# Started second pass
❯ curl http://localhost:50051/status
{"Proving":{"nonces":{"start":128,"end":256},"position":10000}}

# Finished proving, but the node has not fetched the proof yet
❯ curl http://localhost:50051/status
"DoneProving"

How this demo differs from an actual production setup

Dagu

While Dagu is small and easy to use, it might not be the best choice for a production environment. This really depends on your needs. Larger systems like Apache Airflow or Prefect have more features and integrations. But, they are also more complex and need more resources and expertise to manage.

You might also succeed with different type of tools like n8n, depending on your setup.

Scripts in the ./scripts folder

The scripts provided here are designed to simplify the demo.

In a Docker-based system, you might replace the ./scripts/run_service.sh script with a docker run ... command, and ./scripts/stop_service.sh with docker stop ....

For setups across multiple hosts, like Kubernetes, you could schedule pods to handle the post services.

No mTLS Setup

This demonstration presumes that everything operates on a single machine. In scenarios where the node exists within a different network (or for heightened security measures), the implementation of mutual TLS (mTLS) is recommended.

For configurations that utilize mTLS, one must set the --address on the post service to correspond with the address on which the grpc-tls-listener is listening.

Troubleshooting

Dagu Server Fails to Start

Problem : Starting the Dagu server results in an error, or it starts but is not accessible.

Cause : Common causes include port conflicts, missing dags directory, or insufficient permissions.

Solution : Check if another service is using port 8080 and either stop that service or configure Dagu to use a different port. Ensure the dags directory exists within the ./dagu directory and that you have the necessary permissions to execute ./dagu and access the directory.

Node Does Not Recognize Post Services

Problem : After restarting the node, it does not recognize the initialized post services.

Cause : This issue typically arises if the identity.key files are not copied correctly to the node data directory or if the local.key file was not removed.

Solution : Ensure that the identity.key files from each post data folder are correctly copied to the node data directory. Remove the local.key file from the node data directory if it still exists.

High CPU Usage

Problem : Experiencing 100% CPU usage by the node or poet service.

Cause : This is expected behavior in standalone mode due to the poet service running within the same process as the node.

Solution : While this is normal for demonstration purposes, the production setup should be prepared and observed carefully.

Post Services Do Not Start Proving

Problem : Post services are set up and the node is running, but the post services do not start the proving process.

Cause : This could be due to incorrect scheduling of the proving DAG, issues with the wait_for_cg DAG, or the node's grpc-post-listener not being properly configured.

Solution : Verify that the wait_for_cg DAG and the scheduler are correctly set up and running. Check the node configuration to ensure the grpc-post-listener is correctly set and accessible by the post services.

Connection Issues Between Node and Post Services

Problem : The node and post services experience connection issues or fail to communicate.

Cause : Network configuration issues, firewall restrictions, or incorrect address/port settings in the configuration files.

Solution : Ensure network configurations allow for communication between the node and post services. Check firewall settings to allow traffic on necessary ports. Verify that the addresses and ports in the configuration files match and are correct.