ComputeHorde (Subnet 12 of Bittensor)

ComputeHorde is a specialized subnet within the Bittensor network designed to supercharge Bittensor with scalable and trusted GPU computing power.

By transforming untrusted GPUs provided by miners into trusted compute resources, ComputeHorde enables validators of other subnets to access large amounts of decentralized computing power cost-effectively, paving the way for Bittensor to scale beyond its current limitations to support potentially over 1,000 subnets.

Key Features

Decentralized Compute for Bittensor Validators
ComputeHorde aims to become the go-to decentralized source for hardware needed to validate other subnets. The mission is to decrease the Bittensor ecosystem's dependency on centralized services. Assuring the quality of miners' work is essential for Bittensor's overall reliability.
Fair and Verified Work
ComputeHorde employs mechanisms to ensure miners provide authentic compute work, fairly verified by the validators:
- Execute tasks from validators stake-proportionally.
- Handle both organic (external, from other subnets) and synthetic (generated to validate ComputeHorde miners) tasks.
- Match jobs to the advertised hardware (e.g., ensuring A6000 GPUs are used for tasks requiring them).
- Prevent malicious behaviors like "weight-copying" through innovative validation mechanisms.
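
As a rough illustration of what "stake-proportionally" can mean, the sketch below splits a fixed number of executor slots among validators according to their stake. The function name `split_executors` and the largest-remainder rounding are assumptions made for this example; the actual allocation logic in ComputeHorde may differ.

```python
def split_executors(stakes: dict[str, float], executors: int) -> dict[str, int]:
    """Split `executors` slots among validators in proportion to stake.

    Hypothetical helper for illustration only; not taken from the ComputeHorde code.
    """
    total = sum(stakes.values())
    if total <= 0 or executors <= 0:
        return {validator: 0 for validator in stakes}

    # Exact (fractional) share of each validator.
    exact = {v: executors * s / total for v, s in stakes.items()}
    # Start from the rounded-down share.
    shares = {v: int(x) for v, x in exact.items()}

    # Hand the remaining slots to the largest fractional remainders.
    leftover = executors - sum(shares.values())
    by_remainder = sorted(stakes, key=lambda v: exact[v] - shares[v], reverse=True)
    for validator in by_remainder[:leftover]:
        shares[validator] += 1
    return shares


if __name__ == "__main__":
    print(split_executors({"validator_a": 600, "validator_b": 300, "validator_c": 100}, 10))
```

Every slot is always handed out, so rounding never leaves capacity idle.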
Scalable Mining with Executors
Each miner in ComputeHorde can spawn multiple executors, performing individual compute tasks. This removes the 256 miner (UID) limit and significantly scales the potentially available computing power.
Hardware Classes
ComputeHorde introduces hardware classes to create a free market for GPU resources, balancing cost-effectiveness with performance. Currently, A6000 is the supported class, with A100 coming next. The end goal is to eventually support all GPU types/configurations required by validators across Bittensor subnets.

Bittensor context

Bittensor is a decentralized network designed to ensure that AI, the most critical technology of our era, remains accessible to everyone and free from the control of centralized entities. Each Bittensor subnet specializes in a digital commodity, ranging from storage and large language models to general computing.

This is achieved by distributing $TAO tokens to incentivize:

Subnet owners to define the most useful and reliable commodities (by designing the incentive mechanism),
Miners to deliver high-quality and efficient services innovatively,
Validators to reward miners based on their performance.

Bittensor's end goal is to create an unstoppable, self-sustaining ecosystem free from single-point control, enabling innovation and resilience for the entire network. ComputeHorde adds GPU-powered validation to this ecosystem, helping other subnets operate effectively without relying on centralized cloud services.

Scoring Mechanism (being reworked currently)

The scoring mechanism in ComputeHorde is designed to incentivize miners to perform organic jobs while maintaining accountability and fairness in the network.

The goal is to eliminate the current disincentive where miners avoid organic jobs to prevent penalties for rejecting synthetic jobs.

Formula (calculated per validator):

1 point for each successfully completed synthetic job.
(in development) 1 point for each successfully completed organic job.
(in development) 1 point for each properly rejected synthetic job.

A successfully completed job is one that finishes within a specified timeout.

A synthetic job is considered properly rejected when the miner provides a receipt proving they are currently occupied with an organic job from another validator (with a minimum of 50k stake).
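
The rules above can be sketched as a small point counter. This is a minimal illustration, assuming a hypothetical `JobRecord` structure and outcome labels; it includes the two rules marked as in development and is not the validator's actual scoring code.

```python
from dataclasses import dataclass
from typing import Optional

# From the text: the organic job must come from a validator with at least 50k stake.
MIN_ORGANIC_VALIDATOR_STAKE = 50_000


@dataclass
class JobRecord:
    """Hypothetical record of one job sent by a validator to a miner."""
    kind: str      # "synthetic" or "organic"
    outcome: str   # e.g. "completed", "rejected", "failed", "timed_out"
    # For rejected synthetic jobs: stake of the validator named in the receipt
    # for the organic job the miner was busy with (None if no receipt).
    receipt_validator_stake: Optional[float] = None


def job_points(job: JobRecord) -> int:
    """Return 1 if the job earns a point under the rules above, else 0."""
    if job.outcome == "completed":
        # Finished within the timeout, synthetic or organic.
        return 1
    if (
        job.kind == "synthetic"
        and job.outcome == "rejected"
        and job.receipt_validator_stake is not None
        and job.receipt_validator_stake >= MIN_ORGANIC_VALIDATOR_STAKE
    ):
        # Properly rejected: the receipt proves a qualifying organic job.
        return 1
    return 0


def miner_score(jobs: list[JobRecord]) -> int:
    """Sum the points for all jobs one validator sent to one miner."""
    return sum(job_points(job) for job in jobs)
```

Because the formula is calculated per validator, `miner_score` would be called once per validator with only that validator's jobs.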

Dancing Bonus

Miners who implement dancing—moving their executors between different UIDs—receive a 30% bonus (as of December 2024) to their scores.

This encourages variance, which is essential for preventing weight-copying.

Hardware Classes and Configurable Weights

Each hardware class supported by ComputeHorde has a configurable weight parameter. These weights determine the relative contribution of a miner's work to their ultimate score.

This system allows the network to prioritize specific hardware classes based on utility and demand, creating a flexible and fair reward structure.
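
A minimal sketch of how per-class weights and the dancing bonus might combine into one score. The function name, the example weights, and the order of operations (weights first, then the bonus) are assumptions for illustration; only the 30% figure comes from the text above.

```python
# 30% bonus as of December 2024, per the Dancing Bonus section.
DANCING_BONUS = 0.30


def weighted_score(
    points_by_class: dict[str, float],
    class_weights: dict[str, float],
    is_dancing: bool,
) -> float:
    """Combine per-hardware-class points into one score.

    Hypothetical helper: classes without a configured weight contribute nothing.
    """
    base = sum(
        class_weights.get(hardware_class, 0.0) * points
        for hardware_class, points in points_by_class.items()
    )
    # Assumption: the bonus multiplies the already weighted total.
    return base * (1 + DANCING_BONUS) if is_dancing else base


if __name__ == "__main__":
    weights = {"A6000": 1.0}  # example value, not a real network setting
    print(weighted_score({"A6000": 10}, weights, is_dancing=True))
```

Raising or lowering a class weight changes how much each completed job on that hardware contributes, without touching the per-job point rules.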

Components

Facilitator

Acts as a gateway for organic requests (from other subnets’ validators) to enter ComputeHorde.
Sends tasks to chosen validators, who then distribute them to miners.

Validator

Receives organic requests via the Facilitator or generates synthetic tasks for validation.
Distributes both kinds of tasks to miners and evaluates the results:
Uses a separate GPU, called a Trusted Miner, to pre-run part of the validation tasks and establish expected results. The Trusted Miner shares the same code as a regular miner, but is configured differently:
- It is not registered in the metagraph.
- It only accepts tasks from the associated validator.
See validator's README for more details

Miner

Accepts job requests from validators.
Manages executors to perform tasks and sends results back to validators.
See miner's README for more details

Executor

An instance spawned by a miner to perform a single dockerized task.
Operates in a restricted environment, with limited network access necessary for:
- communicating with miners,
- downloading docker images,
- handling job data.
A miner's executors form its horde, and each executor is assigned a hardware class.
See executor's README for more details

Innovations to Highlight

Discouraging Weight-Copying

Commit-Reveal: Validators post hidden weights and reveal them in the next epoch, making the copying of current weights impossible.
Executor Dancing: Miners randomly move GPUs across multiple UIDs, further reducing the effectiveness of copying old weights.
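
The commit-reveal idea can be illustrated with a generic salted hash commitment. This is a conceptual sketch only: the function names are made up for this example, and it does not reproduce how Bittensor implements commit-reveal on chain.

```python
import hashlib
import json
import secrets


def _digest(weights: dict[int, float], salt: bytes) -> str:
    """Hash the salt together with a canonical encoding of the weights."""
    payload = json.dumps(weights, sort_keys=True).encode()
    return hashlib.sha256(salt + payload).hexdigest()


def commit(weights: dict[int, float]) -> tuple[str, bytes]:
    """Return (commitment, salt). Only the commitment is published now."""
    salt = secrets.token_bytes(16)  # random salt prevents guessing the weights
    return _digest(weights, salt), salt


def verify_reveal(commitment: str, weights: dict[int, float], salt: bytes) -> bool:
    """Check that later-revealed weights match the earlier commitment."""
    return _digest(weights, salt) == commitment


if __name__ == "__main__":
    weights = {1: 0.5, 2: 0.5}  # UID -> weight, example values
    commitment, salt = commit(weights)
    print(verify_reveal(commitment, weights, salt))
```

Until the reveal, observers see only the hash, so there is nothing useful to copy; after the reveal, the weights are already one epoch old.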

Encouraging Actual Mining

Synthetic tasks are designed to run only on specific hardware (e.g., A6000 GPUs), ensuring miners deliver the advertised compute power.
The scoring system incentivizes miners to complete organic tasks.

Development goals

Bring organic jobs from other subnets' validators
Allow the free market to regulate demand and prioritize cost-effective hardware, rather than solely focusing on the strongest hardware.
Strengthen Security
Introduce rules and safeguards to prevent malicious actors from exploiting the network, ensuring a fair and secure environment for all participants.
Support Long-Running Jobs
Implement accounting mechanisms for miners to be rewarded proportionally to the time spent, even for incomplete long-running tasks.
Expand Hardware Support
Add support for all GPU classes required by other Bittensor subnets.
Fair Resource Sharing
Allocate resources based on validators' stakes while allowing low-stake validators access when demand is low.

Info pack

ComputeHorde mainnet UID: 12
ComputeHorde testnet UID: 174
ComputeHorde channel within Bittensor discord
Information dashboards:

Running ComputeHorde components

This repository contains the implementations of:

Validator: Requires a Trusted Miner for cross-checking synthetic tasks.
Miner: Modifying the miner code on subnet 12 is discouraged, as the stock implementation manages only communications between components. The competitive edge lies in optimizing executor provisioning. Users can create custom executor managers to scale and optimize mining efficiency. The default executor manager runs a single executor and is not intended for mainnet use.

In the following sections, you can find instructions on running Validator and Miner. There are more details in each component's README and in the Troubleshooting section below.

Modifications to ComputeHorde components are generally not recommended, with the exception of the ExecutorManager class. Customizing this class allows you to implement dedicated logic for handling executors, such as running multiple executors per miner.
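
To give a feel for the kind of logic a custom executor manager has to carry, here is a toy capacity tracker for several executors per hardware class. `ToyExecutorPool` and its methods are hypothetical and do not mirror the real ExecutorManager interface; consult the miner's README for the actual class and its required methods.

```python
class ToyExecutorPool:
    """Hypothetical stand-in for custom executor-handling logic.

    NOT the real ExecutorManager interface; illustration only.
    """

    def __init__(self, capacity_by_class: dict[str, int]) -> None:
        # Maximum number of concurrent executors per hardware class.
        self.capacity_by_class = dict(capacity_by_class)
        # Job token -> hardware class of the executor running it.
        self.running: dict[str, str] = {}

    def free_slots(self, hardware_class: str) -> int:
        """Number of executors of this class that can still be started."""
        used = sum(1 for hw in self.running.values() if hw == hardware_class)
        return self.capacity_by_class.get(hardware_class, 0) - used

    def start(self, token: str, hardware_class: str) -> bool:
        """Reserve an executor for a job; return False if none is available."""
        if token in self.running or self.free_slots(hardware_class) <= 0:
            return False
        # A real manager would provision a machine or container here.
        self.running[token] = hardware_class
        return True

    def stop(self, token: str) -> None:
        """Release the executor once the job is done (no-op if unknown)."""
        self.running.pop(token, None)


if __name__ == "__main__":
    pool = ToyExecutorPool({"A6000": 2})
    print(pool.start("job-1", "A6000"), pool.free_slots("A6000"))
```

The interesting part in practice is what replaces the comment in `start`: how quickly and cheaply executors can be provisioned is where the text above locates the competitive edge.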

Validator

A ComputeHorde validator is built out of three components:

trusted miner (requires A6000 - the only GPU supported now) for cross-validation
two S3 buckets for sharing LLM data (lots of small text files)
validator machine (standard, non-GPU) - for regular validating & weight-setting

The setup steps are performed by running installation scripts on your local machine, which holds your wallet files. For clarity, these installation scripts are not run on the machine that will become the trusted miner or the validator; instead, they connect to those machines over SSH from your local machine.

Validator setup

Upgrading already existing deployment

Prepare a trusted miner and S3 buckets (find out how using the links above). Then, set the environment variables directly in the .env file of your validator instance and restart your validator:

$ docker compose down --remove-orphans && docker compose up -d

Deploying from scratch

Set the following environment variables in a terminal on your local machine (on the machine where you have your wallet files):

export TRUSTED_MINER_ADDRESS=...
export TRUSTED_MINER_PORT=...

export S3_BUCKET_NAME_PROMPTS=...
export S3_BUCKET_NAME_ANSWERS=...

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=...

Note: The AWS_DEFAULT_REGION property is optional. Use it when your buckets are not in your default AWS region.

Export AWS_ENDPOINT_URL to use another cloud object storage (s3-compatible) provider. If not given, AWS S3 will be used.
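
Before running the installer, it can be handy to confirm that the variables listed above are actually set in the current session. The following is an optional, hypothetical pre-flight check and is not part of the ComputeHorde tooling; the variable names are the ones from this section.

```python
import os

# Variables the section above asks you to export.
REQUIRED_VARS = (
    "TRUSTED_MINER_ADDRESS",
    "TRUSTED_MINER_PORT",
    "S3_BUCKET_NAME_PROMPTS",
    "S3_BUCKET_NAME_ANSWERS",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)
# Described above as optional.
OPTIONAL_VARS = ("AWS_DEFAULT_REGION", "AWS_ENDPOINT_URL")


def missing_vars(env=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing required variables:", ", ".join(missing))
    else:
        print("All required variables are set.")
    for name in OPTIONAL_VARS:
        print(f"{name} is {'set' if os.environ.get(name) else 'not set (optional)'}")
```

Run it from the same terminal session in which you exported the variables, since it reads that session's environment.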

Then execute the following command from the same terminal session:

curl -sSfL https://github.com/backend-developers-ltd/ComputeHorde/raw/master/install_validator.sh | bash -s - SSH_DESTINATION HOTKEY_PATH

Replace:

SSH_DESTINATION with your server's connection info (e.g. username@1.2.3.4)
HOTKEY_PATH with the path of your hotkey (e.g. ~/.bittensor/wallets/my-wallet/hotkeys/my-hotkey)

This script installs the necessary tools on the server, copies the public keys, and starts the validator with the corresponding runner and the default config.

If you want to change the default config, see Validator runner README for details.

If you want to trigger jobs from the validator see Validator README for details.

If anything seems wrong, check the troubleshooting section.

Miner

To quickly start a miner, create an Ubuntu Server and execute the following command from your local machine (where you have your wallet files).

curl -sSfL https://github.com/backend-developers-ltd/ComputeHorde/raw/master/install_miner.sh | bash -s - production SSH_DESTINATION HOTKEY_PATH

Replace SSH_DESTINATION with your server's connection info (e.g. username@1.2.3.4) and HOTKEY_PATH with the path of your hotkey (e.g. ~/.bittensor/wallets/my-wallet/hotkeys/my-hotkey). This script installs the necessary tools on the server, copies the keys, and starts the miner with the corresponding runner and the default config.

If you want to change the default config, see Miner runner README for details.

Checking that your miner works properly

Check if your miner is reachable from a machine other than the miner itself: curl {ADDRESS}:{PORT}/admin/login/ -i. Both PORT and ADDRESS can be obtained from the metagraph. If everything is OK, the first line should read HTTP/1.1 200 OK. By default, the address is determined automatically by the bittensor library, but you can set your own in .env.
Check if you're getting any jobs and what the outcomes are. An admin panel for that is coming, but for now you can achieve that with docker-compose exec miner-runner docker-compose exec db psql postgres -U postgres -c 'select * from miner_acceptedjob order by id desc;'
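
The reachability check above can also be scripted, for example to run it periodically from another machine. This is a small sketch using only the Python standard library; the function name is an assumption, and the URL path is the one from the curl command above. Unlike curl -i, urllib follows redirects, so the status reported is that of the final response.

```python
import urllib.error
import urllib.request
from typing import Optional


def miner_admin_status(address: str, port: int, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of the miner's admin login page.

    Returns None if the miner cannot be reached at all.
    """
    url = f"http://{address}:{port}/admin/login/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


if __name__ == "__main__":
    # Replace with the ADDRESS and PORT from the metagraph.
    status = miner_admin_status("127.0.0.1", 8000, timeout=2.0)
    print("reachable, status", status if status is not None else "unreachable")
```

A result of 200 corresponds to the HTTP/1.1 200 OK line expected from curl; None means the port is not reachable from where the script runs.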

Migrating servers

If you need to move your miner or validator to a new server, see the migration guide.

Troubleshooting

How to dump the logs

The ComputeHorde software starts several Docker containers, with layout differing slightly between the miner and the validator.

Miner logs

The most relevant logs are from the container with a name ending in app-1.

SSH into the miner machine.
Run docker ps to find the name of the appropriate container (e.g., compute_horde_miner-app-1).
Run docker logs CONTAINER_NAME.

Validator logs

SSH into the validator machine.
Navigate to the directory where the docker compose yaml file is located.
Run docker compose logs.

How to restart the services

To perform a hard restart of all ComputeHorde Docker containers, run the following commands:

docker compose down --remove-orphans
docker compose up -d

Afterwards, use docker ps to verify that the containers have started successfully.

How to delete persistent volumes

To start fresh and remove all persistent data, follow these steps:

Stop the validator or miner (all running containers)
Run docker volume ls to list all existing volumes and identify the ones to delete. Key volumes to consider:
- Miner: miner_db_data, miner_redis_data
- Validator: validator_db, validator_redis, validator_static
Run the following command to remove all Docker volumes:
```
docker volume rm $(docker volume ls -q)
```
Start the validator or miner again
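
Note that the command above removes every Docker volume on the machine. If the host also runs other Docker workloads, a more selective approach is to pick only the volumes named in the list above. The sketch below is a hypothetical helper, not part of ComputeHorde; it assumes the volume names may carry a compose project prefix and therefore matches on the name suffix.

```python
import subprocess

# Volume names taken from the list above.
KNOWN_VOLUMES = {
    "miner": ("miner_db_data", "miner_redis_data"),
    "validator": ("validator_db", "validator_redis", "validator_static"),
}


def select_volumes(all_volumes: list[str], role: str) -> list[str]:
    """Keep only the volumes that belong to the given role.

    Assumption: a volume matches if it equals a known name or ends with
    "_<known name>" (to allow for a compose project prefix).
    """
    wanted = KNOWN_VOLUMES[role]
    return [
        volume
        for volume in all_volumes
        if any(volume == name or volume.endswith("_" + name) for name in wanted)
    ]


def list_docker_volumes() -> list[str]:
    """Return the names printed by `docker volume ls -q`."""
    result = subprocess.run(
        ["docker", "volume", "ls", "-q"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def remove_volumes(volumes: list[str]) -> None:
    """Run `docker volume rm` on the selected volumes, if any."""
    if volumes:
        subprocess.run(["docker", "volume", "rm", *volumes], check=True)
```

A typical use would be `remove_volumes(select_volumes(list_docker_volumes(), "miner"))`, run after the containers have been stopped. Review the selected names before removing anything, since deleted volumes cannot be recovered.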

How to fix issues with installing `cuda-drivers`

Miner installation may occasionally fail with an error about the system being unable to install the cuda-drivers package. This issue is often caused by mismatched drivers already installed before running the installation script.

To resolve this:

Run the following command on the miner machine to purge any conflicting NVIDIA packages:
```
sudo apt-get purge -y '^nvidia-.*'
```
Re-run the install_miner.sh script from your local machine.

How to check if NVIDIA Drivers are working and the GPU is usable

To verify the health of the NVIDIA setup, run the following command on the miner machine:

docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

If the output indicates a problem (especially immediately after installation), a restart of the services may help.

How to list the contents of S3 buckets

To verify that the S3 buckets are configured correctly, you can list their contents by running the following command on a machine with the AWS CLI installed (check out the Amazon instructions). Replace the placeholders with the appropriate values:

AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... aws s3api list-objects --bucket BUCKET_NAME

The bucket names and required AWS credentials are stored in the validator’s .env file as:

S3_BUCKET_NAME_PROMPTS
S3_BUCKET_NAME_ANSWERS
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

If you encounter a permissions error, such as missing the s3:ListBucket permission, you may need to use the AWS root user credentials.
