Hivetrain is dedicated to developing a decentralized model training platform on the Commune network. Our goal is to establish a scalable system that fosters collaboration among AI experts to build advanced multimodal models. We reward contributors both for the compute they provide and for the innovations that advance the model.
Distributed Continual Pretraining is our network's approach to perpetually training and fine-tuning foundation models. This initiative seeks to democratize access to large-scale open-source language models, challenging the monopolies currently held by proprietary entities.
Our chief aim is to facilitate the training of a leading open-source large language model (LLM) through distributed methodologies. We are committed to democratizing access to training at the trillion-parameter scale, engaging a wide community of contributors.
- Run a Miner: Contribute by providing computational resources or refining the model through hyperparameter adjustments.
- Run a Validator: Validate and verify the work performed by miners.
- Propose Improvements: Suggest new architectures, training algorithms, or other enhancements.
Our architecture is structured into three main tiers, optimized for scalable and distributed training of large language models:
Miners carry out the core training work using Weight-Decomposed Low-Rank Adaptation (DoRA), sketched in code after the list below:
- Efficient Fine-Tuning: Trains only about 5% of the original model's parameters while remaining effective for continual pretraining.
- Distributed Processing: Each miner processes a segment of the total training data, facilitating extensive parallelism.
- Accessible Training: Ensures compatibility with moderately powered GPUs, expanding our base of potential contributors.
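As a rough illustration of the DoRA idea (not the project's actual implementation), the PyTorch sketch below wraps a frozen linear layer with a trainable magnitude vector and a LoRA-style low-rank directional update. The class name, rank, and initialisation are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DoRALinear(nn.Module):
    """Illustrative DoRA-style layer: frozen base weight, trainable magnitude,
    and a low-rank directional update."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight W0 stays frozen

        out_features, in_features = base.weight.shape
        # Low-rank directional update: delta_W = B @ A
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        # Trainable per-output magnitude, initialised from the norms of W0
        self.magnitude = nn.Parameter(base.weight.norm(p=2, dim=1).detach().clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Direction: frozen weight plus low-rank update, normalised row-wise
        combined = self.base.weight + self.lora_B @ self.lora_A
        direction = combined / combined.norm(p=2, dim=1, keepdim=True)
        # Recompose the effective weight from magnitude and direction
        weight = self.magnitude.unsqueeze(1) * direction
        return F.linear(x, weight, self.base.bias)
```

Wrapping, say, a transformer's projection layers this way leaves the bulk of the network frozen, which is what keeps the per-miner training footprint small.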
Process:
- Download training data and sync the current model state.
- Execute DoRA-based training on a GPU.
- Compute the updated weight matrices.
- Upload results to a Hugging Face repository for validation.
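One miner iteration could look roughly like the sketch below. The repository names are placeholders, and `load_model_with_dora` and `load_data_shard` are hypothetical helpers standing in for the project's actual loading code.

```python
import torch
from huggingface_hub import snapshot_download, HfApi

BASE_MODEL_REPO = "your-org/averaged-model"    # placeholder: current model state
ADAPTER_REPO = "your-username/dora-updates"    # placeholder: miner's result repo


def run_miner_round():
    # 1. Sync the current averaged model state from Hugging Face
    model_dir = snapshot_download(repo_id=BASE_MODEL_REPO)

    # 2. Load the model and attach trainable DoRA parameters (hypothetical helper)
    model = load_model_with_dora(model_dir)
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )

    # 3. Train on this miner's shard of the data (hypothetical helper)
    for batch in load_data_shard():
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # 4. Save only the trainable (DoRA) weights and upload them for validation
    trainable = {
        name: p.detach().cpu()
        for name, p in model.named_parameters()
        if p.requires_grad
    }
    torch.save(trainable, "dora_weights.pt")
    HfApi().upload_file(
        path_or_fileobj="dora_weights.pt",
        path_in_repo="dora_weights.pt",
        repo_id=ADAPTER_REPO,
    )
```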
Validators maintain the training quality and integrity across the network by following these steps:
- Retrieve weights and the averaged model from Hugging Face.
- Evaluate the quality of updates against baseline metrics.
- Assign scores and rewards based on miner contributions and performance.
- Participate in initial weight averaging and submit results for further averaging.
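The evaluation step could be approximated as follows. The scoring rule shown here, rewarding loss improvement over the current baseline, is an assumption for illustration rather than the network's exact reward function.

```python
import torch


@torch.no_grad()
def score_miner_update(baseline_model, updated_model, eval_batches) -> float:
    """Score a miner's update by its loss improvement over the current baseline."""

    def mean_loss(model) -> float:
        losses = []
        for batch in eval_batches:
            # Assumes a causal-LM-style model that returns .loss when given labels
            out = model(**batch, labels=batch["input_ids"])
            losses.append(out.loss.item())
        return sum(losses) / len(losses)

    baseline_loss = mean_loss(baseline_model)
    updated_loss = mean_loss(updated_model)

    # Reward proportional to improvement; no reward if the update made things worse
    return max(baseline_loss - updated_loss, 0.0)
```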
The averager integrates the updates validated by validators into a single, comprehensive model update:
- Gather and average weights verified by validators.
- Apply advanced algorithms to integrate these updates seamlessly.
- Update the main model state and release the latest version on Hugging Face.
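At its simplest, the averaging step is an element-wise mean over validated weight snapshots. The sketch below shows only that basic form; the averager's public script may apply weighted or more sophisticated merging.

```python
import torch


def average_state_dicts(state_dicts):
    """Element-wise average of a list of compatible PyTorch state dicts."""
    averaged = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts], dim=0)
        averaged[key] = stacked.mean(dim=0)
    return averaged
```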
Note: Efforts are underway to decentralize the averaging process to enhance validation procedures. The averager's script is available for public review.
- Train Harder: Contribute more compute by running additional GPUs and devices.
- Train Smarter: Employ better algorithms and adapt the starter scripts to produce higher-quality updates.
- Implementation of model parallelism.
- Expansion to fully multimodal capabilities.