Releases: aws/sagemaker-hyperpod-recipes

Release v1.1.0

31 Dec 21:16
66e49e0

What's Changed

New recipes

  • Added support for Llama 3.1 70B and Mixtral 22B 128-node pre-training.
  • Added support for Llama 3.3 fine-tuning with SFT and LoRA.
  • Added support for Llama 405B 32K-sequence-length QLoRA fine-tuning.

All new recipes are listed under the "Model Support" section of the README.

Release v1.0.1

24 Dec 01:45
5f8b472

What's Changed

Bug fixes

  • Upgraded the Transformers library in the enroot Slurm code path to support running Llama 3.2 recipes with an enroot container.

HyperPod Enhancements

  • Added support for additional HyperPod instance types, including p5e and g6.

Release v1.0.0

07 Dec 00:52
5c66df4

We're thrilled to announce the initial release of sagemaker-hyperpod-recipes!

🎉 Features

  • Unified Job Submission: Submit training and fine-tuning workflows to SageMaker HyperPod or SageMaker training jobs through a single entry point.
  • Flexible Configuration: Customize your training jobs with three types of configuration files:
    • General configuration (e.g., recipes_collection/config.yaml)
    • Cluster configuration (e.g., recipes_collection/cluster/slurm.yaml)
    • Recipe configuration (e.g., recipes_collection/recipes/training/llama/hf_llama3_8b_seq16k_gpu_p5x16_pretrain.yaml)
  • Pre-defined LLM Recipes: Access a collection of ready-to-use recipes for training large language models.
  • Cluster Agnostic: Compatible with SageMaker HyperPod (with Slurm or Amazon EKS orchestrators) and with SageMaker training jobs.
  • Built on the NVIDIA NeMo Framework: Leverages the NVIDIA NeMo Framework Launcher for efficient job management.
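To make the three-layer configuration concrete, a general configuration might compose the other two files roughly as follows. This is a minimal sketch: the key names and composition style are assumptions inferred from the file paths above, not the exact schema shipped in the repository.

```yaml
# Illustrative sketch of recipes_collection/config.yaml (keys are assumptions).
# The general config selects which cluster and recipe configs to compose.
defaults:
  - cluster: slurm   # picks recipes_collection/cluster/slurm.yaml
  - recipes: training/llama/hf_llama3_8b_seq16k_gpu_p5x16_pretrain

# Common settings shared by all jobs, e.g. environment variables.
env_vars:
  NCCL_DEBUG: WARN
```

In a layered setup like this, swapping the `cluster` entry (for example, from a Slurm file to an EKS one) retargets the same recipe to a different orchestrator without touching the training settings.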

🗂️ Repository Structure

  • main.py: Primary entry point for submitting training jobs
  • launcher_scripts/: Collection of commonly used scripts for LLM training
  • recipes_collection/: Pre-defined LLM recipes provided by developers

🔧 Key Components

  1. General Configuration: Common settings such as default parameters and environment variables
  2. Cluster Configuration: Cluster-specific settings (e.g., volumes and labels for Kubernetes; job names for Slurm)
  3. Recipe Configuration: Training-job settings, including model type, sharding degree, and dataset paths
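As an example of the third component, a recipe configuration could carry settings of the following shape. The field names here are hypothetical, chosen only to illustrate the categories listed above (model type, sharding degree, dataset paths); consult the shipped recipes for the real schema.

```yaml
# Hypothetical recipe fragment (field names are illustrative, not the real schema)
model:
  model_type: llama_v3     # which model family to train
  shard_degree: 8          # how many devices the model is sharded across
data:
  train_dir: /fsx/datasets/example/train   # example dataset path
  val_dir: /fsx/datasets/example/val
```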

📚 Documentation

  • Refer to the README.md for detailed usage instructions and examples

🤝 Contributing

We welcome contributions to enhance the capabilities of sagemaker-hyperpod-recipes. Please refer to our contributing guidelines for more information.

Thank you for choosing sagemaker-hyperpod-recipes for your large-scale language model training needs!