These are end-to-end pipelines that demonstrate the power of MAX for accelerating common AI workloads, and more. The umbrella `pipelines` Mojo module contains these pipelines as their own modules, along with shared modules hosting common functionality.
The pipelines include:
- Llama 3: A text completion demo using the Llama 3 model, implemented in Mojo using the MAX Graph API. This pipeline contains everything needed to run a self-hosted large language model.
- Llama 2: Similar to the Llama 3 text completion pipeline, but using the Llama 2 model. The Llama 2 pipeline also shows how to use a custom kernel with the MAX Graph API.
- Replit Code: Code generation via the Replit Code V1.5 3B model, implemented in Mojo using the MAX Graph API.
- Quantize TinyStories: A demonstration of using the MAX Graph API to quantize a full-precision model that was originally trained on the TinyStories dataset.
Instructions for running each pipeline can be found in its respective subdirectory. A shared `run_pipeline.🔥` Mojo driver is used to execute the pipelines.
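As a rough sketch of how the shared driver is typically invoked (the pipeline name and `--prompt` flag here are assumptions inferred from the pipeline descriptions above, not confirmed by this document — check each pipeline's subdirectory for the exact arguments), a Llama 3 text completion run might look like:

```shell
# Hypothetical invocation of the shared Mojo driver.
# "llama3" and --prompt are assumed arguments; the actual
# subcommands and flags are documented per pipeline subdirectory.
mojo run_pipeline.🔥 llama3 \
  --prompt "I believe the meaning of life is"
```

Running the driver requires a working MAX/Mojo toolchain; each pipeline's own instructions cover any additional setup such as downloading model weights.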
In addition to the pipelines themselves, common modules provide types and functions shared across them. These modules currently include: