fms-extras

This is a repo as part of the foundation-model-stack organization which is used for new features staged to be integrated with foundation-model-stack. This repo is the home for extensions, research and/or in-development work, and fms-based models trained by IBM.

Installation

Local

pip install -e .

Notable Features

MLPSpeculator: a lightweight speculator model that can be used along-side a generative model to speed up inference (currently deployed in IBM TGIS with training in fms-fsdp)
PagedKVCacheManager: an implementation of kv-cache management that provides a user with the proper input to use paged-attention with their own models (currently deployed in IBM TGIS)
PagedLLaMA: a LLaMA implementation that uses paged-attention in Multi-Head Attention. This model is compilable without graph breaks.
speculative generation: a reference implementation of speculative generate using PagedKVCacheManager and MLPSpeculator

Structure and contents of this Repository

This repo follows a similar structure to that of foundation-model-stack

fms_extras/models/ - Pure pytorch implementations of popular model architectures, without requiring any specific common interface beyond nn.Module. Each model configuration is registered with fms.models.register_model() so that instances can be obtained through fms.models.get_model('architecture', 'variant', '/path/to/data'). Each model can also register sources/formats/versions of data to load (e.g. checkpoints provided by meta, HF, or trained from this repo).
fms_extras/models/hf/ - Adapters that compose our native PyTorch FMS model architecture implementations in HF-compatible wrapper interfaces. Each FMS model implements an adapter, and adapted instances are obtained via fms.models.hf.to_hf_api(model)
fms_extras/utils/ - Other operators useful in working with LLMs. These include a speculative_generate() function, PagedKVCacheManager class for easy-to-use kv-cache management with paged attention kernels, etc.
scripts/ - Various scripts for inference (paged generation and speculative generation)
csrc/ - Custom kernels used in fms-extra, currently related to paged-attention

References

Huggingface TGI: https://github.com/huggingface/text-generation-inference
IBM TGIS: https://github.com/IBM/text-generation-inference

Name		Name	Last commit message	Last commit date
Latest commit History 224 Commits
.github/workflows		.github/workflows
csrc/paged_attention		csrc/paged_attention
fms_extras		fms_extras
scripts		scripts
tests		tests
.gitignore		.gitignore
.isort.cfg		.isort.cfg
LICENSE		LICENSE
README.md		README.md
fms-extras-requirements.txt		fms-extras-requirements.txt
pyproject.toml		pyproject.toml
requirements-build.txt		requirements-build.txt
requirements.txt		requirements.txt
setup.py		setup.py
test-requirements.txt		test-requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

fms-extras

Installation

Local

Notable Features

Structure and contents of this Repository

References

About

Releases

Packages

Contributors 7

Languages

License

foundation-model-stack/fms-extras

Folders and files

Latest commit

History

Repository files navigation

fms-extras

Installation

Local

Notable Features

Structure and contents of this Repository

References

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 7

Languages

Packages