GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
Updated Nov 5, 2024 - C++
A programming framework for agentic AI 🤖
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
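This project aims to share the technical principles and practical experience of large language models (LLM engineering and real-world LLM application deployment).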
Run any open-source LLMs, such as Llama, Gemma, as OpenAI compatible API endpoint in the cloud.
Official inference library for Mistral models
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Superduper: build end-2-end AI applications and templates using your existing data infrastructure and tools of choice
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Standardized Serverless ML Inference Platform on Kubernetes
Sparsity-aware deep learning inference runtime for CPUs
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Code examples and resources for DBRX, a large language model developed by Databricks
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend for Generative Agents/Apps.