llm-inference

Here are 688 public repositories matching this topic...

nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

ai-chat llm-inference

Updated Dec 21, 2024
C++

microsoft / autogen

A programming framework for agentic AI 🤖 | PyPI: autogen-agentchat | Discord: https://aka.ms/autogen-discord | Office Hour: https://aka.ms/autogen-officehour

chat chatbot gpt chat-application agent-based-framework agent-oriented-programming gpt-4 chatgpt llmops gpt-35-turbo llm-agent llm-inference agentic llm-framework agentic-agi

Updated Dec 24, 2024
Python

liguodongiot / llm-action

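This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and putting LLM applications into production).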

llm llmops llm-serving llm-training llm-inference

Updated Dec 17, 2024
HTML

Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

ai deep-learning artificial-intelligence large-language-models llm llms llm-inference

Updated Dec 24, 2024
Python

bentoml / OpenLLM

Run any open-source LLM, such as Llama or Mistral, as an OpenAI-compatible API endpoint in the cloud.

llama mistral fine-tuning mlops bentoml vicuna llm model-inference llmops llm-serving llm-inference open-source-llm llama2 openllm llm-ops llama3-1 llama3-2 llama3-2-vision

Updated Dec 24, 2024
Python

mistralai / mistral-inference

Official inference library for Mistral models

llm llm-inference mistralai

Updated Nov 12, 2024
Jupyter Notebook

SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

falcon llama large-language-models llm local-inference llm-inference bamboo-7b

Updated Sep 6, 2024
C++

openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

nlp natural-language-processing ai computer-vision deep-learning transformers inference speech-recognition yolo recommendation-system performance-boost good-first-issue openvino diffusion-models stable-diffusion generative-ai llm-inference optimize-ai deploy-ai

Updated Dec 24, 2024
C++

bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering ai-inference llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated Dec 25, 2024
Python

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

llama cuda-kernels deepspeed llm fastertransformer llm-inference turbomind internlm llama2 codellama llama3

Updated Dec 25, 2024
Python

superduper-io / superduper

Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.

Updated Dec 24, 2024
Python

kserve / kserve

Standardized Serverless ML Inference Platform on Kubernetes

kubernetes machine-learning tensorflow sklearn pytorch artificial-intelligence xgboost k8s service-mesh hacktoberfest istio model-serving kubeflow mlops knative model-interpretability kserve genai llm-inference

Updated Dec 25, 2024
Python

neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs

nlp performance computer-vision inference machinelearning pruning object-detection pretrained-models quantization cpus onnx sparsification llm-inference deepsparse

Updated Jul 19, 2024
Python

DefTruth / Awesome-LLM-Inference

📖 A curated list of Awesome LLM/VLM Inference Papers with code, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉

sora llm llms vllm llm-inference awesome-llm flash-attention flash-attention-2 tensorrt-llm paged-attention deepseek open-sora flash-attention-3

Updated Dec 22, 2024

NVIDIA / GenerativeAIExamples

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

microservice gpu-acceleration nemo tensorrt rag triton-inference-server large-language-models llm llm-inference retrieval-augmented-generation

Updated Dec 23, 2024
Python

databricks / dbrx

Code examples and resources for DBRX, a large language model developed by Databricks

databricks llm generative-ai gen-ai llm-training llm-inference mosaic-ai

Updated May 1, 2024
Python

FasterDecoding / Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

llm llm-inference

Updated Jun 25, 2024
Jupyter Notebook

predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

transformers pytorch llama gpt lora model-serving fine-tuning llm llmops llm-serving llm-inference

Updated Dec 23, 2024
Python

intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

retrieval chatbot rag habana large-language-model chatpdf llm-inference 4-bits speculative-decoding llm-cpu streamingllm intel-optimized-llamacpp neural-chat neural-chat-7b autoround gaudi3

Updated Oct 8, 2024
Python

microsoft / aici

AICI: Prompts as (Wasm) Programs

rust ai wasm inference transformer language-model model-serving wasmtime llm llmops llm-serving llm-inference llm-framework

Updated Nov 10, 2024
Rust

