REServe: Reliable and Efficient Large Language Models Serving System

8 repositories

vllm
Public
A high-throughput and memory-efficient inference and serving engine for LLMs
Python • Apache License 2.0 • 4.5k • 0 • 0 • 0 • Updated Nov 5, 2024
vllm-pdd
Public
A high-throughput and memory-efficient inference and serving engine for LLMs
Python • Apache License 2.0 • 4.5k • 0 • 0 • 0 • Updated Oct 22, 2024
ServerlessLLM
Public
Cost-efficient and fast multi-LLM serving.
Python • Apache License 2.0 • 30 • 0 • 0 • 0 • Updated Jul 31, 2024
core
Public
Core components for REServe
Python • 0 • 0 • 0 • 0 • Updated Jul 29, 2024
Initializer
Public
Initializer for KServe Cluster
Shell • Apache License 2.0 • 1 • 1 • 0 • 0 • Updated Jul 29, 2024
tensorrtllm_backend
Public
The Triton TensorRT-LLM Backend
Python • Apache License 2.0 • 104 • 0 • 0 • 0 • Updated Jul 29, 2024
TensorRT-LLM
Public
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
C++ • Apache License 2.0 • 979 • 0 • 0 • 0 • Updated Jul 9, 2024
kserve
Public
Standardized Serverless ML Inference Platform on Kubernetes
Python • Apache License 2.0 • 1.1k • 0 • 0 • 0 • Updated Jul 4, 2024