    Repositories list

    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      4.5k 0 0 0
      Updated Nov 5, 2024
    • vllm-pdd

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      4.5k 0 0 0
      Updated Oct 22, 2024
    • Cost-efficient and fast multi-LLM serving.
      Python
      Apache License 2.0
      30 0 0 0
      Updated Jul 31, 2024
    • core

      Public
      Core components for REServe
      Python
      0 0 0 0
      Updated Jul 29, 2024
    • Initializer for KServe Cluster
      Shell
      Apache License 2.0
      1 1 0 0
      Updated Jul 29, 2024
    • The Triton TensorRT-LLM Backend
      Python
      Apache License 2.0
      104 0 0 0
      Updated Jul 29, 2024
    • TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
      C++
      Apache License 2.0
      979 0 0 0
      Updated Jul 9, 2024
    • kserve

      Public
      Standardized Serverless ML Inference Platform on Kubernetes
      Python
      Apache License 2.0
      1.1k 0 0 0
      Updated Jul 4, 2024