Benchmarks v2: Merging dev to main (#184)
* AutoGPTQ Mistral, Memory profiling support and empirical quality checks (#163)

* Added info about mistral support

* AutoGPTQ now uses base class, with mistral support and memory profiling

* minor changes following CLI arg changes in bench.sh

* changed requirements with latest update of autogptq

* support for mistral instruct and llama2 chat and latest autogptq installation from source

* Added another common utility to build chat templates for model

* fixed bugs causing duplicated logging

* PyTorchBenchmark supports mistral, memory profile and uses Base class

* updated instructions and switched to the latest model for benchmarking; removed Logs

* fixing dependencies with proper versions

* Addition of mistral and llama, and a table for precision-wise quality comparison

* Added new docs and template for mistral and starting out new benchmark performance logs in templates

* improved logging strategies to log quality-check output to a readme

* integrated the utility for logging improvements

* using better logging strategies in bench pytorch

* questions.json has the ground truth answer set from fp32 response

* AutoGPTQ readme improvements and added quality checks examples for llama and mistral

* Using latest logging utilities

* removed creation of Logs folder and unnecessary arguments

* Added fsspec

* Added llama2 and mistral performance logs

* pinned version of huggingface_hub

* Latest info under 'some points to note' section

* Update bench_autogptq/bench.sh

Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>
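A recurring theme in these PRs is moving every engine benchmark onto a shared base class that also does memory profiling. As a rough illustration of the idea (not the repo's actual API — the helper name and result dict are hypothetical), peak memory around a workload can be captured like this:

```python
import tracemalloc
from contextlib import contextmanager

@contextmanager
def memory_profile(results: dict):
    """Record peak Python heap usage (bytes) for the wrapped block.

    Hypothetical stand-in for the memory profiling added to the shared
    benchmark base class; a real GPU benchmark would also query
    torch.cuda.max_memory_allocated().
    """
    tracemalloc.start()
    try:
        yield
    finally:
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results["peak_bytes"] = peak

results = {}
with memory_profile(results):
    buf = bytearray(1_000_000)  # stand-in for loading model weights
print(results["peak_bytes"])
```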

* Deepspeed Mistral, Memory profiling support and empirical quality checks (#168)

* Added another common utility to build chat templates for model

* fixed bugs causing duplicated logging

* PyTorchBenchmark supports mistral, memory profile and uses Base class

* updated instructions and switched to the latest model for benchmarking; removed Logs

* fixing dependencies with proper versions

* Addition of mistral and llama, and a table for precision-wise quality comparison

* Added new docs and template for mistral and starting out new benchmark performance logs in templates

* improved logging strategies to log quality-check output to a readme

* integrated the utility for logging improvements

* using better logging strategies in bench pytorch

* questions.json has the ground truth answer set from fp32 response

* DeepSpeed now using base class, mistral support and memory profiling

* removed unused imports

* removed Logs and latest improvements w.r.t base class

* README now has quality comparison for deepspeed

* using latest version of deepspeed

* added latest performance logs for llama2 and mistral

* added docs for llama and mistral with latest scores

* updated readme with correct model info

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
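The "empirical quality checks" these PRs log compare each precision's answers against an fp32 ground truth (the questions.json mentioned in the commits above). A minimal sketch of such a check — the JSON shape, field names, and similarity metric are assumptions for illustration, not the repo's actual schema:

```python
import difflib
import json

def quality_score(reference: str, candidate: str) -> float:
    """Similarity in [0, 1] between the fp32 reference answer and a
    lower-precision model's answer (SequenceMatcher ratio)."""
    return difflib.SequenceMatcher(None, reference, candidate).ratio()

# Assumed shape for questions.json: ground-truth answers from the fp32 run.
questions = json.loads("""
[{"question": "What is the capital of France?",
  "ground_truth": "The capital of France is Paris."}]
""")

int4_answer = "The capital of France is Paris!"
score = quality_score(questions[0]["ground_truth"], int4_answer)
print(f"int4 vs fp32 similarity: {score:.2f}")
```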

* Ctransformers Mistral and Memory Profiling support (#165)

* Ctransformers support mistral and uses Base class along with memory profiling

* uses latest bench.py arguments, removed log-folder creation, and other improvements

* supporting mistral and llama chat models and installation improvements

* added additional requirements that are not supported by ctransformers by default

* Added another common utility to build chat templates for model

* fixed bugs causing duplicated logging

* PyTorchBenchmark supports mistral, memory profile and uses Base class

* updated instructions and switched to the latest model for benchmarking; removed Logs

* fixing dependencies with proper versions

* Addition of mistral and llama, and a table for precision-wise quality comparison

* Added new docs and template for mistral and starting out new benchmark performance logs in templates

* improved logging strategies to log quality-check output to a readme

* integrated the utility for logging improvements

* using better logging strategies in bench pytorch

* questions.json has the ground truth answer set from fp32 response

* CTransformers using latest logging utilities

* removed unnecessary arguments and creation of Logs folder

* Add precision-wise quality comparison on AutoGPTQ readme

* Added performance scores for llama2 and mistral

* Latest info under 'some points to note' section

* added ctransformers performance logs for mistral and llama

* Update bench_ctransformers/bench.sh

Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>
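The "common utility to build chat templates" that recurs in these commit lists maps a model family to its instruction format. A hedged sketch (function name and model-name matching are illustrative; the repo's helper may differ, and transformers' `tokenizer.apply_chat_template` is the usual way to do this today):

```python
def build_chat_prompt(model_name: str, user_message: str,
                      system_prompt: str = "You are a helpful assistant.") -> str:
    """Build a prompt in the chat format the model family expects.

    Illustrative only: handles just llama-2-chat and mistral-instruct.
    """
    name = model_name.lower()
    if "llama" in name:
        # Llama-2 chat wraps the system prompt in <<SYS>> markers.
        return (f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
                f"{user_message} [/INST]")
    if "mistral" in name:
        # Mistral Instruct has no system slot; fold it into the user turn.
        return f"<s>[INST] {system_prompt} {user_message} [/INST]"
    raise ValueError(f"No chat template known for: {model_name}")

print(build_chat_prompt("mistral-7b-instruct", "What is 2+2?"))
```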

* CTranslate2 Benchmark with Mistral Support (#170)

* Added support for BaseClass and mistral with memory profiling

* removed docker support with latest ctranslate release

* Added latest ctranslate2 version

* Removed runs with docker and added mistral model support

* removed docker support and added mistral support

* Added performance logs for mistral and llama

* engine-specific readme with qualitative comparison

* Llamacpp mistral (#171)

* fix bug: handle temperature when None

* Added llamacpp engine readme with quality comparision

* Using Base class with mistral support and memory profiling

* shell script cli improvements

* Added newer requirements with pinned versions

* small improvements

* removed MODEL_NAME while running setup

* Added performance logs for llama and mistral

* fixed performance metrics of llama for pytorch transformers

* fixed performance metrics of mistral for pytorch transformers

* Fix the name of the models and links of the same

* ExLlamaV2 Mistral, Memory support, qualitative comparison and improvements (#175)

* Added performance logs for mistral and llama for exllamav2 along with qualitative comparisons

* ExLlamaV2 using base class along with support for mistral and memory profiling

* removed old cli args and small improvements

* deleted convert.py script

* pinned latest version and added transformers

* addition of mistral model along with usage of latest exllamav2 repo

* Update bench_exllamav2/bench.sh

Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>

---------

Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>

* vLLM Mistral, Memory support, qualitative comparison and improvements (#172)

* Adding base class with mistral support and memory profiling

* small improvements on removing unnecessary cli args

* download support for mistral

* adding on_exit function on get_answers

* Added precision wise qualitative checks for vLLM README

* Added performance logs on docs for mistral and llama

* Update bench_vllm/bench.sh

Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>

---------

Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>

* Nvidia TensorRT LLM Mistral, Memory support, qualitative comparison and improvements (#178)

* Added readme with mistral support and qualitative comparison

* TRT LLM using base class with mistral and memory profiling support

* removed old cli args, and some improvements

* Added support for mistral with latest trt llm

* Added support for root dir for handling runs inside and outside docker

* Added performance logs for both mistral and llama

* Added float32 on docs and performance logs

* Added support for float32 precision

* Added support for float32

* revised to int4 for mistral

* Optimum Nvidia Mistral, Memory support, qualitative comparison and improvements (#177)

* Added performance logs for mistral and llama for exllamav2 along with qualitative comparisons

* ExLlamaV2 using base class along with support for mistral and memory profiling

* removed old cli args and small improvements

* deleted convert.py script

* pinned latest version and added transformers

* addition of mistral model along with usage of latest exllamav2 repo

* Using base benchmark class with memory profiling support and mistral model support

* Addition of new constructor argument root_dir to handle paths inside or outside docker

* created a converter script to convert to tensorrt engine file

* Updated to the latest optimum nvidia usage and added qualitative comparison

* cli improvements and remove older cli args

* added latest conversion script logic to convert hf weights to engine, plus mistral support

* Added latest performance logs for both mistral and llama

* removed the conflict with exllamav2

* removed changes from exllamav2

* ONNX Runtime with mistral support and memory profiling  (#182)

* Added comparative quality analysis for mistral and llama, plus nuances related to onnx

* Using base class with memory profiling and mistral support

* removed old cli arguments and some improvements

* removed requirements, since onnx runs using custom docker container

* Added new setup sh file with mistral and llama onnx conversion through docker

* Added performance logs of onnx for llama and mistral

* Lightning AI Mistral and memory integration  (#174)

* Added qualitative comparison of output quality for litgpt

* Using base class with mistral support and memory support

* small cli improvements, removed old arguments

* removed convert logic with latest litgpt

* Added latest inference logic code

* pinned version for dependencies

* Added latest method of installation and model conversions with litgpt

* added performance benchmarks info in litgpt

* updated the memory usage and token per seconds

* chore: minor improvements and added latest info about int4

* Changes in Engine Readmes (#183)

* Deleted the files related to llama2 in docs

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>
3 people authored Apr 29, 2024
1 parent 80bba57 commit fb72782
Showing 67 changed files with 2,955 additions and 3,224 deletions.
34 changes: 0 additions & 34 deletions .github/workflows/update_benchmark.yaml

This file was deleted.

288 changes: 184 additions & 104 deletions README.md


164 changes: 0 additions & 164 deletions README.md.template

This file was deleted.

