Changes in Engine Readmes #183

Merged 36 commits into dev on Apr 29, 2024
Conversation

@Anindyadeep (Member) commented Apr 24, 2024

This PR solves:

  1. Setup new md files from templates (#179)
  2. Change Main readme (#176)
  3. Complete ML Engines Table (#149)

Additionally, I added an archive.md file that contains our previous benchmark results.

@Anindyadeep marked this pull request as draft April 24, 2024 15:54
@Anindyadeep self-assigned this Apr 24, 2024
@Anindyadeep requested a review from nsosio April 24, 2024 19:28
@Anindyadeep marked this pull request as ready for review April 24, 2024 19:29
@nsosio (Collaborator) left a comment

Aren't we missing the mistral and mistral template files?

README.md (resolved)
docs/llama2.md (outdated, resolved)
@Anindyadeep requested a review from nsosio April 29, 2024 09:49
@Anindyadeep (Member, Author)

Hi @nsosio, let me know if it looks good or not :)

README.md (outdated)

*(Data updated: `30th April 2024`)

> **Note:** Our previous version of Benchmarks supported benchmarking on Metal and M1/M2 CPUs. We did the benchmarking on similar environments (including mac devices) on Llama 2 7B model. However this version is more focussed on enterprices. But if you are curious, you can check that out [here](/docs/archive.md). Also please not that the numbers might be a bit outdated.
@nsosio (Collaborator)

> However this version is more focussed on enterprices.

What do you mean?

@Anindyadeep (Member, Author)

Oh sorry, I need to fix the typos

@Anindyadeep (Member, Author)

resolved

README.md (outdated, resolved)
Anindyadeep and others added 3 commits April 29, 2024 16:58
Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>
Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>
@Anindyadeep requested a review from nsosio April 29, 2024 11:41
@nsosio (Collaborator) left a comment

LGTM

@nsosio merged commit b5b90c4 into dev Apr 29, 2024
1 check passed
nsosio added a commit that referenced this pull request Apr 29, 2024
* AutoGPTQ Mistral, Memory profiling support and empirical quality checks (#163)

* Added info about mistral support

* AutoGPTQ now uses the base class, with mistral support and memory profiling (a profiling sketch follows this commit list)

* minor changes to the CLI args in bench.sh

* changed requirements with latest update of autogptq

* support for mistral instruct and llama2 chat and latest autogptq installation from source

* Added another common utility to build chat templates per model

* fix bugs for multiple duplicated logging

* PyTorchBenchmark supports mistral, memory profile and uses Base class

* changes in instructions, using the latest model for benchmarking, removed Logs

* fixing dependencies with proper versions

* Addition of mistral and llama, and a table for precision-wise quality comparison

* Added new docs and template for mistral and starting out new benchmark performance logs in templates

* improvements on better logging strategies to log the quality checks output in a readme

* integrated the utility for logging improvements

* using better logging strategies in bench pytorch

* questions.json has the ground truth answer set from fp32 responses

* AutoGPTQ readme improvements and added quality checks examples for llama and mistral

* Using latest logging utilities

* removed creation of Logs folder and unnecessary arguments

* Added fsspec

* Added llama2 and mistral performance logs

* pinned version of huggingface_hub

* Latest info under 'some points to note' section

* Update bench_autogptq/bench.sh

Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>
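
The memory-profiling support mentioned in this list could hook into the shared base class roughly as below. This is a minimal sketch assuming a CUDA/PyTorch environment; `BaseBenchmark`, `run_with_memory_profile`, and `generate` are illustrative names, not the repository's actual API.

```python
# Hypothetical sketch: measure peak GPU memory around one generation call.
import torch

class BaseBenchmark:
    def generate(self, prompt: str) -> str:
        raise NotImplementedError  # each engine subclass implements this

    def run_with_memory_profile(self, prompt: str) -> tuple[str, float]:
        if torch.cuda.is_available():
            torch.cuda.reset_peak_memory_stats()
        output = self.generate(prompt)
        peak_mb = (
            torch.cuda.max_memory_allocated() / 1024**2
            if torch.cuda.is_available()
            else 0.0
        )
        return output, peak_mb
```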

* Deepspeed Mistral, Memory profiling support and empirical quality checks (#168)

* Added another common utility to build chat templates per model (a template sketch follows this commit list)

* fix bugs for multiple duplicated logging

* PyTorchBenchmark supports mistral, memory profile and uses Base class

* changes in instructions, using the latest model for benchmarking, removed Logs

* fixing dependencies with proper versions

* Addition of mistral and llama, and a table for precision-wise quality comparison

* Added new docs and template for mistral and starting out new benchmark performance logs in templates

* improvements on better logging strategies to log the quality checks output in a readme

* integrated the utility for logging improvements

* using better logging strategies in bench pytorch

* questions.json has the ground truth answer set from fp32 responses

* DeepSpeed now using base class, mistral support and memory profiling

* removed unused imports

* removed Logs and latest improvements w.r.t base class

* README now has quality comparison for deepspeed

* using latest version of deepspeed

* added latest performance logs for llama2 and mistral

* added docs for llama and mistral with latest scores

* updated readme with correct model info

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
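
The chat-template utility referenced in these commits might look like the sketch below; the function name and signature are assumptions, while the Llama 2 chat and Mistral Instruct prompt formats follow the models' documented conventions.

```python
# Illustrative sketch of a shared chat-template builder for the two models.
def build_chat_prompt(model_name: str, prompt: str, system: str | None = None) -> str:
    name = model_name.lower()
    if "llama" in name:
        # Llama 2 chat format, with an optional <<SYS>> system block.
        sys_block = f"<<SYS>>\n{system}\n<</SYS>>\n\n" if system else ""
        return f"<s>[INST] {sys_block}{prompt} [/INST]"
    if "mistral" in name:
        # Mistral Instruct format; it has no dedicated system slot.
        return f"<s>[INST] {prompt} [/INST]"
    return prompt  # fall through for base (non-chat) models
```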

* Ctransformers Mistral and Memory Profiling support (#165)

* Ctransformers supports mistral and uses the Base class along with memory profiling

* uses latest bench.py arguments, removed creation of log folders, and other improvements

* supporting mistral and llama chat models and installation improvements

* added additional requirements which are not supported by ctransformers by default

* Added another common utility to build chat templates per model

* fix bugs for multiple duplicated logging

* PyTorchBenchmark supports mistral, memory profile and uses Base class

* changes in instructions, using the latest model for benchmarking, removed Logs

* fixing dependencies with proper versions

* Addition of mistral and llama, and a table for precision-wise quality comparison

* Added new docs and template for mistral and starting out new benchmark performance logs in templates

* improvements on better logging strategies to log the quality checks output in a readme

* integrated the utility for logging improvements

* using better logging strategies in bench pytorch

* questions.json has the ground truth answer set from fp32 responses (a scoring sketch follows this commit list)

* CTransformers using latest logging utilities

* removed unnecessary arguments and creation of Logs folder

* Add precision-wise quality comparison on AutoGPTQ readme

* Added performance scores for llama2 and mistral

* Latest info under 'some points to note' section

* added ctransformers performance logs for mistral and llama

* Update bench_ctransformers/bench.sh

Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>
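
The empirical quality check against fp32 ground truth could be scored roughly as below. The questions.json layout and the exact-match scoring are assumptions about the shape of the check, not the repository's actual implementation.

```python
# Hypothetical sketch: score one precision's answers against fp32 ground truth.
import json

def quality_score(answers: dict[str, str], questions_path: str = "questions.json") -> float:
    with open(questions_path) as f:
        # assumed layout: [{"question": "...", "answer": "<fp32 response>"}, ...]
        questions = json.load(f)
    matches = sum(
        1
        for item in questions
        if answers.get(item["question"], "").strip() == item["answer"].strip()
    )
    return matches / max(len(questions), 1)
```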

* CTranslate2 Benchmark with Mistral Support  (#170)

* Added support for BaseClass and mistral with memory profiling

* removed docker support with latest ctranslate release

* Added latest ctranslate2 version

* Removed runs with docker and added mistral model support

* removed docker support and added mistral support

* Added performance logs for mistral and llama

* engine specific readme with qualitative comparison

* Llamacpp mistral (#171)

* fix bug: handle temperature when None (a sketch follows this commit list)

* Added llamacpp engine readme with quality comparision

* Using Base class with mistral support and memory profiling

* shell script cli improvements

* Added newer requirements with versions pinned

* small improvements

* removed MODEL_NAME while running setup

* Added performance logs for llama and mistral

* fixed performance metrics of llama for pytorch transformers

* fixed performance metrics of mistral for pytorch transformers

* Fix the name of the models and links of the same
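
The temperature fix in this list could be as simple as the guard below; the default value and function shape are assumptions, not the actual patch.

```python
# Sketch: fall back to a default so sampling never receives a None temperature.
def resolve_temperature(temperature: float | None, default: float = 0.1) -> float:
    return default if temperature is None else temperature
```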

* ExLlamaV2 Mistral, Memory support, qualitative comparison and improvements (#175)

* Added performance logs for mistral and llama for exllamav2 along with qualitative comparisons

* ExLlamaV2 using base class along with support for mistral and memory profiling

* removed old cli args and small improvements

* deleted convert.py script

* pinned latest version and added transformers

* addition of mistral model along with usage of latest exllamav2 repo

* Update bench_exllamav2/bench.sh

Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>

---------

Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>

* vLLM Mistral, Memory support, qualitative comparison and improvements (#172)

* Adding base class with mistral support and memory profiling

* small improvements on removing unnecessary cli args

* download support for mistral

* adding on_exit function on get_answers

* Added precision-wise qualitative checks for vLLM README

* Added performance logs on docs for mistral and llama

* Update bench_vllm/bench.sh

Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>

---------

Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>

* Nvidia TensorRT LLM Mistral, Memory support, qualitative comparison and improvements (#178)

* Added readme with mistral support and qualitative comparision

* TRT LLM using base class with mistral and memory profiling support

* removed old cli args, and some improvements

* Added support for mistral with latest trt llm

* Added support for root dir for handling runs inside and outside docker

* Added performance logs for both mistral and llama

* Added float32 on docs and performance logs

* Added support for float32 precision

* Added support for float32

* revised to int4 for mistral

* Optimum Nvidia Mistral, Memory support, qualitative comparison and improvements (#177)

* Added performance logs for mistral and llama for exllamav2 along with qualitative comparisons

* ExLlamaV2 using base class along with support for mistral and memory profiling

* removed old cli args and small improvements

* deleted convert.py script

* pinned latest version and added transformers

* addition of mistral model along with usage of latest exllamav2 repo

* Using base benchmark class with memory profiling support and mistral model support

* Addition of new constructor argument root_dir to handle paths inside or outside docker (a path-handling sketch follows this commit list)

* created a converter script to convert to tensorrt engine file

* Updated to the latest usage of optimum nvidia and added qualitative comparison

* cli improvements and remove older cli args

* added latest conversion script logic to convert hf weights to engine, plus mistral support

* Added latest performance logs for both mistral and llama

* removed the conflict with exllamav2

* removed changes from exllamav2
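
The root_dir argument mentioned in this list could resolve model paths as sketched below; the class and attribute names are illustrative, not the repository's actual API.

```python
# Sketch: one constructor argument makes paths valid inside and outside Docker.
import os

class BenchmarkRunner:
    def __init__(self, model_dir: str, root_dir: str | None = None):
        # Inside the container, pass the mount point (e.g. "/mnt"); outside,
        # the default resolves against the local checkout.
        base = root_dir if root_dir is not None else os.getcwd()
        self.model_path = os.path.join(base, model_dir)
```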

* ONNX Runtime with mistral support and memory profiling  (#182)

* Added comparative quality analysis for mistral and llama and also added nuances related to onnx

* Using base class with memory profiling and mistral support

* removed old cli arguments and some improvements

* removed requirements, since onnx runs in a custom docker container

* Added new setup sh file with mistral and llama onnx conversion through docker

* Added performance logs of onnx for llama and mistral

* Lightning AI Mistral and memory integration  (#174)

* Added qualitative comparison of quality for litgpt

* Using base class with mistral support and memory support

* small cli improvements, removed old arguments

* removed convert logic with latest litgpt

* Added latest inference logic code

* pinned version for dependencies

* Added latest method of installation and model conversions with litgpt

* added performance benchmarks info in litgpt

* updated the memory usage and tokens per second

* chore: minor improvements and added latest info about int4

* Changes in Engine Readmes  (#183)

* Deleted the files related to llama2 in docs

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Nicola Sosio <sosio.nicola94@tiscali.it>