Commit
* Added Continuous batching feature; refactored text generation module
  Signed-off-by: quic-rishinr <quic_rishinr@quicinc.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Added cherry-picked continuous batching changes.
  Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Updated the assert condition for bs > 1 and full batch size > 1. Updated issue with qpc path creation for non-CB execution. Added condition to check CB is enabled for supported architectures. Added formatting changes.
  Signed-off-by: quic-rishinr <quic_rishinr@quicinc.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* update full-batch-size args
  Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* update unique cache dir to include arg naming
  Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* include env variable in the QEff_MODELS_DIR to override
  Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Small bug fix
  Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Fixed output issue with FBS > 1. Cherry-picked the support for Mixtral. Added CB support for Starcoder.
  Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Lint & Format (quic#53)
  * Lint & Format
    - Added linting and formatting github actions
    - Formatted entire codebase
    - Fixed linter errors
    - Removed `# noqa` with fix
    Signed-off-by: Ilango Rajagopal <quic_irajagop@quicinc.com>
  * Split test config into multiple-lines
    Signed-off-by: Ilango Rajagopal <quic_irajagop@quicinc.com>
  * Fix external repo for workflow
    Signed-off-by: Ilango Rajagopal <quic_irajagop@quicinc.com>
  * Format newly added files
    Signed-off-by: Ilango Rajagopal <quic_irajagop@quicinc.com>
  ---------
  Signed-off-by: Ilango Rajagopal <quic_irajagop@quicinc.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* enabling `export_and_compile` for `QEFFAutoModelForCausalLM` (quic#48)
  * enabling export_and_compile
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * cleaned API usage; integrated export into compile; addressed comments
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * removed src, simplified automodelclass
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * added base directory in place of src
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * replaced src/auto with transformers/models/modeling_auto
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * ran linter and formatter
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * removed commented code
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * fixed typos
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * fixed testing script
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * removed unitTest dependency using pytest only in all tests
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * added test report for showing on jenkins view
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * updated jenkinsfile to capture test data in xml for jenkins view
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * fixed HL tests
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * fixed cloud tests
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * made ctx_len default argument in exec_kv function, fixed tests/cloud/test_infer
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * run ruff formatter
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * fixed type hint
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * moved use_cache assignment to init so that models initialized via init will also have the flag True
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  * ran ruff formatter
    Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  ---------
  Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Added Continuous batching feature; refactored text generation module
  Signed-off-by: quic-rishinr <quic_rishinr@quicinc.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Updated the assert condition for bs > 1 and full batch size > 1. Updated issue with qpc path creation for non-CB execution. Added condition to check CB is enabled for supported architectures. Added formatting changes.
  Signed-off-by: quic-rishinr <quic_rishinr@quicinc.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* update unique cache dir to include arg naming
  Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Small bug fix
  Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* rebased the code against mainline
  Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Created a separate file for scatter and gather CB ops adhering to PR 55
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Formatted the code using linter
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* removed runtime_args
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Added FBS flag in execute module
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* making HL test alignment with Continuous Batching
  Signed-off-by: Abukhoyer Shaik <quic_abukhoye@quicinc.com>
* Removed cache path from infer and export module. Updated default cache path in constants.
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Rebased against main
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Lint and format
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Removed base_dir_name from export.py. Removed TODO from infer. Removed CB-specific scatter and gather op from cts_scatter_gather.py. Updated CB model architecture change to export_for_cloud module and changed it to NotImplementedError. Commented out custom_opsets usage in export_onnx_model. Lint fix on conftest.py. Removed print statement from text_generation_inference.py.
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Lint and format
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Adding test configs
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Added support for pytorch input handler. Added support for fetching FBS from QPC.
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Added back check_and_assign_cache_dir in infer and export, reverted custom_opsets in export_utils, minor fix in text_generation_inference
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Linter formatting and minor bug fixes
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Addressed review comments and fixed the issue with total decode token calculation
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Linter and formatting
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* rebased against mainline
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Lint formatting
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Making CI Running I
  Signed-off-by: Abukhoyer Shaik <quic_abukhoye@quicinc.com>
* Added Transformed models and QPC storage section in readme. Removed Constants.CACHE_DIR. Added FBS and BS check in compiler helper. Renamed “perfill time” print statement to “Average prefill time”. Added CB transform class. Updated Modeling file to adhere to CBTransform changes. Renamed Qeff cache folder from qeff_models to qeff_cache. Other review changes.
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Linter and added some missing changes
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Added CI changes and some missing changes
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Updated logic for initializing transform classes for PyTorch transforms
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Rebased and updated doc string
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Resolved Testing bugs
  Signed-off-by: Abukhoyer Shaik <quic_abukhoye@quicinc.com>
* adding some models in json file
  Signed-off-by: Abukhoyer Shaik <quic_abukhoye@quicinc.com>
* Added streamer for CB
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Updated generated ID len
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* removed streamer for CB
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* Changed Tests Configs
  Signed-off-by: Abukhoyer Shaik <quic_abukhoye@quicinc.com>
* Lint format fix
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* updated generated output print logic
  Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
* added extra line between full batch size prints
  Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
* removed commented code
  Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
* removed commented lines
  Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
* added infer docstring back
  Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

---------

Signed-off-by: quic-rishinr <quic_rishinr@quicinc.com>
Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
Signed-off-by: Ilango Rajagopal <quic_irajagop@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
Signed-off-by: Abukhoyer Shaik <quic_abukhoye@quicinc.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Vinayak Baddi <quic_vbaddi@quicinc.com>
Co-authored-by: Ilango Rajagopal <quic_irajagop@quicinc.com>
Co-authored-by: Onkar Chougule <168134249+ochougul@users.noreply.github.com>
Co-authored-by: Abukhoyer Shaik <quic_abukhoye@quicinc.com>
Co-authored-by: Onkar Chougule <quic_ochougul@quicinc.com>