
v0 add autoquant #402

Open · wants to merge 4 commits into main
Conversation

@michaelfeil (Owner) commented Oct 7, 2024

This pull request adds support for a new autoquant data type in the infinity_emb library, updates the CLI documentation, adjusts the quantization logic, and adds unit tests for autoquant quantization.

New Features:

  • Added autoquant data type to Dtype enum in libs/infinity_emb/infinity_emb/primitives.py.
  • Updated quantization logic to handle autoquant in libs/infinity_emb/infinity_emb/transformer/quantization/interface.py and libs/infinity_emb/infinity_emb/transformer/quantization/quant.py (a hedged sketch of the dispatch follows this list).
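
As an illustration only, the dispatch could look roughly like the sketch below. It assumes torchao's top-level autoquant entry point; the wrapper function name and branch structure are hypothetical, not the PR's actual code.

```python
# Hypothetical dispatch for the new dtype. Only torchao.autoquant and
# torch.ao.quantization.quantize_dynamic are real APIs; the wrapper is a sketch.
import torch
import torch.ao.quantization
import torchao


def quantize_model(model: torch.nn.Module, dtype: str) -> torch.nn.Module:
    if dtype == "autoquant":
        # torchao profiles candidate kernels and keeps the fastest per layer.
        return torchao.autoquant(torch.compile(model, mode="max-autotune"))
    if dtype == "int8":
        # Classic dynamic int8 quantization of Linear layers on CPU.
        return torch.ao.quantization.quantize_dynamic(
            model.to("cpu"), {torch.nn.Linear}, dtype=torch.qint8
        )
    return model
```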

Documentation Updates:

  • Updated CLI documentation to include autoquant in docs/docs/cli_v2.md.

Codebase Improvements:

  • Modified Makefile to use poetry run for generating OpenAPI and CLI v2 documentation in libs/infinity_emb/Makefile.

Dependency Updates:

  • Added torchao as an optional dependency in libs/infinity_emb/pyproject.toml.

Testing Enhancements:

  • Added unit tests for autoquant quantization in libs/infinity_emb/tests/unit_test/transformer/quantization/test_interface.py (a hedged test sketch follows below).
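
For orientation, here is a minimal sketch of what such a test could assert, assuming a small SentenceTransformer model and torchao's autoquant; the model name, tolerance, and access via the wrapped auto_model are illustrative, not taken from the PR.

```python
# Sketch: embeddings from the autoquant-ed model should stay close to the
# fp32 reference embeddings.
import numpy as np
import torchao
from sentence_transformers import SentenceTransformer


def test_autoquant_roundtrip():
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    sentences = ["This is a test sentence.", "Quantization keeps the meaning."]
    reference = model.encode(sentences, convert_to_numpy=True)

    # Assumption: the first SentenceTransformer module wraps the HuggingFace
    # model as `auto_model`; quantize it in place.
    model[0].auto_model = torchao.autoquant(model[0].auto_model)
    quantized = model.encode(sentences, convert_to_numpy=True)

    # Cosine similarity between reference and quantized embeddings stays high.
    cos = np.sum(reference * quantized, axis=1) / (
        np.linalg.norm(reference, axis=1) * np.linalg.norm(quantized, axis=1)
    )
    assert np.all(cos > 0.95)
```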

@greptile-apps bot (Contributor) left a comment
PR Summary

This PR introduces support for 'autoquant', a new automatic quantization feature in the Infinity project. The changes span multiple files and include implementation, documentation, and testing updates.

  • Added 'autoquant' as a new option in the Dtype enum and CLI documentation, enabling automatic quantization for improved model performance
  • Implemented 'autoquant' support in the SentenceTransformerPatched class and quantization interface
  • Added 'torchao' dependency to pyproject.toml, likely to support the new autoquant functionality
  • Created a new test function to verify the autoquant feature's effectiveness and accuracy
  • Updated README with information on new multi-modal support (CLIP, CLAP) and text classification capabilities

9 file(s) reviewed, 4 comment(s)

@@ -7,6 +7,7 @@
 import numpy as np
 import requests  # type: ignore
+import torch.ao.quantization
@greptile-apps bot (Contributor):

style: This import is unused in the current file. Consider removing it if not needed.

Comment on lines 46 to +53
 model = torch.quantization.quantize_dynamic(
     model.to("cpu"),  # the original model
     {torch.nn.Linear},  # a set of layers to dynamically quantize
     dtype=torch.qint8,
 )
+model = torch.ao.quantization.quantize_dynamic(
+    model, {torch.nn.Linear}, dtype=torch.qint8
+)
@greptile-apps bot (Contributor):

logic: Two quantization methods are applied sequentially. This might lead to unexpected behavior or reduced model performance. Consider using only one method or clarify why both are necessary.
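
One way to resolve this, shown only as a sketch and not as the author's final fix, is a single dynamic-quantization pass:

```python
import torch
import torch.ao.quantization


def quantize_int8_once(model: torch.nn.Module) -> torch.nn.Module:
    # Dynamic int8 quantization applied exactly once, on CPU.
    return torch.ao.quantization.quantize_dynamic(
        model.to("cpu"),      # the original model, moved to CPU
        {torch.nn.Linear},    # layers to dynamically quantize
        dtype=torch.qint8,
    )
```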

     bettertransformer=False,
 )
 )
+sentence = "This is a test sentence."
@greptile-apps bot (Contributor):

style: This line is unused and can be removed.

Comment on lines +96 to +97
+if __name__ == "__main__":
+    test_autoquant_quantization()
@greptile-apps bot (Contributor):

style: Running a single test function in main might not be ideal. Consider using a test runner or removing this block if not necessary.

@codecov-commenter commented Oct 7, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 21.42857% with 11 lines in your changes missing coverage. Please review.

Project coverage is 73.24%. Comparing base (0f1b786) to head (55a8b0e).

Files with missing lines                                  Patch %    Lines
...infinity_emb/transformer/quantization/interface.py     14.28%    6 Missing ⚠️
...emb/infinity_emb/transformer/quantization/quant.py      0.00%    5 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

❗ There is a different number of reports uploaded between BASE (0f1b786) and HEAD (55a8b0e). Click for more details.

HEAD has 1 upload less than BASE

Flag    BASE (0f1b786)    HEAD (55a8b0e)
        2                 1
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
- Coverage   79.01%   73.24%   -5.77%     
==========================================
  Files          40       40              
  Lines        3173     3184      +11     
==========================================
- Hits         2507     2332     -175     
- Misses        666      852     +186     

☔ View full report in Codecov by Sentry.
