Implement MPNet model #363

Open: kozistr wants to merge 13 commits into main.

@kozistr (Contributor) commented Jul 28, 2024

What does this PR do?

Fixes #250
Fixes #33

Feedback or contributions are welcome!

  • Inference results
    • CPU
    • GPU (Colab T4)
    • Metal
  • MPNetAttentionBias
  • MPNetAttention is now identical to the Python implementation.
    • attention_bias
    • attention_mask
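For context on the attention_bias item: in MPNet, each head adds a learned relative-position bias to its scaled dot-product scores, and the additive attention_mask is added on top. The following is a minimal plain-Rust sketch of the bucketing step, written as a hypothetical port of `relative_position_bucket` from the Hugging Face `transformers` MPNet code as I understand it. The defaults `num_buckets = 32` and `max_distance = 128`, the helper name `position_bias`, and the `[bucket][head]` table layout are all assumptions. This is not the candle code in this PR.

```rust
/// Hypothetical plain-Rust port of MPNet's relative position bucketing
/// (modeled on the Python `relative_position_bucket`; not this PR's code).
/// `relative_position` is `key_pos - query_pos`.
fn relative_position_bucket(relative_position: i64, num_buckets: i64, max_distance: i64) -> i64 {
    // Half of the buckets per direction: keys before vs. after the query.
    let half = num_buckets / 2;
    let mut n = -relative_position;
    let mut ret = 0;
    if n < 0 {
        // Key lies after the query: use the upper half of the buckets.
        ret += half;
    }
    n = n.abs();

    // Small distances get one exact bucket each.
    let max_exact = half / 2;
    if n < max_exact {
        return ret + n;
    }

    // Larger distances share logarithmically spaced buckets, capped at half - 1.
    let scaled = (n as f64 / max_exact as f64).ln()
        / (max_distance as f64 / max_exact as f64).ln()
        * (half - max_exact) as f64;
    let val_if_large = (max_exact + scaled as i64).min(half - 1);
    ret + val_if_large
}

/// Build the additive bias `bias[head][q][k]` for one sequence from a learned
/// table `table[bucket][head]` (hypothetical layout, like an embedding of
/// shape [num_buckets, num_heads]).
fn position_bias(
    seq_len: usize,
    table: &[Vec<f32>],
    num_buckets: i64,
    max_distance: i64,
) -> Vec<Vec<Vec<f32>>> {
    let num_heads = table.first().map_or(0, Vec::len);
    let mut bias = vec![vec![vec![0.0f32; seq_len]; seq_len]; num_heads];
    for q in 0..seq_len {
        for k in 0..seq_len {
            let rel = k as i64 - q as i64;
            let bucket = relative_position_bucket(rel, num_buckets, max_distance) as usize;
            for (h, head) in bias.iter_mut().enumerate() {
                head[q][k] = table[bucket][h];
            }
        }
    }
    bias
}

fn main() {
    for rel in [-20, -3, 0, 3, 20] {
        println!("relative position {rel:>3} -> bucket {}", relative_position_bucket(rel, 32, 128));
    }
    // Toy table: head 0 stores +bucket, head 1 stores -bucket, so the bias reveals the bucket id.
    let table: Vec<Vec<f32>> = (0..32).map(|b| vec![b as f32, -(b as f32)]).collect();
    let bias = position_bias(4, &table, 32, 128);
    println!("head 0, query 0: {:?}", bias[0][0]);
}
```

With these defaults, offsets -20, -3, 0, 3 and 20 map to buckets 10, 3, 0, 19 and 26. Keys after the query land in the upper half of the bucket range, and distant keys share a few coarse buckets. In the real model the resulting bias is expanded over the batch and added to the attention scores together with the attention mask.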

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@OlivierDehaene OR @Narsil

@kozistr marked this pull request as ready for review on July 29, 2024 16:43
@ramipellumbi commented Aug 18, 2024

Unable to run on a local install with Metal:

Error: Model backend is not healthy

Caused by:
    Metal contiguous affine U8 not implemented

I can take a crack at this if you'd like.

@kozistr (Contributor, Author) commented Aug 18, 2024

Thanks for checking in. That sounds great! I’d really appreciate your input. Please feel free to dive in whenever you are available :)

@kozistr (Contributor, Author) commented Aug 26, 2024

I just made a small change to temporarily disable support for Metal devices: 13ebffb

@ramipellumbi commented
Thank you :) Will check into this. Sorry, I have been too busy to look into this lately.

@kozistr (Contributor, Author) commented Aug 27, 2024

No worries! Take your time :) If you need any help, please feel free to mention me here.

@kozistr (Contributor, Author) commented Sep 22, 2024

@ramipellumbi I just pushed a commit that fixes the Metal issue: 9b58292

@OlivierDehaene I think this PR is ready for review.

-> % ./target/release/text-embeddings-router --model-id sentence-transformers/all-mpnet-base-v2 --dtype float32 --pooling mean --port 8080
2024-09-22T06:38:19.976274Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "sen*****-************/***-*****-***e-v2", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: Some(Mean), max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "0.0.0.0", port: 8080, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-09-22T06:38:19.977004Z  INFO hf_hub: /Users/taehyeonjeon/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/Users/taehyeonjeon/.cache/huggingface/token"
2024-09-22T06:38:19.981266Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-09-22T06:38:19.981292Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-09-22T06:38:19.981296Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-09-22T06:38:19.981314Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-09-22T06:38:19.981359Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:328: Downloading `model.safetensors`
2024-09-22T06:38:19.981380Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 87.625µs
2024-09-22T06:38:19.989996Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 384
2024-09-22T06:38:19.990076Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
2024-09-22T06:38:20.006653Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-09-22T06:38:20.017132Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:233: Starting MPNet model on Metal(MetalDevice(DeviceId(1)))
2024-09-22T06:38:21.068835Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:8080
2024-09-22T06:38:21.068848Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
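The command above serves `sentence-transformers/all-mpnet-base-v2` with `--pooling mean`. To illustrate what mean pooling computes, here is a minimal plain-Rust sketch. It assumes the usual sentence-transformers definition: average the final hidden states of the non-padding tokens. `mean_pool` is a hypothetical helper, not TEI's candle-based implementation, and the token vectors are toy data.

```rust
/// Mean pooling over token embeddings, ignoring padding positions.
/// `token_embeddings[t]` is the hidden state of token t; `attention_mask[t]`
/// is 1 for real tokens and 0 for padding.
fn mean_pool(token_embeddings: &[Vec<f32>], attention_mask: &[u8]) -> Vec<f32> {
    let dim = token_embeddings.first().map_or(0, Vec::len);
    let mut sum = vec![0.0f32; dim];
    let mut count = 0usize;
    for (emb, &m) in token_embeddings.iter().zip(attention_mask) {
        if m == 1 {
            // Accumulate only real (non-padding) tokens.
            for (s, x) in sum.iter_mut().zip(emb) {
                *s += *x;
            }
            count += 1;
        }
    }
    // Clamp the count so an all-padding input does not divide by zero.
    let denom = count.max(1) as f32;
    sum.iter().map(|s| s / denom).collect()
}

fn main() {
    // Two real tokens and one padding token with 2-dimensional hidden states.
    let tokens: Vec<Vec<f32>> = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![100.0, 100.0]];
    let mask = [1u8, 1, 0];
    println!("{:?}", mean_pool(&tokens, &mask));
}
```

Running it prints `[2.0, 3.0]`: the padding row is excluded, so its large values do not skew the sentence embedding.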

@vrdn-23 commented Oct 25, 2024

@OlivierDehaene would it be possible to get this PR merged?


Successfully merging this pull request may close these issues:
  • sbert based mpnet model (related issue #33)
  • Sentence Transformers based mpnet models
3 participants