petals models local support using python env #435
Conversation
…ersal-apple-darwin
Co-authored-by: Casper da Costa-Luis <casper.dcl@physics.org>
just merged #440; lemme know if you need any help rebasing @biswaroop1547 :)
Using the latest commit I downloaded Stable Beluga; once I click Open it hangs and seems to never really start:

src/controller_binaries.rs:97 2023-11-03T00:50:41 [INFO] - serve_command: setup-petals.sh --model-id petals-team/StableBeluga2 --model-path . --dht-prefix StableBeluga2-hf --port 8734
src/controller_binaries.rs:102 2023-11-03T00:50:41 [INFO] - binary_path: "/Users/tiero/Library/Application Support/io.premai.prem-app/models/stable-beluga-2/setup-petals.sh"
src/controller_binaries.rs:122 2023-11-03T00:50:41 [INFO] - args: ["--model-id", "petals-team/StableBeluga2", "--model-path", ".", "--dht-prefix", "StableBeluga2-hf", "--port", "8734"]
src/controller_binaries.rs:166 2023-11-03T00:50:42 [ERROR] - Failed to send request: error sending request for url (http://localhost:8734/v1): error trying to connect: tcp connect error: Connection refused (os error 61)
src/controller_binaries.rs:166 2023-11-03T00:50:43 [ERROR] - Failed to send request: error sending request for url (http://localhost:8734/v1): error trying to connect: tcp connect error: Connection refused (os error 61)
src/controller_binaries.rs:166 2023-11-03T00:50:43 [ERROR] - Failed to send request: error sending request for url (http://localhost:8734/v1): error trying to connect: tcp connect error: Connection refused (os error 61)
src/controller_binaries.rs:166 2023-11-03T00:50:44 [ERROR] - Failed to send request: error sending request for url (http://localhost:8734/v1): error trying to connect: tcp connect error: Connection refused (os error 61)
[TRUNCATED]
@tiero it actually takes around ~30 sec for the model server to start up. To check whether the server is up after that duration, you can also do
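A minimal sketch of such a liveness check (the port 8734 and the `/v1` path are taken from the logs above; `wait_for_server` is a hypothetical helper, not part of prem-app):

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout_s: float = 60.0) -> bool:
    """Poll `url` once per second until it answers or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # Any HTTP response means the socket is accepting connections.
            urllib.request.urlopen(url, timeout=2)
            return True
        except urllib.error.HTTPError:
            return True  # server answered, just not with a 2xx status
        except (urllib.error.URLError, OSError):
            time.sleep(1)  # still starting up; retry
    return False


# e.g. wait_for_server("http://localhost:8734/v1", timeout_s=60)
```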
We should have a timeout; I waited more than 5 minutes and it kept hanging for me. What can I do to debug?
@tiero that's weird, because it shouldn't take more than 30 secs (given you've run the swarm before, since that creates the python env which'll be reused for petals). If you're starting anew, it'd take around 3 mins, as it also installs and sets up the python environment before starting the server (currently, while the python env is being set up after you click "open", we don't show any message). To debug, can you remove
btw I guess
Yes, correct @casperdcl. It's just for testing. The service actually works, but it takes a huge amount of time: after the health request succeeded, it took 60 seconds to load the chat screen.
@filopedraz yeah, it takes longer if it's creating the env from scratch, but if the env is already present it takes between 30 secs and 1 min. Do we want to show some kind of message while this is happening? (It's mentioned as one of the issues/todos in this PR desc.) This is the actual time it takes to load the model into memory after starting the server; up for ideas here on what we can do to reduce this time though 🙏🏻
Good for now. I am more worried about the time between the toast and the load of the chat; I don't know what causes it. It happens to me with Mistral too, actually.
Download doesn't even start now. Here's a Loom.
The PR looks good and it works well for me. I created a new issue here for the generation concerns.
I suggest we squash-merge because it's ultimately quite small & not worth rebasing/preserving history |
Before merge, a few issues need to be taken care of:

- Manage the two execution variants of Stable Beluga (docker & local python env) through different `id` and other fields in prem-registry. We can remove the docker variant (manifest file for stablebeluga2 of type:process using petals registry#88).
- Stable Beluga currently concatenates the user prompt into generated responses; make sure it only shows newly generated tokens (cht-petals: minor edits on paths and default parameters premAI-io/prem-services#125).
- Add a prompt template for Stable Beluga (llama based? or orca based? need to check) (cht-petals: minor edits on paths and default parameters premAI-io/prem-services#125).
- [NOT CRITICAL] In the UI, generated responses from Stable Beluga show up as `strikethrough`.
- Takes a bit of time when starting for the first time (env creation also happens if swarm mode wasn't run before). Need to show some message here? (Not an issue currently.)
- [NOT CRITICAL] Generation with `max_new_tokens` greater than around 5 sometimes fails, since generation takes a longer and variable time and it looks like a timeout occurs from prem-app's side (Streaming support for Petals services premAI-io/prem-services#127):
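On the prompt-template question above: the StableBeluga2 model card documents an Orca-style format, which could be sketched as below (to be verified against the model card before wiring anything into cht-petals; `build_prompt` is a hypothetical helper):

```python
def build_prompt(system_prompt: str, user_message: str) -> str:
    # Orca-style template as shown on the StableBeluga2 model card;
    # double-check the exact headers and spacing before relying on this.
    return f"### System:\n{system_prompt}\n\n### User:\n{user_message}\n\n### Assistant:\n"


# Example:
# build_prompt("You are a helpful assistant.", "Hello!")
```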