
How to save a "prequantized_flow" safetensor? #16

smuelpeng opened this issue Sep 9, 2024 · 5 comments
@smuelpeng

Hello,

The documentation mentions that the --prequantized-flow option can be used to load a prequantized model, which reduces the checkpoint size by about 50% and shortens the startup time (default: False).

However, I couldn’t find any interface in the repository to enable this functionality.
Could you please provide guidance on how to store and load a prequantized model to save resources and initialization time?

Looking forward to your response, thank you!

@aredden
Owner

aredden commented Sep 10, 2024

Ah! Essentially it's just the checkpoint that gets created after loading the model and doing at least 12 steps of inference. You could do something like this in the root of the repo:

from flux_pipeline import FluxPipeline, ModelVersion
from safetensors.torch import save_file

prompt = "some prompt"
pipe = FluxPipeline.load_pipeline_from_config_path("./configs/your-config.json")

# Run at least 12 inference steps so the flow model finishes quantizing.
# Schnell uses 4 steps per image, so three generations cover the 12 steps.
if pipe.config.version == ModelVersion.flux_schnell:
    for _ in range(3):
        pipe.generate(prompt=prompt, num_steps=4)
else:
    pipe.generate(prompt=prompt, num_steps=12)

# The model's state dict now holds the quantized weights; write them to disk.
quantized_state_dict = pipe.model.state_dict()
save_file(quantized_state_dict, "some-model-prequantized.safetensors")
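
To load it back, here is a minimal sketch of the round trip. It assumes your config JSON has a prequantized_flow flag (matching the --prequantized-flow option mentioned in the docs) and a flow checkpoint path you can point at the saved file; the exact key names are assumptions, so check the example configs in the repo.

from flux_pipeline import FluxPipeline

# Assumed config edits (key names are a guess; verify against the repo's
# example configs): point the flow checkpoint path at
# "some-model-prequantized.safetensors" and set "prequantized_flow": true
# so the weights load as fp8 instead of being re-quantized at startup.
pipe = FluxPipeline.load_pipeline_from_config_path(
    "./configs/your-config-prequantized.json"
)
pipe.generate(prompt="some prompt", num_steps=4)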

@smuelpeng
Author

Thank you for your helpful response. The solution works well for loading pre-quantized safetensors.

However, do you have any suggestions for saving and loading a torch-compiled Flux model? Currently, the compilation at initialization takes quite a while, and I'm looking for ways to streamline this process.

@aredden
Owner

aredden commented Sep 13, 2024

Ah, you can speed that up by using nightly torch; for me, compilation only takes a few (maybe 3-4) seconds at most.
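
If the concern is paying the compile cost on every launch, one option outside this repo is TorchInductor's on-disk FX graph cache, which recent PyTorch releases expose via environment variables. A minimal sketch, worth verifying against your torch version:

import os

# Set these before torch is imported / before the first torch.compile call.
# They are read by TorchInductor in recent PyTorch releases; confirm they
# exist in your version.
os.environ["TORCHINDUCTOR_FX_GRAPH_CACHE"] = "1"            # reuse compiled graphs across runs
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "./.inductor-cache"  # optional: pin the cache location

from flux_pipeline import FluxPipeline

pipe = FluxPipeline.load_pipeline_from_config_path("./configs/your-config.json")
pipe.generate(prompt="some prompt", num_steps=4)  # later launches should warm up faster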

@Muawizodux

I appreciate your amazing work!

For me, torch-nightly takes 9-18 seconds per inference for the first 3 warm-up inferences, while stable torch takes 1-1.5 minutes per inference for the first 3.

Am I missing something?

@aredden
Owner

aredden commented Sep 25, 2024

That seems correct. It's possible it's just related to the CPU; I have a 7950X, so everything runs very fast.
