How to save a "prequantized_flow" safetensor? #16
Ah! Essentially it's just the checkpoint that gets created after loading the model and doing at least 12 steps of inference. You could do something like this in the root of the repo:

```python
from flux_pipeline import FluxPipeline, ModelVersion
from safetensors.torch import save_file

prompt = "some prompt"
pipe = FluxPipeline.load_pipeline_from_config_path("./configs/your-config.json")

# Run at least 12 total inference steps (3 runs of 4 steps for schnell)
# before capturing the state dict.
if pipe.config.version == ModelVersion.flux_schnell:
    for _ in range(3):
        pipe.generate(prompt=prompt, num_steps=4)
else:
    pipe.generate(prompt=prompt, num_steps=12)

# The model's state dict now holds the quantized weights;
# save it as a prequantized checkpoint.
quantized_state_dict = pipe.model.state_dict()
save_file(quantized_state_dict, "some-model-prequantized.safetensors")
```
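As a quick sanity check on the saved file, a minimal sketch for loading it back into the same `pipe` object from the snippet above, using only `safetensors` and standard PyTorch calls. It assumes `pipe.model` is an `nn.Module` that already has the quantized module layout, so keys and dtypes line up; the path and `strict=False` choice are illustrative, and the `--prequantized-flow` option mentioned in the question below is the documented route for normal use.

```python
from safetensors.torch import load_file

# Load the prequantized checkpoint saved above (path is illustrative).
state_dict = load_file("some-model-prequantized.safetensors")

# Assumes pipe.model already uses the quantized module layout,
# so every key and dtype in the checkpoint matches the live model.
missing, unexpected = pipe.model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```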
Thank you for your helpful response. The solution works well for loading pre-quantized safetensors files. However, do you have any suggestions for saving and loading a Torch-compiled Flux model? Currently, the initialization time for compiling the Flux model is quite cumbersome, and I'm looking for ways to streamline this process.
Ah, you can speed that up by using nightly torch; for me, compilation only takes a few (maybe 3-4) seconds at most.
I appreciate your amazing work! For me, torch-nightly takes 9-18 seconds per inference on the first 3 warm-up inferences. Am I missing something?
That seems correct; it's possible that it's just related to the CPU. I have a 7950X, so everything runs very fast.
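For reference, the warm-up pattern being discussed looks roughly like the sketch below. This is a generic PyTorch example, not code from this repo: the module, input shapes, and compile settings are all illustrative. The first few calls are the ones that pay the compilation cost; Inductor also caches compiled kernels on disk (the location is typically controlled by `TORCHINDUCTOR_CACHE_DIR`), which helps across process restarts.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the flux transformer; any nn.Module behaves the same way.
model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).cuda().eval()

# Compile once; mode/dynamic settings here are illustrative choices.
compiled = torch.compile(model, mode="max-autotune", dynamic=False)

example = torch.randn(1, 4096, device="cuda")
with torch.inference_mode():
    # The first few calls trigger graph capture and kernel compilation,
    # which is where the warm-up latency is spent.
    for _ in range(3):
        _ = compiled(example)
    # Subsequent calls reuse the compiled kernels and run at full speed.
    out = compiled(example)
```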
Hello,
The documentation mentions that the --prequantized-flow option can be used to load a prequantized model, which reduces the checkpoint size by about 50% and shortens the startup time (default: False).
However, I couldn’t find any interface in the repository to enable this functionality.
Could you please provide guidance on how to store and load a prequantized model to save resources and initialization time?
Looking forward to your response, thank you!