
[Issue]: Flux Model load hanging forever out of nowhere #3484

Open
Olivier-aka-Raiden opened this issue Oct 12, 2024 · 4 comments

Labels
help wanted Extra attention is needed

Comments

@Olivier-aka-Raiden
Issue Description

Hi, I've been stuck for a month now trying to make FLUX.dev work again on my setup. For the record, I tried FLUX on my PC in early September with the model "Disty0/FLUX.1-dev-qint4_tf-qint8_te" and it worked, which was a big but pleasant surprise.
After being away for a few days, I came back to many pending updates (Windows, NVIDIA and SD.Next), but after applying all of them nothing worked anymore.
There were multiple errors when reinstalling SD.Next, so I decided to go with a fresh install and upgraded Python to 3.11 (which I read was recommended).
I saw that it was installing Torch with CUDA 12.4 and realised I didn't have that toolkit installed, so I installed it.
And now comes my issue: after starting SD.Next, downloading the Flux model I was using before, and putting the settings back as they were, the model "loading" hangs forever, using a lot of CPU and memory, but nothing happens in the UI and no logs are produced to debug with.
I thought it could be the system-memory offload from my GPU, so I made sure it was not activated, but that didn't change anything.
I tried going back to the previous dev version I was using when it still worked, but that didn't change anything either.
So I thought it was maybe the NVIDIA driver and installed the previous version: that didn't work either.
Then I started tweaking SD.Next settings: model, balanced and sequential offload modes.
For sequential offload I sometimes got an error instead of a hang:
11:20:49-742597 INFO Autodetect model: detect="FLUX" class=FluxPipeline
file="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b5
9d5b6fca9233accfaed08e0" size=0MB
Downloading shards: 100%|██████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2002.05it/s]
Diffusers 3.61s/it █████████████ 100% 2/2 00:07 00:00 Loading checkpoint shards
Diffusers 15.58it/s ████████ 100% 7/7 00:00 00:00 Loading pipeline components...
11:21:15-487263 INFO Load network: type=embeddings loaded=0 skipped=0 time=0.00
11:21:15-527261 ERROR Setting model: offload=sequential WeightQBytesTensor.new() missing 6 required positional
image
The thing that really bothers me is that while it hangs, CPU and RAM are at max usage but the GPU is not used at all... and this happens before inference even starts.

I haven't seen anyone with the same issue, so I guess this is a tricky one, but I hope someone has fresh ideas on things I could try to make it work again.
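Since the load hangs without writing anything to the log, one stdlib way to find where Python is actually stuck is `faulthandler`. This is a hypothetical debugging aid, not part of SD.Next: the file name and interval are arbitrary, and the call would have to be added manually to an early entry point of the app.

```python
import faulthandler

def watch_for_hang(path="hang_traceback.log", every=30):
    """Dump the stack of every Python thread to `path` each `every`
    seconds, so a silent hang can be located in a traceback."""
    f = open(path, "w")  # keep the handle alive; faulthandler writes to its fd
    faulthandler.dump_traceback_later(every, repeat=True, file=f)
    return f
```

If the periodic dumps keep showing the same frame inside the model loader, that frame (rather than a true deadlock) is where the time is going.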

Version Platform Description

Setup :

  • SDnext branch: dev
  • Python Version: 3.11.9
  • Operating System: Windows 10, version 10.0.22631
  • CPU: 12th Gen Intel(R) Core(TM) i7-12700KF
  • Architecture: AMD64
  • GPU: NVIDIA GeForce RTX 3070 Ti
  • RAM: 32GB
  • CUDA Version: 12.4
  • CUDNN Version: 90100
  • GPU Driver: 565.90
  • Memory Optimization: medvram
  • Installed Torch Version: 2.4.1+cu124
  • Installed Diffusers Version: 0.31.0.dev0
  • Installed Gradio Version: 3.43.2
  • Installed Transformers Version: 4.45.2
  • Installed Accelerate Version: 1.0.0
  • Backend: Diffusers
  • Torch Parameters:
    • Backend: CUDA
    • Device: CUDA
    • Data type: torch.bfloat16
    • Attention Optimization: Scaled-Dot-Product
  • Model Loaded: Diffusers - FLUX.1-dev-qint4_tf-qint8_te

Relevant log output

2024-10-12 10:42:32,564 | sd | INFO | launch | Starting SD.Next
2024-10-12 10:42:32,567 | sd | INFO | installer | Logger: file="C:\Users\kille\Documents\Workspace\automatic\sdnext.log" level=INFO size=96903 mode=append
2024-10-12 10:42:32,568 | sd | INFO | installer | Python: version=3.11.9 platform=Windows bin="C:\Users\kille\Documents\Workspace\automatic\venv\Scripts\python.exe" venv="C:\Users\kille\Documents\Workspace\automatic\venv"
2024-10-12 10:42:32,719 | sd | INFO | installer | Version: app=sd.next updated=2024-10-11 hash=f5253dad branch=dev url=https://github.com/vladmandic/automatic.git/tree/dev ui=dev
2024-10-12 10:42:33,269 | sd | INFO | installer | Repository latest available e7ec07f9783701629ca1411ad82aec87232501b9 2024-09-13T16:51:56Z
2024-10-12 10:42:33,284 | sd | INFO | launch | Platform: arch=AMD64 cpu=Intel64 Family 6 Model 151 Stepping 2, GenuineIntel system=Windows release=Windows-10-10.0.22631-SP0 python=3.11.9
2024-10-12 10:42:33,285 | sd | DEBUG | installer | Setting environment tuning
2024-10-12 10:42:33,286 | sd | DEBUG | installer | Torch allocator: "garbage_collection_threshold:0.65,max_split_size_mb:512"
2024-10-12 10:42:33,294 | sd | DEBUG | installer | Torch overrides: cuda=False rocm=False ipex=False diml=False openvino=False zluda=False
2024-10-12 10:42:33,302 | sd | INFO | installer | CUDA: nVidia toolkit detected
2024-10-12 10:42:33,431 | sd | INFO | installer | Verifying requirements
2024-10-12 10:42:33,438 | sd | INFO | installer | Verifying packages
2024-10-12 10:42:33,473 | sd | DEBUG | installer | Timestamp repository update time: Fri Oct 11 15:53:46 2024
2024-10-12 10:42:33,473 | sd | DEBUG | installer | Timestamp previous setup time: Fri Oct 11 23:29:15 2024
2024-10-12 10:42:33,473 | sd | INFO | installer | Extensions: disabled=[]
2024-10-12 10:42:33,474 | sd | INFO | installer | Extensions: enabled=['Lora', 'sd-extension-chainner', 'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg'] extensions-builtin
2024-10-12 10:42:33,479 | sd | DEBUG | installer | Timestamp latest extensions time: Fri Oct 11 22:48:19 2024
2024-10-12 10:42:33,479 | sd | DEBUG | installer | Timestamp: version:1728654826 setup:1728682155 extension:1728679699
2024-10-12 10:42:33,479 | sd | INFO | launch | Startup: quick launch
2024-10-12 10:42:33,480 | sd | DEBUG | paths | Register paths
2024-10-12 10:42:33,481 | sd | INFO | installer | Extensions: disabled=[]
2024-10-12 10:42:33,481 | sd | INFO | installer | Extensions: enabled=['Lora', 'sd-extension-chainner', 'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg'] extensions-builtin
2024-10-12 10:42:33,483 | sd | INFO | installer | Running in safe mode without user extensions
2024-10-12 10:42:33,487 | sd | DEBUG | installer | Extension preload: {'extensions-builtin': 0.0}
2024-10-12 10:42:33,487 | sd | DEBUG | launch | Starting module: <module 'webui' from 'C:\\Users\\kille\\Documents\\Workspace\\automatic\\webui.py'>
2024-10-12 10:42:33,487 | sd | INFO | launch | Command line args: ['--safe'] safe=True
2024-10-12 10:42:33,488 | sd | DEBUG | launch | Env flags: []
2024-10-12 10:42:43,403 | sd | INFO | loader | System packages: {'torch': '2.4.1+cu124', 'diffusers': '0.31.0.dev0', 'gradio': '3.43.2', 'transformers': '4.45.2', 'accelerate': '1.0.0'}
2024-10-12 10:42:44,254 | sd | DEBUG | shared | Huggingface cache: folder="C:\Users\kille\.cache\huggingface\hub"
2024-10-12 10:42:44,367 | sd | INFO | shared | Device detect: memory=8.0 ptimization=medvram
2024-10-12 10:42:44,369 | sd | DEBUG | shared | Read: file="config.json" json=42 bytes=1948 time=0.000
2024-10-12 10:42:44,369 | sd | INFO | shared | Engine: backend=Backend.DIFFUSERS compute=None device=cuda attention="Scaled-Dot-Product" mode=no_grad
2024-10-12 10:42:44,377 | sd | DEBUG | shared | Read: file="html\reference.json" json=52 bytes=29118 time=0.007
2024-10-12 10:42:44,411 | sd | INFO | devices | Torch parameters: backend=cuda device=cuda config=BF16 dtype=torch.bfloat16 vae=torch.bfloat16 unet=torch.bfloat16 context=no_grad nohalf=False nohalfvae=False upscast=False deterministic=False test-fp16=True test-bf16=True optimization="Scaled-Dot-Product"
2024-10-12 10:42:44,944 | sd | DEBUG | __init__ | ONNX: version=1.19.2 provider=CPUExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
2024-10-12 10:42:45,044 | sd | INFO | shared | Device: device=NVIDIA GeForce RTX 3070 Ti n=1 arch=sm_90 capability=(8, 6) cuda=12.4 cudnn=90100 driver=565.90
2024-10-12 10:42:45,121 | sd | DEBUG | sd_hijack | Importing LDM
2024-10-12 10:42:45,134 | sd | DEBUG | webui | Entering start sequence
2024-10-12 10:42:45,136 | sd | DEBUG | webui | Initializing
2024-10-12 10:42:45,167 | sd | INFO | sd_vae | Available VAEs: path="models\VAE" items=0
2024-10-12 10:42:45,169 | sd | INFO | sd_unet | Available UNets: path="models\UNET" items=0
2024-10-12 10:42:45,170 | sd | INFO | model_te | Available TEs: path="models\Text-encoder" items=0
2024-10-12 10:42:45,171 | sd | INFO | extensions | Disabled extensions: ['sd-extension-chainner', 'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg']
2024-10-12 10:42:45,173 | sd | DEBUG | modelloader | Scanning diffusers cache: folder="models\Diffusers" items=1 time=0.00
2024-10-12 10:42:45,173 | sd | INFO | sd_models | Available Models: path="models\Stable-diffusion" items=1 time=0.00
2024-10-12 10:42:45,243 | sd | INFO | yolo | Available Yolo: path="models\yolo items=5 downloaded=0
2024-10-12 10:42:45,244 | sd | DEBUG | webui | Load extensions
2024-10-12 10:42:45,301 | sd | INFO | networks | Available LoRAs: items=0 folders=2
2024-10-12 10:42:45,304 | sd | DEBUG | script_loading | Extension: script='extensions-builtin\Lora\scripts\lora_script.py'
2024-10-12 10:42:45,309 | sd | DEBUG | webui | Extensions init time: 0.06 
2024-10-12 10:42:45,330 | sd | DEBUG | shared | Read: file="html/upscalers.json" json=4 bytes=2672 time=0.006
2024-10-12 10:42:45,331 | sd | INFO | modelloader | Available Upscalers: items=29 downloaded=0 user=0 time=0.02 types=['None', 'Lanczos', 'Nearest', 'AuraSR', 'ESRGAN', 'LDSR', 'RealESRGAN', 'SCUNet', 'SD', 'SwinIR']
2024-10-12 10:42:45,768 | sd | INFO | styles | Available Styles: folder="models\styles" items=288 time=0.44
2024-10-12 10:42:45,773 | sd | DEBUG | webui | Creating UI
2024-10-12 10:42:45,773 | sd | DEBUG | theme | UI themes available: type=Standard themes=12
2024-10-12 10:42:45,773 | sd | INFO | theme | UI theme: type=Standard name="black-teal"
2024-10-12 10:42:45,777 | sd | DEBUG | ui_javascript | UI theme: css="C:\Users\kille\Documents\Workspace\automatic\javascript\black-teal.css" base="sdnext.css" user="None"
2024-10-12 10:42:45,779 | sd | DEBUG | ui_txt2img | UI initialize: txt2img
2024-10-12 10:42:45,800 | sd | DEBUG | ui_extra_networks | Networks: page='model' items=52 subfolders=2 tab=txt2img folders=['models\\Stable-diffusion', 'models\\Diffusers', 'models\\Reference'] list=0.01 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-12 10:42:45,800 | sd | DEBUG | ui_extra_networks | Networks: page='lora' items=0 subfolders=0 tab=txt2img folders=['models\\Lora', 'models\\LyCORIS'] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-12 10:42:45,805 | sd | DEBUG | ui_extra_networks | Networks: page='style' items=288 subfolders=1 tab=txt2img folders=['models\\styles', 'html'] list=0.01 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-12 10:42:45,808 | sd | DEBUG | ui_extra_networks | Networks: page='embedding' items=0 subfolders=0 tab=txt2img folders=['models\\embeddings'] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-12 10:42:45,808 | sd | DEBUG | ui_extra_networks | Networks: page='vae' items=0 subfolders=0 tab=txt2img folders=['models\\VAE'] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-12 10:42:45,809 | sd | DEBUG | ui_extra_networks | Networks: page='history' items=0 subfolders=0 tab=txt2img folders=[] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=8 sort=Default
2024-10-12 10:42:45,922 | sd | DEBUG | ui_img2img | UI initialize: img2img
2024-10-12 10:42:46,102 | sd | DEBUG | ui_control_helpers | UI initialize: control models=models\control
2024-10-12 10:42:46,578 | sd | DEBUG | shared | Read: file="ui-config.json" json=6 bytes=248 time=0.003
2024-10-12 10:42:46,664 | sd | DEBUG | theme | UI themes available: type=Standard themes=12
2024-10-12 10:42:46,763 | sd | DEBUG | shared | Reading failed: C:\Users\kille\Documents\Workspace\automatic\html\extensions.json [Errno 2] No such file or directory: 'C:\\Users\\kille\\Documents\\Workspace\\automatic\\html\\extensions.json'
2024-10-12 10:42:46,763 | sd | INFO | ui_extensions | Extension list is empty: refresh required
2024-10-12 10:42:47,173 | sd | DEBUG | ui_extensions | Extension list: processed=6 installed=6 enabled=2 disabled=4 visible=6 hidden=0
2024-10-12 10:42:47,251 | sd | DEBUG | webui | Root paths: ['C:\\Users\\kille\\Documents\\Workspace\\automatic']
2024-10-12 10:42:47,307 | sd | INFO | webui | Local URL: http://127.0.0.1:7860/
2024-10-12 10:42:47,309 | sd | DEBUG | webui | Gradio functions: registered=1830
2024-10-12 10:42:47,310 | sd | DEBUG | middleware | FastAPI middleware: ['Middleware', 'Middleware']
2024-10-12 10:42:47,312 | sd | DEBUG | webui | Creating API
2024-10-12 10:42:47,564 | sd | DEBUG | webui | Scripts setup: ['IP Adapters:0.017', 'XYZ Grid:0.018', 'Face:0.01', 'AnimateDiff:0.005', 'CogVideoX:0.005']
2024-10-12 10:42:47,564 | sd | DEBUG | sd_models | Model metadata: file="metadata.json" no changes
2024-10-12 10:42:47,565 | sd | DEBUG | modeldata | Model requested: fn=C:\Users\kille\Documents\Workspace\automatic\webui.py:<lambda>/C:\Program Files\Python311\Lib\threading.py:run
2024-10-12 10:42:47,565 | sd | INFO | sd_models | Load model: select="Diffusers\Disty0/FLUX.1-dev-qint4_tf-qint8_te [e40bd0d879]"
2024-10-12 10:42:47,567 | sd | DEBUG | sd_models | Load model: target="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b59d5b6fca9233accfaed08e0" existing=False info=None
2024-10-12 10:42:47,567 | sd | DEBUG | sd_models | Load model: path="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b59d5b6fca9233accfaed08e0"
2024-10-12 10:42:47,567 | sd | INFO | sd_models | Autodetect model: detect="FLUX" class=FluxPipeline file="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b59d5b6fca9233accfaed08e0" size=0MB
2024-10-12 10:42:47,572 | sd | DEBUG | model_flux | Load model: type=FLUX model="Diffusers\Disty0/FLUX.1-dev-qint4_tf-qint8_te" repo="Disty0/FLUX.1-dev-qint4_tf-qint8_te" unet="None" t5="T5 QINT8" vae="Automatic" quant=qint8 offload=model dtype=torch.bfloat16
2024-10-12 10:42:48,049 | sd | INFO | modelloader | HF login: token="C:\Users\kille\.cache\huggingface\token" Token is valid (permission: fineGrained).
2024-10-12 10:43:04,389 | sd | DEBUG | devices | GC: utilization={'gpu': 14, 'ram': 20, 'threshold': 65} gc={'collected': 138, 'saved': 0.0} before={'gpu': 1.1, 'ram': 6.5} after={'gpu': 1.1, 'ram': 6.5, 'retries': 0, 'oom': 0} device=cuda fn=optimum_quanto_model time=0.17
2024-10-12 10:43:04,621 | sd | INFO | server | MOTD: N/A
2024-10-12 10:43:06,825 | sd | DEBUG | theme | UI themes available: type=Standard themes=12
2024-10-12 10:43:06,983 | sd | INFO | api | Browser session: user=None client=127.0.0.1 agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36
2024-10-12 10:43:14,930 | sd | DEBUG | model_flux | Load model: type=FLUX preloaded=['transformer', 'text_encoder_2']
2024-10-12 10:43:15,736 | sd | DEBUG | sd_models | Load module: type=t5 path="T5 QINT8" module="text_encoder_2"
2024-10-12 10:43:15,737 | sd | INFO | textual_inversion | Load network: type=embeddings loaded=0 skipped=0 time=0.00
2024-10-12 10:43:15,738 | sd | DEBUG | sd_models | Setting model: component=VAE upcast=False
2024-10-12 10:43:15,738 | sd | DEBUG | sd_models | Setting model: component=VAE slicing=True
2024-10-12 10:43:15,738 | sd | DEBUG | sd_models | Setting model: component=VAE tiling=True
2024-10-12 10:43:15,738 | sd | DEBUG | sd_models | Setting model: attention="Scaled-Dot-Product"
2024-10-12 10:43:15,749 | sd | DEBUG | sd_models | Setting model: offload=model
2024-10-12 10:43:16,051 | sd | DEBUG | devices | GC: utilization={'gpu': 14, 'ram': 62, 'threshold': 65} gc={'collected': 611, 'saved': 0.0} before={'gpu': 1.1, 'ram': 19.74} after={'gpu': 1.1, 'ram': 19.74, 'retries': 0, 'oom': 0} device=cuda fn=load_diffuser time=0.16
2024-10-12 10:43:16,054 | sd | INFO | sd_models | Load model: time=28.32 load=28.17 move=0.14 native=1024 memory={'ram': {'used': 19.74, 'total': 31.85}, 'gpu': {'used': 1.1, 'total': 8.0}, 'retries': 0, 'oom': 0}
2024-10-12 10:43:16,058 | sd | DEBUG | script_callbacks | Script callback init time: system-info.py:app_started=0.18
2024-10-12 10:43:16,058 | sd | INFO | webui | Startup time: 42.57 torch=7.63 gradio=2.01 diffusers=0.11 libraries=1.88 extensions=0.06 detailer=0.07 networks=0.44 ui-networks=0.21 ui-txt2img=0.10 ui-img2img=0.06 ui-control=0.09 ui-models=0.29 ui-settings=0.19 ui-extensions=0.43 launch=0.09 api=0.07 app-started=0.18 checkpoint=28.49
2024-10-12 10:43:16,060 | sd | DEBUG | shared | Save: file="config.json" json=42 bytes=1878 time=0.003
2024-10-12 11:04:25,598 | sd | INFO | sd_models | Load model: select="Diffusers\Disty0/FLUX.1-dev-qint4_tf-qint8_te [e40bd0d879]"
2024-10-12 11:04:25,599 | sd | DEBUG | sd_models | Load model: target="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b59d5b6fca9233accfaed08e0" existing=False info=None

Backend

Diffusers

UI

Standard

Branch

Dev

Model

Other

Acknowledgements

  • I have read the above and searched for existing issues
  • I confirm that this is classified correctly and it's not an extension issue
@vladmandic
Owner

First, in Windows, disable NVIDIA's use of shared system memory (google for instructions)!
When VRAM spills into RAM, the whole thing becomes so slow that it looks like it hangs.

Then, let's look at memory utilization. In Windows Task Manager:

  1. Settings -> real-time update speed -> low
  2. Performance -> GPU
  3. Start SD.Next
  4. Attempt to load Flux as usual
  5. Take a screenshot of the Task Manager window after 1 min so I can see how GPU VRAM utilization grows over time.
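As a command-line alternative to Task Manager screenshots, the VRAM growth can also be logged with `nvidia-smi` polling. A minimal sketch, assuming `nvidia-smi` is on PATH; the interval and duration are arbitrary:

```python
import subprocess
import time

def parse_vram(line):
    """Parse one CSV line from nvidia-smi into (used_mb, total_mb)."""
    used, total = (int(x.strip()) for x in line.split(","))
    return used, total

def poll_vram(seconds=60, every=5):
    """Print dedicated VRAM usage every few seconds while the model loads."""
    cmd = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
           "--format=csv,noheader,nounits"]
    for _ in range(seconds // every):
        out = subprocess.run(cmd, capture_output=True, text=True).stdout
        for i, line in enumerate(out.strip().splitlines()):
            used, total = parse_vram(line)
            print(f"gpu{i}: {used}/{total} MiB")
        time.sleep(every)
```

Note that `nvidia-smi` reports dedicated VRAM only; spill into shared system memory shows up under "Shared GPU memory" in Task Manager, which is why the dedicated/shared split matters here.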

@vladmandic vladmandic added the question Further information is requested label Oct 12, 2024
@Olivier-aka-Raiden
Author

The Flux model not loading:
image
Another model that loads normally:
image

@vladmandic
Owner


Sorry to be a pain, but you cropped the screenshot so the numbers below the graphs are not visible. I need to see the dedicated/shared splits.

@Olivier-aka-Raiden
Author


image
Sorry for the late reply. It's not used at all, anyway.

@vladmandic vladmandic added help wanted Extra attention is needed and removed question Further information is requested labels Oct 21, 2024