DirectML

SD.Next includes support for PyTorch-DirectML.

How to

Add --use-directml to your command-line arguments.

For details, go to Installation.
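As a minimal sketch, launching with the flag might look like the following. It assumes you start SD.Next through the usual launcher script in the SD.Next folder (webui.bat on Windows, webui.sh in a Linux-style shell); adjust the script name if your setup differs.

```shell
# Run from the SD.Next folder.
# Windows (Command Prompt or PowerShell):
#   webui.bat --use-directml
# Linux-style shell (for example WSL):
./webui.sh --use-directml
```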

Performance

Performance is considerably worse than with ROCm.

If you are comfortable with Linux, we recommend using ROCm instead.

FAQ

DirectML does not collect garbage memory.

PyTorch-DirectML does not access graphics memory by indexing. Because PyTorch-DirectML's tensor implementation extends OpaqueTensorImpl, we cannot access the actual storage of a tensor.

An error occurs with no error message.

If you encounter a RuntimeError with an empty or missing error message, please report it to us via a GitHub issue or Discord.

It does not work properly with FP16.

If it works with FP32, please report it to us via a GitHub issue or Discord.

The terminal is suddenly frozen during generation.

Please report it to us via a GitHub issue or Discord.

Olive (experimental support)

Olive is an easy-to-use hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation. (from pypi)

Currently, SDXL is not supported.

This feature is EXPERIMENTAL. Running it may break your existing installation, so use a new installation or a new virtual environment.

How to

Switch to the olive branch.

You do not need to modify your command-line arguments.
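The branch switch can be sketched with plain git commands. This assumes SD.Next was installed by cloning its repository and that the branch is available on your default remote; run it from the SD.Next folder, ideally in a fresh clone as recommended above.

```shell
# Run from the SD.Next folder (preferably a fresh clone).
git fetch            # make sure the remote branch list is up to date
git checkout olive   # switch the working tree to the olive branch
```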

Go to System tab → Diffusers Settings and set Diffusers pipeline to ONNX Stable Diffusion (Olive).

Guide on YouTube:

From checkpoint

Model optimization occurs automatically before generation.

Target models can be in .safetensors, .ckpt, or Diffusers format. Optimization takes 5-10 minutes, depending on your system.

The optimized models are automatically cached and used later to create images of the same size (height and width).
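One way to picture the per-size cache is a lookup keyed on the model name plus height and width. The sketch below is purely illustrative: the cache_key and is_cached functions and the models/olive-cache directory are hypothetical names chosen for this example, not SD.Next's actual cache layout.

```shell
#!/bin/sh
# Hypothetical illustration of a per-size cache; not SD.Next's real layout.
CACHE_DIR="${CACHE_DIR:-models/olive-cache}"   # assumed location

# Build a cache key from model name, height and width.
cache_key() {
    printf '%s-%sx%s\n' "$1" "$2" "$3"
}

# Succeed if an optimized copy for this model and size already exists.
is_cached() {
    [ -d "$CACHE_DIR/$(cache_key "$1" "$2" "$3")" ]
}

if is_cached stable-diffusion-v1-5 512 512; then
    echo "reuse cached model"
else
    echo "optimization needed (about 5-10 minutes)"
fi
```

Under this scheme a 512x512 entry would not be reused for a 512x768 request, which matches the behaviour described above: changing the image size triggers a new optimization.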

From Huggingface

If your system does not have enough memory to optimize a model, or you do not want to spend time optimizing it yourself, you can download an already optimized model from Huggingface.

Go to Models → Huggingface tab and download an optimized model.

There's an optimized version of runwayml/stable-diffusion-v1-5.

Guide on YouTube:

Performance

Property	Value
Prompt	a castle, best quality
Negative Prompt	worst quality
Sampler	Euler
Sampling Steps	20
Device	RX 7900 XTX 24GB
Version	olive-ai(0.3.3) onnxruntime-directml(1.16.1) ROCm(5.6) torch(olive: 1.13.1, rocm: 2.1.0)
Model	runwayml/stable-diffusion-v1-5 (ROCm), lshqqytiger/stable-diffusion-v1-5-olive (Olive)
Precision	fp16
Token Merging	Olive(0, not supported) ROCm(0.5)

[Comparison images: Olive | ROCm]

Pros and Cons

Pros

The generation is faster than PyTorch-DirectML.
Uses less graphics memory than PyTorch-DirectML.
Uses graphics memory more efficiently than PyTorch-DirectML.

Cons

Optimization is required for every model and image size.
Some features are unavailable.

FAQ

An error occurs at the beginning of the generation process.

Run this command and try again:

(venv) $ pip uninstall onnxruntime onnxruntime-directml -y
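To see which of the two packages are present before or after running the command, you could filter the output of pip list. The helper name list_onnxruntime below is hypothetical and only used for illustration.

```shell
# Keep only the onnxruntime lines from `pip list` style output on stdin.
# list_onnxruntime is a hypothetical helper name used for illustration.
list_onnxruntime() {
    grep -i '^onnxruntime' || true   # no match is not an error
}

# Usage, inside the same virtual environment:
#   pip list | list_onnxruntime
```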
