
Quantization error when running DialogGen inference in a container environment on 8x RTX 4090 #218

kanebay opened this issue Dec 3, 2024 · 0 comments
kanebay commented Dec 3, 2024

Attempting to load the quantized DialogGen model via --load-4bit:

root@nv-4090:/workspace# python sample_t2i.py --prompt "生命的意义" --image-size 1024 1024 --infer-mode fa --load-4bit
2024-12-03 18:20:17.253 | INFO | hydit.inference:__init__:160 - Got text-to-image model root path: ckpts/t2i
2024-12-03 18:20:17.253 | INFO | hydit.inference:__init__:169 - Loading CLIP Text Encoder...
2024-12-03 18:20:18.675 | INFO | hydit.inference:__init__:172 - Loading CLIP Text Encoder finished
2024-12-03 18:20:18.676 | INFO | hydit.inference:__init__:175 - Loading CLIP Tokenizer...
2024-12-03 18:20:18.718 | INFO | hydit.inference:__init__:178 - Loading CLIP Tokenizer finished
2024-12-03 18:20:18.718 | INFO | hydit.inference:__init__:181 - Loading T5 Text Encoder and T5 Tokenizer...
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
/usr/local/lib/python3.10/dist-packages/transformers/convert_slow_tokenizer.py:550: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
warnings.warn(
You are using a model of type mt5 to instantiate a model of type t5. This is not supported for all configurations of models and can yield errors.
2024-12-03 18:20:28.823 | INFO | hydit.inference:__init__:185 - Loading t5_text_encoder and t5_tokenizer finished
2024-12-03 18:20:28.823 | INFO | hydit.inference:__init__:188 - Loading VAE...
2024-12-03 18:20:28.941 | INFO | hydit.inference:__init__:191 - Loading VAE finished
2024-12-03 18:20:28.941 | INFO | hydit.inference:__init__:195 - Building HunYuan-DiT model...
2024-12-03 18:20:28.965 | INFO | hydit.modules.models:__init__:206 - Enable Flash Attention.
2024-12-03 18:20:29.406 | INFO | hydit.modules.models:__init__:239 - Number of tokens: 4096
2024-12-03 18:20:47.221 | INFO | hydit.inference:__init__:216 - Loading torch model ckpts/t2i/model/pytorch_model_ema.pt...
2024-12-03 18:20:49.960 | INFO | hydit.inference:__init__:229 - Loading torch model finished
2024-12-03 18:20:49.960 | INFO | hydit.inference:__init__:254 - Loading inference pipeline...
2024-12-03 18:20:49.982 | INFO | hydit.inference:__init__:256 - Loading pipeline finished
2024-12-03 18:20:49.982 | INFO | hydit.inference:__init__:260 - ==================================================
2024-12-03 18:20:49.982 | INFO | hydit.inference:__init__:261 - Model is ready.
2024-12-03 18:20:49.982 | INFO | hydit.inference:__init__:262 - ==================================================
2024-12-03 18:20:50.097 | INFO | __main__:inferencer:21 - Loading DialogGen model (for prompt enhancement)...
Traceback (most recent call last):
File "/workspace/sample_t2i.py", line 31, in <module>
args, gen, enhancer = inferencer()
File "/workspace/sample_t2i.py", line 22, in inferencer
enhancer = DialogGen(str(models_root_path / "dialoggen"), args.load_4bit)
File "/workspace/dialoggen/dialoggen_demo.py", line 153, in __init__
self.models = init_dialoggen_model(model_path, load_4bit=load_4bit)
File "/workspace/dialoggen/dialoggen_demo.py", line 55, in init_dialoggen_model
tokenizer, model, image_processor, context_len = load_pretrained_model(
File "/workspace/dialoggen/llava/model/builder.py", line 35, in load_pretrained_model
kwargs['quantization_config'] = BitsAndBytesConfig(
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/quantization_config.py", line 281, in __init__
self.post_init()
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/quantization_config.py", line 327, in post_init
if self.load_in_4bit and not version.parse(importlib.metadata.version("bitsandbytes")) >= version.parse(
File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 996, in version
return distribution(distribution_name).version
File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 969, in distribution
return Distribution.from_name(distribution_name)
File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 548, in from_name
raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes


Installed the missing dependency:

pip install --upgrade "bitsandbytes>=0.43.2"

Running again fails with a different error:

root@nv-4090:/workspace# python sample_t2i.py --prompt "生命的意义" --image-size 1024 1024 --infer-mode fa --load-4bit
2024-12-03 18:34:33.930 | INFO | hydit.inference:__init__:160 - Got text-to-image model root path: ckpts/t2i
2024-12-03 18:34:33.931 | INFO | hydit.inference:__init__:169 - Loading CLIP Text Encoder...
2024-12-03 18:34:34.734 | INFO | hydit.inference:__init__:172 - Loading CLIP Text Encoder finished
2024-12-03 18:34:34.735 | INFO | hydit.inference:__init__:175 - Loading CLIP Tokenizer...
2024-12-03 18:34:34.776 | INFO | hydit.inference:__init__:178 - Loading CLIP Tokenizer finished
2024-12-03 18:34:34.776 | INFO | hydit.inference:__init__:181 - Loading T5 Text Encoder and T5 Tokenizer...
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
/usr/local/lib/python3.10/dist-packages/transformers/convert_slow_tokenizer.py:550: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
warnings.warn(
You are using a model of type mt5 to instantiate a model of type t5. This is not supported for all configurations of models and can yield errors.
2024-12-03 18:34:43.393 | INFO | hydit.inference:__init__:185 - Loading t5_text_encoder and t5_tokenizer finished
2024-12-03 18:34:43.393 | INFO | hydit.inference:__init__:188 - Loading VAE...
2024-12-03 18:34:43.510 | INFO | hydit.inference:__init__:191 - Loading VAE finished
2024-12-03 18:34:43.510 | INFO | hydit.inference:__init__:195 - Building HunYuan-DiT model...
2024-12-03 18:34:43.534 | INFO | hydit.modules.models:__init__:206 - Enable Flash Attention.
2024-12-03 18:34:43.964 | INFO | hydit.modules.models:__init__:239 - Number of tokens: 4096
2024-12-03 18:35:00.871 | INFO | hydit.inference:__init__:216 - Loading torch model ckpts/t2i/model/pytorch_model_ema.pt...
2024-12-03 18:35:03.499 | INFO | hydit.inference:__init__:229 - Loading torch model finished
2024-12-03 18:35:03.499 | INFO | hydit.inference:__init__:254 - Loading inference pipeline...
2024-12-03 18:35:03.519 | INFO | hydit.inference:__init__:256 - Loading pipeline finished
2024-12-03 18:35:03.519 | INFO | hydit.inference:__init__:260 - ==================================================
2024-12-03 18:35:03.519 | INFO | hydit.inference:__init__:261 - Model is ready.
2024-12-03 18:35:03.519 | INFO | hydit.inference:__init__:262 - ==================================================
2024-12-03 18:35:03.639 | INFO | __main__:inferencer:21 - Loading DialogGen model (for prompt enhancement)...
Traceback (most recent call last):
File "/workspace/sample_t2i.py", line 31, in <module>
args, gen, enhancer = inferencer()
File "/workspace/sample_t2i.py", line 22, in inferencer
enhancer = DialogGen(str(models_root_path / "dialoggen"), args.load_4bit)
File "/workspace/dialoggen/dialoggen_demo.py", line 153, in __init__
self.models = init_dialoggen_model(model_path, load_4bit=load_4bit)
File "/workspace/dialoggen/dialoggen_demo.py", line 55, in init_dialoggen_model
tokenizer, model, image_processor, context_len = load_pretrained_model(
File "/workspace/dialoggen/llava/model/builder.py", line 141, in load_pretrained_model
model = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2977, in from_pretrained
raise ValueError(
ValueError: You can't pass `load_in_4bit` or `load_in_8bit` as a kwarg when passing `quantization_config` argument at the same time.
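This second error suggests that `load_pretrained_model` in dialoggen/llava/model/builder.py passes both a bare `load_in_4bit` kwarg and an explicit `quantization_config` to `from_pretrained`, a combination that newer transformers versions reject. A likely workaround (an assumption on my part, not a confirmed fix from the maintainers) is to drop the redundant flags whenever a `quantization_config` is present. Sketched below without transformers installed, using a plain dict as a stand-in for the `BitsAndBytesConfig` object built in builder.py:

```python
def dedupe_quant_kwargs(kwargs: dict) -> dict:
    """Remove bare load_in_4bit/load_in_8bit flags when an explicit
    quantization_config is also present, since newer transformers
    versions refuse to accept both together (the ValueError above)."""
    cleaned = dict(kwargs)
    if "quantization_config" in cleaned:
        cleaned.pop("load_in_4bit", None)
        cleaned.pop("load_in_8bit", None)
    return cleaned


# Example kwargs mirroring what builder.py appears to assemble;
# the dict stands in for a real BitsAndBytesConfig instance.
kwargs = {
    "low_cpu_mem_usage": True,
    "load_in_4bit": True,                           # redundant bare flag
    "quantization_config": {"load_in_4bit": True},  # BitsAndBytesConfig stand-in
}
print(dedupe_quant_kwargs(kwargs))
```

Equivalently, editing builder.py so it never sets `kwargs['load_in_4bit'] = True` when it also builds a `BitsAndBytesConfig` should avoid the conflict at the source.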
