issue in Generating headshot photo of myself with SDXL lora training #2971

mkafil · 2024-11-18T06:36:12Z

I am using kohya_ss to train an SDXL LoRA and generate headshot photos of myself, but the generated face does not closely resemble mine.

Here are my inputs and params:

Input photos:

Params:

%cd /kaggle/working/kohya_ss/
!accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --train_data_dir="/kaggle/working/training-data/img" \
  --output_dir="/content/drive/MyDrive/model-sdxl" \
  --logging_dir="/kaggle/working/training-data/log" \
  --reg_data_dir="/kaggle/working/training-data/reg" \
  --resolution="768,768" \
  --network_alpha="16" \
  --network_dim=32 \
  --save_model_as=safetensors \
  --network_module=networks.lora \
  --text_encoder_lr=0.0004 \
  --unet_lr=0.0004 \
  --output_name="dreambooth-lora" \
  --lr_scheduler_num_cycles="30" \
  --no_half_vae \
  --learning_rate=1e-5 \
  --lr_scheduler="cosine" \
  --train_batch_size="1" \
  --max_train_steps=5000 \
  --save_every_n_epochs="1" \
  --mixed_precision="fp16" \
  --save_precision="fp16" \
  --optimizer_type="Adafactor" \
  --optimizer_args scale_parameter=False relative_step=False warmup_init=False \
  --max_data_loader_n_workers="0" \
  --gradient_checkpointing \
  --xformers \
  --lowram \
  --face_crop_aug_range="2.0,4.0" \
  --clip_skip 1 --noise_offset=0.1 \
  --mem_eff_attn --cache_latents --cache_latents_to_disk

Output generated:

I have tried increasing the number of steps and the learning rate, but then the generated images are overcooked (overfitted): the model reproduces the input images as its output.

I am expecting results like Mr. FurkanGozukara's. He has managed to generate better results.

Note: I am using a caption for each image and 2500 reg images.

Thank you!

5KilosOfCheese · 2024-11-18T12:09:19Z

Make a more diverse dataset. You have basically just 4 variants of the same exact picture.

Furkan's example ain't that good either. You can see the limited angles and expressions; that is because they generally have fairly limited datasets. I can't comment beyond that, as they lock everything behind paywalls.

Anyhow.
Diversify your dataset. And stop stressing about regularization pictures: unless you want to generate things/people other than the subject alongside the thing you train, you don't need reg images to succeed.

Take pictures of your face from 4-5 angles at 4-5 heights, and then add 3-5 distances. Use a slightly different expression in each of them.
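To make that concrete, here is a minimal sketch of a shot checklist. The labels are hypothetical (the comment only gives counts), and crossing angles with heights while cycling distances and expressions is just one way to read the advice, not a prescribed recipe.

```python
from itertools import product

# Hypothetical labels: the comment only gives counts, not names.
ANGLES = ["left profile", "left three-quarter", "front",
          "right three-quarter", "right profile"]
HEIGHTS = ["below chin", "chin level", "eye level", "above head"]
DISTANCES = ["close-up", "head and shoulders", "waist up"]
EXPRESSIONS = ["neutral", "slight smile", "broad smile", "serious"]


def shot_list(angles, heights, distances, expressions):
    """One shot per (angle, height) pair; distance and expression are
    cycled so that consecutive shots never share the same combination."""
    shots = []
    for i, (angle, height) in enumerate(product(angles, heights)):
        shots.append({
            "angle": angle,
            "height": height,
            "distance": distances[i % len(distances)],
            "expression": expressions[i % len(expressions)],
        })
    return shots


shots = shot_list(ANGLES, HEIGHTS, DISTANCES, EXPRESSIONS)
print(len(shots))  # 5 angles x 4 heights = 20 shots
for shot in shots[:3]:
    print(shot)
```

Twenty varied shots is already more than the roughly 10 good pictures mentioned further down, so there is room to throw away the bad ones.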

These training methods try to find the similarity between the dataset's contents. The trainer absolutely does not know what it should be learning; it builds a statistical model from the dataset. You can do a LoRA with about 10 good pictures, or 5 if you have a very clear category for the thing and regularization images to support it, as in "this is just a type of that". You can compensate for lower-quality and less clear pictures by having more of them, but only to a certain extent; bad ones just need to be removed.

This is a thing YouTube videos don't teach you, especially clickbaity ones.

2 replies

mkafil Nov 18, 2024
Author

Thank you, @5KilosOfCheese, for your response. I believe input images play a major role in training a model. Do you have any other suggestions related to the parameters?

Additionally, if I am not using regularization images, the generated outputs often have damaged facial features and distorted eyes. Could you provide guidance on how to address this issue? If possible, could you also share details of your best training experiments?

Which is the best method: LoRA, Dreambooth, or fine-tuning?

Which base model is better for generating images of myself in Stable Diffusion: SD1.5, SD2.5, SDXL, or FLUX (in terms of generating a face similar to mine)?

Thank you

5KilosOfCheese Nov 18, 2024

There are no set "perfect parameters". Every LoRA, concept, and dataset requires its own adjustments. I don't have any special suggestions in that regard; you just have to try and make notes.

Damaged output comes from damaged input. That is about as much as I know.

OK, so the difference between Dreambooth and LoRA.
Dreambooth allows you to fundamentally add or remove things from the base model; you adjust the contents of the model itself. It can also correct issues like bad tangential and related concepts.
LoRA gets injected during the generation process. It cannot fix tangential or related concepts if they aren't included in the training.

Fundamentally they do the exact same thing; however, one alters the model during training, the other during processing. Dreambooth allows you to fundamentally alter the model. LoRA allows you to alter the generation process.
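A toy illustration of that difference, as a sketch rather than kohya_ss's actual code: a LoRA stores two small matrices per layer, and their product, scaled by alpha / dim, is added onto the untouched base weight when the LoRA is loaded. The function names and the 2x2 numbers below are made up for illustration; alpha=0.5 with dim=1 gives the same 0.5 scale as --network_alpha=16 with --network_dim=32 in the command above.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_delta(up, down, alpha, dim):
    """Low-rank update (alpha / dim) * up @ down; dim is the LoRA rank."""
    scale = alpha / dim
    return [[scale * v for v in row] for row in matmul(up, down)]


def apply_lora(weight, delta, multiplier=1.0):
    """Return a patched copy of the weight; the base weight is not modified."""
    return [[w + multiplier * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy 2x2 base weight with a rank-1 LoRA (illustrative numbers only).
base = [[1.0, 0.0], [0.0, 1.0]]
up = [[0.5], [1.0]]      # shape 2 x 1
down = [[1.0, 2.0]]      # shape 1 x 2

delta = lora_delta(up, down, alpha=0.5, dim=1)
patched = apply_lora(base, delta)

print(patched)  # [[1.25, 0.5], [0.5, 2.0]]
print(base)     # [[1.0, 0.0], [0.0, 1.0]], the base model is unchanged
```

Setting multiplier to 0.0 gives back the base weights exactly, which is why a LoRA can be switched off or dialed down at generation time, whereas a Dreambooth checkpoint has the change baked into the weights.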

About base models. They can all be used to make high-quality outputs. I am most familiar with SDXL; however, I use Flux, SD3.5, and SDXL, and I can make them all do things equally well, just with different workflows. SD3.5 training is something I have yet to try, because kohya_ss doesn't have real support for it (yet) and I find OneTrainer annoying to work with. Everyone will say that their favourite base model/architecture is the best option. I say that they are all equally bad and good, depending on workflow, finetunes, and whatnot.
