How are regularization images used during LoRA training? #2056
Thank you for the explanation. I understand why we would want to use regularization images. However, I still don't know how kohya_ss actually uses them. My original question is still open: "Are they used during training the same way as normal images are, except that the prompt is changed in some way?"
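For what it's worth, the regularization-image idea comes from DreamBooth's "prior preservation" setup, and under that reading the answer would be: yes, the reg images go through the same training step as normal images, except they are captioned with the bare class prompt and their loss gets an extra weight. Below is a minimal sketch of that recipe with illustrative names only; I can't confirm from this thread that kohya_ss implements it exactly this way.

```python
# Hedged sketch of DreamBooth-style "prior preservation", which is where
# regularization images come from. Names (training_step, prior_loss_weight,
# batch keys) are illustrative; this is NOT kohya_ss's actual code.
import torch
import torch.nn.functional as F

def training_step(unet, noise_scheduler, instance_batch, reg_batch,
                  prior_loss_weight=1.0):
    """Instance and regularization images go through the same denoising loss;
    reg images are just captioned with the bare class prompt (e.g. "a photo
    of a man") and their loss is added with a weight."""
    def denoise_loss(batch):
        latents = batch["latents"]                       # pre-encoded VAE latents
        noise = torch.randn_like(latents)
        timesteps = torch.randint(0, 1000, (latents.shape[0],),
                                  device=latents.device)
        noisy = noise_scheduler.add_noise(latents, noise, timesteps)
        pred = unet(noisy, timesteps, batch["text_embeddings"]).sample
        return F.mse_loss(pred, noise)

    instance_loss = denoise_loss(instance_batch)   # your subject images + captions
    prior_loss = denoise_loss(reg_batch)           # reg images + plain class caption
    return instance_loss + prior_loss_weight * prior_loss
```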
I don't use reg images, and I don't even use captions unless I struggle to get the focus right. The original training papers don't even mention regularization images. Since the method and the math originate from those papers, I'm confident that random people making anime titties aren't any more knowledgeable.
What tilts me a fair bit is this: beyond some of the more arcane aspects, all the details on how these things work are in the papers, the GitHub repos, and similar documentation. Obviously you can't always know whether something is actually implemented and working correctly. My bar for whether a source is worth a damn is really whether it cites its sources properly, and I don't mean "link to where I read this" but citing and sourcing in a manner that would pass in an academic or professional setting. As for the discussion of reg images: if you get good quality results that meet the criteria you set without them, do you actually need them? Some issues people say can and should be solved with reg images, I have solved by switching the model I train from, only going back to base SDXL if I am desperate. Currently the best base I have come across, and the one I use, is FluentlyXL; no idea why, but it just gives me cleaner results.
Not the "passage of time" per se. Since all the random components in the training are generated from clock, which is fiddly piece of tech. Minute alternations on what the time is affect things. Nothing in these are 100% deterministic - we are talking about something that is inherently statistical afterall. But the time is never the reason why something works or doesn't work. It is just... tiny little things. But tiny little things can cause noticeable differences, but not to degree where it is "did work" or "didn't work". Like I talk alot about this stuff, since it is my current obsessive hobby (well training is. I am quite bad at generating, I just like the puzzle of training). And I always wonder whether I am doing "something wrong" to what other people are doing, because I can get things done just fine without captions, reg images, or... other bullshit reddit and whatever deems absolutely mandatory for getting good quality results. I don't do any of that and I get good results. If I can't get something to wrong I deep dive to documentation and papers, and the answers are generally just there in clear print. |
Just my five cents from experience about class images and tagging: I can't support the statement that tagging is unimportant. It is very important if the training subject includes multiple concepts, and a face already contains plenty of sub-concepts. Every describable feature is its own concept, and everything you want to be changeable needs to be tagged accurately. The way you need to tag can vary a lot, though: anime-style models tend to do well with keywords, realistic models more with proper sentences. Starting a caption with "a photo of" can make a big difference in how quickly you get a good quality result when training on realistic images (see the sketch below). It does not mean you need to use the term "photo" later when rendering images, but it helps the model move the right weights during training.
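If you want to try the "a photo of" prefix without editing every caption by hand, a few lines of Python will do it. This assumes the common convention of one .txt caption file per image; the folder name and prefix below are just examples.

```python
# Hedged sketch, assuming one .txt caption per image in the dataset folder.
# The folder name and the prefix are illustrative.
from pathlib import Path

dataset_dir = Path("train_data/10_mysubject")    # hypothetical image folder
prefix = "a photo of "

for caption_file in dataset_dir.glob("*.txt"):
    text = caption_file.read_text(encoding="utf-8").strip()
    if not text.lower().startswith(prefix.strip()):
        caption_file.write_text(prefix + text + "\n", encoding="utf-8")
```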
How are regularization images used during LoRA training? Are they used during training the same way as normal images are, except that the prompt is changed in some way?