[-] FactorSD@lemmy.dbzer0.com 4 points 1 year ago

A lot of the time I try to just let images come out as the AI imagines them - Just running img2img prompts, often in big batches, then picking the pictures that best reflect what I wanted.

But I do also have another process when I want something specific, which involves doing img2img to generate a pose and general composition, flipping that image into both a controlnet (for composition) and a segmentanything mask (for latent couple) and then respinning the same image with the same seed with those new constraints. When you run with the controlnet and the mask you can turn the CFG way down (3 or 4) but keep the coherence in the image so you get much more naturalistic outputs.
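For anyone unsure what the CFG number actually does: it is the scale used in classifier-free guidance, where the sampler extrapolates from the no-prompt prediction towards the prompted one. Here's a toy sketch of that formula. The two-element lists are made-up numbers standing in for the model's predictions, and `apply_cfg` is a name invented for this example, not a function from any SD codebase.

```python
def apply_cfg(uncond, cond, scale):
    """Classifier-free guidance: start from the unconditional prediction
    and move `scale` times the distance towards the conditional one."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy stand-ins for the model's two predictions (made-up numbers).
uncond = [0.0, 1.0]   # prediction with an empty prompt
cond = [1.0, 3.0]     # prediction with the real prompt

low_cfg = apply_cfg(uncond, cond, 3.5)    # the "way down" range from the post
high_cfg = apply_cfg(uncond, cond, 11.0)  # a typical high setting

print(low_cfg)   # [3.5, 8.0]
print(high_cfg)  # [11.0, 23.0]
```

At a low scale the result stays much closer to what the model would have produced on its own, which is one plausible reason the outputs look more naturalistic once the controlnet and mask are doing the work of holding the composition together.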

This is also a good way to work with LORAs that are either poorly made or don't work well together - The initial output might look really burned, but when you have the composition locked in you can run the LORAs at much lower strength and with lower CFG so they sit together better.

[-] FactorSD@lemmy.dbzer0.com 6 points 1 year ago

I guess YMMV on whether focused is boring or not. I agree that I never really found stimulants to be super interesting, but that's partly because it was too expensive to do coke just to work on whatever project was on my mind.

[-] FactorSD@lemmy.dbzer0.com 4 points 1 year ago

I just wanted to come back and update my earlier post because HOLY SHIT I have radically improved my LORAs over the past couple of days.

Firstly; I was flat out wrong about the need to heavily tag, at least for things like garments. The guide I was following talked about styles and objects as the two types of LORA. I thought I was doing one type when actually I should have been treating it like the other. I tore the tags apart so that almost my whole training set was just trained on the single key concept, some had one or two extra terms at most. So, Mr Shiimiish can tag his helmets in a much more chill style.

Secondly; at least in my experience you need to twiddle with the Kohya settings just slightly to get genuinely good results. I was getting burned out LORAs by generation 6 or 7 before, but I turned down the learning rate to 0.00005 (half the default) with 7 repeats per epoch (instead of the laughably high 40) and it's much much better now. The jumps between each epoch are much smaller and you can get a much more granular picture of what is actually going on. I also turned the rank, network and module dropouts to 0.1. Kohya says they are the minimum recommended values, but by default they are set to zero. I have no idea what those do, but I am definitely getting better results. I'll do a codeblock with my full settings at the end.
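To put rough numbers on why fewer repeats makes the jumps between epochs smaller, here's a back-of-envelope sketch. It assumes one epoch shows every training image `repeats` times and that each optimizer step consumes one batch; the 40-image set size is an assumption (the middle of a 30 to 50 picture set), the batch size of 2 is the one in the settings, and regularization images are ignored.

```python
import math


def steps_per_epoch(num_images, repeats, batch_size):
    """Optimizer steps in one epoch, assuming every image is seen
    `repeats` times and each step consumes one batch."""
    return math.ceil(num_images * repeats / batch_size)


NUM_IMAGES = 40   # assumed set size, not a measured value
BATCH_SIZE = 2    # train_batch_size from the settings

old_steps = steps_per_epoch(NUM_IMAGES, 40, BATCH_SIZE)  # 40 repeats
new_steps = steps_per_epoch(NUM_IMAGES, 7, BATCH_SIZE)   # 7 repeats

print(old_steps, new_steps)  # 800 140
```

Under those assumptions each saved epoch covers 140 steps instead of 800, and each step is also taken at half the learning rate, so consecutive checkpoints differ far less from each other.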

Finally; and very very very importantly - CHECK THE FUCKING FOLDERS. If you, like me, got a bad LORA, reloaded the old config to change it around a bit then set it running again you will discover that Kohya kept the old training set, and if you changed the number of repeats the old and the new training set will now be in differently named folders but BOTH will be used in the new run, so it'll just fuck up again and also take much much longer on your second run.
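A quick way to catch this before starting a run is to scan the image folder for the same concept sitting under two different repeat counts. This is a minimal sketch, assuming the training folders follow the `<repeats>_<name>` naming (for example `40_chainmail` next to `7_chainmail`); `find_stale_sets` and the folder names are made up for illustration.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: "<repeats>_<concept>", e.g. "7_chainmail".
FOLDER_PATTERN = re.compile(r"^(\d+)_(.+)$")


def find_stale_sets(img_dir):
    """Return {concept: [repeat counts]} for every concept that appears
    in more than one training folder under img_dir."""
    by_concept = defaultdict(list)
    for child in sorted(Path(img_dir).iterdir()):
        match = FOLDER_PATTERN.match(child.name)
        if child.is_dir() and match:
            by_concept[match.group(2)].append(int(match.group(1)))
    return {name: sorted(reps) for name, reps in by_concept.items() if len(reps) > 1}


# Demo on a throwaway directory that mimics a run left over from earlier.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("40_chainmail", "7_chainmail", "7_helmet"):
        (Path(tmp) / name).mkdir()
    demo_result = find_stale_sets(tmp)

print(demo_result)  # {'chainmail': [7, 40]}
```

Anything it reports is a folder pair where both sets would be fed into the next run, so delete or move the old one first.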

I have also been told to use regularization images - Not to download them, to make some then use them. So, go to your SD, plug in the model you are training the LORA on, give a prompt that will create "things without the LORA item on them". So, if you are training for dudes in chainmail, put in "30 year old athletic man standing up". Have SD generate like 500 pictures worth of that (I have no idea why that many, that's what I was told). The idea is that you have your training data of dudes in armour, and then you have the regularization of similarly constructed dudes not in armour, and SD will look back and forth to help it figure out what you are trying to train it on. You need to use the model you are training on to do the rendering though, so make your own and don't download other people's even if it seems similar. Just be patient and let it run.

Since I mention models - It probably goes without saying, but don't use the base SD1.5 model unless you actually intend to generate images with it, because it is... It's not wonderful. It's alright, but get something that's more appropriate to your needs. There's plenty of stuff that's been trained on LOTR and GOT that are at least better at understanding that a cuirass isn't a type of dress shirt. If you are doing anime, use an anime model, etc etc.

Getting all this stuff right has radically improved the LORAs. They are significantly more transparent; the styling in the original image will change somewhat unless you prompt against it (it's an inevitable result of imperfectly balanced training data, it's why so many LORAs make your people look unexpectedly Japanese) but they don't make people's faces melt. With the last LORA run I did today I was testing epoch 29 at CFG11 and still getting good clean images with no distortion. Previously I would be running epoch 3 or 4 at CFG3 to not get a Daliesque nightmare. Huge improvement all around, and I no longer have to take the earliest epoch that reproduces the item, there's a big range to test and see which best preserves detail and plays well with others.

Here are the Kohya settings that were actually successful - You can't quite just copy and paste because you will need to set up your own folders correctly yourself, and choose your model and sample prompt and all that. But you can at least run your eye down the values and copy those across. I'd say you want to run 30ish epochs from this, based on 30 to 50 good pictures. One LORA I ran today was good at about 20, the other at about 30. That might take a while, apologies about that, but I am running on a Shadow PC with an A4500 20GB, and it turned into about 1hr 45 to do 30 epochs which is pretty reasonable.

Settings

"LoRA_type": "Standard",
"adaptive_noise_scale": 0,
"additional_parameters": "",
"block_alphas": "",
"block_dims": "",
"block_lr_zero_threshold": "",
"bucket_no_upscale": true,
"bucket_reso_steps": 64,
"cache_latents": true,
"cache_latents_to_disk": false,
"caption_dropout_every_n_epochs": 0.0,
"caption_dropout_rate": 0,
"caption_extension": "",
"clip_skip": "1",
"color_aug": false,
"conv_alpha": 1,
"conv_alphas": "",
"conv_dim": 1,
"conv_dims": "",
"decompose_both": false,
"dim_from_weights": false,
"down_lr_weight": "",
"enable_bucket": true,
"epoch": 30,
"factor": -1,
"flip_aug": false,
"full_fp16": false,
"gradient_accumulation_steps": "1",
"gradient_checkpointing": false,
"keep_tokens": "0",
"learning_rate": 5e-05,
"lora_network_weights": "",
"lr_scheduler": "cosine",
"lr_scheduler_num_cycles": "",
"lr_scheduler_power": "",
"lr_warmup": 10,
"max_data_loader_n_workers": "0",
"max_resolution": "512,512",
"max_token_length": "75",
"max_train_epochs": "",
"mem_eff_attn": false,
"mid_lr_weight": "",
"min_snr_gamma": 0,
"mixed_precision": "fp16",
"model_list": "custom",
"module_dropout": 0.1,
"multires_noise_discount": 0,
"multires_noise_iterations": 0,
"network_alpha": 128,
"network_dim": 128,
"network_dropout": 0.1,
"no_token_padding": false,
"noise_offset": 0,
"noise_offset_type": "Original",
"num_cpu_threads_per_process": 2,
"optimizer": "AdamW8bit",
"optimizer_args": "",
"persistent_data_loader_workers": false,
"prior_loss_weight": 1.0,
"random_crop": false,
"rank_dropout": 0.1,
"resume": "",
"sample_every_n_epochs": 1,
"sample_every_n_steps": 0,
"sample_sampler": "k_dpm_2",
"save_every_n_epochs": 1,
"save_every_n_steps": 0,
"save_last_n_steps": 0,
"save_last_n_steps_state": 0,
"save_model_as": "safetensors",
"save_precision": "fp16",
"save_state": false,
"scale_v_pred_loss_like_noise_pred": false,
"scale_weight_norms": 0,
"seed": "",
"shuffle_caption": false,
"stop_text_encoder_training": 0,
"text_encoder_lr": 1e-05,
"train_batch_size": 2,
"train_on_input": false,
"unet_lr": 5e-05,
"unit": 1,
"up_lr_weight": "",
"use_cp": false,
"use_wandb": false,
"v2": false,
"v_parameterization": false,
"vae_batch_size": 0,
"wandb_api_key": "",
"weighted_captions": false,
"xformers": true
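The settings above are pasted without the outer curly braces, so they won't load as JSON as-is. If you'd rather pull values out programmatically than eyeball them, here's a minimal sketch that wraps such a fragment in braces and parses it. `parse_settings_fragment` is a made-up helper, and the `sample` string is only a short excerpt of the listing, not the whole thing.

```python
import json


def parse_settings_fragment(fragment):
    """Wrap a brace-less '"key": value, ...' listing in braces and parse it."""
    body = fragment.strip().rstrip(",")
    return json.loads("{" + body + "}")


# Short excerpt of the listing above, including a value split across lines.
sample = '''"learning_rate": 5e-05, "network_dim": 128, "rank_dropout": 0.1,
"train_batch_size": 2, "train_on_input":
false, "xformers": true'''

settings = parse_settings_fragment(sample)

# Print only the values the post calls out as mattering.
for key in ("learning_rate", "rank_dropout", "train_batch_size"):
    print(key, "=", settings[key])
```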


I figured I'd at least post this up so any future garment enthusiasts could at least learn a bit from my monkey-at-a-typewriter approach.

[-] FactorSD@lemmy.dbzer0.com 4 points 1 year ago

There's a weird modern military turn based strategy game where you fight invading orcs. It's called Spellcross and until recently it was only available through Hall of the Underdogs. Great game, very Xcom, balls hard.

[-] FactorSD@lemmy.dbzer0.com 10 points 1 year ago

Most artists never make any money at all...

[-] FactorSD@lemmy.dbzer0.com 4 points 1 year ago

Hey, if you can't be consistent at least be honest.

[-] FactorSD@lemmy.dbzer0.com 5 points 1 year ago

It's more complex than that - You aren't wrong, but there's a lot more going on. Almost anything made by an employee as part of their job belongs to the company. If Amazon licences your work to make something based on it, that's one thing, but if you are a jobbing writer who gets assigned to develop a new series, Amazon will own everything. You get paid in your salary, not in royalties. And, frankly, a lot of creatives are quite happy with that arrangement (since it's so rare to make money at all).

And that's why it's... Odd. Because the "creator" is some dude who has already been paid; literally has received his salary. But the performance of his show does impact him, at least to some degree. Low ratings don't mean he gets paid less, but it means he's unlikely to earn more in future.

[-] FactorSD@lemmy.dbzer0.com 7 points 1 year ago

There's nuance in the pirate ranks my dude. Some people don't really believe in property rights at all, some people think that piracy is acceptable when you can't afford/obtain the original, some just like to try before they buy.

[-] FactorSD@lemmy.dbzer0.com 6 points 1 year ago

It's true that SaaS does stop you from owning software... But what good does "owning" a piece of software do you if you can't get updates anyway? Back in the pre-internet era we got used to software existing as discrete versions but it hasn't been like that for a LONG time. As soon as patching became a regular occurrence, "ownership" became a service contract with a CD attached. Then the CD vanished, and it just became a service.

While I do dislike needless "as a service" stuff, that model does genuinely suit a lot of people. It's not a con job; companies offer this stuff because a lot of customers want it. Most of the companies that are selling you SaaS stuff themselves use SaaS things in-house.

18

There are sometimes days where the gods of SD are just mocking you. That hand was made from a depth map extracted from a real human pose, and the map very clearly shows FOUR FUCKING FINGERS. And yet...

I am starting to wonder if this model was trained exclusively on people with polydactyly. If so, well played internet.

I have tried a bunch of stuff to get good hands and feet from SD, and nothing is even slightly reliable. TI and negative prompts sometimes just don't work, or even make stuff worse. Inpainting takes forever to make work well. I thought controlnet would crack it but apparently not.

How do you guys deal with dodecadactyl mutants?

[-] FactorSD@lemmy.dbzer0.com 6 points 1 year ago

Knowledge does want to be free, but it's a stretch to say Guardians 3 is a unit of "knowledge". Creative works kinda don't want to be free; Guardians is only desirable because of the cast and crew's work, and you acting out the script is not the same at all. We shouldn't devalue creative labour, even as pirates.

Piracy cuts into the profits of studio investors, and that's good, without impacting how much actors and crew are paid. Win/win.

[-] FactorSD@lemmy.dbzer0.com 11 points 1 year ago

I too have just started on my LORA-making journey, and I too am interested in ahem specialist apparel. My experience is that most LORA are made by non-enthusiasts who don't necessarily know how enthusiasts refer to things, and to some degree non-enthusiasts want visual variety so they can churn out "dwarf in armour" and "elf in armour" prompts and get things that actually look different. That is fine for most people, who just want to have some nice pictures to go along with their D&D campaign or whatever. But if you are a discerning connoisseur then yes, you kinda do need to roll up your sleeves and make it yourself.

There are some guides out there for LORA making - As ever, they are a mix of helpful and not helpful, and you are going to end up having to work things out yourself. You are definitely going to end up wasting a lot of compute time on LORAs that just fail. That's part of the process. You are going to see a lot of parameters which you don't understand and that have seemingly absurd values.

Before I jump into the rest of this - I strongly advise you to start out with LORAs that do one specific thing, and only that thing. So, a LORA just for bucket helmets, using just images of bucket helmets. You can make more complex LORA, but holy crap this is a complex process with a lot of moving parts, so start out with just one thing where you can easily tell whether it's working and how well.

It is good to hear that you are mentally prepared to manually tag your own images because this is utterly essential and you need to do a very thorough job of it. When training stuff the rule is "garbage in, garbage out" and there is no shortcut here. I honestly haven't found a good tagging methodology yet, but the advice I'm trying to follow is that you want to tag anything that you DON'T want included in the training term, and don't tag things that you DO want included.

So, you have a picture of some reenactor in brigandine - You tag it as "brigandine", but you don't tag "rivets" or "visible plates". You would tag "black leather" because brigandine could be any colour, so you call out the colour so the AI can see that the colour is separate from the armour design. You would also tag the trousers, the helmet, the person wearing the armour, the background, the lighting, the time of day; and then also add in "good quality" or "cropped" or other terms about the photograph itself. This sounds like overkill, but if you don't do it right then the LORA will do all kinds of weird things that you didn't intend.
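Here's a small sketch of that tagging rule as code: the trigger term always goes in, anything that should stay separable gets its own tag, and anything that should be absorbed into the trigger term is filtered out. `build_caption` and the tag lists are invented for illustration, and it assumes the captions end up as comma-separated text.

```python
def build_caption(trigger, separate_tags, absorbed_features):
    """Build a comma-separated caption.

    trigger           -- the term the LORA should learn ("brigandine")
    separate_tags     -- things that must stay independent of the trigger
    absorbed_features -- things that ARE the concept, so never tagged
    """
    absorbed = {feature.lower() for feature in absorbed_features}
    kept = [tag for tag in separate_tags if tag.lower() not in absorbed]
    return ", ".join([trigger] + kept)


caption = build_caption(
    "brigandine",
    ["black leather", "rivets", "helmet", "trousers", "outdoors",
     "daylight", "good quality"],
    ["rivets", "visible plates"],
)
print(caption)
# brigandine, black leather, helmet, trousers, outdoors, daylight, good quality
```

Keeping the "never tag" list in one place makes it harder to accidentally tag rivets on image 37 at two in the morning.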

To give an example - On the first run of my first LORA I was actually kinda shocked that I was getting good results for the garment that I had trained on... But the LORA was also changing the skin tones and the white balance in the image. The training data was skewed towards very warm light and tanned skin tones, and I hadn't tagged that, so the result was the AI also associated the training terms with olive skin and incandescent light and they couldn't easily be separated. I had to go back, reprocess the images, retag them and then come back again.

Which brings me on to the images - You want the largest, highest quality images you can find. You want a range from long shots to close ups, but don't use anything too close up because SD needs to see how the armour relates to the rest of the figure.

You don't need to train on huge sets, but I strongly advise that you grab yourself 100+ images and then aggressively prune that collection down to around 30-50 of the best quality images. You should run all of them through some denoising, and for almost all of them you should fiddle with the colour balance. You don't need to perfectly match colours or anything, but when you see that the light is a bit orange or the image is dark, just change the levels a bit so they are more neutral. You also want to manually crop the images, and (AFAIK) they do have to be 1:1 squares which often means having to crop figures.
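If you want to batch the square cropping rather than do every image by hand, the arithmetic is simple enough to sketch. This only computes the crop box; the `(left, upper, right, lower)` tuple is the format Pillow's `Image.crop` accepts, but any editor will do. `square_crop_box` is a made-up helper, and the optional centre point is there because a dead-centre crop often chops off the part you care about.

```python
def square_crop_box(width, height, center_x=None, center_y=None):
    """Return (left, upper, right, lower) for the largest square that fits
    in the image, centred as close as possible to the requested point."""
    side = min(width, height)
    cx = width // 2 if center_x is None else center_x
    cy = height // 2 if center_y is None else center_y
    # Clamp so the square never runs off the edge of the image.
    left = max(0, min(cx - side // 2, width - side))
    upper = max(0, min(cy - side // 2, height - side))
    return (left, upper, left + side, upper + side)


landscape = square_crop_box(1200, 800)               # centred crop
portrait = square_crop_box(800, 1200, center_y=300)  # bias towards the top

print(landscape)  # (200, 0, 1000, 800)
print(portrait)   # (0, 0, 800, 800)
```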

As for the actual LORA settings - Don't ask me what they do, I don't know. I have been kinda kludging together suggestions from different guides and just seeing what happens. I know for certain that my LORAs are training too fast and typically are burning out by about epoch 6 or 7, but I have no idea how to fix that.

I would recommend that while you are learning you set the trainer to save every epoch and print a sample every epoch. This is because Kohya is fucking twitchy and can sometimes just hang during an 8 hour run, and also so you can monitor the training in real time and see what is happening. I've never had a training session that actually needed to run to the end, but the sample images showed training, then good fit, then blurring and overfitting, and I quit out at that point. When you are saving every epoch you of course keep every version to test out, even if there is a crash or you quit, but you can also do a nice X/Y plot of every version of the LORA next to each other and find where the sweet spot is.
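One way to set up that side-by-side comparison is to generate the prompt tag for every saved epoch up front. This is a minimal sketch with two assumptions: that the per-epoch files are named like `name-000001.safetensors`, and that your UI uses the `<lora:name:weight>` prompt syntax (as the AUTOMATIC1111 web UI does). `brigandine_v2` is a made-up output name, so check your own output folder for the real ones.

```python
def epoch_lora_tags(output_name, epochs, weight=0.8):
    """Build one <lora:...> prompt tag per saved epoch checkpoint."""
    return [f"<lora:{output_name}-{epoch:06d}:{weight}>" for epoch in epochs]


tags = epoch_lora_tags("brigandine_v2", range(1, 4))
for tag in tags:
    print(tag)
# <lora:brigandine_v2-000001:0.8>
# <lora:brigandine_v2-000002:0.8>
# <lora:brigandine_v2-000003:0.8>
```

Run the same seed and prompt with each tag swapped in and the point where good fit turns into overfitting is usually obvious at a glance.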

Also, if a LORA is just for personal use you don't necessarily need perfection to get results that will work for you. The standard for most LORA is that they are very transparent and compose well with other LORA and work on lots of models. That's a lot of work man, and if you aren't using it that way you won't even appreciate it all.

Instead, I have been using a combination of control nets, latent couples and composable LORA to apply my mediocre quality outputs to specific parts of specific images. It's a faff, but you can generate a nice knightly figure, freeze his pose, mask off just the torso, specify brigandine, then generate an image that will probably be very good even with amateurish LORA creations. And that way you don't have to worry too much about your LORA melting people's faces and turning their hair pink.

Here are some of the resources that I've used:

https://civitai.com/questions/158/guide-lora-style-training
https://rentry.co/lora_train
https://civitai.com/models/22530/guide-make-your-own-loras-easy-and-free

Godspeed, fellow garment enthusiast!

[-] FactorSD@lemmy.dbzer0.com 4 points 1 year ago

I'm not familiar with that site specifically, but in principle using a web/cloud tool means that your being on mobile has zero impact on the output. You are feeding a more powerful computer prompts and it sends you back the pictures it generates. So this isn't a "mobile" problem.

There's two things to keep in mind though - Firstly, using SD is an art in itself. It's not easy to get good outputs. I know it's kinda presented as being "just type and awesome comes out", but typically a lot of work goes into generating good AI art works. There are a lot of parameters and a lot of possible tools and you do need to spend some time learning how it all fits together. Secondly, running on someone else's platform is always limiting in terms of what parameters you can fiddle with. A big chunk of getting good results is being able to use your own preferred embeddings, LORA and model to get the results you want. SD can do photorealistic aliens and cartoon smut, but it can't do both on the same settings, and if you can't change them then you will always be limited.

You don't necessarily need to move off of mobile, and at least while you are starting out I wouldn't recommend spending lots of money, but you should think about SD as being a workflow and consider what is convenient. Personally, I would work on a laptop if at all possible, even if you are just using the various cloud versions (HuggingFace etc). That's just because you are going to do a lot of copying and pasting and granular tweaking of settings. When you have a big prompt and you need to just change one value, having a trackpad is a lot easier than poking at tiny text on a small screen.

I do generally believe that running a personal instance of SD is the way forward in the long term. The real barrier is technical knowledge more than cash/gpu power; setting it up is not easy if you are someone who doesn't know Python (like me). If you have any device with a mediocre gpu (I started on my laptop 3050ti) then SD will run slow, but will actually run. If you already have that device to use, it's literally free and you get the benefits of a local instance immediately, like being able to do big runs (leave them overnight if they take too long) of X/Y plots to help you learn how parameters work, and being able to try out models and LORAs to get where you want to be.

If you don't have such a device, you can still dip your toe in without spending a lot of money. I do my SD work on a Shadow.tech cloud PC and various other services are available. Yes, in this economy throwing 50 bucks around isn't nothing, but you get a GPU with 20gigs of VRAM and it runs 10x faster (more actually - 10 iterations per second instead of 4 seconds per iteration) than I had before.
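The two speeds are quoted in opposite units (iterations per second versus seconds per iteration), which hides how big the gap is. A quick conversion, using only the two figures from the post:

```python
def iterations_per_second(seconds_per_iteration):
    """Convert a 'seconds per iteration' reading into iterations per second."""
    return 1.0 / seconds_per_iteration


laptop_its = iterations_per_second(4.0)  # 4 s/it on the laptop
cloud_its = 10.0                         # 10 it/s on the cloud GPU

speedup = cloud_its / laptop_its
print(f"{laptop_its} it/s vs {cloud_its} it/s -> {speedup:.0f}x faster")
# 0.25 it/s vs 10.0 it/s -> 40x faster
```

So "more actually" works out to about 40x on those numbers.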

You can access any cloud instance via mobile if that's your bag, although it does not work wonderfully on Shadow, because it's so focused on giving you a Windows desktop rather than a mobile front end. You could however be a super cool guy and connect your phone to a USB-C hub, then connect it to a mouse, keyboard and monitor.

All just food for thought :D
