Details on how the big diffusion model finetunes are trained are scarce, so just like with version 1 and version 2 of my model, bigASP, I'm sharing all the details here to help the community. However, unlike those versions, this version is an experimental side project, and a tumultuous one at that. I've kept this article long, even if that may make it somewhat boring, so that I can dump as much of the hard-earned knowledge as possible for others to sift through. I hope it helps someone out there.

To start, the rough outline: Both v1 and v2 were large-scale SDXL finetunes. They used millions of images and were trained for 30M and 40M samples respectively, a little less than a week's worth of 8xH100s each. I shared both models publicly, for free, and did my best to document the process of training them and share their training code.

Two months ago I was finishing up the latest release of my other project, JoyCaption, which meant it was time to begin preparing for the next version of bigASP. I was very excited to get back to the old girl, but there was a mountain of work ahead for v3. It was going to be my first time breaking into the more modern architectures like Flux. Unable to contain my excitement for training, I figured: why not have something easy training in the background? Slap something together using the old, well-trodden v2 code and give SDXL one last hurrah.

TL;DR

If you just want the quick recap of everything, here it is. Otherwise, continue on to “A Farewell to SDXL.”

A Farewell to SDXL

The goal for this experiment was to keep things simple but try a few tweaks, so that I could stand up the run quickly and let it spin, hands off. The tweaks were targeted at things I wanted to test and learn for v3: