Training Z-Image LoRAs with Musubi-Tuner on AMD Strix Halo

This guide walks you through setting up the tools and workflow for training a custom LoRA (Low-Rank Adaptation) using musubi-tuner on AMD Strix Halo APUs with the Z-Image model. For a similar guide on Klein 9B/4B training, see Training Klein 9B/4B LoRAs.

The goal of this guide is to help you set up the tools and environment for LoRA training. The actual training process requires experimentation to find settings that work best for your specific use case. These are the steps I followed to train a Z-Image LoRA based on my photographic style, with examples from my own training where applicable.

Note: There is some debate about whether LoRAs trained on Z-Image also work with Z-Image Turbo. Mine was trained on Z-Image but works best with Z-Image Turbo, so definitely test with both models.

Hardware Requirements

This guide assumes a Strix Halo with 128GB of RAM for the default path. Refer to the Out of VRAM section if you run into problems.

Note: In my personal experience, Z-Image required approximately 110GB of VRAM with the settings I used.

Prerequisites

Make sure you have the following installed:

  • uv - A fast Python package installer and manager
  • git - Version control system

You can install uv and git on most Linux distributions:

# Arch Linux
sudo pacman -S uv git

# Ubuntu/Debian
sudo apt install uv git

Installation

Download the musubi wrapper script and copy it where you want. This is a small script I created to simplify the procedure. The script is easy to read, so if you are curious, have a look at what it does!

Now cd to the directory where the script is located. From there, you will need to:

# Make the script executable
chmod +x musubi.sh

# Install musubi-tuner
./musubi.sh setup

By default this will install musubi-tuner in your home directory. You can override the install directory:

export MUSUBI_TUNER_INSTALL_DIR="/musubi/installation/path"

The script defaults to downloading dependencies for Strix Halo. This can also be overridden:

# You can check all available architectures here: https://rocm.nightlies.amd.com/v2-staging/
# The example below is for Strix Point
export GFX_NAME="gfx1150"

All overrides must be performed before running the setup step.

Downloading Models

We need to download the Z-Image model and its VAE (Variational Autoencoder). The model can be found on HuggingFace.

Z-Image Model

Download the following files from the Z-Image repository:

  1. Diffusion model (BF16): z_image_bf16.safetensors

  2. Text encoder (Qwen 3 4B): qwen_3_4b.safetensors

  3. VAE: ae.safetensors

Place the downloaded files wherever you keep your models; any directory works.

After downloading, note the paths to your model files. You’ll need them in the next steps.

Note: If you already have a VAE and/or text encoder that you are using with other Flux-based models, you can use those files for the following steps instead of downloading them again.

Project Creation

We will now create the LoRA project. Once again, we will rely on the script to create the initial directory structure and a standard musubi-tuner configuration for Z-Image.

# Set the model version to z-image
export MODEL_VERSION="z-image"

# Set the path of the diffusion model (Z-Image model file)
export DIT_MODEL="/path/to/z_image_bf16.safetensors"

# Set the path to the first shard of the text_encoder or to the merged model file
export TEXT_ENCODER="/path/to/qwen_3_4b.safetensors"

# Set the path to the VAE (e.g., ae.safetensors)
export VAE_MODEL="/path/to/vae/ae.safetensors"

# Set the project name. A folder with this name will be created
export PROJECT_NAME="my-zimage-lora"

# Create the project
./musubi.sh create

Dataset Preparation

A good dataset is crucial for training a useful LoRA. The following are technical guidelines and rules of thumb to get you started. However, experimentation is key: different datasets and styles may require different approaches.

Adding Images

Place your training images in the dataset directory of your project. Each image should have:

  • File format: PNG, JPG, or WEBP
  • Resolution: Aim for 1024x1024. Using a single resolution for all images reduces the number of aspect-ratio buckets, which keeps batches full
  • Aspect ratio: Square images work best but other ratios will work too

As a rule of thumb, anywhere from 20-200 images will work. The quality of the images is more important than the number.

Adding Captions

For each image, create a corresponding text file with the same name but .txt extension:

dataset/
├── image1.jpg
├── image1.txt          # Caption for image1.jpg
├── image2.png
├── image2.txt          # Caption for image2.png
└── ...

Caption rules of thumb:

  • For styles: describing the scene but not the style works. I had good results with just empty captions.
  • For characters: a trigger word + a short description seems to work well.
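If you go the empty-caption route for a style LoRA, you can generate the placeholder .txt files with a few lines of Python. This is a convenience sketch, not part of the musubi wrapper; the `dataset` directory name matches the project layout above:

```python
from pathlib import Path

def create_empty_captions(dataset_dir):
    """Create an empty .txt caption next to every image in dataset_dir."""
    created = []
    for img in sorted(Path(dataset_dir).iterdir()):
        if img.suffix.lower() in {".jpg", ".png", ".webp"}:
            caption = img.with_suffix(".txt")  # image1.jpg -> image1.txt
            caption.touch()
            created.append(caption.name)
    return created

# create_empty_captions("dataset")
```

For captioned datasets you would instead write a description into each file, following the rules of thumb above.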

Editing the Dataset Config

In the project directory, you will find a dataset.toml file. It is usable as-is, but some of its parameters are worth understanding:

  • resolution: Target resolution for training
  • batch_size: How many images to process at once (reduce if you run out of VRAM)
  • enable_bucket: Allows different aspect ratios (keeps more detail)
  • num_repeats: How many times to cycle through the dataset per epoch (higher = more training)
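As a rough sketch, a dataset.toml using these parameters could look like the following. The values and paths are illustrative starting points, not the exact file the script generates:

```toml
[general]
resolution = [1024, 1024]   # target training resolution
caption_extension = ".txt"  # caption files created alongside the images
batch_size = 1              # reduce if you run out of VRAM
enable_bucket = true        # allow different aspect ratios

[[datasets]]
image_directory = "dataset"
num_repeats = 1             # cycles through the dataset per epoch
```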

Creating Reference Prompts

Another notable file is called reference_prompts.txt. Reference prompts are used to generate sample images during training, so you can see how the LoRA is progressing.

Example with a single prompt:

A close-up portrait by mikkoph of a very young woman with fair skin and striking blue eyes, looking directly at the camera with a soft, serene expression. Her blonde hair is styled in an elegant updo, adorned with numerous small white flowers, possibly daisies, nestled throughout the curls. She wears a floral-patterned blouse with black, white, and gold flowers, a pearl earring in her right ear, and has a manicure with white nail polish. Her hands are gently cupped around her face, with her fingers lightly touching her cheeks. The background is a deep, dark blue, creating a dramatic contrast that highlights her features and the delicate details of her look. --w 1024 --h 1024 --d 42 --s 40

Each line is a separate prompt that will be sampled during training. Add as many as you need, but keep in mind that more prompts make each sampling pass longer. The trailing flags set the sample width (--w), height (--h), seed (--d), and step count (--s).

Training Configuration

The following explains the most relevant parameters from the training.toml file in your project directory:

  • network_dim: Dimension of the LoRA (8-16 is usually suitable for simple styles, 32-64 for more complex concepts or characters)
  • learning_rate: 1e-4 is a good starting point (adjust if the loss doesn’t decrease)
  • max_train_epochs: How many training cycles (10-50, depending on dataset size)
  • save_every_n_epochs: How often to save checkpoints
  • save_state: Saves the training state with each checkpoint. This allows stopping and resuming training. It consumes more disk space and VRAM
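Put together, the relevant fragment of training.toml might look like this (illustrative values, not the generated defaults):

```toml
network_dim = 16            # 8-16 for simple styles, 32-64 for complex concepts
learning_rate = 1e-4        # starting point; adjust if the loss doesn't decrease
max_train_epochs = 30
save_every_n_epochs = 2
save_state = true           # enables resuming, at the cost of disk space and VRAM
```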

Important: All the settings mentioned above are starting defaults. There is no one-size-fits-all configuration. You will need to experiment with these values to find what works best for your specific dataset and goals. The values provided are based on my experience, but your results may vary significantly.

Personal Experience: At least in comparison to Flux.2 Klein, it seems like Z-Image requires more epochs to learn.

Running Training

# Cache latents and prompts. This speeds up training considerably
./musubi.sh cache

# Run the training
./musubi.sh train

Note: You are likely to see many warnings when running this command. They are harmless and can be ignored.

Resuming Training

If you stopped the training, or feel that the LoRA is undertrained even after finishing, you can resume, provided save_state was set to true (the default) in training.toml. To resume, simply run:

./musubi.sh train

This will automatically find the latest saved state and restart training from there.

Monitoring Training

During training, you’ll see:

  1. Loss values - Should decrease over time. If it stays flat or increases, your learning rate may be too high. However, do not rely too much on this value.
  2. Sample images - Generated every 2 epochs (or whatever you set) showing how the LoRA is learning
  3. Checkpoint files - Saved to the output directory of the project every 2 epochs (or whatever you set)

What to Look For

Epoch    What to Check
1-5      Loss should start decreasing
5-10     Sample images should show the style emerging
10-20    Check for overfitting (samples look too much like training images)
20+      If loss is still decreasing, consider more epochs

However, it is a good idea to test the most promising checkpoints with Z-Image Turbo using ComfyUI. That’s the only way to be sure.

Using Your Trained LoRA

After training completes, you’ll have checkpoint files in the output directory of your project:

output/
├── my-zimage-lora-000002.safetensors    # Epoch 2 checkpoint
├── my-zimage-lora-000004.safetensors    # Epoch 4 checkpoint
├── my-zimage-lora-000006.safetensors    # Epoch 6 checkpoint
└── ...

The final checkpoint won’t have a sequence number.

In ComfyUI

Important: Z-Image LoRAs require a conversion step for ComfyUI compatibility. This is done automatically for the last checkpoint after training, but you can also trigger it manually for other checkpoints.

  1. Automatic conversion: After training completes, the final checkpoint is automatically converted and saved as my-zimage-lora_comfyui.safetensors

  2. Manual conversion: For any other checkpoint, run:

    ./musubi.sh convert output/my-zimage-lora-000016.safetensors

    This creates my-zimage-lora-000016_comfyui.safetensors

  3. Place the _comfyui.safetensors file in your ComfyUI models/loras/ directory

  4. Add a Load LoRA node to your workflow

  5. Connect it to your Z-Image Turbo model nodes

  6. Adjust the LoRA strength. Start with 1.0, but don’t be afraid to push it significantly higher or lower.

In other tools

Most tools that support Stable Diffusion LoRAs will also work with Z-Image LoRAs. Look for a “Load LoRA” or similar node/module. Note that you may need the converted _comfyui.safetensors files for compatibility.

Results

Here are some example images generated using my LoRA. They are generated with the same prompt and seed using Z-Image Turbo.

(Five before/after comparison image pairs.)

Merging Checkpoints with EMA

The musubi.sh script includes an ema command that performs Exponential Moving Average (EMA) merging of LoRA checkpoints. This post-training technique combines multiple checkpoints into a single, often improved checkpoint.

EMA works by applying a weighted average to checkpoint parameters, where a beta value (default: 0.95) controls the weighting between earlier and later checkpoints. This can be useful when:

  • You have multiple promising checkpoints but are unsure which is optimal
  • Training showed consistent improvement across epochs
  • You want to avoid testing each checkpoint individually

Usage:

./musubi.sh ema output/my-zimage-lora-000016.safetensors output/my-zimage-lora-000018.safetensors output/my-zimage-lora-000020.safetensors --output_file my-zimage-lora-ema.safetensors

You can also adjust the beta parameter (e.g., --beta 0.97) to give more or less weight to earlier checkpoints. Note that EMA works best when checkpoints are from similar training phases and show consistent improvement.
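The weighted average itself is simple. Here is a minimal Python sketch of the idea, treating each checkpoint as a flat dict of parameter lists; the real script operates on safetensors files, so this only illustrates the math:

```python
def ema_merge(checkpoints, beta=0.95):
    """EMA-merge checkpoints ordered from earliest to latest.

    Each checkpoint is a dict mapping a parameter name to a list of
    floats. A higher beta keeps more weight on the running (earlier)
    average; each newer checkpoint contributes a (1 - beta) share.
    """
    merged = {name: list(params) for name, params in checkpoints[0].items()}
    for ckpt in checkpoints[1:]:
        merged = {
            name: [beta * m + (1 - beta) * c
                   for m, c in zip(merged[name], ckpt[name])]
            for name in merged
        }
    return merged
```

With the default beta of 0.95, each new checkpoint only nudges the average by 5%, which is why EMA works best on checkpoints from similar training phases.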

Troubleshooting

Out of VRAM

If you get “out of memory” errors:

  • Reduce batch_size in dataset.toml
  • Try optimizer_type = "adamw8bit" in training.toml
  • Reduce resolution (try 768x768)
  • Reduce max_data_loader_n_workers (try 1)
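Applied together, the VRAM-saving tweaks above would look roughly like this across the two config files (illustrative fragment):

```toml
# dataset.toml
batch_size = 1
resolution = [768, 768]

# training.toml
optimizer_type = "adamw8bit"
max_data_loader_n_workers = 1
```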

Samples Look Bad

  • Train for more epochs (style may need time to emerge)
  • Check your dataset quality (images should be clear, captions should be good)
  • Try a different network_dim (higher for complex styles)
  • Adjust the learning rate (try 5e-5 if loss is too high, 2e-4 if loss is too low)

Conclusion

You now have a complete workflow for setting up and training custom LoRAs with musubi-tuner on AMD GPUs for Z-Image. Start with a small dataset (50-100 images) and experiment with different settings to find what works best for your use case.

Happy training!