How to Integrate a New Diffusion Model for FlowGRPO Training
Last updated: 06/02/2026.
This guide walks you through everything required to integrate a new diffusion
model into VeRL-Omni so it can be trained end-to-end with the FlowGRPO
algorithm. The contracts described below (registry hooks, adapter
classmethods, scheduler choice, custom-output field names) are specific to
the FlowGRPO trainer; other RL algorithms may impose different requirements.
Use
integrating_a_new_policy_gradient_algorithm_for_diffusion_model.md
for PPO-like policy-gradient algorithms, and
integrating_a_new_direct_preference_algorithm_for_diffusion_model.md
for direct-preference algorithms.
We use the Qwen-Image integration
(verl_omni/pipelines/qwen_image_flow_grpo/)
as the worked example throughout. Read the source alongside this guide — the
code is the canonical reference.
TL;DR
A new model needs three files in one new package plus two registry hooks. The same adapters work with both the default diffusers + FSDP2 backend and the optional VeOmni backend — backend selection is purely a configuration concern.
verl_omni/pipelines/<model>_flow_grpo/
├── __init__.py # re-exports both adapters
├── diffusers_training_adapter.py # subclass of DiffusionModelBase
└── vllm_omni_rollout_adapter.py # subclass of VllmOmniPipelineBase
Both adapters are picked up by string-based registries that dispatch on
the pair (model_index.json::_class_name, algorithm). By default the
algorithm is read from actor_rollout_ref.model.algorithm, which is the
source of truth in the current trainer wiring. Register the package by importing it
in
verl_omni/pipelines/__init__.py,
add an example launch script, and add a smoke test.
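The string-keyed dispatch can be pictured with a few lines of plain Python. This is a hypothetical, simplified stand-in (the `AdapterRegistry` class is invented for illustration), not the actual implementation behind `DiffusionModelBase.register` / `VllmOmniPipelineBase.register`; it only shows why both halves of the key must match exactly and why registration happens at import time.

```python
class AdapterRegistry:
    """Hypothetical registry keyed on the (architecture, algorithm) pair."""

    def __init__(self):
        self._classes = {}

    def register(self, architecture, algorithm):
        # Decorator factory: records the decorated class under the key pair.
        def decorator(cls):
            key = (architecture, algorithm)
            if key in self._classes:
                raise ValueError(f"duplicate registration for {key}")
            self._classes[key] = cls
            return cls

        return decorator

    def get_class(self, architecture, algorithm):
        # Exact string match on both halves of the key; no fuzzy fallback.
        try:
            return self._classes[(architecture, algorithm)]
        except KeyError:
            known = sorted(self._classes)
            raise KeyError(
                f"no adapter for {(architecture, algorithm)}; registered: {known}"
            ) from None


training_registry = AdapterRegistry()


# The decorator runs when this module is imported, which is why the package
# must be imported from verl_omni/pipelines/__init__.py.
@training_registry.register("MyModelPipeline", algorithm="flow_grpo")
class MyModel:
    pass
```

With this sketch, `training_registry.get_class("MyModelPipeline", "flow_grpo")` returns `MyModel`, while any other algorithm string raises `KeyError` listing what was registered.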
Mental Model
There are two execution contexts you must serve, and they share the same algorithm (FlowGRPO) but use different runtimes:
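| Context | Runtime | What you implement |
|---|---|---|
| Rollout (sampling trajectories) | vllm-omni | VllmOmniPipelineBase subclass (vllm_omni_rollout_adapter.py) |
| Training (per-step forward + loss) | FSDP + diffusers | DiffusionModelBase subclass (diffusers_training_adapter.py) |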
The trainer runtime can also be VeOmni’s FSDP2-based DiT trainer; see § 5.3. The training-adapter contract (prepare_model_inputs / forward_and_sample_previous_step) is identical on both backends.
┌─────────────────────────┐ ┌──────────────────────────┐
│ Rollout worker │ trajectory │ Trainer worker │
│ (vllm-omni) │ ─────────────▶ │ (FSDP + diffusers) │
│ │ latents, │ │
│ VllmOmniPipelineBase │ log_probs, │ DiffusionModelBase │
│ └─ diffuse() + SDE │ prompt embeds │ └─ prepare_model_inputs │
│ │ │ └─ forward_and_sample… │
└─────────────────────────┘ └──────────────────────────┘
The two adapters must agree on:
- Architecture string (the first @register(...) argument). It must match model_index.json::_class_name exactly. For Qwen-Image this is "QwenImagePipeline".
- Algorithm string (the algorithm= keyword on @register(...)). For this guide the value is always "flow_grpo". When integrating a different RL algorithm, use the appropriate algorithm name and the matching algorithm-family guide.
- Prompt-encoding format of the embeddings shipped through the agent loop. The rollout always returns a padded (B, L, D) tensor plus a (B, L) mask; the training adapter is free to convert to whatever the transformer needs.
- Scheduler choice, so that log-probs computed on each side are comparable.
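To make the shared prompt-encoding format concrete, the helpers below pad a list of variable-length per-sample embeddings into the (B, L, D) tensor plus (B, L) mask layout, and undo it for transformers that want per-sample tensors. This is a NumPy sketch with hypothetical names (`pad_prompt_embeds`, `unpad_prompt_embeds`) and an assumed right-padding convention; the real adapters operate on torch tensors, but the layout is the same.

```python
import numpy as np


def pad_prompt_embeds(per_sample, max_len=None):
    """Pad a list of (L_i, D) arrays into (B, L, D) embeds and a (B, L) mask."""
    if not per_sample:
        raise ValueError("need at least one sample")
    dim = per_sample[0].shape[1]
    length = max_len if max_len is not None else max(e.shape[0] for e in per_sample)
    embeds = np.zeros((len(per_sample), length, dim), dtype=per_sample[0].dtype)
    mask = np.zeros((len(per_sample), length), dtype=np.int64)
    for i, e in enumerate(per_sample):
        n = min(e.shape[0], length)  # truncate samples longer than max_len
        embeds[i, :n] = e[:n]        # right-padding: real tokens first
        mask[i, :n] = 1              # 1 marks real tokens, 0 marks padding
    return embeds, mask


def unpad_prompt_embeds(embeds, mask):
    """Recover per-sample (L_i, D) views from right-padded embeds and their mask."""
    return [embeds[i, : int(mask[i].sum())] for i in range(embeds.shape[0])]
```

Keeping the padded form on the wire means the agent loop only ever ferries two plain tensors, whatever the transformer needs internally.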
Prerequisites
Before you start, the new model must already be supported upstream by:
- diffusers: provides the transformer (<Name>Transformer2DModel), scheduler config, and a reference inference pipeline.
- vllm-omni: provides the rollout-side <Name>Pipeline. Your rollout adapter inherits from this class.
If either is missing, upstream the model first. Nothing below will work without them.
Step 1 — Read the Upstream Pipelines and Note the Differences
Open the upstream diffusers pipeline (__call__) and the vllm-omni
rollout pipeline (forward). Answer these questions before writing any
code — the answers determine every helper you need:
- Latent shape. Packed sequence (B, seq, 4·C) (Qwen-Image) or 4-D (B, C, H, W)?
- Text encoder output. Fixed (B, L, D) plus a mask, or a list of variable-length per-sample tensors?
- Transformer signature. What kwargs does it accept? Any extras (img_shapes, txt_seq_lens, guidance, …)?
- Timestep convention. t/1000? (1000 - t)/1000? Something else?
- Output sign. Is the predicted velocity / noise negated before being passed to the scheduler?
- CFG flavour. "True CFG" with renormalisation? Standard CFG with optional norm clipping? At what threshold is CFG active?
- VAE post-processing. latents / scaling_factor + shift_factor, latents / std + mean, or other?
- Prompt template. Does the upstream _encode_prompt prepend a hard-coded system prompt? Whatever it does, your data preprocessor must match it exactly so training-time and inference-time tokenisation agree.
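Once you have answered the timestep and output-sign questions, it can help to pin the answers down in one small function that both adapters call, so a convention mismatch shows up in a unit test instead of as drifting log-probs. The convention labels and function names below are hypothetical; replace the branches with whatever your upstream pipeline actually does.

```python
def to_model_timestep(t, convention):
    """Map a scheduler timestep t in [0, 1000] to the value the transformer expects."""
    if convention == "t_over_1000":
        return t / 1000.0
    if convention == "one_minus_t_over_1000":
        return (1000.0 - t) / 1000.0
    raise ValueError(f"unknown timestep convention: {convention!r}")


def to_scheduler_output(model_output, negate):
    """Apply the upstream sign convention before handing the prediction to the scheduler."""
    return -model_output if negate else model_output
```

Both helpers work unchanged on Python floats, NumPy arrays, and torch tensors, since they only use arithmetic operators.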
Anything model-specific belongs inside the model’s own package;
anything reusable belongs in
pipelines/utils.py or
pipelines/model_base.py.
Step 2 — Scaffold the Package
Create the new package and start by copying
verl_omni/pipelines/qwen_image_flow_grpo/
as a template:
verl_omni/pipelines/<model>_flow_grpo/
├── __init__.py
├── diffusers_training_adapter.py
└── vllm_omni_rollout_adapter.py
The __init__.py re-exports both adapters so the @register(...)
decorators run on import — follow the existing Qwen-Image pattern:
from .diffusers_training_adapter import MyModel
from .vllm_omni_rollout_adapter import MyModelPipelineWithLogProb
__all__ = ["MyModel", "MyModelPipelineWithLogProb"]
Finally, register the package by adding a star-import to
verl_omni/pipelines/__init__.py
so both registries learn about your model when verl_omni.pipelines is
imported:
from .qwen_image_flow_grpo import * # noqa: F401, F403
from .my_model_flow_grpo import * # noqa: F401, F403
__all__ = list(qwen_image_flow_grpo.__all__)
__all__ += my_model_flow_grpo.__all__
Note. vllm_omni is a hard dependency of verl-omni, so the rollout adapter import does not need to be guarded.
Step 3 — Write diffusers_training_adapter.py
Subclass DiffusionModelBase,
decorate it with the architecture and algorithm strings, and implement the four
classmethods:
@DiffusionModelBase.register("MyModelPipeline", algorithm="flow_grpo")
class MyModel(DiffusionModelBase):
@classmethod
def build_scheduler(cls, model_config): ...
@classmethod
def set_timesteps(cls, scheduler, model_config, device): ...
@classmethod
def prepare_model_inputs(cls, module, model_config, latents, timesteps,
prompt_embeds, prompt_embeds_mask,
negative_prompt_embeds, negative_prompt_embeds_mask,
micro_batch, step): ...
@classmethod
def forward_and_sample_previous_step(cls, module, scheduler, model_config,
model_inputs, negative_model_inputs,
scheduler_inputs, step): ...
3.1 build_scheduler and set_timesteps
Reuse
FlowMatchSDEDiscreteScheduler
unless you have a strong reason not to — FlowGRPO only requires a
flow-matching scheduler that exposes sample_previous_step(...).
Compute image_seq_len and mu exactly as the upstream diffusers
pipeline does. If they drift, the training-time noise schedule will not
match deployment.
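For flow-matching pipelines that use dynamic shifting, `mu` is typically a linear function of the packed image sequence length, and it shifts the sigma schedule toward higher noise for larger images. The sketch below mirrors the `calculate_shift` helper found in several diffusers pipelines and the exponential time shift used by diffusers' flow-match scheduler. The default constants are the Flux-style ones and are assumptions here: read `base_image_seq_len`, `max_image_seq_len`, `base_shift`, and `max_shift` from your model's scheduler config, and check the packing factor against your upstream pipeline.

```python
import math


def calculate_shift(image_seq_len, base_seq_len=256, max_seq_len=4096,
                    base_shift=0.5, max_shift=1.15):
    # Linear interpolation: mu = base_shift at base_seq_len, max_shift at max_seq_len.
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b


def packed_image_seq_len(height, width, vae_scale_factor=8, patch_size=2):
    # Tokens after VAE downsampling and patch_size x patch_size packing of the latent grid.
    return (height // vae_scale_factor // patch_size) * (width // vae_scale_factor // patch_size)


def shifted_sigmas(num_steps, mu):
    # Evenly spaced sigmas from 1.0 down to 1/num_steps, then the exponential time shift.
    if num_steps == 1:
        base = [1.0]
    else:
        step = (1.0 - 1.0 / num_steps) / (num_steps - 1)
        base = [1.0 - i * step for i in range(num_steps)]
    return [math.exp(mu) / (math.exp(mu) + (1.0 / s - 1.0)) for s in base]
```

For a 1024 x 1024 image with these defaults, `packed_image_seq_len` gives 4096 tokens and `calculate_shift(4096)` gives 1.15. With `mu = 0` the shift is the identity, which makes a convenient sanity check.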
3.2 prepare_model_inputs
This method receives the full batched tensors for the entire
denoising trajectory (latents of shape (B, T, ...), timesteps of
shape (B, T)) together with the step index. Your implementation is
responsible for slicing to the current step, e.g.
latents[:, step] and timesteps[:, step], before building model
inputs. The typical steps are:
1. Slice latents[:, step] and timesteps[:, step] for the current denoising step.
2. Apply per-model timestep rescaling.
3. Convert the padded prompt embeddings + mask to whatever format your transformer expects.
4. Build the positive input dict and, if CFG is enabled, the negative input dict (same latent and timestep, negative text features).
The dict keys must match the kwargs of the diffusers transformer
class verbatim — the FSDP engine calls module(**model_inputs).
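A minimal NumPy sketch of those four steps follows. The helper name `build_step_inputs`, the kwarg names (`hidden_states`, `timestep`, `encoder_hidden_states`, `encoder_hidden_states_mask`, `txt_seq_lens`), the `(positive, negative)` return shape, and the `t / 1000` rescaling are all assumptions for illustration; use the exact kwargs of your transformer's `forward` and the return structure of the Qwen-Image adapter.

```python
import numpy as np


def _seq_lens(mask):
    # Number of real (non-padding) text tokens per sample.
    return mask.sum(axis=1).astype(int).tolist()


def build_step_inputs(latents, timesteps, prompt_embeds, prompt_embeds_mask,
                      negative_prompt_embeds=None, negative_prompt_embeds_mask=None,
                      step=0, timestep_scale=1000.0):
    # 1. Slice the current step out of the (B, T, ...) trajectory.
    x_t = latents[:, step]
    t = timesteps[:, step]
    # 2. Per-model timestep rescaling (assumed convention: t / 1000).
    t_model = t / timestep_scale
    # 3. Keep the padded (B, L, D) + (B, L) layout and derive per-sample text lengths.
    positive = {
        "hidden_states": x_t,
        "timestep": t_model,
        "encoder_hidden_states": prompt_embeds,
        "encoder_hidden_states_mask": prompt_embeds_mask,
        "txt_seq_lens": _seq_lens(prompt_embeds_mask),
    }
    # 4. Negative branch: same latent and timestep, negative text features.
    negative = None
    if negative_prompt_embeds is not None:
        negative = dict(
            positive,
            encoder_hidden_states=negative_prompt_embeds,
            encoder_hidden_states_mask=negative_prompt_embeds_mask,
            txt_seq_lens=_seq_lens(negative_prompt_embeds_mask),
        )
    return positive, negative
```

Because the keys are forwarded as `module(**model_inputs)`, a typo in any key surfaces as an unexpected-keyword error on the first training step.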
3.3 forward_and_sample_previous_step
Call the transformer once for the positive prompt; if CFG is active,
call it again for the negative prompt and combine them. Always finish with
scheduler.sample_previous_step(...) and return the triple
(log_prob, prev_sample_mean, std_dev_t) — that is what
PPODiffusersFSDPEngine.prepare_model_outputs
consumes.
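The CFG combination inside forward_and_sample_previous_step is usually a few tensor operations. The NumPy sketch below shows standard CFG and an optional "true CFG" renormalisation that rescales the combined prediction so its per-token norm (over the last dimension) matches the positive prediction. This follows the pattern in the diffusers Qwen-Image pipeline, but the function name `combine_cfg` is hypothetical and you should verify the norm axis and the activation threshold against your upstream `__call__`.

```python
import numpy as np


def combine_cfg(pos_pred, neg_pred, cfg_scale, renormalize=True, eps=1e-12):
    # Standard CFG: extrapolate from the negative toward the positive prediction.
    comb = neg_pred + cfg_scale * (pos_pred - neg_pred)
    if renormalize:
        # "True CFG" renorm: match the per-token norm of the positive prediction.
        pos_norm = np.linalg.norm(pos_pred, axis=-1, keepdims=True)
        comb_norm = np.linalg.norm(comb, axis=-1, keepdims=True)
        comb = comb * (pos_norm / np.maximum(comb_norm, eps))
    return comb
```

The renormalisation keeps large guidance scales from inflating the prediction's magnitude, which would otherwise change the scale of the drift fed to the scheduler.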
Tip. If your transformer returns a list (one element per sample), wrap the call in a small helper that re-stacks the outputs to (B, C, H, W) so the rest of the pipeline keeps a single tensor convention.
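A sketch of such a helper, assuming a hypothetical `transformer_fn` that returns either one batched array or a list with one (C, H, W) array per sample; the name and the NumPy types are illustrative only (with torch, `torch.stack` plays the role of `np.stack`).

```python
import numpy as np


def call_and_stack(transformer_fn, **model_inputs):
    """Call the transformer and normalise a per-sample list output to one (B, C, H, W) array."""
    out = transformer_fn(**model_inputs)
    if isinstance(out, (list, tuple)):
        shapes = {o.shape for o in out}
        if len(shapes) != 1:
            raise ValueError(f"per-sample outputs have mismatched shapes: {shapes}")
        out = np.stack(out, axis=0)  # new leading batch axis
    return out
```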
Step 4 — Write vllm_omni_rollout_adapter.py
Subclass the upstream <Name>Pipeline from vllm_omni.diffusion.models
and decorate with the same architecture/algorithm pair:
@VllmOmniPipelineBase.register("MyModelPipeline", algorithm="flow_grpo")
class MyModelPipelineWithLogProb(MyModelPipeline):
...
Your subclass must do four things:
1. Replace the upstream scheduler (typically Euler-based) with FlowMatchSDEDiscreteScheduler.
2. Override encode_prompt to accept pre-tokenised prompt_ids and the tokenizer attention mask (the agent loop ships these, never raw strings). Always return a padded (B, L, D) tensor and a (B, L) mask so the agent loop can ferry them as plain tensors.
3. Implement diffuse(...), the SDE loop that optionally applies CFG and collects all_latents, all_log_probs, and all_timesteps.
4. Override forward(req, ...) so that:
   - Sampling parameters come from req.sampling_params (use extra_args for SDE-specific knobs).
   - prompt_embeds, prompt_embeds_mask, negative_prompt_embeds, and negative_prompt_embeds_mask are placed in the returned DiffusionOutput.custom_output. The diffusion agent loop (diffusion_agent_loop.py) reads these field names verbatim, so do not rename them.
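The bookkeeping of a diffuse(...) loop can be sketched in NumPy. The per-step transition here is a deliberately simplified Euler step plus Gaussian noise (hypothetical `sde_step`, `velocity_fn`, and `noise_level`), not the FlowGRPO SDE implemented by FlowMatchSDEDiscreteScheduler. What the sketch does show is the structure: each step returns the new sample together with a (log_prob, prev_sample_mean, std_dev_t) triple, and the loop stacks all_latents, all_log_probs, and all_timesteps along a time axis so the trainer can index `[:, step]`. The `sigma * 1000` timestep convention is also an assumption.

```python
import math

import numpy as np


def sde_step(x, v, sigma, sigma_next, noise_level, rng):
    # Simplified transition: Euler drift x + v * dt plus Gaussian noise.
    dt = sigma_next - sigma  # negative, since sigmas decrease
    prev_sample_mean = x + v * dt
    std_dev_t = noise_level * math.sqrt(-dt)
    prev_sample = prev_sample_mean + std_dev_t * rng.standard_normal(x.shape)
    # Elementwise Gaussian log-density, averaged over all non-batch dims.
    log_density = (-((prev_sample - prev_sample_mean) ** 2) / (2.0 * std_dev_t ** 2)
                   - math.log(std_dev_t) - 0.5 * math.log(2.0 * math.pi))
    log_prob = log_density.reshape(x.shape[0], -1).mean(axis=1)
    return prev_sample, (log_prob, prev_sample_mean, std_dev_t)


def diffuse(latents, sigmas, velocity_fn, noise_level=0.7, seed=0):
    if noise_level <= 0:
        raise ValueError("noise_level must be positive for a stochastic rollout")
    rng = np.random.default_rng(seed)
    all_latents, all_log_probs, all_timesteps = [latents], [], []
    for i in range(len(sigmas) - 1):
        v = velocity_fn(latents, sigmas[i])
        latents, (log_prob, _, _) = sde_step(latents, v, sigmas[i], sigmas[i + 1], noise_level, rng)
        all_latents.append(latents)
        all_log_probs.append(log_prob)
        # Assumed timestep convention: sigma * 1000, one value per sample.
        all_timesteps.append(np.full(latents.shape[0], sigmas[i] * 1000.0))
    # Stack along a new axis 1 so downstream code can slice [:, step].
    return (np.stack(all_latents, axis=1),
            np.stack(all_log_probs, axis=1),
            np.stack(all_timesteps, axis=1))
```

In this sketch the latent stack includes the initial noise, so it has one more time entry than the log-probs; match whatever layout the diffusion agent loop expects.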
Step 5 — Configure the Pipeline
No code changes are required in the trainer launcher itself. At runtime:
- DiffusionModelConfig.architecture is auto-detected from model_index.json.
- DiffusionModelConfig.algorithm is set by actor_rollout_ref.model.algorithm (default flow_grpo in diffusion_model.yaml). algorithm.adv_estimator is templated to read from this same value.
- DiffusionModelBase.get_class(model_config) resolves to the training adapter registered under (architecture, algorithm).
- VllmOmniPipelineBase.get_class(architecture, algorithm) resolves to the rollout adapter and is consumed by the vllm-omni rollout worker.
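Architecture detection amounts to reading `_class_name` from the checkpoint's `model_index.json` (a standard diffusers file). A hypothetical stand-alone helper like the one below is handy when debugging a registry miss: it returns exactly the string your `@register(...)` decorators must match.

```python
import json
from pathlib import Path


def read_architecture(model_path):
    """Return model_index.json::_class_name for a local diffusers checkpoint directory."""
    index_file = Path(model_path) / "model_index.json"
    with index_file.open(encoding="utf-8") as f:
        index = json.load(f)
    try:
        return index["_class_name"]
    except KeyError:
        raise ValueError(f"{index_file} has no _class_name entry") from None
```

For a Qwen-Image checkpoint this should return "QwenImagePipeline", the same string used on both Qwen-Image adapters.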
5.1 Pipeline Config Knobs
Pipeline sampling parameters live under actor_rollout_ref.rollout.pipeline.*
(mapped to
DiffusionPipelineConfig)
and are mirrored in actor_rollout_ref.model.pipeline.*.
Always copy the defaults from the upstream Hugging Face model card so RL exploration starts from a known-good operating point.
| Knob | Notes |
|---|---|
| | Must be a multiple of |
| | Steps used during training rollout. Default |
| | Full-quality steps for validation only (e.g. |
| | For Qwen-Image-style true CFG (e.g. |
| | For pipelines whose upstream uses |
| | Must accommodate the templated prompt length your tokenizer produces. |
Config hygiene. Any new field on DiffusionPipelineConfig must also be added to:
- diffusion_rollout.yaml: both the top-level pipeline: section and val_kwargs.pipeline:.
- diffusion_model.yaml: its pipeline: section, using ${oc.select:actor_rollout_ref.rollout.pipeline.<field>,<default>}.
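The mirroring rule is easy to break silently, so a small lint can help. The sketch below is hypothetical and not part of verl-omni: it compares the fields of a dataclass standing in for DiffusionPipelineConfig against already-parsed YAML dicts, and checks that each diffusion_model.yaml entry uses the oc.select interpolation. `ToyPipelineConfig` and its field names are invented for illustration.

```python
from dataclasses import dataclass, fields

ROLLOUT_PREFIX = "actor_rollout_ref.rollout.pipeline."


def pipeline_config_gaps(config_cls, rollout_yaml, model_yaml):
    """Return {location: [fields]} for every field that is missing or not interpolated."""
    names = {f.name for f in fields(config_cls)}
    sections = {
        "diffusion_rollout.yaml: pipeline": rollout_yaml.get("pipeline", {}),
        "diffusion_rollout.yaml: val_kwargs.pipeline":
            rollout_yaml.get("val_kwargs", {}).get("pipeline", {}),
        "diffusion_model.yaml: pipeline": model_yaml.get("pipeline", {}),
    }
    gaps = {where: sorted(names - set(section)) for where, section in sections.items()}
    # diffusion_model.yaml should read each value from the rollout config via oc.select.
    model_section = sections["diffusion_model.yaml: pipeline"]
    gaps["diffusion_model.yaml: not oc.select"] = sorted(
        name for name in names & set(model_section)
        if not str(model_section[name]).startswith("${oc.select:" + ROLLOUT_PREFIX + name + ",")
    )
    return {where: missing for where, missing in gaps.items() if missing}


@dataclass
class ToyPipelineConfig:  # hypothetical stand-in for DiffusionPipelineConfig
    height: int = 512
    my_new_knob: float = 4.0
```

An empty result means every field appears in all three sections and the model-side entries interpolate from the rollout config.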
5.2 Example Launch Script and Data Preprocessor
Ship a runnable example so users can launch training without trial and
error. Use
examples/flowgrpo_trainer/run_qwen_image_ocr_lora.sh
and
examples/flowgrpo_trainer/data_process/qwenimage_ocr.py
as templates.
The data preprocessor’s tokenisation must match the upstream
_encode_prompt exactly — same chat template, same special tokens,
same enable_thinking flag, etc. Mismatches here cause silent reward
collapse.
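One cheap guard against this is a parity test that runs the same raw prompts through both code paths and compares token ids. The sketch below is hypothetical: `preprocess_fn` stands for your data preprocessor's tokenisation and `reference_fn` for a wrapper around the tokenisation step of the upstream _encode_prompt; both are assumed to return a sequence of token ids per prompt.

```python
def tokenisation_mismatches(prompts, preprocess_fn, reference_fn):
    """Return (prompt, first differing index) for every prompt whose token ids differ."""
    mismatches = []
    for prompt in prompts:
        ours, ref = list(preprocess_fn(prompt)), list(reference_fn(prompt))
        if ours != ref:
            # First position where the sequences diverge, or the shorter length
            # when one sequence is a strict prefix of the other.
            idx = next((i for i, (a, b) in enumerate(zip(ours, ref)) if a != b),
                       min(len(ours), len(ref)))
            mismatches.append((prompt, idx))
    return mismatches
```

Reporting the first differing index points straight at the culprit: index 0 usually means a missing system prompt, a late index usually means a missing suffix or special token.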
5.3 Use the VeOmni Backend
Backend selection is orthogonal to model integration: the adapters you wrote in Steps 3–4 work unchanged regardless of whether the actor runs on the default diffusers + FSDP2 engine or on VeOmni. Switching is a configuration concern handled by a few Hydra overrides at launch time.
What VeOmni reuses from your model adapter
- DiffusionModelBase subclass (Step 3): used verbatim. The VeOmni engine calls the same prepare_model_inputs / forward_and_sample_previous_step contract.
- VllmOmniPipelineBase subclass (Step 4): used verbatim. Rollout always runs in vllm-omni, independent of the actor backend.
- FlowMatchSDEDiscreteScheduler (Step 3.1): used verbatim.
What VeOmni requires that diffusers does not
- Upstream support in VeOmni. Just as diffusers must provide your <Name>Transformer2DModel, VeOmni must be able to load your model via its DiTTrainer path. If VeOmni does not yet support the architecture, upstream it there first (the diffusers prerequisite listed under Prerequisites still applies for rollout; both upstreams are required).
- config_path / transformer_subfolder. The VeOmni engine loads the transformer from <local_path>/<transformer_subfolder> and the config from config_path (falling back to the weights path). These fields already exist on DiffusionModelConfig and are shared with the diffusers backend, so no new model-specific fields are needed.
Launching with the VeOmni backend
diffusion/model_engine=veomni_diffusion switches the entire actor / reference Hydra schema; the other actor-engine fields then live under actor_rollout_ref.actor.veomni_config.* and actor_rollout_ref.ref.veomni_config.*:
python3 -m verl_omni.trainer.main_diffusion \
diffusion/model_engine=veomni_diffusion \
actor_rollout_ref.actor.strategy=veomni \
actor_rollout_ref.actor.veomni_config.strategy=veomni \
actor_rollout_ref.ref.veomni_config.strategy=veomni \
... # everything else identical to your diffusers/FSDP2 recipe
See examples/flowgrpo_trainer/run_qwen_image_ocr_veomni.sh for a complete VeOmni recipe that mirrors run_qwen_image_ocr.sh line-for-line — the diff is only the engine-selection fields. Install instructions for VeOmni alongside vLLM 0.20.2 are in docs/start/install.md.
Mixing override schemas — don’t
diffusion/model_engine=veomni_diffusion selects the Hydra schema as a whole. Do not mix actor.fsdp_config.* and actor.veomni_config.* overrides in the same run — the fields for the other engine will be rejected as unknown keys at config-resolution time.
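If you generate launch commands programmatically, you can catch a mixed schema before Hydra does. The check below is a hypothetical helper, not part of verl-omni. It only inspects override strings, and it assumes that the default (non-VeOmni) schema is the one that uses `fsdp_config`, so it flags `fsdp_config` keys when `diffusion/model_engine=veomni_diffusion` is selected and `veomni_config` keys otherwise.

```python
def conflicting_engine_overrides(overrides):
    """Return override keys that belong to the engine schema that is NOT selected."""
    # Strip Hydra's +/~ prefixes and the "=value" part to get the bare key.
    keys = [o.split("=", 1)[0].lstrip("+~") for o in overrides]
    uses_veomni = "diffusion/model_engine=veomni_diffusion" in overrides
    foreign = ".fsdp_config." if uses_veomni else ".veomni_config."
    return [k for k in keys if foreign in k]
```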
Step 6 — Add a Smoke Test
Add an end-to-end smoke test under tests/special_e2e/ modelled on
tests/special_e2e/run_flowgrpo_qwen_image.sh.
The script must exercise the full pipeline against a tiny-random/<ModelName>
checkpoint:
1. Generate dummy parquet data via tests/special_e2e/create_dummy_diffusion_data.py.
2. Launch verl_omni.trainer.main_diffusion with model-specific knobs (architecture, prompt template, CFG parameters, sequence lengths).
3. Assert exit code 0.
Then register the script in
tests/gpu_smoke/run_gpu_smoke_tests.sh
as a new numbered test entry. The runner already exports
PYTHONUNBUFFERED=1 and RAY_DEDUP_LOGS=0 for readable logs — no need
to set them in your script.
When to Refactor Instead of Duplicating
If you are copy-pasting more than a few lines from another model’s adapter, prefer one of:
- Extending pipelines/utils.py with a generic helper.
- Adding a method to DiffusionModelBase or VllmOmniPipelineBase so future models do not re-discover the contract.
- Promoting a helper to a shared module once a second model needs it.
Refactor opportunistically: keep model-specific quirks local until a third model demands the same code, then unify.
Final Checklist
Before opening the PR, confirm every box:
[ ] verl_omni/pipelines/<model>_flow_grpo/ contains __init__.py, diffusers_training_adapter.py, and vllm_omni_rollout_adapter.py.
[ ] verl_omni/pipelines/__init__.py imports the new package.
[ ] The architecture string on both @register(...) decorators matches model_index.json::_class_name; the algorithm= keyword matches the algorithm you are integrating against (e.g. "flow_grpo" for FlowGRPO).
[ ] The scheduler returns latents in fp32 (no model_output.dtype cast in step()); diffuse() casts to the model dtype before the transformer forward and casts noise_pred to float32 before scheduler.step(). See Common Pitfalls.
[ ] Any new DiffusionPipelineConfig field is mirrored in both diffusion_rollout.yaml and diffusion_model.yaml.
[ ] An example launch script exists in examples/flowgrpo_trainer/, plus a matching data preprocessor under examples/flowgrpo_trainer/data_process/.
[ ] Smoke test tests/special_e2e/run_<algo>_<model>.sh exists and is wired into tests/gpu_smoke/run_gpu_smoke_tests.sh.
[ ] Docs updated (this guide if the contract changed; the relevant docs/algo/... page if you introduce algorithm-level concepts).
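The dtype item is the easiest one to get subtly wrong. The NumPy sketch below uses float16 as a stand-in for the model's bf16 (NumPy has no bfloat16) together with hypothetical `transformer_fn` and `scheduler_step_fn` callables. It only illustrates where the casts go: latents stay fp32 across steps, are cast down just for the transformer forward, and the prediction is cast back to fp32 before the scheduler step.

```python
import numpy as np

MODEL_DTYPE = np.float16  # stand-in for the transformer's bf16 weights


def denoise_step(latents, transformer_fn, scheduler_step_fn):
    if latents.dtype != np.float32:
        raise TypeError(f"scheduler state must stay in fp32, got {latents.dtype}")
    # Cast down only for the transformer forward.
    noise_pred = transformer_fn(latents.astype(MODEL_DTYPE))
    # Cast the prediction back up before the scheduler math.
    noise_pred = noise_pred.astype(np.float32)
    # The scheduler works in fp32 and must not cast its output to the model dtype.
    return scheduler_step_fn(latents, noise_pred)
```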