Workers Interface
Last updated: Jun 05, 2026 (API docstrings are auto-generated).
VeRL-Omni workers wrap the Diffusers / FSDP training engine, the rollout engine (vLLM-Omni), and the optional reference policy. The single-controller trainer drives them through a unified RPC layer.
Engine Workers
Diffusers FSDP Engine
DiffusersFSDPEngine is the abstract base class that implements the verl.workers.engine.base.BaseEngine interface for diffusion transformer backbones (e.g. Qwen-Image). It includes support for LoRA, mixed precision, and parameter / optimizer offloading.
Diffusers PPO FSDP Engine
PPODiffusersFSDPEngine is the concrete engine registered for FlowGRPO-style training (FlowGRPO, MixGRPO, GRPO-Guard). It subclasses DiffusersFSDPEngine and adds PPO forward/backward and batch I/O helpers.
LoRA Adapter Mixin
Reusable PEFT/LoRA helpers for named policy adapters (e.g. default and old).
Used by DiffusersFSDPEngine.
Loss Functions
Padding Utilities
Padding utilities for diffusion model training.
- verl_omni.workers.utils.padding.embeds_padding_2_no_padding(data: TensorDict) → TensorDict[source]
Convert a TensorDict whose prompt embeddings are padded into no-padding format. For diffusion model training only.
The prompt embedding mask is currently expected to have the form [1111000…], meaning the valid tokens are contiguous and start from the left.
- Parameters:
data – TensorDict with prompt_embeds, prompt_embeds_mask, negative_prompt_embeds, negative_prompt_embeds_mask.
- Returns:
TensorDict where prompt_embeds and negative_prompt_embeds are replaced with jagged torch.nested tensors. Tensor masks are also converted to nested tensors after stripping padding; missing or non-tensor masks leave the full embedding sequence intact.
Worker Configs
The configs below are dataclass mirrors of the YAML / Hydra options consumed
by the engine workers. They are typically built from
omegaconf.DictConfig via verl.utils.config.omega_conf_to_dataclass().
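The idea of a dataclass mirror can be sketched without any dependencies. The example below is an illustration only: PipelineSketch and SamplingSketch are hypothetical classes that copy a few field names and defaults from DiffusionPipelineConfig and DiffusionSamplingConfig, and dict_to_dataclass is a hypothetical stand-in for the real conversion, using a plain dict in place of a DictConfig.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any


@dataclass
class PipelineSketch:
    # A few fields and defaults copied from DiffusionPipelineConfig.
    height: int = 512
    width: int = 512
    num_inference_steps: int = 10


@dataclass
class SamplingSketch:
    # A few fields and defaults copied from DiffusionSamplingConfig.
    n: int = 1
    seed: int = 42
    pipeline: PipelineSketch = field(default_factory=PipelineSketch)


def dict_to_dataclass(cls: type, values: dict[str, Any]) -> Any:
    """Hypothetical stand-in for a DictConfig-to-dataclass conversion."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in values:
            continue  # fall back to the dataclass default
        value = values[f.name]
        # Recurse into nested config sections.
        if is_dataclass(f.type) and isinstance(value, dict):
            value = dict_to_dataclass(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


# Override two options; everything else keeps its default.
cfg = dict_to_dataclass(SamplingSketch, {"n": 4, "pipeline": {"height": 1024}})
```

Fields omitted from the input keep their dataclass defaults, which is the behavior the signatures below suggest for options that are not set in YAML. Fields whose default is shown as '???' have no usable default and must be supplied.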
- class verl_omni.workers.config.DiffusionModelConfig(_target_: str = '', path: str = '???', architecture: str | None = None, transformer_config: Optional[dict[str, Any]]=None, algorithm: str = '???', local_path: str | None = None, tokenizer_path: str | None = None, local_tokenizer_path: str | None = None, model_type: str = 'diffusion_model', load_tokenizer: bool = True, tokenizer: Any = None, processor: Any = None, use_shm: bool = False, trust_remote_code: bool = False, custom_chat_template: str | None = None, external_lib: str | None = None, enable_gradient_checkpointing: bool = True, attn_backend: str = 'native', lora_rank: int = 0, lora_alpha: int = 64, lora_init_weights: str = 'gaussian', target_modules: Any | None = 'all-linear', target_parameters: list[str] | None = None, exclude_modules: str | None = None, lora: dict[str, typing.Any]=<factory>, lora_adapter_path: str | None = None, policy_state_adapters: tuple[str, ...]=('default', ), lora_dtype: str | None = None, mtp: verl.workers.config.model.MtpConfig | None = <factory>, pipeline: verl_omni.workers.config.diffusion.rollout.DiffusionPipelineConfig = <factory>, algo: verl_omni.workers.config.diffusion.rollout.DiffusionRolloutAlgoConfig | None = <factory>, fsdp_layer_prefixes: list[str] = <factory>, config_path: str | None = None, transformer_subfolder: str = 'transformer')[source]
- class verl_omni.workers.config.DiffusionActorConfig(_target_: str = '', strategy: str = '???', ppo_mini_batch_size: int = 256, ppo_micro_batch_size_per_gpu: int = '???', diffusion_loss: verl_omni.workers.config.diffusion.actor.DiffusionLossConfig = <factory>, loss_scale_factor: float | None = None, use_kl_loss: bool = False, kl_loss_coef: float = 0.001, ppo_epochs: int = 1, shuffle: bool = False, data_loader_seed: int = 42, checkpoint: verl.trainer.config.config.CheckpointConfig = <factory>, optim: verl.workers.config.optimizer.OptimizerConfig = <factory>, engine: verl.base_config.BaseConfig = <factory>, rollout_n: int = '???', model_config: verl_omni.workers.config.diffusion.model.DiffusionModelConfig = <factory>, log_prob_micro_batch_size_per_gpu: int | None = None, profiler: verl.utils.profiler.config.ProfilerConfig | None = None, global_batch_info: dict = <factory>, rollout_correction: verl.trainer.config.algorithm.RolloutCorrectionConfig = <factory>)[source]
- class verl_omni.workers.config.FSDPDiffusionActorConfig(_target_: str = '', strategy: str = 'fsdp', ppo_mini_batch_size: int = 256, ppo_micro_batch_size_per_gpu: int = '???', diffusion_loss: verl_omni.workers.config.diffusion.actor.DiffusionLossConfig = <factory>, loss_scale_factor: float | None = None, use_kl_loss: bool = False, kl_loss_coef: float = 0.001, ppo_epochs: int = 1, shuffle: bool = False, data_loader_seed: int = 42, checkpoint: verl.trainer.config.config.CheckpointConfig = <factory>, optim: verl.workers.config.optimizer.OptimizerConfig = <factory>, engine: verl.base_config.BaseConfig = <factory>, rollout_n: int = '???', model_config: verl_omni.workers.config.diffusion.model.DiffusionModelConfig = <factory>, log_prob_micro_batch_size_per_gpu: int | None = None, profiler: verl.utils.profiler.config.ProfilerConfig | None = None, global_batch_info: dict = <factory>, rollout_correction: verl.trainer.config.algorithm.RolloutCorrectionConfig = <factory>, grad_clip: float = 1.0, fsdp_config: verl.workers.config.engine.FSDPEngineConfig = <factory>)[source]
- class verl_omni.workers.config.DiffusionLossConfig(_target_: str = '', loss_mode: str = 'flow_grpo', clip_ratio: float = 0.0001, adv_clip_max: float = 5.0, mix_beta: float = 0.5, ref_kl_coef: float = 0.0, adaptive_weight_min: float = 1e-05, dpo_beta: float = 2000.0)[source]
- class verl_omni.workers.config.DiffusionRolloutConfig(_target_: str = '', name: str | None = '???', mode: str = 'async', nnodes: int = 0, n_gpus_per_node: int = 8, n: int = 1, seed: int | None = None, prompt_length: int = 512, dtype: str = 'bfloat16', gpu_memory_utilization: float = 0.5, enforce_eager: bool = False, cudagraph_capture_sizes: list | None = None, free_cache_engine: bool = True, data_parallel_size: int = 1, expert_parallel_size: int = 1, tensor_model_parallel_size: int = 2, pipeline_model_parallel_size: int = 1, max_num_batched_tokens: int = 8192, logprobs_mode: str | None = 'processed_logprobs', scheduling_policy: str | None = 'fcfs', val_kwargs: verl_omni.workers.config.diffusion.rollout.DiffusionSamplingConfig = <factory>, max_model_len: int | None = None, max_num_seqs: int = 1024, log_prob_micro_batch_size_per_gpu: int | None = None, disable_log_stats: bool = True, engine_kwargs: dict = <factory>, pipeline: verl_omni.workers.config.diffusion.rollout.DiffusionPipelineConfig = <factory>, calculate_log_probs: bool = False, rollout_adapter: str = 'default', agent: verl.workers.config.rollout.AgentLoopConfig = <factory>, multi_turn: verl.workers.config.rollout.MultiTurnConfig = <factory>, prometheus: verl.workers.config.rollout.PrometheusConfig = <factory>, checkpoint_engine: verl.workers.config.rollout.CheckpointEngineConfig = <factory>, enable_chunked_prefill: bool = True, enable_prefix_caching: bool = True, load_format: str = 'dummy', layered_summon: bool = False, skip_tokenizer_init: bool = True, quantization: str | None = None, enable_rollout_routing_replay: bool = False, enable_sleep_mode: bool = True, mtp: verl.workers.config.model.MtpConfig | None = <factory>, profiler: verl.utils.profiler.config.ProfilerConfig | None = None, algo: verl_omni.workers.config.diffusion.rollout.DiffusionRolloutAlgoConfig | None = <factory>, disaggregation: verl.workers.config.disaggregation.DisaggregationConfig = <factory>, external_lib: str | None = None)[source]
- class verl_omni.workers.config.DiffusionRolloutAlgoConfig(_target_: str = '', noise_level: float = 1.0, sde_type: str = 'sde', sde_window_size: int | None = None, sde_window_range: list[int] | None = None, sample_strategy: str = 'random', iters_per_group: int = 1, sde_window_seed: int = 0)[source]
Algorithm configuration for the SDE-based diffusion rollout.
- class verl_omni.workers.config.DiffusionPipelineConfig(_target_: str = '', height: int = 512, width: int = 512, num_inference_steps: int = 10, true_cfg_scale: float = 1.0, max_sequence_length: int = 512, guidance_scale: float | None = None, num_frames: int = 1)[source]
- class verl_omni.workers.config.DiffusionSamplingConfig(_target_: str = '', n: int = 1, seed: int = 42, pipeline: verl_omni.workers.config.diffusion.rollout.DiffusionPipelineConfig = <factory>, algo: verl_omni.workers.config.diffusion.rollout.DiffusionRolloutAlgoConfig = <factory>)[source]
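As a rough guide to the clip_ratio and adv_clip_max options of DiffusionLossConfig, the sketch below shows a per-sample PPO-style clipped surrogate. How verl_omni actually applies these fields is not documented on this page, so the formula and the function name clipped_surrogate_loss are assumptions; only the default values are taken from the signature above.

```python
import math


def clipped_surrogate_loss(
    log_prob: float,
    old_log_prob: float,
    advantage: float,
    clip_ratio: float = 1e-4,   # default from DiffusionLossConfig
    adv_clip_max: float = 5.0,  # default from DiffusionLossConfig
) -> float:
    """Assumed PPO-style clipped objective for one sample (illustrative only)."""
    # Clamp the advantage to [-adv_clip_max, adv_clip_max].
    adv = max(-adv_clip_max, min(adv_clip_max, advantage))
    # Importance ratio between the current and the old policy.
    ratio = math.exp(log_prob - old_log_prob)
    # Keep the ratio inside [1 - clip_ratio, 1 + clip_ratio].
    clipped_ratio = max(1.0 - clip_ratio, min(1.0 + clip_ratio, ratio))
    # Pessimistic choice between the unclipped and the clipped term.
    return max(-adv * ratio, -adv * clipped_ratio)


on_policy = clipped_surrogate_loss(0.0, 0.0, advantage=10.0)
off_policy = clipped_surrogate_loss(0.1, 0.0, advantage=1.0)
```

Under this reading, the very small default clip_ratio of 0.0001 means the ratio term is clipped almost as soon as the policy moves away from the one that produced the rollout.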