Training Metrics
Last updated: 05/15/2026
The table below describes metrics specific to diffusion FlowGRPO / GRPO-Guard training, logged each step to your configured backend (console / W&B).
Metric |
Definition |
Interpretation |
|---|---|---|
zero_std_ratio |
\(\frac{1}{B}\lvert\{i : \sigma_i = 0\}\rvert\) |
GRPO derives its learning signal from relative rewards within a group; \(\sigma_i = 0\) means group \(i\) contributes no gradient regardless of absolute reward. A persistently high value (e.g. \(> 0.5\)) indicates reward saturation or poorly calibrated task difficulty. |
std_mean |
\(\frac{1}{B}\sum\limits_{i=1}^{B} \sigma_i\) |
Tracks average reward diversity across the batch. A declining trend is an early warning of saturation, typically visible before zero_std_ratio spikes. |
pg_clipfrac_higher |
\(\hat{P}(r > 1 + \varepsilon)\) |
The policy is reinforcing high-advantage denoising steps beyond the clip threshold. pg_clipfrac_higher \(\gg\) pg_clipfrac_lower signals upward-dominant learning and can guide tuning of the clip ratio or learning rate. |
pg_clipfrac_lower |
\(\hat{P}(r < 1 - \varepsilon)\) |
The policy is suppressing low-advantage denoising steps beyond the clip threshold. Asymmetry between higher and lower clipfrac reveals the dominant learning direction. |
ratio_mean |
\(\mathbb{E}[\rho_t]\) |
Mean importance ratio across the batch. Should stay close to 1; persistent drift indicates the current policy is diverging from the rollout policy. |
ratio_std |
\(\mathrm{Std}(\rho_t)\) |
Spread of the importance ratio. High values signal high-variance gradient updates and may indicate the clip ratio or learning rate is too large. |
timing_per_image_ms |
Latency (ms/image) per stage |
Covers rollout, reference log-prob, old log-prob, advantage computation, and actor update; identifies which stage dominates step time and where to focus optimization effort. |
throughput |
\(\dfrac{B \times n}{t_\mathrm{step} \times N}\) (images / GPU / s) |
Overall training throughput. Use alongside timing_per_image_ms to evaluate scaling efficiency and detect regressions across runs. |
Variables.
\(B\) — number of prompts per training batch
\(n\) — number of images generated per prompt
\(\sigma_i\) — reward standard deviation within group \(i\)
\(\rho_t\) — importance ratio \(\pi_\theta / \pi_{\theta_\mathrm{old}}\) per (image, denoising-timestep) pair
\(r\) — shorthand for \(\rho_t\) in clipping expressions
\(\varepsilon\) — clip ratio
\(N\) — number of GPUs
\(t_\mathrm{step}\) — wall-clock time per training step