Training Metrics

Last updated: 05/15/2026

The table below describes the metrics specific to diffusion FlowGRPO / GRPO-Guard training. They are logged at every training step to your configured backend (console or W&B).

Metric	Definition	Interpretation
zero_std_ratio	\(\frac{1}{B}\lvert\{i : \sigma_i = 0\}\rvert\)	Fraction of groups whose rewards are all identical. GRPO derives its learning signal from relative rewards within a group, so \(\sigma_i = 0\) makes every advantage in group \(i\) zero and the group contributes no gradient, regardless of its absolute reward. A persistently high value (e.g. \(> 0.5\)) indicates reward saturation or poorly calibrated task difficulty.
std_mean	\(\frac{1}{B}\sum\limits_{i=1}^{B} \sigma_i\)	Tracks average reward diversity across the batch. A declining trend is an early warning of saturation, typically visible before zero_std_ratio spikes.
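pg_clipfrac_higher	\(\hat{P}(r > 1 + \varepsilon)\)	Empirical fraction of (image, denoising-timestep) pairs whose importance ratio exceeds the upper clip threshold, i.e. the policy is reinforcing high-advantage denoising steps beyond the clip range. pg_clipfrac_higher \(\gg\) pg_clipfrac_lower signals upward-dominant learning and can guide tuning of the clip ratio or learning rate.
pg_clipfrac_lower	\(\hat{P}(r < 1 - \varepsilon)\)	Empirical fraction of (image, denoising-timestep) pairs whose importance ratio falls below the lower clip threshold, i.e. the policy is suppressing low-advantage denoising steps beyond the clip range. Asymmetry between pg_clipfrac_higher and pg_clipfrac_lower reveals the dominant learning direction.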
ratio_mean	\(\mathbb{E}[\rho_t]\)	Mean importance ratio across the batch. Should stay close to 1; persistent drift indicates the current policy is diverging from the rollout policy.
ratio_std	\(\mathrm{Std}(\rho_t)\)	Spread of the importance ratio. High values signal high-variance gradient updates and may indicate the clip ratio or learning rate is too large.
timing_per_image_ms	Latency (ms/image) per stage	Covers rollout, reference log-prob, old log-prob, advantage computation, and actor update; identifies which stage dominates step time and where to focus optimization effort.
throughput	\(\dfrac{B \times n}{t_\mathrm{step} \times N}\) (images / GPU / s)	Overall training throughput. Use alongside timing_per_image_ms to evaluate scaling efficiency and detect regressions across runs.
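
As a rough illustration of the definitions in the table, the sketch below recomputes the reward-diversity and importance-ratio metrics from toy data using only the Python standard library. The function names `group_reward_stats` and `ratio_stats` are hypothetical and not part of the training code, and the use of the population standard deviation is an assumption; the actual implementation may use a sample standard deviation or a small tolerance when testing for zero.

```python
from statistics import fmean, pstdev


def group_reward_stats(rewards):
    """rewards: B groups (one per prompt), each a list of n per-image rewards."""
    # Assumption: population standard deviation within each group.
    sigmas = [pstdev(group) for group in rewards]
    # Fraction of groups with no reward diversity at all.
    zero_std_ratio = sum(1 for s in sigmas if s == 0.0) / len(sigmas)
    # Average within-group reward standard deviation.
    std_mean = fmean(sigmas)
    return zero_std_ratio, std_mean


def ratio_stats(ratios, clip_eps):
    """ratios: flat list of importance ratios, one per (image, timestep) pair."""
    count = len(ratios)
    return {
        "pg_clipfrac_higher": sum(1 for r in ratios if r > 1.0 + clip_eps) / count,
        "pg_clipfrac_lower": sum(1 for r in ratios if r < 1.0 - clip_eps) / count,
        "ratio_mean": fmean(ratios),
        "ratio_std": pstdev(ratios),
    }


# Toy data: B = 2 prompts, n = 4 images each; the first group is saturated.
rewards = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
print(group_reward_stats(rewards))  # (0.5, 0.25)

# Toy ratios with a clip ratio of 0.2: one value above 1.2, one below 0.8.
ratios = [1.0, 1.3, 0.7, 1.0]
stats = ratio_stats(ratios, clip_eps=0.2)
print(stats["pg_clipfrac_higher"], stats["pg_clipfrac_lower"])  # 0.25 0.25
```

In the toy batch, half of the groups carry no learning signal, and clipping is symmetric, so neither learning direction dominates.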

Variables.

\(B\) — number of prompts per training batch
\(n\) — number of images generated per prompt
\(\sigma_i\) — reward standard deviation within group \(i\)
\(\rho_t\) — importance ratio \(\pi_\theta / \pi_{\theta_\mathrm{old}}\) per (image, denoising-timestep) pair
\(r\) — shorthand for \(\rho_t\) in clipping expressions
\(\varepsilon\) — clip ratio
\(N\) — number of GPUs
\(t_\mathrm{step}\) — wall-clock time per training step
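
To make the throughput formula concrete, here is a minimal worked example using the variables above. The helper name `throughput_per_gpu` and the sample numbers are illustrative assumptions, not values from a real run.

```python
def throughput_per_gpu(num_prompts, images_per_prompt, step_time_s, num_gpus):
    """Images per GPU per second: (B * n) / (t_step * N)."""
    if step_time_s <= 0 or num_gpus <= 0:
        raise ValueError("step_time_s and num_gpus must be positive")
    return (num_prompts * images_per_prompt) / (step_time_s * num_gpus)


# Hypothetical run: B = 32 prompts, n = 16 images per prompt,
# t_step = 64 s per training step, N = 8 GPUs.
value = throughput_per_gpu(
    num_prompts=32, images_per_prompt=16, step_time_s=64.0, num_gpus=8
)
print(value)  # 1.0
```

Here 512 images are produced per step across 8 GPUs in 64 seconds, which works out to one image per GPU per second.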