Reward Interface

Last updated: Jun 05, 2026 (API docstrings are auto-generated).

VeRL-Omni reward pipelines support both rule-based scoring (e.g. JPEG compressibility) and model-based generative reward models (e.g. OCR via a vision-language model served behind an OpenAI-compatible router). Reward computation is dispatched per sample by the VisualRewardManager, which plugs into the standard verl.experimental.reward_loop.RewardLoopManager.

verl_omni.utils.reward_score.default_compute_score_image

Compute the reward score for a visual (image) response.

verl_omni.utils.reward_score.http_scorer_client.compute_score

Compute reward by calling an external HTTP scorer service.

verl_omni.utils.reward_score.unified_reward.compute_score_unified_reward

Compute a human-preference score via UnifiedReward 2.0.

Reward Manager

Default Score Dispatcher

Visual (image) reward scoring functions for VeRL-Omni.

verl_omni.utils.reward_score.default_compute_score_image(data_source, solution_image, ground_truth, extra_info=None, **kwargs)

Compute the reward score for a visual (image) response.

Parameters:
  • data_source (str) – Dataset identifier that determines the scoring method.

  • solution_image – The generated image, as a torch.Tensor in shape (C, H, W) or (N, C, H, W).

  • ground_truth (str) – Ground-truth answer (may be unused for rule-based rewards such as jpeg_compressibility).

  • extra_info (dict, optional) – Additional metadata passed by the reward manager.

Returns:

The computed score (or a dict with a "score" key).

Return type:

float or dict

Raises:

NotImplementedError – If no scorer is registered for data_source.
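The dispatch behaviour described above (pick a scorer from data_source, raise NotImplementedError when none matches) can be illustrated with a minimal, standard-library sketch. Everything here is hypothetical: the registry, the register_scorer decorator, the dispatch_score name, and the toy_brightness scorer are illustration only and are not part of VeRL-Omni; the real function operates on torch.Tensor images and may select scorers differently.

```python
from typing import Any, Callable, Dict, Optional, Union

Score = Union[float, Dict[str, Any]]

# Hypothetical registry: maps a data_source string to a scorer callable.
_SCORERS: Dict[str, Callable[..., Score]] = {}


def register_scorer(name: str):
    """Decorator that registers a scorer under a data_source name."""
    def decorator(fn: Callable[..., Score]) -> Callable[..., Score]:
        _SCORERS[name] = fn
        return fn
    return decorator


@register_scorer("toy_brightness")
def _toy_brightness(solution_image, ground_truth, extra_info=None) -> float:
    # Stand-in scorer: mean value of a nested-list "image" (not a real tensor).
    flat = [pixel for row in solution_image for pixel in row]
    return sum(flat) / len(flat)


def dispatch_score(
    data_source: str,
    solution_image: Any,
    ground_truth: str,
    extra_info: Optional[dict] = None,
    **kwargs: Any,
) -> Score:
    """Look up the scorer for data_source and call it."""
    scorer = _SCORERS.get(data_source)
    if scorer is None:
        raise NotImplementedError(
            f"No scorer registered for data_source={data_source!r}"
        )
    return scorer(solution_image, ground_truth, extra_info=extra_info, **kwargs)
```

Calling dispatch_score("toy_brightness", image, "") routes to the registered scorer, while an unknown data_source raises NotImplementedError, mirroring the documented contract.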

Built-in Reward Scorers

JPEG Compressibility

The reward function for JPEG compressibility. It is adapted from https://github.com/kvablack/ddpo-pytorch.

verl_omni.utils.reward_score.jpeg_compressibility.compute_score(solution_image)

The scoring function for JPEG compressibility.

Parameters:

solution_image – The solution image or video, of shape (C, H, W) or (N, C, H, W).

GRM-based OCR Reward

OCR scoring backed by a generative reward model (GRM).

The compute_score_ocr() function sends a generated image to a vision-language model served behind an OpenAI-compatible router, then compares the model’s transcription with the ground-truth text to produce a score in [0, 1].

async verl_omni.utils.reward_score.genrm_ocr.compute_score_ocr(data_source: str, solution_image: ndarray | Tensor, ground_truth: str, extra_info: dict, reward_router_address: str, reward_model_tokenizer: PreTrainedTokenizer = None, model_name: str | None = None)

Compute an image OCR score via a generative reward model (GRM).

The image is sent to a GRM via an OpenAI-compatible router; the returned transcription is compared to ground_truth using Levenshtein distance to yield a score in [0, 1] (1 = perfect match).

Parameters:
  • data_source – Source dataset identifier. Unused, kept for interface consistency.

  • solution_image – The solution image or video to be evaluated.

  • ground_truth – The ground truth text for comparison.

  • extra_info – Additional information; frame_interval controls video frame subsampling.

  • reward_router_address – host:port of the GRM router.

  • reward_model_tokenizer – Tokenizer for the reward model. Unused, kept for interface consistency.

  • model_name – Name or path of the GRM. Defaults to DEFAULT_GRM_MODEL_PATH.

Returns:

{"score": float, "genrm_response": str}.

Return type:

dict
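The comparison step can be sketched without any model call. The example below computes a Levenshtein distance and maps it to [0, 1]. The normalization (one minus the distance divided by the longer string's length) is an assumption chosen so that a perfect match gives 1; the documentation does not state the exact formula, and the names levenshtein and ocr_score are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def ocr_score(transcription: str, ground_truth: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means the strings are identical.

    Assumed normalization: divide by the length of the longer string.
    """
    longest = max(len(transcription), len(ground_truth))
    if longest == 0:
        return 1.0  # two empty strings match trivially
    return 1.0 - levenshtein(transcription, ground_truth) / longest
```

Because the distance never exceeds the longer string's length, this normalization stays within [0, 1] without clamping.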

HTTP Scorer Client

Generic HTTP reward client for external scorer services.

Sends generated images to an external HTTP scorer service as a pickle-serialized payload and returns the score. Compatible with all scorer services under rewards_services/api_services/ that accept the standard payload format:

POST with pickle-serialized {"images": List[bytes], "prompts": List[str], "metadata": dict}
Response: pickle-serialized {"scores": List[float]}

async verl_omni.utils.reward_score.http_scorer_client.compute_score(solution_image: Tensor, ground_truth: str, server_url: str, **kwargs) → dict

Compute reward by calling an external HTTP scorer service.

Parameters:
  • solution_image – Generated image tensor (C, H, W) or (N, C, H, W).

  • ground_truth – Prompt string passed directly to the scorer service.

  • server_url – Full URL of the scorer service (e.g., "http://localhost:19082").

Returns:

dict with a "score" key.
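The payload format above can be exercised with the standard library alone. The sketch below builds the request body, parses a response, and shows one way to POST it. The helper names, the Content-Type header, and the timeout are assumptions for illustration; the actual client is async and may encode images, set headers, or reduce the scores list to a single score differently.

```python
import pickle
import urllib.request
from typing import Any, Dict, List, Optional


def build_payload(
    images: List[bytes],
    prompts: List[str],
    metadata: Optional[Dict[str, Any]] = None,
) -> bytes:
    """Serialize the documented payload structure with pickle."""
    return pickle.dumps(
        {"images": images, "prompts": prompts, "metadata": metadata or {}}
    )


def parse_response(body: bytes) -> List[float]:
    """Extract the "scores" list from a pickled response body.

    Only unpickle data from a scorer service you trust: unpickling
    can execute arbitrary code.
    """
    data = pickle.loads(body)
    return [float(score) for score in data["scores"]]


def post_for_scores(server_url: str, payload: bytes, timeout: float = 30.0) -> List[float]:
    """Blocking POST of the payload; header and timeout are assumed values."""
    request = urllib.request.Request(
        server_url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(response.read())
```

Since pickle is not safe against untrusted input, a setup like this is best kept to scorer services running inside a trusted network.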

UnifiedReward Scorer

Human-preference scoring backed by UnifiedReward 2.0.

async verl_omni.utils.reward_score.unified_reward.compute_score_unified_reward(data_source: str, solution_image: ndarray | Tensor, ground_truth: str, extra_info: dict, reward_router_address: str, reward_model_tokenizer: PreTrainedTokenizer = None, model_name: str | None = None)

Compute a human-preference score via UnifiedReward 2.0.

The reward model scores the generated image against its text caption on three axes: Alignment, Coherence, and Style. The returned score is the mean of the three axis scores, normalized from the model’s 1-5 scale to [0, 1].
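As a worked example of that normalization, the sketch below averages three axis scores and rescales the mean linearly from [1, 5] to [0, 1]. The linear mapping (mean - 1) / 4 and the clamping of out-of-range values are assumptions; the documentation only states the source and target ranges. The function name is hypothetical.

```python
def normalize_unified_reward(
    alignment: float,
    coherence: float,
    style: float,
    low: float = 1.0,
    high: float = 5.0,
) -> float:
    """Average the three axis scores and rescale from [low, high] to [0, 1]."""
    mean = (alignment + coherence + style) / 3.0
    normalized = (mean - low) / (high - low)
    # Assumed safeguard: clamp in case the model emits an out-of-range value.
    return min(1.0, max(0.0, normalized))
```

Under this mapping, axis scores of 4, 3, and 2 average to 3, the midpoint of the 1-5 scale, and therefore normalize to 0.5.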

Reward Utilities

verl_omni.utils.reward_score.reward_utils.pil_image_to_base64(image: Image) → str

Convert a PIL Image to a base64-encoded data URI string.

Parameters:

image – The PIL Image to convert.

Returns:

A base64-encoded PNG data URI string (e.g. data:image/png;base64,...).
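The shape of the returned string can be shown with the standard library. This sketch starts from PNG bytes that are already encoded, so it covers only the base64 and data-URI step; the real helper takes a PIL Image and presumably serializes it to PNG first. The function name png_bytes_to_data_uri is hypothetical.

```python
import base64


def png_bytes_to_data_uri(png_bytes: bytes) -> str:
    """Wrap already-encoded PNG bytes in a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"
```

A data URI of this form can be placed in the image_url field of an OpenAI-compatible chat request, which is how images are commonly passed to vision-language models.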