Using an External HTTP Scorer Service

Last updated: 05/28/2026

VeRL-Omni ships a generic HTTP reward client (verl_omni.utils.reward_score.http_scorer_client) that sends generated images to an external scorer service over HTTP and returns the score. This is useful when your reward model is too large to co-locate with training, needs a different runtime (e.g., a separate GPU pool), or is shared across multiple experiments.

How it works

┌──────────────┐        pickle payload         ┌──────────────────┐
│  VeRL-Omni   │  ──── POST (bytes) ────────►  │  Scorer Service  │
│  (training)  │                               │  (Flask/Gunicorn)│
│              │  ◄─── pickle response ──────  │                  │
└──────────────┘                               └──────────────────┘

  1. During reward computation, the client converts the generated image tensor to JPEG bytes (offloaded to a thread pool to avoid blocking the async event loop).

  2. The JPEG bytes and prompt are packed into a pickle payload and sent via HTTP POST.

  3. The scorer service runs inference and returns scores in a pickle response.

Because compute_score is an async function and RewardLoopWorker.compute_score_batch uses asyncio.gather, all samples in a batch are sent to the server concurrently rather than one after another, so the client side adds no serial bottleneck.
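The concurrency pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the client's actual code: encode_image, post_to_scorer, score_one, and score_batch are hypothetical names, and the sleeps stand in for JPEG encoding and network latency.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def encode_image(sample_id: int) -> bytes:
    """Stand-in for the blocking tensor -> PIL -> JPEG conversion."""
    time.sleep(0.05)  # simulate blocking encoding work
    return f"jpeg-{sample_id}".encode()


async def post_to_scorer(jpeg: bytes, prompt: str) -> float:
    """Stand-in for the HTTP POST; a real client would await the network here."""
    await asyncio.sleep(0.2)  # simulate server latency
    return float(len(prompt))


async def score_one(sample_id: int, prompt: str, pool: ThreadPoolExecutor) -> float:
    loop = asyncio.get_running_loop()
    # Offload the blocking encode so the event loop stays free for other samples.
    jpeg = await loop.run_in_executor(pool, encode_image, sample_id)
    return await post_to_scorer(jpeg, prompt)


async def score_batch(prompts: list[str]) -> list[float]:
    with ThreadPoolExecutor(max_workers=8) as pool:
        # gather starts every sample at once and returns results in input order.
        return await asyncio.gather(
            *(score_one(i, p, pool) for i, p in enumerate(prompts))
        )


if __name__ == "__main__":
    start = time.perf_counter()
    scores = asyncio.run(score_batch(["a", "bb", "ccc", "dddd"]))
    print(scores, f"{time.perf_counter() - start:.2f}s")
```

With four samples, the elapsed time should be close to the latency of a single request instead of four times that, because the simulated requests overlap.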

Scorer service protocol

The HTTP scorer client (http_scorer_client.py) communicates with external reward services using a pickle-based protocol, following the interface defined in flow_grpo. A reference implementation is available at deepgen_rl/ocr_scorer_service.

Request

The client sends a POST request with body = pickle.dumps(payload) where:

payload = {
    "images": [bytes, ...],   # List of JPEG-encoded image bytes
    "prompts": [str, ...],    # List of prompt strings (same length as images)
    "metadata": {},           # Reserved for future use
}

Response

The service must return pickle.dumps(response) where:

# Success (HTTP 200):
response = {"scores": [float, ...]}  # One score per image, typically in [0, 1]

# Error (HTTP 200 with error key, or HTTP 5xx):
response = {"error": "description of what went wrong"}

Any service that implements this interface can be used as a reward function, for example PaddleOCR, HPSv3, aesthetic scorers, or CLIP-based scorers. The service runs independently and can use any framework (PaddlePaddle, PyTorch, ONNX, etc.) without conflicting with the training environment.
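As a minimal sketch of the request and response shapes above, the following helpers build a request body and decode a response body. The names build_request_body, parse_response_body, and ScorerError are hypothetical and are not part of http_scorer_client; the length check on the returned scores is an added assumption based on the "one score per image" rule.

```python
import pickle


class ScorerError(RuntimeError):
    """Raised when the scorer service reports an error (hypothetical helper)."""


def build_request_body(images: list[bytes], prompts: list[str]) -> bytes:
    """Pack JPEG bytes and prompts into the pickle payload described above."""
    if len(images) != len(prompts):
        raise ValueError("images and prompts must have the same length")
    payload = {"images": images, "prompts": prompts, "metadata": {}}
    return pickle.dumps(payload)


def parse_response_body(body: bytes, expected: int) -> list[float]:
    """Decode a pickle response and return its scores, or raise on an error key."""
    # pickle.loads can execute arbitrary code, so only decode trusted services.
    response = pickle.loads(body)
    if "error" in response:
        raise ScorerError(response["error"])
    scores = [float(s) for s in response["scores"]]
    if len(scores) != expected:
        raise ScorerError(f"expected {expected} scores, got {len(scores)}")
    return scores


if __name__ == "__main__":
    body = build_request_body([b"\xff\xd8fake-jpeg"], ["a sign that says OPEN"])
    print(sorted(pickle.loads(body)))
    print(parse_response_body(pickle.dumps({"scores": [0.75]}), expected=1))
```

Checking for the error key before reading scores covers both error forms listed above, since an HTTP 200 response may still carry an error.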

Setting up a scorer service

A reference implementation is available at deepgen_rl/ocr_scorer_service. Each service follows the same Flask + Gunicorn pattern:

# From a checkout of deepgen_rl, install dependencies and start the OCR scorer service
cd rewards_services/api_services/ocr_scorer_service
pip install -r requirements.txt
gunicorn -c gunicorn.conf.py 'app:create_app()'

The default port is configured in each service’s gunicorn.conf.py. You can also write your own service by implementing the pickle-based protocol above (see the flow_grpo reward preparation section for the specification).
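For experimenting with a custom service, the sketch below implements the protocol using only Python's standard library. It is a stand-in for local testing, not the Flask + Gunicorn reference implementation: score_images is a placeholder scorer, ScorerHandler and start_server are hypothetical names, and the handler accepts POST on any path because the endpoint path used by the client is not specified here.

```python
import http.client
import pickle
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def score_images(images: list[bytes], prompts: list[str]) -> list[float]:
    """Placeholder scorer: replace with real model inference."""
    return [min(len(p) / 100.0, 1.0) for p in prompts]


class ScorerHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            # pickle.loads runs arbitrary code: expose this only on a trusted network.
            payload = pickle.loads(self.rfile.read(length))
            scores = score_images(payload["images"], payload["prompts"])
            status, response = 200, {"scores": scores}
        except Exception as exc:  # report failures in the protocol's error shape
            status, response = 500, {"error": repr(exc)}
        body = pickle.dumps(response)
        self.send_response(status)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:  # keep the example quiet
        pass


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), ScorerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    body = pickle.dumps({"images": [b"jpeg"], "prompts": ["hello"], "metadata": {}})
    conn.request("POST", "/", body=body)
    print(pickle.loads(conn.getresponse().read()))
    conn.close()
    server.shutdown()
```

A threaded server is used so that the concurrent requests sent by the training side are not queued behind one another. A production service would more likely follow the Flask + Gunicorn pattern of the reference implementation.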

Configuring VeRL-Omni to use the HTTP scorer

In your training launch script, configure the reward function to point to http_scorer_client and pass the server_url:

python3 -m verl_omni.trainer.main_diffusion \
    ...
    "+reward.reward_functions.my_reward.path=pkg://verl_omni.utils.reward_score.http_scorer_client" \
    '+reward.reward_functions.my_reward.name=compute_score' \
    '+reward.reward_functions.my_reward.weight=1.0' \
    "+reward.reward_functions.my_reward.server_url=http://<scorer-host>:<port>" \
    ...

Key points:

  • path: Module path using the pkg:// prefix.

  • name: The async function to call (compute_score).

  • weight: Reward weight when combining multiple reward functions.

  • server_url: Full URL of your scorer service (no trailing slash).

Any extra key-value pairs added under the same reward function config are forwarded as **kwargs to compute_score.
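The forwarding behavior can be pictured with a small stand-in dispatcher. Everything in this sketch is hypothetical: call_reward_fn is not the framework's loader, the compute_score shown here is a dummy whose signature is assumed, and ocr_lang is an invented extra key. It only illustrates how keys other than path, name, and weight could arrive as keyword arguments.

```python
import asyncio

# Keys assumed to be consumed by the framework itself (taken from the list above).
RESERVED_KEYS = {"path", "name", "weight"}


async def compute_score(prompt: str, server_url: str, **kwargs) -> dict:
    """Dummy reward function that echoes the arguments it received."""
    return {"prompt": prompt, "server_url": server_url, "extra": kwargs}


async def call_reward_fn(fn, config: dict, prompt: str) -> dict:
    """Hypothetical dispatcher: forward every non-reserved config key as a kwarg."""
    extra = {k: v for k, v in config.items() if k not in RESERVED_KEYS}
    return await fn(prompt, **extra)


if __name__ == "__main__":
    config = {
        "path": "pkg://verl_omni.utils.reward_score.http_scorer_client",
        "name": "compute_score",
        "weight": 1.0,
        "server_url": "http://scorer-host:19082",
        "ocr_lang": "en",  # invented extra key, forwarded through **kwargs
    }
    print(asyncio.run(call_reward_fn(compute_score, config, "a sign that says OPEN")))
```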

Full example

See the example script: examples/flowgrpo_trainer/run_qwen_image_ocr_reward_server.sh

This script trains Qwen-Image with FlowGRPO using an external OCR reward server.

# 1. Start external OCR reward server (separate process/machine)
# See: https://github.com/deepgenteam/deepgen_rl/tree/main/rewards_services/api_services/ocr_scorer_service
# The service interface follows: https://github.com/yifan123/flow_grpo#3-reward-preparation
cd rewards_services/api_services/ocr_scorer_service
bash run.sh  # Starts on port 19082

# 2. Prepare data (stores full prompt as ground_truth for HTTP service)
python examples/flowgrpo_trainer/data_process/qwenimage_ocr_http_service.py \
    --input_dir ~/dataset/ocr/ --output_dir ~/data/ocr_http

# 3. Run training
OCR_REWARD_SERVER_URL=http://<server-ip>:19082 \
    bash examples/flowgrpo_trainer/run_qwen_image_ocr_reward_server.sh

Notes

  • The HTTP client reuses a single aiohttp.ClientSession across calls to avoid per-request connection overhead.

  • Image serialization (tensor to PIL to JPEG) is offloaded to a thread pool via the event loop’s run_in_executor method so it does not block the reward manager’s async event loop.

  • The default request timeout is 120 seconds. If your scorer model is slow, consider scaling the service with Gunicorn workers or increasing the timeout in the client code.