Installation
Last updated: 06/05/2026
Requirements
For NVIDIA GPU:
Python: Version >= 3.10
CUDA: Version >= 12.8
For Ascend NPU:
Python: Version >= 3.10
CANN: Version >= 8.5.0
Install
Follow the steps below in order to avoid dependency conflicts:
Create a Python virtual environment:
uv venv --python 3.12 --seed
source .venv/bin/activate
Install
vllmfollowed byvllm-omni.
For NVIDIA GPU:
uv pip install vllm==0.20.2
uv pip install "vllm-omni @ git+https://github.com/vllm-project/vllm-omni.git@c7178d89bb7a70817f239febc84c3b21a714dae7"
For Ascend NPU:
uv pip install "vllm @ git+https://github.com/vllm-project/vllm.git@releases/v0.20.2"
uv pip install "vllm-ascend @ git+https://github.com/vllm-project/vllm-ascend.git@07f6fec2aa4404e1283c4cd6c0981aa878bc5be9"
uv pip install "vllm-omni @ git+https://github.com/vllm-project/vllm-omni.git@c7178d89bb7a70817f239febc84c3b21a714dae7"
Install
verlfollowed byverl-omnifrom source:
# Install verl
uv pip install "verl==0.8.0"
# Install verl-omni from source
git clone https://github.com/verl-project/verl-omni.git
cd verl-omni
uv pip install -e .
Note: Install
vllmandvllm-omnifirst, as they may override your existing PyTorch installation. Installing them beforeverlandverl-omniensures a compatible, hardware-aware PyTorch version.
Optional Dependencies
Extra |
Install |
When needed |
|---|---|---|
OCR reward |
|
FlowGRPO training with OCR-based reward |
Diffusers Flash Attention 3 backend |
|
Using |
VeOmni engine backend |
Running the diffusion trainer with VeOmni instead of the default FSDP2 |
Flash Attention 3 (_flash_3_varlen_hub)
Set actor_rollout_ref.model.attn_backend="_flash_3_varlen_hub" in your
training script to switch from the default native attention to a
Flash-Attention-3-based backend. This requires the kernels package:
uv pip install kernels==0.14.1
Optional engine backends
VeRL-Omni defaults to FSDP2 as the training engine for the policy and reference models. The diffusion trainer can alternatively be switched to VeOmni. The engine is selected at the Hydra command line — see examples/flowgrpo_trainer/run_qwen_image_ocr_veomni.sh for a complete recipe.
Installing VeOmni alongside vLLM 0.20.2
VeOmni 0.1.11’s gpu extra pins torch==2.9.1+cu129, which conflicts with vllm==0.20.2 (depends on torch>=2.11). A plain uv pip install veomni[gpu,dit]==0.1.11 therefore fails dependency resolution.
VeOmni itself runs correctly on torch 2.11 — only the [gpu] extra’s pin is too strict. Install it without dependency resolution so the existing torch/vllm stack is preserved, and add the small set of runtime extras that the verl-omni VeOmni engine actually needs:
uv pip install veomni==0.1.11 --no-deps
uv pip install torchcodec librosa soundfile av
Verify the engine is importable:
python -c "import veomni; print('veomni', veomni.__version__)"
python -c "from veomni.distributed.offloading import load_model_to_gpu, load_optimizer, offload_model_to_cpu, offload_optimizer; print('VeOmni offloading helpers OK')"
If you want VeOmni’s full [gpu,dit] extras (flash-attn variants, liger-kernel, cuda-python, etc.), install them in a separate environment not pinned to vllm 0.20.2; verl-omni does not need them.
Post-Installation Verification
For NVIDIA GPU:
python -c "import torch; print('torch', torch.__version__, '| CUDA', torch.version.cuda)"
python -c "import vllm; print('vllm', vllm.__version__)"
python -c "import verl; print('verl', verl.__version__)"
python -c "import verl_omni; print('VeRL-Omni ready')"
For Ascend NPU:
python -c "import torch; import torch_npu; print('torch', torch.__version__, '| NPU', torch.npu.is_available())"
python -c "import vllm; print('vllm', vllm.__version__)"
python -c "import verl; print('verl', verl.__version__)"
python -c "import verl_omni; print('VeRL-Omni ready')"