🤖 AI Summary
Verifiers is a new, modular library for building RL environments and training LLM agents; the same environments can double as evaluation suites and synthetic-data pipelines. It provides installable Environment modules (loadable via vf.load_environment), HF-style dataset integration (a prompt or question column, plus optional answer and info columns), and a Rubric abstraction for composing synchronous or asynchronous reward functions. Verifiers supports single-turn and multi-turn protocols (SingleTurnEnv, MultiTurnEnv, ToolEnv) and lets you expose tools as plain Python functions, auto-converted to JSON-schema tool specs for agentic tool use. Rollouts use OpenAI-compatible inference (/v1/chat/completions or /v1/completions) with first-class support for vLLM sampling args (interrupt/resume, reasoning budgets). One key constraint is a monotonic token context: tokens appended during a rollout cannot be removed, which affects some reasoning models and multi-turn rollouts.
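As a rough illustration of the pieces named above, here is a minimal sketch in the style of the Verifiers README; the dataset columns, reward-function signature, and the exact Rubric/SingleTurnEnv/ToolEnv keyword arguments are assumptions that should be checked against the installed version:

```python
import verifiers as vf
from datasets import Dataset

# HF-style dataset: a question column plus an optional answer column.
dataset = Dataset.from_dict({
    "question": ["What is 6 * 7?"],
    "answer": ["42"],
})

# Reward functions receive rollout fields as kwargs and return a float.
def exact_match_reward(completion, answer, **kwargs) -> float:
    # Chat-format completions are lists of message dicts; handle both cases.
    text = completion if isinstance(completion, str) else completion[-1]["content"]
    return 1.0 if answer in text else 0.0

# A Rubric composes (possibly async) reward functions with weights.
rubric = vf.Rubric(funcs=[exact_match_reward], weights=[1.0])

# Single-turn protocol: one prompt, one completion, then scoring.
env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)

# Tools are plain Python functions; ToolEnv converts their signatures
# and docstrings into JSON-schema tool specs for the model.
def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression and return the result."""
    return str(eval(expression))  # illustration only; don't eval untrusted input

tool_env = vf.ToolEnv(dataset=dataset, rubric=rubric, tools=[calculator], max_turns=4)
```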
On the training side, Verifiers ships an async GRPO trainer (vf.GRPOTrainer) built around the transformers Trainer, and it is compatible with prime-rl for large-scale FSDP runs; it is optimized for efficient dense-transformer training on 2–16 GPUs and supports full-parameter finetuning. The toolchain uses the uv installer, optionally flash-attn for GPU speedups, and integrates with Accelerate/DeepSpeed or prime-rl orchestration. Practical details: the vf-eval CLI supports concurrency controls and vLLM sampling args, environments accept max_concurrent, and common deployment tips (ulimit, NCCL env vars, wandb/HF logins, OPENAI_API_KEY) are documented for robust multi-GPU, high-throughput setups.
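A correspondingly hedged sketch of the training entry point, again following the README pattern; the model name and environment id are placeholders, and helpers like vf.get_model_and_tokenizer and vf.grpo_defaults are assumptions to verify against your installed version:

```python
import verifiers as vf

# Load a model/tokenizer pair and an installed environment module.
model, tokenizer = vf.get_model_and_tokenizer("Qwen/Qwen2.5-1.5B-Instruct")  # placeholder model
env = vf.load_environment("math-python")  # placeholder environment id

# Async GRPO training built around the transformers Trainer.
trainer = vf.GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    env=env,
    args=vf.grpo_defaults(run_name="verifiers-demo"),
)
trainer.train()
```

For multi-GPU runs, this script would typically be launched via Accelerate/DeepSpeed or handed off to prime-rl orchestration, per the deployment tips above.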