My Experience Using Tinker (www.rajan.sh)

🤖 AI Summary
While wrestling with distributed RL training and reward collapse for a learned-compression project, the author ported their code to Tinker and finished an experiment overnight. Tinker provided simple onboarding, MoE and LoRA support, and a compact API that removed the infrastructure burden: distributed training/inference, checkpoint storage, and per-token policy computation (ratio computation, clipping, policy loss) all run server-side.

The codebase shrank ~5x by adopting three async primitives: sample_async for inference and rollouts, forward_backward_async for loss and gradient computation, and optim_step_async to apply the optimizer step. This also allowed multiple models to run in one script while sharing a base model and tokenizer. Tinker additionally exposes token-level loss outputs (logprobs) from forward_backward_async, enabling dense, token-level ΔlogP rewards and custom reward pipelines while leaving algorithmic choices (e.g., PPO, reward model) entirely to the user.

For researchers this matters: it dramatically speeds iteration on complex RL setups without needing low-level CUDA/PyTorch work, and it supports efficient adapters (LoRA) and MoE with minimal ops. Friction points include the lack of a first-class batch sampler that minimizes prefill passes over many prompts, some awkwardness converting TensorData to numpy for multi-model experiments, and limited built-in logging and multimodal-RL features. Overall, the author found Tinker reliable, flexible, and a major productivity win for RL research.
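To make the shape of that loop concrete, here is a minimal sketch of how the three primitives could be wired into a single RL step. Only sample_async, forward_backward_async, and optim_step_async come from the summary above; the client objects, the types module (Datum, ModelInput, SamplingParams, AdamParams), the loss_fn name, the loss_fn_inputs keys, and the rollout result fields are assumptions for illustration and may not match Tinker's actual API.

```python
# Hypothetical sketch of one RL step built on Tinker's three async primitives.
# Everything except sample_async / forward_backward_async / optim_step_async
# is assumed for illustration and may differ from the real library.
import asyncio

import tinker
from tinker import types  # assumed module layout


async def rl_step(training_client, sampling_client, tokenizer, prompts, reward_fn):
    # 1. Rollout: sample completions for each prompt (inference runs server-side).
    sample_futures = [
        sampling_client.sample_async(
            prompt=types.ModelInput.from_ints(tokenizer.encode(p)),  # assumed constructor
            sampling_params=types.SamplingParams(max_tokens=256, temperature=1.0),
            num_samples=4,
        )
        for p in prompts
    ]
    rollouts = await asyncio.gather(*sample_futures)

    # 2. Build training data locally: score each completion with a custom reward
    #    pipeline and attach per-token advantages.
    data = []
    for prompt, result in zip(prompts, rollouts):
        for seq in result.sequences:  # assumed result structure
            reward = reward_fn(prompt, tokenizer.decode(seq.tokens))
            data.append(
                types.Datum(
                    model_input=types.ModelInput.from_ints(
                        tokenizer.encode(prompt) + seq.tokens
                    ),
                    loss_fn_inputs={  # assumed key names
                        "target_tokens": seq.tokens,
                        "logprobs": seq.logprobs,            # sampling-time logprobs
                        "advantages": [reward] * len(seq.tokens),
                    },
                )
            )

    # 3. Loss + gradients: ratio computation, clipping, and the policy loss
    #    happen server-side, per the summary.
    fwd_bwd = await training_client.forward_backward_async(data, loss_fn="ppo")

    # 4. Apply the optimizer update server-side.
    await training_client.optim_step_async(types.AdamParams(learning_rate=1e-5))

    # The forward_backward output carries per-token logprobs, which is what the
    # post leans on for dense, token-level delta-logP rewards.
    return fwd_bwd
```

The point the post makes is visible in the structure: sampling, gradient computation, and the optimizer update are three remote calls, so the local script is reduced to building batches and computing rewards.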