🤖 AI Summary
Thinking Machines released Tinker, a low-level fine-tuning API that exposes three primitives—sample, forward_backward, and optim_step—letting users implement both supervised fine-tuning and online reinforcement-style updates from the same interface. That combination is notable: sample lets the model generate outputs that can be scored and used to update the weights, enabling domains where scoring a response is easier than authoring one. Tinker departs from 2023-era pipelines that pre-upload tokenized datasets and batch work to keep GPUs fully saturated; instead, each batch (and often the completions) must be sent over the network, adding roughly 100 ms or more of latency per step. On the face of it that looks inefficient, but the API's design seems optimized for a multi-tenant, LoRA-first deployment rather than single-run training.
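To make the interplay of the three primitives concrete, here is a minimal sketch of a generate-score-update loop. The `TrainingClient` protocol, its method signatures, and the `reward_fn` callback are illustrative assumptions based on the primitive names in the summary, not the actual Tinker SDK.

```python
from typing import Protocol, Sequence

class TrainingClient(Protocol):
    """Assumed interface mirroring the three primitives named above (not the real SDK)."""
    def sample(self, prompts: Sequence[str], max_tokens: int,
               temperature: float) -> list[str]: ...
    def forward_backward(self, inputs: Sequence[str], targets: Sequence[str],
                         loss_weights: Sequence[float]) -> None: ...
    def optim_step(self, learning_rate: float) -> None: ...

def rl_step(client: TrainingClient, prompts: list[str], reward_fn,
            learning_rate: float = 1e-5) -> float:
    """One online, REINFORCE-flavoured iteration: generate, score, update."""
    # 1. sample: the current policy produces completions we can judge.
    completions = client.sample(prompts, max_tokens=256, temperature=1.0)

    # 2. score: useful where judging an answer is easier than writing one.
    rewards = [reward_fn(p, c) for p, c in zip(prompts, completions)]

    # 3. forward_backward: accumulate gradients of a reward-weighted loss,
    #    e.g. -reward * logprob(completion | prompt).
    client.forward_backward(inputs=prompts, targets=completions,
                            loss_weights=rewards)

    # 4. optim_step: apply the accumulated gradients to the (LoRA) weights.
    client.optim_step(learning_rate=learning_rate)
    return sum(rewards) / max(len(rewards), 1)
```

A plain supervised fine-tuning loop would skip the sample and scoring steps and call forward_backward directly on labeled (prompt, target) pairs, which is why the same interface covers both regimes.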
Technically, viability hinges on two trade-offs: LoRA/adapter updates (10–100 MB) make model swaps sub-second and let a hot pool of workers host a small set of base models, while async queuing of forward_backward and optim_step requests hides the network latency. This architecture could let many users share inference and even training compute, but online RL still faces serial dependencies (you need the latest completion before you can score it and update). Full-weight fine-tuning likely remains more expensive or enterprise-only. Tinker therefore signals a pragmatic frontier-lab approach: favor fast, multi-tenant LoRA workflows to democratize RL and fine-tuning experiments while reserving heavyweight full-tune runs for specialized customers.
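A rough sketch of the latency-hiding pattern, assuming a hypothetical async client whose calls return awaitables (the client, method names, and metrics shape are assumptions, not the documented API): requests are enqueued back-to-back so the client never idles on a ~100 ms round-trip, and results are awaited only when actually needed.

```python
import asyncio

async def train(client, batches, learning_rate: float = 1e-5) -> None:
    """Pipeline forward_backward / optim_step requests instead of waiting
    for each network round-trip (hypothetical async client wrapper)."""
    pending = []
    for batch in batches:
        # Enqueue the gradient computation and the optimizer step
        # back-to-back; if the server executes requests in submission
        # order, the client never blocks on the network between steps.
        fb = asyncio.ensure_future(client.forward_backward(batch))
        step = asyncio.ensure_future(
            client.optim_step(learning_rate=learning_rate))
        pending.append((fb, step))

    # Block only when results are actually needed, e.g. to log losses.
    for fb, step in pending:
        metrics = await fb
        await step
        print(metrics.get("loss"))
```

This pipelining works cleanly for supervised batches; in the online RL loop sketched earlier, the sample call forces a sync point (you must await the latest completions before scoring), which is the serial dependency noted above.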