Black-Box On-Policy Distillation of Large Language Models (arxiv.org)

🤖 AI Summary
Researchers introduce Generative Adversarial Distillation (GAD), a black-box, on-policy method for creating student LLMs from proprietary teachers using only the teachers’ text outputs (no logits or parameters). GAD treats the student as a generator and trains a discriminator to tell student responses apart from teacher responses, creating a minimax game in which the discriminator serves as a continuously updated, on-policy reward model. Because the reward model co-evolves with the student, GAD supplies adaptive, stable feedback that mitigates the distributional-mismatch problems common in offline sequence-level knowledge distillation.

Empirically, GAD consistently outperforms standard sequence-level distillation and can close large gaps: Qwen2.5-14B-Instruct trained with GAD became comparable to its teacher, GPT-5-Chat, on the LMSYS-Chat automatic evaluation.

The approach matters for the AI/ML community because it enables effective model compression and capability transfer from closed-source teacher models without internal access, offering a practical pathway to replicating behavior from proprietary LLMs. Technical implications include a shift toward adversarially trained reward models for on-policy distillation, improved robustness to covariate shift, and renewed discussion of model-extraction risks and IP considerations when only text outputs are exposed.
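To make the minimax setup concrete, here is a minimal, illustrative sketch of the kind of training loop the summary describes. It is not the paper's implementation: the toy "teacher" (a fixed random sampler standing in for black-box teacher text), the tiny GRU student, the mean-pooling discriminator, the REINFORCE-style update, and all hyperparameters are assumptions made purely to show the structure of the generator/discriminator game.

```python
# Toy GAD-style loop: student = generator, discriminator = on-policy reward model.
# All components here are simplified stand-ins, not the method from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, SEQ_LEN, EMB, HID, BATCH = 100, 16, 32, 64, 8

class Student(nn.Module):
    """Tiny autoregressive generator standing in for the student LLM."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRUCell(EMB, HID)
        self.head = nn.Linear(HID, VOCAB)

    def sample(self, batch):
        """Sample sequences on-policy; return tokens and their log-probs."""
        h = torch.zeros(batch, HID)
        tok = torch.zeros(batch, dtype=torch.long)  # token 0 plays the role of BOS
        toks, logps = [], []
        for _ in range(SEQ_LEN):
            h = self.rnn(self.emb(tok), h)
            dist = torch.distributions.Categorical(logits=self.head(h))
            tok = dist.sample()
            toks.append(tok)
            logps.append(dist.log_prob(tok))
        return torch.stack(toks, 1), torch.stack(logps, 1)

class Discriminator(nn.Module):
    """Scores a whole response; acts as the co-evolving reward model."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.score = nn.Linear(EMB, 1)

    def forward(self, tokens):
        return self.score(self.emb(tokens).mean(dim=1)).squeeze(-1)  # real/fake logit

def teacher_sample(batch):
    """Placeholder for black-box teacher responses (only text is available)."""
    return torch.randint(1, VOCAB, (batch, SEQ_LEN))

student, disc = Student(), Discriminator()
opt_g = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

for step in range(200):
    # Discriminator step: teacher responses labeled real, student responses fake.
    with torch.no_grad():
        fake, _ = student.sample(BATCH)
    real = teacher_sample(BATCH)
    d_loss = (F.binary_cross_entropy_with_logits(disc(real), torch.ones(BATCH))
              + F.binary_cross_entropy_with_logits(disc(fake), torch.zeros(BATCH)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Student step: REINFORCE with the discriminator's logit as an adaptive reward.
    toks, logps = student.sample(BATCH)
    with torch.no_grad():
        reward = disc(toks)
        reward = reward - reward.mean()  # simple baseline for variance reduction
    g_loss = -(logps.sum(dim=1) * reward).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The key structural point the sketch captures is that the reward signal is re-estimated every step from the current discriminator, so the student is always trained on its own freshly sampled outputs (on-policy) against a reward model that adapts as the student improves.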