🤖 AI Summary
A community implementation of the “Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning” approach has been released that swaps Hugging Face’s slow generate() path for SGLang as the inference engine. The swap yields roughly 4× faster generation at large batch sizes, and the engine stays initialized between evaluations to cut startup overhead. The repo includes scripts to generate data and run the evolutionary fine-tuning loop; the example task trains Qwen2.5-7B-Instruct to do 4-digit multiplication via pip install -r requirements.txt, python generate_dataset.py, then python evolve.py. Data is stored as a list of {question, answer} samples, reward.py exposes a straightforward, easily customizable reward function, and config lives in conf/config.yaml.
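As a rough sketch of how those pieces could fit together (the helper names, file names other than reward.py, and sampling parameters below are illustrative assumptions, not the repo's actual code), a persistent SGLang engine can score a batch of {question, answer} samples against a simple exact-match reward:

```python
import json

import sglang as sgl

# Keep one SGLang engine alive for the whole evolutionary run instead of
# re-initializing it before every evaluation (model path assumed).
engine = sgl.Engine(model_path="Qwen/Qwen2.5-7B-Instruct")

def reward(completion: str, answer: str) -> float:
    """Toy exact-match reward in the spirit of reward.py: 1.0 if the
    expected answer appears in the model output, else 0.0."""
    return 1.0 if answer.strip() in completion else 0.0

def evaluate(dataset_path: str = "dataset.json") -> float:
    # Data is a list of {question, answer} samples, e.g. 4-digit multiplication.
    with open(dataset_path) as f:
        samples = json.load(f)

    prompts = [s["question"] for s in samples]
    # Batched generation through the persistent engine (sampling params assumed).
    outputs = engine.generate(prompts, {"temperature": 0.0, "max_new_tokens": 64})

    rewards = [reward(out["text"], s["answer"]) for out, s in zip(outputs, samples)]
    return sum(rewards) / len(rewards)
```

Syncing the perturbed weights of the candidate model into the engine between evaluations is omitted here; the sketch only illustrates the batched-generation-plus-reward loop that the ES outer loop repeatedly calls.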
This is significant because evolutionary (gradient-free) algorithms allow full-rank fine-tuning of 7B models on a single 24 GB GPU (tested on RTX 3090/4090) without storing a second base-model copy or using KL regularization, which reduces memory demand and simplifies the workflow. The authors also note empirical benefits versus RL: less reward hacking, lower hyperparameter sensitivity, and often better performance. Practical requirements are Python 3.10, roughly 48 GB of system RAM, and a 24 GB GPU. For researchers and practitioners who need single-GPU LLM adaptation or want an alternative to RLHF, this repo is a pragmatic, faster path to experimenting with evolutionary fine-tuning at scale.
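The memory savings follow from the basic ES recipe: weights are perturbed in place with seeded Gaussian noise, each perturbation is scored, and the same seeds are replayed to apply the update, so no gradients, optimizer states, or second copy of the base model are ever held in memory. A minimal single-GPU sketch of one such step, with hyperparameter names assumed rather than taken from conf/config.yaml:

```python
import torch

@torch.no_grad()
def perturb(model, seed, scale):
    """Add (or, with a negative scale, undo) seeded Gaussian noise in place,
    regenerating it tensor by tensor so no full noise copy is ever stored."""
    device = next(model.parameters()).device
    gen = torch.Generator(device=device).manual_seed(seed)
    for p in model.parameters():
        noise = torch.randn(p.shape, generator=gen, device=p.device, dtype=p.dtype)
        p.add_(scale * noise)

@torch.no_grad()
def es_step(model, evaluate, sigma=0.01, lr=0.01, population=8):
    """One gradient-free ES update: perturb, score, replay seeds, update."""
    seeds, rewards = [], []

    # 1) Score seeded perturbations of the full-rank weights, in place.
    for _ in range(population):
        seed = torch.randint(0, 2**31 - 1, (1,)).item()
        perturb(model, seed, +sigma)
        rewards.append(evaluate(model))   # e.g. mean reward over the dataset
        perturb(model, seed, -sigma)      # restore by replaying the same seed
        seeds.append(seed)

    # 2) Replay each seed again to apply the ES update:
    #    theta <- theta + lr / (population * sigma) * sum_i r_i * eps_i
    r = torch.tensor(rewards)
    advantages = (r - r.mean()) / (r.std() + 1e-8)
    for seed, adv in zip(seeds, advantages.tolist()):
        perturb(model, seed, lr * adv / (population * sigma))
```

Because the update touches every weight, this is full-rank adaptation, yet the only extra state is a list of integer seeds and scalar rewards, which is why a 7B model fits on one 24 GB card.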