Sample Forge – Research tool for deterministic inference in LLMs (github.com)

🤖 AI Summary
Sample Forge is an open-source desktop tool for running deterministic inference and systematically finding convergent sampling parameters for large language models. It stitches together a local llama-server (llama.cpp, build b6246 or an appropriate platform build) with an OpenAI‑style API UI, dataset loaders for LiveBench (via the Hugging Face Hub), a benchmark runner, scoring/analysis views, and an automated "Auto Mode" that explores parameter spaces using bandit and ACO (ant‑colony optimization) strategies. The app saves reproducible run metadata and outputs (benchmarks under data/benchmarks/runs and optimization SQLite DBs under data/aco_runs/), previews request JSON from config/openai_api_schema.json, and supports chat/completions endpoints plus health/slots checks.

For practitioners this matters because it provides a repeatable, local workflow for tuning stochastic samplers (temperature, top_p, sampler chains, etc.) and validating deterministic behavior across datasets — useful for evaluation, model comparison, and production hardening.

Technical specifics: Windows-first GUI (Tkinter) with macOS/Linux Python flows; Python 3.11–3.13 (3.13 validated); pinned deps (datasets, huggingface_hub, pyarrow, requests); first run creates a venv and downloads wheels; users must place the llama-server binary and CUDA/Metal runtime together and point the Server Config to the executable (CPU builds are supported). The project is MIT‑licensed and geared toward reproducible, offline LLM benchmarking and parameter optimization.
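To make the workflow concrete, here is a minimal sketch of the kind of deterministic chat/completions request the tool previews and sends to a local llama-server. This is an illustration, not Sample Forge's actual code: the port, seed value, and helper name are assumptions; the field names follow the OpenAI-style chat API that llama-server exposes.

```python
import json

# Assumed default llama-server address; adjust to your Server Config.
BASE_URL = "http://127.0.0.1:8080"

def build_chat_request(prompt: str, seed: int = 42) -> dict:
    """Build a chat/completions payload pinned for reproducible sampling.

    A fixed seed plus temperature 0.0 and top_p 1.0 is the usual recipe
    for deterministic output on a fixed model build and hardware.
    """
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,   # greedy-like sampling
        "top_p": 1.0,         # disable nucleus truncation
        "seed": seed,         # fix the sampler RNG
        "max_tokens": 128,
    }

payload = build_chat_request("What is 2 + 2?")
print(json.dumps(payload, indent=2))

# To actually send it (requires a running llama-server):
# import requests
# r = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=60)
# print(r.json()["choices"][0]["message"]["content"])
# Liveness checks mentioned above: GET {BASE_URL}/health and GET {BASE_URL}/slots
```

Re-running the same payload against the same model build should yield byte-identical completions, which is the property the tool's benchmark runs rely on.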