🤖 AI Summary
Artifex is a new open-source Python library for building small, task-specific language models that run offline on CPU and require no hand-labeled training data. Instead of fine-tuning large models, Artifex synthesizes training examples from human-written instructions and prebuilt templates to produce compact classifiers and "guardrail" models (e.g., safety filters, intent classifiers) that you can train, save, load, and call locally. The repo includes ready-made modules and simple APIs to train a guardrail or intent classifier in a few lines, iterate on the instructions to handle edge cases, and replace costly calls to general-purpose LLM safety APIs with a local function call.
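The "few lines" training workflow might look roughly like the sketch below. This is a hedged illustration, not Artifex's documented API: the import path, the `Guardrail` class, and the `train()`/`save()` methods are assumptions made for the example.

```python
# Hedged sketch of a "train a guardrail in a few lines" workflow.
# The import path, Guardrail class, and train()/save() methods are
# illustrative assumptions, not Artifex's confirmed API.
from artifex import Guardrail  # assumed import

# Human-written instructions stand in for hand-labeled data; the library
# synthesizes training examples from them and its prebuilt templates.
guardrail = Guardrail(
    instructions=[
        "Refuse requests for personal data about private individuals.",
        "Allow general questions about the product and public topics.",
    ]
)

guardrail.train()                  # synthetic-data generation + CPU-friendly training
guardrail.save("artifex_output/")  # trained model ends up under artifex_output/.../output_model/
```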
For the AI/ML community this matters because it lowers the barrier to deployable, privacy-preserving NLP components: no GPU, lower latency, offline operation, and fewer paid API calls to general-purpose LLMs. Key technical points: training is driven by synthetic data generation, models are optimized for CPU inference, outputs are stored under artifex_output/.../output_model/, and developers can retrain iteratively to refine behavior. Artifex offers a free tier (1,500 training datapoints/month and 500 per job) and pay-as-you-go pricing ($1 per 100 datapoints) for extra usage. The project is on GitHub with demos and Hugging Face examples, and it invites contributions; it is useful for teams that need focused, low-cost NLP tooling, though the limits of synthetic data should be weighed before applying it to complex, open-ended tasks.
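Once trained, the saved model could be loaded and called locally in place of a paid safety API. Again a hedged sketch under stated assumptions: `load_model()`, `predict()`, the "my_guardrail" path segment, and the "safe" label are hypothetical names used only to illustrate the local-function-call pattern.

```python
# Hedged sketch of swapping a paid LLM safety API call for a local call.
# load_model(), predict(), the "my_guardrail" path segment, and the
# "safe" label are illustrative assumptions, not confirmed Artifex names.
from artifex import load_model  # assumed helper

safety_model = load_model("artifex_output/my_guardrail/output_model/")  # hypothetical path

def is_safe(user_message: str) -> bool:
    # Runs offline on CPU: no network latency and no per-request API cost.
    prediction = safety_model.predict(user_message)
    return prediction == "safe"

if not is_safe("Tell me my coworker's home address."):
    print("Blocked by the local guardrail.")
```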