🤖 AI Summary
A science-and-tech YouTuber fine-tuned a small GPT-4.1-mini model to generate clickable, honest video titles and saw immediate, measurable gains: training on 734 English transcripts (filtered from a 48k trending-video Kaggle dump) cost ~$11.56, the LLM-powered filtering step using Gemini Pro 1.5 cost ~$2.80, and the resulting "title-gen" model tripled CTR on a test video (from 5% to 15%) and boosted watch time by ~38%. The pipeline: filter category 28 (Science & Technology) videos with pandas, have Gemini label each row KEEP/REMOVE via a content-strategy prompt that excludes brand-driven corporate hits, fetch transcripts concurrently (thread pool + rotating proxies), format JSONL messages with a system prompt enforcing accuracy and concise, curiosity-driven titles, then fine-tune gpt-4.1-mini for 3 epochs with a 2.0 learning-rate multiplier.
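A minimal sketch of what the filtering, JSONL formatting, and fine-tune launch could look like. File names (`trending.csv`, `title_gen.jsonl`), the `categoryId`/`title`/`transcript` column names, the model snapshot string, and the system-prompt wording are all assumptions for illustration, not the author's exact code; the Gemini KEEP/REMOVE pass and the proxy-based transcript scraping are omitted and assumed to have already produced the `transcript` column.

```python
import json
import pandas as pd
from openai import OpenAI

# Narrow the Kaggle trending dump to category 28 (Science & Technology).
# "trending.csv" and the "categoryId" column name are assumptions about the dataset layout.
df = pd.read_csv("trending.csv")
df = df[df["categoryId"] == 28]

# Paraphrase of the accuracy / concise-curiosity guardrails described above,
# not the author's exact system prompt.
SYSTEM_PROMPT = (
    "You write concise, curiosity-driven YouTube titles. "
    "Every title must be accurate to the transcript and never promise content it does not contain."
)

# Assume transcripts were fetched separately (thread pool + rotating proxies)
# and joined back onto the dataframe as a "transcript" column.
with open("title_gen.jsonl", "w", encoding="utf-8") as f:
    for _, row in df.iterrows():
        example = {
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": row["transcript"]},
                {"role": "assistant", "content": row["title"]},
            ]
        }
        f.write(json.dumps(example, ensure_ascii=False) + "\n")

# Upload the training file and start the fine-tune with the reported settings.
client = OpenAI()
training_file = client.files.create(file=open("title_gen.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    model="gpt-4.1-mini-2025-04-14",  # exact snapshot name is an assumption
    training_file=training_file.id,
    hyperparameters={"n_epochs": 3, "learning_rate_multiplier": 2.0},
)
print(job.id)
```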
For the AI/ML community this is a compact, reproducible case study showing the outsized importance of dataset curation over model size — a cheap, short fine-tune plus an LLM-based filter can realign a model toward high-impact, creator-friendly outputs. It also highlights practical engineering patterns (prompt-based dataset filtering, concurrent scraping, small-model fine-tuning, low-latency inference) and points to natural next steps: LoRA for cheaper iteration, rigorous A/B testing, and extending the system to jointly generate thumbnails using vision-capable models. The author notes that thumbnail synergy and clickbait trade-offs remain open concerns, and includes guardrails in the system prompt to keep titles truthful.