Show HN: I built a pipeline to extract UK visa timelines from Reddit comments (github.com)

🤖 AI Summary
A developer built an open-source pipeline that automatically extracts and tracks UK naturalisation (citizenship) application timelines from an r/ukvisa Reddit thread, turning freeform community-reported timelines into a structured dataset and analysis spreadsheet. This fills an information gap for applicants, researchers and policy watchers by producing near-real-time, crowd-sourced processing-time trends where granular official data may be scarce. The project is notable for turning messy comment threads into longitudinal timelines that are updated as applications progress, while preserving historical entries even when comments are edited or deleted. Technically, the system fetches Reddit thread data (fetch_thread.py), uses the OpenAI API to parse and normalize timeline content (extract_timelines.py), and supports merging manual corrections (merge_manual_edits.py). It detects new or changed comments and skips unchanged ones, which reduces API calls and cost, and it caches non-timeline comments. By default it targets a gpt-5 model (configurable via OPENAI_MODEL), and RATE_LIMIT_DELAY_SEC sets the pacing between requests. The repo invites contributions for better eligibility-type classification, council-level breakdowns, and validation improvements, and is released under a permissive license for reuse.
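The change-detection step described in the summary could be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: it assumes comments are dicts with `id` and `body` keys and that the cache is a plain dict keyed by comment ID. The helper names (`body_hash`, `select_for_processing`, `record_result`, `load_settings`) and the fallback delay of 1.0 seconds are hypothetical; only `OPENAI_MODEL`, `RATE_LIMIT_DELAY_SEC`, and the gpt-5 default come from the summary.

```python
import hashlib
import os


def load_settings():
    """Read pacing/model settings from the environment.

    The variable names come from the summary; the 1.0 s fallback is an assumption.
    """
    return {
        "model": os.environ.get("OPENAI_MODEL", "gpt-5"),
        "delay_sec": float(os.environ.get("RATE_LIMIT_DELAY_SEC", "1.0")),
    }


def body_hash(body: str) -> str:
    """Fingerprint a comment body so edits can be detected cheaply."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def select_for_processing(comments, cache):
    """Return only comments that are new or whose body changed since the last run.

    comments: list of dicts with 'id' and 'body' (assumed shape)
    cache:    dict mapping comment id -> {'hash': str, 'is_timeline': bool}
    """
    pending = []
    for comment in comments:
        entry = cache.get(comment["id"])
        if entry is not None and entry["hash"] == body_hash(comment["body"]):
            # Unchanged since last run (timeline or not): no API call needed.
            continue
        pending.append(comment)
    return pending


def record_result(cache, comment, is_timeline):
    """Remember the outcome so the same text is not sent to the API again."""
    cache[comment["id"]] = {
        "hash": body_hash(comment["body"]),
        "is_timeline": is_timeline,
    }
```

In this sketch, a comment that disappears from the thread is simply absent from `comments`, so its cache entry is left untouched. That is one plausible way to keep historical entries after a deletion; how the project actually does it is not stated in the summary.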