ChatGPT Health performance in a structured test of triage recommendations (www.nature.com)

🤖 AI Summary
OpenAI's ChatGPT Health, launched in January 2026, has undergone a structured stress test to evaluate its triage recommendations across various clinical scenarios. The study analyzed 60 clinician-authored vignettes within 21 clinical domains, revealing concerning performance trends. Notably, the model struggled significantly at the extremes, with a staggering 52% of life-threatening emergencies being under-triaged, guiding critical conditions like diabetic ketoacidosis toward delayed evaluations instead of immediate care. Additionally, the model's triage recommendations were heavily influenced by the input of family members, particularly in borderline cases, further complicating its reliability. These findings highlight critical safety concerns for deploying AI in triage settings, as missed high-risk emergencies can lead to dire consequences. The study underscores the necessity for prospective validation of AI-driven triage systems before broad consumer usage, particularly given the inconsistent responses to indicators of crisis situations like suicidal ideation. While the absence of significant effects based on patient demographics suggests some level of fairness in the system, the overall performance raises alarms about the potential risks associated with AI-based clinical decisions, necessitating further scrutiny and refinement.
Loading comments...
loading comments...