LLM Medical Triage: Same Symptoms, Gender-Dependent Urgency (arxiv.org)

🤖 AI Summary
A recent study finds that large language models (LLMs) used for medical triage assign gender-dependent urgency to identical neurological symptoms. The researchers tested three model families (Gemini 3.5 Flash, Claude Sonnet 4.6, and GPT-5.4-mini) across various demographic conditions and found that young women were significantly less likely than their male counterparts to receive emergency room referrals, despite similar symptom severity ratings. The disparity suggests that the models are influenced by gender-associated diagnoses, which often direct young female patients toward lower-urgency care. For the AI/ML community, the finding underscores the need for improved fairness in AI-driven medical applications: the models' diagnostic substitution based on demographic factors replicates existing human biases in clinical decision-making. Notably, the disparity disappeared for patients aged 65 and above, indicating an age-related shift in diagnostic approaches. As these systems are integrated into medical settings, the study urges developers to decouple triage urgency from probabilistic diagnostic priors so that patient care is more equitable. The researchers have made their code and data publicly available to support transparency and further work on mitigating bias in AI healthcare solutions.
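The kind of comparison the summary describes (identical symptoms, varied demographics, emergency referral rates compared across groups) can be illustrated with a small counterfactual audit. The sketch below is not the researchers' code. The symptom text, the age values, the function names, and the `toy_triage` stub are all hypothetical, and the stub is deliberately biased so that the gap calculation has something to show. A real audit would replace it with calls to an actual model.

```python
from collections import defaultdict
from itertools import product

# Illustrative inputs only; none of these values come from the study.
SYMPTOMS = "sudden severe headache with blurred vision and numbness in one arm"
GENDERS = ("female", "male")
AGES = (25, 70)


def build_vignette(age, gender):
    """Build a prompt that differs only in the stated age and gender."""
    return f"A {age}-year-old {gender} patient reports {SYMPTOMS}."


def toy_triage(vignette):
    """Hypothetical stand-in for an LLM call, deliberately biased for the demo."""
    if "25-year-old female" in vignette:
        return "primary care"
    return "emergency room"


def er_referral_rates(triage_fn, n_repeats=5):
    """Return the share of emergency room referrals per (age, gender) group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [er_count, total]
    for age, gender in product(AGES, GENDERS):
        for _ in range(n_repeats):
            decision = triage_fn(build_vignette(age, gender))
            counts[(age, gender)][0] += decision == "emergency room"
            counts[(age, gender)][1] += 1
    return {group: er / total for group, (er, total) in counts.items()}


def gender_gap_by_age(rates):
    """Male minus female referral rate within each age group."""
    return {age: rates[(age, "male")] - rates[(age, "female")] for age in AGES}


rates = er_referral_rates(toy_triage)
gaps = gender_gap_by_age(rates)
print(gaps)  # {25: 1.0, 70: 0.0} with the biased stub above
```

With the stub, the gap appears only in the younger group, which mirrors the shape of the result the summary reports but says nothing about its actual size. Repeating each vignette matters once a real, non-deterministic model is plugged in, since a single response per group would not give a stable rate.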