Images of Sylvester Stallone used to train models for recognizing stroke (retractionwatch.com)

🤖 AI Summary
Researchers Adrian Barnett and Alexander Gibson of Queensland University of Technology found serious problems with image datasets on Kaggle that were used to train stroke-detection models. One dataset, humorously dubbed "droopy," contained numerous duplicate images, among them pictures of celebrities such as Sylvester Stallone, rather than genuine stroke cases or diverse patient data. Reliance on such unreliable datasets has led to the publication of flawed predictive models, and some of the resulting papers have already been retracted over data provenance concerns. The problem extends beyond individual papers to online data repositories like Kaggle more broadly. Barnett and Gibson’s examination found that many studies using these datasets failed basic provenance checks and made unethical clinical recommendations based on dubious data. Because patient care could be affected, they argue that all tools based on these faulty datasets should be removed until their validity is confirmed. The case underscores the need for better oversight and documentation of the data used in clinical prediction models, and for a research community that puts patient welfare ahead of publication pressure.