🤖 AI Summary
Harvard and Emory released version 4.0 of the Harvard‑Emory ECG Database (HEEDB), a credentialed, de‑identified collection of high‑density clinical 12‑lead ECGs curated for research. Version 4.0 aggregates over 11.4 million ten‑second recordings (10,471,531 ECGs from 1,818,247 patients at institution I0001; 968,680 ECGs from 349,548 patients at institution I0006), sampled at 250 or 500 Hz and stored in WFDB and MATLAB (V4) formats with standard .hea headers. The dataset now includes 12SL (GE Healthcare v1) diagnostic outputs, ICD‑9/10 codes, and detailed metadata (demographics, acquisition times, age/death fields), although several demographic fields are absent for I0006. The data were de‑identified using the Safe Harbor method and released under IRB approval for secondary research.
For AI/ML researchers, HEEDB is significant for its scale and clinical richness: its millions of 12‑lead traces enable large‑scale pretraining, self‑supervised representation learning, and supervised modeling for arrhythmia detection, outcome prediction, and studies linking cardiac electrophysiology to sleep and long‑term health. In practice, recordings are 10 s long with 16‑bit samples and can be read with WFDB tools; 12SL codes map to human‑readable labels via the included dictionaries; and ICD timestamps are date‑shifted. To avoid confounding and label noise when training models or combining cohorts, users should account for institutional differences, the missing metadata in I0006, and prior header corrections (sampling frequency fixes).
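Because the summary mentions earlier sampling-frequency fixes in the headers, a quick sanity check before training may be worthwhile. The sketch below is a minimal, standard-library-only illustration: it parses the record line of a WFDB .hea header (record name, number of signals, sampling frequency, number of samples) and checks that the implied duration is about 10 s at 250 or 500 Hz. The names `parse_record_line` and `looks_consistent`, the tolerance, and the example header text are all hypothetical and not taken from HEEDB.

```python
from dataclasses import dataclass


@dataclass
class RecordLine:
    name: str
    n_signals: int
    fs: float
    n_samples: int

    @property
    def duration_s(self) -> float:
        # Duration implied by the header, in seconds.
        return self.n_samples / self.fs


def parse_record_line(header_text: str) -> RecordLine:
    """Parse the first non-comment line of a WFDB .hea header."""
    for line in header_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            fields = line.split()
            break
    else:
        raise ValueError("no record line found")
    if len(fields) < 4:
        raise ValueError("record line lacks sampling frequency or sample count")
    # The fs field may carry an optional "/counter_frequency" suffix.
    fs = float(fields[2].split("/")[0])
    # The name field may carry a "/n_segments" suffix for multi-segment records.
    name = fields[0].split("/")[0]
    return RecordLine(name, int(fields[1]), fs, int(fields[3]))


def looks_consistent(rec, expected_s=10.0, allowed_fs=(250.0, 500.0), tol_s=0.05):
    """True if fs is an expected rate and the duration is close to expected_s."""
    return rec.fs in allowed_fs and abs(rec.duration_s - expected_s) <= tol_s


# Hypothetical header text for illustration only (signal lines omitted).
EXAMPLE_HEADER = """\
# made-up example, not a real HEEDB record
example_ecg 12 500 5000
"""

rec = parse_record_line(EXAMPLE_HEADER)
print(rec.fs, rec.duration_s, looks_consistent(rec))
```

Records that fail such a check could be set aside for inspection rather than silently resampled; the right tolerance and handling depend on the cohort and are left as an assumption here.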