Cross-Modal Representation Alignment for Time-to-Event Modeling (arxiv.org)

0 points 1 hour ago

🤖 AI Summary

Researchers have proposed a framework for time-to-event (TTE) modeling that improves prediction accuracy by aligning representations across modalities. It combines CT imaging with electronic health record (EHR) data and addresses challenges such as modality imbalance and distribution shift. Each modality is first encoded independently, and the resulting representations are then combined using one of four fusion strategies: late fusion, contrastive alignment, cross-attention, or co-attention.

The framework was evaluated on large patient cohorts for two TTE tasks, pulmonary embolism (PE) mortality and cardiovascular disease (CVD) outcomes, and showed consistent gains in predictive performance. Multimodal fusion improved the concordance index by 1.5-5.4% over unimodal baselines, particularly when both modalities contributed comparably. Contrastive fusion, especially when paired with CLMBR representations, gave robust improvements for PE mortality prediction. The work illustrates how task-aware multimodal alignment could make clinical predictions more reliable and support scalable healthcare applications.
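The reported gains are measured with the concordance index (C-index). As a rough illustration of what that metric counts, here is a minimal sketch of Harrell's C for right-censored data. The summary does not say which C-index variant the paper uses (Harrell's, Uno's, or a time-dependent version), so this is an assumption, and the function name `harrell_c_index` is hypothetical. It also assumes a higher risk score means an earlier expected event.

```python
from itertools import permutations
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) strictly before subject j's recorded time.
    The pair is concordant when the model assigns i a higher risk
    score than j; tied risk scores count as half-concordant.
    Assumption: higher risk score = earlier expected event.
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("times, events, and risks must have equal length")

    concordant = 0.0
    comparable = 0
    # Check every ordered pair of distinct subjects.
    for i, j in permutations(range(len(times)), 2):
        if events[i] == 1 and times[i] < times[j]:
            comparable += 1
            if risks[i] > risks[j]:
                concordant += 1.0
            elif risks[i] == risks[j]:
                concordant += 0.5

    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]  # subject 2 is censored
    print(harrell_c_index(times, events, [0.9, 0.7, 0.4, 0.1]))  # 1.0
    print(harrell_c_index(times, events, [0.9, 0.2, 0.4, 0.1]))  # 0.8
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ranking. The summary does not state whether the 1.5-5.4% improvement is absolute or relative, so no conversion between the two should be assumed.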