The Transparent Earth: A Multimodal Foundation Model for the Earth's Subsurface (arxiv.org)

🤖 AI Summary
Researchers introduced "Transparent Earth," a transformer-based multimodal foundation model that reconstructs Earth's subsurface properties from heterogeneous observations varying in sparsity, resolution, and type. The architecture combines positional encodings for individual observations with modality encodings produced by a text-embedding model applied to human-readable modality descriptions, allowing the model to accept an arbitrary number of modalities and to be extended with new ones. The current system ingests eight modalities (directional angles, categorical classes, and continuous fields such as temperature and layer thickness) and supports in-context learning: it can generate predictions with no inputs or with any subset of observations supplied at inference.

Notable technical choices include the text-derived modality embeddings, which make adding new modalities straightforward, and a transformer backbone that scales with model size; validation shows larger models perform better. On held-out data the model reduces stress-angle prediction error more than threefold, a practical gain for geoscience tasks. For the AI/ML community this is an instructive case study in applying scalable multimodal transformers to physical-science inverse problems, showing how modality-agnostic encodings and in-context conditioning can integrate sparse, mixed-format geophysical data for unified subsurface prediction and downstream applications such as resource mapping and hazard assessment.
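To make the encoding scheme concrete, here is a minimal sketch of how a variable-length set of mixed-modality observations might be turned into transformer-ready tokens by summing a sinusoidal positional encoding with a per-modality embedding. All function names and the toy value projection are hypothetical illustrations, not the paper's actual implementation; in the paper the modality embeddings come from a text-embedding model applied to modality descriptions, whereas here random vectors stand in for them.

```python
import numpy as np

def positional_encoding(lat, lon, dim):
    """Sinusoidal encoding of a (lat, lon) coordinate into a dim-sized vector.

    Assumes dim is divisible by 4: each coordinate contributes
    dim//4 sine and dim//4 cosine features over a geometric frequency ladder.
    """
    freqs = 2.0 ** np.arange(dim // 4)
    feats = []
    for coord in (np.radians(lat), np.radians(lon)):
        feats.append(np.sin(freqs * coord))
        feats.append(np.cos(freqs * coord))
    return np.concatenate(feats)  # shape: (dim,)

def build_tokens(observations, modality_emb, dim=16):
    """Build one token per observation: position + modality + value.

    observations: list of (lat, lon, value, modality_name) tuples,
    in any order and of any length -- the modality-agnostic design
    means new modalities only need a new entry in modality_emb.
    """
    tokens = []
    for lat, lon, value, modality in observations:
        tok = positional_encoding(lat, lon, dim)
        tok = tok + modality_emb[modality]        # text-derived embedding in the paper
        tok = tok + value * np.full(dim, 1.0/dim)  # toy scalar-value projection
        tokens.append(tok)
    return np.stack(tokens)  # shape: (num_observations, dim)

# Stand-in modality embeddings (random here; text-embedding outputs in the paper).
rng = np.random.default_rng(0)
emb = {"temperature": rng.normal(size=16), "layer thickness": rng.normal(size=16)}

obs = [(34.0, -118.0, 0.5, "temperature"),
       (34.1, -118.2, 12.0, "layer thickness")]
tokens = build_tokens(obs, emb, dim=16)
print(tokens.shape)  # (2, 16)
```

The resulting token set can then be fed to a standard transformer encoder; because any subset of observations yields a valid token set (including the empty set, given a learned query token), the same mechanism supports the in-context conditioning described above.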