🤖 AI Summary
The Well is a new, large-scale collection of physics simulation datasets for machine learning: a 15 TB corpus of 16 datasets covering biological systems, fluid dynamics, acoustic scattering, magnetohydrodynamic simulations of extragalactic fluids and supernova explosions, and more. Curated with domain scientists and numerical software authors, the collection is designed for training and benchmarking PDE surrogate models and other spatiotemporal ML systems. Individual datasets range from about 6.9 GB to 5.1 TB. Most are mirrored on Hugging Face, where they can be streamed, and any of them can be downloaded locally for faster training.
Technically, The Well ships as a Python package (Python ≥3.10) that can be installed from PyPI or from source; the installation instructions explain how to select PyTorch wheels for a specific CUDA version. Each dataset is exposed through a PyTorch-friendly interface (WellDataset → DataLoader), and a benchmark suite implements baseline models (e.g., a Fourier Neural Operator) with Hydra-configured training scripts suitable for local runs or Slurm. Checkpoints and many dataset copies are available on Hugging Face. The project provides reference code and baseline results, and it encourages the community to develop improved architectures for PDE surrogate modeling. It is described in a NeurIPS 2024 paper and maintained by Polymathic AI with multiple academic collaborators.
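The WellDataset → DataLoader path mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's reference code: the import path `the_well.data` and the keyword arguments `well_base_path`, `well_dataset_name` and `well_split_name` are assumed from the project README and should be checked against the installed version, and `build_train_loader` is a hypothetical helper name introduced here.

```python
def build_train_loader(base_path: str, dataset_name: str, batch_size: int = 4):
    """Wrap the training split of one Well dataset in a PyTorch DataLoader.

    Hypothetical helper for illustration; it is not part of the_well itself.
    """
    # Imported lazily so this function can be defined without the heavy
    # dependencies (torch, the_well) being installed.
    from torch.utils.data import DataLoader
    from the_well.data import WellDataset  # assumed import path, per the README

    dataset = WellDataset(
        well_base_path=base_path,        # local directory, or a Hugging Face URI for streaming
        well_dataset_name=dataset_name,  # name of one dataset in the collection
        well_split_name="train",         # assumed split name
    )
    # WellDataset is used here as an ordinary map-style PyTorch dataset.
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)
```

Calling `build_train_loader("path/to/base", "active_matter")` would read a locally downloaded copy, assuming a dataset of that name is present under the base path. Per the README, pointing the base path at the Hugging Face mirror instead (an `hf://datasets/polymathic-ai/` style URI, also an assumption here) streams the data, at the cost of slower training.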