Data-driven fine-grained region discovery in the mouse brain with transformers (www.nature.com)

🤖 AI Summary
Researchers introduced CellTransformer, a self-supervised encoder–decoder graph transformer that discovers fine-grained spatial domains in organ-scale spatial transcriptomics at multi-million-cell scale. Trained to predict a reference cell's gene expression conditioned on a learned neighborhood-context token, the model aggregates cell-type and expression tokens within a user-specified micron radius via transformer layers and a learned pooling bottleneck to produce neighborhood embeddings. Embeddings from all sections are then clustered with GPU-accelerated minibatched k-means to yield spatial domains.

The workflow scales to very large datasets—demonstrated on a 3.9M-cell MERFISH atlas, a 6.5M-cell whole-brain set across four animals (different gene panels), and a combined multi-animal analysis of ~9M cells over >200 sections—and generalizes to other modalities such as Slide-seqV2. This approach matters because it avoids global pairwise or Gaussian-process bottlenecks by operating on local subgraphs, capturing both cytoarchitecture (density/proximity) and molecular variation while enabling cross-section and cross-animal integration.

CellTransformer recapitulates canonical Allen CCFv3 regions, reproduces previously described substructures (e.g., subiculum, superficial superior colliculus), and uncovers hundreds of plausibly novel subregions in under-annotated subcortical areas. Technically, its neighborhood-token masked-prediction objective, learned pooling bottleneck, and GPU-minibatched clustering make fine-grained, data-driven domain discovery tractable at atlas scale—opening the door to automated, high-resolution anatomical annotation and comparative analyses across animals and modalities.
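The neighborhood-embedding step described above—per-cell expression and cell-type tokens, a learned pooling token, transformer aggregation, and a decoder that reconstructs the reference cell's expression—can be sketched in PyTorch. This is a minimal illustrative sketch, not the authors' implementation: all class names, dimensions, and hyperparameters (`NeighborhoodEncoder`, `d_model=64`, gene/type counts, etc.) are assumptions for demonstration.

```python
# Hypothetical sketch of a CellTransformer-style neighborhood encoder.
# Names and dimensions are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class NeighborhoodEncoder(nn.Module):
    def __init__(self, n_genes=500, n_cell_types=30, d_model=64, n_layers=2):
        super().__init__()
        self.expr_proj = nn.Linear(n_genes, d_model)         # expression token per cell
        self.type_emb = nn.Embedding(n_cell_types, d_model)  # cell-type token per cell
        # Learned pooling bottleneck: a single trainable token whose final
        # state serves as the neighborhood-context embedding.
        self.pool_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Masked-prediction head: reconstruct the reference cell's expression
        # from the pooled neighborhood context.
        self.decoder = nn.Linear(d_model, n_genes)

    def forward(self, expr, cell_type):
        # expr: (B, N, n_genes) expression of the N cells within the micron radius
        # cell_type: (B, N) integer cell-type labels
        tokens = self.expr_proj(expr) + self.type_emb(cell_type)
        tokens = torch.cat([self.pool_token.expand(expr.size(0), -1, -1), tokens], dim=1)
        h = self.encoder(tokens)
        z = h[:, 0]                       # neighborhood embedding (pooled token)
        return z, self.decoder(z)         # embedding + predicted reference-cell expression

model = NeighborhoodEncoder()
expr = torch.randn(2, 16, 500)            # 2 neighborhoods, 16 neighbor cells each
types = torch.randint(0, 30, (2, 16))
z, pred = model(expr, types)
print(z.shape, pred.shape)                # torch.Size([2, 64]) torch.Size([2, 500])
```

In the full pipeline, embeddings like `z` would be computed for every cell across all sections and clustered (e.g., with a GPU minibatched k-means; `sklearn.cluster.MiniBatchKMeans` is a CPU analogue) so that cluster labels define the spatial domains.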