🤖 AI Summary
AlphaFold's hidden output, the high-dimensional embeddings produced by its Evoformer module, is powering a second wave of AI in biology. Beyond the model's headline 3D predictions (median backbone error ~0.96 Å), the Evoformer builds rich per-residue and pairwise representations by reasoning jointly over MSAs and geometric constraints across 48 stacked blocks, with cross-talk between an MSA tensor (N_sequences × N_residues) and a pair tensor (N_residues × N_residues). Two architectural features are crucial: triangular updates that enforce transitive geometric consistency (the triangle inequality), and attention-based MSA↔pair communication. Practically, this yields a per-residue "single" embedding of shape (N_residues × 384) and a pair embedding of shape (N_residues × N_residues × 128); neither is saved by default, so they must be extracted explicitly from the AlphaFold pipeline (community guides exist; see the sketch below).
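
The snippet below is a minimal sketch of that extraction step. It assumes a `prediction_result` dict shaped like the one the open-source AlphaFold/ColabFold pipelines return when representation output is enabled (the exact config flag varies by version and wrapper); here the dict is mocked with random arrays of the stated shapes so the example runs standalone.

```python
import numpy as np

# Hypothetical stand-in for the dict returned by AlphaFold's model runner
# when representation output is enabled (flag names vary by version and
# wrapper, e.g. ColabFold's return_representations option).
L = 120  # number of residues in the query protein

prediction_result = {
    "representations": {
        "single": np.random.randn(L, 384).astype(np.float32),     # per-residue embedding
        "pair":   np.random.randn(L, L, 128).astype(np.float32),  # residue-pair embedding
    }
}

def save_representations(prediction_result: dict, prefix: str) -> None:
    """Persist the Evoformer 'single' and 'pair' embeddings to .npy files."""
    reps = prediction_result["representations"]
    np.save(f"{prefix}_single.npy", reps["single"])  # shape (N_residues, 384)
    np.save(f"{prefix}_pair.npy", reps["pair"])      # shape (N_residues, N_residues, 128)

save_representations(prediction_result, "my_protein")
```

In the real pipeline the dict would come from the model runner's predict call rather than being mocked; the key point is that the embeddings live under `representations` and are discarded unless you save them yourself.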
Those embeddings have immediate downstream utility: simple regressors on differences in single representations predict ΔΔG with Pearson ≈ 0.58 (a minimal probe is sketched below); AF2BIND uses pairwise-attention "baiting" to locate binding pockets; AlphaMissense scored 71M missense variants, classifying 89% of them; and generative models such as PCMol condition molecule design on target embeddings. Compared with sequence-only protein language models (ESM-2, ProtT5; trained on 229M+ sequences), AlphaFold embeddings trade broad evolutionary coverage for geometric precision, since AlphaFold was trained on only ~200k PDB structures. The takeaway: choose embeddings by task. Structure and binding tasks favor AlphaFold; function and disorder prediction favor PLMs. The likely future is multimodal hybrids that merge structural priors with large-scale sequence semantics.
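
To make the ΔΔG claim concrete, here is a minimal sketch of the kind of linear probe the summary describes: featurize each mutation as the difference between mean-pooled "single" embeddings of mutant and wild type, then fit a ridge regressor. The embeddings and ΔΔG labels below are synthetic placeholders, so the printed score says nothing about real performance; the ≈0.58 Pearson figure comes from the studies the article cites.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins: in practice these would be mean-pooled Evoformer
# 'single' embeddings (dimension 384) for wild-type and mutant proteins.
n_mutations, dim = 500, 384
wt = rng.normal(size=(n_mutations, dim))
mut = wt + 0.1 * rng.normal(size=(n_mutations, dim))

# Feature = difference of single representations.
X = mut - wt

# Fake ΔΔG labels generated from a random linear rule plus noise,
# purely so the example runs end to end.
w_true = rng.normal(size=dim)
y = X @ w_true + 0.5 * rng.normal(size=n_mutations)

model = Ridge(alpha=1.0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"5-fold CV R^2: {scores.mean():.3f}")
```

The design choice worth noting is how little machinery sits on top of the embeddings: a difference vector and a linear model suffice, which is exactly what makes them attractive as general-purpose features.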
        