La-Proteina (github.com)

🤖 AI Summary
La-Proteina is a generative model for protein design that produces fully atomistic structures jointly with their amino acid sequences, one of the harder problems in protein generation. Its core idea is a partially latent protein representation: the coarse backbone is modeled explicitly, while sequence and atomistic side-chain details are encoded in fixed-dimensional per-residue latent variables. This sidesteps the difficulty that side chains differ in size from one residue type to another, and it lets flow matching in latent space model the joint distribution over sequences and full atomic structures. La-Proteina reports state-of-the-art results on multiple benchmarks covering diversity, structural validity, and all-atom co-designability, and it outperforms previous methods on atomistic motif scaffolding tasks, which require precise structural conditioning. It also scales to proteins of up to 800 residues, a length range where many existing models fail. The transformer-based architecture supports PyTorch compilation for faster training and inference on accelerators. The framework provides modular autoencoder and latent diffusion components trained on subsets of the AlphaFold Protein Structure Database, with separate models for unconditional generation and for motif scaffolding (indexed and unindexed, at all-atom or tip-atom resolution). Tooling for dataset handling, model training, and generation is included, which makes La-Proteina a practical resource for de novo protein design at atomic precision.
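To make the two ideas in the summary concrete, here is a minimal sketch in plain Python of a partially latent per-residue record and of the basic flow-matching recipe (interpolate between noise and data, regress a velocity, integrate to sample). It is an illustration under stated assumptions, not La-Proteina's code: the names `Residue`, `LATENT_DIM`, `interpolate`, `target_velocity`, and `euler_sample` are hypothetical, the latent size of 8 is arbitrary, the backbone is assumed to be one 3D point per residue, and the straight-line interpolation path is a common flow-matching choice that the summary does not confirm for this model.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]

# Hypothetical size: the summary only says the latents are "fixed-dimensional".
LATENT_DIM = 8


@dataclass
class Residue:
    """One residue in an assumed partially latent representation."""
    backbone_xyz: Vector  # explicit coarse backbone position (3 values, assumed)
    latent: Vector        # fixed-size code for residue type + side-chain atoms


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Regression target for the velocity network on the straight path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(x0: Vector,
                 velocity_fn: Callable[[Vector, float], Vector],
                 steps: int = 10) -> Vector:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    noise = [0.0] * LATENT_DIM
    data = [float(i) for i in range(LATENT_DIM)]
    # Stand-in for a trained network: the exact velocity for this single pair.
    oracle = lambda x, t: target_velocity(noise, data)
    residue = Residue(backbone_xyz=[0.0, 0.0, 0.0],
                      latent=euler_sample(noise, oracle))
    print(residue)
```

In a real model the `oracle` stand-in would be a trained network that predicts velocities for backbone coordinates and latents together, and a decoder would map each latent back to a residue type and side-chain atoms. Because every residue carries a latent of the same size, the generative model never has to handle side chains with differing atom counts directly.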