Proteinbase – the home of protein design data (proteinbase.com)

0 points 132 days ago | visit original

🤖 AI Summary

Proteinbase is a new centralized repository for protein design that aggregates sequences, experimental results, and design-method metadata so that designs are easy to share, compare, and learn from. The site already hosts about 822 protein records, roughly 123 of them experimentally validated, along with target-focused collections: EGFR (62 entries, 51 validated), IL‑7Rα (40 entries, 31 validated), PD‑L1 (27 entries, 12 validated), and MDM2, among others. Individual records include binding measurements (e.g., an IFNAR2 binder at 3.9×10⁻⁸ M, an MDM2 binder at 4.4×10⁻⁸ M, and an EGFR miniprotein at 1.4×10⁻⁹ M), molecular weights, design provenance (design IDs and whether validation succeeded), and community notes such as re‑validations by groups like EPFL-LPDI, Escalante Bio, and Microsoft Research.

For the AI/ML community this matters because it ties experimental outcomes directly to the generative and optimization methods that produced each design, which enables reproducible benchmarking, training-data curation, and method comparison. Proteinbase indexes designs created with modern tools (RFdiffusion for backbone generation, EvoDiff, Mosaic, BindCraft1) and with fine‑tuning workflows that pair reinforcement learning and preference-optimization algorithms (GRPO, DPO) with protein language models (ZymCTRL). By exposing standardized metadata, affinities, and both failed and successful designs, the platform should accelerate model development, dataset cleaning, and the iterative design–test loop for minibinders and peptide therapeutics.
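To make the quoted affinities easier to compare, here is a minimal Python sketch that converts them to nanomolar and to an approximate binding free energy. It assumes the three values in the summary are dissociation constants (Kd), which the summary does not state explicitly, and it assumes a temperature of 298.15 K. The dictionary and function names are illustrative only; this is not a Proteinbase API.

```python
import math

# Affinities quoted in the summary, in molar units.
# Assumption: these are dissociation constants (Kd).
KD_MOLAR = {
    "IFNAR2 binder": 3.9e-8,
    "MDM2 binder": 4.4e-8,
    "EGFR miniprotein": 1.4e-9,
}

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
TEMP_K = 298.15       # assumed temperature; not given in the summary


def kd_to_nanomolar(kd_molar: float) -> float:
    """Convert a dissociation constant from M to nM."""
    return kd_molar * 1e9


def binding_free_energy(kd_molar: float, temp_k: float = TEMP_K) -> float:
    """Approximate binding free energy, dG = RT ln(Kd), in kcal/mol.

    Uses a 1 M standard state; more negative means tighter binding.
    """
    return R_KCAL * temp_k * math.log(kd_molar)


# Rank from tightest (smallest Kd) to weakest binder.
ranked = sorted(KD_MOLAR.items(), key=lambda item: item[1])

for name, kd in ranked:
    print(f"{name}: {kd_to_nanomolar(kd):.1f} nM, "
          f"dG = {binding_free_energy(kd):.2f} kcal/mol")
```

Under these assumptions the EGFR miniprotein ranks as the tightest binder of the three, since a smaller Kd corresponds to a more negative free energy of binding.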
