🤖 AI Summary
Helixer is a new open-source, AI-driven tool for ab initio eukaryotic gene prediction. It combines a deep neural network with a hidden Markov model (HelixerPost) to produce finalized gene models directly from genomic DNA. The neural architecture mixes convolutional and recurrent layers to make base-wise predictions of coding sequence, UTRs and splice boundaries. HelixerPost then decodes these predictions into coherent gene models in GFF3 format. Pretrained models now cover fungi, plants, vertebrates, invertebrates and mammals, and the pipeline runs without RNA-seq, homology data or species-specific retraining. Helixer is available via GitHub, a web interface and the Galaxy ToolShed.
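The decoding step, which turns per-base class probabilities into gene structures, can be illustrated with a toy Viterbi decoder. This is a minimal sketch under stated assumptions, not HelixerPost's actual algorithm. The four labels, the transition table `TRANS` and its probabilities, and the helper names `peaked`, `viterbi` and `to_segments` are all hypothetical. The sketch shows the general idea: find the most likely label path that obeys a simple gene-structure grammar (for example, no direct jump from intergenic sequence into coding sequence), then collapse that path into 1-based, inclusive segments like the coordinates used in GFF3.

```python
import math

# Hypothetical label set, loosely modeled on base-wise gene-structure classes.
# This is NOT HelixerPost's real state space.
STATES = ["intergenic", "UTR", "CDS", "intron"]

# Hypothetical transition probabilities; any pair not listed is forbidden.
# Simplification: introns may only occur inside the CDS (no UTR introns).
TRANS = {
    ("intergenic", "intergenic"): 0.99, ("intergenic", "UTR"): 0.01,
    ("UTR", "UTR"): 0.90, ("UTR", "CDS"): 0.05, ("UTR", "intergenic"): 0.05,
    ("CDS", "CDS"): 0.90, ("CDS", "intron"): 0.05, ("CDS", "UTR"): 0.05,
    ("intron", "intron"): 0.90, ("intron", "CDS"): 0.10,
}


def peaked(state, p=0.997):
    """Toy per-base probabilities: p for `state`, the remainder split evenly."""
    rest = (1.0 - p) / (len(STATES) - 1)
    return [p if s == state else rest for s in STATES]


def viterbi(probs, trans=TRANS):
    """Most likely legal state path, given per-base class probabilities."""
    if not probs:
        return []
    n, eps = len(STATES), 1e-12
    # Log-score of the best path ending in each state at position 0 (uniform start).
    score = [math.log(probs[0][j] + eps) for j in range(n)]
    backpointers = []
    for pos in range(1, len(probs)):
        new_score, pointers = [], []
        for j in range(n):
            best_i, best = None, -math.inf
            for i in range(n):
                p = trans.get((STATES[i], STATES[j]), 0.0)
                # Forbidden transitions (probability 0) are skipped entirely.
                if p > 0.0 and score[i] + math.log(p) > best:
                    best_i, best = i, score[i] + math.log(p)
            new_score.append(best + math.log(probs[pos][j] + eps))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best-scoring final state.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return [STATES[j] for j in reversed(path)]


def to_segments(path):
    """Collapse a label path into (label, start, end), 1-based and inclusive."""
    segments, start = [], 0
    for pos in range(1, len(path) + 1):
        if pos == len(path) or path[pos] != path[start]:
            segments.append((path[start], start + 1, pos))
            start = pos
    return segments


truth = ["intergenic", "intergenic", "UTR", "CDS", "CDS",
         "intron", "CDS", "UTR", "intergenic"]
decoded = viterbi([peaked(s) for s in truth])
print(to_segments(decoded))
```

Because forbidden transitions have probability zero, this decoder can never emit an illegal path, which is the main reason to decode with an HMM instead of taking the per-base argmax. A real GFF3 writer would additionally need the gene/mRNA/exon hierarchy, strand and the CDS phase column.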
Why it matters: Helixer addresses a major bottleneck in annotating the flood of new eukaryotic genome assemblies, especially for species that lack experimental data. Across benchmarks it matches or exceeds established HMM-based tools (GeneMark-ES, AUGUSTUS) for plants and vertebrates, achieves higher phase and genic F1 scores in many tests, and produces annotations that approach the quality of curated references.
Technical caveats: absolute protein-level precision and recall remain lower than the base-wise metrics (as expected, since predicting complete proteins correctly is the harder task); on fungi and some invertebrates, HMM-based tools remain competitive; and the mammal-focused Tiberius outperforms Helixer on the species tested. Ablations confirm that biologically informed design choices (e.g., transition weighting) are important for accurately resolving splice sites and start and stop codons. Overall, Helixer offers a scalable, accessible option for consistent, high-quality eukaryotic genome annotation.
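The gap between base-wise and protein-level scores is easier to see with a small calculation. The sketch below assumes that a "genic" F1 pools true positives, false positives and false negatives over the genic classes (UTR, CDS, intron) at single-base resolution. That pooling rule, the label names and the function `pooled_f1` are illustrative assumptions, not Helixer's evaluation code. A phase F1 could be computed the same way over per-base phase labels. If, as assumed here, protein-level scoring requires a predicted protein to match the reference exactly, then a single wrong base can turn an otherwise correct gene into a miss, which is why those scores tend to be lower.

```python
def pooled_f1(reference, predicted, positive_classes):
    """Base-wise precision, recall and F1, pooling TP/FP/FN over the given classes."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = fp = fn = 0
    for ref, pred in zip(reference, predicted):
        if pred in positive_classes:
            if pred == ref:
                tp += 1
            else:
                fp += 1  # predicted a positive class that disagrees with the reference
        if ref in positive_classes and ref != pred:
            fn += 1  # a positive reference base that was missed or mislabeled
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy annotation: one UTR base and one intron base are called CDS.
reference = ["intergenic", "UTR", "CDS", "CDS", "intron", "CDS", "intergenic"]
predicted = ["intergenic", "CDS", "CDS", "CDS", "CDS", "CDS", "intergenic"]
GENIC = {"UTR", "CDS", "intron"}
print(pooled_f1(reference, predicted, GENIC))
```

Here the pooled genic precision, recall and F1 all come out at 0.6. Restricting `positive_classes` to `{"CDS"}` on the same data gives precision 0.6 and recall 1.0, which shows how much the choice of classes changes the reported score.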