mGene is a computational tool for the genome-wide prediction of protein-coding genes from eukaryotic DNA sequences. It is based on recent advances in machine learning and uses discriminative training techniques, such as support vector machines (SVMs) and hidden semi-Markov support vector machines (HSMSVMs). Its excellent performance was demonstrated in an objective competition based on the genome of the nematode Caenorhabditis elegans. The evaluated developmental version of mGene exhibited the best prediction performance (in terms of the average of sensitivity and specificity) for the multiple-genome prediction tasks on all four evaluation levels (nucleotides, exons, transcripts and genes). The ab initio version was best on the nucleotide, exon and transcript levels, and only slightly worse than Augustus on the gene level. The fully developed version shows the best overall performance compared to the predictions of the submitted gene finders, including those of Fgenesh and Augustus.
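The ranking criterion mentioned above, the average of sensitivity and specificity, can be illustrated with a minimal sketch at the exon level. It assumes the convention common in gene-finder evaluations that "specificity" denotes the fraction of predicted features that are correct (what is elsewhere called precision), and that an exon counts as correct only if both of its boundaries match exactly. The function names and the toy coordinates are hypothetical and are not taken from mGene or from the competition's evaluation scripts.

```python
def sensitivity(tp, fn):
    """Fraction of annotated features that were predicted correctly."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tp, fp):
    """Fraction of predicted features that are correct.

    Assumption: gene-finding convention, i.e. TP / (TP + FP).
    """
    return tp / (tp + fp) if (tp + fp) else 0.0


def exon_level_score(annotated, predicted):
    """Return (sensitivity, specificity, their average) for two exon sets.

    Exons are (start, end) tuples; only exact matches count as true positives.
    """
    annotated, predicted = set(annotated), set(predicted)
    tp = len(annotated & predicted)   # exons predicted with exact boundaries
    fn = len(annotated - predicted)   # annotated exons that were missed
    fp = len(predicted - annotated)   # predicted exons with no exact match
    sn = sensitivity(tp, fn)
    sp = specificity(tp, fp)
    return sn, sp, (sn + sp) / 2


# Toy example with made-up coordinates.
annotated_exons = {(100, 200), (300, 400), (500, 650), (800, 900)}
predicted_exons = {(100, 200), (300, 400), (500, 600)}

sn, sp, avg = exon_level_score(annotated_exons, predicted_exons)
print(f"Sn={sn:.3f} Sp={sp:.3f} average={avg:.3f}")
```

The same counting scheme would apply at the nucleotide, transcript and gene levels, with the unit of comparison changed accordingly.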
mGene.web is a web service for the genome-wide prediction of protein-coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures, including untranslated regions, in an increasing number of organisms. mGene.web additionally allows users to train the system for other organisms at the push of a button, a functionality that greatly accelerates the annotation of newly sequenced genomes. The system is built in a highly modular way, such that individual components of the framework, like the promoter prediction tool or the splice site predictor, can be used autonomously. The underlying gene-finding system mGene is based on discriminative machine learning techniques, and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).