mGene.web Performance

This page reports the performance of mGene.web for different species and different data set sizes. We first evaluate prediction performance on the signal and content level, using the area under the ROC curve (auROC) and the area under the precision-recall curve (auPRC) as evaluation measures. All performance values shown for signal and content predictors are out-of-sample estimates. We additionally evaluate mGene.web's performance for gene prediction, reporting sensitivity (SN) and specificity (SP) on the exon and transcript level, measured on a validation set. The one exception is the "nGASP small" set (listed in the tables as Caenorhabditis elegans (40 genes)): the number of genes did not allow for the creation of a reliable validation set, so for this set we give evaluation measurements on the training set.
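To make the two measures concrete, the following is a minimal sketch of how auROC and auPRC can be computed from labelled examples and predictor scores. It is not mGene.web's own evaluation code: the function names au_roc and au_prc and the toy data are hypothetical, auROC is computed as the fraction of correctly ordered positive/negative pairs, and auPRC is approximated by average precision, which is only one of several common estimators.

```python
def au_roc(labels, scores):
    """auROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); tied pairs count as 0.5.
    Quadratic in the number of examples, which is fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def au_prc(labels, scores):
    """auPRC estimated as average precision: the mean of the precision
    values observed at the rank of each positive example.
    Assumption: tied scores are kept in input order (stable sort)."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            precision_sum += true_pos / rank
    return precision_sum / n_pos


# Hypothetical toy data: 1 = true signal site, 0 = decoy position.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

print(f"auROC = {au_roc(toy_labels, toy_scores):.3f}")
print(f"auPRC = {au_prc(toy_labels, toy_scores):.3f}")
```

Because true signal sites are usually far rarer than decoy positions, auPRC tends to be the stricter of the two measures, which is consistent with the auPRC values in the tables being lower than the corresponding auROC values.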

Performance of Signal Predictors for Various Species

Gene Signal                               TSS           TIS           ACC           DON           cdsStop       Cleave
                                          auROC  auPRC  auROC  auPRC  auROC  auPRC  auROC  auPRC  auROC  auPRC  auROC  auPRC
Caenorhabditis elegans (40 genes)         0.791  0.439  0.888  0.366  0.973  0.744  0.979  0.811  0.897  0.537  0.900  0.597
Caenorhabditis elegans (nGASP confirmed)  0.948  0.886  0.883  0.563  0.991  0.937  0.993  0.946  0.932  0.731  0.898  0.575
Caenorhabditis elegans (nGASP all)        0.961  0.933  0.886  0.566  0.991  0.941  0.988  0.932  0.919  0.694  0.889  0.817
Drosophila melanogaster                   0.934  0.714  0.951  0.795  0.986  0.934  0.992  0.959  0.965  0.858  0.954  0.780
Saccharomyces cerevisiae                  0.999  0.991  0.954  0.954  0.939  0.750  0.995  0.940  0.987  0.934  0.996  0.974
Arabidopsis thaliana                      0.965  0.797  0.959  0.817  0.986  0.929  0.991  0.953  0.960  0.816  0.938  0.653
Aspergillus nidulans                      0.999  0.988  0.946  0.760  0.965  0.827  0.987  0.927  0.960  0.806  0.998  0.973
Tetraodon nigroviridis                    0.945  0.795  0.897  0.606  0.974  0.877  0.985  0.922  0.930  0.739  0.937  0.728
Anopheles gambiae                         0.935  0.728  0.925  0.720  0.961  0.871  0.975  0.910  0.951  0.817  0.923  0.708
Ciona savignyi                            0.779  0.321  0.848  0.428  0.947  0.832  0.964  0.872  0.930  0.699  0.852  0.475

Performance of Content Predictors for Various Species

Gene Segment Type                         Intergenic    utr5_exon     cds_exon      utr3_exon     intron
                                          auROC  auPRC  auROC  auPRC  auROC  auPRC  auROC  auPRC  auROC  auPRC
Caenorhabditis elegans (40 genes)         0.983  0.829  0.759  0.145  0.862  0.847  0.794  0.531  0.702  0.119
Caenorhabditis elegans (nGASP confirmed)  0.864  0.772  0.838  0.205  0.980  0.969  0.760  0.126  0.851  0.588
Caenorhabditis elegans (nGASP all)        0.818  0.521  0.885  0.495  0.965  0.942  0.976  0.788  0.959  0.924
Drosophila melanogaster                   0.436  0.258  0.818  0.297  0.960  0.945  0.822  0.509  0.716  0.180
Saccharomyces cerevisiae                  0.949  0.895  0.892  0.704  0.932  0.928  0.978  0.854  0.956  0.869
Arabidopsis thaliana                      0.778  0.736  0.864  0.422  0.955  0.908  0.813  0.263  0.940  0.856
Aspergillus nidulans                      0.890  0.633  0.889  0.504  0.847  0.470  0.977  0.788  0.945  0.842
Tetraodon nigroviridis                    0.704  0.387  0.861  0.394  0.957  0.925  0.949  0.645  0.948  0.911
Anopheles gambiae                         0.818  0.157  0.607  0.091  0.883  0.565  0.596  0.017  0.871  0.723
Ciona savignyi                            0.581  0.161  0.666  0.277  0.933  0.695  0.660  0.136  0.871  0.723

Performance of Gene Prediction for Various Species

Evaluation level                          Exon          Transcript
                                          SN     SP     SN     SP
Caenorhabditis elegans (40 genes) [*]     0.675  0.691  0.231  0.310
Caenorhabditis elegans (nGASP confirmed)  0.860  0.841  0.429  0.478
Caenorhabditis elegans (nGASP all)        0.727  0.736  0.346  0.410
Drosophila melanogaster                   0.812  0.825  0.536  0.612
Saccharomyces cerevisiae                  0.929  0.912  0.917  0.932
Arabidopsis thaliana                      0.887  0.894  0.634  0.680
Aspergillus nidulans                      0.754  0.674  0.507  0.546

[*] For the small C. elegans gene set, all genes were used for both training and testing (i.e., there was no independent test set).
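The page does not spell out how SN and SP are computed. As a rough illustration, the sketch below uses the convention common in gene-finding evaluations: a predicted feature counts as correct only if it matches an annotated feature exactly, SN is the fraction of annotated features that were predicted, and SP is the fraction of predicted features that are annotated. Whether mGene.web uses exactly these definitions is an assumption, and the function name sensitivity_specificity, the (start, end) exon representation and the coordinates are all hypothetical.

```python
def sensitivity_specificity(annotated, predicted):
    """Return (SN, SP) under exact-match scoring.

    SN = correct / number of annotated features
    SP = correct / number of predicted features
    (Assumed definitions; the page itself does not state them.)
    """
    annotated, predicted = set(annotated), set(predicted)
    correct = len(annotated & predicted)
    sn = correct / len(annotated) if annotated else 0.0
    sp = correct / len(predicted) if predicted else 0.0
    return sn, sp


# Hypothetical exons as (start, end) pairs on a single strand.
e1, e2, e3 = (100, 200), (300, 400), (500, 600)
wrong_end = (500, 650)   # predicted exon with a shifted boundary
spurious = (700, 800)    # predicted exon with no annotated counterpart

# Exon level: every exon is scored on its own.
annotated_exons = [e1, e2, e3]
predicted_exons = [e1, e2, wrong_end, spurious]
exon_sn, exon_sp = sensitivity_specificity(annotated_exons, predicted_exons)

# Transcript level: a transcript is an ordered tuple of exons and is
# correct only if all of its exons match.
annotated_transcripts = [(e1, e2), (e3,)]
predicted_transcripts = [(e1, e2), (wrong_end,), (spurious,)]
tx_sn, tx_sp = sensitivity_specificity(annotated_transcripts,
                                       predicted_transcripts)

print(f"exon       SN={exon_sn:.3f} SP={exon_sp:.3f}")
print(f"transcript SN={tx_sn:.3f} SP={tx_sp:.3f}")
```

Under this scoring a single wrong exon boundary invalidates the whole transcript, which would explain why the transcript-level values in the table are lower than the exon-level values for every species except Saccharomyces cerevisiae.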

Disclaimer

These are preliminary results generated with a beta version of mGene.web and may change in the near future. Moreover, this evaluation uses a different setup from the one used in, for instance, [1].

References

[1] Schweikert et al. mGene: Accurate Computational Gene Finding with Application to Nematode Genomes. Genome Research, 2009.