  • JustOrthologs: A Fast, Accurate, and User-Friendly Ortholog Identification Algorithm

     

    Authors: Justin Miller, Brandon Pickett, Perry Ridge

    Bioinformatics

    Article

    GitHub

    August, 2018

    Altmetric Score of 51

        315 PDF Downloads

        Tweeted by 84 people, reached 71,787 followers

        Top 5% of all research outputs: #257,057 out of 11,614,825

    2 Forks on GitHub

    7 Stars on GitHub

    OmicTools

  • Novelty of Approach

     

    -Does not use all-versus-all BLAST comparisons

    -Uses conservation in CDS region length to reduce pairwise comparisons

    -Uses dinucleotide composition to further reduce runtime
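
The two filtering ideas listed above can be illustrated with a minimal Python sketch. It is not the JustOrthologs implementation: the function names, the L1 distance between dinucleotide profiles, and both thresholds (`min_shared_fraction`, `max_distance`) are assumptions chosen purely for illustration. The sketch keeps a gene pair for closer inspection only when most CDS region lengths match and the dinucleotide composition is similar, which is how a cheap pre-filter can avoid all-versus-all comparisons.

```python
from collections import Counter
from itertools import product

# The 16 possible dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_profile(seq):
    """Return the relative frequency of each of the 16 dinucleotides in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Dinucleotides containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[d] for d in DINUCLEOTIDES)
    if total == 0:
        return {d: 0.0 for d in DINUCLEOTIDES}
    return {d: counts[d] / total for d in DINUCLEOTIDES}


def profile_distance(p, q):
    """Sum of absolute frequency differences (0 = identical, 2 = disjoint)."""
    return sum(abs(p[d] - q[d]) for d in DINUCLEOTIDES)


def shared_cds_lengths(cds_a, cds_b):
    """Count CDS region lengths found in both genes (multiset intersection)."""
    lengths_a = Counter(len(c) for c in cds_a)
    lengths_b = Counter(len(c) for c in cds_b)
    return sum((lengths_a & lengths_b).values())


def is_candidate_pair(cds_a, cds_b, min_shared_fraction=0.8, max_distance=0.2):
    """Hypothetical pre-filter: True if the pair deserves a closer comparison.

    Both thresholds are illustrative assumptions, not published values.
    """
    if not cds_a or not cds_b:
        return False
    # Step 1: require most CDS region lengths to be conserved.
    shared = shared_cds_lengths(cds_a, cds_b)
    if shared / max(len(cds_a), len(cds_b)) < min_shared_fraction:
        return False
    # Step 2: require similar dinucleotide composition of the coding sequence.
    profile_a = dinucleotide_profile("".join(cds_a))
    profile_b = dinucleotide_profile("".join(cds_b))
    return profile_distance(profile_a, profile_b) <= max_distance
```

Each gene is passed as a list of CDS region sequences, for example `is_candidate_pair(["ATGGCC", "GATTACAGG", "TTGA"], other_gene_cds)`. Running the length check first means the composition comparison is only computed for pairs that already agree structurally.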

     

  • Results

     

    -Reduced ortholog identification runtime by 96%

    -Maintained overall precision and accuracy

    -Genes with more CDS regions have higher precision and accuracy

    -Confirmed gene annotations for 384,120 genes

    -Grouped 1,675,415 genes in previously unreported ortholog groups

    -Identified 51,429 potentially mislabeled genes

    -Annotated 622,843 ortholog groups

     

  • Implications

     

    -Whole genome analyses are now possible

    -Algorithm not based on pairwise BLAST comparisons

    -Annotate more orthologous genes

        -Provide functional insights for genes of unknown functions

        -Phylogenetic inference

     

  • Whole Genome Comparison of Different Species

    Species 1 | Species 2 | Number of Genes in Species 1 | Number of Genes in Species 2 | Number of Shared Ortholog Annotations from HGNC | True Positives Reported | False Positives Reported | Unnamed Genes Reported in Orthologous Pairs | Precision (%) | Recall (%)
    Homo sapiens | Pan paniscus | 20,088 | 17,900 | 14,653 | 14,119 | 462 | 905 | 96.83 | 96.36
    Homo sapiens | Equus caballus | 20,088 | 16,691 | 12,725 | 8,229 | 150 | 246 | 98.21 | 64.67
    Homo sapiens | Falco peregrinus | 20,088 | 12,643 | 10,659 | 841 | 38 | 35 | 95.68 | 7.89
    Gallus gallus | Falco peregrinus | 16,420 | 12,643 | 9,163 | 5,132 | 139 | 597 | 97.36 | 56.01
    Astyanax mexicanus | Danio rerio | 21,920 | 22,408 | 5,832 | 683 | 296 | 688 | 69.77 | 11.71
    Cynoglossus semilaevis | Danio rerio | 19,450 | 22,408 | 5,699 | 199 | 104 | 205 | 65.68 | 3.49
    Oncorhynchus kisutch | Salmo salar | 30,680 | 40,642 | 2,800 | 2,424 | 183 | 18,300 | 92.98 | 86.57
    Oreochromis niloticus | Pundamilia nyererei | 27,785 | 21,832 | 8,645 | 8,326 | 94 | 9,857 | 98.88 | 96.31
    Alligator mississippiensis | Crocodylus porosus | 17,492 | 13,837 | 10,993 | 10,238 | 4 | 1,615 | 99.96 | 93.13
    Mus musculus | Rattus norvegicus | 21,815 | 21,481 | 15,199 | 12,183 | 720 | 279 | 94.42 | 80.16
    Bos taurus | Capra hircus | 17,980 | 19,208 | 12,894 | 11,929 | 97 | 1,337 | 99.19 | 92.52
    Bos taurus | Vicugna pacos | 17,980 | 16,297 | 11,411 | 7,991 | 18 | 502 | 99.78 | 70.03
    Calypte anna | Haliaeetus leucocephalus | 12,225 | 14,150 | 9,825 | 7,041 | 15 | 662 | 99.79 | 71.66
    Calypte anna | Chaetura pelagica | 12,225 | 11,852 | 8,770 | 6,565 | 14 | 695 | 99.79 | 74.86
    Prunus avium | Prunus mume | 24,179 | 22,628 | 0 | 0 | 0 | 14,004 | N/A | N/A
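
The Precision and Recall columns above are consistent with the standard definitions applied to the counts in each row: precision as true positives over all reported named pairs (true plus false positives), and recall as true positives over the shared HGNC annotations. The short Python sketch below shows that arithmetic. The function name and the choice to return `None` for rows with no shared annotations (shown as N/A in the table) are assumptions made for this example.

```python
def precision_recall(true_positives, false_positives, shared_annotations):
    """Return (precision %, recall %) rounded to two decimals.

    Returns None when either denominator is zero, mirroring the N/A row.
    """
    reported = true_positives + false_positives
    if reported == 0 or shared_annotations == 0:
        return None
    precision = 100 * true_positives / reported
    recall = 100 * true_positives / shared_annotations
    return round(precision, 2), round(recall, 2)


# Homo sapiens vs. Pan paniscus row: 14,119 TP, 462 FP, 14,653 shared annotations.
print(precision_recall(14119, 462, 14653))
```

For the Homo sapiens and Pan paniscus row this reproduces the tabulated 96.83% precision and 96.36% recall. Unnamed genes reported in orthologous pairs do not enter either formula, since they cannot be scored against HGNC annotations.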
  • Large Ortholog Groups Recovered Using JustOrthologs

    Genes with the Same Annotation | Genes with Other Annotations | Genes with Unknown Annotations | Total Genes | Reason for Other Annotations
    127 | 0 | 63 | 190 | N/A
    178 | 0 | 7 | 185 | N/A
    172 | 1 | 7 | 180 | XP_018109801.1 has 100% BLAST identity with NP_001087532.1, which is annotated the same as the other 172 genes.
    155 | 2 | 21 | 178 | The nucleotide composition and exon length of XP_001959559.1 and XP_002071834.1 are similar to XP_010179458.1. However, the alignment is very different. These two genes are probably incorrectly reported as orthologous by JustOrthologs.
    169 | 0 | 9 | 178 | N/A
    169 | 1 | 5 | 175 | XP_414807.2 has a 99% BLAST identity with XP_015732072.1 from a closely related species, which is annotated the same as the other 169 genes.
    166 | 0 | 5 | 171 | N/A
    165 | 1 | 5 | 171 | NP_068697.1 is annotated Trp53inp1 instead of TP53INP1.
    163 | 1 | 6 | 170 | XP_014347657.1 is annotated LRRC8E instead of LRRC8C.
    165 | 0 | 4 | 169 | N/A
    161 | 0 | 7 | 168 | N/A
    162 | 0 | 5 | 167 | N/A
    161 | 1 | 4 | 166 | XP_020368157.1 is incorrectly reported as orthologous by JustOrthologs. The CDS region lengths matched some exons in XP_005866852.1, but the alignment of the sequences was very poor.
    163 | 0 | 3 | 166 | N/A
    152 | 1 | 13 | 166 | XP_018123052.1 is annotated grb10.L instead of GRB10.
    161 | 0 | 4 | 165 | N/A
    156 | 0 | 9 | 165 | N/A
    159 | 0 | 6 | 165 | N/A
    160 | 0 | 5 | 165 | N/A
    160 | 0 | 4 | 164 | N/A
    159 | 0 | 5 | 164 | N/A
    158 | 0 | 5 | 163 | N/A
    156 | 1 | 5 | 162 | XP_017312051.1 is incorrectly reported as orthologous by JustOrthologs. The CDS region lengths matched several exons within XP_020920808.1, but the alignment of the sequences was poor.
    156 | 0 | 5 | 161 | N/A
    158 | 0 | 3 | 161 | N/A
    153 | 0 | 7 | 160 | N/A
    149 | 0 | 9 | 158 | N/A
    154 | 0 | 3 | 157 | N/A
    146 | 0 | 11 | 157 | N/A
    153 | 0 | 4 | 157 | N/A