BIOL 2650: Using Bioinformatics Software

Enables students to use and evaluate currently available bioinformatics software packages.

 

   Credits                    3.0

   Prerequisites        BIOL 1650

   Required Books   None

 

This course is designed for biologists and bioinformaticians who want to learn best practices and techniques for analyzing biological data. We examine algorithms for alignment, ortholog identification, and phylogenetic tree reconstruction, as well as algorithms chosen by students. After taking this course, students should feel comfortable downloading, installing, running, and assessing different algorithms in a Linux environment.

Thousands of bioinformatics algorithms exist on GitHub and SourceForge, but it is often difficult to determine which algorithm should be used. BIOL 2650 helps students analyze the pros and cons of various algorithms by assessing their runtime, precision, accuracy, usage, and usability. Students are expected to write a review paper on different algorithms for a topic of their choice by the end of the semester. Groups are encouraged to submit their work for publication after the semester.
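As an illustration of the runtime assessment mentioned above, here is a minimal Python sketch that times a command-line tool over several runs. The helper name `time_command` and the stand-in command are assumptions made for this example, not part of the course materials; in practice you would substitute the actual command for the tool you are reviewing.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd (a list of arguments) several times.

    Returns the wall-clock time in seconds for each run.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises an error if the tool exits with a failure code.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in command (an assumption): replace with the tool under review.
timings = time_command([sys.executable, "-c", "pass"])
print(f"best of {len(timings)} runs: {min(timings):.3f} s")
```

Reporting the minimum (or the median) of several runs reduces the effect of other processes on a shared Linux machine.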

Possible Topic: Ortholog Identification

As more genomes are sequenced, the need for accurate annotations increases. Orthologs are genes in different species that descend from a single gene in their last common ancestor and typically share the same or similar functions. While many ortholog identification algorithms exist, it is often difficult to assess which algorithm is the best because each algorithm has its own pros and cons. Evaluate as many ortholog identification algorithms as you can, and accurately assess their strengths and weaknesses.

If I were trying to identify the best ortholog identification algorithm, I would first read the literature to identify candidate algorithms, then download as many implementations as I could: OMA, OrthoMCL, JustOrthologs, JustOrthoGroups, etc. Next, I would determine whether any ortholog datasets exist (they do). Then I would run each algorithm against the datasets, assessing each for runtime, usability, precision, accuracy, false positive rate, Robinson-Foulds distance from a reference phylogeny, etc. Finally, I would chart each of these metrics and determine an overall scoring criterion to fairly assess each of the algorithms.
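Several of the metrics named above can be computed by comparing an algorithm's predicted ortholog pairs against a reference set. The sketch below is one possible way to do this, under stated assumptions: the function names, the toy gene identifiers, and the choice to treat every cross-species gene pair as a candidate (so that true negatives are defined) are all illustrative, not taken from any particular tool.

```python
from itertools import product


def normalize(pairs):
    """Treat (a, b) and (b, a) as the same ortholog pair."""
    return {frozenset(pair) for pair in pairs}


def evaluate(predicted, reference, genes_a, genes_b):
    """Compare predicted ortholog pairs with a reference set.

    Assumes gene identifiers are unique across the two species and that
    every cross-species pair is a candidate ortholog pair.
    """
    predicted = normalize(predicted)
    reference = normalize(reference)
    universe = normalize(product(genes_a, genes_b))

    tp = len(predicted & reference)   # predicted and in the reference
    fp = len(predicted - reference)   # predicted but not in the reference
    fn = len(reference - predicted)   # in the reference but missed
    tn = len(universe - predicted - reference)
    total = tp + fp + fn + tn

    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Toy data (hypothetical gene identifiers).
genes_a = ["a1", "a2", "a3"]
genes_b = ["b1", "b2", "b3"]
reference = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
predicted = [("a1", "b1"), ("b2", "a2"), ("a3", "b1")]

metrics = evaluate(predicted, reference, genes_a, genes_b)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Collecting the same dictionary for each algorithm gives one row per algorithm, which can then be charted and combined into an overall score.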