Manuscripts that are Currently Under Review

Human-Infecting Viruses Mimic Host Codon Usage Biases



ACCEPTED FOR PUBLICATION!! All revisions are currently being submitted, and it should be published soon in BMC Genomics.


Abstract— Viruses with human hosts mimic the codon usage bias of human proteins. By performing 7,052,621 pairwise codon usage bias comparisons of proteins from humans versus 113 viruses, we determined which pairs were most highly correlated. We found that 16 of the 113 viruses analyzed averaged a significant correlation in codon usage with over 500 human genes for each viral gene. Of the remaining 97 viruses, the codon usage bias of 58 viruses was highly correlated with an average of at least 100 human genes per viral gene, and 37 viruses were significantly correlated with an average of at least one human gene per viral gene at an alpha level of 7.09 × 10^-9 (0.05 alpha / 7,052,621 comparisons). Only two viruses were not highly correlated with an average of at least one human gene per viral gene. While relatively few of these interactions were previously documented, the high statistical correlations suggest that researchers may be able to determine which tissues a virus is most likely to infect by analyzing codon usage biases.
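The pairwise comparison and the multiple-testing threshold described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: it assumes each gene's codon usage is summarized as a 64-element vector of relative codon frequencies and that gene pairs are compared with a Pearson correlation (the abstract does not name the exact metric). The function names `codon_frequencies`, `pearson`, and `bonferroni_alpha` and the toy sequences are hypothetical. The last line reproduces the Bonferroni-corrected alpha quoted in the abstract.

```python
from itertools import product
from math import sqrt

# All 64 codons in a fixed order, so every gene maps to the same vector layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Relative frequency of each of the 64 codons in an in-frame coding sequence."""
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skips codons containing ambiguous bases such as N
            counts[codon] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in CODONS]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical toy sequences, not data from the study.
viral = codon_frequencies("ATGGCTGCTAAATAA")
human = codon_frequencies("ATGGCCGCTAAGTAA")
print(round(pearson(viral, human), 3))
print(f"{bonferroni_alpha(0.05, 7_052_621):.3g}")  # 7.09e-09
```

Turning each correlation into a p-value that can be checked against this threshold would need a statistical routine such as `scipy.stats.pearsonr`; the sketch stops at the correlation coefficient itself.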

Missing Something?: Codon Non-Usage as a Character in Phylogenetic Inference in Tetrapods.



ACCEPTED FOR PUBLICATION!! All revisions have been submitted, and it should be published in the next issue of Cladistics.


This is the first project that I conceived and implemented from start to finish. I consider it my first real publication because I spent so much time and effort making sure that everything was done perfectly and to my specifications.


Abstract— Although many studies have documented codon usage bias among different species, the importance of codon usage in a phylogenetic framework remains largely unknown. We recovered the tetrapod phylogeny using the codon usage and non-usage bias of 17,717 genes across 72 species, and found that a phylogenetic signal was present using a simple parsimony analysis of a binary matrix of codon characters. After detecting a phylogenetic signal across all codons, we found that the stop codons had the most phylogenetic signal when compared with the Open Tree of Life project. We determined that although every codon is present in each species, no species uses every codon in every ortholog. This phenomenon allowed us to map codon usage and non-usage as a two-state character. We show that the phylogenies can be recovered because some clades use a given codon within a gene, while other clades do not. These results indicate that a simple binary representation of codon usage and non-usage allows us to accurately reconstruct both shallow and deep phylogenies.
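The two-state coding described in this abstract can be illustrated with a small sketch. It assumes each character is one (ortholog, codon) pair, scored 1 if a species uses that codon in its copy of the gene, 0 if it does not, and treated as missing if the species lacks the ortholog; a candidate tree is then scored with Fitch parsimony. This is not the software or matrix used in the study, and the function names (`codons_used`, `binary_matrix`, `fitch_score`, `tree_length`) and toy sequences are hypothetical.

```python
from itertools import product

# All 64 codons in a fixed order; each (gene, codon) pair becomes one character.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codons_used(cds):
    """Set of codons that appear at least once in an in-frame coding sequence."""
    cds = cds.upper()
    return {cds[i:i + 3] for i in range(0, len(cds) - 2, 3)} & set(CODONS)


def binary_matrix(orthologs):
    """Build {species: [1/0/None, ...]} from {gene: {species: cds}}.

    1 = codon used in that species' ortholog, 0 = not used,
    None = species lacks the ortholog (treated as missing data).
    """
    species = sorted({sp for seqs in orthologs.values() for sp in seqs})
    matrix = {sp: [] for sp in species}
    for gene in sorted(orthologs):
        for sp in species:
            cds = orthologs[gene].get(sp)
            if cds is None:
                matrix[sp].extend([None] * len(CODONS))
            else:
                used = codons_used(cds)
                matrix[sp].extend(1 if c in used else 0 for c in CODONS)
    return matrix


def fitch_score(tree, states):
    """Minimum number of state changes for one character on a binary tree.

    tree: nested 2-tuples of species names, e.g. (("a", "b"), ("c", "d")).
    states: {species: 0, 1, or None}; None is compatible with either state.
    """
    def visit(node):
        if isinstance(node, str):
            s = states[node]
            return ({0, 1} if s is None else {s}), 0
        (left, lcost), (right, rcost) = visit(node[0]), visit(node[1])
        shared = left & right
        if shared:
            return shared, lcost + rcost
        return left | right, lcost + rcost + 1

    return visit(tree)[1]


def tree_length(tree, matrix):
    """Total parsimony length of the tree, summed over every character."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_score(tree, {sp: row[i] for sp, row in matrix.items()})
        for i in range(n_chars)
    )


# Hypothetical toy data: two species stop the gene with TAA, two with TAG.
orthologs = {"g1": {"human": "ATGGCTTAA", "mouse": "ATGGCTTAA",
                    "frog": "ATGGCTTAG", "newt": "ATGGCTTAG"}}
m = binary_matrix(orthologs)
print(tree_length((("human", "mouse"), ("frog", "newt")), m))  # 2
print(tree_length((("human", "frog"), ("mouse", "newt")), m))  # 4
```

In this toy case the tree that groups species by shared stop-codon usage needs fewer changes, which is the kind of signal the abstract describes; a real analysis would compare many trees over thousands of genes with dedicated parsimony software.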

Kmer-SSR: A Fast and Exhaustive SSR Search Algorithm



ACCEPTED FOR PUBLICATION!! All revisions are currently being submitted, and it should be published in BMC Bioinformatics.


Abstract— One of the main challenges with bioinformatics software is that the size and complexity of datasets force a trade-off between speed and accuracy or completeness. To combat this computational complexity, a plethora of heuristic algorithms have arisen that report a “good enough” solution to biological questions. However, for Simple Sequence Repeats (SSRs), a “good enough” solution may misrepresent results in population genetics, phylogenetics, and forensics, which require accurate SSRs to calculate intra- and inter-species interactions. To address this issue, we present Kmer-SSR, which finds all SSRs faster than most heuristic SSR identification algorithms in a parallelized, easy-to-use manner. The exhaustive Kmer-SSR option has 100% precision and 100% recall and identifies every SSR of any specified length. To identify more biologically pertinent SSRs, we also developed several filters that allow users to easily view a subset of SSRs based on user input. Coupled with these filter options, Kmer-SSR identifies SSRs accurately, quickly, and intuitively, and is more user-friendly than any other SSR identification algorithm.
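The difference between an exhaustive and a heuristic SSR search can be illustrated with a naive scan. This sketch is not Kmer-SSR's algorithm or interface: for every motif length from 1 to a hypothetical `max_period`, it reports each run of a motif repeated at least a hypothetical `min_repeats` times, skips motifs that are themselves repeats of a shorter unit (such as `ATAT`, which is reported as `AT` instead), and lets SSRs of different motifs overlap. The function names `is_primitive` and `find_ssrs` are assumptions made for illustration.

```python
def is_primitive(motif):
    """True unless the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq, max_period=6, min_repeats=3):
    """Return (start, motif, repeats) for tandem runs of primitive motifs.

    Each period k from 1 to max_period is scanned separately. Within one
    period, a run is reported once, at the phase where the scan first meets it.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k in range(1, max_period + 1):
        i = 0
        while i + k <= n:
            motif = seq[i:i + k]
            j = i + k
            # Extend the run while the next k bases repeat the motif exactly.
            while seq[j:j + k] == motif:
                j += k
            repeats = (j - i) // k
            if repeats >= min_repeats:
                if is_primitive(motif) and "N" not in motif:
                    hits.append((i, motif, repeats))
                # Resume just inside the run's last copy: a different motif that
                # overlaps the run's end is still found, but the rest of this run
                # is not rescanned.
                i = j - k + 1
            else:
                i += 1
    return hits


print(find_ssrs("ATATATGTGTG", max_period=2))  # [(0, 'AT', 3), (5, 'TG', 3)]
```

The example shows why overlap matters: the `TG` repeat starts inside the last `AT` copy, so a scan that jumped past the whole `AT` run would miss it. The paper's filters and parallelization are not reproduced here.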

Assembly of 809 whole mitochondrial genomes with clinical, imaging and fluid biomarker phenotyping: the Alzheimer's Disease Neuroimaging Initiative



ACCEPTED FOR PUBLICATION!! All revisions are currently being submitted, and it should be published in BMC Genomics.


My portion of this manuscript was relatively small: I annotated all of the mitochondrial genomes and compared the variants against publicly accessible databases. The project as a whole focuses on the assembly and annotation of 809 mitochondrial genomes from the ADNI dataset.