Evaluating the necessity of PCR duplicate removal from next-generation sequencing
data and a comparison of approaches
Authors: Mark Ebbert*, Mark Wadsworth*, Lyndsay Staley*, Kaitlyn Hoyt, Brandon Pickett, Justin Miller, John Duce, for the Alzheimer's Disease Neuroimaging Initiative, John SK Kauwe, Perry Ridge
BMC Bioinformatics
20 Citations
July, 2016
Above Average Altmetric Score of 3
Tweeted by 5 people
130 Mendeley Readers
Novelty of Approach
-Evaluated PCR duplicate removal on final genome assembly
-Compared ChIP-seq data with whole genome sequencing (WGS) data
-Performed depth of coverage analysis on WGS data
Results
-92% of the more than 17 million variants were called in all three cases: duplicates removed with Picard, duplicates removed with SAMTools, and PCR duplicates left in the dataset.
-No significant differences between the unique variant sets
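
As a rough illustration of how an overlap figure like the 92% above could be computed, the sketch below reduces each call set to (chromosome, position, ref, alt) keys and measures what fraction of all distinct variants appears in every set. This is not the authors' pipeline: the function names, the toy VCF lines, and the choice of the union as denominator are assumptions made for the example.

```python
def variant_key(vcf_line):
    """Reduce one tab-separated VCF data line to a (chrom, pos, ref, alt) key."""
    chrom, pos, _id, ref, alt = vcf_line.split("\t")[:5]
    return (chrom, int(pos), ref, alt)


def load_variants(lines):
    """Build a set of variant keys, skipping blank lines and '#' header lines."""
    return {
        variant_key(line.rstrip("\n"))
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def shared_fraction(*variant_sets):
    """Fraction of all distinct variants that appear in every given set."""
    union = set().union(*variant_sets)
    if not union:
        return 0.0
    shared = set.intersection(*variant_sets)
    return len(shared) / len(union)


# Toy call sets (hypothetical data, not from the study).
A = "chr1\t100\t.\tA\tG"
B = "chr1\t250\t.\tC\tT"
C = "chr2\t75\t.\tG\tA"
D = "chr2\t900\t.\tT\tC"

picard_calls = load_variants(["#CHROM\tPOS\tID\tREF\tALT", A, B, C])
samtools_calls = load_variants(["#CHROM\tPOS\tID\tREF\tALT", A, B, D])
no_removal_calls = load_variants(["#CHROM\tPOS\tID\tREF\tALT", A, B, C, D])

overlap = shared_fraction(picard_calls, samtools_calls, no_removal_calls)
print(f"Shared across all three call sets: {overlap:.0%}")
```

In this toy case two of the four distinct variants are common to all three sets. Real call sets would be read from the VCF files produced after each duplicate-handling approach.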
Implications
-Removing PCR duplicates had little effect on the variants called, so the step appears unnecessary for this kind of data
-Save compute and analysis time by not removing PCR duplicates