cn.MOPS: Mixture Of PoissonS for discovering Copy Number variations in next generation sequencing data
cn.MOPS is an algorithm 
that accurately detects copy number variations in next generation 
sequencing data in a study of multiple samples. 
Quantitative analyses of next generation sequencing (NGS) 
data, such as the detection of copy number variations 
(CNVs), remain challenging. Current methods detect CNVs 
as changes in the depth of coverage along chromosomes. 
Technological or genomic variations in the depth of coverage 
thus lead to a high false discovery rate (FDR), even upon 
correction for GC content. In the context of association 
studies between CNVs and disease, a high FDR means 
many false CNVs, thereby decreasing the discovery power of 
the study after correction for multiple testing. We propose 
“Copy Number estimation by a Mixture Of PoissonS” 
(cn.MOPS), a data processing pipeline for CNV detection 
in NGS data. In contrast to previous approaches, cn.MOPS 
incorporates modeling of depths of coverage across samples 
at each genomic position. Therefore, cn.MOPS is not 
affected by read count variations along chromosomes. Using 
a Bayesian approach, cn.MOPS decomposes variations in 
the depth of coverage across samples into integer copy 
numbers and noise by means of its mixture components 
and Poisson distributions, respectively. The noise estimate 
makes it possible to reduce the FDR by filtering out 
high-noise detections, which are likely to be false. We 
compared cn.MOPS with the five most popular methods 
for CNV detection in NGS data using four 
benchmark data sets: (1) simulated data, (2) NGS data 
from a male HapMap individual with implanted CNVs from 
the X chromosome, (3) data from HapMap individuals 
with known CNVs, and (4) high-coverage data from the 
1000 Genomes Project. cn.MOPS outperformed its five 
competitors in terms of precision (1–FDR) and recall for 
both gains and losses in all benchmark data sets. 
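The core idea of modeling read counts across samples at one genomic position can be illustrated with a small Python sketch. It is a simplified toy version written for this page, not the cn.MOPS implementation: the function names, the starting value (median read count), the `eps` rate for copy number 0 and the pseudo-count `prior_weight` favoring copy number 2 are all assumptions made for illustration, and the noise estimate and calling score of cn.MOPS are not modeled.

```python
import math

def poisson_logpmf(x, rate):
    """Log-probability of observing count x under a Poisson with the given rate."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)

def fit_poisson_mixture(counts, copy_numbers=(0, 1, 2, 3, 4),
                        n_iter=50, eps=0.05, prior_weight=1.0):
    """Toy EM fit for the read counts of ONE segment across samples.

    Component i has rate (cn_i / 2) * lam, where lam is the expected read
    count at copy number 2. Copy number 0 gets the small rate (eps / 2) * lam
    so that its logarithm stays finite (assumed value, for illustration).
    Returns (lam, alpha, posteriors).
    """
    n, k = len(counts), len(copy_numbers)
    scale = [max(cn, eps) / 2.0 for cn in copy_numbers]
    # Pseudo-counts that favor copy number 2 (a crude stand-in for a prior).
    prior = [prior_weight if cn == 2 else 0.0 for cn in copy_numbers]
    lam = max(float(sorted(counts)[n // 2]), 1e-3)  # start at the median count
    alpha = [1.0 / k] * k
    post = []
    for _ in range(n_iter):
        # E-step: posterior over copy numbers for every sample.
        post = []
        for x in counts:
            logp = [math.log(a) + poisson_logpmf(x, s * lam)
                    for a, s in zip(alpha, scale)]
            top = max(logp)
            w = [math.exp(v - top) for v in logp]
            total = sum(w)
            post.append([v / total for v in w])
        # M-step: update mixture weights and the copy-number-2 rate.
        resp = [sum(p[i] for p in post) for i in range(k)]
        alpha = [max((r + g) / (n + sum(prior)), 1e-10)
                 for r, g in zip(resp, prior)]
        expected_scale = sum(r * s for r, s in zip(resp, scale))
        lam = max(sum(counts) / expected_scale, 1e-3)
    return lam, alpha, post

def call_copy_numbers(post, copy_numbers=(0, 1, 2, 3, 4)):
    """Pick the most probable copy number for each sample."""
    return [copy_numbers[max(range(len(p)), key=p.__getitem__)] for p in post]

# Made-up read counts of eight samples in one segment.
counts = [100, 98, 103, 52, 101, 149, 97, 102]
lam, alpha, post = fit_poisson_mixture(counts)
calls = call_copy_numbers(post)
print(round(lam, 1), calls)
```

In this toy input, the sample with about half the typical read count is assigned copy number 1 and the sample with about 1.5 times the typical count is assigned copy number 3, while the others stay at copy number 2. Because the comparison is made across samples at the same position, a position with generally high or low coverage does not by itself produce a call.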
Please cite:
Günter Klambauer, Karin Schwarzbauer, Andreas Mayr, Djork-Arné Clevert, Andreas Mitterecker, Ulrich Bodenhofer, Sepp Hochreiter.
"cn.MOPS: mixture of Poissons for discovering copy number variations in next generation sequencing data with a low false discovery rate." Nucleic Acids Research 2012, 40(9):e69; doi:10.1093/nar/gks003.
Abstract
 
Application areas: Although the original publication of cn.MOPS included only human whole genome sequencing data, cn.MOPS has been shown to work well for non-human genomes, for haploid genomes, for exome sequencing data, and for single cell sequencing. See links to publications below:
- cn.MOPS for copy number detection in Whole Genome Sequencing data Link1
- cn.MOPS for copy number detection in non-human species Link1
- cn.MOPS for copy number detection in haploid and bacterial genomes Link1 Link2 Link3
- cn.MOPS for copy number detection in Exome Sequencing data Link1 Link2 Link3
- cn.MOPS for copy number detection in Single Cell Sequencing data Link1
 
Paper:
Supplementary Notes:
Citation:
Official Link & DOI:
- http://nar.oxfordjournals.org/content/40/9/e69
- DOI: 10.1093/nar/gks003
 
Download the R-package:
- Available at Bioconductor: cn.mops R package
 
Additional normalization functions:
Copy number analysis of the German outbreak strain E. coli EHEC O104:H4
Datasets:
The benchmarking data sets used in our publication can be downloaded below. 
- Simulated data with different coverages:
ReadCountsSimulated.RData PositionsSimulated.RData ReadCountsSimulatedHighCoverage.RData (R 2.14 checked)
- Benchmarking data set with implanted CNVs from the X chromosome:
ReadCountsBenchmark.RData PositionsBenchmark.RData (R 2.14 checked)
 

