Exome or whole-genome deep sequencing of tumor DNA along with paired normal DNA can
potentially provide a detailed picture of the somatic mutations that characterize
the tumor. However, analysis of such sequence data can be complicated by the presence
of normal cells in the tumor specimen, by intratumor heterogeneity, and by the sheer
size of the raw data. In particular, determination of copy number variations from
exome sequencing data alone has proven difficult; thus, single nucleotide polymorphism
(SNP) arrays have often been used for this task. Recently, algorithms to estimate
absolute, but not allele-specific, copy number profiles from tumor sequencing data
have been described.

We developed Sequenza, a software package that uses paired tumor-normal
DNA sequencing data to estimate tumor cellularity and ploidy, and to calculate allele-specific
copy number profiles and mutation profiles. We applied Sequenza, as well as two previously
published algorithms, to exome sequence data from 30 tumors from The Cancer Genome
Atlas. We assessed the performance of these algorithms by comparing their results
with those generated using matched SNP arrays and processed by the allele-specific
copy number analysis of tumors (ASCAT) algorithm.

Comparison between Sequenza/exome
and SNP/ASCAT revealed strong correlation in cellularity (Pearson's r = 0.90) and
ploidy estimates (r = 0.42, or r = 0.94 after manually inspecting alternative solutions).
This performance was noticeably superior to that of the previously published algorithms. In addition,
in artificial data simulating normal-tumor admixtures, Sequenza detected the correct
ploidy in samples with tumor content as low as 30%.

The agreement between Sequenza
and SNP array-based copy number profiles suggests that exome sequencing alone is sufficient
not only for identifying small scale mutations but also for estimating cellularity
and inferring DNA copy number aberrations.
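
To illustrate why cellularity and ploidy can be inferred from sequencing data at all, the sketch below computes the depth ratio and B-allele frequency that a simple tumor-normal mixture model would predict for a genomic segment. This is a generic textbook-style admixture model written for illustration, assuming a diploid normal genome and heterozygous germline sites; it is not taken from the Sequenza source code, and the function and parameter names are hypothetical.

```python
NORMAL_CN = 2  # assumption: contaminating normal cells are diploid


def expected_depth_ratio(cellularity, ploidy, tumor_cn):
    """Expected tumor/normal depth ratio for a segment.

    cellularity: fraction of tumor cells in the specimen (0..1)
    ploidy:      average copy number of the tumor genome
    tumor_cn:    total copy number of this segment in tumor cells
    """
    # DNA contributed at this segment by tumor and normal cells
    segment_dna = cellularity * tumor_cn + (1.0 - cellularity) * NORMAL_CN
    # Genome-wide average DNA content, used to normalize the depth
    average_dna = cellularity * ploidy + (1.0 - cellularity) * NORMAL_CN
    return segment_dna / average_dna


def expected_baf(cellularity, tumor_cn, minor_cn):
    """Expected B-allele frequency at a germline heterozygous site.

    minor_cn: copies of the B (minor) allele in tumor cells
    """
    # Normal cells carry exactly one copy of the B allele at a het site
    b_copies = cellularity * minor_cn + (1.0 - cellularity) * 1.0
    total_copies = cellularity * tumor_cn + (1.0 - cellularity) * NORMAL_CN
    return b_copies / total_copies


if __name__ == "__main__":
    # A single-copy gain (2 major + 1 minor) at 30% tumor content
    print(expected_depth_ratio(cellularity=0.3, ploidy=2.0, tumor_cn=3))
    print(expected_baf(cellularity=0.3, tumor_cn=3, minor_cn=1))
```

Under this model, lower tumor content pulls both quantities toward their normal values (a depth ratio of 1 and a B-allele frequency of 0.5), which is one way to see why low-cellularity samples are harder to resolve. Fitting would amount to searching for the cellularity and ploidy values whose predictions best match the observed segments.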