The prevalence of malignant cells in clinical specimens, or tumour purity, is affected
by both intrinsic biological factors and extrinsic sampling bias. Molecular characterization
of large clinical cohorts is typically performed on bulk samples, so data analysis and
interpretation can be biased by variability in tumour purity. Transcription-based strategies
to estimate tumour purity have been proposed, but no breast cancer-specific method
is yet available. We interrogated over 6000 expression profiles from 10 breast cancer
datasets to develop and validate a 9-gene Breast Cancer Purity Score (BCPS). BCPS
outperformed existing methods for estimating tumour content. Adjusting transcriptomic
profiles using the BCPS reduced sampling bias and aided data interpretation. BCPS-estimated
tumour purity improved prognostication in luminal breast cancer, correlated with pathologic
complete response in on-treatment biopsies from triple-negative breast cancer patients
undergoing neoadjuvant treatment, and effectively stratified the risk of relapse in
patients with HER2+ residual disease after neoadjuvant treatment.