Validating prognostic or predictive candidate genes in appropriately powered breast
cancer cohorts is of utmost interest. Our aim was to develop an online tool to draw
survival plots, which can be used to assess the relevance of the expression levels
of various genes on the clinical outcome both in untreated and treated breast cancer
patients. A background database was established using gene expression data and survival
information of 1,809 patients downloaded from GEO (Affymetrix HGU133A and HGU133+2
microarrays). The median relapse free survival is 6.43 years, 968/1,231 patients are
estrogen-receptor (ER) positive, and 190/1,369 are lymph-node positive. After quality
control and normalization, only probes present on both Affymetrix platforms were retained
(n = 22,277). In order to analyze the prognostic value of a particular gene, the cohorts
are divided into two groups according to the median (or upper/lower quartile) expression
of the gene. The two groups can be compared in terms of relapse free survival, overall
survival, and distant metastasis free survival. A survival curve is displayed, and
the hazard ratio with 95% confidence intervals and logrank P value are calculated
and displayed. Additionally, three subgroups of patients can be assessed: systematically
untreated patients, endocrine-treated ER positive patients, and patients with a distribution
of clinical characteristics representative of those seen in general clinical practice
in the US. Web address: www.kmplot.com. We used this integrative data analysis tool
to confirm the prognostic power of the proliferation-related genes TOP2A, TOP2B,
MKI67, CCND2, CCND3, and CCNE2, as well as CDKN1A and TK2. We also validated the capability
of microarrays to determine estrogen receptor status in 1,231 patients. The tool is
highly valuable for the preliminary assessment of biomarkers, especially for research
groups with limited bioinformatic resources.
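
The median split and logrank comparison described above can be illustrated with a minimal sketch. This is not the code behind the online tool: the function names, the toy expression values, and the follow-up times are hypothetical, and only the Python standard library is used. The hazard ratio is omitted because it would require fitting a Cox proportional hazards model.

```python
import math
from statistics import median


def split_by_median(expression):
    """Flag each patient as high (True) or low (False) relative to the cohort median."""
    cutoff = median(expression)
    return [value > cutoff for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        failures = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - failures / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times, events, high):
    """Two-group logrank test; returns (chi-square statistic, P value)."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                       # at risk, both groups
        n1 = sum(1 for x, g in zip(times, high) if x >= t and g)  # at risk, high group
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, high) if x == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy cohort: expression of one probe, follow-up in years,
# and event indicator (1 = relapse observed, 0 = censored).
expression = [2.1, 8.4, 3.3, 9.0, 1.7, 7.5, 2.9, 8.8]
times = [9.0, 2.0, 8.0, 1.5, 10.0, 3.0, 7.0, 2.5]
events = [0, 1, 0, 1, 0, 1, 1, 1]

high = split_by_median(expression)
chi2, p_value = logrank_test(times, events, high)
print("high-expression patients:", sum(high))
print("logrank chi-square:", round(chi2, 3), "P value:", round(p_value, 4))
print("Kaplan-Meier curve (whole cohort):", kaplan_meier(times, events))
```

Splitting at an upper or lower quartile instead of the median would only change the cutoff computed in `split_by_median`; the logrank comparison itself is unchanged.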