Abstract
A Machine Learning Model to Accurately Predict Taxane Response in Neoadjuvant Breast Cancer
Background:
Taxane-based (docetaxel or paclitaxel) chemotherapy remains a cornerstone of neoadjuvant treatment for early-stage breast cancer. Given the variability in response rates, there is a critical need for predictive biomarkers that can personalize and improve the outcomes of taxane-based treatments.
Methods:
We developed a predictor of taxane response using gene expression profiles from fresh-frozen breast cancer biopsies. We generated gene signatures by (1) correlating gene expression data from the Cancer Cell Line Encyclopedia (CCLE) with docetaxel half-maximal inhibitory concentration (IC50) values from the Genomics of Drug Sensitivity in Cancer (GDSC) database and (2) applying a variance filter based on >6000 patient biopsies. L2-regularized logistic regression was used to train and evaluate predictive models from the gene signatures on a publicly available dataset from the Gene Expression Omnibus (GEO) (GEO ID: GSE140494), consisting of patients treated neoadjuvantly with a taxane followed by 5-fluorouracil, epirubicin and cyclophosphamide (T-FEC), and limited to those with a binary outcome of pathological complete response (pCR) or no change (n=42). To ensure cross-cohort compatibility, the analysis was restricted to genes common to the Affymetrix Human Genome U133A and U133 Plus 2.0 arrays. Model performance was assessed using the median area under the curve (AUC) across 20 iterations.
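The two-step signature generation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the use of Pearson correlation, the thresholds `min_abs_r` and `min_variance`, the function names and the toy gene names and values are all assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_signature(cell_line_expr, log_ic50, biopsy_expr,
                     min_abs_r=0.5, min_variance=0.1):
    """Keep genes that (1) correlate with drug sensitivity in cell lines and
    (2) vary enough across patient biopsies. Both thresholds are hypothetical."""
    selected = []
    for gene, expr in cell_line_expr.items():
        if gene not in biopsy_expr:
            continue  # restrict to genes measured in both data sources
        r = pearson(expr, log_ic50)
        if abs(r) >= min_abs_r and pvariance(biopsy_expr[gene]) >= min_variance:
            selected.append(gene)
    return sorted(selected)


# Toy data: four cell lines and four biopsies (made-up values).
cell_line_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],    # rises with IC50
    "GENE_B": [2.0, 2.1, 1.9, 2.0],    # weakly related to IC50
    "GENE_C": [4.0, 3.0, 2.0, 1.0],    # falls with IC50
}
log_ic50 = [0.1, 0.4, 0.9, 1.3]
biopsy_expr = {
    "GENE_A": [5.0, 7.0, 6.0, 9.0],
    "GENE_B": [3.0, 8.0, 1.0, 6.0],
    "GENE_C": [4.0, 4.01, 3.99, 4.0],  # nearly constant in biopsies
}

signature = select_signature(cell_line_expr, log_ic50, biopsy_expr)
print(signature)
```

In this toy example GENE_B fails the correlation step and GENE_C fails the variance filter, so only GENE_A is retained.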
Results:
The top-performing model, comprising 113 genes, achieved a mean test AUC of 0.73 (SD=0.13). This model was validated in five independent datasets (GEO IDs: GSE20194, GSE20271, GSE23988, GSE32646, GSE230881) comprising a total of 819 patients who received neoadjuvant T-FEC treatment, including 161 pCR cases. The model significantly distinguished pCR from non-pCR patients in all cohorts (weighted average AUC = 0.769; SD = 0.039 across datasets; all p-values < 0.01). The signature is enriched for genes involved in the mitotic cell cycle, spindle assembly and chromosome segregation.
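As a rough illustration of how a weighted average AUC across cohorts can be computed, the sketch below calculates a rank-based AUC per cohort and averages the results. The abstract does not state the weights; weighting by cohort size is an assumption here, and the cohort names, scores and labels are made-up toy values.

```python
def auc(scores, labels):
    """Rank-based AUC: the probability that a pCR case (label 1) scores higher
    than a non-pCR case (label 0), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_mean_auc(cohorts):
    """Average per-cohort AUCs, weighting each cohort by its number of patients
    (an assumed weighting scheme)."""
    total = sum(len(labels) for _, labels in cohorts.values())
    return sum(auc(scores, labels) * len(labels)
               for scores, labels in cohorts.values()) / total


# Toy cohorts: (predicted scores, observed pCR labels).
cohorts = {
    "cohort_1": ([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]),  # AUC = 0.75
    "cohort_2": ([0.7, 0.1], [1, 0]),                  # AUC = 1.0
}

result = weighted_mean_auc(cohorts)
print(round(result, 3))
```

Weighting by sample size lets larger cohorts contribute more to the summary. An unweighted mean or inverse-variance weights would be equally plausible choices.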
Conclusions:
We present a machine learning-based predictor of taxane response trained on cell line and patient-derived data. The model demonstrates robust performance across five independent neoadjuvant breast cancer cohorts and holds promise for guiding taxane use in early- and late-stage disease.
All authors and affiliations
Tobias Berg (presenting author): Berg, T. (1, 2)
Jacob Hansen Niklassen: Niklassen, J.H. (3)
Bent Ejlertsen: Ejlertsen, B. (1, 2)
Beatrice Hahn: Hahn, B. (3)
Jan Nart: Nart, J. (3)
Peter Buhl Jensen: Jensen, P.B. (3)
Ulla Hald Buhl: Buhl, U.H. (3)
Ida Kappel Buhl: Buhl, I.K. (3)
Berg, T. (1, 2), Niklassen, J.H. (3), Ejlertsen, B. (1, 2), Hahn, B. (3), Nart, J. (3), Jensen, P.B. (3), Buhl, U.H. (3), Buhl, I.K. (3)
1: Department of Oncology, Rigshospitalet, Denmark
2: Danish Breast Cancer Group, Rigshospitalet, Denmark
3: Aida Oncology ApS, Denmark
Contact:
Jacob Hansen Niklassen: jacob@aidaoncology.com +45 28923818