Summary
This project uses machine learning to improve the accuracy of chronic health condition classification in the Childhood Cancer Survivor Study (CCSS) by combining self-reported data with clinical assessments, genetic sequencing, and treatment data from the St. Jude Lifetime Cohort (SJLIFE).
What they want
The project aims to refine chronic health condition classification for conditions such as diabetes, hypertension, and cardiomyopathy. Machine learning methods will identify patterns of misclassification using predictors such as treatment exposures, genetic risk scores, demographic factors, and complex dependencies among survey responses. Training data will include 2,000 survivors from both CCSS and SJLIFE, plus 25,735 CCSS participants. Predictive models will be developed with cross-validation, regularization, and ensemble methods to limit overfitting, and examined with interpretability tools (SHAP, LIME). The models will then be independently validated in the 436 remaining survivors from both cohorts.
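To make "misclassification" concrete, the sketch below compares a self-reported condition flag against a clinically assessed flag for the same survivors and reports sensitivity and specificity. It assumes the clinical assessment is treated as the reference label and that both flags are coded 0/1; the function name `agreement_metrics` and the toy labels are hypothetical and are not taken from CCSS or SJLIFE.

```python
from collections import Counter


def agreement_metrics(self_report, clinical):
    """Compare self-reported 0/1 labels against clinical 0/1 labels.

    Assumption: the clinical assessment is treated as the reference.
    """
    if len(self_report) != len(clinical):
        raise ValueError("inputs must have the same length")

    # Count each (self_report, clinical) pair once.
    counts = Counter(zip(self_report, clinical))
    tp = counts[(1, 1)]  # reported and clinically confirmed
    fp = counts[(1, 0)]  # reported but not confirmed
    fn = counts[(0, 1)]  # not reported but clinically present
    tn = counts[(0, 0)]  # neither reported nor present

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Made-up toy labels for eight hypothetical survivors.
self_report = [1, 0, 0, 1, 0, 1, 0, 0]
clinical = [1, 1, 0, 1, 0, 0, 0, 1]
metrics = agreement_metrics(self_report, clinical)
print(metrics)
```

The false-negative and false-positive cells are the cases a misclassification model would try to predict from treatment, genetic, and demographic predictors.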
Deliverables
- Refined chronic health condition classification for 25,735 CCSS survivors
- Robust predictive models of CCSS participants’ chronic health condition classifications
- Methodological insights to inform future CCSS analyses
Technical requirements
- Machine learning methods
- Cross-validation techniques
- Regularization
- Ensemble methods
- Interpretability tools (SHAP, LIME)
- Germline whole-genome sequencing data
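As an illustration of the cross-validation requirement, here is a minimal stratified k-fold split in plain Python. It is a sketch under stated assumptions, not the project's actual procedure: the helper name `stratified_k_fold`, the choice of five folds, and the toy label vector are all hypothetical. Stratifying by label keeps the proportion of a relatively rare condition similar in every fold.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs with class balance kept per fold."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves as the held-out set exactly once.
    for held_out in range(k):
        test_idx = sorted(folds[held_out])
        train_idx = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train_idx, test_idx


# Toy example: 10 positive and 40 negative labels (made-up data).
labels = [1] * 10 + [0] * 40
splits = list(stratified_k_fold(labels, k=5, seed=0))
for train_idx, test_idx in splits:
    positives = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), positives)
```

In practice a library routine would likely replace this helper; the point is only that each held-out fold mirrors the overall class balance, which matters when evaluating models for less common conditions.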