Reinforcement Learning To Optimize The Treatment Of Multiple Myeloma

BLOOD (2019)

Abstract
Over the last decade we have witnessed an explosion in the number of therapeutic options available to patients with multiple myeloma (MM). In spite of the marked improvements in patient outcomes paralleling these approvals, MM remains an incurable malignancy for the vast majority of patients, who follow a course of therapeutic successes and failures. As such, there remains a dire need to develop new tools to improve the management of MM patients. A number of groups are leading efforts to combine big data and artificial intelligence to better inform patient care via precision medicine. At Moffitt, in collaboration with M2Gen/ORIEN (Oncology Research Information Exchange Network), we have begun to accumulate big data in MM. Patients opt in (consent) to the collection of rich clinical data (demographics, staging, risk, complete disease-course treatment data) and, in the setting of bone marrow biopsy, to the allocation of CD138-selected cells for molecular analysis (whole exome sequencing (WES) and RNA sequencing, as well as peripheral blood mononuclear cells for WES). To date, we have collected over 1000 samples from over 800 individual patients with plasma cell disorders.

In oncology, the ultimate goal of such a model is the selection of ideal treatments. We expect that AI analysis may enable validation of patient response to treatments as well as cohort selection, since real patient cohorts can be matched to those predicted by the model. One approach is to utilize reinforcement learning (RL). In RL, the algorithm attempts to learn the optimal action for a defined state, weighing any tradeoffs to maximize reward. Our initial application of RL involved a relatively small cohort of 402 patients with treatment medication data. This encompassed 1692 lines of treatment, with a mean of 4.21 lines of therapy per patient (median of 4), spanning 132 combinations of 22 myeloma therapeutics. The heterogeneity in treatment is highlighted by the fact that no treatment pathways overlap after line 4 (unsurprising, since 132 observed regimens admit on the order of 132^4 ≈ 3 × 10^8 possible four-line sequences). Each Q-value in the Q-table is the immediate reward for taking an action in a state plus the discounted anticipated future reward of that action; iterating converges on the true future-reward values, and this can be done model-free. The end result is a policy, P(s), that identifies the ideal action at each state. There is a near-infinite number of possible states once treatment history, age, gene expression profiling (GEP), cytogenetics, comorbidities, staging, and other factors are considered. We presume that the action is most intuitively defined as the medication (treatment) only and that the reward should be some form of treatment response. We have begun the iterative process of trying different state and reward functions.
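As an illustration of the tabular Q-learning update described above, a minimal sketch follows. The state encoding, regimen names, rewards, and logged transitions are all invented for illustration; they are assumptions, not the actual cohort data or the study's feature set.

```python
# Minimal tabular Q-learning on logged (offline) treatment transitions.
# Hypothetical example: states, regimens, and rewards are invented.
from collections import defaultdict
import random

GAMMA = 0.9   # discount factor on anticipated future reward
ALPHA = 0.1   # learning rate
STEPS = 5000

# Each logged transition: (state, action, reward, next_state, terminal).
# A "state" here is a toy summary (line of therapy, previous regimen);
# the reward is a binary treatment response.
transitions = [
    (("line1", None),     "VRd",    1, ("line2", "VRd"),    False),
    (("line2", "VRd"),    "DaraRd", 1, ("line3", "DaraRd"), False),
    (("line3", "DaraRd"), "KPd",    0, ("line4", "KPd"),    True),
    (("line1", None),     "KRd",    0, ("line2", "KRd"),    False),
    (("line2", "KRd"),    "DaraRd", 1, ("line3", "DaraRd"), True),
]

actions = sorted({a for _, a, _, _, _ in transitions})
Q = defaultdict(float)  # Q[(state, action)] -> learned value

for _ in range(STEPS):
    s, a, r, s2, done = random.choice(transitions)
    # Q-value = immediate reward + discounted best anticipated future reward.
    future = 0.0 if done else max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * future - Q[(s, a)])

def policy(state):
    """Greedy policy P(s): the action with the highest learned Q-value."""
    return max(actions, key=lambda a: Q[(state, a)])

print(policy(("line2", "VRd")))  # suggested next regimen for this toy state
```

Because the update uses only logged transitions, no model of state dynamics is required, which is what makes the approach model-free; the tradeoff, as noted below, is that model-free learning is data-hungry relative to a medium-sized cohort.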
In these initial experiments, median imputation showed a 5% improvement in response accuracy over listwise deletion, but median imputation distorts practical accuracy in the binary-reward case. While we found that the exercise has great potential, there are clear avenues for improvement (e.g., multiple imputation), and we will need to expand the covariate analysis. Combinatorics must be considered when applying machine learning to medium-sized data sets, and model-free machine learning is limited on medium-sized data. As such, combined resources and/or utilization of large networks such as ORIEN will be critical for the successful integration of RL and other AI tools in MM. We also learned that adding variables to the model does not necessarily increase accuracy.

Future work will involve continued application of alternate state/reward functions; loosening the iQ-learning framework to allow better covariate selection for the state/reward functions; and improving imputation techniques to include more covariates with greater certainty in model accuracy (a sketch of the imputation comparison follows below). We may also refine the accuracy metric to allow prediction of bucketed response and of temporal disease burden (M-spike vs. time). Updated data on a larger cohort will be presented at the annual meeting.
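To make the imputation comparison concrete, here is a minimal sketch contrasting listwise deletion with median imputation on missing covariates. The column names and values are hypothetical, chosen only to show the mechanics and the sample-size/centering tradeoff.

```python
# Listwise deletion vs. median imputation for missing covariates.
# Hypothetical data: columns and values are invented for illustration.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":      [61, 70, np.nan, 55, 68, np.nan],
    "b2m":      [3.1, np.nan, 5.4, 2.2, np.nan, 4.8],  # beta-2 microglobulin
    "response": [1, 0, 1, 1, 0, 0],                    # binary reward label
})

# Listwise deletion: drop any patient with a missing covariate,
# shrinking an already medium-sized cohort.
listwise = df.dropna()

# Median imputation: keep every patient by filling gaps with column medians.
# This preserves sample size but pulls imputed patients toward the center,
# which can distort downstream accuracy for a binary reward.
imputed = df.fillna(df[["age", "b2m"]].median())

print(len(listwise), "patients after listwise deletion")   # 2
print(len(imputed),  "patients after median imputation")   # 6
```

Multiple imputation, mentioned above as a possible improvement, would instead draw several plausible values per missing entry and pool the resulting models rather than committing to a single median.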