Comparability Among Modes of Data Collection for Patient-Reported Outcome Measures: Opening the Gates for Faithful Migration.

Value in Health: The Journal of the International Society for Pharmacoeconomics and Outcomes Research (2023)

Abstract
ISPOR has recently updated its recommendations on the evidence needed to support measurement comparability among modes of data collection for patient-reported outcome measures (PROMs) [1]. This new ISPOR Good Practices report updates the recommendations from the ISPOR 2009 Electronic Patient-Reported Outcomes (ePRO) [2] and 2014 PRO Mixed Modes Good Research Practices Task Force [3] reports. The final report, the product of substantial effort by the task force members, incorporates feedback from 2 rounds of written review by ISPOR's Clinical Outcomes Assessment (COA) Special Interest Group. Finalizing these recommendations is an important accomplishment given the enormous amount of new research in this field, the challenges of collaborating across 2 separate task forces, and the coordination needed to achieve consensus among the diverse stakeholders represented, including the US Food and Drug Administration, academia, research organizations, electronic COA service providers, and the pharmaceutical industry.

The COA field has evolved since the 2009 and 2014 reports were published, especially as researchers navigate collecting COA data during a pandemic. Those earlier reports laid the foundation for a "faithful migration" of instruments from 1 mode to another, typically from paper-based administration to electronic data collection. The stated goal of a faithful migration was to ensure that subjects interpret and respond to the questions/items on the PRO instrument the same way, regardless of the mode of data collection. The earlier reports also discouraged mixing modes within a single study where possible, especially in the absence of documented, sufficient evidence to support pooling scores from the different modes. Multiple research designs for gathering the necessary evidence were described, both qualitative and quantitative. Finally, readers were warned that pooling scores across modes risks increased measurement error, which may lead to a loss of power to detect treatment benefit, and they were cautioned against disregarding the recommendations on mixing modes.

Between the 2014 and 2023 reports, the COA field experienced many changes. The 2023 report provides an excellent summary of the immense quantity of published research in this area, with detailed references that can facilitate deeper investigation.

Building upon these resources and the insights of the task force members, the 2023 report authors describe 2 important goals for the update:

1. To empower readers to make a meaningful assessment of their specific situation, which will vary by PROM, technology, and patient population; and
2. To make the good practices generalizable enough to apply to future scenarios and technologies.

The report goes a long way toward meeting these goals, but ISPOR could produce a valuable and practical companion to the report by providing case studies of the complete process. These case studies could serve as a detailed roadmap for considering measurement comparability and evaluating supportive evidence.

The report states that additional testing is unnecessary if (1) a faithful migration has been completed following ePRO design best practices and (2) sufficient evidence exists to support the changes implemented. In these cases, providing evidence that best practices were followed, together with a summary of the existing literature, should be adequate. One way to demonstrate that best practices have been followed, as recommended in the report, is to document an expert screen review by a qualified individual. It seems plausible that the expert screen reviewer would conduct an informal forward and backward migration assessment, similar to the evaluation process used to support language/cultural translations. The documentation of the expert screen review may take a form similar to the translation certificates typically provided in support of a PRO with multiple translations included in a specific study.

Although this report consolidates the research to date and outlines good practice, a number of issues will see further debate. One is the continued use of the term equivalence: "Over the last decade, a substantial amount of evidence emerged that repeatedly demonstrated measurement comparability (also referred to as 'equivalence') between paper and electronic modes, and among different electronic modes for a multitude of PROMs, meaning that scores recorded for the same item using different modes of data collection fall within an accepted range of each other." [1] However, comparability is not equivalence. As noted later in the report, multiple studies have demonstrated that capturing PROM data electronically can result in more complete data, improved compliance with the timing of completion, and avoidance of data entry errors and skip pattern problems. With these advantages, electronic data capture is likely to result in less measurement error and "better" measurement rather than "equivalent" measurement. Given the conventional use of ePRO instruments, the evidence cited in the new report is adequate for comparability purposes, but we suggest that the quantitative evidence proposed does not support the strict definition of equivalence.

The report also suggests that "for evidence to be considered sufficient, it should be relevant to the question (ie, involve the measure of interest or a similar measure), be unbiased and reflect balanced research, and support the assumption of measurement comparability and, furthermore, the preponderance of available evidence should point to the same conclusion." We agree, but we recommend that greater emphasis be placed on the "use" of the measure's scores when considering comparability. Measures are neither valid nor invalid; validity concerns the use of the measure, so researchers should carefully consider whether the evidence is sufficient to support the proposed use. The relevance of the supporting evidence may be judged inadequate given the details of the intended study design and the construction of the study endpoints from the PRO scores. For example, typical comparability studies are conducted over a relatively short timeframe and, by necessity, focus on patients who are stable in their disease state. Clinical trials often span several years, and patients may experience significant progression or deterioration in disease severity. In such circumstances, evidence that these long-term changes in patient status do not affect the comparability of scores across time (ie, severity) would be needed.

Another topic to consider, specifically related to the transition to ePRO and "bring your own device," is security and privacy. For example, in an environment with increased use of bots, administrations that are not secure put data integrity at risk, and the resulting ePRO scores could include greater measurement error rather than less. Discussions of these topics are beyond the remit of the report, but they remain important to the process of faithful migration and to achieving the advantages of electronic administration. Coons et al [4] provide the key points to consider.

The critical question is the use of scores. In most clinical trial programs, the focus is on group-level decisions rather than on individual decisions (for which score equivalence, rather than comparability, would be the hurdle). We agree that a full comparability study is often unnecessary, but we think it is still useful to at least collect relevant data within the clinical trial of interest that can support an assessment of the mode effect. Such practices are similar to including other aspects of the study design (eg, site, baseline value) within the core efficacy evaluations to assess the robustness of conclusions. If there is concern about modes not being adequately mixed across treatment groups, and if the sample size allows, post hoc evaluations, such as a review of the empirical cumulative distribution functions by treatment and mode, may be considered. The typical quantitative evaluation described in the report (and adequate for the typical faithful migration) is based on smaller sample sizes and on methods such as intraclass correlation coefficients, Bland-Altman plots, and comparison of psychometric properties (eg, construct validity correlations, internal consistency reliability) to assess the similarities in the patterns of results. These studies are likely powered to detect only large score differences.

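To make this typical quantitative evaluation concrete, the following is a minimal sketch of an intraclass correlation and Bland-Altman analysis for paired paper and electronic administrations. It is illustrative only: the file name, the long-format layout with subject, mode, and score columns, and the use of the pingouin and matplotlib packages are our assumptions, not requirements of the report.

```python
# Minimal, illustrative comparability check for paired paper vs electronic scores.
# Assumes a long-format table with one row per subject per mode:
#   subject | mode ("paper" or "electronic") | score
import pandas as pd
import pingouin as pg                 # provides intraclass_corr
import matplotlib.pyplot as plt

df = pd.read_csv("crossover_comparability_study.csv")   # hypothetical file

# Intraclass correlation (two-way model; absolute agreement is usually of interest).
icc = pg.intraclass_corr(data=df, targets="subject", raters="mode", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])

# Bland-Altman plot: difference between modes vs the per-subject mean.
wide = df.pivot(index="subject", columns="mode", values="score")
diff = wide["electronic"] - wide["paper"]
mean = wide.mean(axis=1)
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)         # 95% limits of agreement

plt.scatter(mean, diff, alpha=0.6)
plt.axhline(bias, color="k", label=f"bias = {bias:.2f}")
plt.axhline(bias + loa, color="gray", linestyle="--", label="95% limits of agreement")
plt.axhline(bias - loa, color="gray", linestyle="--")
plt.xlabel("Mean of paper and electronic scores")
plt.ylabel("Electronic minus paper")
plt.legend()
plt.show()
```

In practice, the "accepted range" referred to in the report would be prespecified (eg, relative to the measure's minimal important difference), and the observed bias and limits of agreement would be judged against it.
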
If the evaluation is intended to support individual-level decisions, Lee et al [5] and Terluin et al [6] describe methods for assessing score equivalence that incorporate comparisons using differential item functioning, differential test functioning, and a priori equivalence margins. The sample sizes for those studies were quite large and supported these more sophisticated quantitative analyses. (A simplified, illustrative item-level screen is sketched below.)

For designs in which a large proportion of participants may change modes during the study, we agree with the report that researchers should consider stricter assumptions regarding device selection. For example, suppose the device chosen at a specific time point depends, to some degree, on the patient's disease severity. Including the mode of data collection as a covariate in the efficacy analysis would then mask the estimated (adjusted) differences on the PRO. If the scores from different modes are not comparable, they would have to be adjusted before the analysis using, for example, item-response-theory-based estimates that account for mode effects, or scores equated across modes. When the impact of mode switching is a concern, it may also be worthwhile to collect data on why a particular device was chosen; these data could support an argument that the change of device was not problematic.

The task force closed its report with the hope that researchers can now focus on the power of technology to provide insights into the patient experience, given that the comparability question has been "answered." As a next step, we look forward to future reports in which the comparability or equivalence of scores is no longer the goal, because the technologies themselves have enabled us to capture what matters most to patients about their experiences. We may be able to go beyond our present-day assessments by using measures tailored to what matters most to each individual participant and through the increased use of computerized adaptive measures.

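As noted above, item-level checks can complement score-level comparisons when individual-level decisions are at stake. The following is a simplified, illustrative sketch of a regression-based screen for item-level mode effects. It is a stand-in for, not a reproduction of, the formal DIF/DTF analyses described by Lee et al [5] and Terluin et al [6]; the file name, column names, and equivalence margin are assumptions made for illustration.

```python
# Simplified item-level screen for mode effects, in the spirit of DIF analysis:
# regress each item on an anchor (the total score) and a mode indicator, and flag
# items whose adjusted mode effect is not contained within a prespecified margin.
# Assumes a long-format table: subject | mode ("paper"/"electronic") | item | response.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("mixed_mode_item_data.csv")           # hypothetical file
df["total"] = df.groupby(["subject", "mode"])["response"].transform("sum")

MARGIN = 0.5   # a priori equivalence margin, in item score units (assumption)

rows = []
for item, d in df.groupby("item"):
    # Adjusted mode effect: does mode shift this item's score after
    # conditioning on the overall level of the construct (total score)?
    fit = smf.ols("response ~ total + C(mode)", data=d).fit()
    mode_term = [name for name in fit.params.index if "mode" in name][0]
    effect = fit.params[mode_term]
    ci_low, ci_high = fit.conf_int().loc[mode_term]
    # Equivalence-style check: the whole confidence interval should sit inside the margin.
    within_margin = (ci_low > -MARGIN) and (ci_high < MARGIN)
    rows.append({"item": item, "mode_effect": effect,
                 "ci_low": ci_low, "ci_high": ci_high, "within_margin": within_margin})

print(pd.DataFrame(rows))
```

In a real evaluation, the margin would be justified clinically, the anchor would typically exclude the item being tested (a "purified" anchor), and ordinal models would usually be preferred over ordinary least squares for item-level responses.
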
Keywords
outcome measures, migration, data collection, patient-reported

References
1. O'Donohoe P, Reasner DS, Kovacs SM, et al. Updated recommendations on evidence needed to support measurement comparability among modes of data collection for patient-reported outcome measures: a Good Practices Report of an ISPOR Task Force. Value Health. In press.
2. Coons SJ, Gwaltney CJ, Hays RD, et al. Recommendations on evidence needed to support measurement equivalence between electronic and paper-based patient-reported outcome (PRO) measures: ISPOR ePRO Good Research Practices Task Force report. Value Health. 2009;12:419-429.
3. Eremenco S, Coons SJ, Paty J, et al. PRO data collection in clinical trials using mixed modes: report of the ISPOR PRO mixed modes good research practices task force. Value Health. 2014;17:501-516.
4. Coons SJ, Eremenco S, Lundy JJ, O'Donohoe P, O'Gorman H, Malizia W. Capturing patient-reported outcome (PRO) data electronically: the past, present, and promise of ePRO measurement in clinical trials. Patient. 2015;8:301-309.
5. Lee MK, Beebe TJ, Yost KJ, et al. Score equivalence of paper-, tablet-, and interactive voice response system-based versions of PROMIS, PRO-CTCAE, and numerical rating scales among cancer patients. J Patient Rep Outcomes. 2021;5:95.
6. Terluin B, Brouwers EPM, Marchand MAG, de Vet HCW. Assessing the equivalence of web-based and paper-and-pencil questionnaires using differential item and test functioning (DIF and DTF) analysis: a case of the Four-Dimensional Symptom Questionnaire (4DSQ). Qual Life Res. 2018;27:1191-1200.