O-15 Occupational Health: A Multi-Cohort Job Title Cleaning Project by Algorithm

Occupational and Environmental Medicine (2021)

Abstract
Introduction Occupational data in prospective cohort studies are often underutilized because of the human and financial resources required to code open-ended text, such as job titles. Recognizing the value of occupational data in health research, as well as the potential for error in manual coding, an Automated Coding Algorithm (ACA)-NOC was developed using a Natural Language Processing approach.

Objectives We tested the ACA-NOC algorithm on two regional cohorts of a pan-Canadian cohort study, which represents the largest dataset to which an algorithm of this kind has been applied. This process will harmonize and greatly expand the utility of the occupational data, enrich the research platforms, and further refine the efficiency of the algorithm.

Methods The ACA-NOC algorithm was tested on data from the Canadian Partnership for Tomorrow’s Health (CanPath), a longitudinal cohort examining the role of genetic, environmental, lifestyle, and behavioural factors in the development of cancer and chronic disease. Using an iterative and interactive approach, the algorithm was applied to job title data from 111,000 questionnaires from two regional cohorts, coding the data to the Canadian National Occupational Classification (NOC) system. The algorithm was refined after each round of analysis, increasing the quantity of accurately coded data.

Results The refined ACA-NOC algorithm achieved a 10% overall improvement in exact matching over the baseline algorithm. There were also instances where the algorithm performed better than manual coding. Use of the algorithm offers significant savings in time, human resources and cost compared with a purely manual coding approach.

Conclusions The coding and harmonization of this multi-cohort data demonstrates the value of the ACA-NOC algorithm, while increasing the utility of the CanPath data and research related to occupational health.
Future research may involve comparisons between CanPath and international cohorts.
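
To make the idea of automated job title coding and exact-match evaluation concrete, the following is a minimal illustrative sketch in Python. It is not the ACA-NOC algorithm, whose internals the abstract does not describe. The lookup table, the placeholder codes (which are not real NOC codes), the function names, and the use of a string-similarity fallback from the standard library are all assumptions made for illustration.

```python
import difflib
import re

# Hypothetical lookup table: normalized job title -> placeholder code.
# These are NOT real NOC codes; a real system would use the official classification.
TITLE_TO_CODE = {
    "registered nurse": "CODE_A",
    "elementary school teacher": "CODE_B",
    "truck driver": "CODE_C",
}


def normalize(title):
    """Lowercase the title, replace punctuation with spaces, collapse whitespace."""
    title = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(title.split())


def code_title(title, cutoff=0.8):
    """Return a code for a free-text job title, or None if no confident match.

    Tries an exact lookup first, then falls back to the closest known title
    by string similarity (an assumed stand-in for the NLP step).
    """
    key = normalize(title)
    if key in TITLE_TO_CODE:
        return TITLE_TO_CODE[key]
    close = difflib.get_close_matches(key, list(TITLE_TO_CODE), n=1, cutoff=cutoff)
    return TITLE_TO_CODE[close[0]] if close else None


def exact_match_rate(titles, manual_codes):
    """Share of titles whose algorithm-assigned code equals the manual code."""
    hits = sum(code_title(t) == m for t, m in zip(titles, manual_codes))
    return hits / len(titles)
```

Under these assumptions, comparing `exact_match_rate` before and after adding entries or adjusting `cutoff` mirrors, in a very simplified way, the round-by-round refinement described in the Methods.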