Agreement And Reliability Of A New Polysomnography Sleep Staging Algorithm Against Multiple Human Scorers

Sleep (2021)

Abstract
Introduction
Scoring algorithms have the potential to increase polysomnography (PSG) scoring efficiency while also ensuring consistency and reproducibility. We sought to validate an updated sleep staging algorithm (Somnolyzer; Philips, Monroeville, PA, USA) against manual sleep staging by analyzing a dataset we previously used to report sleep staging variability across nine member centers of the Sleep Apnea Global Interdisciplinary Consortium (SAGIC).

Methods
Fifteen PSGs collected at a single sleep clinic were scored independently by technologists at nine SAGIC centers located in six countries, and auto-scored with the algorithm. Each 30-second epoch was staged manually according to American Academy of Sleep Medicine criteria. We calculated the intraclass correlation coefficient (ICC) and performed a Bland-Altman analysis comparing the average manually scored and auto-scored total sleep time (TST) and time in each sleep stage (N1, N2, N3, and rapid eye movement [REM]). We hypothesized that the auto-scored values would show good agreement and reliability when compared to the average across manual scorers.

Results
The participants contributing to the original dataset had a mean (SD) age of 47 (12) years, and 80% were male. Auto-scoring showed substantial (ICC = 0.60-0.80) or almost perfect (ICC = 0.80-1.00) reliability compared to the manual-scoring average, with ICCs (95% confidence intervals) of 0.976 (0.931, 0.992) for TST, 0.681 (0.291, 0.879) for time in N1, 0.685 (0.299, 0.881) for time in N2, 0.922 (0.791, 0.973) for time in N3, and 0.930 (0.811, 0.976) for time in REM. Similarly, Bland-Altman analyses showed good agreement between methods, with a mean difference (limits of agreement) of only 1.2 (-19.7, 22.0) minutes for TST, 13.0 (-18.2, 44.1) minutes for N1, -13.8 (-65.7, 38.1) minutes for N2, -0.33 (-26.1, 25.5) minutes for N3, and -1.2 (-25.9, 23.5) minutes for REM.

Conclusion
Results support high reliability and good agreement between the auto-scoring algorithm and average human scoring for measures of sleep duration. Auto-scoring slightly overestimated N1 and underestimated N2, but results for TST, N3, and REM were nearly identical on average. Thus, the auto-scoring algorithm is acceptable for sleep staging when compared against human scorers.

Support (if any)
Philips.
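For readers unfamiliar with the two statistics reported above, the following is a minimal sketch of how an ICC(2,1) (two-way random effects, absolute agreement, single measurement) and Bland-Altman limits of agreement can be computed. This is an illustration only, not the study's analysis code: the toy data, variable names, and the choice of ICC variant are assumptions, since the abstract does not specify which ICC form was used.

```python
# Illustrative sketch: ICC(2,1) and Bland-Altman limits of agreement.
# The data below are simulated stand-ins for auto-scored vs. averaged
# manually scored TST (minutes); they are NOT the study's data.
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_targets, k_raters) matrix, e.g. one row per PSG
    and one column per scoring method.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    # Mean squares from the two-way ANOVA decomposition
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # targets (rows)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # raters (columns)
    sse = np.sum((ratings - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                       # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def bland_altman(a: np.ndarray, b: np.ndarray):
    """Mean difference (bias) and 95% limits of agreement between methods."""
    diff = a - b
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, bias - half_width, bias + half_width

# Hypothetical example with 15 PSGs, matching the study's sample size
rng = np.random.default_rng(0)
manual = rng.normal(420, 40, size=15)           # averaged manual TST
auto = manual + rng.normal(1.2, 10, size=15)    # small bias plus scatter

print(f"ICC(2,1): {icc_2_1(np.column_stack([manual, auto])):.3f}")
bias, lo, hi = bland_altman(auto, manual)
print(f"Bland-Altman bias: {bias:.1f} min (LoA {lo:.1f}, {hi:.1f})")
```

The two statistics answer complementary questions: the ICC summarizes how reliably the two methods rank and reproduce each other's values, while the Bland-Altman bias and limits of agreement express, in minutes, how far an auto-scored duration can be expected to deviate from the manual average.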