CRFs based de-identification of medical records

Journal of Biomedical Informatics (2015)

Abstract
The method used in our work is completely machine-learning-based. Pattern-matching techniques are used in the preprocessing module. Three groups of features are extracted to train the de-identifier model. Missing errors are the main component of the errors in system output.

De-identification is a shared task of the 2014 i2b2/UTHealth challenge. The purpose of this task is to remove protected health information (PHI) from medical records. In this paper, we propose a novel de-identifier, WI-deId, based on conditional random fields (CRFs). A preprocessing module, which tokenizes the medical records using regular expressions and an off-the-shelf tokenizer, is introduced, and three groups of features are extracted to train the de-identifier model. The experiment shows that our system is effective in the de-identification of medical records, achieving a micro-F1 of 0.9232 at the i2b2 strict entity evaluation level.
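The preprocessing and feature-extraction steps described in the abstract can be illustrated with a minimal sketch. This is not the WI-deId implementation: the token pattern, the helper names (`tokenize`, `word_shape`, `token_features`), and the specific features are all assumptions chosen for illustration, since the abstract does not list the three feature groups or name the off-the-shelf tokenizer. The sketch uses regular expressions only and produces per-token feature dictionaries of the kind a CRF sequence labeler typically consumes.

```python
import re

# Hypothetical token pattern (an assumption, not taken from the paper).
# Alternatives are tried left to right, so dates and phone-like numbers
# are kept whole before falling back to words, digit runs, and punctuation.
TOKEN_RE = re.compile(
    r"\d{1,2}/\d{1,2}/\d{2,4}"   # dates such as 03/14/2014
    r"|\d{3}-\d{3}-\d{4}"        # phone-like numbers such as 555-123-4567
    r"|[A-Za-z]+"                # alphabetic words
    r"|\d+"                      # digit runs
    r"|[^\sA-Za-z\d]"            # any other single non-space character
)


def tokenize(text):
    """Return (token, start, end) triples so labels can be mapped back to offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def word_shape(token):
    """Collapse characters into classes: uppercase -> A, lowercase -> a, digit -> 9."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"\d", "9", shape)


def token_features(tokens, i):
    """Illustrative features for token i: lexical, orthographic, and context."""
    word = tokens[i][0]
    return {
        "lower": word.lower(),
        "shape": word_shape(word),
        "is_title": word.istitle(),
        "has_digit": any(c.isdigit() for c in word),
        "prev": tokens[i - 1][0].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1][0].lower() if i < len(tokens) - 1 else "<EOS>",
    }


if __name__ == "__main__":
    toks = tokenize("Dr. Smith saw him on 03/14/2014.")
    for i in range(len(toks)):
        print(toks[i][0], token_features(toks, i))
```

Keeping character offsets alongside each token is a design choice that makes it possible to map predicted PHI labels back to exact spans in the record, which matters when evaluation is done at a strict entity level.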
Keywords
Conditional random fields, De-identification, Medical records, Protected health information