De-Identification of Student Writing in Technologically Mediated Educational Settings

Polyphonic Construction of Smart Learning Ecosystems (2022)

Abstract
When conducting research with data from smart learning systems, there is a need to protect user identities because the release of personally identifiable information (PII) poses a significant risk to participants and creates a barrier to analyzing data and/or creating open datasets. Massive open online courses (MOOCs) are a good example of learning systems where PII concerns may hamper data analysis, the well-being of users, and system innovation. PII is particularly hard to locate and clean because of the variation in formatting, texts, and assignments found in unstructured data. In particular, identifying and removing students' names has proven difficult. This study examines the potential of large, pre-trained language models to de-identify MOOC data and compares the performance of these models against human annotations. A pre-trained language model fine-tuned using spaCy's default hyperparameters achieved 97% recall of student names (including partial matches) and 30% precision on a validation set. On a larger, unseen test set (n = 3,077), the model achieved 93% recall and 24% precision. The majority of the false positives that lowered precision on the test set were known names belonging to authors and/or lecturers. The results of the ensemble approach used here show considerable promise for a difficult de-identification task and indicate that automated de-identification is likely mature enough for use on some education datasets. Clearing PII from smart learning systems would ethically protect learners and allow the release of large datasets that could be analyzed for intelligent insights, furthering innovation within smart learning systems.
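To make the described workflow concrete, the sketch below shows one way a spaCy NER pipeline could be used to redact names from student writing and to score recall with partial matches. This is a minimal illustration assuming a stock English pipeline (en_core_web_sm) stands in for the paper's fine-tuned transformer model; the helper functions and the partial-match criterion are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of NER-based de-identification with spaCy.
# Assumptions: en_core_web_sm stands in for the fine-tuned model used in the
# paper; redact_names and partial_match_recall are illustrative helpers.
import spacy

# Any spaCy pipeline with an NER component can be swapped in here.
nlp = spacy.load("en_core_web_sm")

def redact_names(text: str, placeholder: str = "[STUDENT]") -> str:
    """Replace spans tagged PERSON with a placeholder token."""
    doc = nlp(text)
    redacted = text
    # Replace from the end of the string so earlier character offsets stay valid.
    for ent in sorted(doc.ents, key=lambda e: e.start_char, reverse=True):
        if ent.label_ == "PERSON":
            redacted = redacted[:ent.start_char] + placeholder + redacted[ent.end_char:]
    return redacted

def partial_match_recall(predicted: list[str], gold: list[str]) -> float:
    """Recall where a gold name counts as found if any predicted span shares a
    token with it (an assumed reading of the paper's 'partial match')."""
    def tokens(name: str) -> set[str]:
        return set(name.lower().split())
    hits = sum(1 for g in gold if any(tokens(g) & tokens(p) for p in predicted))
    return hits / len(gold) if gold else 1.0

if __name__ == "__main__":
    essay = "In this assignment, Maria Lopez argues that feedback loops matter."
    print(redact_names(essay))
    # e.g. -> "In this assignment, [STUDENT] argues that feedback loops matter."
```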
Keywords
De-identification, Personally identifiable information, Natural language processing