Optimising Human-Machine Collaboration for Efficient High-Precision Information Extraction from Text Documents

ACM Journal on Responsible Computing(2024)

Cited 0|Views13
No score
Abstract
From science to law enforcement, many research questions are answerable only by poring over a large amount of unstructured text documents. While people can extract information from such documents with high accuracy, this is often too time-consuming to be practical. On the other hand, automated approaches produce nearly-immediate results, but are not reliable enough for applications where near-perfect precision is essential. Motivated by two use cases from criminal justice, we consider the benefits and drawbacks of various human-only, human–machine, and machine-only approaches. Finding no tool well suited for our use cases, we develop a human-in-the-loop method for fast but accurate extraction of structured data from unstructured text. The tool is based on automated extraction followed by human validation, and is particularly useful in cases where purely manual extraction is not practical. Testing on three criminal justice datasets, we find that the combination of the computer speed and human understanding yields precision comparable to manual annotation while requiring only a fraction of time, and significantly outperforms the precision of all fully automated baselines.
More
Translated text
Key words
Human-computer collaboration,human-in-the-loop,information extraction,weak supervision
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined