Learning extraction patterns for subjective expressions
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing, pp.105-112, (2003)
This paper presents a bootstrapping process that learns linguistically rich extraction patterns for subjective (opinionated) expressions. High-precision classifiers label unannotated data to automatically create a large training set, which is then given to an extraction pattern learning algorithm. The learned patterns are then used to ide...
- Many natural language processing applications could benefit from being able to distinguish between factual and subjective information.
- Question answering systems should distinguish between factual and speculative answers.
- Document-level classification can distinguish between “subjective texts,” such as editorials and reviews, and “objective texts,” such as newspaper articles.
- Editorial articles frequently contain factual information to back up the arguments being made, and movie reviews often mention the actors and plot of a movie as well as the theatres where it’s currently playing.
- Newspaper articles are generally considered to be relatively objective documents, but in a recent study (Wiebe et al., 2001) 44% of sentences in a news collection were found to be subjective.
- We have developed a bootstrapping process for subjectivity classification that explores three ideas: (1) high-precision classifiers can be used to automatically identify subjective and objective sentences from unannotated texts, (2) this data can be used as a training set to automatically learn extraction patterns associated with subjectivity, and (3) the learned patterns can be used to grow the training set, allowing this entire process to be bootstrapped.
- The scheme was inspired by work in linguistics and literary theory on subjectivity, which focuses on how opinions, emotions, etc. are expressed linguistically in context (Banfield, 1982).
- We evaluated whether the learned patterns can improve the coverage of the high-precision subjectivity classifier (HP-Subj), to complete the bootstrapping loop depicted in the top-most dashed line of Figure 1.
- We showed that an extraction pattern learning technique can learn subjective expressions that are linguistically richer than individual words or fixed phrases.
- We augmented our original high-precision subjective classifier with these newly learned extraction patterns. This bootstrapping process resulted in substantially higher recall with a minimal loss in precision
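The three-step loop summarized above (high-precision labeling, pattern learning, then feeding the patterns back into the labeler) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' system: the seed lexicon `SEED_CLUES`, the use of word bigrams as a stand-in for extraction patterns, the "two or more hits" and "zero hits" labeling rules, and the `min_freq` / `min_prob` thresholds are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical seed lexicon (assumption; the real clue lists are not given here).
SEED_CLUES = {"hate", "love", "terrible", "wonderful", "outrageous"}

PUNCT = ".,!?;:\"'"


def tokens(sentence):
    """Lowercase, whitespace-split tokens with surrounding punctuation removed."""
    stripped = (w.strip(PUNCT).lower() for w in sentence.split())
    return [w for w in stripped if w]


def bigrams(sentence):
    """Word bigrams, used here as a simple stand-in for extraction patterns."""
    toks = tokens(sentence)
    return {" ".join(pair) for pair in zip(toks, toks[1:])}


def label_high_precision(sentence, clues, patterns):
    """Return 'subj', 'obj', or None when the labeler abstains (assumed rules)."""
    hits = sum(1 for t in tokens(sentence) if t in clues)
    hits += len(bigrams(sentence) & patterns)
    if hits >= 2:
        return "subj"
    if hits == 0:
        return "obj"
    return None


def learn_patterns(labeled, min_freq=2, min_prob=0.9):
    """Keep bigrams that are frequent and occur mostly in 'subj' sentences."""
    total, subj = Counter(), Counter()
    for sentence, label in labeled:
        for bg in bigrams(sentence):
            total[bg] += 1
            if label == "subj":
                subj[bg] += 1
    return {bg for bg, n in total.items()
            if n >= min_freq and subj[bg] / n >= min_prob}


def bootstrap(sentences, rounds=2):
    """Alternate labeling and pattern learning for a fixed number of rounds."""
    patterns, labeled = set(), []
    for _ in range(rounds):
        labeled = []
        for s in sentences:
            label = label_high_precision(s, SEED_CLUES, patterns)
            if label is not None:
                labeled.append((s, label))
        patterns = learn_patterns(labeled)
    return patterns, labeled
```

In this sketch, a sentence with only one seed clue is left unlabeled in the first round, but can be labeled subjective in a later round once patterns learned from confidently labeled sentences also match it. That is the sense in which the learned patterns grow the training set.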
- 4.1 Subjectivity Data
The text collection that the authors used consists of English-language versions of foreign news documents from FBIS, the U.S. Foreign Broadcast Information Service.
- The authors' system takes unannotated data as input, but the authors needed annotated data to evaluate its performance.
- The scheme was inspired by work in linguistics and literary theory on subjectivity, which focuses on how opinions, emotions, etc. are expressed linguistically in context (Banfield, 1982).
- The goal is to identify and characterize expressions of private states in a sentence.
- Private state is a general covering term for opinions, evaluations, emotions, and speculations (Quirk et al., 1985).
- In sentence (1) the writer is expressing a negative evaluation.
- This research explored several avenues for improving the state-of-the-art in subjectivity analysis.
- The authors augmented the original high-precision subjective classifier with these newly learned extraction patterns.
- This bootstrapping process resulted in substantially higher recall with a minimal loss in precision.
- The authors plan to experiment with different configurations of these classifiers, add new subjective language learners in the bootstrapping process, and address the problem of how to identify new objective sentences during bootstrapping.
- Table 1: Bootstrapping the Learned Patterns into the High-Precision Sentence Classifier
- Table 2: Examples of Learned Patterns Used by HP-Subj and Sample Matching Sentences
- ∗This work was supported by the National Science Foundation under grants IIS-0208798, IIS-0208985, and IRI-9704240
Study subjects and analysis
A private state may have low, medium, high, or extreme strength. To allow us to measure interannotator agreement, three annotators (who are not authors of this paper) independently annotated the same 13 documents with a total of 210 sentences. We begin with a strict measure of agreement at the sentence level by first considering whether the annotator marked any private-state expression, of any strength, anywhere in the sentence.
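As a rough illustration of the sentence-level measure described above, each annotator's markings can be reduced to one binary label per sentence (1 if any private-state expression of any strength was marked, 0 otherwise) and then compared pairwise. The text here does not say which agreement statistic is reported, so the sketch below shows two common choices, raw percent agreement and Cohen's kappa, as assumptions. The function names and the input layout (one list of marked strengths per sentence) are hypothetical.

```python
from itertools import combinations


def sentence_labels(annotations):
    """Map each sentence's list of marked strengths to a binary label."""
    return [1 if marks else 0 for marks in annotations]


def percent_agreement(a, b):
    """Fraction of sentences on which two annotators give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators with binary (0/1) labels."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    p_a1 = sum(a) / n  # how often annotator a says "subjective"
    p_b1 = sum(b) / n  # how often annotator b says "subjective"
    p_expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if p_expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


def average_pairwise(labels_by_annotator, metric):
    """Average a two-annotator metric over all annotator pairs."""
    pairs = list(combinations(labels_by_annotator, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)
```

With three annotators this yields three pairwise scores, which `average_pairwise` combines into a single figure. Kappa corrects percent agreement for the agreement expected by chance given each annotator's label distribution.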
- C. Baker, C. Fillmore, and J. Lowe. 1998. The Berkeley FrameNet Project. In Proceedings of the COLING-ACL-98.
- T. Ballmer and W. Brennenstuhl. 1981. Speech Act Classification: A Study in the Lexical Analysis of English Speech Activity Verbs. Springer-Verlag.
- A. Banfield. 1982. Unspeakable Sentences. Routledge and Kegan Paul, Boston.