
Learning in Natural Language

Learning in Natural Language, IJCAI-99, pp. 898-904 (1999)

Abstract

Statistics-based classifiers in natural language are developed typically by assuming a generative model for the data, estimating its parameters from training data and then using Bayes rule to obtain a classifier. For many problems the assumptions made by the generative models are evidently wrong, leaving open the question of why these approaches nevertheless work well in practice.

Introduction
  • Generative probability models provide a principled approach to the study of statistical classification in complex domains such as natural language.
  • In the context of natural language, most classifiers are derived from probabilistic language models that estimate the probability of a sentence s, say, using Bayes rule, and decompose this probability into a product of conditional probabilities according to the generative model's assumptions.
  • The generative models used to estimate these terms typically make Markov or other independence assumptions (see the sketch after this list).
  • It is evident from looking at language data that these assumptions are often patently false and that there are significant global dependencies both within and across sentences.
  • Classifiers built on these false assumptions nevertheless seem to behave quite robustly in many cases.
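
To make the decomposition above concrete, here is a minimal Python sketch of a bigram language model; the corpus, function names, and start symbol are invented for illustration and are not from the paper. It estimates P(s) as a product of conditional probabilities P(w_i | w_{i-1}), a first-order Markov assumption:

```python
from collections import defaultdict

def train_bigram(corpus):
    """Estimate P(w_i | w_{i-1}) by maximum likelihood from a list of
    tokenized sentences; unseen bigrams get probability zero (a real
    system would smooth these estimates)."""
    unigram = defaultdict(int)   # counts of each context word
    bigram = defaultdict(int)    # counts of adjacent word pairs
    for sentence in corpus:
        tokens = ["<s>"] + sentence
        for prev, cur in zip(tokens, tokens[1:]):
            unigram[prev] += 1
            bigram[(prev, cur)] += 1
    return lambda prev, cur: bigram[(prev, cur)] / unigram[prev] if unigram[prev] else 0.0

def sentence_probability(cond_prob, sentence):
    """P(s) under the Markov assumption: each word depends only on its
    immediate predecessor, ignoring all longer-range dependencies."""
    p = 1.0
    tokens = ["<s>"] + sentence
    for prev, cur in zip(tokens, tokens[1:]):
        p *= cond_prob(prev, cur)
    return p

corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
cond_prob = train_bigram(corpus)
print(sentence_probability(cond_prob, ["the", "dog", "sleeps"]))  # 0.0: "dog sleeps" was never observed
```

This is exactly the kind of independence assumption the bullets above call patently false for real text, where global dependencies hold both within and across sentences; the puzzle the paper addresses is why classifiers built on such estimates nevertheless behave robustly.
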
Highlights
  • Generative probability models provide a principled approach to the study of statistical classification in complex domains such as natural language
  • When using a (hidden) Markov model (HMM) as a generative model for the problem of part-of-speech tagging, estimating the probability of a sequence of tags involves assuming that the part-of-speech tag t_i of the word w_i is independent of the other words in the sentence, given the preceding tag t_{i-1} (see the first sketch after this list)
  • We show that a variety of models used for learning in natural language make their predictions using Linear Statistical Queries (LSQ) hypotheses (see the second sketch after this list)
  • We show how different models used in the literature can be cast as LSQ hypotheses by selecting the statistical queries appropriately, and how this affects the robustness of the derived hypothesis
  • Our goal is to show that an algorithm that is able to learn under these restrictions is guaranteed to produce a robust hypothesis
  • In addition to providing better learning techniques, developing an understanding for when and why learning works in this context is a necessary step in studying the role of learning in higher-level natural language inferences
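
To illustrate the HMM independence assumption from the highlights, here is a small Python sketch; the probability tables are toy values invented for illustration, not estimates from any corpus. The joint probability of a tag sequence and a word sequence factors as a product in which each tag depends only on the preceding tag and each word only on its own tag:

```python
import math

# Toy tables (invented values): P(t_i | t_{i-1}) and P(w_i | t_i).
transition = {("<s>", "DET"): 0.8, ("DET", "NOUN"): 0.9, ("NOUN", "VERB"): 0.7}
emission = {("DET", "the"): 0.6, ("NOUN", "dog"): 0.01, ("VERB", "barks"): 0.05}

def joint_log_prob(tags, words):
    """log P(t_1..t_n, w_1..w_n) = sum_i [log P(t_i | t_{i-1}) + log P(w_i | t_i)]
    under the first-order Markov assumption on tags and the assumption that
    each word is independent of everything else given its own tag."""
    logp, prev = 0.0, "<s>"
    for t, w in zip(tags, words):
        logp += math.log(transition.get((prev, t), 1e-9))  # tiny floor for unseen events
        logp += math.log(emission.get((t, w), 1e-9))
        prev = t
    return logp

print(joint_log_prob(["DET", "NOUN", "VERB"], ["the", "dog", "barks"]))
```
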
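To make the LSQ view concrete, the sketch below (invented names; naive Bayes chosen as one common instance of such a model) shows a prediction that is a linear function over indicator features. The coefficients, log P(class) and log P(x_i = v | class), depend on the training sample only through answers to statistical queries, i.e. estimated probabilities of simple indicator events:

```python
import math
from collections import defaultdict

def estimate_queries(examples):
    """Answer the statistical queries: empirical estimates of Pr[label = c]
    and Pr[x_i = v | label = c], each an expectation of an indicator event."""
    class_count = defaultdict(int)
    feature_count = defaultdict(int)
    for features, label in examples:
        class_count[label] += 1
        for i, v in enumerate(features):
            feature_count[(i, v, label)] += 1
    n = len(examples)
    prior = {c: k / n for c, k in class_count.items()}
    cond = {key: k / class_count[key[2]] for key, k in feature_count.items()}
    return prior, cond

def lsq_predict(prior, cond, features, floor=1e-6):
    """The hypothesis is linear over indicator features:
    score(c) = log P(c) + sum_i log P(x_i | c); predict the argmax class."""
    best, best_score = None, float("-inf")
    for c, p in prior.items():
        score = math.log(p)
        for i, v in enumerate(features):
            score += math.log(cond.get((i, v, c), floor))  # floor for unseen events
        if score > best_score:
            best, best_score = c, score
    return best

examples = [(("sunny", "hot"), "no"), (("rainy", "cool"), "yes"),
            (("sunny", "cool"), "yes"), (("rainy", "hot"), "no")]
prior, cond = estimate_queries(examples)
print(lsq_predict(prior, cond, ("sunny", "hot")))  # -> "no"
```

The robustness discussed in the highlights corresponds roughly to the fact that these coefficients are stable functions of estimated expectations, so the prediction does not hinge on the generative assumptions literally holding.
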
Conclusion
  • In the last few years there has been a surge of empirical work in natural language, a significant part of it done using statistical machine learning techniques.
  • Roth [1998] has investigated the relations among some of the commonly used methods and taken preliminary steps towards developing a better theoretical understanding of why and when different methods work.
  • In addition to providing better learning techniques, developing an understanding for when and why learning works in this context is a necessary step in studying the role of learning in higher-level natural language inferences
References
  • [Anthony and Holden, 1993] M. Anthony and S. Holden. On the power of polynomial discriminators and radial basis function networks. In Proc. 6th Annu. Workshop on Comput. Learning Theory, pages 158-164. ACM Press, New York, NY, 1993.
  • [Aslam and Decatur, 1995] J. A. Aslam and S. E. Decatur. Specification and simulation of statistical query algorithms for efficiency and noise tolerance. In Proc. 8th Annu. Conf. on Comput. Learning Theory, pages 437-446. ACM Press, New York, NY, 1995.
  • [Darroch and Ratcliff, 1972] J. N. Darroch and D. Ratcliff. Generalized iterative scaling for log-linear models. Annals of Mathematical Statistics, 43(5):1470-1480, 1972.
  • [Decatur, 1993] S. E. Decatur. Statistical queries and faulty PAC oracles. In Proceedings of the Sixth Annual ACM Workshop on Computational Learning Theory, pages 262-268. ACM Press, 1993.
  • [Delcher et al., 1993] A. Delcher, S. Kasif, H. Goldberg, and W. Hsu. Application of probabilistic causal trees to analysis of protein secondary structure. In National Conference on Artificial Intelligence, pages 316-321, 1993.
  • [Duda and Hart, 1973] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
  • [Gale et al., 1993] W. Gale, K. Church, and D. Yarowsky. A method for disambiguating word senses in a large corpus. Computers and the Humanities, 26:415-439, 1993.
  • [Golding and Roth, 1999] A. R. Golding and D. Roth. A Winnow-based approach to context-sensitive spelling correction. Machine Learning, 1999. Special issue on Machine Learning and Natural Language; preliminary version appeared in ICML-96.
  • [Golding, 1995] A. R. Golding. A Bayesian hybrid method for context-sensitive spelling correction. In Proceedings of the 3rd Workshop on Very Large Corpora, ACL-95, 1995.
  • [Grove and Roth, 1998] A. Grove and D. Roth. Linear concepts and hidden variables: An empirical study. In Neural Information Processing Systems. MIT Press, 1998.
  • [Haussler, 1992] D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78-150, September 1992.
  • [Hoffgen and Simon, 1992] K. Hoffgen and H. Simon. Robust trainability of single neurons. In Proc. 5th Annu. Workshop on Comput. Learning Theory, pages 428-439, New York, NY, 1992. ACM Press.
  • [Jaynes, 1982] E. T. Jaynes. On the rationale of maximum-entropy methods. Proceedings of the IEEE, 70(9):939-952, September 1982.
  • [Kearns et al., 1992] M. J. Kearns, R. E. Schapire, and L. M. Sellie. Toward efficient agnostic learning. In Proc. 5th Annu. Workshop on Comput. Learning Theory, pages 341-352. ACM Press, New York, NY, 1992.
  • [Kearns, 1993] M. Kearns. Efficient noise-tolerant learning from statistical queries. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pages 392-401, 1993.
  • [Kupiec, 1992] J. Kupiec. Robust part-of-speech tagging using a hidden Markov model. Computer Speech and Language, 6:225-242, 1992.
  • [Rabiner, 1989] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-285, 1989.
  • [Ratnaparkhi et al., 1994] A. Ratnaparkhi, J. Reynar, and S. Roukos. A maximum entropy model for prepositional phrase attachment. In ARPA, Plainsboro, NJ, March 1994.
  • [Ratnaparkhi, 1997] A. Ratnaparkhi. A linear observed time statistical parser based on maximum entropy models. In EMNLP-97, The Second Conference on Empirical Methods in Natural Language Processing, pages 1-10, 1997.
  • [Roth and Zelenko, 1998] D. Roth and D. Zelenko. Part of speech tagging using a network of linear separators. In COLING-ACL 98, The 17th International Conference on Computational Linguistics, pages 1136-1142, 1998.
  • [Roth, 1998] D. Roth. Learning to resolve natural language ambiguities: A unified approach. In Proc. National Conference on Artificial Intelligence, pages 806-813, 1998.
  • [Schütze, 1995] H. Schütze. Distributional part-of-speech tagging. In Proceedings of the 7th Conference of the European Chapter of the Association for Computational Linguistics, 1995.
  • [Valiant, 1984] L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, November 1984.
  • [Vapnik, 1982] V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York, 1982.
  • [Vapnik, 1995] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
  • [Yamanishi, 1992] K. Yamanishi. A learning criterion for stochastic rules. Machine Learning, 1992.