Our method consistently and significantly outperforms the alternative baselines in terms of precision@1 (p@1), mean average precision (MAP), normalized discounted cumulative gain (nDCG), and mean reciprocal rank (MRR).
Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System.
SIGIR, pp.55-64, (2016)
To establish an automatic conversation system between humans and computers is regarded as one of the most hardcore problems in computer science, which involves interdisciplinary techniques in information retrieval, natural language processing, artificial intelligence, etc. The challenges lie in how to respond so as to maintain a relevant ...
- To have a virtual assistant and/or chat companion system in open domains with adequate artificial intelligence has seemed illusive, and might only exist in Sci-Fi movies for a long time.
- The goal of creating an automatic human-computer conversation system, as the personal assistant or chat companion, is no longer an illusion far away.
- It is likely a good time to build data-driven, open-domain conversation systems between humans and computers.
- In domain-specific systems, the conversational inputs are restricted and predictable, so it is easier than in open-domain systems to design the logic, create the rules, prepare the data, and construct the candidate replies for the particular task.
- However, the design philosophy underlying such systems is nearly impossible to generalize to the open domain.
- Our system outperforms standard and state-of-the-art baselines on a variety of evaluation metrics: p@1, mean average precision (MAP), normalized discounted cumulative gain (nDCG), and mean reciprocal rank (MRR).
- We propose to establish an automatic conversation system between humans and computers
- Given a human-issued message as the query, our proposed system will return the corresponding responses based on a deep learning-to-respond schema
- There are 3 major contributions in this work: 1) we propose a contextual query reformulation framework with ranking fusion for the conversation task; 2) we integrate multiple dimensions of ranking evidence, i.e., queries, postings, replies, and contexts; 3) we establish a deep neural network architecture that incorporates the above strategies and components.
- Our method consistently and significantly outperforms the alternative baselines in terms of p@1, MAP, nDCG, and MRR
- EXPERIMENTS AND EVALUATION
The authors evaluate the model on the conversation task against a series of baselines, using a very large conversation resource.
- The authors constructed the dataset of 1,606,583 samples to train the deep neural networks, 357,018 for validation, and 11,097 for testing.
- The dataset used for learning must not overlap with the database used for retrieval, so that the experiments strictly comply with the machine learning regime.
- For each training and validation sample, the authors randomly chose a reply as a negative sample.
- The authors hired workers on a crowdsourcing platform to judge the appropriateness of 30 candidate replies retrieved for each query.
- Each sample was judged by 7 annotators, and the label was decided by majority vote on the appropriateness of the response given the query and contexts: “1” denotes an appropriate response and “0” an inappropriate one.
- Given the ranking lists for test queries, the authors evaluated the performance in terms of the following metrics: precision@1 (p@1), mean average precision (MAP) [31, 43], and normalized discounted cumulative gain [8, 41].
- The authors evaluated the top-k ranking lists for the test queries using nDCG and MAP, which test the potential of a system to provide more than one appropriate response as candidates.
- $$\mathrm{nDCG}@k = \frac{1}{|T|} \sum_{q \in T} \frac{1}{Z} \sum_{i=1}^{k} \frac{2^{r_i} - 1}{\log(1 + i)}$$ where $T$ indicates the testing query set, $k$ denotes the top-$k$ position in the ranking list, and $Z$ is a normalization factor obtained from a perfect ranking. $r_i$ is the relevance score for the $i$-th candidate reply in the ranking list (i.e., 1: appropriate, 0: inappropriate).
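The evaluation protocol described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each test query comes with a ranked list of binary labels (1: appropriate, 0: inappropriate) obtained by majority vote over the annotators. All function names are hypothetical. Two choices are assumptions rather than details from the paper: average precision is normalized by the number of appropriate replies found in the judged list, and the nDCG gain uses the common 2^r - 1 form, which equals r for binary labels.

```python
import math
from typing import Dict, List


def majority_vote(votes: List[int]) -> int:
    """Return 1 if strictly more than half of the binary votes are 1, else 0."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def precision_at_1(ranked: List[int]) -> float:
    """1.0 if the top-ranked reply is appropriate, else 0.0."""
    return float(ranked[0]) if ranked else 0.0


def average_precision(ranked: List[int]) -> float:
    """Mean of precision values taken at each appropriate reply's rank."""
    hits, total = 0, 0.0
    for i, r in enumerate(ranked, start=1):
        if r:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def reciprocal_rank(ranked: List[int]) -> float:
    """1 / rank of the first appropriate reply, or 0.0 if there is none."""
    for i, r in enumerate(ranked, start=1):
        if r:
            return 1.0 / i
    return 0.0


def dcg_at_k(ranked: List[int], k: int) -> float:
    """Discounted cumulative gain over the top-k positions."""
    return sum((2 ** r - 1) / math.log(1 + i)
               for i, r in enumerate(ranked[:k], start=1))


def ndcg_at_k(ranked: List[int], k: int) -> float:
    """DCG divided by Z, the DCG of a perfect ranking of the same labels."""
    z = dcg_at_k(sorted(ranked, reverse=True), k)
    return dcg_at_k(ranked, k) / z if z > 0 else 0.0


def evaluate(rankings: Dict[str, List[int]], k: int) -> Dict[str, float]:
    """Average each metric over the test query set T."""
    n = len(rankings)
    lists = list(rankings.values())
    return {
        "p@1": sum(precision_at_1(r) for r in lists) / n,
        "MAP": sum(average_precision(r) for r in lists) / n,
        "MRR": sum(reciprocal_rank(r) for r in lists) / n,
        f"nDCG@{k}": sum(ndcg_at_k(r, k) for r in lists) / n,
    }


if __name__ == "__main__":
    # Hypothetical votes from 7 annotators for three candidate replies.
    votes = [[1, 1, 1, 1, 0, 0, 0], [0, 0, 1, 0, 0, 1, 0], [1, 1, 1, 0, 1, 1, 1]]
    labels = [majority_vote(v) for v in votes]
    print(evaluate({"q1": labels, "q2": [0, 1, 0]}, k=3))
```

Because nDCG is a ratio of two sums that use the same logarithm, the choice of log base does not change its value.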
- The authors propose to establish an automatic conversation system between humans and computers.
- There are 3 major contributions in this work: 1) the authors propose a contextual query reformulation framework with ranking fusion for the conversation task; 2) the authors integrate multiple dimensions of ranking evidence, i.e., queries, postings, replies, and contexts; 3) the authors establish a deep neural network architecture that incorporates the above strategies and components.
- The authors examine the effect of the proposed DL2R model with several baselines on a series of evaluation metrics.
- The authors can incorporate additional features and more conversation-oriented formulations, such as dialogue acts, conversational logic, and discourse structures.
- Table 1: An example of the original microblog posting and the associated replies. Each posting might have more than one reply, e.g., Reply1 and Reply2. To create our database of conversation data, we separate different replies to the same post and obtain ⟨post-reply⟩ pairs. We store two Posting-Reply pairs in the conversational dataset, i.e., ⟨Posting-Reply1⟩ and ⟨Posting-Reply2⟩. User accounts are anonymized.
- Table 2: Part (I) indicates a real human (denoted by A) - computer (denoted by B) conversation scenario, while Part (II) indicates our proposed task modeling and formulations. A2 is the current user-issued query. We have contexts and reformulated queries as listed. ‘ ’ is the literal concatenation action. Note that the selected response Reply1 is associated with a Posting in the conversational database shown in Table 1.
- Table 3: Symbols and annotations for problem formulation.
- Table 4: Data statistics. Postings and replies are all unique.
- Table 5: Retrieval performance against baselines with our proposed adaptation of contextual reformulation. ‘⋆’ indicates that we accept the improvement hypothesis of DL2R over the best baseline by Wilcoxon test at a significance level of 0.01. The table covers both generative and retrieval methods. Generative methods produce one response for each query, so p@1 in fact refers to accuracy and the other metrics are not applicable.
- Table 6: Performance evaluations of different contextual query reformulation strategies.
- Table 7: Performance evaluations of different components with multiple dimensions of ranking evidence.
- 2.1 Conversation Systems
Early work on conversation systems is generally based on rules or templates and is designed for specific domains [33, 36]. These rule-based approaches require little or no training data, but they demand much manual effort to build the model or to handcraft rules, which is usually very costly. The conversation structure and status tracking in vertical domains are more feasible to learn and infer. However, the coverage of such systems is far from satisfactory. Later, researchers began to pay more attention to automatic conversation systems in open domains [31, 6].
Moving from specific domains to the open domain, the amount of data needed to build a conversation system increases substantially. As information retrieval techniques develop rapidly, researchers have obtained promising results in (deep) question answering systems. An alternative approach is therefore to build a conversation system on a knowledge base consisting of a number of question-answer pairs. Leuski et al. build systems that select the most suitable response to the current message from the question-answer pairs using a statistical language model in cross-lingual information retrieval, but creating the knowledge base (i.e., the question-answer pairs) is a major bottleneck. Researchers propose to augment the knowledge base with question-answer pairs derived from plain texts [24, 3]. The number of resource pairs can be expanded to some extent, but it remains relatively small, and the performance is not quite stable either.
- This work is supported by the National Basic Research Program of China (No. 2014CB340505).
- Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
- F. Bessho, T. Harada, and Y. Kuniyoshi. Dialog system using real-time crowdsourcing and Twitter large-scale corpus. In SIGDIAL, pages 227–231, 2012.
- G. Cong, L. Wang, C.-Y. Lin, Y.-I. Song, and Y. Sun. Finding question-answer pairs from online forums. In SIGIR, pages 467–474.
- A. Graves, A.-r. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In Proc. Acoustics, Speech and Signal Processing, pages 6645–6649, 2013.
- H. He, K. Gimpel, and J. Lin. Multi-perspective sentence similarity modeling with convolutional neural networks. In EMNLP, pages 1576–1586, 2015.
- R. Higashinaka, K. Imamura, T. Meguro, C. Miyazaki, N. Kobayashi, H. Sugiyama, T. Hirano, T. Makino, and Y. Matsuo. Towards an open domain conversational system fully based on natural language processing. In COLING, 2014.
- B. Hu, Z. Lu, H. Li, and Q. Chen. Convolutional neural network architectures for matching natural language sentences. In NIPS, pages 2042–2050, 2014.
- K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., 20(4):422–446, 2002.
- Z. Ji, Z. Lu, and H. Li. An information retrieval approach to short text conversation. CoRR, abs/1408.6988, 2014.
- N. Kalchbrenner, E. Grefenstette, and P. Blunsom. A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188, 2014.
- C.-J. Lee, Q. Ai, W. B. Croft, and D. Sheldon. An optimization framework for merging multiple result lists. In CIKM ’15, pages 303–312, 2015.
- A. Leuski, R. Patel, D. Traum, and B. Kennedy. Building effective question answering characters. In SIGDIAL, pages 18–27, 2009.
- A. Leuski and D. Traum. NPCEditor: Creating virtual human dialogue using information retrieval techniques. AI Magazine, 32(2):42–56, 2011.
- H. Li and J. Xu. Semantic matching in search. Foundations and Trends in Information Retrieval, 8:89, 2014.
- J. Li, M. Galley, C. Brockett, J. Gao, and B. Dolan. A diversity-promoting objective function for neural conversation models. arXiv preprint arXiv:1510.03055, 2015.
- X. Li, L. Mou, R. Yan, and M. Zhang. Stalematebreaker: A proactive content-introducing approach to automatic human-computer conversation. In IJCAI, 2016.
- Z. Lu and H. Li. A deep architecture for matching short texts. In NIPS, pages 1367–1375, 2013.
- C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
- T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013.
- L. Mou, G. Li, L. Zhang, T. Wang, and Z. Jin. Convolutional neural networks over tree structures for programming language processing. In AAAI, pages 1287–1292, 2016.
- L. Mou, H. Peng, G. Li, Y. Xu, L. Zhang, and Z. Jin. Discriminative neural sentence modeling by tree-based convolution. In EMNLP, pages 2315–2325, 2015.
- L. Mou, M. Rui, G. Li, Y. Xu, L. Zhang, R. Yan, and Z. Jin. Recognizing entailment and contradiction by tree-based convolution. arXiv preprint arXiv:1512.08422, 2015.
- M. Nakano, N. Miyazaki, N. Yasuda, A. Sugiyama, J.-i. Hirasawa, K. Dohsaka, and K. Aikawa. WIT: A toolkit for building robust and real-time spoken dialogue systems. In SIGDIAL, pages 150–159.
- E. Nouri, R. Artstein, A. Leuski, and D. R. Traum. Augmenting conversational characters with generated question-answer pairs. In AAAI Fall Symposium: Question Generation, 2011.
- H. Palangi, L. Deng, Y. Shen, J. Gao, X. He, J. Chen, X. Song, and R. Ward. Deep sentence embedding using the long short term memory network: Analysis and application to information retrieval. arXiv preprint arXiv:1502.06922, 2015.
- A. Ritter, C. Cherry, and W. B. Dolan. Data-driven response generation in social media. In EMNLP, pages 583–593, 2011.
- T. Rocktäschel, E. Grefenstette, K. M. Hermann, T. Kocisky, and P. Blunsom. Reasoning about entailment with neural attention. arXiv preprint arXiv:1509.06664, 2015.
- A. Severyn and A. Moschitti. Learning to rank short text pairs with convolutional deep neural networks. In SIGIR ’15, pages 373–382.
- L. Shang, Z. Lu, and H. Li. Neural responding machine for short-text conversation. In ACL-IJCNLP, pages 1577–1586, 2015.
- R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning. Semi-supervised recursive autoencoders for predicting sentiment distributions. In EMNLP, pages 151–161, 2011.
- H. Sugiyama, T. Meguro, R. Higashinaka, and Y. Minami. Open-domain utterance generation for conversational dialogue systems using Web-scale dependency structures. In SIGDIAL, pages 334–338, 2013.
- I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, pages 3104–3112, 2014.
- M. A. Walker, R. Passonneau, and J. E. Boland. Quantitative and qualitative evaluation of DARPA Communicator spoken dialogue systems. In ACL, pages 515–522, 2001.
- R. S. Wallace. The Anatomy of ALICE. Springer, 2009.
- H. Wang, Z. Lu, H. Li, and E. Chen. A dataset for research on short-text conversations. In EMNLP, pages 935–945, 2013.
- J. Williams, A. Raux, D. Ramachandran, and A. Black. The dialog state tracking challenge. In SIGDIAL, pages 404–413, 2013.
- Y. Xu, R. Jia, L. Mou, G. Li, Y. Chen, Y. Lu, and Z. Jin. Improved relation classification by deep recurrent neural networks with data augmentation. arXiv preprint arXiv:1601.03651, 2016.
- Y. Xu, L. Mou, G. Li, Y. Chen, H. Peng, and Z. Jin. Classifying relations via long short term memory networks along shortest dependency paths. In EMNLP, 2015.
- R. Yan. i, poet: Automatic poetry composition through recurrent neural networks with iterative polishing schema. In IJCAI, 2016.
- R. Yan, M. Lapata, and X. Li. Tweet recommendation with graph co-ranking. In ACL, pages 516–525, 2012.
- R. Yan, C.-T. Li, H.-P. Hsieh, P. Hu, X. Hu, and T. He. Socialized language model smoothing via bi-directional influence propagation on social networks. In WWW ’16, pages 1395–1405, 2016.
- R. Yan, X. Wan, J. Otterbacher, L. Kong, X. Li, and Y. Zhang. Evolutionary timeline summarization: A balanced optimization framework via iterative substitution. In SIGIR ’11, pages 745–754, 2011.
- R. Yan, I. E. Yen, C.-T. Li, S. Zhao, and X. Hu. Tackling the Achilles heel of social networks: Influence propagation based language model smoothing. In WWW ’15, pages 1318–1328, 2015.
- K. Zhai and D. J. Williams. Discovering latent structure in task-oriented dialogues. In ACL, pages 36–46, 2014.
- B. Zhang, J. Su, D. Xiong, Y. Lu, H. Duan, and J. Yao. Shallow convolutional neural network for implicit discourse relation recognition. In EMNLP, pages 2230–2235, 2015.