QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension

    ICLR 2018. arXiv: abs/1804.09541.

    Keywords: recurrent neural networks, attention flow, Stanford Question Answering Dataset, question answering, Google's NMT

    Abstract:

    Current end-to-end machine reading and question answering (Q&A) models are primarily based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often slow for both training and inference due to the sequential nature of RNNs. We propose a new Q&A model that does not require recurrent networks: ...


    Introduction
    • There is growing interest in the tasks of machine reading comprehension and automated question answering.
    • A successful combination of these two ingredients is the Bidirectional Attention Flow (BiDAF) model by Seo et al (2016), which achieves strong results on the SQuAD dataset (Rajpurkar et al, 2016).
    • A weakness of these models is that they are often slow for both training and inference due to their recurrent nature, especially for long texts.
    • The slow inference prevents the machine comprehension systems from being deployed in real-time applications
    Highlights
    • There is growing interest in the tasks of machine reading comprehension and automated question answering
    • The most successful models generally employ two key ingredients: (1) a recurrent model to process sequential inputs, and (2) an attention component to cope with long-term interactions
    • In this paper, aiming to make the machine comprehension fast, we propose to remove the recurrent nature of these models
    • F1 measures the proportion of overlapping tokens between the predicted answer and the ground truth, while the exact match score is 1 if the prediction is exactly the same as the ground truth and 0 otherwise
    • Our model trained on the original dataset outperforms all the documented results in the literature, in terms of both Exact Match and F1 scores
    • We propose a fast and accurate end-to-end model, QANet, for machine reading comprehension
    Methods
    • The authors conduct experiments to study the performance of the model and the data augmentation technique.
    • SQuAD contains 107.7K query-answer pairs, with 87.5K for training, 10.1K for validation, and another 10.1K for testing.
    • The authors test the model on another dataset TriviaQA (Joshi et al, 2017), which consists of 650K context-query-answer triples.
    • According to the previous work (Joshi et al, 2017; Hu et al, 2017; Pan et al, 2017), the same model would have similar performance on both Wikipedia and Web, but the latter is five times larger.
    • To keep the training time manageable, the authors omit the experiment on Web data
    Results
    • F1 and Exact Match (EM) are the two accuracy metrics used to evaluate model performance.
    • To make a fair and thorough comparison, the authors report both the published results in the latest papers/preprints and the updated but undocumented results on the leaderboard.
    • The authors deem the latter as the unpublished results.
    • The authors' result on the official test set is 76.2/84.6 (EM/F1), which significantly outperforms the best documented result of 73.2/81.8
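The two metrics above can be sketched in a few lines of Python. This is a simplified version for illustration only; the official SQuAD evaluation script additionally normalizes answers by stripping punctuation and articles before comparison:

```python
from collections import Counter

def exact_match(prediction, ground_truth):
    # EM is 1 if the prediction equals the ground truth exactly, else 0
    return int(prediction.strip().lower() == ground_truth.strip().lower())

def f1_score(prediction, ground_truth):
    # Token-level F1: harmonic mean of precision and recall over overlapping tokens
    pred_tokens = prediction.lower().split()
    gt_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, f1_score("the Denver Broncos", "Denver Broncos") gives 0.8 (partial token overlap), while exact match on the same pair gives 0.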
    Conclusion
    • The authors propose a fast and accurate end-to-end model, QANet, for machine reading comprehension.
    • The authors' core innovation is to completely remove the recurrent networks in the encoder.
    • The resulting model is fully feedforward, composed entirely of separable convolutions, attention, linear layers, and layer normalization, which is suitable for parallel computation.
    • The resulting model is both fast and accurate: it surpasses the best published results on the SQuAD dataset while being up to 13x faster per training iteration and 9x faster per inference iteration than a competitive recurrent model.
    • The authors find that they can achieve significant gains by using data augmentation that translates context and passage pairs to and from another language as a way of paraphrasing the questions and contexts
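One of the building blocks named above, the depthwise separable convolution, factors a standard convolution into a per-channel (depthwise) filter followed by a 1x1 channel-mixing (pointwise) projection, reducing parameters from k·c_in·c_out to k·c_in + c_in·c_out. For instance, with kernel size 7 and 128 channels (values of the kind used in QANet), the separable version needs 7·128 + 128·128 = 17,280 weights instead of 7·128·128 = 114,688. The NumPy sketch below is illustrative only; the actual encoder uses learned kernels plus layer normalization and residual connections:

```python
import numpy as np

def depthwise_separable_conv1d(x, depthwise, pointwise):
    """x: (length, c_in); depthwise: (k, c_in); pointwise: (c_in, c_out).
    'Same' padding so the output length matches the input length."""
    length, c_in = x.shape
    k = depthwise.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # depthwise step: each channel convolved with its own length-k kernel
    dw = np.stack([(xp[i:i + k] * depthwise).sum(axis=0) for i in range(length)])
    # pointwise step: 1x1 convolution mixing channels
    return dw @ pointwise
```

Because both steps are plain matrix/elementwise operations over the whole sequence, the layer parallelizes across positions, unlike an RNN.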
    Tables
    • Table1: Comparison between answers in original sentence and paraphrased sentence
    • Table2: The performances of different models on SQuAD dataset
    • Table3: Speed comparison between our model and RNN-based models on SQuAD dataset, all with batch size 32. RNN-x-y indicates an RNN with x layers each containing y hidden units. Here, we use bidirectional LSTM as the RNN. The speed is measured by batches/second, so higher is faster
    • Table4: Speed comparison between our model and BiDAF (Seo et al, 2016) on SQuAD dataset
    • Table5: An ablation study of data augmentation and other aspects of our model. The reported results are obtained on the development set. For rows containing entry “data augmentation”, “×N ” means the data is enhanced to N times as large as the original size, while the ratio in the bracket indicates the sampling ratio among the original, English-French-English and English-German-English data during training
    • Table6: The F1 scores on the adversarial SQuAD test set
    • Table7: The development set performances of different single-paragraph reading models on the Wikipedia domain of TriviaQA dataset. Note that ∗ indicates the result on test set
    • Table8: Speed comparison between the proposed model and RNN-based models on TriviaQA Wikipedia dataset, all with batch size 32. RNN-x-y indicates an RNN with x layers each containing y hidden units. The RNNs used here are bidirectional LSTM. The processing speed is measured by batches/second, so higher is faster
    Related work
    • While the concept of backtranslation has been introduced before, it is often used to improve either the same translation task (Sennrich et al, 2016) or intrinsic paraphrase evaluations (Wieting et al, 2017; Mallinson et al, 2017). Our approach is a novel application of backtranslation to enrich training data for downstream tasks, in this case the question answering (QA) task. It is worth noting that Dong et al (2017) use paraphrasing techniques to improve QA; however, they only paraphrase questions and did not focus on the data augmentation aspect as we do in this paper.

      Handling SQuAD Documents and Answers. We now discuss our specific procedure for the SQuAD dataset, which is essential for the best performance gains. Recall that each training example of SQuAD is a triple (d, q, a) in which document d is a multi-sentence paragraph containing the answer a. When paraphrasing, we keep the question q unchanged (to avoid accidentally changing its meaning) and generate new triples (d′, q, a′) such that the new document d′ contains the new answer a′. The procedure happens in two steps: (i) document paraphrasing – paraphrase d into d′, and (ii) answer extraction – extract from d′ an answer a′ that closely matches a.
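A minimal sketch of this two-step procedure follows. Here `to_pivot` and `from_pivot` are placeholder translation functions (the paper backtranslates through French with NMT models), and the `difflib` character-level similarity is a stand-in for the paper's answer-matching heuristic:

```python
import difflib

def paraphrase_example(document, question, answer, to_pivot, from_pivot):
    # (i) document paraphrasing: English -> pivot language -> English;
    #     the question is deliberately left unchanged
    new_doc = from_pivot(to_pivot(document))
    # (ii) answer extraction: among spans of the same token length as the
    #      original answer, pick the one most similar to it
    n = len(answer.split())
    tokens = new_doc.split()
    candidates = [" ".join(tokens[i:i + n])
                  for i in range(max(1, len(tokens) - n + 1))]
    new_answer = max(candidates, key=lambda span: difflib.SequenceMatcher(
        None, span.lower(), answer.lower()).ratio())
    return new_doc, question, new_answer
```

If the paraphrased document still contains the original answer verbatim, the extraction step recovers it exactly; otherwise it returns the closest surviving span.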
    • Machine reading comprehension and automated question answering have become important topics in the NLP domain. Their popularity can be attributed to an increase in publicly available annotated datasets, such as SQuAD (Rajpurkar et al, 2016), TriviaQA (Joshi et al, 2017), CNN/Daily Mail (Hermann et al, 2015), WikiReading (Hewlett et al, 2016), Children's Book Test (Hill et al, 2015), etc. A great number of end-to-end neural network models have been proposed to tackle these challenges, including BiDAF (Seo et al, 2016), r-net (Wang et al, 2017), DCN (Xiong et al, 2016), ReasoNet (Shen et al, 2017b), Document Reader (Chen et al, 2017), Interactive AoA Reader (Cui et al, 2017) and Reinforced Mnemonic Reader (Hu et al, 2017).

      Recurrent Neural Networks (RNNs) have featured predominantly in Natural Language Processing in the past few years. The sequential nature of text coincides with the design philosophy of RNNs, hence their popularity. In fact, all the reading comprehension models mentioned above are based on RNNs. Despite being common, the sequential nature of RNNs prevents parallel computation, as tokens must be fed into the RNN in order. Another drawback of RNNs is the difficulty of modeling long dependencies, although this is somewhat alleviated by the use of Gated Recurrent Unit (Chung et al, 2014) or Long Short-Term Memory architectures (Hochreiter & Schmidhuber, 1997). For simple tasks such as text classification, models using reinforcement learning techniques (Yu et al, 2017) have been proposed to skip irrelevant tokens, both to further address the long-dependency issue and to speed up processing. However, it is not clear whether such methods can handle complicated tasks such as Q&A. The reading comprehension task considered in this paper always needs to deal with long text, as the context paragraphs may be hundreds of words long. Recently, attempts have been made to replace recurrent networks with fully convolutional or fully attentional architectures (Kim, 2014; Gehring et al, 2017; Vaswani et al, 2017b; Shen et al, 2017a). Those models have been shown to be not only faster than RNN architectures but also effective in other tasks, such as text classification, machine translation and sentiment analysis.
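The parallelism argument above is the key contrast: self-attention computes all pairwise token interactions in one matrix product rather than in a token-by-token loop. A minimal NumPy sketch, omitting the learned query/key/value projections and multiple heads of the full Transformer (Vaswani et al, 2017b):

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a (length, d) sequence.
    Every position attends to every other position simultaneously."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # (length, length) interactions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ x                               # each output mixes all inputs
```

Nothing here depends on processing tokens in order, which is why attention-based encoders parallelize across the sequence where an RNN cannot.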
    Funding
    • Adams Wei Yu is supported by NVIDIA PhD Fellowship and CMU Presidential Fellowship
    Study subjects and analysis
    EXPERIMENTS ON TRIVIAQA

    In this section, we test our model on another dataset, TriviaQA (Joshi et al, 2017), which consists of 650K context-query-answer triples. There are 95K distinct question-answer pairs, authored by trivia enthusiasts, with 6 evidence documents (contexts) per question on average, crawled from either Wikipedia or Web search. Compared to SQuAD, TriviaQA is more challenging in that: 1) its examples have much longer contexts (2,895 tokens per context on average) that may span several paragraphs, 2) it is much noisier than SQuAD due to the lack of human labeling, and 3) the context may be unrelated to the answer at all, as it is crawled by keyword matching.

    In this paper, we focus on testing our model on the subset consisting of answers from Wikipedia

    Each character embedding is randomly initialized as a 200-D vector, which is also updated during training. We generate two additional augmented datasets using the procedure of Section 3, which contain 140K and 240K examples (including the original data) and are denoted "data augmentation × 2" and "data augmentation × 3" respectively.

    Speed comparison between our model and BiDAF on SQuAD (batch size 32):
    • Train time to reach 77.0 F1 on the dev set: 3 hours vs. 15 hours (5.0x speedup)
    • Train speed: 102 samples/s vs. 24 samples/s
    • Inference speed: 259 samples/s vs. 37 samples/s


    References
    • Martín Abadi, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. CoRR, abs/1603.04467, 2016. URL http://arxiv.org/abs/1603.04467.
    • Lei Jimmy Ba, Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. CoRR, abs/1607.06450, 2016. URL http://arxiv.org/abs/1607.06450.
    • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations, 2015.
    • Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading wikipedia to answer opendomain questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pp. 1870–1879, 2017.
    • Francois Chollet. Xception: Deep learning with depthwise separable convolutions. CoRR, abs/1610.02357, 2016. URL http://arxiv.org/abs/1610.02357.
    • Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
    • Christopher Clark and Matt Gardner. Simple and effective multi-paragraph reading comprehension. CoRR, abs/1710.10723, 2017. URL http://arxiv.org/abs/1710.10723.
    • Yiming Cui, Zhipeng Chen, Si Wei, Shijin Wang, Ting Liu, and Guoping Hu. Attention-overattention neural networks for reading comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pp. 593–602, 2017.
    • Li Dong, Jonathan Mallinson, Siva Reddy, and Mirella Lapata. Learning to paraphrase for question answering. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 875–886. Association for Computational Linguistics, 2017. URL http://aclweb.org/anthology/D17-1091.
    • Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. Convolutional sequence to sequence learning. In International Conference on Machine Learning, 2017.
    • Yichen Gong and Samuel R. Bowman. Ruminating reader: Reasoning with gated multi-hop attention. CoRR, abs/1704.07415, 2017. URL http://arxiv.org/abs/1704.07415.
    • Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pp. 1693–1701, 2015.
    • Daniel Hewlett, Alexandre Lacoste, Llion Jones, Illia Polosukhin, Andrew Fandrianto, Jay Han, Matthew Kelcey, and David Berthelot. Wikireading: A novel large-scale language understanding task over wikipedia. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers, 2016.
    • Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. The goldilocks principle: Reading children’s books with explicit memory representations. CoRR, abs/1511.02301, 2015.
    • Sepp Hochreiter and Jurgen Schmidhuber. Long short-term memory. Neural computation, 9(8): 1735–1780, 1997.
    • Minghao Hu, Yuxing Peng, and Xipeng Qiu. Reinforced mnemonic reader for machine comprehension. CoRR, abs/1705.02798, 2017. URL http://arxiv.org/abs/1705.02798.
    • Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV, pp. 646–661, 2016.
    • Robin Jia and Percy Liang. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pp. 2021–2031, 2017.
    • Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. Triviaqa: A large scale distantly supervised challenge dataset for reading comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 August 4, Volume 1: Long Papers, pp. 1601–1611, 2017.
    • Lukasz Kaiser, Aidan N Gomez, and Francois Chollet. Depthwise separable convolutions for neural machine translation. arXiv preprint arXiv:1706.03059, 2017.
    • Yoon Kim. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 2529, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pp. 1746– 1751, 2014.
    • Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014. URL http://arxiv.org/abs/1412.6980.
    • Kenton Lee, Tom Kwiatkowski, Ankur P. Parikh, and Dipanjan Das. Learning recurrent span representations for extractive question answering. CoRR, abs/1611.01436, 2016.
    • Rui Liu, Junjie Hu, Wei Wei, Zi Yang, and Eric Nyberg. Structural embedding of syntactic trees for machine comprehension. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pp. 826–835, 2017a.
    • Xiaodong Liu, Yelong Shen, Kevin Duh, and Jianfeng Gao. Stochastic answer networks for machine reading comprehension. CoRR, abs/1712.03556, 2017b. URL http://arxiv.org/abs/1712.03556.
    • Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attentionbased neural machine translation. In EMNLP, 2015.
    • Minh-Thang Luong, Eugene Brevdo, and Rui Zhao. Neural machine translation (seq2seq) tutorial. https://github.com/tensorflow/nmt, 2017.
    • Jonathan Mallinson, Rico Sennrich, and Mirella Lapata. Paraphrasing revisited with neural machine translation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pp. 881–893. Association for Computational Linguistics, 2017. URL http://aclweb.org/anthology/E17-1083.
    • Boyuan Pan, Hao Li, Zhou Zhao, Bin Cao, Deng Cai, and Xiaofei He. MEMEN: multi-layer embedding with memory networks for machine comprehension. CoRR, abs/1707.09098, 2017.
    • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Glove: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014. URL http://www.aclweb.org/anthology/D14-1162.
    • Jonathan Raiman and John Miller. Globally normalized reader. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pp. 1070–1080, 2017.
    • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. Squad: 100, 000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pp. 2383–2392, 2016.
    • Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving neural machine translation models with monolingual data. In ACL (1). The Association for Computer Linguistics, 2016.
    • Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. Bidirectional attention flow for machine comprehension. CoRR, abs/1611.01603, 2016. URL http://arxiv.org/abs/1611.01603.
    • Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, and Chengqi Zhang. Disan: Directional self-attention network for rnn/cnn-free language understanding. CoRR, abs/1709.04696, 2017a. URL http://arxiv.org/abs/1709.04696.
    • Yelong Shen, Po-Sen Huang, Jianfeng Gao, and Weizhu Chen. Reasonet: Learning to stop reading in machine comprehension. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13 - 17, 2017, pp. 1047–1055, 2017b.
    • Rupesh Kumar Srivastava, Klaus Greff, and Jurgen Schmidhuber. Highway networks. CoRR, abs/1505.00387, 2015. URL http://arxiv.org/abs/1505.00387.
    • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. CoRR, abs/1706.03762, 2017a. URL http://arxiv.org/abs/1706.03762.
    • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Neural Information Processing Systems, 2017b.
    • Shuohang Wang and Jing Jiang. Machine comprehension using match-lstm and answer pointer. CoRR, abs/1608.07905, 2016. URL http://arxiv.org/abs/1608.07905.
    • Wenhui Wang, Nan Yang, Furu Wei, Baobao Chang, and Ming Zhou. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pp. 189–198, 2017.
    • Zhiguo Wang, Haitao Mi, Wael Hamza, and Radu Florian. Multi-perspective context matching for machine comprehension. CoRR, abs/1612.04211, 2016. URL http://arxiv.org/abs/1612.04211.
    • Dirk Weissenborn, Georg Wiese, and Laura Seiffe. Making neural QA as simple as possible but not simpler. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, Canada, August 3-4, 2017, pp. 271–280, 2017.
    • John Wieting, Jonathan Mallinson, and Kevin Gimpel. Learning paraphrastic sentence embeddings from back-translated bitext. In EMNLP, pp. 274–285. Association for Computational Linguistics, 2017.
    • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
    • Caiming Xiong, Victor Zhong, and Richard Socher. Dynamic coattention networks for question answering. CoRR, abs/1611.01604, 2016. URL http://arxiv.org/abs/1611.01604.
    • Adams Wei Yu, Hongrae Lee, and Quoc V. Le. Learning to skim text. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pp. 1880–1890, 2017.
    • Yang Yu, Wei Zhang, Kazi Saidul Hasan, Mo Yu, Bing Xiang, and Bowen Zhou. End-to-end reading comprehension with dynamic answer chunk ranking. CoRR, abs/1610.09996, 2016. URL http://arxiv.org/abs/1610.09996.
    • Junbei Zhang, Xiao-Dan Zhu, Qian Chen, Li-Rong Dai, Si Wei, and Hui Jiang. Exploring question understanding and adaptation in neural-network-based question answering. CoRR, abs/1703.04617, 2017.
    • Xiang Zhang, Junbo Jake Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems 28, pp. 649–657, 2015.
    • Qingyu Zhou, Nan Yang, Furu Wei, Chuanqi Tan, Hangbo Bao, and Ming Zhou. Neural question generation from text: A preliminary study. CoRR, abs/1704.01792, 2017. URL http://arxiv.org/abs/1704.01792.