Learning to Compute Word Embeddings On the Fly
arXiv preprint arXiv:1706.00286, 2018.
Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the ``long tail'' of this distribution requires enormous amounts of data. Representations of rare words trained directly on end tasks are usually poor, requiring us to pre-train embedding…