A Mutual Information Maximization Perspective of Language Representation Learning
Abstract:
We show that state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence). Our formulation provides an alternative perspective that unifies classical word embedding models (e.g., Skip-gram) and modern contextual embeddings (e.g., BERT, XLNet).
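The lower bound referred to in the abstract is of the InfoNCE family: a critic scores paired (positive) and unpaired (negative) examples, and the resulting contrastive objective bounds the mutual information from below by at most log K, where K is the number of candidates. A minimal numerical sketch (the critic scores here are synthetic and purely illustrative, not the paper's actual models):

```python
import numpy as np

def infonce_lower_bound(scores):
    """InfoNCE estimate of a mutual-information lower bound.

    scores: (K, K) array where scores[i, j] is a critic value f(x_i, y_j);
    diagonal entries correspond to true (positive) pairs.
    Returns mean log-softmax of the positives plus log K.
    """
    K = scores.shape[0]
    # Row-wise log-softmax: log-probability of picking the positive
    # among the K candidates, computed stably.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return log_probs.diagonal().mean() + np.log(K)

rng = np.random.default_rng(0)
K = 8
# Synthetic critic: paired examples score higher (diagonal boost),
# mimicking a trained critic on correlated (x, y).
scores = rng.normal(size=(K, K)) + 5.0 * np.eye(K)
bound = infonce_lower_bound(scores)
# The InfoNCE estimate can never exceed log K, regardless of the critic.
assert bound <= np.log(K) + 1e-9
```

Because the log-softmax term is never positive, the estimate is capped at log K; this is why contrastive objectives need many negatives to certify large mutual information.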