Towards Controllable Biases in Language Generation

EMNLP(2020)

引用 120|浏览442
暂无评分
摘要
We present a general approach towards controllable societal biases in natural language generation (NLG). Building upon the idea of adversarial triggers, we develop a method to induce or avoid biases in generated text containing mentions of specified demographic groups. We then analyze two scenarios: 1) inducing biases for one demographic and avoiding biases for another, and 2) mitigating biases between demographic pairs (e.g., man and woman). The former scenario gives us a tool for detecting the types of biases present in the model, and the latter is useful for mitigating biases in downstream applications (e.g., dialogue generation). Specifically, our approach facilitates more explainable biases by allowing us to 1) use the relative effectiveness of inducing biases for different demographics as a new dimension for bias evaluation, and 2) discover topics that correspond to demographic inequalities in generated text. Furthermore, our mitigation experiments exemplify our technique's effectiveness at equalizing the amount of biases across demographics while simultaneously generating less negatively biased text overall.
更多
查看译文
关键词
language generation,controllable biases
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要