OpenFST library extension for String-to-Dependency Statistical Machine Translation

user-5ebe282a4c775eda72abcdce(2014)

Abstract
In [3], part-of-speech (POS) and word-dependency information, obtained from a parser, is added to the target side of the rules of a hiero translation grammar as additional features to improve translation quality. When a set of translation hypotheses is represented as a finite-state transducer (FST), as in HiFST [1], arc weights are obtained from the cost of each grammar rule applied during decoding; other components, such as the language model (LM) score of a hypothesis, can also contribute to the final weights. FSTs provide a compact representation of the translation space, so that operations on partial hypotheses, such as concatenation and pruning, can be applied efficiently by means of standard FST operations. The library described here encodes POS and word-dependency information using FST string weights. Although such weights do not in general form a valid semiring, constraints can be placed on the FST topology so that the weight class, together with the necessary (extended) binary operations (times, plus, etc.), has the semiring properties, allowing the above operations to be applied. The following sections review the basic concepts of the string-to-dependency algorithm, present an extended hiero grammar that handles the dependency information, and give examples of grammar rule application in terms of operations on FSTs.
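To make the semiring remark concrete, the following is a minimal Python sketch of a string-valued weight in the spirit of a "restricted" string weight. It is not the library's actual weight class or API: the names `StringWeight`, `times` and `plus`, and the POS tags `DT` and `NN`, are hypothetical and chosen only for illustration. The sketch assumes that times concatenates label sequences along a path, and that plus is only defined when the two alternatives carry the same sequence, which is the kind of condition a constrained FST topology would have to guarantee.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StringWeight:
    """Hypothetical string weight: a tuple of labels, or None for Zero."""
    labels: Optional[Tuple[str, ...]]

    @staticmethod
    def zero() -> "StringWeight":
        # Additive identity: weight of a non-existent path.
        return StringWeight(None)

    @staticmethod
    def one() -> "StringWeight":
        # Multiplicative identity: the empty label sequence.
        return StringWeight(())


def times(a: StringWeight, b: StringWeight) -> StringWeight:
    """Extend a path: concatenate label sequences; Zero annihilates."""
    if a.labels is None or b.labels is None:
        return StringWeight.zero()
    return StringWeight(a.labels + b.labels)


def plus(a: StringWeight, b: StringWeight) -> StringWeight:
    """Merge alternative paths: only defined when both carry the same labels.

    Assumption of this sketch: the FST topology ensures that paths merged
    at a state agree on their label sequence, so the error is never hit.
    """
    if a.labels is None:
        return b
    if b.labels is None:
        return a
    if a.labels != b.labels:
        raise ValueError("plus is undefined for different label sequences")
    return a


# Illustrative POS labels (assumed, not taken from the paper).
det = StringWeight(("DT",))
noun = StringWeight(("NN",))
noun_phrase = times(det, noun)
```

Under this restriction, times distributes over plus wherever plus is defined, for instance `times(det, plus(noun, noun)) == plus(times(det, noun), times(det, noun))`. Without the restriction, an arbitrary choice between two different strings would in general break that property.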