Biography
I previously worked at Apple, Johns Hopkins University (where I also completed my PhD), MIT Lincoln Laboratory, and Rincon Research Corporation on topics including text-to-speech, machine translation (MT), bitext curation and filtering, automatic MT evaluation, multilingual modeling, paraphrasing, cross-language information retrieval, domain adaptation, and digital signal processing.
I developed Vecalign for the ParaCrawl parallel data acquisition project. Vecalign is an accurate sentence alignment algorithm based on multilingual sentence embeddings, and its complexity is linear in the number of sentences being aligned. In conjunction with LASER, Vecalign makes it easy to perform sentence alignment in about 100 languages (i.e., roughly 100^2 language pairs) without the need for a machine translation system or lexicon. At the time of writing, Vecalign has the best reported performance on the test set released with Bleualign.
I also developed Prism, an automatic MT metric that uses a sequence-to-sequence paraphraser to score MT system outputs conditioned on their respective human references. Prism uses a multilingual neural MT model as a zero-shot paraphraser, which eliminates the need for synthetic paraphrase data and yields a single model that works in many languages (we release a model covering 39 languages). At the time of publication, Prism outperformed or statistically tied all metrics submitted to the WMT 2019 metrics shared task in segment-level correlation with human judgments. I also developed bitext filtering code to preprocess the data used to train Prism; the code is general enough to use for any MT training data and has been publicly released.
Publications (12 in total)
CoRR (2023)
CoRR (2023)
Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, ...
arXiv (2022)
International Conference on Spoken Language Translation (IWSLT) (2022): 11-21