Academic Seminar (May 22)
Posted: 2017-05-19

 

Title: Cross-lingual Big Data Curation and Applications: From LIVAC to Patent MT

Speaker: Benjamin K. Tsou (邹嘉彦)

Time: May 22, 10:00

Venue: Room 321, Ligong (Science and Engineering) Building, Tiancizhuang Campus

 

Abstract: Chinese natural language processing has developed rapidly in recent decades. The judicious use of large-scale training data and their cultivation are critical concerns which deserve attention. We shall explore the relevant methodological basis and some applications in the light of LIVAC, developed since 1995 (https://en.wikipedia.org/wiki/LIVAC_Synchronous_Corpus). We shall also explore subsequent developments involving the use of rigorously curated data in the machine translation of Chinese patents, which have seen phenomenal growth in recent years, along with the progression from SMT to NMT and the enhancement of text mining techniques. A comparison of the results of several publicly available C-E MT systems will be attempted, and we shall explore how precision and recall in patent search may be improved through cognitive saliency indices.

 

About the speaker:

Benjamin K. Tsou is Emeritus Chair Professor of Asian Languages and Language Information Sciences at the City University of Hong Kong and an Academician of the Académie Royale des Sciences d'Outre-Mer (Belgium). He began working on the natural language processing of Chinese at the Mechanical Translation Group of the Research Laboratory of Electronics at MIT, and then supervised a Chinese-English MT project at the University of California, Berkeley. Later, as Director of the Research Centre on Language Information Sciences at the City University of Hong Kong, he initiated in 1995 the cultivation of a gigantic Chinese synchronous corpus, LIVAC (https://en.wikipedia.org/wiki/LIVAC_Synchronous_Corpus). Since 2008, he has worked on cultivating a bilingually aligned Chinese-English parallel sentence corpus from more than 300,000 Chinese-English comparable patents (https://catalog.ldc.upenn.edu/LDC2016T22). Following this effort, his team was invited to provide the training corpus and assessment in the Chinese Patent MT Competitions organized by NTCIR in Tokyo in 2009 and 2010. In 2016, he developed a Chinese-to-English patent MT platform with Northeastern University, China, and a separate cognitive-saliency-based search platform for Chinese patents.