William Lewis, Microsoft Redmond

Title: Collecting, Annotating and Repurposing Data for Under-Resourced Languages through Structural Projections
Until fairly recently, the bulk of Natural Language Processing work was done on a very small subset of the world's languages, leaving the vast majority of languages without substantive resources or tools. In this talk, I discuss our recent work on building a corpus of data for a large number of under-resourced languages, and how we have enriched this corpus through structural projections. The broad focus of this talk will be on the projection and enrichment algorithms we used and the utility of the resulting data to NLP. More narrowly, I will drill down into our recent work on automatically detecting and repairing divergent structural patterns, and its potential uses in low-resource Machine Translation.
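As a hedged illustration of what structural projection can look like, here is a minimal sketch of one common textbook form: projecting dependency heads from a source sentence onto a target sentence through a word alignment, under a direct-correspondence assumption. Points where the alignment is not one-to-one, or where a target word is unaligned, are reported rather than guessed, since they are one simple signal of a structural divergence. The function name `project_dependencies`, the `ROOT = -1` convention, and the divergence messages are hypothetical; this is not claimed to be the specific algorithm used in the work described.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

ROOT = -1  # hypothetical convention: a head index of -1 marks the root


def project_dependencies(
    src_heads: List[int],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> Tuple[List[Optional[int]], List[str]]:
    """Project source dependency heads onto target tokens via an alignment.

    src_heads[i] is the index of the head of source token i (ROOT for the root).
    alignment is a list of (source_index, target_index) links.
    Only edges whose dependent and head are both aligned one-to-one are
    projected; everything else is reported as a possible divergence.
    """
    s2t: Dict[int, Set[int]] = defaultdict(set)
    t2s: Dict[int, Set[int]] = defaultdict(set)
    for s, t in alignment:
        s2t[s].add(t)
        t2s[t].add(s)

    def one_to_one(s: int) -> Optional[int]:
        # Return the single target token linked to s, if the link is 1-to-1.
        if len(s2t[s]) == 1:
            (t,) = s2t[s]
            if len(t2s[t]) == 1:
                return t
        return None

    tgt_heads: List[Optional[int]] = [None] * tgt_len
    divergences: List[str] = []
    for dep, head in enumerate(src_heads):
        t_dep = one_to_one(dep)
        if t_dep is None:
            divergences.append(f"source token {dep} is not aligned one-to-one")
            continue
        if head == ROOT:
            tgt_heads[t_dep] = ROOT
            continue
        t_head = one_to_one(head)
        if t_head is None:
            divergences.append(f"head of source token {dep} is not aligned one-to-one")
            continue
        tgt_heads[t_dep] = t_head

    # Target tokens with no source link cannot receive a projected head.
    for t in range(tgt_len):
        if not t2s[t]:
            divergences.append(f"target token {t} is unaligned")
    return tgt_heads, divergences


if __name__ == "__main__":
    # "the dog barks": the -> dog, dog -> barks, barks = root;
    # target word order is reversed (barks, the, dog) in this toy example.
    heads, div = project_dependencies([1, 2, ROOT], [(0, 1), (1, 2), (2, 0)], 3)
    print(heads, div)  # [-1, 2, 0] []
```

Keeping only one-to-one edges is deliberately conservative: many-to-one links (for example, a determiner and noun both aligned to a single inflected target word) are exactly the cases a divergence-repair step would need to handle, so the sketch surfaces them instead of projecting a possibly wrong head.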