We want to learn all languages! Applications of translation alignment in digital environments

Date
25 June 2021, 5.00pm - 6.15pm
Type
Seminar
Description

Chiara Palladino, Furman

Tariq Yousef, Leipzig

Live online at: https://youtu.be/R2Ms6yAMZss

All welcome


In this seminar, we will introduce the topic of translation technologies, with particular regard to text and translation alignment, one of the most important and complex tasks in natural language processing (NLP). We will then present Ugarit (http://ugarit.ialigner.com/), a web-based tool for manual and automatic alignment of parallel corpora. Conceived as a Citizen Science project to collect training data for the implementation of statistical machine translation of Ancient Greek, Persian, and English, Ugarit has become one of the most widely used digital environments for the manual alignment of texts in underrepresented and historical languages. Ugarit currently hosts corpora in 43 languages and has been widely used in scholarly projects for the study of Armenian, Persian, Arabic, Ancient Greek, Latin, Portuguese, and Egyptian. It has also been applied successfully in language teaching, where systematic comparison with translations provides the scaffolding for a direct approach to original texts.


While most translation technologies cover only modern, widely indexed languages like English, Ugarit introduces a new way of working with languages that is based on manual alignment between parallel texts. With the systematic support of translations in a known language, users can create datasets of aligned pairs to support their own language learning, to study the reception of a particular text, or to provide alignments for other readers. Users also contribute training data for the implementation of statistical machine translation for underrepresented languages, which has been tested for Persian and Ancient Greek. The Ugarit database can also be visualized and queried to investigate relationships across languages that have not been directly aligned. Using the underlying graph database, we can visualize connections between words in two different languages through a third language with which both have been aligned, which serves as a bridge. The same data supports the investigation of broader phenomena such as word frequency across languages and common tendencies in translations.
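
The bridging idea can be illustrated with a minimal Python sketch. It is not Ugarit's implementation: the function name `bridge_pairs`, the in-memory lists standing in for the graph database, and the transliterated word pairs are all assumptions made for illustration. Two words from different languages are treated as candidate matches whenever both have been aligned to the same word in the bridge language.

```python
from collections import defaultdict

# Hypothetical manual alignments, each pair being (word, bridge-language word).
# The words are transliterated and invented for illustration; they are not Ugarit data.
greek_english = [("logos", "word"), ("logos", "reason"), ("theos", "god")]
persian_english = [("sokhan", "word"), ("kherad", "reason"), ("khoda", "god")]


def bridge_pairs(left_to_pivot, right_to_pivot):
    """Link words of two languages that share an aligned word in the bridge language."""
    # Index the right-hand language by bridge word for quick lookup.
    pivot_to_right = defaultdict(set)
    for right_word, pivot_word in right_to_pivot:
        pivot_to_right[pivot_word].add(right_word)

    # For each left-hand word, collect every right-hand word reachable via the bridge.
    links = defaultdict(set)
    for left_word, pivot_word in left_to_pivot:
        links[left_word] |= pivot_to_right.get(pivot_word, set())
    return dict(links)


candidates = bridge_pairs(greek_english, persian_english)
for greek_word, persian_words in candidates.items():
    print(greek_word, "->", sorted(persian_words))
```

The links produced this way are only candidates: a bridge word with several senses can connect words that are not true translations of each other, so they are a starting point for investigation and not a finished alignment.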

Contact

Valerie James
valerie.james@sas.ac.uk
020 7862 8716