In recent decades, a variety of quantitative, statistical, and machine learning tools have been developed to analyse digital corpora. Among these, computational methods for detecting semantic change in diachronic corpora have gained increasing attention. This paper explores how word embeddings can be used to trace semantic change in Latin, with a focus on lexical phenomena driven by the spread of Christianity.
Latin underwent significant lexical transformation under Christianity’s influence, due both to biblical translations from Greek and to the need to express new religious concepts. This makes Christian Latin a particularly rich sociolect to examine with these computational methods.
Building on previous research, this study employs both static embeddings, which assign a single vector representation per word type, and contextual embeddings, which generate a distinct representation for each word token based on its context. The analysis is conducted on a subset of LatinISE, covering texts from 300 bce to 600 ce. By segmenting the corpus into two time frames, one pre-dating Christian Latin texts and one including them, we can detect changes in each word’s vector representation, which in turn signal a possible shift in usage (and potentially in meaning). To extract contextual embeddings, we use a version of Latin BERT fine-tuned on our selected corpus for a more tailored evaluation.
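For static embeddings, the comparison between the two time frames can be illustrated with a minimal sketch. It assumes that a separate embedding space is trained for each period and that the two spaces are aligned with orthogonal Procrustes before cosine distances are computed; this alignment step is a common choice in the literature and is an assumption here, not necessarily the procedure used in this study. The function names (`procrustes_align`, `cosine_distance`, `change_scores`) are hypothetical, and the inputs are expected to be matrices whose rows correspond to the same shared vocabulary in the same order.

```python
import numpy as np


def procrustes_align(source, target):
    """Rotate `source` onto `target` with the best orthogonal map.

    Both matrices have one row per word of a shared vocabulary,
    in the same order. Solves min_R ||source @ R - target||_F
    subject to R being orthogonal.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def change_scores(early, late, vocab):
    """Map each word to the cosine distance between its aligned
    early-period vector and its late-period vector.

    Higher scores suggest a larger shift in usage between periods.
    """
    aligned = procrustes_align(early, late)
    return {w: cosine_distance(aligned[i], late[i]) for i, w in enumerate(vocab)}
```

Words with the highest scores would then be candidates for close reading; a high score indicates a change in distributional behaviour, which may or may not correspond to a change in meaning.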
The results from these models will be compared against each other and against close-reading analysis for a selected set of lexemes. The goals of this study are: (1) to evaluate the comparative strengths of static vs. contextual embeddings, (2) to determine the extent to which embedding models align with philological evidence, and (3) to contribute to broader discussions on integrating digital approaches in Classics and Historical Linguistics.
Advance booking required for in-person attendance.
Streamed live on YouTube at: https://youtu.be/nHxdU4hDA8c