Leveraging non-named entities in Coptic antiquity

10 September 2021, 5.00pm - 6.15pm

Amir Zeldes, Georgetown

Caroline Schroeder, Oklahoma

Lance Martin, CUA

In this paper we present the latest work on large-scale, semi-automatic and quantitative analysis of the body of entities mentioned in texts from Coptic Antiquity. Unlike Greek and Roman materials, which have been studied extensively, digital treatment of Coptic data from the first millennium lagged behind until recently, in part due to the smaller research community and the morphological complexity of the language, which is fusional and features agglutination, compounding and incorporation of nouns into complex verbs.

We show how annotating named and non-named entities enriches Coptic corpora, including the identification of nested entities and entity linking. We will focus especially on non-named Coptic entities, which are of great interest to scholars working on monasticism and asceticism, since many central texts revolve around unnamed protagonists (e.g. ‘an ascetic’) in unidentified locations (‘a monastery’). In many cases, the proportion of named entities is well below 5%.

The relevance of entity annotation is further demonstrated through visualizations of Coptic entities, which enable researchers to access a variety of information in Coptic corpora through distant reading: to easily explore types of places in different works, to get lists of nouns referring to organizations, events or animals, to examine feminine vs. masculine nouns denoting people, and more. From a comparative quantitative perspective, the prevalence of non-named entities of different types can also reveal dissimilarities between texts.


