Kyriaki Giannikou, Navigating Digital Frontiers: Unveiling Formulaicity in Byzantine Book Epigrams


Byzantine book epigrams, featuring as paratexts in manuscript margins, seamlessly intertwine poetic expression with practical details, illuminating aspects such as the manuscripts’ patrons and the identities of the scribes involved in transcription. Although deeply rooted in traditional book production practices and very formulaic in nature, these epigrams present noteworthy linguistic variation. While their formulaicity has been acknowledged, a thorough exploration of the formulaic sequences present in the Database of Byzantine Book Epigrams (DBBE) or similar corpora remains a gap in current research. My research, to be conducted on the well-established DBBE corpus, acts as a bridge between linguistic research on formulas inherent in everyday speech and those studied within the context of oral poetry.

This interdisciplinary project adopts a corpus-driven approach and seeks to combine close reading with digital methods for navigating a vast corpus of Byzantine book epigrams. It addresses the challenge of identifying formulaic constructions (i.e. pairings of form and meaning in the context of Construction Grammar) that function as “verse building blocks”, together with their variation, within a historical linguistic corpus that combines poetic expression and practical information. However, the digital journey of pattern identification encounters challenges arising from the inherent complexities of Greek – from flexible syntax to extensive morphological variety – compounded by great linguistic variation across registers, ranging from Homeric and classicizing Greek to medieval forms interwoven with vernacular elements. The absence of critical texts for numerous epigrams further complicates matters: it preserves the idiosyncrasies of original scribal choices on the one hand, but impedes uniformization for digital analysis on the other.

This presentation illuminates the challenges of working on Byzantine paratextual material in a Digital Humanities project that seeks to unravel the linguistic nuances of Byzantine book epigrams, and reflects a commitment to a deeper understanding of the complexities at the intersection of Byzantine literature and Digital Humanities.

Practical information

This lecture will be given at the international workshop ‘The Impact of Digital Methods and Approaches on Ancient Studies Research’ (13-14 May 2024, Berlin).

Date & time: Monday 13 May 2024, 4:40 pm

Location: Freie Universität Berlin (Hittorfstraße 18, 14195 Berlin)


More information about this workshop and the full programme can be found here.

Eleonora Lauro, Alongside the Text: Byzantine Metrical Paratexts in Gospel Manuscripts from Medieval Southern Italy


This paper aims to investigate the relationship between Byzantine metrical paratexts, also known as book epigrams, and the biblical text found in Gospel-Books and Lectionaries from medieval Southern Italy.

In the field of New Testament textual scholarship, recent years have witnessed an increased interest in aspects of manuscripts that extend beyond their textual content. Scholars now recognize that insights gained from studying scribal corrections and paratextual features help us to understand how texts were transmitted and received in historical contexts (Lanier-Han, 2021).

Despite this considerable shift in New Testament studies, Byzantine book epigrams and their affiliation to the biblical text remain an intriguing and less-explored domain. These paratexts represent an interesting research object for philologists and historians studying the manuscript tradition of the Greek New Testament. Often copied alongside the main text, book epigrams can help to establish genealogies between manuscripts. Moreover, they offer relevant information on the communities writing and reading these books.

Specifically, my research will consider the following questions:

  1. What kind of book epigrams can be found in Gospel-Books and Lectionaries produced in medieval Southern Italy? Are they original compositions or just conventional formulas?
  2. Do the metrical paratexts reveal specific regional and cultural influences? And how do they differ from Gospel-Books and Lectionaries from other regions?
  3. Are there thematic correlations between book epigrams and biblical text?
  4. Which reading strategies do the book epigrams prescribe?
  5. What is the relation between the chain of transmission of the metrical paratexts and that of the main texts?

This study will focus on a corpus of Byzantine book epigrams found in a selected group of Gospel-Books and Lectionaries produced in Southern Italy (10th-13th century). The combination of cultural exploration and examination of textual and extratextual features presents a model for integrating various disciplines to enrich our understanding of New Testament manuscript tradition.

Practical information

This lecture will be given at the international CSNTM Text & Manuscript Conference ‘Intersection. Interdisciplinary Approaches to New Testament Text and Manuscript Studies’, organised by the Center for the Study of New Testament Manuscripts in Plano (Texas).

The “Intersection” theme aims to explore how the many disciplines of the study of ancient Christian documents (paleography, art history, exegesis, paratext, linguistics, conservation, etc.) collaborate to help us better understand their content.

Date & time: Thursday 30 May 2024, 10:55 pm

Location: The Marriott at Legacy Town Center (7121 Bishop Rd Plano, TX 75024)


More information about this conference and the full programme can be found here.

Data-driven Approaches to Ancient Languages (DAAL)

On Thursday 27 June 2024, the Database of Byzantine Book Epigrams project (DBBE) is organising a workshop on Data-driven Approaches to Ancient Languages (DAAL) in Ghent, Belgium. This workshop will follow immediately after the conference “Paratexts in Premodern Writing Cultures”.

Premodern or historically attested languages are invaluable resources for the study of both diachronic linguistics and their contemporary culture. Although these languages may belong to different language families or use different scripts, researchers face common challenges, among them illegible or lost text (or parts of it), non-existent gold standards and, especially pressing these days, scarcity of data. Fortunately, more and more texts are becoming available, but the language of those texts might be so different from its modern counterpart, should one exist, that it considerably impacts the performance of existing tools. This workshop aims to provide a platform for a broad field of researchers engaged in digital approaches to premodern languages.


For all further information, please visit the conference website:
For any additional questions you may have, please contact the organisers at

Maxime Deforche, Ilse De Vos, Antoon Bronselaer & Guy De Tré, An Orthographic Similarity Measure for Graph-Based Text Representations

This presentation will be given at the Dutch-Belgian DataBase Day (DBDBD), a yearly one-day workshop on database research, organized at a Belgian or Dutch university. DBDBD 2023 will be held in Ghent, Belgium.

At DBDBD 2023, junior and senior researchers from the Netherlands and Belgium can present their recent results, and meet fellow researchers in the field of data management. It is an excellent opportunity to meet up with your Belgian/Dutch colleagues, and to get informed about the (recent) database-related research performed in Belgian/Dutch universities. The workshop welcomes non-Belgian/Dutch participants (presentations are in English). DBDBD has a tradition of favouring presentations by junior researchers.

Practical information

Date & time: Thursday 21 December 2023, 10:30 am

Location: Technicum (building T2) (Sint-Pietersnieuwstraat 41, 9000 Gent)

More information about this workshop and the full programme can be found here.

Crash Course in Greek Palaeography

The Leiden University Centre for the Arts in Society, Leiden University Library and the Greek department of Ghent University offer a two-day course in Greek palaeography in collaboration with the Research School OIKOS. The course is intended for MA, ResMA and doctoral students in the areas of Classics, Ancient History, Ancient Civilizations and Medieval studies with a good command of Greek. It offers a chronological introduction to Greek palaeography from the Hellenistic period until the end of the Middle Ages and is specifically aimed at acquiring practical skills for research involving literary and documentary papyri and/or manuscripts. The course offers a unique opportunity to practise reading original papyri and manuscripts from the collection of the Leiden Papyrological Institute and the special collections of the Leiden University Library.


The course is set up as an intensive two-day seminar. Five lectures by specialists in the field will give a chronological overview of the development of Greek handwriting, each followed by a practice session reading relevant extracts from papyri and manuscripts in smaller groups under the supervision of young researchers.

Monday, May 27

  • 10:00 Introduction
  • 10:15-11:15 Papyri of the Ptolemaic and Roman period (3rd cent. BCE – 3rd cent. CE) (Dr. Joanne Stolk)
  • 11:15-12:30 Practice with papyri of the Ptolemaic and Roman period
  • 12:30-13:30 Lunch break
  • 13:30-14:30 Papyri of the Byzantine period (4th-8th centuries) (Dr. Yasmine Amory)
  • 14:30-15:45 Practice with papyri of the Byzantine period
  • 15:45-16:15 Coffee break
  • 16:15-17:00 Presentation of Greek manuscripts from the Leiden University Library
  • 17:00-17:45 Presentation of Greek papyri from the Leiden Papyrological Institute
  • 19:00 Dinner


Tuesday, May 28

  • 9:00-10:00 Majuscule and early minuscule bookhands (4th-9th centuries) (Dr. Rachele Ricceri)
  • 10:00-11:15 Practice with majuscule and early minuscule bookhands
  • 11:15-11:45 Coffee break
  • 11:45-12:45 The development of minuscule script (10th-12th centuries) (Prof. dr. Floris Bernard)
  • 12:45-13:45 Lunch break
  • 13:45-15:00 Practice with minuscule script of the 10th-12th centuries
  • 15:00-15:30 Coffee break
  • 15:30-16:30 Manuscripts and scholars of the Palaeologan period (13th-15th centuries) (Prof. dr. Andrea Cuomo)
  • 16:30-17:45 Practice with manuscripts of the Palaeologan period

Practical information

The study load is the equivalent of 2 ECTS (2×28 hours). Participants will be asked to read up on secondary literature in preparation for the seminar (distributed several weeks before the course). Extra material will be handed out during the course in order to continue to improve your reading skills afterwards.

There are no fees for participation in this course. Lunches on both days and dinner on the first day are provided free of charge. Travel costs and accommodation in Leiden are at your own expense.


Please register by sending an e-mail with a short motivation (ca. 300 words, including your background, research interests and why you would like to follow this course) to Priority is given to OIKOS doctoral students and those who did not have the opportunity to follow course(s) on palaeography before. Registration closes by the final deadline of February 15th, 2024. Successful applicants will be notified soon afterwards.

Rachele Ricceri, The Database of Byzantine Book Epigrams: Getting People In and Out Again

This lecture will be given at the PROSOPON Workshop ‘Entangled Prosopographies: Connecting the “Prosopographies of the Later Roman and Byzantine Worlds” Across the Eastern Mediterranean and Beyond’ (The University of Edinburgh, 8-9 December 2023). It is part of Round Table 2: ‘Archives and Manuscripts’.

The workshop brings together a large number of current prosopographical research projects with a focus from the late antique to the late Byzantine periods and is dedicated to exploring ways of going forward, connecting projects and researchers. It offers ample opportunity to discuss the methods and practices of prosopographical research, to learn from each other, and develop closer ties of cooperation.

Practical information

Date & time: Friday 8 December 2023, 1:30 pm

Location: Meadows Lecture Theatre, Old Medical School, Doorway 4 (Teviot Place, Edinburgh)

More information about this conference and the full programme can be found here.

Paratexts in Premodern Writing Cultures

The Database of Byzantine Book Epigrams project (DBBE) will organise a conference on “Paratexts in Premodern Writing Cultures”, which will take place in Ghent on 24-26 June 2024. 

With this conference we aim to bring together scholars engaged in the exploration of premodern paratexts transmitted in a variety of languages (such as Arabic, Armenian, Greek, Coptic, Hebrew, Latin, Slavonic, Syriac). It is our aim to discuss the nature of paratextuality in medieval manuscripts, to reveal similarities and peculiarities of paratexts across language borders, and to understand the broader cultural and historical ramifications of paratexts. We are interested both in the textual evidence of medieval paratexts and in their material transmission.


For all further information, please visit the conference website:
For any additional questions you may have, please contact the organisers at

Colin Swaelens, Ilse De Vos and Els Lefever, DBBErt: Part-of-Speech Tagging of Pre-Modern Greek Text


This contribution presents DBBErt, a machine-learning approach to linguistic annotation for pre-Modern Greek, which provides a part-of-speech and fine-grained morphological analysis of Greek tokens. To this end, transformer-based language models were built on both pre-Modern and Modern Greek text and further fine-tuned on annotated treebanks. The experimental results look very promising on a gold standard of Byzantine book epigrams, with an F-score of 83% for coarse-grained part-of-speech-tagging and of 69% for fine-grained morphological analysis. The resulting pipeline and models will be added to the CLARIN infrastructure to stimulate further research in NLP for Ancient and Medieval Greek.

Practical information

This poster will be presented at the CLARIN Annual Conference 2023.

Date & time: to be confirmed

Location: Irish College Leuven (Janseniusstraat 1, Leuven, Belgium)

Colin Swaelens, Ilse De Vos and Els Lefever, Annotation pipeline for unedited Byzantine Greek


The Database of Byzantine Book Epigrams or DBBE (Ricceri et al. 2023) contains over 12,000 epigrams. They are stored both as occurrences – the epigrams exactly as they occur in the manuscripts – and as types – their orthographically normalised counterparts. The decision to link multiple occurrences to a single type was pragmatic as well as conceptual. Creating fewer types not only freed up time to trace new occurrences, it was also a straightforward way to group similar occurrences. Soon, however, this all-or-nothing system ran up against its limitations: What exactly does “similar” mean? How “similar” do occurrences need to be for them to be put under the same type? In order to add linguistic information enabling more advanced similarity detection and visualisation, we developed the first morphological analyzer for non-normalised Byzantine Greek.
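The question of how “similar” two occurrences are can be made concrete with a small sketch. The Python below is only an illustration and not the method used in the DBBE: it assumes a crude normalisation (lowercasing, stripping diacritics), a character-level similarity ratio from the standard library, and an arbitrary threshold of 0.8. The function names `normalise`, `similarity` and `group_occurrences` are hypothetical, and the spelling variants in the usage note are invented for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lowercase and strip diacritics so spelling variants compare more easily."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat word-final sigma and medial sigma as the same letter.
    return stripped.replace("ς", "σ")


def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] between two normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def group_occurrences(occurrences: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedily attach each occurrence to the first group whose first member is similar enough."""
    groups: list[list[str]] = []
    for occ in occurrences:
        for group in groups:
            if similarity(occ, group[0]) >= threshold:
                group.append(occ)
                break
        else:
            # No existing group was close enough: start a new one.
            groups.append([occ])
    return groups
```

Calling `group_occurrences(["ὥσπερ ξένοι χαίρουσιν", "ὥσπερ ξένοι χαίρουσι", "τέλος"])` would place the first two strings in one group and the third in its own. A fixed threshold of this kind reproduces the all-or-nothing behaviour described above, which is one reason why richer linguistic information is attractive.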

To develop a part-of-speech tagger for Ancient and Byzantine Greek, we first compared three different transformer-based language models with embedding representations: BERT (Devlin 2018), ELECTRA (Clark 2020), and RoBERTa (Liu 2019). To train these models, two data sets were compiled: one consisting of all Ancient and Byzantine Greek text corpora that are available online, and that same set complemented with the Modern Greek Wikipedia data. This allowed us to ascertain whether or not Modern Greek contributes to the modelling of Byzantine Greek.

For the supervised task of fine-grained part-of-speech tagging, we compiled a training set based on existing treebanks and complemented it with a small set of 2,000 manually annotated tokens from DBBE occurrences. To train the part-of-speech tagger, we made use of the FLAIR framework (Akbik et al. 2019), where the contextual token embeddings from DBBErt were stacked with randomly initialised character embeddings. These were processed by a bi-LSTM encoder (hidden size of 256) and a CRF decoder. For evaluation, a gold standard containing 10,000 tokens of non-normalised Byzantine Greek epigrams out of the DBBE corpus was compiled, manually annotated and validated through an inter-annotator agreement study.

The experimental results look very promising, with the BERT model trained on all Greek data achieving the best performance both for assigning the part-of-speech (82.76%) and for full-fledged morphological analysis (68.75%). A comparison with the RNN Tagger (Schmid 2019) revealed that our tagger outperforms the latter by almost 4% on the DBBE gold standard.
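Why the score for full morphological analysis is lower than the score for the part of speech alone can be shown with a toy example. The sketch below is not the evaluation code of the study: it uses plain token-level accuracy as a stand-in metric, and the tag format (`POS|feature|feature…`), the helper names and the four example tags are all assumptions made up for illustration.

```python
def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Share of tokens whose predicted label equals the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def coarse(tag: str) -> str:
    """Keep only the part of speech, i.e. the text before the first '|'."""
    return tag.split("|", 1)[0]


# Hypothetical gold and predicted tags for four tokens.
gold = ["NOUN|nom|sg|masc", "VERB|pres|ind|3|pl", "NOUN|acc|sg|fem", "ADV"]
pred = ["NOUN|nom|sg|masc", "VERB|pres|ind|3|sg", "NOUN|acc|sg|fem", "ADJ"]

# Coarse-grained: only the part of speech has to match.
pos_accuracy = accuracy([coarse(t) for t in gold], [coarse(t) for t in pred])
# Fine-grained: the complete tag, including every morphological feature, has to match.
full_accuracy = accuracy(gold, pred)
```

In this toy data the second token has the right part of speech but the wrong number, so it counts as correct for the coarse score and as wrong for the fine-grained one. Any fine-grained score computed this way can be at most as high as the coarse one.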

Practical information

This poster will be presented at The 33rd Meeting of Computational Linguistics in The Netherlands.

Date & time: Friday 22 September 2023, 12:10 pm

Location: building R of the University of Antwerp (Rodestraat 14, Antwerp, Belgium)

Colin Swaelens, Ilse De Vos and Els Lefever, Evaluating Existing Lemmatizers on Unedited Byzantine Greek Poetry


This paper reports on the results of a comparative evaluation of four existing lemmatizers, all pre-trained on Ancient Greek texts, on a novel corpus of unedited Byzantine Greek texts. The aim of this study is to gain insight into the pitfalls of existing lemmatisation approaches as well as the specific challenges of our Byzantine Greek corpus, in order to develop a new lemmatizer that can cope with its peculiarities. The results of the experiment show an accuracy drop of 20% on our corpus, which is further investigated in a qualitative error analysis.
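A comparison of this kind can be sketched as follows. This is a minimal illustration, not the setup of the paper: a lemmatizer is assumed to be any function from a token to a lemma, the toy lookup lemmatizer stands in for a real tool, and the word forms (including the invented non-normalised spelling γράφι) are made up for the example.

```python
from typing import Callable

Lemmatizer = Callable[[str], str]
GoldPairs = list[tuple[str, str]]  # (token, gold lemma)


def evaluate(lemmatizer: Lemmatizer, gold: GoldPairs) -> tuple[float, list[tuple[str, str, str]]]:
    """Return the accuracy and the list of errors as (token, gold lemma, predicted lemma)."""
    if not gold:
        raise ValueError("gold standard must not be empty")
    errors = []
    for token, lemma in gold:
        predicted = lemmatizer(token)
        if predicted != lemma:
            errors.append((token, lemma, predicted))
    return 1 - len(errors) / len(gold), errors


# Toy stand-in for a lemmatizer trained on edited, normalised text.
LOOKUP = {"λόγου": "λόγος", "λόγον": "λόγος", "γράφει": "γράφω"}


def toy_lemmatizer(token: str) -> str:
    # Unknown forms are returned unchanged.
    return LOOKUP.get(token, token)


edited = [("λόγου", "λόγος"), ("γράφει", "γράφω")]
unedited = [("λόγου", "λόγος"), ("γράφι", "γράφω")]  # invented spelling variant

edited_accuracy, _ = evaluate(toy_lemmatizer, edited)
unedited_accuracy, unedited_errors = evaluate(toy_lemmatizer, unedited)
accuracy_drop = edited_accuracy - unedited_accuracy
```

Keeping the individual errors alongside the score is what makes a qualitative error analysis possible: here the only error is the form with the non-standard spelling, which the toy lemmatizer has never seen.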

Practical information

This poster will be presented at the international conference Recent Advances in Natural Language Processing 2023.

Date & time: Friday 8 September 2023, 12:00 pm

Location: Hotel “Cherno More” (bul. “Slivnitsa” 33, Varna, Bulgaria)