Colin Swaelens, Ilse De Vos and Els Lefever, DBBErt: Part-of-Speech Tagging of Pre-Modern Greek Text

Abstract

This contribution presents DBBErt, a machine-learning approach to linguistic annotation for pre-Modern Greek that provides part-of-speech tags and a fine-grained morphological analysis for Greek tokens. To this end, transformer-based language models were built on both pre-Modern and Modern Greek text and then fine-tuned on annotated treebanks. Evaluated on a gold standard of Byzantine book epigrams, the approach reaches an F-score of 83% for coarse-grained part-of-speech tagging and 69% for fine-grained morphological analysis, which we consider promising. The resulting pipeline and models will be added to the CLARIN infrastructure to stimulate further research in NLP for Ancient and Medieval Greek.
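
For readers less familiar with the metric, the following is a minimal sketch of how a token-level F-score could be computed from gold and predicted tags. The tag names, the toy data, and the choice of macro-averaging are illustrative assumptions; the abstract does not state which averaging scheme was used for the reported scores.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over all labels occurring in gold or pred.

    Illustrative only: the averaging scheme is an assumption,
    not necessarily the one behind the reported scores.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct tag
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed

    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy example with coarse-grained tags.
gold = ["NOUN", "VERB", "NOUN", "ADJ"]
pred = ["NOUN", "VERB", "ADJ", "ADJ"]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.3f}")
```

The same function applies to fine-grained morphological labels if each full analysis is treated as a single label, which is one possible reason such scores tend to be lower than coarse-grained ones.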

Practical information

This poster will be presented at the CLARIN Annual Conference 2023.

Date & time: to be confirmed

Location: Irish College Leuven (Janseniusstraat 1, Leuven, Belgium)