Maxime Deforche, An Orthographic Similarity Measure for Graph-based Text Representation

Abstract

Computing the orthographic similarity between words, sentences, paragraphs and texts has become a basic functionality of many text mining and flexible querying systems, and the resulting similarity scores are often used to discover similar text documents. However, when dealing with a corpus that is inherently known for its orthographic inconsistencies and intricate interconnected nature on multiple levels (words, verses and full texts), as is the case with Byzantine book epigrams, this task becomes complex. In this paper, we propose a technique that tackles these two challenges by representing text in a graph and by computing a similarity score between multiple levels of the text, modelled as subgraphs, in a hierarchical manner. The similarity between all words is computed first, followed by the calculation of the similarity between all verses (resp. full texts) by using the previously determined similarity scores between the words (resp. verses). The resulting similarities, on each level, allow for a deeper insight into the interconnected nature of (parts of) text collections, indicating how and to what degree the texts are related to each other.
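To make the hierarchical idea concrete, the following is a minimal sketch and not the measure proposed in the paper. It assumes a generic character-based ratio (Python's `difflib.SequenceMatcher`) as the word-level score and a symmetric best-match average as the aggregation step; both choices, the function names and the transliterated sample verses are illustrative assumptions. Word scores are cached so that the verse and text levels reuse them, mirroring the bottom-up order described above.

```python
from difflib import SequenceMatcher
from functools import lru_cache


@lru_cache(maxsize=None)
def word_similarity(a: str, b: str) -> float:
    """Level 1: character-based similarity between two words (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def aggregate(units_a, units_b, sim) -> float:
    """Symmetric best-match average of lower-level scores (assumed aggregation)."""
    if not units_a or not units_b:
        return 0.0
    a_to_b = sum(max(sim(x, y) for y in units_b) for x in units_a) / len(units_a)
    b_to_a = sum(max(sim(x, y) for x in units_a) for y in units_b) / len(units_b)
    return (a_to_b + b_to_a) / 2


def verse_similarity(verse_a: str, verse_b: str) -> float:
    """Level 2: verses are compared through the similarities of their words."""
    return aggregate(tuple(verse_a.split()), tuple(verse_b.split()), word_similarity)


def text_similarity(text_a, text_b) -> float:
    """Level 3: texts (sequences of verses) are compared through verse scores."""
    return aggregate(tuple(text_a), tuple(text_b), verse_similarity)


# Hypothetical, transliterated sample verses with a spelling variant.
close = verse_similarity("hosper xenoi", "osper xenoi")
far = verse_similarity("hosper xenoi", "telos biblou")
same_text = text_similarity(["hosper xenoi", "telos biblou"],
                            ["hosper xenoi", "telos biblou"])
```

In this sketch a spelling variant lowers the verse score only slightly, while unrelated verses score much lower; the graph representation itself is left out.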

Practical information

This lecture will be presented at the 15th International Conference on Flexible Query Answering Systems.

Date & time: Wednesday 6 September 2023, 12:00 pm

Location: Campus Universitat de les Illes Balears (Carretera de Valldemossa, km 7.5, Palma de Mallorca)

Colin Swaelens, Ilse De Vos & Els Lefever, Medieval Social Media: Manual and Automatic Annotation of Byzantine Greek Marginal Writing

Abstract

In this paper, we present the interim results of a transformer-based annotation pipeline for Ancient and Medieval Greek. As the texts in the Database of Byzantine Book Epigrams have not been normalised, they pose more challenges for manual and automatic annotation than normalised Ancient Greek texts do. As a result, the existing annotation tools perform poorly. We compiled three data sets for the development of an automatic annotation tool and carried out an inter-annotator agreement study, with a promising agreement score. The experimental results show that our part-of-speech tagger yields accuracy scores that are almost 50 percentage points higher than the widely used rule-based system Morpheus. In addition, error analysis revealed problems related to phenomena also occurring in current social media language.

Practical information

This paper will be presented at “The 61st Annual Meeting of the Association for Computational Linguistics” (Toronto, 9-14 July 2023). It is part of “The 17th Linguistic Annotation Workshop”.

Linguistic annotation of natural language corpora is the backbone of supervised methods of statistical natural language processing. The Linguistic Annotation Workshop (LAW) is the annual workshop of the ACL Special Interest Group on Annotation (SIGANN), and it provides a forum for the presentation and discussion of innovative research on all aspects of linguistic annotation, including the creation and evaluation of annotation schemes, methods for automatic and manual annotation, use and evaluation of annotation software and frameworks, representation of linguistic data and annotations, semi-supervised “human in the loop” methods of annotation, crowd-sourcing approaches, and more. As in the past, the LAW will provide a forum for annotation researchers to work towards standardization, best practices, and interoperability of annotation information and software.

Date & time: Thursday 13 July 2023; 09:45 am

Location: Westin Harbour Castle (1 Harbour Square, Toronto)

Workshop on Editorial Practices of Byzantine Texts

During the past few decades, scholars have initiated debates about the methodologies of editing Byzantine texts. Several questions that had not been asked before, especially in relation to the specificity of Byzantine texts and manuscripts, have finally come to the forefront.

The intellectual authorship of a Byzantine text and its physical materialization often overlap and interact with each other. Many manuscripts, if not literally autographs, stand very close to the original version of texts. Sometimes, there is not even one single original, but the different versions are the reflection of authorial drafts or later elaborations. Manuscripts are often nonuniform and unstable, and present a complex and multilayered hierarchy of texts. Also, the changing linguistic reality of the Middle Ages in tension with a strong school tradition of grammar produces texts that invite the interventions of editors.

This workshop gathers together a group of scholars willing to share their reflections and experiences with editing medieval Byzantine texts. The workshop will address these and other similar questions:

  • How should editors deal with punctuation and accentuation? Which are the meaningful practices in manuscripts? And how do these relate to the oral performance and visual layout of texts?
  • How should editors reproduce unconventional orthography, linguistic flexibility and the fluctuation of registers? Which role does “school grammar” play in this respect?
  • Which is the role of literary genres and textual types? How should editions mark intertextuality and parallels? And what about the case of metaphrasis and rewriting?
  • What is the best way to edit texts that depend on other texts, such as commentaries and marginal scholia? And how can editors synoptically display the layers of successive annotations and textual expansions?
  • Why and how should we edit unfinished and preliminary texts, especially when a more accomplished version is preserved? Similarly, how should we treat apographa, especially the late copies of pre-Byzantine texts?

Programme

 

Date: Wednesday 24 May 2023

Location: leslokaal 0.4 (Blandijnberg 2, 9000 Gent)

 

9-9.30: Introduction (Floris Bernard – Julián Bértola)

 

9.30-10.10: “The challenges of editing rhetorical texts” (Antonia Giannouli)

10.10-10.50: “The complexities of editing florilegia” (Alessandra Bucossi)

 

10.50-11.10: Coffee break

 

11.10-11.50: “Editing Andronikos Kallistos’ works: Problems, remarks, solutions” (Luigi Orlandi)

11.50-12.30: “Editing Aristotle’s Organon in 1495: The models for Aldus Manutius’s Editio princeps of the First Analytics” (José Maksimczuk)

 

12.30-14: Lunch break

 

14-14.40: “A liturgical poem on the passion of Christ (BHG 413m) and its editorial challenges” (Maria Tomadaki)

14.40-15.20: “Open traditions: Use and reuse of book epigrams” (Rachele Ricceri)

 

15.20-15.40: Coffee break

 

15.40-16.20: “Between Symeon the Logothete and Theophanes Continuatus: How to edit the intermediary versions (Logothete B)” (Staffan Wahlgren)

16.20-17: “Byzantine linguistic reality and the edition of texts” (Martin Hinterberger)

 

17-17.30: Wrap-up session

Registration

This event is open to anyone interested in attending in person or online (a link will be sent the day before the conference).

To attend the conference, please register here.

Crash Course in Greek Palaeography

The Greek department of Ghent University offers a two-day course in Greek palaeography in collaboration with the Research School OIKOS. The course is intended for MA, ResMA and doctoral students in the areas of Classics, Ancient History, Ancient Civilizations and Medieval studies with a good command of Greek. It offers a chronological introduction into Greek palaeography from the Hellenistic period until the end of the Middle Ages and is specifically aimed at acquiring practical skills for research involving literary and documentary papyri and/or manuscripts. We will also provide the unique opportunity to read from original papyri in the papyrus collection of the Ghent University Library and become familiar with the ongoing research projects at Ghent University.

Programme

The course is set up as an intensive two-day seminar. Five lectures by specialists in the field will give a chronological overview of the development of Greek handwriting, each followed by a practice session reading relevant extracts from papyri and manuscripts in smaller groups under the supervision of young researchers.

 

Monday, May 22

9:30 Welcome with coffee

10:00 Introduction

10:30-11:45 Papyri of the Ptolemaic and Roman period (Dr. Joanne Stolk)

11:45-13:00 Practice with papyri of the Ptolemaic and Roman period

13:00-14:00 Lunch break

14:00-14:30 Presentation of papyri from the collection of the Ghent University Library (Serena Causo)

14:30-15:45 Papyri of the Byzantine period (Dr. Yasmine Amory)

15:45-17:00 Practice papyri of the Byzantine period

19:00 Dinner (optional)

 

Tuesday, May 23

9:00-10:15 Majuscule and early minuscule bookhands (4th-9th centuries) (Dr. Rachele Ricceri)

10:15-11:30 Practice majuscule and early minuscule bookhands

11:30-12:00 Coffee break

12:00-13:15 The development of minuscule script (10th-12th centuries) (Prof. dr. Floris Bernard)

13:15-14:15 Lunch break

14:15-15:30 Practice minuscule script of the 10th-12th centuries

15:30-16:00 Coffee break

16:00-17:15 Manuscripts and scholars of the Palaeologan period (13th-15th centuries) (Prof. dr. Andrea Cuomo)

17:15-18:30 Practice manuscripts of the Palaeologan period

Practical information

The study load is the equivalent of 2 ECTS (2×28 hours). Participants will be asked to read up on secondary literature in preparation for the seminar (distributed several weeks before the course). Extra material will be handed out during the course in order to continue to improve your reading skills afterwards.

There are no fees for participation in this course. Lunches and coffee on both days are provided free of charge. There is an optional dinner on Monday at your own expense. Travel costs and accommodation in Ghent are also at your own expense.

Registration

Please register by sending an e-mail with a short motivation (including your background, research interests and why you would like to follow this course) to yasmine.amory@ugent.be. Priority is given to OIKOS doctoral students and those who did not have the opportunity to follow course(s) on palaeography before. Registration closes by the final deadline of March 1st, 2023. Successful applicants will be notified soon afterwards.

Workshop on Editorial Practices of Byzantine texts

We would like to draw your attention to a scientific workshop which will be organized in tandem with the Crash Course. This one-day workshop will take place in Ghent immediately after the Crash Course (Wednesday May 24th) and will be devoted to editorial practices of Byzantine texts. It is organized by Julián Bértola and Floris Bernard (who are also teachers at the Crash Course). Experts will share experiences and insights concerning critical editions of Byzantine texts and manuscripts. The program will be circulated soon. Crash Course participants are warmly invited to stay one day longer in Ghent and take this opportunity to attend the workshop.

Maxime Deforche, Ilse De Vos & Colin Swaelens, From Umbrellas to Nodes. The Ever-Evolving Database of Byzantine Book Epigrams

Abstract

The Database of Byzantine Book Epigrams (DBBE) at Ghent University contains over 12,000 unique epigrams. They are stored both as occurrences – the epigrams exactly as they occur in the manuscripts – and as types – normalised versions of the occurrences in terms of spelling.

The relationship between occurrences and types is not one-to-one. For example, type 2148 represents 70 two-verse occurrences of the ὥσπερ ξένοι epigram, which was used widely by scribes to mark their joy at having reached the end of the manuscript and thus of their copying task. The decision to link multiple occurrences to a single type was both pragmatic and conceptual. Creating fewer types not only freed up time to trace new occurrences, it was also by far the most straightforward way to group similar occurrences. As such, types became umbrellas.

Soon, however, this all-or-nothing system ran up against its limitations: What exactly does “similar” mean? How “similar” do occurrences need to be for them to be put under the same type? The ὥσπερ ξένοι epigram, for example, circulated in many different versions, some counting three or four verses. To deal with this variety, more and more types were created, each of them covering different subsets of occurrences. To (re)connect these subsets, a complementary system was introduced that allows individual verses to be linked regardless of the type their occurrence belongs to. As for the ὥσπερ ξένοι epigram, no fewer than 202 instances of its first verse are to be found in DBBE.

Although a huge step forward, this system still treats similarity as a dichotomy whereas it clearly is a continuum. Also, it does not allow variation within the more complex lists of “similar” verses to be visualised, nor different parameters, both textual and other, to be taken into account.

A state-of-the-art graph database will offer a versatile and highly visual alternative to the current static representation and rigid treatment of the data, which is inextricably linked to the fact that underlying the user interface is a traditional relational database consisting of tables. A graph database, by contrast, can be modelled to efficiently represent the similarity between all epigrams and verses. Instead of using dedicated pieces of data as umbrellas, similar occurrences can be found by simply retrieving a group of nodes – the building blocks of a graph database – and the relationships between them. Moreover, it can do so based on any kind of criteria available in the graph, including metadata such as author, time, and place.
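As a rough illustration of retrieving similar occurrences as nodes and relationships rather than through umbrella records, consider the minimal in-memory sketch below. It does not reflect the actual DBBE data model or any particular graph database product: the class, the node identifiers, the `century` metadata field and the similarity scores are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SimilarityGraph:
    """Toy graph: nodes carry metadata, edges carry a similarity score."""
    nodes: dict = field(default_factory=dict)  # node id -> metadata dict
    edges: dict = field(default_factory=dict)  # node id -> {neighbour id: score}

    def add_node(self, node_id, **metadata):
        self.nodes[node_id] = metadata
        self.edges.setdefault(node_id, {})

    def add_similarity(self, a, b, score):
        # Similarity is stored as an undirected, weighted relationship.
        self.edges[a][b] = score
        self.edges[b][a] = score

    def similar_to(self, node_id, threshold=0.0, **criteria):
        """Neighbours scoring at least `threshold` whose metadata match `criteria`."""
        hits = [
            (other, score)
            for other, score in self.edges[node_id].items()
            if score >= threshold
            and all(self.nodes[other].get(k) == v for k, v in criteria.items())
        ]
        return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical occurrences and scores, for illustration only.
graph = SimilarityGraph()
graph.add_node("occ_a", century=11)
graph.add_node("occ_b", century=11)
graph.add_node("occ_c", century=14)
graph.add_similarity("occ_a", "occ_b", 0.9)
graph.add_similarity("occ_a", "occ_c", 0.6)

all_similar = graph.similar_to("occ_a", threshold=0.5)
later_only = graph.similar_to("occ_a", threshold=0.5, century=14)
```

Because the score lives on the relationship, the threshold can be varied per query, which treats similarity as a continuum rather than a fixed grouping.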

In order to maximise the benefits of shifting to such a graph database, it is necessary to enrich the existing data. Therefore, a linguistic pipeline is being developed to perform automatic tokenisation, morphological analysis, and lemmatisation of the entire DBBE corpus. These linguistic annotations will push forward the ways in which similarity can be calculated, far beyond the current level of orthography. The results of the experiments carried out so far are highly promising. Does this mean the end for the types? Quite the contrary. We will always need types as readable representatives of occurrences. The less we need them as umbrellas, the more they can be just that.

Practical information

This lecture will be given at the international workshop “Repetition and Ritual, Text and Edition, Challenges and Solutions” (Austrian Academy of Sciences, 24-25 November 2022). The workshop is organised by Eirini Afentoulidou in the framework of the project “Female Identities at a Liminal State: An Analysis of Childbed Prayers in Byzantine Prayerbooks”.

Date & time: Friday 25 November 2022, 9:30 am

Location: Austrian Academy of Sciences, Institute for Medieval Research (Hollandstraße 11-13, 1020 Vienna) & Zoom (pre-registration is mandatory for the online event; please contact: ekaterini.mitsiou@oeaw.ac.at)

 

Rachele Ricceri, ‘Text and Image, Text as Image: The Beauty of the Book in Byzantine Book Epigrams’

Abstract

Book epigrams, or metrical paratexts, abound in Byzantine manuscripts. These compositions are the joining link between verse inscriptions, written on any kind of support, and manuscript anthologies, which transmit literary epigrams. Byzantine book epigrams have been collected in an online Database (DBBE, www.dbbe.ugent.be), hosted by Ghent University, with the scope of gathering and making available a large corpus of metrical paratexts dating up to the 15th century.

This paper offers some reflections on the aesthetics of books as presented in book epigrams. In the first part of the lecture I will present some epigrams that clearly refer to the physical or spiritual beauty of the book in which they are inscribed.

Moreover, I will discuss some book epigrams potentially dealing with images in their double function of pieces of poetry and of “objects” themselves. Firstly, metrical captions frequently explain, comment upon and enhance the presence of manuscript miniatures. These captions are often clustered in cycles that appear in one or more manuscripts featuring similar miniatures. Secondly, epigrams can also replace miniatures and perform a peculiar visual function. Book epigrams can be placed where manuscript miniatures might be expected to be found and describe miniatures that are actually not present in the manuscript.

The relationship between text and image in book epigrams is a bidirectional one. This fluid interrelation makes metrical paratexts a particularly suitable corpus to investigate how words and images coexist on the manuscript folio.

Practical information

This lecture will be given at the international conference “Versus ad picturas. Text/Image Relationship in Greek, Latin and Arabic poetry between Late Antiquity and the Middle Ages” (University of Strasbourg, 28-30 September 2022). It is part of the session “Culture grecque. Antiquité tardive et littérature byzantine”.

The conference Versus ad picturas, conceived within the framework of the research of the international group GIRPAM on Greek and Latin poetry in Late Antiquity and the Middle Ages and in particular the activities of the Gutenberg Chair 2021 on biblical poetry, aims to contribute to the study of the relationship between the images we now call artistic, painted on walls, fabrics, stained glass or parchments, and the verses that often accompany them materially or ideally, and that are now increasingly recognized as indispensable to their cultural understanding and social location.

Date & time: Thursday 29 September 2022, 10:00 am

Location: the lecture will be broadcast via streaming: https://us02web.zoom.us/j/84177136590?pwd=ZXV1YVBLYUQzM1hBdVNiSmlIS0U1Zz09

  • Meeting ID: 841 7713 6590
  • Passcode: 754872

More information about this conference and the full programme can be found here.

LW Research Day 2022: poster session

The third LW Research Day will take place on Thursday April 28, 2022, in the Faculty Library. The central theme is ‘research valorisation’.

The Faculty of Arts and Philosophy wants to encourage the social and economic valorisation of research. Research valorisation is the process of transferring and deploying scholarly knowledge and expertise outside the scientific field. During the LW Research Day, the faculty organizes several lectures and information sessions and presents various of its initiatives.

The DBBE team will present a poster on the research valorisation and outreach activities organised in the framework of the Database of Byzantine Book Epigrams.

More information can be found on the LW Research Day website.

Julián Bértola, Towards a Reassessment of Ephraim of Ainos

From 9 to 11 February 2022, Krystina Kubina convenes a conference on “Poetry in Late Byzantium” within the framework of her project at the Department of Byzantine Research (Austrian Academy of Sciences) devoted to the same topic. During these three days, more than 30 scholars from across the world will discuss forms, functions and developments of this important aspect of medieval Greek literature.

The programme can be found here.

The DBBE will be represented by Julián Bértola, who will give a talk entitled “Towards a Reassessment of Ephraim of Ainos”.

 

Practical information

Date & time: Thursday 10 February 2022, 14:15 CET

Location: Institute for Medieval Research, Division of Byzantine Research (Hollandstrasse 11-13, 1020 Vienna) and/or online via Zoom.
Pre-registration is mandatory for participating online; please contact krystina.kubina@oeaw.ac.at.

Colin Swaelens, You shall know a verse by the company it keeps. Detecting orthographic and semantic similarity between epigrams

The Argentine Committee of Byzantine Studies (CAEBiz) cordially invites you to its Online workshop on Digital Humanities. The meeting will be co-ordinated by José Maksimczuk (Universität Hamburg – CSMC) and Tom Gheldof (KU Leuven) and will take place on Friday March 4, 2022, 14.00 CET.

Representing DBBE, Colin Swaelens will give a talk entitled “You shall know a verse by the company it keeps. Detecting orthographic and semantic similarity between epigrams”.

 

Practical information

Date & time: Friday 4 March 2022, 3:45 pm

Location: the workshop will be held via Zoom (no registration is required): https://uni-hamburg.zoom.us/j/66093457970?pwd=N0h3ZjM4VFYzTlFJQWVXVUpUMmxIZz09

  • Meeting ID: 660 9345 7970
  • Passcode: 23868106

DBBE Workshop

The Database of Byzantine Book Epigrams, based at Ghent University, will be launched on-line in the near future. The database has been created thanks to a Hercules grant from the Flemish Government. Further research on book epigrams within our team, including elaboration of the DBBE, will benefit from a Ghent University GOA grant from 2015 on. Therefore, this seemed the right moment to organize a workshop. We have asked Byzantinists (philologists, palaeographers, cultural historians) to try out a test version of the database, exploring it with their own research interests in mind. They will present their observations, propose improvements, and offer suggestions for future research. The one-day workshop will be concluded with a round-table discussion. Everyone is invited to attend the workshop and to participate in the discussion. Please contact floris.bernard@ugent.be for further information and for registration.

Participants: Patrick Andrist, Theodora Antonopoulou, Elizabeth Jeffreys, Michael Jeffreys, Paolo Odorico, Emilie van Opstall, Inmaculada Pérez-Martin, Andreas Rhoby, Véronique Somers.

 

You can find the full programme of the workshop here: ProgramWorkshopDBBE.