Heidi Jauhiainen (University of Helsinki), Machine-Readable Texts for Egyptologists
Friday July 9, 2021, 17:00 (UK time/UTC+1)
In order to use digital methods to study texts, one needs them in
machine-readable form. Assyriology has freely downloadable corpora of
machine-readable texts, such as the Open Richly Annotated Cuneiform Corpus,
but the lack of similar corpora hinders the digital study of ancient
Egyptian texts. A transliterated text in digital format, for example as a
text or TEI file, is machine-readable. Producing transliterated texts
manually is time-consuming and, hence, there have been experiments in
automatically producing transliterated texts. However, in order to
produce machine-readable texts with automated transliteration, one needs
machine-readable hieroglyphic texts. There is a tradition in Egyptology
of using encoding to represent hieroglyphic texts so that information
about the signs themselves and their positions relative to one another
is preserved. Various types of encoding have been used when
publishing texts in books but those machine-readable texts are not
openly available. Such encoded texts could be produced by OCRing
hieroglyphic texts, but this approach requires a lot of texts in the
same handwriting for training the method.
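To illustrate how an encoding can record both the signs and their relative positions, here is a minimal Python sketch. The abstract does not name a specific scheme, so the sketch assumes a simplified notation in the style of the Manuel de Codage: Gardiner sign codes, with "-" separating sign groups, ":" stacking rows vertically, and "*" placing signs side by side. The function name and the sample string are hypothetical, and a real encoding has many more features than this handles.

```python
def parse_encoded_line(encoded: str) -> list[list[list[str]]]:
    """Split an encoded line into groups, rows within a group, and signs within a row.

    Assumed simplified notation (not a full parser for any real scheme):
      "-" separates consecutive sign groups,
      ":" separates rows stacked vertically inside a group,
      "*" separates signs placed side by side inside a row.
    """
    groups = []
    for group in encoded.split("-"):
        # "*" binds more tightly than ":", so split rows first, then signs.
        rows = [row.split("*") for row in group.split(":")]
        groups.append(rows)
    return groups


# Hypothetical sample: Q3 and X1 side by side above N1, followed by A1 alone.
layout = parse_encoded_line("Q3*X1:N1-A1")
print(layout)
```

The nested lists keep the spatial arrangement: the outer list is the reading sequence, each inner list is one group, and each row lists the signs that sit next to one another.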
In this paper, I present Machine-Readable Texts for Egyptologists, a
three-year project that started at the beginning of 2021. The aim
is to produce a large number of manually encoded hieroglyphic texts and
then to develop an iterative process and methods for automatically
transliterating the encoded texts. During the process, the automatically
transliterated texts will be validated and, if necessary, corrected and
then used to make the method more accurate. Both the encoded texts and
their transliterations will eventually be offered for free download.
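The iterative process described above (transliterate automatically, validate, correct, feed the corrections back) can be pictured with a toy Python sketch. This is not the project's method, which the abstract does not specify. It assumes a naive one-sign-to-one-value lookup table, which ignores multi-consonant signs and determinatives, and all function and variable names are hypothetical.

```python
def transliterate(signs: list[str], lexicon: dict[str, str]) -> list[str]:
    """Look up each sign code; unknown signs are marked with '?'."""
    return [lexicon.get(sign, "?") for sign in signs]


def learn_from_corrections(
    lexicon: dict[str, str], signs: list[str], corrected: list[str]
) -> None:
    """Update the lexicon wherever the validated value differs from the stored one."""
    for sign, value in zip(signs, corrected):
        if lexicon.get(sign) != value:
            lexicon[sign] = value


# Toy starting lexicon (assumed one-to-one values for single-consonant signs).
lexicon = {"Q3": "p", "X1": "t"}
signs = ["Q3", "X1", "N35"]

first_pass = transliterate(signs, lexicon)   # N35 is not yet known
validated = ["p", "t", "n"]                  # a human validator supplies the fix
learn_from_corrections(lexicon, signs, validated)
second_pass = transliterate(signs, lexicon)  # same text, after learning

print(first_pass, second_pass)
```

Each round of validation enlarges what the automatic step gets right, which is the sense in which the corrected texts make the method more accurate over time.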