Export from CATMA and import into INCEpTION

Yanir Marmor

Sep 9, 2021, 6:57:41 AM
to inception-users
Hello,

I'm curious whether anyone has gone through a similar process: exporting tagged data from CATMA and importing it into INCEpTION. Is there a quick and easy way to do it?
I understand that CATMA data can be exported in TEI XML format and that INCEpTION data can be uploaded using XMI. Is there a simple way to bridge the gap between these two formats?
I'd be grateful for your thoughts.

Best,
Yanir

Richard Eckart de Castilho

Sep 9, 2021, 7:01:54 AM
to inception-users
Hi Yanir,
INCEpTION has some TEI support, but probably not for the specifics that CATMA uses. In such cases, what we have done in the past is to implement a Python script that
parses the specific TEI dialect in question, maps it to a specific configuration of layers
and features, and then writes the result as XMI using DKPro Cassis. This has always been project-specific, though.
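
As a rough illustration of the parsing step only: the sketch below reads a heavily simplified stand-off TEI snippet with the Python standard library and turns it into (begin, end, tag id) tuples. The sample document, the `seg`/`ptr` layout, the `#char=begin,end` target syntax, and the `parse_spans` helper are all assumptions made up for this example; the actual CATMA dialect has to be checked against its documentation.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical, heavily simplified stand-off TEI; NOT real CATMA output.
SAMPLE_TEI = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <seg ana="#tag-person"><ptr target="doc.txt#char=0,5"/></seg>
      <seg ana="#tag-place"><ptr target="doc.txt#char=15,21"/></seg>
    </body>
  </text>
</TEI>
"""


def parse_spans(tei_xml):
    """Return (begin, end, tag_id) tuples for every annotated segment."""
    root = ET.fromstring(tei_xml)
    spans = []
    for seg in root.iter(TEI_NS + "seg"):
        # The 'ana' attribute is assumed to point at the tag via '#id'.
        tag_id = seg.get("ana", "").lstrip("#")
        for ptr in seg.iter(TEI_NS + "ptr"):
            # The target is assumed to look like 'doc.txt#char=0,5'.
            char_range = ptr.get("target", "").split("#char=")[-1]
            begin, end = (int(n) for n in char_range.split(","))
            spans.append((begin, end, tag_id))
    return spans


print(parse_spans(SAMPLE_TEI))
```

The character offsets only make sense relative to the plain-text file, so the text has to be loaded separately and kept in sync with the spans.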

Could you share an example of the TEI dialect produced by CATMA?

Cheers,

-- Richard

Yanir Marmor

Sep 9, 2021, 7:15:22 AM
to inception-users
Thank you for the quick reply.

The CATMA-TEI Export Format documentation is here.

In addition, I've attached a short example of text and annotations I made before.


Best,

Yanir



מכילתא_דרי_פסחא_א_Default_Annotations.xml
מכילתא_דרי_פסחא_א.txt

Richard Eckart de Castilho

Sep 9, 2021, 2:38:30 PM
to incepti...@googlegroups.com
Hi,

> On 9. Sep 2021, at 13:15, Yanir Marmor <yan...@gmail.com> wrote:
>
> The CATMA-TEI Export Format documentation is here.
>
> In addition, I've attached a short example of text and annotations I made before.

I believe a generic mapping from the CATMA data model to the INCEpTION data model
is currently not possible. In particular, the tag hierarchy in CATMA and the ability
of a property to have multiple values are problematic.

What also seems a bit inconvenient for an automated conversion process is that
the CATMA TEI file does not contain the name of the text file anywhere.

However, the file you provided looks rather simple, with a flat list of tags that have
no properties - unless I am overlooking something. Such a structure should be mappable to
INCEpTION, either by mapping each tag to a separate annotation layer named after the tag
or by mapping all tags to the same annotation layer and using the tag name as a string
feature value.
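
To make the two options concrete, here is a minimal sketch using plain Python data structures. The input tuples, the tag names, the feature name `label`, and both helper functions are hypothetical and only illustrate the shape of each mapping, not any INCEpTION or CATMA API.

```python
from collections import defaultdict

# Hypothetical input: (begin, end, tag_name) tuples from an earlier parsing step.
spans = [(0, 5, "Person"), (15, 21, "Place"), (30, 34, "Person")]


def one_layer_per_tag(spans):
    """Option 1: each tag becomes its own layer holding plain spans."""
    layers = defaultdict(list)
    for begin, end, tag in spans:
        layers[tag].append({"begin": begin, "end": end})
    return dict(layers)


def single_layer_with_feature(spans, feature="label"):
    """Option 2: one shared layer; the tag name is stored as a string feature."""
    return [{"begin": b, "end": e, feature: tag} for b, e, tag in spans]


print(one_layer_per_tag(spans))
print(single_layer_with_feature(spans))
```

Option 1 needs one layer definition per tag in the target project, while option 2 needs a single layer whose string feature takes the tag names as values, which is likely easier to maintain when the tag list changes.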

There appears to be a catma-py Python library which could be used in conjunction with the
DKPro Cassis library to write a script which loads, maps, and saves the result.

Cheers,

-- Richard

Yanir Marmor

Sep 10, 2021, 9:16:53 AM
to inception-users
Thank you, Richard.