Richly annotated XML files as input: Unitex strips all XML annotations, and we need to merge them back later


Maximilian Hadersbeck

Apr 7, 2017, 7:18:47 AM
to Unitex-GramLab

Hi Unitex users,


We use richly annotated XML files as input, and Unitex strips all XML annotations.

Inside Unitex they become "pure" .txt files, which we then process.

Our local grammars for personal names and similar entities add new XML tags to the output files via transducer output.


But afterwards we want our original XML tags back in the file.

So we have to merge the original XML files with the new annotations from Unitex. This is not trivial!


So my question is: are there existing solutions?

One option would be to hide some index information in our XML files that is neither removed by Unitex's XML input processing nor taken into account by the local grammars in Unitex.

This index information could then be used afterwards to merge the old XML annotations easily with the new Unitex annotations.
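A minimal sketch of this index idea in Python, under several assumptions: the XML is simple (no comments, CDATA, processing instructions, entity references, or `>` inside attribute values), and the new annotations are available as character offsets into the stripped plain text. While stripping tags, it records the position of every plain-text character in the original XML. New annotations given as `(start, end)` offsets over the plain text can then be mapped back and inserted into the original file. The functions `strip_tags` and `merge_annotations` and the `(start, end, tag_name)` annotation format are hypothetical; this is not how Unitex or DumpOffsets work internally.

```python
import re

# Naive tag matcher; assumes no '>' inside attribute values, comments or CDATA.
TAG = re.compile(r"<[^>]*>")


def strip_tags(xml: str) -> tuple[str, list[int]]:
    """Remove tags; return the plain text and, for each plain char, its index in xml."""
    plain_chars: list[str] = []
    offsets: list[int] = []
    pos = 0
    for m in TAG.finditer(xml):
        # Copy the text between the previous tag and this one, remembering positions.
        for i in range(pos, m.start()):
            plain_chars.append(xml[i])
            offsets.append(i)
        pos = m.end()
    for i in range(pos, len(xml)):  # trailing text after the last tag
        plain_chars.append(xml[i])
        offsets.append(i)
    return "".join(plain_chars), offsets


def merge_annotations(xml: str, offsets: list[int],
                      annotations: list[tuple[int, int, str]]) -> str:
    """Insert new tags into the original XML.

    annotations: non-overlapping (start, end, tag_name) spans over the plain
    text, end exclusive and end > start (a hypothetical format).
    """
    inserts: list[tuple[int, int, str]] = []  # (xml position, order, text)
    for start, end, name in annotations:
        # Open directly before the first annotated char, close directly after the last.
        inserts.append((offsets[start], 1, f"<{name}>"))
        inserts.append((offsets[end - 1] + 1, 0, f"</{name}>"))
    result = xml
    # Apply from the end of the string so earlier positions stay valid.
    for pos, _, text in sorted(inserts, reverse=True):
        result = result[:pos] + text + result[pos:]
    return result


xml = '<p n="1">Max <hi>Hadersbeck</hi> lives in Munich.</p>'
plain, offsets = strip_tags(xml)
print(plain)  # Max Hadersbeck lives in Munich.
print(merge_annotations(xml, offsets, [(4, 14, "persName"), (24, 30, "placeName")]))
# <p n="1">Max <hi><persName>Hadersbeck</persName></hi> lives in <placeName>Munich</placeName>.</p>
```

Placing the new tags tightly around the annotated characters keeps the result well-formed only when a new span does not cross an existing element boundary (for example, a name that starts before `<hi>` and ends inside it). A real merger would need a policy for such cases, and any entity decoding done during stripping would shift the offsets.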


Thanks for your help

Max Hadersbeck, Munich, CIS

Maximilian Hadersbeck

Apr 11, 2017, 10:33:03 AM
to Unitex-GramLab
Eric suggested using the "DumpOffsets" program, but the documentation in the manual (Section 13.13) is rather short. Are there any examples of how to use it?
Thank you
Max
