Defining custom XML import formats

Faith Lawrence

Feb 16, 2026, 2:26:28 PM
to inception-users
Heya,

I'm experimenting with defining custom XML formats. I've read the documentation at https://inception-project.github.io/releases/39.4/docs/user-guide.html#sect_formats_xml_custom but I'm having trouble getting the documents to display correctly.

I have a complex XML format (LegalDocML) that I want to support eventually, but for now I'm testing with a much smaller and simpler set of XML elements to see whether I can get it working at all.

I have a few questions:

  1. Is there any other documentation on defining custom XML formats, for example on which values can go in the YAML file and what they mean (e.g. PASS, PASS_NO_NS)? I have looked but couldn't find it documented, though I may have missed it.
  2. If my custom XML document imports without any errors but part of the text is not displayed, is that likely to be a CSS issue? Or could it be an issue with the document definition (either in the plugin JSON or in the YAML policy)? And is there any way I can tell?
  3. Also, if I change the definition or the CSS, do I need to reload the document, or will the changes be picked up immediately?
Any help or suggestions gratefully received.

Thanks,

Faith

Richard Eckart de Castilho

Feb 16, 2026, 2:54:04 PM
to incepti...@googlegroups.com
Hi,

thanks for trying out the custom XML formats. It is a really interesting feature.

There are some examples here:

https://github.com/inception-project/inception-xml-formats-examples

And here is the default HTML policy:

https://github.com/inception-project/inception/blob/main/inception/inception-external-editor/src/main/java/de/tudarmstadt/ukp/inception/externaleditor/policy/DefaultHtmlDocumentPolicy.yaml

> On 16. Feb 2026, at 10:28, 'Faith Lawrence' via inception-users <incepti...@googlegroups.com> wrote:
>
> • Is there any other documentation on defining custom XML formats for example what values can go in the YAML file and what do they mean? e.g. PASS, PASS_NO_NS etc. I have had a look but I can't see it documented but I may have missed it.

It is documented in the source code itself:

https://github.com/inception-project/inception/blob/main/inception/inception-support/src/main/java/de/tudarmstadt/ukp/inception/support/xml/sanitizer/AttributeAction.java
https://github.com/inception-project/inception/blob/main/inception/inception-support/src/main/java/de/tudarmstadt/ukp/inception/support/xml/sanitizer/ElementAction.java

Here is a summary.

# Elements

PASS -- Element is passed through.
SKIP -- Element is not passed through the filter, but any child text nodes are passed.
DROP -- Element is not passed through.
PRUNE -- Element is not passed through, and neither are any of its descendants, even if they would otherwise be marked to pass.
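To make the difference between the four element actions concrete, here is a toy Java model over a hand-built tree. The `Node`/`Text`/`Elem` types and `ToyElementFilter` are hypothetical, and the code encodes one reading of the summary: DROP discards the element and its own text but still evaluates child elements by their own rules, while PRUNE discards the whole subtree. INCEpTION's actual `SanitizingContentHandler` works on SAX events and may differ in the details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy types, not INCEpTION classes.
enum ElementAction { PASS, SKIP, DROP, PRUNE }

interface Node {}

record Text(String value) implements Node {}

record Elem(String name, List<Node> children) implements Node {}

class ToyElementFilter {
    // Returns the nodes that survive in place of `node` (zero, one, or several).
    static List<Node> filter(Node node, Map<String, ElementAction> policy) {
        List<Node> out = new ArrayList<>();
        if (node instanceof Text t) {
            out.add(t);
            return out;
        }
        Elem e = (Elem) node;
        // Assumption: unlisted elements default to DROP in this toy model.
        ElementAction action = policy.getOrDefault(e.name(), ElementAction.DROP);
        switch (action) {
            case PRUNE -> { } // whole subtree gone, whatever the descendants' rules say
            case PASS -> out.add(new Elem(e.name(), filterChildren(e, policy, true)));
            case SKIP -> out.addAll(filterChildren(e, policy, true)); // tag gone, text kept
            case DROP -> out.addAll(filterChildren(e, policy, false)); // tag and own text gone
        }
        return out;
    }

    // Direct text children are kept only if keepText is true;
    // child elements are always judged by their own rules.
    private static List<Node> filterChildren(Elem e, Map<String, ElementAction> policy,
            boolean keepText) {
        List<Node> out = new ArrayList<>();
        for (Node child : e.children()) {
            if (child instanceof Text && !keepText) {
                continue;
            }
            out.addAll(filter(child, policy));
        }
        return out;
    }

    // Concatenates the surviving text, i.e. roughly what ends up visible.
    static String text(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node n : nodes) {
            if (n instanceof Text t) {
                sb.append(t.value());
            } else {
                sb.append(text(((Elem) n).children()));
            }
        }
        return sb.toString();
    }
}
```

For instance, with `doc` and `p` set to PASS, `meta` to PRUNE, `note` to DROP and `span` to SKIP, the input `<doc><meta>id-42</meta><p>Hello <note>hidden</note><span>world</span></p></doc>` comes out as `<doc><p>Hello world</p></doc>` under this reading.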

# Attributes

PASS -- Pass attribute as-is.
DROP -- Attribute is not passed on; it is dropped.
PASS_NO_NS -- Pass attribute but remove the namespace.

The CSS `attr(XXX)` construct cannot access attributes that are not in
the default namespace. Support for accessing namespaced attributes appears to have
been present in early proposals of the CSS3 namespace enhancements [1],
but it appears to have been dropped from the final recommendation.
Browsers also do not (yet) appear to have implemented support for this on their own.

Thus, if the attribute contains data that needs to be accessed using
`content: attr(XXX)`, use PASS_NO_NS.

[1] https://www.w3.org/1999/06/25/WD-css3-namespace-19990625/#attr-function
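A similar toy sketch for the attribute actions, using the JDK's DOM API (`org.w3c.dom`). `ToyAttributeFilter`, the policy map keyed by local name, and the example namespace `http://example.org/legal` are all hypothetical. The point is what PASS_NO_NS does: once the namespace is stripped, a value stored as `ex:label` becomes a plain `label` attribute that `content: attr(label)` can read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

// Hypothetical toy enum mirroring the action names above, not INCEpTION code.
enum AttributeAction { PASS, DROP, PASS_NO_NS }

class ToyAttributeFilter {
    // Creates an empty DOM document (wraps the checked exception for brevity).
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Policy keys are attribute local names; unlisted attributes are dropped (assumption).
    static void apply(Element element, Map<String, AttributeAction> policy) {
        // Copy first: the NamedNodeMap is live and is modified below.
        NamedNodeMap map = element.getAttributes();
        List<Attr> attrs = new ArrayList<>();
        for (int i = 0; i < map.getLength(); i++) {
            attrs.add((Attr) map.item(i));
        }
        for (Attr attr : attrs) {
            // Attributes created without namespace support have no local name.
            String local = attr.getLocalName() != null ? attr.getLocalName() : attr.getName();
            switch (policy.getOrDefault(local, AttributeAction.DROP)) {
                case PASS -> { } // keep as-is, namespace included
                case DROP -> element.removeAttributeNode(attr);
                case PASS_NO_NS -> {
                    String value = attr.getValue();
                    element.removeAttributeNode(attr);
                    element.setAttribute(local, value); // now a plain, un-namespaced attribute
                }
            }
        }
    }
}
```

In a stylesheet you could then write something like `p::before { content: attr(label); }`, which would not work while the attribute still carried its namespace.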

> • If my custom XML document is importing without any errors, but I am not seeing part of the text displayed is that likely to be a CSS issue? Or could it be an issue with the document definition (either in the plugin json or in the yaml policy)? And is there any way that I can tell?

You should add

```
debug: true
```

at the root level of the YAML file.

You will also need to start INCEpTION with

```
-Dlogging.level.de.tudarmstadt.ukp.inception.support.xml.sanitizer.SanitizingContentHandler=DEBUG
```

or add this to your `settings.properties`

```
logging.level.de.tudarmstadt.ukp.inception.support.xml.sanitizer.SanitizingContentHandler=DEBUG
```

Then you should be able to see helpful information about how the policy applies in the INCEpTION logs.

> • Also, if I change the definition or the CSS do I need to re-load the document or will the changes be picked up immediately?

If you change the CSS or policy files, save them, and reload the page in the browser, you should see the changes immediately.
You may want to open the browser's developer tools, though, and tick "Disable cache" on the "Network" tab to make sure the
browser really loads the files fresh when you reload (keep the developer tools open while you do this).

You can also right-click in the empty editor panel and choose "Inspect" to see what (if anything) gets through the
policy to the browser. That can also help you debug your CSS. When the "debug" key is set in the policy file, you
will also see traces of the elements filtered out by the policy.

I hope those tips help. If you get stuck, let me know.

Cheers,

-- Richard



Faith Lawrence

Feb 17, 2026, 9:06:02 AM
to inception-users
Thanks for that; it is really helpful.

I ran some more experiments yesterday and was able to confirm what I was seeing: the XML elements whose names happen to match HTML elements were displayed, and the other elements weren't. I'll go through the things you flagged in your response and hopefully I'll be able to work out what I've done wrong.

Best,

Faith

Richard Eckart de Castilho

Feb 17, 2026, 3:36:57 PM
to inception-users
Hi,

> On 17. Feb 2026, at 15:06, 'Faith Lawrence' via inception-users <incepti...@googlegroups.com> wrote:
>
> I was doing some more experiments yesterday and was able to confirm that what I was seeing was that the XML elements which happened to have names that were the same as elements in html were displaying and the other elements weren't. I'll go through the things that you have flagged in your response and hopefully I will be able to disentangle what I have done wrong.

Make sure that you actually import your XML documents using the custom format, i.e.
when you are in the documents panel in the project settings, select your custom format
from the dropdown. Also verify that the name of your format is shown in the document
table after the import. Otherwise INCEpTION would fall back to the default HTML policy.
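The symptom Faith describes (only elements whose names match HTML elements show up) is what you would expect if the HTML policy is applied instead of the custom one. Below is a toy Java model of that fallback. The format id `my-legal-xml`, both element lists and `ToyPolicyLookup` are made up for illustration; this is not how INCEpTION looks up policies internally, and the real default element list lives in `DefaultHtmlDocumentPolicy.yaml`.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the policy fallback, not INCEpTION code.
class ToyPolicyLookup {
    // Stand-in for the default HTML policy: only (a few) HTML element names pass.
    static final Set<String> TOY_HTML_POLICY = Set.of("p", "div", "span", "section");

    // Stand-in registry: format id -> element names that the custom policy passes.
    static final Map<String, Set<String>> CUSTOM_POLICIES =
            Map.of("my-legal-xml", Set.of("doc", "clause", "heading", "p"));

    // Uses the custom policy if the document was imported with that format,
    // otherwise falls back to the HTML policy (formatId must be non-null).
    static Set<String> policyFor(String formatId) {
        return CUSTOM_POLICIES.getOrDefault(formatId, TOY_HTML_POLICY);
    }

    static boolean isDisplayed(String formatId, String elementName) {
        return policyFor(formatId).contains(elementName);
    }
}
```

Under the fallback, `clause` disappears while `p` still shows, which matches the pattern described above.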

If you still get stuck or believe you might have hit a bug, let me know.

-- Richard
