MISC conllu column

Agustin Dei

Feb 20, 2026, 10:23:48 AM
to inception-users
Hi,

Why does INCEpTION erase the information in the MISC column when I upload a CoNLL-U file?

When I uploaded an already annotated file to INCEpTION and then downloaded it again, the MISC column was empty.

Why does this happen? Is there a way to prevent it?

Thanks

Richard Eckart de Castilho

Feb 21, 2026, 6:47:54 AM
to incepti...@googlegroups.com
Hi,
INCEpTION imports data from the CoNLL-U file to its internal format. It only imports the data it understands.
It does not understand the misc column. It only understands the following layers:

* Part-of-speech tagging (built-in),
* Lemmatization (built-in),
* Morphological analysis (built-in),
* Dependency parsing (built-in),
* Text normalization (built-in)

See: https://inception-project.github.io/releases/39.6/docs/user-guide.html#sect_formats_conllu

When exporting, it is the same: INCEpTION only exports the information it knows how to represent in CoNLL-U (see above).
So unsupported or custom layers cannot be exported to CoNLL-U.

The misc column of CoNLL-U is not standardized. Different corpora use it differently.
Also it is rather restricted. It only has key-value pairs and no type information
(e.g. does a key accept a number or string, etc.).
So it is unclear what the information in the misc column means and which layer/feature INCEpTION should
map it to. Likewise, it is unclear how unsupported/custom layers/features would be reasonably mappable
into the misc column.
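
To illustrate the point about missing type information, here is a minimal Python sketch (not part of INCEpTION; the function name `parse_misc` is made up for this example) that splits a MISC value into key-value pairs. It assumes the common convention of `|`-separated `Key=Value` items with `_` for an empty column. Every value comes out as a plain string, so nothing in the file says whether e.g. `3` is meant as a number or a label.

```python
from typing import Dict, Optional


def parse_misc(misc: str) -> Dict[str, Optional[str]]:
    """Split a CoNLL-U MISC value into a dict (hypothetical helper)."""
    # "_" marks an empty column in CoNLL-U.
    if not misc or misc == "_":
        return {}
    result: Dict[str, Optional[str]] = {}
    for item in misc.split("|"):
        key, sep, value = item.partition("=")
        # Items without "=" are kept as keys with no value.
        result[key] = value if sep else None
    return result


print(parse_misc("SpaceAfter=No|Gloss=3"))
```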

Theoretically, INCEpTION could "weave" updated annotations into the original CoNLL-U files. So if you
imported a CoNLL-U file and exported it again, INCEpTION could try to just update e.g. the POS information in
the original file instead of writing out a fresh file. That would in theory preserve the misc
column. However, this approach would require that token and sentence boundaries cannot be changed
in INCEpTION. They are not changeable by default, but there is already experimental support
for making them changeable. When that is used, it would again be unclear how to handle the information
in the misc column, e.g. if a token was split into two, what happens to the misc data?
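
As a workaround outside of INCEpTION, the same "weaving" idea can be done as a post-processing step. The following is only a sketch under the assumption that tokenization was not changed, i.e. the original and the exported file contain the same tokens in the same order. The names `token_rows` and `restore_misc` are hypothetical. The script copies the MISC column from the original file into the exported one and stops with an error as soon as ID or FORM differ.

```python
from typing import List


def token_rows(text: str) -> List[List[str]]:
    """Return the tab-split token lines, skipping comments and blank lines."""
    return [line.split("\t") for line in text.splitlines()
            if line.strip() and not line.startswith("#")]


def restore_misc(original: str, exported: str) -> str:
    """Copy column 10 (MISC) from `original` into `exported`."""
    originals = iter(token_rows(original))
    out = []
    for line in exported.splitlines():
        # Comments and sentence-separating blank lines are kept as exported.
        if not line.strip() or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        src = next(originals, None)
        # Refuse to guess if ID or FORM no longer line up.
        if (src is None or len(cols) != 10 or len(src) != 10
                or cols[:2] != src[:2]):
            raise ValueError(f"Tokenization differs at: {line!r}")
        cols[9] = src[9]
        out.append("\t".join(cols))
    if next(originals, None) is not None:
        raise ValueError("Original file has more tokens than the export")
    return "\n".join(out) + "\n"
```

If a token was split or merged during annotation, the check fails on purpose, because it is then unclear which token the MISC data belongs to.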

What information is stored in the misc column in your case?

Cheers,

-- Richard

Milica Ikonić Nešić

Feb 22, 2026, 5:35:18 AM
to incepti...@googlegroups.com

Dear Richard,

I apologize for joining the conversation, but I noticed that the Named Entity layer is also not recognized, either during import or during export.

Could you please clarify why this is the case?

Kind regards, 

Milica



Richard Eckart de Castilho

Feb 22, 2026, 5:37:51 AM
to incepti...@googlegroups.com
Hi,

> On 22. Feb 2026, at 11:35, Milica Ikonić Nešić <milica.ikon...@gmail.com> wrote:
>
> I apologize for joining the conversation, but I noticed that the Named Entity layer is also not recognized, either during import or during export.
> Could you please clarify why this is the case?

That is because the CoNLL-U format does not support Named Entities. See here:

https://universaldependencies.org/format.html

• ID: Word index, integer starting at 1 for each new sentence; may be a range for multiword tokens; may be a decimal number for empty nodes (decimal numbers can be lower than 1 but must be greater than 0).
• FORM: Word form or punctuation symbol.
• LEMMA: Lemma or stem of word form.
• UPOS: Universal part-of-speech tag.
• XPOS: Optional language-specific (or treebank-specific) part-of-speech / morphological tag; underscore if not available.
• FEATS: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
• HEAD: Head of the current word, which is either a value of ID or zero (0).
• DEPREL: Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
• DEPS: Enhanced dependency graph in the form of a list of head-deprel pairs.
• MISC: Any other annotation.
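
The ten fields listed above can be read with a few lines of Python. This is just an illustrative sketch (the names `FIELDS`, `Token` and `parse_token` are made up here, and multiword-token ranges or empty nodes get no special treatment). It shows that a token line has exactly these ten tab-separated columns and that none of them is a dedicated named entity column.

```python
from collections import namedtuple

# The ten CoNLL-U columns in file order, lower-cased for use as field names.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

Token = namedtuple("Token", FIELDS)


def parse_token(line: str) -> Token:
    """Split one tab-separated token line into its ten named fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(FIELDS):
        raise ValueError(f"Expected {len(FIELDS)} columns, got {len(cols)}")
    return Token(*cols)


token = parse_token("1\tBerlin\tBerlin\tPROPN\t_\t_\t0\troot\t_\tSpaceAfter=No")
print(token.form, token.upos, token.misc)
```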

Cheers,

-- Richard
