to Annif Users
I have recently started investigating Annif because of a comment on a HuggingFace thread I posted about a month ago. I am really excited about it, because we were already working on an ontology in Danish, inspired by the YSO ontology, and I was able to feed the one we have directly into Annif.
I was wondering why the Document entity in the code is a namedtuple and not a class. This makes inheriting from it harder, and thus it is a bit cumbersome to incorporate extra properties we might have about the documents we train our model on. It could be simple things like a document ID, a header or other things.
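To illustrate the inheritance point, here is a minimal sketch. It assumes a Document namedtuple with the fields text and subject_set (a stand-in, not copied from the Annif code), and the extra fields doc_id and header are hypothetical examples.

```python
import collections

# Stand-in for Annif's Document; the field names are an assumption.
Document = collections.namedtuple("Document", "text subject_set")


class SubclassedDocument(Document):
    """Plain subclass: can add methods, but not new constructor fields."""


# Adding fields means building a new tuple type from the old field list.
ExtendedDocument = collections.namedtuple(
    "ExtendedDocument", Document._fields + ("doc_id", "header")
)

doc = ExtendedDocument(
    text="En kort tekst",
    subject_set=frozenset(),
    doc_id="42",      # hypothetical extra property
    header="Titel",   # hypothetical extra property
)

# The new type is unrelated to Document, so isinstance checks fail.
print(isinstance(doc, Document))  # False

try:
    # The inherited constructor only knows the original two fields.
    SubclassedDocument(text="x", subject_set=frozenset(), doc_id="42")
except TypeError as err:
    print("subclass rejects extra field:", err)
```

So extra fields are possible, but the result is a separate type rather than a true subclass, which is what makes this cumbersome.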
I wasn't able to create a branch in the repo, so I can't make a PR from there; I have been working on a fork instead.
It's just a suggestion, but I figured I might as well post it here.
Best regards,
/Noah
Osma Suominen
Jan 15, 2024, 8:56:59 AM
to annif...@googlegroups.com
Hi Noah,
Thanks a lot for your message. It's really exciting to hear that you are
investigating Annif and that you (DBC?) are working on a YSO-style
ontology in Danish. Do you have any pointers to more information about
the ontology project (any language goes)? Also, I'm curious about the HF
discussion that mentioned Annif...
The reason Document and some other data structures are defined as
namedtuples is that it was the simplest approach that worked. But it's
of course possible to change this later on.
Related to this, Annif isn't really intended to be used as a Python
library, although of course you can do that. It's just not something we
have documented, and we don't want to make any commitments about the
Python API, so it might change in future releases, even between minor or
patch releases.
I'd be interested to know what other features or properties you are
planning for Documents and how you intend to use them? Perhaps some kind
of generic mechanism for extra properties could be useful.
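As a rough sketch of what such a generic mechanism could look like (not Annif's actual API): a dataclass that keeps the core fields and adds an open-ended dictionary for extras. The class layout, the field name `extra` and the example keys are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class Document:
    """Hypothetical class-based Document; field names are assumptions."""

    text: str
    subject_set: FrozenSet[str] = frozenset()
    # Free-form bag for properties the core code does not know about.
    extra: Dict[str, Any] = field(default_factory=dict)


# Hypothetical extra properties: a document ID and a header.
doc = Document(
    text="En kort tekst",
    extra={"doc_id": "42", "header": "Titel"},
)

print(doc.extra["doc_id"])       # 42
print(Document(text="y").extra)  # {}
```

Being a regular class, this could also be subclassed directly. The trade-off is that code relying on tuple behaviour, such as unpacking or indexing, would need to be adapted.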