Suggestion: Document as a class


Noah-dbc

Jan 15, 2024, 4:35:02 AM
to Annif Users
I have recently started investigating Annif because of a comment on a Hugging Face thread I started about a month ago. I am really excited about it, because we were already working on an ontology in Danish, inspired by the YSO ontology, and I was able to feed ours directly into Annif.

I was wondering why the Document entity in the code is a namedtuple and not a class. This makes inheriting from it harder, so it is a bit cumbersome to incorporate extra properties we might have about the documents we train our models on. These could be simple things like a document ID or a header.
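To illustrate the inheritance problem, here is a minimal sketch. It assumes a namedtuple with fields `text` and `subject_set`; those names, and the classes `DocumentTuple`, `ExtendedTuple`, `Document` and `DanishDocument`, are hypothetical and may not match Annif's actual code. It contrasts the usual namedtuple workaround with the same thing done as dataclasses.

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical stand-in for a namedtuple-based Document;
# the field names are assumptions and may not match Annif's code.
DocumentTuple = namedtuple("DocumentTuple", ["text", "subject_set"])

# Subclassing a namedtuple does not add constructor fields, so the usual
# workaround is to build a brand-new namedtuple type from the old _fields.
# The result is NOT a subclass of DocumentTuple.
ExtendedTuple = namedtuple(
    "ExtendedTuple", DocumentTuple._fields + ("doc_id", "header")
)


# The same idea as (hypothetical) frozen dataclasses: extra properties are
# added by ordinary inheritance, and the subclass is still a Document.
@dataclass(frozen=True)
class Document:
    text: str
    subject_set: frozenset = frozenset()


@dataclass(frozen=True)
class DanishDocument(Document):
    # These need defaults because they follow a base field that has one.
    doc_id: str = ""
    header: str = ""


doc = DanishDocument("Hej verden", frozenset({"s1"}), doc_id="42", header="Intro")
old = ExtendedTuple("Hej verden", frozenset({"s1"}), "42", "Intro")
print(isinstance(doc, Document))  # True
print(isinstance(old, DocumentTuple))  # False
```

The last two lines show the practical difference: code that checks for or expects the base type keeps working with the dataclass subclass, but not with the rebuilt namedtuple.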

I wasn't able to create a branch in the repository, so I couldn't open a PR for it; instead, I have been working on a fork.

It's just a suggestion, but I figured I might as well post it here.

Best regards,

/Noah

Osma Suominen

Jan 15, 2024, 8:56:59 AM
to annif...@googlegroups.com
Hi Noah,

thanks a lot for your message, it's really exciting to hear that you are
investigating Annif and that you (DBC?) are working on a YSO-style
ontology in Danish. Do you have any pointers to more information about
the ontology project (any language goes)? Also I'm curious about the HF
discussion that mentioned Annif...

The reason Document and some other data structures are defined as
namedtuples is that it was the simplest approach that worked. But it's
of course possible to change this later on.

Related to this, Annif isn't really intended to be used as a Python
library. Of course you can use it that way, but it's not something we
have documented, and we don't want to make any commitments about the
Python API, so it may change in future releases, even between minor or
patch releases.

You should be able to make a PR from a forked repo. This is the normal
procedure in GitHub for external contributions. See e.g.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork

What other features or properties are you planning for Documents, and
how do you intend to use them? Perhaps some kind of generic mechanism
for extra properties could be useful.
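One possible shape for such a generic mechanism, sketched under the assumption that Document became a frozen dataclass with an open-ended `metadata` mapping. The field names and the `get` helper are hypothetical and not part of Annif's API.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class Document:
    """Hypothetical Document with an open-ended metadata mapping."""

    text: str
    subject_set: frozenset = frozenset()
    # Arbitrary extra properties (document ID, header, ...) live here, so new
    # kinds of metadata need neither new fields nor new subclasses.
    # hash=False keeps the (unhashable) dict out of the generated __hash__.
    metadata: Mapping[str, Any] = field(default_factory=dict, hash=False)

    def get(self, key: str, default: Any = None) -> Any:
        """Look up an extra property, falling back to `default`."""
        return self.metadata.get(key, default)


doc = Document(
    "Hej verden",
    frozenset({"http://example.org/subject1"}),
    metadata={"doc_id": "42", "header": "Intro"},
)
print(doc.get("doc_id"))  # 42
print(doc.get("language", "da"))  # da
```

The trade-off is that individual properties are no longer type-checked fields, but backends that don't care about metadata can ignore it entirely. Note that `frozen=True` only prevents reassigning attributes; the dict itself stays mutable.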

Best,
Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi