Legal issues

Unni Knutsen

Feb 20, 2025, 9:31:29 AM

to Annif Users

Hi all!

At the University of Oslo Library we are planning a project in which we wish to use machine learning to assign subject headings from the Humord thesaurus (UB-Data: Humord) to documents in our research archive (DUO Research Archive - University of Oslo).

We aim to use a local installation of Annif.

We are thinking of using titles and abstracts from the research publications as training data, but acknowledge that there may be copyright, licensing and other legal issues involved.

From what we understand, some of your projects must also have involved such considerations. We will, of course, seek legal advice at our universities, but would be very grateful if some of you could share your experiences and considerations.

Best wishes from

Unni Knutsen

University of Oslo Library

Manager of metadata and collection development

anna.k...@googlemail.com

Feb 20, 2025, 10:07:27 AM

to Annif Users

Hello Unni!

Oh boy -- this is a wiiiiide field which is the topic of a lot of hot discussions in our organization as well.

We are using titles and author keywords right now. These are clearly part of the metadata.

We are also looking at using abstracts, which are in a grey area -- it is not quite clear whether these are part of the metadata or of the content, and abstracts can actually be subject to copyright.

The big problem is that publishers are panicking right now and are prohibiting the use of their data -- even the metadata!! -- for training AI-based methods.

Just yesterday we received a package of articles with the explicit note "No part of this book may be used to train artificial intelligence systems without permission in writing from the MIT Press."

What we are doing:

- On the one hand, renegotiate with data providers / publishers, and agree on a more specific wording of what exactly you are doing with the data.
In most cases, publishers are concerned about generative AI, and currently, no generative AI method is included in Annif.
We also try to convince them that the purpose we are using it for actually makes their resources more discoverable, and nothing else.

- On the other hand, lobby for a European or at least a national solution. The German Urheberrechtsgesetz (copyright act) actually allows us to do text and data mining (TDM) on all data from German-based publishers (but they might still dispute that).

I realize that these answers are not very helpful but I sympathize deeply!

Feel free to ask me more specific questions, and I will try to share what is the case in our institutions.

Best wishes

Argie

……………………………………………………………………………………………………………………………………………………………………….........…………………………………………………………………………

Dr. Anna (Argie) Kasprzik (non-binary; preferred form of address: "Anna/Argie Kasprzik" rather than "Herr/Frau Kasprzik"; pronouns English: they/their/them | German: en/ens/em/en)
Theoretical Computer Scientist, Coordination of Automated Subject Indexing

Every person determines their own identity. Please let me know how you would like to be addressed.

Please consider the environment before using (especially corporate) AI functionality to communicate with me.

ZBW – Leibniz-Informationszentrum Wirtschaft
Neuer Jungfernstieg 21, 20354 Hamburg
T: +49 173 3986387 ; E: a.kas...@zbw.eu

https://www.zbw.eu/en/about-us/knowledge-organisation/automation-of-subject-indexing-using-methods-from-artificial-intelligence
……………………………………………………………………………………………………………………………………………………………………….........…………………………………………………………………………
