Good morning,
During our attempt to train Annif for Brazilian publications, we chose to use the FastText backend. I believe that the correct format for the training file would be as follows:
__label__ID Normalized text
In a .txt file, each line represents a publication, and each label ID corresponds to a subject ID. For testing purposes, I referred to the yso-nlf folder and, based on the subjects.csv file, indexed some test publications using the IDs from that file. If the test was successful, I intended to apply the same process to our Bibliodata publications.
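The line layout described above can be sketched in Python as follows. This is only a minimal illustration of the `__label__ID text` convention used by fastText's supervised mode. The helper names `format_line` and `parse_line` are hypothetical, and whether Annif's training command accepts such a file directly is an assumption that this message does not confirm.

```python
def format_line(subject_ids, text):
    """Build one line: the labels first, then the text, all on a single line."""
    labels = " ".join(f"__label__{sid}" for sid in subject_ids)
    # Collapse newlines and repeated whitespace so one publication stays on one line.
    normalized = " ".join(text.split())
    return f"{labels} {normalized}"


def parse_line(line):
    """Split a line back into its label IDs and its text."""
    tokens = line.split()
    prefix = "__label__"
    labels = [t[len(prefix):] for t in tokens if t.startswith(prefix)]
    words = [t for t in tokens if not t.startswith(prefix)]
    return labels, " ".join(words)


# Illustrative data only: the three test sentences from this message.
publications = [
    (["2851"], "Art is the expression of human creativity."),
    (["2851"], "The painting is a form of artistic expression."),
    (["331"], "This text is about Alice and the world of wonder."),
]

lines = [format_line(ids, text) for ids, text in publications]
for line in lines:
    print(line)
```

Parsing each generated line back with `parse_line` is a quick way to check that only the `__label__` tokens are read as labels and the remaining words as text.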
After loading the YSO vocabulary, the test file I used was as follows:
__label__2851 Art is the expression of human creativity.
__label__2851 The painting is a form of artistic expression.
__label__331 This text is about Alice and the world of wonder.
According to the subjects.csv file, ID 2851 corresponds to "art" and ID 331 to "sociology" (screenshot 3).
However, when I tried to run the training, I received the error shown in screenshot 1.
Next, I tried modifying the spacing between the label and the normalized text as follows:
__label__2851 Art is the expression of human creativity.
__label__2851 The painting is a form of artistic expression.
__label__331 This text is about Alice and the world of wonder.
However, I then got a different error message, this time indicating that each word of the normalized text was being interpreted as a subject label (screenshot 2).
I would like to know if I am doing something wrong in this process or if there is an alternative recommended way to create an appropriate training file.
Thank you in advance for your attention, and I look forward to your response.
Sincerely,
Renan Luiz
Brazilian Institute of Information in Science and Technology (IBICT)
Good morning, I hope you’re doing well.
First of all, thank you for your previous response. Thanks to it, we were able to create a training file using data from our own institution. The initial results are very promising, but two questions arose during the process:
Thank you in advance for your attention.
Renan Luiz
--
You received this message because you are subscribed to the Google Groups "Annif Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to annif-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/annif-users/55c1d52d-0d50-435f-a180-c3e6f73a4421n%40googlegroups.com.
Good morning, I hope you are doing well.
I would like to sincerely thank you for all the support you have given us. Your guidance was crucial for us to move forward with exploring Annif in the context of IBICT.
After completing the exploration phase, we concluded that Annif shows great potential for our needs, and we have started working on developing an API integrated with the tool. During the exploration, we used GitHub Codespaces as our working environment. However, for the creation of the API, I identified the need to install Annif locally on my machine.
Following the instructions available at this link: https://github.com/NatLibFi/Annif-tutorial/blob/main/exercises/01_install_annif.md, I attempted to install Annif using both the "VirtualBox based install" and the "Docker based install" options.
Unfortunately, I encountered issues with both approaches. When trying the VirtualBox installation, I received an error, as shown in Attachment 1.
Then, when attempting the Docker installation, the process failed to locate the file in the specified path, even though the file is correctly placed, as shown in Attachments 2 and 3.
I would like to kindly ask if you could provide any guidance on what might be going wrong, so that I can make the necessary corrections and continue with the project.
Thank you very much for your attention and support.
Sincerely,
Renan Luiz