Hello Parthasarathi,
the command is a simple pipeline combining grep (used for filtering) and
sed (used to perform a search-replace operation). The input is a file in
N-Triples format, which happens to be a format where line-based tools
like these are quite easy to use, since each triple sits on its own line.
Here's a brief explanation of grep and sed pipes:
https://www.themoderncoder.com/a-simple-introduction-to-grep-and-sed/
The grep command looks for the string '/fast/', so in practice it keeps
only the lines containing that string and drops all lines that represent
triples from non-FAST vocabularies, as all FAST concepts have this string
in their URI.
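Note that grep '/fast/' matches the string anywhere on the line, not only in the subject. If the file happened to contain triples from other vocabularies whose object is a FAST URI (for example mapping links), those lines would be kept too. Here is a minimal sketch for checking whether that matters, assuming a POSIX shell and that FAST subject URIs start with http://id.worldcat.org/fast/ as in the examples in this thread. The function name loose_only is made up for illustration:

```shell
# Hypothetical helper: print lines that grep '/fast/' would keep even though
# their subject is not a FAST URI (e.g. links from other vocabularies).
loose_only() {
  grep '/fast/' |                                # the filter from the recipe
    grep -v '^<http://id\.worldcat\.org/fast/'   # drop lines with a FAST subject
}
```

Usage: `loose_only < FASTTopical.nt | head`. Empty output would suggest that the simple '/fast/' filter is enough for this particular file.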
The sed command looks for the full URI of the schema:Intangible type and
replaces it with the full URI of the skos:Concept type.
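To see which rdf:type values the FAST concepts carry before and after the sed step, something like the following may help. It is a sketch that assumes one triple per line with whitespace-separated subject, predicate and object, which holds for N-Triples type statements with IRI objects, but it is not a full N-Triples parser. The name type_counts is hypothetical:

```shell
# Hypothetical helper: count rdf:type objects, grouped by whether the
# subject looks like a FAST URI (contains /fast/) or not.
type_counts() {
  awk '$2 == "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>" {
         group = ($1 ~ /\/fast\//) ? "FAST" : "other"
         print group, $3
       }' | sort | uniq -c
}
```

Usage: run `type_counts < FASTTopical.nt` and then `type_counts < FAST-SKOS.nt`; after the conversion the FAST rows should list skos:Concept instead of schema:Intangible.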
The result is a file, still in N-Triples format, which contains only
FAST concepts (thanks to the grep) and whose types have been changed
from schema:Intangible to skos:Concept (thanks to the sed).
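To make the effect of each stage concrete, here is the same grep and sed applied to two sample lines instead of the whole file. The first line mimics a FAST type triple as described in this thread; the second uses a made-up URI (http://example.org/other/1) to stand in for a concept from another vocabulary. This is an illustration under those assumptions, not output from the real FAST file:

```shell
# Two sample N-Triples lines: a FAST concept typed as schema:Intangible and
# a hypothetical concept from some other vocabulary typed as skos:Concept.
sample='<http://id.worldcat.org/fast/1000948> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Intangible> .
<http://example.org/other/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .'

# Stage 1: grep keeps only lines containing /fast/, so the second line is dropped.
after_grep=$(printf '%s\n' "$sample" | grep '/fast/')

# Stage 2: sed replaces the schema:Intangible URI with the skos:Concept URI
# (without the g flag, only the first match on each line is replaced).
after_sed=$(printf '%s\n' "$after_grep" | sed -e 's|http://schema.org/Intangible|http://www.w3.org/2004/02/skos/core#Concept|')

printf 'after grep:\n%s\n\nafter sed:\n%s\n' "$after_grep" "$after_sed"
```

Lines that do not contain the schema:Intangible URI, such as the skos:prefLabel triples, pass through sed unchanged.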
Regarding the oneDNN warning, I think that should be harmless. It most
likely comes from TensorFlow.
-Osma
On 10.11.2022 at 15:36, Parthasarathi Mukhopadhyay wrote:
> Hello Osma
>
> Your solution worked like a charm.
>
> Thanks a lot.
>
> We are now getting 437945 rows in the subjects.csv file with entries like -
>
> http://id.worldcat.org/fast/1000320    Lithopone
>
> I've observed that blocks are changing after the command in this way:
>
> *Earlier*
>
> <http://id.worldcat.org/fast/1000948>
> skos:prefLabel "Lizard watching"@en .
>
> *Now*
>
> <http://id.worldcat.org/fast/1000948>
> skos:prefLabel "Lizard watching"@en .
>
> Plz explain what this magic command is doing.
>
> Another issue I forgot to mention earlier: after upgrading to 0.59,
> Annif commands were producing a warning like "oneDNN custom operations
> are on. You may see slightly different numerical results due to
> floating-point round-off errors from different computation orders. To
> turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`."
>
> Issuing `export TF_ENABLE_ONEDNN_OPTS=0` stopped that warning. Hopefully
> that's okay for the process.
>
> Thanks again and best regards
>
>
> On Thu, Nov 10, 2022 at 4:24 PM Osma Suominen <osma.s...@helsinki.fi> wrote:
>
> Hello Parthasarathi,
>
> it appears that the FAST concepts in that file do not have skos:Concept
> set as their type. Instead they have been given the type
> schema:Intangible. But the same file also contains several other
> vocabularies, including AGROVOC, GND and others, and these do have
> skos:Concept as their type. Annif only looks for entities with the
> skos:Concept type, so in this case it will ignore the FAST concepts but
> instead use the concepts from other vocabularies. This is a bit
> unfortunate and I don't really understand the design choices in this
> FAST file.
>
> However, here is a quick and dirty Linux command line recipe for
> filtering out the non-FAST concepts and changing the type of FAST
> concepts into skos:Concept:
>
> grep '/fast/' FASTTopical.nt | sed -e 's|http://schema.org/Intangible|http://www.w3.org/2004/02/skos/core#Concept|' >FAST-SKOS.nt
>
> The resulting file FAST-SKOS.nt should contain only FAST concepts with
> the right concept type. I don't think you need to use Skosify on it,
> but it should do no harm either.
>
> I didn't test this with Annif but I expect it should work, or at least
> take you a step closer to what you want.
>
> -Osma
>
>
> On 10.11.2022 at 6:30, Parthasarathi Mukhopadhyay wrote:
> > Hello Osma
> >
> > Thanks for showing the right path as usual.
> >
> > Sorry for being late in reporting the result. Actually I was getting a
> > strange result and thought that I should report it after a second
> > round of attempts.
> > Here it goes -
> >
> > 1. Upgraded Annif from 0.57 to 0.59 (the process was extremely smooth)
> >
> > 2. Skosify the NT file of FAST with language label -
> > (--eliminate-redundancy --default-language=en
> > --label="http://id.worldcat.org/fast/
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Annif Users" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to annif-users...@googlegroups.com.
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 15 (Unioninkatu 36)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529