Hi!
I'm looking to use the nanopub dataset as a whole for research on RDF engines' performance, because the dataset is big and clearly licensed. However, the only interfaces I found for downloading the nanopubs are various APIs and apps such as this one, which only provides compressed packages of 1000 nanopubs each.
I don't want to unnecessarily overload anyone's servers by making thousands of API calls, so is there any way to download "the whole thing"? Or should I just use the existing APIs? I would be grateful for any hints. :)
--
Piotr Sowiński
Systems Research Institute
Polish Academy of Sciences
Thanks Barend,
That is helpful, and I agree.
However, it also links to what I see in research data management in general. We focus on getting the data “in”, far less on how to use it again. Yes, we could build a tool that follows that provenance trail and discovers the evidence at the end. Or the lack thereof… Like, being able to quickly follow the trail for “mouth masks are not a good idea during a pandemic” would probably have helped our RIVM a lot 😊. So in general, I would see a bit more effort on tools that reuse what we create.
Best, Chris
For whatever it is worth:
In my mind there are at least two kinds of provenance -
1. (data provenance or origin) how/when/by whom/by what was the digital data sourced / transformed. E.g., I used curl to download https://example.org/data.zip, then used zip to extract table.tsv from the zip file. Then, I generated a new digital object using "cut -f1".
2. (knowledge provenance or origin) what is the evidence cited to support a specific claim? Who supported the claim, who refuted the claim? E.g., Dr Dena Abbasi claims that dog treats are an effective way to pacify dogs and cites various sources, all of them published journals with an impact factor > 2 (whatever that means these days). Also, Dr Lupus made a similar claim 20 years ago. The claim has not been refuted except by Felix, the neighbor's cat.
So, for 1. it's really about resource locations, transformations and bits and bytes.
2. relates to associations between claims, people, published knowledge etc.
In my mind these two are very different, and likely require different approaches.
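
To make kind 1. a bit more concrete, here is a minimal sketch (not how any particular tool does it) of recording a data provenance trail for the zip / table.tsv / "cut -f1" example. It assumes that each step can be identified by the SHA-256 hash of its input and output bytes. All function names and the log layout are hypothetical, and an in-memory zip stands in for the curl download so that nothing touches the network.

```python
import hashlib
import io
import zipfile


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 content hash of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def extract_member(zip_bytes: bytes, name: str) -> bytes:
    """Extract a single member from an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(name)


def cut_first_column(tsv_bytes: bytes) -> bytes:
    """Keep only the first tab-separated field of each line, like `cut -f1`."""
    lines = tsv_bytes.decode("utf-8").splitlines()
    return "".join(line.split("\t")[0] + "\n" for line in lines).encode("utf-8")


def record_step(log: list, activity: str, source: bytes, result: bytes) -> bytes:
    """Append one statement linking input and output by content hash (hypothetical layout)."""
    log.append({
        "activity": activity,
        "used": "sha256:" + sha256_hex(source),
        "generated": "sha256:" + sha256_hex(result),
    })
    return result


def build_example_zip() -> bytes:
    """Build a tiny stand-in for data.zip; a real run would use the downloaded bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("table.tsv", "id\tname\n1\tdog\n2\tcat\n")
    return buffer.getvalue()


def run_pipeline(zip_bytes: bytes):
    """Run extract + cut, returning the final bytes and the provenance log."""
    log = []
    table = record_step(log, "unzip table.tsv", zip_bytes,
                        extract_member(zip_bytes, "table.tsv"))
    column = record_step(log, "cut -f1", table, cut_first_column(table))
    return column, log


if __name__ == "__main__":
    column, log = run_pipeline(build_example_zip())
    for entry in log:
        print(entry["activity"], entry["used"], "->", entry["generated"])
```

Because the "generated" hash of one step is the "used" hash of the next, the log can be followed backwards from the final object to the original archive, which is the kind of trail that claims in 2. could later point at.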
I have an interest in 1. and have built tools (e.g., https://github.com/bio-guoda/preston) to help capture data provenance. In my mind 2. is hard to solve at scale (using robots) unless 1. is tackled at scale (using robots).
I am probably repeating stuff that was already said, or am making a common mistake, so I am eager to hear your thoughts on this.
-jorrit