Message from discussion
Advice sought for DOI metadata, taxonomic name-finding and resolution
Subject: Re: [dryad-dev] Advice sought for DOI metadata, taxonomic name-finding and resolution
Mime-Version: 1.0 (Apple Message framework v1257)
Content-Type: multipart/alternative; boundary="Apple-Mail=_71CC2BB3-95CC-41D5-A220-C475DCEA968B"
From: Hilmar Lapp <hl...@nescent.org>
In-Reply-To: <CAMA9Da7NJ4qV4GZJ_E01L2uosN8FryoU1oESqSUL3_2kboxiYw@mail.gmail.com>
Date: Thu, 19 Apr 2012 11:34:47 -0400
Cc: Dryad Developers <dryad-dev@googlegroups.com>
Message-Id: <B3EED8A1-7BC2-4747-A719-ED0F87C7D36A@nescent.org>
References: <1878781.951.1334777173691.JavaMail.geo-discussion-forums@ynhs12> <CAMA9Da7NJ4qV4GZJ_E01L2uosN8FryoU1oESqSUL3_2kboxiYw@mail.gmail.com>
To: Mark Diggory <mdigg...@atmire.com>,
David Shorthouse <davidpshortho...@gmail.com>
X-Mailer: Apple Mail (2.1257)
--Apple-Mail=_71CC2BB3-95CC-41D5-A220-C475DCEA968B
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=iso-8859-1
David - do I understand your email correctly in that you wanted to crawl the actual data files for taxonomic names, and not the metadata?
Programmatic access to the actual data files is possible, but not quite as straightforward and clean as we want it to be. The progress and aims are documented here: http://wiki.datadryad.org/Data_Access
I think the DataONE data access API will be live within a week or two, if it isn't already live with the just-released v1.11. (Ryan can update us on that.)
-hilmar

On Apr 19, 2012, at 11:28 AM, Mark Diggory wrote:
>
>
> On Wed, Apr 18, 2012 at 12:26 PM, David Shorthouse <davidpshortho...@gmail.com> wrote:
> Folks,
>
> I noticed on your development list, http://wiki.datadryad.org/Repository_Development_Plan, that you are considering ingestion of taxonomic / vernacular names to help supplement search across your holdings. I also understand that you have been in touch with David Patterson, PI of the NSF-funded Global Names project.
>
> I am looking for a simple way for you to take advantage of the Global Names taxonomic name-finding and resolution services. These are still under development and receiving feedback from consumers. Nonetheless, one of our services can take a URL as a query parameter and find all names in the document it points to. The URL can point to a PDF, image, doc, xls, etc., and the service does OCR on the fly as needed. The response is a list of unique names. Another service of ours can take a flat list of names, resolve them against other lists (e.g. Catalogue of Life, NCBI, EOL, GBIF), and produce their local identifiers for a linking service, as well as their tree paths to root for possible concept expansion in your index.
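To illustrate how tree paths to root could support concept expansion, here is a minimal Python sketch. The classification paths, package names, and the `build_concept_index` helper are all assumptions made up for illustration (the ranks are deliberately abbreviated); nothing here reflects the actual response format of the Global Names services.

```python
from collections import defaultdict

# Illustrative classification paths (root first), standing in for what a
# name-resolution service might return. Ranks are abbreviated on purpose.
PATHS = {
    "Anolis carolinensis": ["Animalia", "Chordata", "Reptilia", "Squamata", "Anolis"],
    "Python regius": ["Animalia", "Chordata", "Reptilia", "Squamata", "Python"],
    "Quercus alba": ["Plantae", "Fagales", "Quercus"],
}


def build_concept_index(names_by_package, paths):
    """Map each name and every taxon on its path to root to the packages citing it."""
    index = defaultdict(set)
    for package, names in names_by_package.items():
        for name in names:
            # Index the name itself plus each ancestor; names without a
            # known path are still indexed under their own string.
            for taxon in [name, *paths.get(name, [])]:
                index[taxon].add(package)
    return index


# Hypothetical packages and the names found in them.
names_by_package = {
    "package-A": ["Anolis carolinensis"],
    "package-B": ["Python regius", "Quercus alba"],
}
index = build_concept_index(names_by_package, PATHS)
print(sorted(index["Squamata"]))  # ['package-A', 'package-B']
print(sorted(index["Plantae"]))   # ['package-B']
```

A query on a higher taxon then returns every package that mentions any name beneath it, which is the kind of hierarchical search the development plan describes.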
>
> So, I'm writing to inquire if you have plans to include direct links to data packages (and MIME type, though not immediately necessary) in responses to DOI content negotiation.
>
> For example:
> curl -LH "Accept: application/rdf+xml" "http://dx.doi.org/10.5061/dryad.584" (or any of your other supported content types as expressed at http://data.datacite.org/10.5061/dryad.584)
>
> ...gives me some nice metadata, but doesn't actually give me a link to the data package, which is what I'm most interested in. The only apparent way to get the package is to visit http://datadryad.org/resource/doi:10.5061/dryad.584 and fish for it. Had a link to the package been provided, you'd be pretty close to scratching "...Search over hierarchical concepts (e.g., "all lizards")..." off your list.
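For readers who prefer a script to the curl call quoted above, the same content-negotiation request can be sketched with Python's standard library. The helper name `build_negotiated_request` is an assumption for illustration; the DOI and media type are the ones from the message. The sketch only builds the request, since sending it needs network access.

```python
import urllib.request


def build_negotiated_request(doi, media_type="application/rdf+xml"):
    """Build a GET request for a DOI with an explicit Accept header."""
    url = "http://dx.doi.org/" + doi
    return urllib.request.Request(url, headers={"Accept": media_type})


request = build_negotiated_request("10.5061/dryad.584")
print(request.full_url)              # http://dx.doi.org/10.5061/dryad.584
print(request.get_header("Accept"))  # application/rdf+xml

# Sending it requires network access; urlopen follows HTTP redirects by
# default, which corresponds to curl's -L flag:
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```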
>
> There are, however, going to be some limitations and requirements for names within any of your submitted data packages. These may have to be fed back to data depositors if they wish to have the names within their submissions be recognizable and indexable. We can chat more about that at a later date.
>
>
> David,
>
> Having worked on the "/resource/" service for Dryad, I can advise that the idea was for it to eventually support content negotiation and exposure of the resource in various citation, LoD and XML formats. At this time, however, this is still limited. I did want to give you an example of how metadata is exposed via the resource service.
>
> Currently we are supporting the following without content negotiation:
> http://datadryad.org/resource/doi:10.5061/dryad.584/citation/ris
> http://datadryad.org/resource/doi:10.5061/dryad.584/citation/bib
> http://datadryad.org/resource/doi:10.5061/dryad.584/mets.xml
>
> Eventually, the OAI-ORE representation currently available from
> http://datadryad.org/metadata/handle/10255/dryad.37949/ore.xml
>
> would be available from
> http://datadryad.org/resource/doi:10.5061/dryad.584/ore.xml
>
> And in the case of an RDF representation, the intention would be to eventually use 303 redirection and content negotiation to expose it at the following variants for RDF browsers:
>
> http://datadryad.org/resource/doi:10.5061/dryad.584/rdf.xml
> http://datadryad.org/resource/doi:10.5061/dryad.584/rdf.n3
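The URL patterns listed in this message are regular enough to generate from a DOI. The following Python sketch does that; the `resource_url` helper and the short format keys are assumptions for illustration, and only the first three suffixes were described as currently supported, with the rest planned.

```python
BASE = "http://datadryad.org/resource/doi:"

# Suffixes taken from the message. The first three were described as
# supported at the time; the ORE and RDF variants were described as planned.
SUFFIXES = {
    "ris": "/citation/ris",
    "bib": "/citation/bib",
    "mets": "/mets.xml",
    "ore": "/ore.xml",
    "rdf_xml": "/rdf.xml",
    "rdf_n3": "/rdf.n3",
}


def resource_url(doi, fmt):
    """Return the /resource/ URL for a DOI in the requested format."""
    return BASE + doi + SUFFIXES[fmt]


print(resource_url("10.5061/dryad.584", "ris"))
# http://datadryad.org/resource/doi:10.5061/dryad.584/citation/ris
print(resource_url("10.5061/dryad.584", "mets"))
# http://datadryad.org/resource/doi:10.5061/dryad.584/mets.xml
```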
>
>
> Conversely, the approach we are taking to populate DataCite identifiers and metadata is to make back-end calls to those services at the time it is appropriate to release the identifier, metadata or content publicly. I could see a similar approach for globalnames.org. It could be feasible to add providers that would interact with the API to submit content and to retrieve and store suggested fields/values that may be appropriate for the data package and data file records. An initial strategy for this could operate asynchronously in the background, after submissions have completed workflow processing, have been archived, and can be shared with external services. This feedback would allow the curators and submitters to clean up suggested values after the submission has been completed.
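The asynchronous strategy described above could look roughly like the following Python sketch. Everything in it is an assumption for illustration: `find_names` is a hypothetical local stand-in for a remote name-finding call, and the queue, worker, and package identifier are invented; it is not how Dryad or globalnames.org actually work.

```python
import queue
import threading


def find_names(text):
    """Hypothetical stand-in for a remote name-finding call."""
    known = ("Anolis carolinensis", "Quercus alba")
    return [name for name in known if name in text]


def enrichment_worker(jobs, suggestions):
    """Consume archived packages and record suggested names for curators."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work
            jobs.task_done()
            break
        package_id, text = job
        suggestions[package_id] = find_names(text)
        jobs.task_done()


jobs = queue.Queue()
suggestions = {}
worker = threading.Thread(target=enrichment_worker, args=(jobs, suggestions))
worker.start()

# A package is enqueued only after its submission has been archived.
jobs.put(("package-A", "Body size in Anolis carolinensis across islands"))
jobs.put(None)
worker.join()
print(suggestions)  # {'package-A': ['Anolis carolinensis']}
```

Because the worker runs outside the submission workflow, a slow or unavailable external service would delay only the suggestions, not the deposit itself.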
>
> If you are interested in pre-calculating those records by harvesting the data beforehand, I imagine that calculating and traversing the above paths would be feasible. Eventually, RDF representations could be traversed via content negotiation on the /resource/ reference reached through DataCite redirection.
>
> At this time I think the hierarchical graph of DataPackage and DataFiles is not something we can easily push into DataCite. The actual Data File Record and its parent Package Record are separate entries in DataCite, and the actual content bitstreams would be poorly described in the service.
>
> I would be interested to find out more about the API for globalnames.org.
>
> Regards,
>
> --
>
> Mark Diggory (Schedule a Meeting)
> 2888 Loker Avenue East, Suite 305, Carlsbad, CA. 92010
> Esperantolaan 4, Heverlee 3001, Belgium
> http://www.atmire.com
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================
--Apple-Mail=_71CC2BB3-95CC-41D5-A220-C475DCEA968B--