I'm sharing a Google Docs spreadsheet with you. It contains candidate
search predicates we would like to expose through a TreeBASE web service
interface. In addition, it lists the subjects they may apply to, the
value space of the objects, where and how they would be expressed and
retrieved in NeXML, and a short description of the application of each of
these predicates.

All implementation details aside, we imagine one should be able to search,
for example, on dc.title='foo' and get a result set where the study titles
match 'foo'. The list of predicates is a combination of Dublin Core/PRISM
terms (for publication metadata) and terms with a tb (TreeBASE) prefix.

As a request for team CDAO: are any of the tb predicates in the spreadsheet
concepts in CDAO? Could they be?

To everyone else, please comment on the naming scheme. For example, it
seems redundant to have taxonID and taxaID and treeID (etc.); on the other
hand, it disambiguates the subject of the query. Should things be renamed?
Does it make sense as is?

Thanks,

Rutger

TreeBASE search predicates
http://spreadsheets.google.com/ccc?key=rL--O7pyhR8FcnnG5-ofAlw
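
To make the dc.title='foo' example concrete, here is a minimal sketch of the
intended behaviour, not the actual TreeBASE implementation. The records, the
tb.studyID key and the search() helper are hypothetical, and treating "match"
as a case-insensitive substring test is an assumption.

```python
# Hypothetical in-memory study records keyed by predicate name.
# "tb.studyID" is an illustrative key, not a confirmed predicate.
STUDIES = [
    {"tb.studyID": "S1", "dc.title": "A foo phylogeny of beetles"},
    {"tb.studyID": "S2", "dc.title": "Bar trees revisited"},
]


def search(records, predicate, value):
    """Return the records whose value for `predicate` contains `value`.

    Matching is a case-insensitive substring test (an assumption; the
    real service might use exact or full-text matching instead).
    """
    needle = value.lower()
    return [r for r in records if needle in str(r.get(predicate, "")).lower()]


matches = search(STUDIES, "dc.title", "foo")
```

Records that lack the predicate are skipped rather than raising an error, so
the same helper works when a predicate applies to only some subjects.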
On Fri, Jun 12, 2009 at 5:23 PM, William Piel <willia...@yale.edu> wrote:
> Thanks Rutger. This is really useful.
>
> Some questions:
>
> -- Regarding the "prism.startingPage" and "prism.endingPage", I think our
> model stores these in one field (i.e. "123-132") -- I guess that means
> splitting the field with some sort of regular expression -- e.g.
> /^(\d+)[\s.\-]+(\d*)$/ -- unless prism also offers a combined "pages"
> option.
There is a pageRange property, which I've added.
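
If the start and end pages are still wanted separately, the split could look
roughly like the sketch below. It is a variant of the regular expression
quoted above, not existing TreeBASE code: the split_page_range() name is
hypothetical, and treating a lone page number as a one-page range is an
assumption. Abbreviated ranges such as "123-32" are not expanded.

```python
import re

# Digits, then optionally a separator run (whitespace, "." or "-")
# followed by more digits. The hyphen is escaped so it is a literal.
PAGE_RANGE = re.compile(r"^(\d+)(?:[\s.\-]+(\d+))?$")


def split_page_range(pages):
    """Return (startingPage, endingPage) as strings, or None if unparseable."""
    match = PAGE_RANGE.match(pages.strip())
    if match is None:
        return None
    start, end = match.group(1), match.group(2)
    if end is None:
        # Assumption: a single page number means start and end coincide.
        end = start
    return (start, end)


pages = split_page_range("123-132")
```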
> -- In instances where an LSID exists (e.g. all taxonNamebankIDs have LSIDs),
> would it be better to offer that, or stick with CDAO?
The identifiers are the part of my suggestion I'm least pleased with.
Right now the identifiers are treated as TreeBASE-specific (e.g.
tb:taxonID). It's possible that these can be moved into CDAO. Or, if
the fact that objects have IDs doesn't seem to fit in with CDAO's mission
of representing the core knowledge of phylogenetics (IDs being more of
an implementation detail), maybe they should be moved into a PhyloWS
vocabulary? And should different classes of IDs have different syntax,
e.g. a special predicate for LSIDs, versus namespaced IDs for other
authorities (say, "TreeBASE:Tr1231", "Dryad:2324" etc.)?
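
To illustrate the two classes of ID, here is a rough sketch that tells an LSID
apart from a namespaced ID. It assumes the usual LSID layout
urn:lsid:<authority>:<namespace>:<object>[:<revision>] and the
"Authority:localID" form from the examples above. The ParsedID type, the
parse_identifier() name and the sample LSID are hypothetical.

```python
from typing import NamedTuple, Optional


class ParsedID(NamedTuple):
    kind: str       # "lsid" or "namespaced"
    authority: str  # e.g. "TreeBASE" or the LSID authority
    local_id: str   # everything after the authority


def parse_identifier(text: str) -> Optional[ParsedID]:
    """Classify an identifier string, or return None if it fits neither form."""
    parts = text.split(":")
    # LSID: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
    if len(parts) in (5, 6) and [p.lower() for p in parts[:2]] == ["urn", "lsid"]:
        if all(parts[2:]):
            return ParsedID("lsid", parts[2], ":".join(parts[3:]))
        return None
    # Namespaced ID: <authority>:<local id>
    if len(parts) == 2 and all(parts):
        return ParsedID("namespaced", parts[0], parts[1])
    return None


tree_id = parse_identifier("TreeBASE:Tr1231")
# Illustrative LSID only; not taken from the spreadsheet.
name_id = parse_identifier("urn:lsid:ubio.org:namebank:11815")
```

With a parser like this a single ID predicate could accept both forms, which
is one argument against a separate predicate just for LSIDs.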
> -- I was, in a way, chagrined to see that a new "superset" of taxa is
> available -- the GNI (http://globalnames.org/). They've essentially grabbed
> all of uBio's data and added Species2000 and ZooBank to become a source of
> names for EOL and GBIF, together with a names architecture
> (http://gnapartnership.org/gna/wiki) that is under development. Given (a)
> the similarity with uBio's mission, and (b) the fact that big money players
> are involved while uBio seems to be languishing, it may be that this marks
> the beginning of the end for uBio. And that may mean that some day a lot of
> our taxon intel work will need to be rewritten. I only mention this in case
> a bit of foresight, while designing our API terms, might help us adapt to a
> future changing name informatics landscape.
> -- I take it that separate dc.creator elements are created for each author:
> is there a way to communicate author order?
Actually, this is treated inconsistently in practice: I've seen
multiple dc:creator annotations with one author each and I've seen
them all concatenated within a single dc:creator annotation. I would
like us to be as granular as possible so I'd favour the former.
Alternatively, authors could be annotated using FOAF, so we can break
it down into first/last/middle name, and add other contact info (email).
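
On author order: with one dc:creator annotation per author, the order could
simply be the order in which the annotations appear in the document. A small
sketch follows, assuming consumers preserve document order. The "study"
container element, the creator_annotations() helper and the author names are
hypothetical, and the meta element with property/content attributes is only
an approximation of how NeXML expresses such annotations.

```python
import xml.etree.ElementTree as ET


def creator_annotations(authors):
    """Build one <meta> annotation per author, in the order given."""
    study = ET.Element("study")  # hypothetical container element
    for name in authors:
        ET.SubElement(study, "meta", {"property": "dc:creator", "content": name})
    return study


study = creator_annotations(["First Author", "Second Author", "Third Author"])

# Reading the annotations back in document order recovers the author order.
ordered = [m.get("content") for m in study.findall("meta")]
```

Whether that order survives once the annotations are turned into RDF triples
is a separate question, since a set of triples has no inherent order.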
> -- Is there a dc. or prism. for author email, abstract, or keywords?
There is a prism.keyword (used as a set of atomic annotations) and
dc.subject (best practice dictates this would be a comma-separated
list of terms from a controlled vocab.). If we want to make available
more about authors/editors perhaps we might use FOAF?
> [Actually, I just realized that I was thinking about all this vocabulary
> largely in terms of decorating returned NeXML with metadata rather than as
> PhyloWS search terms. Of course people don't need to search on "email"
> (etc)]
Mmmm... maybe they do need to search on "email", I don't want to
presume to know that :)
Rutger
> On Jun 12, 2009, at 7:45 PM, rutge...@gmail.com wrote: