Taking a couple of examples from LSRN:
ATCC
http://www.atcc.org/ATCCAdvancedCatalogSearch/ProductDetails/tabid/452/Default.aspx?Template=cellBiology&ATCCNum=__ID__
ATCC_dna
http://www.atcc.org/ATCCAdvancedCatalogSearch/ProductDetails/tabid/452/Default.aspx?Template=bioproducts&ATCCNum=__ID__
COG_Cluster
http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi?__ID__
COG_Function
http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi?fun=__ID__
COG_Pathway
(not sure whether this one still exists)
(Note that the current templates from LSRN, NCBI, and Freebase, at
least, are broken for both of these sets. The URLs given above are
correct.)
Splitting these providers' records into several sets makes
redirection possible with a template that takes only the identifier
as an argument. Note the variation in the ATCC URLs: "cellBiology"
versus "bioproducts".
The reason this leads to instability is that we don't know whether the
provider will change the classification and tags they use in the URLs.
If they simplify by merging categories, we are fine. For example, if
ATCC dropped the Template parameter, we would redirect both ATCC and
ATCC_dna to
http://www.atcc.org/ATCCAdvancedCatalogSearch/ProductDetails/tabid/452/Default.aspx?ATCCNum=__ID__
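To make the template-per-space scheme and the merge case concrete,
here is a minimal sketch. The URL templates are the ones listed
above; the names URL_TEMPLATES, MERGED_TEMPLATES and redirect_url are
hypothetical and not part of any existing service, and the
percent-encoding of the identifier is my own assumption.

```python
from urllib.parse import quote

ATCC_BASE = (
    "http://www.atcc.org/ATCCAdvancedCatalogSearch/ProductDetails/"
    "tabid/452/Default.aspx"
)

# One template per identifier space, copied from the list above.
URL_TEMPLATES = {
    "ATCC": ATCC_BASE + "?Template=cellBiology&ATCCNum=__ID__",
    "ATCC_dna": ATCC_BASE + "?Template=bioproducts&ATCCNum=__ID__",
    "COG_Cluster": "http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi?__ID__",
    "COG_Function": "http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi?fun=__ID__",
}

# Merge case: if ATCC dropped the Template parameter, both ATCC spaces
# would simply point at the same template. Nothing else has to change.
MERGED_TEMPLATES = dict(URL_TEMPLATES)
MERGED_TEMPLATES["ATCC"] = ATCC_BASE + "?ATCCNum=__ID__"
MERGED_TEMPLATES["ATCC_dna"] = ATCC_BASE + "?ATCCNum=__ID__"


def redirect_url(templates, space, identifier):
    """Fill the template for `space` with the percent-encoded identifier."""
    template = templates[space]  # raises KeyError for an unknown space
    return template.replace("__ID__", quote(identifier, safe=""))
```

The point of the sketch is that the identifier is the only input: a
merge is absorbed by editing the table, with no change to client URLs.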
However, in the other direction we can get into trouble. Suppose they
split bioproducts into dna and cdna, and the difference wasn't
apparent from the accession. Then we wouldn't know how to redirect a
given identifier without moving to a more complex strategy for
deciding redirection.
Therefore we need to figure out what to do about this. Options I see:
1. Use the current classification, but commit to moving to a model of
having redirections for every accession stored in our database,
instead of a template, should the provider URLs change to something
incompatible with the simple redirection.
2. Contact the providers and see whether they are willing to provide a
single URL pattern to access any record they provide, assuming
identifier spaces don't overlap.
3. Move immediately to supporting per-accession redirects, rather than
one template per space.
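Options 1 and 3 both end up with redirects stored per accession; they
differ mainly in when. One way they could coexist is a lookup that
prefers a stored redirect and falls back to the template. This is only
a sketch under that assumption: ACCESSION_REDIRECTS, TEMPLATES and
resolve are hypothetical names, and the stored entry is made-up
example data, not a real record.

```python
# Template per space (the current model); the COG_Function template is
# taken from the list at the top of this message.
TEMPLATES = {
    "COG_Function": "http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi?fun=__ID__",
}

# Redirects stored per accession, keyed by (space, identifier).
# The entry below is invented example data.
ACCESSION_REDIRECTS = {
    ("COG_Function", "EXAMPLE"): "http://provider.example/records/EXAMPLE",
}


def resolve(space, identifier):
    """Prefer a stored per-accession redirect, else fill the template.

    Returns None when neither a stored redirect nor a template exists.
    """
    stored = ACCESSION_REDIRECTS.get((space, identifier))
    if stored is not None:
        return stored
    template = TEMPLATES.get(space)
    if template is None:
        return None
    return template.replace("__ID__", identifier)
```

Under option 1 the table starts empty and is filled only if a provider
makes an incompatible change; under option 3 it is populated for every
accession from the start.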
In all likelihood we will have to support 1, since in the worst case
that's the only remedy, short of having clients change their URLs,
which I consider to be out of the question.
It may or may not be worth pursuing 2; I don't know.
We should have this on the technical agenda for discussion.
-Alan