'XB' value in location_country_iso


Lynette

Jan 6, 2009, 8:41:50 PM
to Biodiversity Collections Index
Hi,
Could you please explain why you chose to use the location_country_iso
field ("The two letter ISO code for the country the collection is
located in") to hold information about whether the collection is
embedded within another collection (value "XB")?

Is the 'embedding' of one collection within another not already
adequately catered for through a child-parent collection relationship?

Currently, embedded collections do not appear in the results of a
search by a specific country location. Retrieving a list of
collections located in Australia which have distinct LSIDs, for
example, is problematic.

Would welcome your advice.

Lynette.

rogerhyam

Jan 8, 2009, 8:47:20 AM
to Biodiversity Collections Index
Hi Lynette,

This is one of those design issues that I wrestled with, and in the
end I opted for the simplest (and least useful) approach.

The question that vexed me along these lines concerned collection
size rather than location, but it is much the same problem. It comes
up because people so often ask "How many specimens are out there?". To
answer this question whilst handling parent-child relationships one
would have to ask a whole series of other questions: What should the
size of a parent collection be? Should it always be the sum of the
sizes of the child collections? Greater than the sum of the sizes of
the child collections? Not controlled by the size of the child
collections? How do we control what people can enter? Should you
automatically increase the size of parent collections when people
change the child collections? What if different records for any one
collection have different numbers in them? etc etc etc.

I made the design decision that each collection record be a free
standing entity. It would basically be impossible to design an
adequate system that handled all the possible relationships between
collections and maintain data integrity accordingly - let alone build
and test such a system with the resources available.

At one point I actually implemented a system where there were a range
of different parent-child relationships but found that I could not
describe how they should be used: when is a collection a
sub-collection and not an 'absorbed' collection, etc.? I threw it away and
went back to the simplest possible relationship between collections
that really only acts like a "see also" link.

The X* ISO codes originated for collections where we don't know the
location, it hasn't been specified by the source or it should be
taken from one of the parents (XB). There is no integrity checking here
so it is possible to say it is embedded but not have a parent. Or the
parent information may be in a note or even unknown. This is just a
flag to say "We aren't saying where it is but you may be able to tell
from the parents". Imagine the code that would be necessary to prevent
circular references and control integrity with multiple records for
each collection!
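
A minimal sketch of the two checks mentioned above (an 'XB' record with no parent, and circular parent links), assuming one record per collection. The function name and the two dictionaries are hypothetical and not part of BCI; handling multiple records per collection, the hard part, is not attempted.

```python
def find_integrity_problems(codes, parents):
    """codes: id -> location_country_iso; parents: id -> list of parent ids.

    Both mappings are hypothetical stand-ins for the real data.
    """
    # 'XB' says "look at my parents", so an 'XB' record without parents is suspect.
    orphans = sorted(cid for cid, code in codes.items()
                     if code == "XB" and not parents.get(cid))

    def reaches_itself(start):
        # Walk upwards through the parents; a cycle exists if we return to start.
        seen, stack = set(), list(parents.get(start, []))
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, []))
        return False

    cyclic = sorted(cid for cid in codes if reaches_itself(cid))
    return orphans, cyclic


# Invented example: 3 is 'XB' with no parent; 4 and 5 point at each other.
codes = {1: "AU", 2: "XB", 3: "XB", 4: "XB", 5: "XB"}
parents = {2: [1], 4: [5], 5: [4]}
orphans, cyclic = find_integrity_problems(codes, parents)
```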

By the way, the X* codes are a valid extension mechanism for two-letter
ISO codes, so we aren't being bad in that sense.

It would simply be too complex to do expanding queries for collections
that took into account relationships between collections in the core
BCI application and keep everyone happy. This doesn't mean that you
couldn't do it locally if you need to. You can download a CSV
snapshot, load it into an SQL database and write the appropriate
queries against it. You could also try writing queries against the
JSON service. This is the kind of thing that would also be fun to try
with a triple store. The dataset isn't so large that it couldn't be
loaded into an OWL ontology in Protege, if anyone has the time to try.
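
As a rough illustration of the local SQL approach, here is a sketch using SQLite from Python. The table names (`collection`, `relationship`) and all columns other than `location_country_iso` are assumptions made for the example, not the actual layout of the CSV snapshot, and the three rows are invented. The query treats an 'XB' collection as located wherever one of its parents is, following chains of 'XB' parents.

```python
import sqlite3


def collections_in_country(conn, iso):
    """Names of collections located in `iso`, resolving 'XB' via parents."""
    sql = """
    WITH RECURSIVE resolved(id, ancestor_id) AS (
        -- every collection starts out as its own "ancestor"
        SELECT id, id FROM collection
        UNION  -- UNION (not UNION ALL) drops repeats, so cycles terminate
        -- climb to the parents only while the current ancestor is 'XB'
        SELECT r.id, rel.parent_id
        FROM resolved r
        JOIN collection c ON c.id = r.ancestor_id
        JOIN relationship rel ON rel.child_id = c.id
        WHERE c.location_country_iso = 'XB'
    )
    SELECT DISTINCT c.name
    FROM resolved r
    JOIN collection c ON c.id = r.id
    JOIN collection a ON a.id = r.ancestor_id
    WHERE a.location_country_iso = ?
    ORDER BY c.name
    """
    return [row[0] for row in conn.execute(sql, (iso,))]


# Hypothetical schema and data standing in for a loaded CSV snapshot.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collection (
    id INTEGER PRIMARY KEY, name TEXT, location_country_iso TEXT);
CREATE TABLE relationship (child_id INTEGER, parent_id INTEGER);
INSERT INTO collection VALUES
    (1, 'Parent herbarium', 'AU'),
    (2, 'Embedded collection', 'XB'),
    (3, 'Other collection', 'GB');
INSERT INTO relationship VALUES (2, 1);
""")
result = collections_in_country(conn, "AU")
```

With the invented rows above, a search for 'AU' returns both the parent and the collection embedded in it, which is the case the core application does not cover.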

I am sorry this is such a long reply. I guess it reflects the fact
that I don't have a satisfactory answer.

If you have a minor change I could make that would make things easier
please suggest it.

All the best,

Roger

Lynette

Jan 15, 2009, 10:37:50 PM
to Biodiversity Collections Index
Hi Roger,
Thank you for providing such a comprehensive explanation! It's
instructive, as a user, to see things from the other side!

I fully support your decision to consider each collection (represented
by a collection record) as “a free standing entity”. I’m
uncomfortable with ‘XB’ because it seems to depart from this ‘atomic’
view, switching my focus from a particular collection to its
relationship with another. When I ask “Where the bloody hell are
you?” (to borrow a phrase from a recent Australian government tourism
advertising campaign), I don’t expect you to tell me who you’re
shacked up with!

I understand that location doesn’t make sense for virtual collections,
and that location (country) may not be known for others. So ‘XA’ and
‘XC’ look fine to me as values of location_country_iso to cover ‘not
applicable’ and ‘unknown’ cases, respectively.

However, neither ‘XB’ nor ‘XS’ seems to me to represent “.. the
country the collection is located in”. I see these as addressing
entirely different concepts: the first, the relationship between
collections; the second, the relationship between records describing
collections.

So where should the information currently being captured by ‘XB’ go?

I appreciate that handling relationships can be tricky. Perhaps the
key here, as in other aspects of life, is to define their nature
unambiguously and communicate those definitions clearly! What kinds
of relationships amongst collections does BCI want to represent:
isPartOf? isAbsorbedWithin? isA? isEmbeddedIn? seeAlso? Users
just need to know which, so we can insert an appropriate value in the
appropriate place, and correctly interpret others’ contributions. (Is
a child collection a part of a parent collection, for example, or a
supplementary annex?) We simply need clear, full-text definitions in
order to do this properly.

To deal with the more complex data quality issues to which you refer
(eg. regarding adjustments to the size of a collection when one of its
component collections grows), I think users must take responsibility.
Although BCI might provide a framework, I believe the data is ours,
and ours to maintain in a fit and proper manner.

I think BCI could help us, however, to understand better the network
of collections resulting from the relationships we assert. For me,
personally, visualization of the network (eg. as a directed graph, if
relationships are directional) would be a tremendous help in assessing
the consequences of changing any part of that network. Even a simple
alert, indicating to a user that the collection they’re trying to
update is involved in a pre-existing relationship(s), would perhaps be
beneficial. (Call it the wedding ring alert!)

So, finally, in response to your invitation to suggest minor changes,
I submit the following: remove ‘XB’, starting with collections which
do not have any children. For each (‘XB’) collection, check out its
parents. If no parents, replace ‘XB’ with ‘XC’. For those with
parents, ignore any ‘XA’ and ‘XC’ parental records and look at what’s
left. If there’s only one code remaining, then take it! (The
collection has only one parent, which has location_country_iso NOT IN
{‘XA’, ‘XC’}.) Otherwise, get up, get a cup of tea, and prepare to
answer some difficult questions!
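
The procedure above could be sketched roughly as follows. The function name and its arguments are hypothetical. Two points are assumptions that the description leaves open: several parents sharing the same code count as a single remaining code, and a parent that is itself 'XB' is treated as unresolved.

```python
def resolve_xb(code, parent_codes):
    """Suggest a replacement location_country_iso for one collection.

    Returns the new code, or None when a person needs to decide.
    """
    if code != "XB":
        return code  # nothing to migrate
    if not parent_codes:
        return "XC"  # 'embedded' but no parent: location unknown
    # Ignore parents whose own location is not applicable or unknown.
    remaining = {c for c in parent_codes if c not in ("XA", "XC")}
    if len(remaining) == 1:
        (only,) = remaining
        # Assumption: an 'XB' parent gives no usable location yet.
        return None if only == "XB" else only
    return None  # no code or several codes left: time for that cup of tea


examples = [
    resolve_xb("XB", []),            # no parents
    resolve_xb("XB", ["AU", "XC"]),  # one usable parent code
    resolve_xb("XB", ["AU", "GB"]),  # conflicting parents
]
```

Running this over the 'XB' collections without children first, then repeating, would let resolved codes propagate down a chain of embedded collections.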

Look on the bright side: without ‘XB’, it will no longer be possible
for a collection to be registered as ‘embedded’ but not have a parent!

Cheers,
Lynette.