Thanks Ingrid, Basil and Robina. Just following up on a few points.
On spatial data, while text values for places are better than nothing,
they're not going to show up in Research Data Australia's map
interface, so their usefulness for discovery is going to be limited.
Obviously it would be good to share geospatial coordinates if
possible, and I'm wondering what people are already doing in regard to
geocoding locations in collection dbs.
I haven't done any of this in the museums area, but did do some
heavy-duty geocoding for Mapping our Anzacs. I'm wondering whether
there are tools and approaches that we could share that would benefit
both the MME project and beyond. As Basil pointed out, the GeoNames
API is available. You can also download the GeoNames db and use it,
for example, to populate an autocomplete field in your own app.
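A minimal sketch of the autocomplete idea, assuming rows laid out like the tab-separated GeoNames dump (id, name, asciiname, alternatenames, latitude, longitude, ...). The sample rows use placeholder ids and rounded coordinates rather than values copied from the dump, and build_index and suggest are hypothetical helper names:

```python
import bisect

# Illustrative rows only: placeholder ids and rounded coordinates,
# arranged in the column order of the GeoNames dump files.
SAMPLE_ROWS = [
    "1\tCanberra\tCanberra\t\t-35.28\t149.13",
    "2\tCairns\tCairns\t\t-16.92\t145.77",
    "3\tMelbourne\tMelbourne\t\t-37.81\t144.96",
]


def build_index(rows):
    """Parse rows into a list sorted by lower-cased name for prefix search."""
    index = []
    for row in rows:
        fields = row.split("\t")
        name = fields[1]
        lat, lon = float(fields[4]), float(fields[5])
        index.append((name.lower(), name, lat, lon))
    index.sort()
    return index


def suggest(index, prefix, limit=5):
    """Return up to `limit` (name, lat, lon) tuples whose name starts with prefix."""
    key = prefix.lower()
    # A one-element tuple sorts before any longer tuple with the same first item,
    # so this finds the first entry whose name is >= the prefix.
    start = bisect.bisect_left(index, (key,))
    matches = []
    for entry in index[start:]:
        if not entry[0].startswith(key):
            break
        matches.append(entry[1:])
        if len(matches) == limit:
            break
    return matches


index = build_index(SAMPLE_ROWS)
print(suggest(index, "ca"))
```

Because each suggestion carries its coordinates, picking a name from the autocomplete list would give you the latitude and longitude to store alongside the text value.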
ANDS is funding the development of web services on top of Geoscience
Australia's Gazetteer, so this will make the geolocation of Australian
places much easier. I'm hoping too that placenames in the Australian
Gazetteer will be linked to GeoNames ids, to connect up with the
Linked Open Data cloud. But even when we have these APIs we'll have to
think about the best ways to use them.
Of course, all that assumes you already have your placenames
identified. If you're pulling them from text descriptions there are
things like Yahoo Placemaker, and the ever-growing range of entity
extraction tools.
On people/organisations, Ingrid, are you saying that you're only
intending to include these as subjects? Something like <subject
type="nla-party">http://nla.gov.au/nla.party-615689</subject>? This
might be ok if the person actually is the subject of the collection,
but if they're the collector, then it's quite misleading. It would
seem much better to me to include people/organisations as related
objects in RIF-CS and describe the relationships appropriately.
Indeed, I'm wondering whether 'isSubjectOf' should be added to the
relation types in RIF-CS. (See pp. 39-41 of the ANDS Content Providers
Guide for list of relation types -
http://ands.org.au/guides/content-providers-guide.html)
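To make the contrast with the subject element concrete, here is a rough sketch that builds a related-object fragment with Python's standard library. It assumes the RIF-CS pattern of a relatedObject containing a key and a typed relation, omits the namespace and the surrounding collection record, and uses 'hasCollector' as the relation type. Using the NLA party URI directly as the key is also an assumption, since a real feed might need a locally registered party key instead:

```python
import xml.etree.ElementTree as ET


def related_object(party_key, relation_type):
    """Build a RIF-CS-style <relatedObject> element pointing at a party record."""
    related = ET.Element("relatedObject")
    key = ET.SubElement(related, "key")
    key.text = party_key
    # The relation type says how the party relates to the collection,
    # which a bare <subject> element cannot express.
    ET.SubElement(related, "relation", {"type": relation_type})
    return related


element = related_object("http://nla.gov.au/nla.party-615689", "hasCollector")
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```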
I don't understand what you mean by not getting into 'duplicating
party records'. What we're talking about is using (or minting) party
ids in People Australia to identify related objects in RIF-CS.
Similarly, the NLA isn't assigning relationships to collections,
that's what's meant to be happening in RDA. That's our responsibility.
Of course this all raises the bigger question of how the museums
sector is using People Australia identifiers in their collection dbs.
I'd be very interested to know what people are doing. People Australia
provides us with world-leading infrastructure for linking people data
across collections, databases, sectors and projects.
There's two parts to this - finding and using ids for people/orgs
already in People Australia, and providing data on people/orgs
associated with our own collections for harvest into People Australia
(entailing disambiguation and the minting of new ids where necessary).
The first is easy, you can look them up in Trove and just add the
identifiers to your db. My Identity Browser
(http://wraggelabs.com/identities/) makes it even easier by providing
a bookmarklet that you can use in any web-based form to easily look up
a name. It also provides some RDFa markup that you could use to
identify a person in a text description.
To see how you can use People Australia identifiers to build rich
semantic annotations around collection material, you might like to
check out the Flickr Machine Tag Challenge
(http://wraggelabs.com/fmtc/). Over 1000 photos in Flickr have been
annotated with machine tags using PA identifiers.
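For anyone who hasn't met machine tags, they follow a namespace:predicate=value pattern. The sketch below shows how such a tag could be built and read back. The foaf:depicts pairing is only an assumed example, not necessarily what the challenge uses, and both function names are hypothetical:

```python
def make_machine_tag(namespace, predicate, value):
    """Format a machine tag as namespace:predicate=value."""
    return f"{namespace}:{predicate}={value}"


def parse_machine_tag(tag):
    """Split a machine tag into (namespace, predicate, value)."""
    # Split on the first '=' so that a URL value keeps its own ':' and '='.
    prefix, _, value = tag.partition("=")
    namespace, _, predicate = prefix.partition(":")
    if not (namespace and predicate and value):
        raise ValueError(f"not a machine tag: {tag!r}")
    return namespace, predicate, value


tag = make_machine_tag("foaf", "depicts", "http://nla.gov.au/nla.party-615689")
print(tag)
print(parse_machine_tag(tag))
```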
In terms of contributing to People Australia, as Basil has noted there
have been some new tools developed for the ARDC-PIP project that could
be very useful.
I've always thought that part of the point of the MME project is to
facilitate and encourage metadata enhancement, so while I appreciate
the time pressures associated with the project it would seem useful to
talk a bit about things like people and places before we lock into a
model.
Cheers, Tim
--
Tim Sherratt (t...@discontents.com.au)
National Museum of Australia
Adjunct Associate-Professor, Digital Design + Media Arts Research Cluster,
Faculty of Arts and Design, University of Canberra
Words - http://www.discontents.com.au
Experiments - http://wraggelabs.com
@wragge on Twitter
> On people/organisations, Ingrid, are you saying that you're only
> intending to include these as subjects? Something like <subject
> type="nla-party">http://nla.gov.au/nla.party-615689</subject>? This
> might be ok if the person actually is the subject of the collection,
> but if they're the collector, then it's quite misleading.
The short answer: we're using a core and an upper ontology that may or
may not map to popular vocabs like DC and FOAF and friends (like
RIF-CS relationships), including multi-layered thesauri. Suggestions
more than welcome.
> It would
> seem much better to me to include people/organisations as related
> objects in RIF-CS and describe the relationships appropriately.
The granularity and extensibility of the RIF-CS seems rather limited
to me, but that may be because I'm somewhat new to those vocabs
defined through it. (I don't see roles beyond collector, owner and
manager, like curator, consumer, physical vs. abstract management and
so on, unless that is what they mean by managed?) I also don't like binary
directional associations in federated metadata, but perhaps that's
just a personal preference. Is there a better description of the vocabs
and their relationships outside of the RIF-CS spec and guide (which
mostly duplicates the spec)?
> Indeed, I'm wondering whether 'isSubjectOf' should be added to the
> relation types in RIF-CS. (See pp. 39-41 of the ANDS Content Providers
> Guide for list of relation types -
> http://ands.org.au/guides/content-providers-guide.html)
Wouldn't it be better if this vocab was defined outside the RIF-CS as
an extensible ontology instead, and let it include the various parts of
the problem space and not just relations between objects?
> I'd be very interested to know what people are doing. People Australia
> provides us with world-leading infrastructure for linking people data
> across collections, databases, sectors and projects.
I can't speak for what is already in use (I suspect very little if
any), but having an authoritative repository of identifiers is a
welcome change provided the mechanisms for resolvable
human-understandable data be flexible (we must avoid semantic drift at
all costs). Has anyone raised cross-linking to Wikipedia or other
external sources for anchoring, for example?
> There's two parts to this - finding and using ids for people/orgs
> already in People Australia, and providing data on people/orgs
> associated with our own collections for harvest into People Australia
> (entailing disambiguation and the minting of new ids where necessary).
> The first is easy, you can look them up in Trove and just add the
> identifiers to your db.
How is the second part supposed to work? I'm interested in the
creation of good identifiers, duplicates, synonymous / antonymous
identifiers, weak semantics, and similar things.
> To see how you can use People Australia identifiers to build rich
> semantic annotations around collection material, you might like to
> check out the Flickr Machine Tag Challenge
> (http://wraggelabs.com/fmtc/). Over 1000 photos in Flickr have been
> annotated with machine tags using PA identifiers.
Are you suggesting here that we use entities in PA as a basis for some
shared ontology?
> I've always thought that part of the point of the MME project is to
> facilitate and encourage metadata enhancement, so while I appreciate
> the time pressures associated with the project it would seem useful to
> talk a bit about things like people and places before we lock into a
> model.
Absolutely. Reusing PA metadata and identifiers is a good thing that
we'll push as hard as we can, but I do fear there will be semantic
mismatches across these layers of the metadata exchange, unless one
party is to be policing these things through harvesting and analysis?
Kind regards,
Alexander
--
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ ----------------------------------------------
------------------ http://www.google.com/profiles/alexander.johannesen ---
> Alex (Long time no see !)
Indeed. Thought you'd see the last of me, I'm sure. :)
> We treat RIF-CS as an input format which we map to the richer EAC-CPF
> standard. The ARDC Party Infrastructure Project will be providing
> records mapped from EAC-CPF to RDF. This mapping is not complete yet
> but included in the thinking are dc, foaf, bio and skos ontologies.
Ok, that's interesting and matches my thinking. Any docos or other
sharings on this work?
I guess the bigger question is if EAC-CPF is a worthy core ontology?
It looks rather de-normalized and untyped for relational data, but
that may be my poor understanding of its use. I'm happy to use it more
directly (specifically, by converting it to an ontological expression
in Topic Maps or similar) if you guys are happy with it.
Btw, who's on the technical backend on this project? :)
Regards,
Alex
I think we need to clarify what we're talking about here. My questions
were based on the templates and examples on the MME website. They seem
to reflect a rather limited model and I couldn't see how they would
enable the data of contributors to be expressed fully as RIF-CS (as
required of course by the project). So I wasn't arguing for the use of
RIF-CS, but that we need something that will *at least* enable us to
define the sorts of relationships between objects that are present in
RIF-CS.
If, however, the MME is developing a much richer model, as you
indicate, then hooray! I'm looking forward to seeing some details and
understanding how the inputs and outputs are mapped.
I think I am probably also guilty of mixing up the question of what
metadata we provide to MME with the question of how we might extend
and enrich the metadata we store in our own systems. It seems to me
that the real, long-term value of the MME project is not in the
development of an aggregation service (we all understand the problems
of sustainability), but in these sorts of discussions about what we
want to know, model and share about our collections. So I'm interested
in extending the discussion beyond what we supply to MME to tools,
approaches, methods, recipes etc.
I'll let Basil handle all the People Australia questions... :-)
>> To see how you can use People Australia identifiers to build rich
>> semantic annotations around collection material, you might like to
>> check out the Flickr Machine Tag Challenge
>> (http://wraggelabs.com/fmtc/). Over 1000 photos in Flickr have been
>> annotated with machine tags using PA identifiers.
>
> Are you suggesting here that we use entities in PA as a basis for some
> shared ontology?
err umm am I? I thought I was just showing how using existing
ontologies like FOAF and DC together with existing technologies like
machine tags and existing identifiers like People Australia we could
start right now in creating semantic linkages between people and
collection items.
>> I've always thought that part of the point of the MME project is to
>> facilitate and encourage metadata enhancement, so while I appreciate
>> the time pressures associated with the project it would seem useful to
>> talk a bit about things like people and places before we lock into a
>> model.
>
> Absolutely. Reusing PA metadata and identifiers is a good thing that
> we'll push as hard as we can, but I do fear there will be semantic
> mismatches across these layers of the metadata exchange, unless one
> party is to be policing these things through harvesting and analysis?
I don't understand what you mean here. Could you give some examples? I
like examples...
> My questions
> were based on the templates and examples on the MME website.
My bad. Any pointers? (I'm a new addition to this thing :)
> If, however, the MME is developing a much richer model, as you
> indicate, then hooray! I'm looking forward to seeing some details and
> understanding how the inputs and outputs are mapped.
I'll make it as rich as it needs to be, but I do want to re-use
whatever I can, taking special care for the thesauri / labels part of
the equation. I'm happy to base the core ontology on the RIF-CS, for
example, if people who are familiar with it claim it to be good and
valid for most use cases we're bound to bump into. (I have some
experience in the past with EAD which was less satisfactory, for
example)
>> Are you suggesting here that we use entities in PA as a basis for some
>> shared ontology?
>
> err umm am I?
I don't know, it was a genuine question. :) I noticed that a lot of
those machine codes were linking to entities in PA, which is why I
asked. It's the end entities in any relationship we need to worry
about, all the RDFa / MC stuff is just wrappers with pointers. Since a
lot of the type entities used are ultimately in PA, I thought maybe
someone had created a collection of them, wrapped them in a handy
ontology, and created a simple hierarchy of PI's we could use.
> I thought I was just showing how using existing
> ontologies like FOAF and DC together with existing technologies like
> machine tags and existing identifiers like People Australia we could
> start right now in creating semantic linkages between people and
> collection items.
Yes, DC is fine for most stuff, especially if you mean the extended
DC. I'm allergic to FOAF, but that can be rectified with
hand-holding and promises that it will turn out alright.
>> Reusing PA metadata and identifiers is a good thing that
>> we'll push as hard as we can, but I do fear there will be semantic
>> mismatches across these layers of the metadata exchange, unless one
>> party is to be policing these things through harvesting and analysis?
>
> I don't understand what you mean here. Could you give some examples? I
> like examples...
The simplest example is if two parties create records for the same
thing X, yet the metadata provided by both is ambiguous enough to
not be matched by the internal PA software, creating two identifiers
for the same thing. Two parties now have two different identifiers for
something they both would like to make statements on. And if they are
merged, what are the mechanics for updating the individual parties' own
systems? Are there going to be mechanics to fix these problems on the
fly, interfaces for merging of semantics and so on, self-contained
repository of valid data to match against, etc? (And yes, I'll let
Basil handle the PA questions :).
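As a sketch of one possible mechanic, and not a description of how PA actually works: if the central service published a table of merged identifiers, each party could re-point its local records by following that table to the surviving identifier. All identifiers and names below are hypothetical:

```python
def resolve(identifier, merged_into):
    """Follow merge records until reaching an identifier that was not merged away."""
    seen = set()
    while identifier in merged_into:
        if identifier in seen:
            raise ValueError(f"merge cycle involving {identifier!r}")
        seen.add(identifier)
        identifier = merged_into[identifier]
    return identifier


# Hypothetical merge table: party-B was merged into party-A,
# and party-C had earlier been merged into party-B.
merged_into = {"party-B": "party-A", "party-C": "party-B"}

# Hypothetical local records held by one contributing institution.
local_records = {"item-1": "party-C", "item-2": "party-A"}

updated = {item: resolve(pid, merged_into) for item, pid in local_records.items()}
print(updated)
```

This only covers the easy case where the merge is already known. Detecting that two records describe the same thing in the first place is the harder question.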
Kind regards,
Alex
On Wed, Oct 13, 2010 at 4:29 PM, Alexander Johannesen
<alexander....@gmail.com> wrote:
>> My questions
>> were based on the templates and examples on the MME website.
>
> My bad. Any pointers? (I'm a new addition to this thing :)
These are the only guidelines at the moment:
As institutions are being asked to make commitments based on this
information, it seems important to try and work out in some detail
what it all means!
> I'll make it as rich as it needs to be, but I do want to re-use
> whatever I can, taking special care for the thesauri / labels part of
> the equation. I'm happy to base the core ontology on the RIF-CS, for
> example, if people who are familiar with it claim it to be good and
> valid for most use cases we're bound to bump into. (I have some
> experience in the past with EAD which was less satisfactory, for
> example)
Well EAD is enough to send anyone mad, but then it was never designed
as a data model (although they're working on changing that now). The
'I' in RIF-CS is for 'interchange' and as you've already noted it has
major limitations. I would just be thinking of it as one export
format.
> I don't know, it was a genuine question. :) I noticed that a lot of
> those machine codes were linking to entities in PA, which is why I
> asked. It's the end entities in any relationship we need to worry
> about, all the RDFa / MC stuff is just wrappers with pointers. Since a
> lot of the type entities used are ultimately in PA, I thought maybe
> someone had created a collection of them, wrapped them in a handy
> ontology, and created a simple hierarchy of PI's we could use.
The Flickr Machine Tag Challenge only uses PA identifiers. The machine
tags themselves are generated by my Identity Browser. There's more on
the 'About' page.
Cheers, Tim
EAD = Encoded Archival Description (http://www.loc.gov/ead/). "EAD stands for Encoded Archival Description, and is a non-proprietary de facto standard for the encoding of finding aids for use in a networked (online) environment. Finding aids are inventories, indexes, or guides that are created by archival and manuscript repositories to provide information about specific collections. While the finding aids may vary somewhat in style, their common purpose is to provide detailed description of the content and intellectual organization of collections of archival materials. EAD allows the standardization of collection information in finding aids within and across repositories." (http://www.archivists.org/saagroups/ead/aboutEAD.html)
EAC-CPF = Encoded Archival Context - Corporate bodies, Persons and Families (http://eac.staatsbibliothek-berlin.de/). " [EAC-CPF] ... primarily addresses the description of individuals, families and corporate bodies that create, preserve, use and are responsible for and/or associated with records in a variety of ways. ... currently its primary purpose is to standardize the encoding of descriptions about agents to enable the sharing, discovery and display of this information in an electronic environment. It supports the linking of information about one agent to other agents to show/discover the relationships amongst record-creating entities, and the linking to descriptions of records and other contextual entities." (http://eac.staatsbibliothek-berlin.de/)
Basil
Basil Dewhurst | Project Manager, ARDC Party Infrastructure Project | National Library of Australia
p: +61 2 6262 1046 | f: +61 2 6273 1180 | e: bdew...@nla.gov.au | w: wiki.nla.gov.au/display/ardcpip