Omeka XML Schema

Jim Safley

Aug 26, 2009, 3:32:53 PM

to Omeka Dev

Omeka developers,

As Omeka matures and its data model solidifies, we're facing an
increasing demand to release a formal Omeka XML schema. Well, here is
our first attempt:

http://omeka.org/schemas/omeka-xml/v1/omeka-xml-1-1.xsd

The schema provides a fluent template for building standalone
(non-web-centric) Omeka XML instances, while minimizing metadata that is
specific to Omeka administration. The root element can be almost any
derivation of Omeka, from multiple Omeka repositories to an individual
item, file, or tag.

We plan on using this schema to provide an "omeka-xml" output format
for selected Omeka components. So take a look and tell us what you
think.

Jim

Jim Safley

Aug 27, 2009, 5:12:45 PM

to Omeka Dev

I'm particularly interested in how best to contextualize the root
element. Currently there is no way to distinguish one omeka-xml
response from another. For example, when requesting the omeka-xml
output for an items browse page (items/browse/page/1?output=omeka-xml),
the root element is <itemContainer> without any attributes that
distinguish it from other item containers. Furthermore, when
requesting the omeka-xml output for an items show page
(items/show/1?output=omeka-xml), the root element is <item> with an
@itemId attribute, but without further attributes that distinguish it
from items with identical IDs from other repositories.

This omission will be problematic if someone collects and stores
omeka-xml metadata without also storing its context. My idea is to
include a @uri attribute on the root element that contains either 1) the
absolute URL of the page, or 2) a tag URI ( http://taguri.org/ ).
Given Omeka's clean routing format, the URL may be the best bet, but a
tag URI is guaranteed to be unique and valid, even if the URL becomes
obsolete.
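
To make the two candidates for @uri concrete, here is a minimal Python sketch. The helper name make_tag_uri, the example.org authority, and the minting date are all hypothetical; the only thing taken from the tag URI scheme itself is the general shape tag:<authority>,<date>:<specific>.

```python
from datetime import date


def make_tag_uri(authority: str, minted: date, specific: str) -> str:
    """Build a tag URI of the form tag:<authority>,<YYYY-MM-DD>:<specific>.

    Hypothetical helper: the authority and date identify who minted the
    identifier and when, so the result stays unique even if the page moves.
    """
    return f"tag:{authority},{minted.isoformat()}:{specific}"


# Option 1: the absolute URL of the page (example.org is a placeholder host).
page_url = "http://example.org/items/show/1?output=omeka-xml"

# Option 2: a tag URI minted for the same item.
tag_uri = make_tag_uri("example.org", date(2009, 8, 27), "/items/show/1")

print(page_url)
print(tag_uri)
```

The URL can be dereferenced but may go stale; the tag URI cannot be dereferenced but does not depend on the page staying where it is.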

Jim

Jim Safley

Sep 3, 2009, 11:18:33 AM

to Omeka Dev

Here's a good explanation of why we should not use permalinks as IDs, and
why we should use tag URIs instead:

http://diveintomark.org/archives/2004/05/28/howto-atom-id

If the Atom specification does not trust permalinks to be, well,
permanent, I think we should follow suit.

Jim

Dave Lester

Sep 4, 2009, 10:04:22 AM

to Omeka Dev

For my purposes, it'd be useful to see total_items() as well as the
page number included in the data output. Right now, when I receive a
data output for a browse page of items, no context is given. All I
know is that there's metadata for 10 items, but I don't know how many
other items there are, or where in that giant list of records they're
from.

By adding total_items() and the page number, it'd also be possible to
simulate pagination via an application that skins data outputs
remotely. For example, I've created a prototype of an Omeka mobile
app that pulls data from the JSON output of items/browse/. If I knew
the number of total items, as well as my current page number, I could
anticipate whether to look for an additional feed on the next
page, or whether this was the last available set of items.
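
The client-side check described above could look roughly like the following Python sketch. It assumes the output exposes a total item count and the current page number, and that the client knows the page size (10 in the example above); the function names are hypothetical.

```python
def total_pages(total_items: int, per_page: int) -> int:
    """Number of pages needed to show total_items at per_page items each."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Integer ceiling division: 25 items at 10 per page gives 3 pages.
    return (total_items + per_page - 1) // per_page


def has_next_page(page_number: int, total_items: int, per_page: int) -> bool:
    """True if another page of results follows page_number (1-based)."""
    return page_number < total_pages(total_items, per_page)


# A client on page 1 of 25 items should fetch another page;
# a client on page 3 has reached the last available set.
print(has_next_page(1, 25, 10))
print(has_next_page(3, 25, 10))
```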

Dave

Jim Safley

Sep 4, 2009, 5:51:32 PM

to Omeka Dev

Those are good ideas, Dave, and I think the schema should be flexible
enough to facilitate some client processing. But as we make changes to
the schema we should be careful not to make precipitous decisions
based on single use cases. That is not to say your idea is without
merit, it's just that we have to be careful as we proceed.

Regarding pagination in the schema, I see two options:

1) We could explicitly add @pageNumber and @totalPageNumber attributes
to all the container types.
2) We could add the <xsd:anyAttribute> element to all the container
types, which will allow us to extend the container elements with the
pagination data you are requesting.

The first option is troubling because I'm concerned that such
attributes would start to accumulate, adding unnecessary weight to a
schema that is designed to "minimize Omeka-specific metadata." The
second option is troubling because there would be no way to
standardize the pagination attributes across all XML instances,
potentially leading to parser incompatibility issues. Right now I'm
leaning toward the first option because I can see the widespread
utility of pagination data for client applications, and, moreover,
container elements are well suited to include pagination data.
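
As a rough illustration of the first option, this Python sketch parses a hand-written <itemContainer> carrying @pageNumber and @totalPageNumber. The sample document is hypothetical: it omits any namespace and all item content, and only the element and attribute names come from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance; a real omeka-xml document would carry more detail.
SAMPLE = """<itemContainer pageNumber="2" totalPageNumber="5">
  <item itemId="11"/>
  <item itemId="12"/>
</itemContainer>"""

root = ET.fromstring(SAMPLE)

# With explicit attributes, every parser reads pagination the same way.
page = int(root.get("pageNumber"))
total = int(root.get("totalPageNumber"))
is_last_page = page >= total

item_ids = [item.get("itemId") for item in root.findall("item")]
print(page, total, is_last_page, item_ids)
```

Because the attribute names would be fixed by the schema, a client never has to guess where the pagination data lives.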

This is probably a good time to explain my reasoning behind minimizing
Omeka-specific metadata in the schema. Omeka as software exists
independently from the items, files, and collections contained
therein. The fact that the Omeka web interface displays, say, 10 items
at a time is incidental to the repository as a whole. It is specific
to the particular Omeka installation.

Furthermore, it is inaccurate to think of omeka-xml as only a feed (or
syndication) format; rather it is a storage and transfer medium that
can be used as a feed. An appropriate use of the schema is to fill an
<itemContainer> with all the items needed, without pagination. However,
I recognize that this is not practical, so I am open to one of the two
options above.

Jim

Jim Safley

Sep 6, 2009, 12:32:00 PM

to Omeka Dev

Another question I have regarding pagination is whether to include a
URL to the next page. OAI-PMH uses resumption tokens to re-issue a
list request, but I question the worth of building out such a
system. Thankfully Omeka has a predictable URL structure for
browse pages -- such as items/browse/page/1 and items/browse/page/2 --
so including a URL that points to the next page (or even the previous
page) should be easy.

I guess my only question is whether we should 1) explicitly add a
@nextPageUrl attribute to the container types, or 2) assume parsers
can use the @uri attribute to build the URL to the next page. I
automatically see a weakness in the latter option: the tag URI does
not preserve the resolution scheme, so there is no way to tell if the
page is using http:// or https://. (On second thought, this weakness
may reduce the utility of the tag URI in root-level elements, given
that it offers no way to reliably rebuild the URL. I'm not sure how
important that is, though, since it may be more important to have a
permanently unique identifier.)
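
To show what the second option would ask of a parser, here is a Python sketch that derives the next page's URL from an absolute @uri value. It assumes the path ends in /page/<n> as in the browse URLs above; next_page_url and the example.org host are hypothetical. Given a tag URI instead, there would be no http or https scheme to start from.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def next_page_url(uri: str) -> str:
    """Return the URL of the following browse page.

    Assumes the path ends in /page/<n>; raises ValueError otherwise.
    """
    parts = urlsplit(uri)
    match = re.search(r"/page/(\d+)$", parts.path)
    if match is None:
        raise ValueError(f"no page number found in {uri!r}")
    next_number = int(match.group(1)) + 1
    next_path = parts.path[: match.start()] + f"/page/{next_number}"
    # Scheme, host, and query string are carried over unchanged.
    return urlunsplit(
        (parts.scheme, parts.netloc, next_path, parts.query, parts.fragment)
    )


print(next_page_url("https://example.org/items/browse/page/1?output=omeka-xml"))
```

An explicit @nextPageUrl attribute would spare every client this kind of guesswork about the routing format.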

Jim

Dave Lester

Sep 14, 2009, 3:41:38 PM

to Omeka Dev

What about this idea:

Is there a way that a plugin could extend the basic metadata included
in the loop of the itemContainer included in a data output? If so,
perhaps the pageNumber and totalPageNumber could be left out of the
schema in favor of allowing users to create their own plugins that
enhance this basic output. I can think of other data that would be
useful in an item feed, such as its collection id, and my hunch is
that this is inappropriate for the schema.

Dave

Jim Safley

Sep 16, 2009, 12:54:18 PM

to Omeka Dev

I'm going to explore your suggestions and add the <xsd:any> element to
every container element in the schema. This way, individual
implementations can add custom elements based on their need. Of
course, by not explicitly setting these elements in the schema there
would be no way to standardize them across all XML instances; but
given the potential for schema bloat, I think this is a good
compromise. At any rate, the Omeka implementation (?output=omeka-xml)
would theoretically serve as a formal representation of the schema, so
adding a <pagination> element to the item container could be seen as
canonical.
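
As a sketch of how a client might cope with an optional extension element, the Python below reads a <pagination> child if one is present and falls back to an empty result otherwise. The child element names (pageNumber, totalResults) and the sample document are hypothetical; nothing in the schema would fix them.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance with an extension element inside the container.
SAMPLE = """<itemContainer>
  <pagination>
    <pageNumber>1</pageNumber>
    <totalResults>25</totalResults>
  </pagination>
  <item itemId="1"/>
</itemContainer>"""


def read_pagination(root: ET.Element) -> dict:
    """Return the pagination children as a dict, or {} if the element is absent."""
    node = root.find("pagination")
    if node is None:
        return {}
    return {child.tag: child.text for child in node}


print(read_pagination(ET.fromstring(SAMPLE)))
print(read_pagination(ET.fromstring("<itemContainer/>")))
```

The need for the fallback branch is the cost of this compromise: a parser cannot rely on the element being there.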

The in-process schema can be viewed here: http://omeka.org/schemas/omeka-xml/v2/omeka-xml-2-1.xsd

Unless anyone has objections, I'm going to implement the changes.

Jim

Jim Safley

Sep 17, 2009, 10:00:02 AM

to Omeka Dev

I hate to beat a dead horse, but I'm wondering whether a tag URI is
the best way to uniquely identify an XML instance. As I mentioned in a
previous post, "the tag URI does not preserve the resolution scheme,
so there is no way to tell if the page is using http:// or https://."
This makes it impossible for parsers to reliably rebuild the URL if
needed. Not only that, the tag URI specification does not account for
URL query strings, given that the equal sign "=" is not a legal
character in the "specific" component. This is problematic for XML
instances of requests that must include query strings to process the
response, like a search query "?search=foo&submit_search=Search".

In sum, I question the utility of the tag URI for the purposes of
uniquely identifying an Omeka XML instance for two reasons:

1) It does not preserve the resolution scheme (http, https)
2) It cannot adequately preserve a URL query string

For these reasons I propose that, when an XML instance derives from an
HTTP request, we use a combination of @uri and @accessDate on the root
element, where @uri is the absolute URL of the request (including the
query string) and @accessDate is the current ISO 8601 date. When an
XML instance derives from other places, such as a script outside the
Web context, we may use a tag URI.
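
A minimal Python sketch of this proposal follows. The build_root helper and the example.org URL are hypothetical, and using a full UTC timestamp for @accessDate is an assumption, since the proposal only says "ISO 8601 date".

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_root(tag: str, request_url: str, accessed: datetime) -> ET.Element:
    """Create a root element carrying @uri and @accessDate."""
    root = ET.Element(tag)
    root.set("uri", request_url)  # absolute URL, query string included
    root.set("accessDate", accessed.isoformat())  # ISO 8601 timestamp
    return root


request_url = "http://example.org/items/browse?search=foo&submit_search=Search"
accessed = datetime(2009, 9, 17, 14, 0, 2, tzinfo=timezone.utc)

root = build_root("itemContainer", request_url, accessed)

# The serializer escapes "&" in the attribute value, so the query
# string survives a round trip through XML intact.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

In real use the timestamp would come from datetime.now(timezone.utc) at request time; a fixed value is used here only to keep the example reproducible.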

Jim

Dave Lester

Sep 21, 2009, 10:28:13 AM

to Omeka Dev

This sounds like a good solution to me.

Dave
