[omeka-dev] Importing Dublin Core XML into Omeka

Ethan Gruber

Apr 19, 2010, 2:55:46 PM
to omek...@googlegroups.com
I have some Dublin Core XML that I want to import into the Omeka database.  I could convert the XML into a CSV file and import it with the CSVImport plugin, but my XML file has some repeatable fields, for example:

    <dc-record>
        <type>physical object</type>
        <type>original</type>
        <type>cultural</type>
        <title>Mortar</title>
        <description>(2) Mortar Fragments</description>
        <subject>Mortar</subject>
        <creator>Idendified by:Processed by JPR</creator>
        <creator>Identified date:06-04-2002</creator>
        <contributor>Excavated date:11-13-1980</contributor>
        <identifier>44PG114.11.1</identifier>
        <coverage>Test Pit Pier 1 2N 0E L2</coverage>
    </dc-record>

Is it possible to preserve these repeatable fields with any sort of import mechanism?

Thanks,

Ethan Gruber
Scholars' Lab
University of Virginia Library

--
You received this message because you are subscribed to the Google Groups "Omeka Dev" group.
To post to this group, send email to omek...@googlegroups.com.
To unsubscribe from this group, send email to omeka-dev+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/omeka-dev?hl=en.

Patrick Murray-John

Apr 19, 2010, 3:14:17 PM
to omek...@googlegroups.com
Ethan,

I'm hoping that there's nothing too sketchy about how I'm doing this for
the FeedImporter plugin, but now's probably a good time to find out!

For each Item I'm creating, I'm using insert_item to stuff it into the
database:

$newOmekaItem = insert_item($metadataArray, $elementTextsArray);

$elementTextsArray lines everything up by Element Set and an array of
Elements, so you should be able to convert your XML into the array
structure fairly easily:

$elementTextsArray = array();
// each entry is an array of the text and a boolean for whether it is HTML
$elementTextsArray['Dublin Core']['type'][] = array('physical object', false);
$elementTextsArray['Dublin Core']['type'][] = array('original', false);

Hope that helps
Patrick

Ethan Gruber

Apr 19, 2010, 4:49:24 PM
to omek...@googlegroups.com
Thanks.  It's kind of surprising that, for a database modeled on Dublin Core fields, there isn't already a function for importing Dublin Core.  This may be something I will try to sit down and write a plugin for, if no one else has done it yet.  Seems like it should be pretty straightforward.

Ethan

Patrick Murray-John

Apr 19, 2010, 4:57:35 PM
to omek...@googlegroups.com
Agreed ... but maybe a more generalized XML importer with lots of
configuration options would be worthwhile, too, if we'd be looking at
parsing out XML here anyway.

Maybe straightforward checkboxes for expected DC records, but advanced
configuration for arbitrary XML, kinda like CSVImport.

'Course, my brain is only leaning that direction because of my
scope-creep with making lots of different configuration options for
tags/categories from feeds, so that might ultimately be the wrong trail!

Patrick

Ethan Gruber

Apr 19, 2010, 5:06:04 PM
to omek...@googlegroups.com
I don't know if one could write a generic XML import plugin unless one restricted it to flat XML used only for data transmission (in this case, Dublin Core could work).  I have worked on an importer for EAD, and EAD is too hierarchical and complex for any sort of generalized import process.

I think it would be most useful to have a handful of plugins for mapping recognized XML standards (EAD, DC, VRA Core, etc.) into the Omeka database, and to treat outliers that do not fit a known standard case by case (unless they are flat XML that can be mapped with a generalized plugin).  For those outliers, it may just be more effective to learn some XSLT to convert an arbitrary XML document into a CSV file and import it into Omeka with the CSVImport plugin.

Ethan
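
[Editor's sketch] The XML-to-CSV route mentioned above could look roughly like the following. It uses PHP's DOM extension instead of XSLT, purely to match the other code in this thread. The function name dcRecordsToCsv and the "|" delimiter are made up for illustration, and whether CSVImport can split a delimited cell back into repeated values is an assumption that would need checking against the plugin.

```php
<?php
// Hypothetical helper: flatten <dc-record> elements into CSV text.
// Repeated fields are joined with $delimiter (an arbitrary choice).
function dcRecordsToCsv(string $xml, string $delimiter = '|'): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $rows = [];
    $columns = [];
    foreach ($doc->getElementsByTagName('dc-record') as $record) {
        $row = [];
        foreach ($record->childNodes as $child) {
            // Skip whitespace text nodes between elements.
            if ($child->nodeType !== XML_ELEMENT_NODE) {
                continue;
            }
            $row[$child->nodeName][] = trim($child->nodeValue);
            $columns[$child->nodeName] = true;
        }
        $rows[] = $row;
    }

    // Write to an in-memory stream so the result can be returned as a string.
    $out = fopen('php://memory', 'w+');
    $header = array_keys($columns);
    fputcsv($out, $header, ',', '"', '\\');
    foreach ($rows as $row) {
        $line = [];
        foreach ($header as $name) {
            // One cell per column; repeated values share the cell.
            $line[] = implode($delimiter, $row[$name] ?? []);
        }
        fputcsv($out, $line, ',', '"', '\\');
    }
    rewind($out);
    $csv = stream_get_contents($out);
    fclose($out);
    return $csv;
}
```

The trade-off is visible in the output: three <type> elements end up in one cell, so the repetition survives only as a convention that the importing side has to understand.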

Patrick Murray-John

Apr 19, 2010, 8:56:28 PM
to omek...@googlegroups.com
Yes, I was thinking there of a flat XML file, though I wish that I had
some more in-depth knowledge of some varieties of XML exports. I'm
thinking here of being able to take a WordPress or a Drupal export and
pull it into Omeka for an archive.

And point taken about EAD. Ultimately, I'm realizing that when I say
'generalized' I'm also counting on an administrator/curator to do some
complex work configuring each import. That is, have a parser that'll
push data to an interface that'll let the admin/curator select options
for how to import. So the parser would be the general part, and the
metadata geek setting up the config options for the import would do the
heavy lifting of making it all work.

Basically, I see Omeka as being in a great spot to fundamentally
distinguish itself from WordPress and Drupal. If you need to do
_curation_ of knowledge, Omeka looks to become the right choice. The
various pushes around importers are what got me started on this line of
thinking. The plugins for importing via various standards are falling
into place, but Jim's Zotero import really blew open the doors, I think,
to look at Omeka as a tool for curating (by which I mean
anti-info-entropy -- I'm trying to figure that out for a blog post) the
social knowledge-sharing going on on the web.

Yeah, that's why I'm succumbing to scope-creep on the FeedImporter. And
why I'm excited about the idea of a FlickrImporter. And, ultimately, an
RDFImporter. We can't let Drupal have all the fun!

There are lots of steps to get there, and working out XSLT for
individual projects is, I think, the most useful step for most cases
right now. I'd like to learn from the needs of those cases to aim toward
what Omeka admins really do with those imported Items and how they
curate them -- what questions they ask and how they organize things. I
think a lot of exciting stuff will come out of that.
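
[Editor's sketch] The split described above, a general parser plus a per-import configuration written by the curator, might be sketched like this. Everything here is hypothetical: the $mapping format, the applyMapping name, and the shape of the result are illustrations, not part of Omeka or any existing plugin.

```php
<?php
// Hypothetical per-import configuration a curator might supply:
// source tag name => [element set, element name]. Unlisted tags are skipped.
$mapping = [
    'title'   => ['Dublin Core', 'Title'],
    'creator' => ['Dublin Core', 'Creator'],
    'type'    => ['Dublin Core', 'Type'],
];

// Generic part: walk one record and apply whatever mapping it is given.
function applyMapping(DOMElement $record, array $mapping): array
{
    $mapped = [];
    foreach ($record->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue; // ignore whitespace and comments
        }
        if (!isset($mapping[$child->nodeName])) {
            continue; // no rule configured for this tag
        }
        [$set, $element] = $mapping[$child->nodeName];
        $mapped[$set][$element][] = trim($child->nodeValue);
    }
    return $mapped;
}

// Usage with a small flat record; <subject> has no rule, so it is dropped.
$doc = new DOMDocument();
$doc->loadXML(
    '<dc-record><type>original</type><type>cultural</type>'
    . '<title>Mortar</title><subject>Mortar</subject></dc-record>'
);
$result = applyMapping($doc->documentElement, $mapping);
```

The parser knows nothing about Dublin Core; all of the source-specific knowledge lives in $mapping, which is the part an admin interface would generate.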

Ethan Gruber

Apr 20, 2010, 1:26:23 PM
to omek...@googlegroups.com
Patrick,

Is there anything I have to include or reference in order to get the insert_item() function to work?  I'm iterating through each <dc-record> element in my XML file and creating the $elementTextsArray array for each record, so I want to perform an insert_item on each dc-record after populating the array with data.  The line $newOmekaItem = insert_item($metadataArray, $elementTextsArray); tanks my script.  Here's an example from my code:

foreach ($xml_doc->getElementsByTagName("dc-record") as $item) {
    $elementTextsArray = array();
    foreach ($item->childNodes as $child) {
        $elementTextsArray['Dublin Core'][$child->nodeName][] = array($child->nodeValue, false);
    }
    $newOmekaItem = insert_item($metadataArray, $elementTextsArray);
}

If I comment out $newOmekaItem = insert_item($metadataArray, $elementTextsArray); the script successfully generates the arrays.

Thanks for your help,
Ethan

Patrick Murray-John

Apr 20, 2010, 1:44:00 PM
to omek...@googlegroups.com
Ethan,

There might be two things to look at. First, if $metadataArray isn't an
array, that'll probably bork it. That array has info like collection id
and item type id:

$metadataArray['collection_id'] = 1;

insert_item() is pretty heavily documented in
application/libraries/globals.php

The somewhat stickier part is probably in how Omeka lines up the names
of Dublin Core Elements. Instead of "subject", I think Omeka is going to
require "Subject". So it might do the trick to just capitalize the first
letter of $child->nodeName.

Hope that helps!
Patrick
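
[Editor's sketch] The capitalization step could be wrapped in a small helper that also rejects anything that is not one of the fifteen simple Dublin Core elements. The helper name dcElementName is made up, and the assumption that Omeka's "Dublin Core" element set uses exactly these capitalized labels follows Patrick's guess above rather than a checked fact.

```php
<?php
// The fifteen elements of simple Dublin Core, capitalized as labels.
// Assumption: Omeka's 'Dublin Core' element set uses these exact names.
const DC_ELEMENTS = [
    'Title', 'Creator', 'Subject', 'Description', 'Publisher',
    'Contributor', 'Date', 'Type', 'Format', 'Identifier',
    'Source', 'Language', 'Relation', 'Coverage', 'Rights',
];

// Hypothetical helper: map an XML tag name such as "subject" to "Subject",
// or return null when the tag is not a simple Dublin Core element.
function dcElementName(string $tagName): ?string
{
    $candidate = ucfirst(strtolower(trim($tagName)));
    return in_array($candidate, DC_ELEMENTS, true) ? $candidate : null;
}
```

Returning null for unknown names makes it easy to skip stray nodes (for example a "#text" node name) instead of passing them on to the database layer.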

Ethan Gruber

Apr 20, 2010, 2:23:13 PM
to omek...@googlegroups.com
Thanks,

I think I'm a step closer, but it still isn't actually putting my data into the database.

For now I am specifying $metadataArray as

$metadataArray = array();
$metadataArray['collection_id'] = 1;

for testing purposes.

I looked at the documentation for insert_item, and nothing about what I have done really stands out as incorrect.  There might be some slight error with the array I'm generating from each dc-record element, but I'm not seeing it.  Here is a var_dump of $elementTextsArray:

array(1) {
  ["Dublin Core"]=> array(8) {
    ["Type"]=> array(3) {
      [0]=> array(2) { [0]=> string(15) "physical object" [1]=> bool(false) }
      [1]=> array(2) { [0]=> string(8) "original" [1]=> bool(false) }
      [2]=> array(2) { [0]=> string(8) "cultural" [1]=> bool(false) }
    }
    ["Title"]=> array(1) {
      [0]=> array(2) { [0]=> string(6) "Mortar" [1]=> bool(false) }
    }
    ["Description"]=> array(1) {
      [0]=> array(2) { [0]=> string(20) "(2) Mortar Fragments" [1]=> bool(false) }
    }
    ["Subject"]=> array(1) {
      [0]=> array(2) { [0]=> string(6) "Mortar" [1]=> bool(false) }
    }
    ["Creator"]=> array(2) {
      [0]=> array(2) { [0]=> string(30) "Idendified by:Processed by JPR" [1]=> bool(false) }
      [1]=> array(2) { [0]=> string(26) "Identified date:06-04-2002" [1]=> bool(false) }
    }
    ["Contributor"]=> array(1) {
      [0]=> array(2) { [0]=> string(25) "Excavated date:11-13-1980" [1]=> bool(false) }
    }
    ["Identifier"]=> array(1) {
      [0]=> array(2) { [0]=> string(12) "44PG114.11.1" [1]=> bool(false) }
    }
    ["Coverage"]=> array(1) {
      [0]=> array(2) { [0]=> string(24) "Test Pit Pier 1 2N 0E L2" [1]=> bool(false) }
    }
  }
}

I'm not sure if there is anything that really stands out here.

Thanks,
Ethan

John Flatness

Apr 20, 2010, 2:50:46 PM
to omek...@googlegroups.com
Hi Ethan,

The problem I see with your ElementText array is that the documentation for insert_item specifies that the interior arrays should have textual array keys of 'text' and 'html', where the var_dump you shared has integer array keys.

So, taking your previous example code, the line:

> $elementTextsArray['Dublin Core'][$child->nodeName][] = array($child->nodeValue, false);

Should be:

$elementTextsArray['Dublin Core'][ucwords($child->nodeName)][] = array('text' => $child->nodeValue, 'html' => false);

In my limited testing, the former method returns an error about improper ElementText formatting, while the latter successfully inserts the item (I included a call to ucwords to avoid confusion about upper-case/lower-case Element names).

-John Flatness
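
[Editor's sketch] Putting John's correction together with the earlier loop gives something like the following. The function name buildElementTexts and the sample XML are made up for illustration. The insert_item call is guarded with function_exists so the sketch also runs outside Omeka, where it only builds the arrays. The element-node check is an extra precaution: depending on how the document was loaded, childNodes may also contain whitespace text nodes.

```php
<?php
// Build the element-texts array for one <dc-record>, using the
// 'text'/'html' keys described above. buildElementTexts is a made-up name.
function buildElementTexts(DOMElement $record): array
{
    $elementTexts = [];
    foreach ($record->childNodes as $child) {
        // childNodes can include whitespace text nodes; keep elements only.
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        $name = ucwords($child->nodeName); // "type" becomes "Type"
        $elementTexts['Dublin Core'][$name][] = [
            'text' => $child->nodeValue,
            'html' => false,
        ];
    }
    return $elementTexts;
}

// Sample input; a real import would load the XML file instead.
$xml = '<records><dc-record>'
     . '<type>physical object</type><type>original</type>'
     . '<title>Mortar</title>'
     . '</dc-record></records>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$built = [];
foreach ($doc->getElementsByTagName('dc-record') as $record) {
    $elementTexts = buildElementTexts($record);
    $built[] = $elementTexts;
    // Only attempt the insert when running inside Omeka.
    if (function_exists('insert_item')) {
        insert_item(['collection_id' => 1], $elementTexts);
    }
}
```

Because each value is appended with [], repeated elements such as the two <type> entries stay as separate texts under the same element name.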

Ethan Gruber

Apr 20, 2010, 2:55:57 PM
to omek...@googlegroups.com
Excellent!  Works perfectly now.  Thanks a bunch.

Ethan

Patrick Murray-John

Apr 20, 2010, 3:00:45 PM
to omek...@googlegroups.com
Ethan,

OH! Epically my bad! It looks like the array for the Element Text is
hashed! Instead of
array('physical object' , false)

it looks like it should be
array('text'=>'physical object', 'html'=>false )

Maybe give that a whirl and let's hope it does the trick.

Patrick

Patrick Murray-John

Apr 20, 2010, 3:45:43 PM
to omek...@googlegroups.com
Ethan,

Looks like John's email beat me! Glad he cleaned up my fail!

Patrick