Steady EAD import error: no DTD found!


John

Apr 26, 2011, 9:50:17 AM
to ICA-AtoM Users
Hi

I need to convert XLS files to XML so that I can import them into
ICA-AtoM. I used Steady (a tool from University of North Carolina for
converting CSV to EAD XML) to convert one of their sample CSV files into
EAD (XML below) and then tried to import it into ICA-AtoM. The import
gave this error:

libxml error 522 on line 0 in input file: no DTD found!

What do I need to add to make this file importable?

Thanks,

John

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
- <ead xmlns:ns2="http://www.w3.org/1999/xlink" xmlns="urn:isbn:1-931666-22-9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:isbn:1-931666-22-9 http://www.loc.gov/ead/ead.xsd">
- <eadheader findaidstatus="Completed" repositoryencoding="iso15511"
countryencoding="iso3166-1" dateencoding="iso8601">
<eadid />
- <filedesc>
- <titlestmt>
<titleproper>Preliminary Inventory to the</titleproper>
<author />
</titlestmt>
</filedesc>
- <profiledesc>
- <langusage>
Finding aid written in
<language langcode="eng" encodinganalog="Language">English.</language>
</langusage>
</profiledesc>
</eadheader>
- <archdesc level="subgrp">
- <did>
<unittitle>Preliminary Inventory to the</unittitle>
<unitid />
- <langmaterial>
<language langcode="eng" />
</langmaterial>
</did>
- <accessrestrict>
<head>Access to Collection</head>
<p>Collection is open for research; access requires at least 24
hours advance notice.</p>
</accessrestrict>
- <arrangement>
<head>Organization of the Collection</head>
- <p>
This collection is organized into series:
- <list>
<item>1, Project Files, 1951-1978</item>
<item>2, Professional and Personal, 1955-1978</item>
<item>3, Microfilm, 1969-1979</item>
</list>
</p>
</arrangement>
- <dsc>
- <c01 level="series">
- <did>
<unitid>1</unitid>
<unittitle>Project Files</unittitle>
<unitdate>1951-1978</unitdate>
</did>
- <c02 level="file" audience="internal">
- <did>
- <note>
<p>note1</p>
</note>
<unittitle>Lucey, John and Mari</unittitle>
- <note>
<p>note2</p>
</note>
- <physdesc>
<extent>extent</extent>
</physdesc>
<unitdate>1977</unitdate>
<unitid>7732</unitid>
<container type="box" label="Mixed materials">45</container>
<container type="folder" label="Mixed materials">404</container>
<container type="item" label="Mixed materials">3</container>
</did>
- <controlaccess>
<geogname source="lcnaf">Raleigh (N.C.)</geogname>
<corpname source="corpname_source">corpname</corpname>
<famname source="famname_source">famname</famname>
<name source="name_source">name</name>
<persname source="persname_source">persname</persname>
<subject source="subject_source">subject</subject>
</controlaccess>
</c02>
- <c02 level="file">
- <did>
<unittitle>Foster, David</unittitle>
<unitdate>1979</unitdate>
<unitid>7801</unitid>
<container type="box" label="Mixed materials">46</container>
<container type="folder" label="Mixed materials">405</container>
</did>
</c02>
</c01>
- <c01 level="series">
- <did>
<unitid>2</unitid>
<unittitle>Professional and Personal</unittitle>
<unitdate>1955-1978</unitdate>
</did>
- <c02 level="file">
- <did>
<unittitle>Daily Financial Ledger</unittitle>
<unitdate>1969-1970</unitdate>
<container type="box" label="Mixed materials">58</container>
<container type="othertype" label="Mixed materials">550</container>
</did>
</c02>
- <c02 level="file">
- <did>
<unittitle>Scott, Correspondence Prior to 1959</unittitle>
<unitdate>1955-1959</unitdate>
<container type="box" label="Mixed materials">58</container>
<container type="othertype" label="Mixed materials">551</container>
</did>
- <accessrestrict>
<p>Restricted</p>
</accessrestrict>
</c02>
</c01>
- <c01 level="series">
- <did>
<unitid>3</unitid>
<unittitle>Microfilm</unittitle>
<unitdate>1969-1979</unitdate>
</did>
- <c02 level="file">
- <did>
<unittitle>Kenneth McCoy Scott</unittitle>
<unitid>4250</unitid>
<container type="box" label="Mixed materials">63</container>
<container type="othertype" label="Mixed materials">1</container>
</did>
- <scopecontent>
<p>Sessions 32-38</p>
</scopecontent>
</c02>
</c01>
</dsc>
</archdesc>
</ead>

MJ Suhonos

Apr 26, 2011, 10:50:16 AM
to ica-ato...@googlegroups.com
Hi John,

The error you're seeing is actually an informational warning; i.e. ICA-AtoM is letting you know that it cannot validate the imported EAD file, but it should still successfully import the data it contains. In this case, the EAD generated by Steady references an XML Schema (XSD) instead of a DTD, which the current version of ICA-AtoM does not handle.

Can you verify whether the content of the EAD file was actually imported? When the import is complete, there should be a button linking to the newly-created description, which will be in draft status. If your data is not being imported, then we have a different issue to track down.

By the way, thanks for pointing us at Steady -- we are working on CSV import at the moment for several migrations and as a new feature in the upcoming 1.2 release, so the approach Steady uses will be very useful for us to ensure better interoperability.

Hope this helps,
MJ


MJ Suhonos

Apr 26, 2011, 11:02:48 AM
to ICA-AtoM Users
Just to follow up on this: I was able to quickly reproduce this issue in the current development branch of ICA-AtoM. My apologies; a mistake in my testing led me to believe that missing-DTD errors would not cause the import to fail.

Google Code is read-only this morning, but I will file an issue for this to correct validation for both DTD and XSD-based import in the next release.

In the meantime, the easiest way to get EAD generated by Steady to import into ICA-AtoM is to:

1) On the second line of the XML file (after the <?xml ... declaration), add:

<!DOCTYPE ead PUBLIC "-//Society of American Archivists//DTD ead.dtd (Encoded Archival Description (EAD) Version 1.0)//EN" "http://www.loc.gov/ead/ead.xsd">

2) Remove the attributes from the <ead> element on the following line, i.e. replace:

<ead xmlns:ns2="http://www.w3.org/1999/xlink" xmlns="urn:isbn:1-931666-22-9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:isbn:1-931666-22-9 http://www.loc.gov/ead/ead.xsd">

Simply with: <ead>
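
For anyone who would rather script the two edits above than make them by hand, here is a minimal Python sketch. The function name `patch_steady_ead` is made up for illustration and is not part of ICA-AtoM or Steady. The DOCTYPE string is copied as-is from step 1, and the code assumes the file contains exactly one opening <ead ...> tag.

```python
import re

# DOCTYPE line copied verbatim from step 1 of the workaround above.
DOCTYPE = (
    '<!DOCTYPE ead PUBLIC "-//Society of American Archivists//DTD ead.dtd '
    '(Encoded Archival Description (EAD) Version 1.0)//EN" '
    '"http://www.loc.gov/ead/ead.xsd">'
)


def patch_steady_ead(xml_text: str) -> str:
    """Apply the two manual edits to EAD text produced by Steady.

    Hypothetical helper, not part of ICA-AtoM or Steady.
    """
    # Step 2: replace the opening <ead ...> tag, attributes and all, with <ead>.
    # [^>]* also spans newlines, so a tag wrapped over several lines is handled.
    # \b keeps the pattern from matching <eadheader> or <eadid>.
    patched, count = re.subn(r"<ead\b[^>]*>", "<ead>", xml_text, count=1)
    if count == 0:
        raise ValueError("no <ead> start tag found")

    # Step 1: put the DOCTYPE right after the XML declaration, or at the very
    # top if the file has no declaration. Do nothing if a DOCTYPE already exists.
    if "<!DOCTYPE" in patched:
        return patched
    match = re.match(r"\s*<\?xml[^>]*\?>", patched)
    if match:
        return patched[: match.end()] + "\n" + DOCTYPE + patched[match.end():]
    return DOCTYPE + "\n" + patched
```

To use it, read the Steady output into a string, pass it to `patch_steady_ead`, and write the result to a new file before importing. It works on the raw text rather than parsing the XML, so the rest of the file is left byte-for-byte as Steady produced it.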

MJ

John

Apr 26, 2011, 12:39:24 PM
to ICA-AtoM Users
Dear MJ

Thanks for all this & glad to help! However, copying the XML file from
my first post and then making the changes you listed just returns a
"500 Internal Server Error".
For the first change, did you mean add another line for the
<!DOCTYPE ...> statement, i.e. press return after the <?xml declaration?

Thanks,

John

MJ Suhonos

Apr 26, 2011, 12:44:05 PM
to ica-ato...@googlegroups.com
Hi John,

Yes, the first three lines of your XML file should look like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<!DOCTYPE ead PUBLIC "-//Society of American Archivists//DTD ead.dtd (Encoded Archival Description (EAD) Version 1.0)//EN" "http://www.loc.gov/ead/ead.xsd">

<ead>

... with the rest of the content following. I've attached a copy of the sample EAD file for reference; hopefully Google Groups will handle it gracefully.

MJ

test.xml

John

Apr 26, 2011, 1:53:15 PM
to ICA-AtoM Users
Hi MJ

That's what I had done originally, and I've tried it again & it still
doesn't work. Initially it just gives a blank "Done" screen with
'http://localhost/icaatom-1.1/index.php/;object/import' as the URL.
Even after a few minutes there is no change, and pressing return again
gives the 500 Internal Server Error. I then cleared all the cookies and
retried, but to no avail.

On a tangent, how do you attach files to these posts?

Thanks,

John

John

Apr 26, 2011, 2:00:27 PM
to ICA-AtoM Users
Hi MJ

Hmm, that was what I did originally. It initially gives a blank "Done"
screen and then, after a few minutes of no change, pressing return
gives the 500 Internal Server Error. I cleared the cookies and then
re-ran it, but it still didn't work.

How do you attach files to posts?

Thanks,

John

MJ Suhonos

Apr 26, 2011, 3:11:22 PM
to ica-ato...@googlegroups.com
Hi John,

Hm, a 500 internal error is a bit more difficult to diagnose. The easiest way is to replace index.php in the URL with qubit_dev.php, e.g. import using this URL:

http://localhost/icaatom-1.1/qubit_dev.php/;object/importSelect

Alternatively, if you're familiar with the Apache log on your system, it should provide some more detailed information around what's causing an error 500.
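
The front-controller swap described above can also be expressed as a small Python helper, shown here only as a sketch. The function name `debug_url` is hypothetical, and the code assumes the URL goes through index.php exactly as in the examples in this thread.

```python
def debug_url(url: str) -> str:
    """Swap the index.php front controller for qubit_dev.php in an ICA-AtoM URL.

    Hypothetical helper for illustration only.
    """
    if "/index.php" not in url:
        raise ValueError("URL does not go through index.php")
    # Replace only the first occurrence so the rest of the path is untouched.
    return url.replace("/index.php", "/qubit_dev.php", 1)
```

Only the script name changes; the rest of the path, including the ;object/importSelect part, stays the same.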

Are you getting these errors with the file I provided as well as the one from Steady?

I use my desktop email client (Mail.app on Mac OSX) to read this mailing list, so I just attach files like "regular" email. I'm not sure if something similar is possible through the Google Groups web interface.

MJ

John

Apr 27, 2011, 4:52:11 AM
to ICA-AtoM Users
Hi MJ

Yes, I get these errors with both your file & the file (with your
changes) I got from Steady.
Below are all the errors I get when running it in debug mode.

Sorry for asking possibly silly questions but I can't open your test
file or my altered Steady file in MS XML editor. It says:

Cannot view XML input using style sheet. Please correct the error and
then click the Refresh button, or try again later.

--------------------------------------------------------------------------------

An XML element is not allowed inside a DTD. Error processing resource
'http://www.loc.gov/ead/ead.xsd'.

However, I can open the original Steady file in XML Editor. To make
the edits & view the files I have been using Notepad++.

Fatal error: Class 'XSLTProcessor' not found in C:\wamp\www\icaatom-1.1\lib\QubitXmlImport.class.php on line 179
Call Stack
# Time Memory Function Location
1 0.0004 371080 {main}( ) ..\qubit_dev.php:0
2 0.3514 12366064 sfContext->dispatch( ) ..\qubit_dev.php:13
3 0.3515 12366096 sfFrontWebController->dispatch( ) ..\sfContext.class.php:170
4 0.3518 12385456 sfController->forward( ) ..\sfFrontWebController.class.php:48
5 0.3782 12634504 sfFilterChain->execute( ) ..\sfController.class.php:238
6 0.3785 12635368 QubitTransactionFilter->execute( ) ..\sfFilterChain.class.php:53
7 0.3785 12635368 sfFilterChain->execute( ) ..\QubitTransactionFilter.class.php:40
8 0.3787 12636200 sfHistoryPluginFilter->execute( ) ..\sfFilterChain.class.php:53
9 0.3787 12636200 sfFilterChain->execute( ) ..\sfHistoryPluginFilter.class.php:18
10 0.3789 12637024 sfRenderingFilter->execute( ) ..\sfFilterChain.class.php:53
11 0.3789 12637024 sfFilterChain->execute( ) ..\sfRenderingFilter.class.php:33
12 0.3791 12637856 sfBasicSecurityFilter->execute( ) ..\sfFilterChain.class.php:53
13 0.3792 12637856 sfFilterChain->execute( ) ..\sfBasicSecurityFilter.class.php:72
14 0.3794 12638680 siteSettingsFilter->execute( ) ..\sfFilterChain.class.php:53
15 0.5441 13300896 sfFilterChain->execute( ) ..\SiteSettingsFilter.class.php:47
16 0.5443 13301720 sfExecutionFilter->execute( ) ..\sfFilterChain.class.php:53
17 0.5444 13302472 sfExecutionFilter->handleAction( ) ..\sfExecutionFilter.class.php:42
18 0.5444 13302472 sfExecutionFilter->executeAction( ) ..\sfExecutionFilter.class.php:78
19 0.5444 13302504 ObjectImportAction->execute( ) ..\sfExecutionFilter.class.php:92
20 0.5467 13476240 QubitXmlImport::execute( ) ..\importAction.class.php:36

Thanks,

John

Jesús García Crespo

Apr 27, 2011, 7:07:14 AM
to ica-ato...@googlegroups.com
Hi John,

On Wed, Apr 27, 2011 at 10:52 AM, John <john...@googlemail.com> wrote:
> Fatal error: Class 'XSLTProcessor' not found in C:\wamp\www\icaatom-1.1\lib\QubitXmlImport.class.php on line 179

The XSLTProcessor class is part of the PHP-XSL extension, which is needed to run the import script but is not enabled by default in WAMP. To enable it, click on the WAMP tray icon, select PHP extensions, scroll down to php-xsl, and click on it if it is not already selected. Then restart the Apache service.
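
If the tray menu is not available, one way to double-check the setting is to look at php.ini directly. The Python sketch below rests on an assumption not stated in this thread: that on Windows the extension is loaded by an uncommented `extension=php_xsl.dll` line, and that a leading semicolon disables it. The function name `xsl_extension_enabled` is hypothetical.

```python
import re


def xsl_extension_enabled(php_ini_text: str) -> bool:
    """Return True if php.ini text has an uncommented line loading php_xsl.

    Assumes the Windows-style directive `extension=php_xsl.dll`.
    """
    pattern = re.compile(r'extension\s*=\s*"?php_xsl(\.dll)?"?\s*(;.*)?$')
    for line in php_ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(";"):
            continue  # a leading semicolon comments the directive out
        if pattern.match(stripped):
            return True
    return False
```

Pass it the contents of the php.ini that Apache actually loads; WAMP setups can have more than one php.ini, so which file that is must be checked on your own system.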

Regards,

--
Jesús García Crespo,
Software Engineer, Artefactual Systems Inc.
http://www.artefactual.com | +1.604.527.2056

John

Apr 28, 2011, 11:31:25 AM
to ICA-AtoM Users
Hi Jesús

Success!

Thanks,

John


Stephen Gadd [Docuracy Ltd]

Nov 6, 2011, 4:47:04 PM
to ica-ato...@googlegroups.com
Anyone else reaching this point in the thread might like to use the attached bash script which I've knocked up to save on typing.

Change the data filenames (include paths where relevant), save the three lines in a .sh file, then chmod 775 to make it executable. It'll do the conversion and then tweak the output.

You could use sed in additional lines to change the names of Levels to suit the Taxonomy of your destination data.
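
As an alternative to sed for the level renaming, here is a minimal Python sketch. The mapping in `LEVEL_MAP` is purely hypothetical; substitute the terms your own taxonomy uses. The function name `rename_levels` is likewise made up for illustration.

```python
import re

# Hypothetical mapping; replace with the level names of your own taxonomy.
LEVEL_MAP = {"subgrp": "subfonds"}


def rename_levels(xml_text: str, level_map: dict) -> str:
    """Rewrite level="..." attribute values according to level_map."""

    def swap(match):
        old = match.group(2)
        # Values without an entry in the mapping are left unchanged.
        return match.group(1) + level_map.get(old, old) + match.group(3)

    # \b stops the pattern from matching inside otherlevel="...".
    return re.sub(r'(\blevel=")([^"]*)(")', swap, xml_text)
```

With the sample mapping, `<archdesc level="subgrp">` becomes `<archdesc level="subfonds">`, while `level="series"` and `level="file"` are left alone.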

Upcoming v1.2 will allow simple command-line importing of xml - I'll post an update to this code if I work out a script to take care of the whole process.

Best wishes,
Stephen
fix-STEADY-DTD-xml.sh