Richard Jones writes:
> We have just noticed that uploading an HTML file which references pictures
> and also uploading the pictures themselves does not result in a renderable
> HTML page when the item is viewed. In retrospect, this is obviously the
> case given the way that DSpace stores its files, and after a quick think I
> can't see an obvious way around this. Would other people consider this a
> problem? Should there be a fix for it or is it the case that we simply
> can't take HTML files with images referenced?
Hussein Suleman writes:
>this is an old problem and I haven't seen any simple solutions in
>packaged systems.
A number of somewhat unrelated comments:
0/ For a year this has been my single biggest complaint with DSpace.
1/ It's not just images. It's also other support files such as external CSS
style sheets, .JS files, etc. Think of HTML as a compound document format
implemented as multiple separate files.
2/ I believe this problem has been pretty well addressed in both WebCT and
Blackboard, so I have to disagree with Hussein. Blackboard, for instance,
parses the HTML file when it is uploaded and prompts for each support file
where there's a URL in an IMG, LINK, etc. tag and where the URL is relative
(more precisely, where it is either a simple filename with no slashes, or
has a dirname/filename form, the latter to support HTML exported from MS
Office products). Both WebCT and Blackboard have easy-to-use and very
workable user interfaces.
3/ On the archival vs display question, I appreciate the philosophical
issues. As a practical matter, though, the fact that DSpace is web based
leads our customers to expect it to be able to display at least simple web
pages. More fundamentally, a web page is not a single HTML file. It's a set of multiple
URLs in a relationship, and even an archive needs to preserve the
information about that relationship. Introducing an additional packaging
layer by uploading a ZIP or tar file is one way to preserve that
relationship, but it complicates the archiving issues; the "document type"
for supported/known/unknown decisions now needs to encode both the packaging
format and the format of the packaged files.
4/ Per Richard Rogers' email to dspace-general of 5 Sep, the DSpace 1.2
release is planned to include "9) Better support for web page (HTML
document) item display." As a practical matter, I see the ability to
display IMGs with relative URLs as a make-or-break feature for DSpace. If
some minimal support for this doesn't make it into the 1.2 release, I'm
pretty sure that a lot of us -- probably including UO -- will jump ship for
Fedora or Eprints.
5/ There's a risk that pursuit of the perfect might make us lose sight of
"good enough". I don't think that in general it's possible to archive a web
page perfectly. These objects are too dynamic. And the HTML spec is
complex; consider URLs in a CSS file enclosed in url(...) parentheses, or frames, or
site-rooted relative links. I think the ability to handle a simple HTML
document with simple CSS and image files is critical, but I don't think we
need to handle much more of the complexity of HTML than that. If we can
handle the typical HTML document as created by MS Word's "save as web page"
and have it viewable after being uploaded, that will definitely be good
enough for my needs.
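
To make comment 2/ concrete, here is a minimal sketch in Python of the kind
of scan described there: parse the uploaded HTML, look at IMG, LINK and
SCRIPT tags, and list only those URLs that are a simple filename or have a
dirname/filename form. This is not Blackboard's or DSpace's actual code; the
names URL_ATTRS, is_simple_relative, SupportFileFinder and find_support_files
are hypothetical, and the tag list is an assumed subset. In line with comment
5/, it deliberately ignores URLs inside CSS, frames, and site-rooted links.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed subset of tags whose attribute may name a support file.
URL_ATTRS = {"img": "src", "link": "href", "script": "src"}


def is_simple_relative(url):
    """Return True for 'name.ext' or 'dir/name.ext', False otherwise."""
    parts = urlparse(url)
    # Absolute URLs (http:, mailto:, //host/...) and empty paths are skipped.
    if parts.scheme or parts.netloc or not parts.path:
        return False
    # Site-rooted links such as /images/logo.gif are out of scope.
    if parts.path.startswith("/"):
        return False
    segments = parts.path.split("/")
    if len(segments) > 2:
        return False
    return all(s not in ("", ".", "..") for s in segments)


class SupportFileFinder(HTMLParser):
    """Collects the support files an uploader would be prompted for."""

    def __init__(self):
        super().__init__()
        self.needed = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        wanted = URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and is_simple_relative(value):
                if value not in self.needed:
                    self.needed.append(value)


def find_support_files(html_text):
    """Return the relative support-file URLs in document order, no repeats."""
    finder = SupportFileFinder()
    finder.feed(html_text)
    finder.close()
    return finder.needed


if __name__ == "__main__":
    page = '<link rel="stylesheet" href="style.css"><img src="pics/a.gif">'
    for name in find_support_files(page):
        print("Please upload:", name)
```

An upload form could call find_support_files() on the HTML file and prompt
once for each name returned. Storing those names alongside the item would
also preserve the relationship between the files mentioned in comment 3/.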
JQ Johnson                      Office: 115F Knight Library
Academic Education Coordinator  mailto:j...@darkwing.uoregon.edu
1299 University of Oregon       phone: 1-541-346-1746; -3485 fax
Eugene, OR 97403-1299           http://darkwing.uoregon.edu/~jqj/