Tracking down a Whitelabel error

Tim Young

Oct 29, 2024, 11:35:31 AM
to dspac...@googlegroups.com
I am a Linux tech who is helping some librarians debug their DSpace installation.  The complaint is that they see an error page after clicking the "pdf" link while viewing an item:
Whitelabel Error Page
This application has no explicit mapping for /error, so you are seeing this as a fallback.

Tue Oct 29 15:04:16 UTC 2024
There was an unexpected error (type=Internal Server Error, status=500).
An internal read or write operation failed

From what I have found by googling, there should be an error message in the dspace.log file, but there are no errors there.  (There are plenty of recent "INFO" entries and one or two "WARN" entries, but nothing that looks related to the error above.)  I verified the permissions on the assetstore, and they are consistent across the board (all files have the same file permissions, and all directories have the same directory permissions).

Now, for some history:

This site is relatively new.  The school had run an older version of DSpace for quite some time, but a hard drive failed at the same time that they had some issues with their backup.  The short of it is that, instead of upgrading DSpace as we had planned, we ended up installing a clean DSpace 7.6 and re-entering their heaps and piles of items.

Many of the items in their DSpace work fine, but a number of them do not.  I am suspicious of the failing items, because their dates (dc.date.accessioned and dc.date.available, which appear to be the dates the items were entered) are from years ago.  We literally built this thing less than a month ago, and the dates are between 3 and 5 years earlier.  I suspect that someone imported some old dumps, and that those dumps did not include the actual files.  But that is only my suspicion.

Is there a way to verify that the bitstreams (I believe this is the correct word; I am looking for the files themselves) exist for the items in the database?  Can we dump a list of items without bitstreams?  And, what would be the procedure for entries that no longer have items associated with them?  Delete and re-add, or can they upload a new item to an existing entry?

Again, I am an IT guy who is mainly helping out, so I apologize for not being immersed in dspace and knowing everything I should about the tools available to me.

- Tim Young

mw...@iu.edu

Oct 30, 2024, 2:47:12 PM
to dspac...@googlegroups.com
Understood. DSpace uses some of these terms differently so, to avoid
confusion:

o an Item is a DSpace abstraction in the database, which represents a
set of files and the metadata which describe the set.

o a Bitstream is a DSpace abstraction in the database, which
represents a file in the filesystem and metadata which describe it.
One piece of that metadata is the name under which the file was
submitted.

o the actual files are spread randomly through an Assetstore directory
tree, and each is named by a hash of its content. They are placed
in directories named by the first few characters of the hash, so
assetstore file 93948004603449655886343619484568742798 would be
found as '93/94/80/93948004603449655886343619484568742798' under
the assetstore root.
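
The directory layout described above can be sketched in Python as follows. This is only an illustration, assuming the default layout of three directory levels of two characters each, as in the example; the function name `assetstore_relative_path` is hypothetical and not part of DSpace.

```python
from pathlib import PurePosixPath


def assetstore_relative_path(internal_id: str, levels: int = 3, width: int = 2) -> PurePosixPath:
    """Return the file's path relative to the assetstore root.

    Assumes the layout from the example: `levels` nested directories,
    each named by the next `width` characters of the internal ID.
    """
    if len(internal_id) < levels * width:
        raise ValueError(f"internal ID too short: {internal_id!r}")
    # Slice the leading characters into directory names: "93", "94", "80", ...
    parts = [internal_id[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*parts, internal_id)


# The ID used in the example above.
print(assetstore_relative_path("93948004603449655886343619484568742798"))
```

For the ID in the example this prints 93/94/80/93948004603449655886343619484568742798, which is then joined to the assetstore root.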

It's possible to have multiple Assetstores but you probably don't. To
find the root of the single assetstore, you can use 'bin/dspace dsprop
-property assetstore.dir'.

I'm not sure what pieces you believe to be missing. To find
Bitstreams representing files that don't exist in the assetstore, you
could walk the Bitstream database table and look up each row's
'internal_id' column value in the filesystem. The 'internal_id' is
just the hash -- you'll need to compute the path as noted above.
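
A rough sketch of that check in Python, under some assumptions: the 'internal_id' values have already been exported from the Bitstream table into a plain text file, one per line (how you export them is up to you), there is a single assetstore, and it uses the two-character, three-level layout described above. The function name `find_missing` and the command-line arguments are made up for illustration.

```python
import sys
from pathlib import Path


def find_missing(assetstore_root, internal_ids):
    """Return the internal IDs that have no file under the assetstore root."""
    root = Path(assetstore_root)
    missing = []
    for raw in internal_ids:
        internal_id = raw.strip()
        if not internal_id:
            continue  # skip blank lines in the export
        # Assumed layout: <root>/<chars 1-2>/<chars 3-4>/<chars 5-6>/<full id>
        candidate = root / internal_id[0:2] / internal_id[2:4] / internal_id[4:6] / internal_id
        if not candidate.is_file():
            missing.append(internal_id)
    return missing


# Hypothetical usage: python check_assetstore.py /path/to/assetstore ids.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[2], encoding="utf-8") as handle:
        for internal_id in find_missing(sys.argv[1], handle):
            print(internal_id)
```

The IDs it prints can then be matched back to Bitstream rows, and from there to the Items that need a replacement file.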

It is possible to delete an existing bitstream from, and to upload a new
one to, an Item when logged in with appropriate rights (such as a
member of the Administrator group). Navigate to the Item, use the
pencil / Edit this item button at top right, select the Bitstreams
tab, and you'll see a list of bitstreams with action buttons, plus
uploading controls at the top.

There is another abstraction that you'll see there: the Bundle.
Bitstreams are contained in Bundles which are contained in Items. You
probably want to upload to the ORIGINAL bundle -- the others contain
ancillary or derived files such as thumbnail images, license texts,
and extracted flat text for indexing, and should normally be treated
as opaque implementation details.

In case you haven't already found it: the Official Documentation is
at https://wiki.lyrasis.org/display/DSDOC7x

If you have further questions, please ask. If the documentation is
unclear, please say so!

If you can get more details on how the content of the repository was
rebuilt, it may be helpful in finding the best way forward or, at
least, understanding the repository's current state.

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
library.indianapolis.iu.edu

Tim Young

Oct 30, 2024, 4:28:37 PM
to dspac...@googlegroups.com
mw...@iu.edu wrote:
If you can get more details on how the content of the repository was rebuilt, it may be helpful in finding the best way forward or, at least, understanding the repository's current state.

I do think we figured out how the mess came to be.  I think they restored a database from a defunct DSpace install, not realizing that there was no backup of the assetstore.  So, basically, the files are missing but everything on the database side is there.  Since we know the date we rebuilt, we should be able to find all the items which were created before that time and "fix" those.  It looks like we can start with a search along the lines of dc.date.issued:[1999 TO 2024-03-20] to find all the records.
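
For anyone who would rather script that search than type it into the UI, here is a minimal sketch that only builds the request URL. It assumes the DSpace 7 REST API's discover/search/objects endpoint with its query, dsoType, page and size parameters; the base URL below is a placeholder, not our real server, and `build_search_url` is a made-up helper name.

```python
from urllib.parse import urlencode

# Hypothetical base URL of the DSpace REST API; replace with the real server.
REST_BASE = "https://repository.example.edu/server/api"


def build_search_url(query: str, page: int = 0, size: int = 20) -> str:
    """Build a discovery search URL restricted to Items (assumed parameters)."""
    params = {"query": query, "dsoType": "item", "page": page, "size": size}
    # urlencode escapes the colon, brackets and spaces in the query.
    return f"{REST_BASE}/discover/search/objects?{urlencode(params)}"


print(build_search_url("dc.date.issued:[1999 TO 2024-03-20]"))
```

The resulting URL can be fetched with curl or a browser, and the page parameter stepped to walk through the full result list.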

They still do have most of the assets in digital form, but it will be work to find them scattered across multiple computers.  It looks like the best process for fixing them, if we have the original file, is to follow these instructions you gave:

It is possible to delete an existing bitstream from, and to upload a new one to, an Item when logged in with appropriate rights (such as a member of the Administrator group). Navigate to the Item, use the pencil / Edit this item button at top right, select the Bitstreams tab, and you'll see a list of bitstreams with action buttons, plus uploading controls at the top.

And then we will have the joyous task of dealing with all the ones that fall through the cracks, whether because they cannot find the original file, or because the item was somehow skipped when they were fixing.

Anyway.  I think you have helped; at least I know where things stand.  It may be easier for them to bulk delete all the ones that are certainly no longer there and simply upload all the ones that they do have.  I will let them figure that one out.

    - Tim Young
