Problems resolving pointers to dataset’s datafiles in 4.3 with existing handles created in 3.6.2.


ofu...@gmail.com

May 9, 2016, 3:24:55 AM
to Dataverse Migration Working Group

Hi,
We are in the process of migrating our DVN 3.6.2 installation to Dataverse 4.3 (http://henry2.ub.uit.no:8080/dataverse/root). We ran into some problems resolving pointers to datasets' datafiles with existing handles created in 3.6.2.


We generated the list of the database IDs (from table dataset) of all our *released* datasets with Handle global ids and saved it as “list_of_db_ids.txt”

We ran a script containing the following:

------

#!/bin/bash
# Call modifyRegistration for every dataset database ID listed in the file.
while read -r dbid
do
    curl "http://localhost:8080/api/datasets/${dbid}/modifyRegistration?key=a8767631-ff9b-46cc-bc3d-73dac1c2b558"
    echo
done < list_of_db_ids.txt

------


But, the datafiles in Dataverse 4.3 are showing 0 bytes after migrating. Metadata, Terms and Version information are OK.

We used packlist.txt to pack the v3.x directories/files:

      /usr/local/glassfish3/glassfish/domains/domain1/config/files/studies/10037.1/...

We unpacked the datafiles and moved the directory  /10037.1 to 

       /usr/local/glassfish4/glassfish/domains/domain1/config/files/
 
We can see in Glassfish4's domain.xml file that the filepath is configured to be: /usr/local/glassfish4/glassfish/domains/domain1/config/files/
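A quick sanity check at this point is to count what is physically present under that directory. The sketch below is only an illustration: the function name `count_files` is hypothetical, and the default path is the one quoted above. Note that an empty file on disk is a different thing from a size of 0 recorded in the database, so this only confirms that the unpacked tree landed where expected.

```bash
#!/bin/bash
# Sketch: summarize what is physically present under a files directory.
# count_files is a hypothetical helper name, not part of Dataverse.

# Print "<total files> <empty files>" for the tree rooted at $1.
count_files() {
    local root="$1"
    local total empty
    total=$(find "$root" -type f | wc -l)
    empty=$(find "$root" -type f -empty | wc -l)
    # wc may pad its output with spaces; arithmetic expansion normalizes it.
    echo "$((total)) $((empty))"
}

# Default to the path quoted in the post; override with FILES_DIR=... if needed.
FILES_DIR="${FILES_DIR:-/usr/local/glassfish4/glassfish/domains/domain1/config/files}"
if [ -d "$FILES_DIR" ]; then
    count_files "$FILES_DIR"
fi
```

If the first number is far smaller than the number of lines in packlist.txt, some files probably never made it into the archive.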

We have restarted Glassfish4 and re-indexed Solr. Are there any other post-migration tasks we need to perform before the datafiles can be viewed/downloaded and their sizes displayed in a browser?

Ofuuzo

Leonid Andreev

May 9, 2016, 3:55:26 PM
to Dataverse Migration Working Group
Hello,

So, it looks like you were able to make some serious progress since last week? - Great!

> But, the datafiles in Dataverse 4.3 are showing 0 bytes after migrating ...

Most likely, the file migration script on the old system failed to locate the physical files; so they didn't get copied; and the file sizes were left set to zero.
(In other words, only the migration of the database records for the old files succeeded; but not for the physical files).

Could you please send/post the first 10 lines of the packlist.txt;
Also, if you saved the debugging output of the files_source_ script (in datafiles.sql.out, as the instruction suggested, or otherwise), please send that too.

As for the other part of your question, about handles:
There is no connection between registered handles and the datafiles.
(handles are persistent identifiers that are registered for studies/datasets; individual files don't have handles, neither in DVN3, nor in Dataverse 4).
I'm seeing that your old handles are still pointing to the studies on opendata.uit.no, even though you ran the script to re-register them...
(I just tried http://hdl.handle.net/10037.1/10285, and it did resolve to http://opendata.uit.no/dvn/dv/trolling/faces/study/StudyPage.xhtml?globalId=hdl:10037.1/10285).
We can figure out what's going on there; but let's figure out what happened to the files first.

You most likely DON'T want to change the registration of all your identifiers just yet to the dataverse 4 urls; until you make sure everything has been migrated properly.
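The manual check of where a handle currently points can be scripted. This is a minimal sketch, assuming network access and that the hdl.handle.net proxy answers with an HTTP redirect; the helper names `handle_url` and `resolve_handle` are made up for illustration. It only reads the current registration and changes nothing.

```bash
#!/bin/bash
# Sketch: show where a handle currently resolves, without modifying anything.
# handle_url and resolve_handle are hypothetical helper names.

# Build the proxy URL for a handle such as "10037.1/10285" or "hdl:10037.1/10285".
handle_url() {
    echo "http://hdl.handle.net/${1#hdl:}"
}

# Print the URL the proxy redirects to (requires network access).
# curl's %{redirect_url} write-out variable reports the redirect target
# without following it.
resolve_handle() {
    curl -s -o /dev/null -w '%{redirect_url}\n' "$(handle_url "$1")"
}

# Usage (not run here):
#   resolve_handle 10037.1/10285
```

Running it over a few handles before and after any re-registration shows whether they still point at the DVN 3 study pages.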

ofu...@gmail.com

May 9, 2016, 5:13:50 PM
to Dataverse Migration Working Group
First 10  lines of the packlist.txt:
/usr/local/glassfish3/glassfish/domains/domain1/config/files/studies/10037.1/10121/142
/usr/local/glassfish3/glassfish/domains/domain1/config/files/studies/10037.1/10121/143
/usr/local/glassfish3/glassfish/domains/domain1/config/files/studies/10037.1/10121/144
/usr/local/glassfish3/glassfish/domains/domain1/config/files/studies/10037.1/10121/145
/usr/local/glassfish3/glassfish/domains/domain1/config/files/studies/10037.1/10121/146
/usr/local/glassfish3/glassfish/domains/domain1/config/files/studies/10037.1/10121/147
/usr/local/glassfish3/glassfish/domains/domain1/config/files/studies/10037.1/10121/148
/usr/local/glassfish3/glassfish/domains/domain1/config/files/studies/10037.1/10121/149
/usr/local/glassfish3/glassfish/domains/domain1/config/files/studies/10037.1/10121/150
/usr/local/glassfish3/glassfish/domains/domain1/config/files/studies/10037.1/10121/152

I did not save datafiles.sql.out. I could try to re-generate the file if it is necessary.

Leonid Andreev

May 9, 2016, 7:39:57 PM
to Dataverse Migration Working Group
OK, there may be a problem with that script, files_source_: these file pathnames should really be relative, not absolute.
But you must have figured that out; (you said you "moved the directory 10037.1 ...")
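One way to turn such absolute pathnames into relative ones is sketched below. It assumes every pathname contains "/studies/", as in the packlist excerpt posted earlier; `to_relative` is a hypothetical helper, not part of the migration scripts.

```bash
#!/bin/bash
# Sketch: convert absolute DVN 3 pathnames into pathnames relative to the
# "studies" directory. to_relative is a hypothetical helper name.

# Read pathnames on stdin and strip everything up to and including
# "/studies/". Lines without "/studies/" pass through unchanged.
to_relative() {
    sed 's#^.*/studies/##'
}

# Usage:
#   to_relative < packlist.txt > packlist_relative.txt
```

The resulting entries have the form 10037.1/10121/142, which matches the layout obtained by moving the 10037.1 directory directly under the new files directory.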

Because I just checked the dataset "10037.1/10121" (http://henry2.ub.uit.no:8080/dataset.xhtml?persistentId=hdl:10037.1/10121)
 - and all the files in that dataset ARE actually downloadable; and all have non-zero, correct sizes!
Here's another dataset with the files migrated ok: http://henry2.ub.uit.no:8080/dataset.xhtml?persistentId=hdl:10037.1/10281

So it's not ALL of your files that failed to migrate; but only SOME of them.

The fact that the missing files have size = 0 really means that the migration script (files_source_) could not locate the absolute file pathnames saved in your old database on the filesystem.
Did you run that script on the same physical server that's actually serving your DVN 3 installation (http://opendata.uit.no)?
- if not, that's most likely the problem: some of the files are stored in a directory that's mounted on one server but isn't accessible from the other.

OK, please run the script again, and save all the output:

cat migrated_datasets.txt | ./files_source_ <OFFSET> > files_import.sql 2>files_import.stderr

- and attach both files_import.sql and files_import.stderr.
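To test the diagnosis above directly, the pathnames can be checked on the machine where the script is run. This is a minimal sketch under the assumption that the list contains one pathname per line; `report_missing` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: find listed pathnames that cannot be read on this machine.
# report_missing is a hypothetical helper name.

# Read pathnames on stdin; print "MISSING <path>" for every entry that
# does not exist or is not readable by the current user.
report_missing() {
    local path
    while IFS= read -r path; do
        [ -n "$path" ] || continue      # skip blank lines
        if [ ! -r "$path" ]; then
            echo "MISSING $path"
        fi
    done
}

# Usage:
#   report_missing < packlist.txt
```

Any output means those files are not visible from this server, which would be consistent with them ending up with size 0 after migration.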

ofu...@gmail.com

May 12, 2016, 6:19:03 AM
to Dataverse Migration Working Group
Hi,
We have managed to migrate from 3.6.2(3) to 4.3 (except step 2a, "migrate preloaded customization", in the migration steps). We could share some of our experiences (to simplify the instruction steps) if needed, probably next week.

Ofuuzo

Leonid Andreev

May 12, 2016, 12:14:36 PM
to Dataverse Migration Working Group
Hi,
That's great to hear!

Yes, please post whatever feedback you have.
We DO know that we need to provide better/easier to use migration tools.
Both the scripts and the documentation.
We just haven't been able to allocate enough resources yet to address it, unfortunately.

But I'm going to go ahead and update the files section. And explain the potential issues with the DVN 3 filenames.
(Specifically, the fact that in DVN3, the database stored absolute pathnames for the files. And that it was possible to end up having files stored in several different directories on the file system, if the files.dir setting was changed throughout the life of the installation)
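Whether an installation is affected by this can be estimated from a list of the absolute pathnames. The sketch below assumes each pathname has the form <files.dir>/studies/..., as in the packlist excerpt earlier in the thread; `storage_roots` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: list the distinct directories that precede "/studies/" in a list
# of absolute pathnames, with a file count for each.
# storage_roots is a hypothetical helper name.

# Read pathnames on stdin; print "<count> <root>" per distinct root.
# Lines that do not contain "/studies/" are ignored.
storage_roots() {
    sed -n 's#/studies/.*$##p' | sort | uniq -c | sed 's/^ *//'
}

# Usage:
#   storage_roots < packlist.txt
```

More than one line of output suggests the files.dir setting changed at some point and the files are spread over several directories.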