Dataverse backup and recovery


Deirdre Kirmis

Jun 16, 2020, 3:21:25 AM
to dataverse...@googlegroups.com

Hi all, I am working on disaster recovery procedures for our Dataverse pilot installation and am running into some issues. I'm wondering if anyone who is using AWS can help. We are using S3 for user files and RDS Postgres for the database. Our scenario is that the old S3 bucket and database are completely destroyed, and all we have are backup files for both in an external backup location. So I did the following: created a new EC2 instance and installed Dataverse; created a new database and restored a dump of the old one into it; copied the S3 files to the new bucket and pointed Dataverse at that bucket via the bucket-name JVM option. I know that the storageidentifier field in the dvobject table contains the name of the S3 bucket, so I also replaced that string in all of the records with the new bucket name (and reindexed Solr). After all of this, I get a 502 error when I try to load the site.
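(In shell terms, the bucket-rename and reindex steps above look roughly like this; the bucket names and the dvndb database name are placeholders, and the reindex calls assume the admin API is reachable on the default port:)

    # rewrite the bucket name embedded in dvobject.storageidentifier,
    # which has the form s3://<bucket>:<object-key>
    psql -d dvndb -c "UPDATE dvobject
        SET storageidentifier = replace(storageidentifier, 's3://old-bucket:', 's3://new-bucket:')
        WHERE storageidentifier LIKE 's3://old-bucket:%';"

    # then rebuild the Solr index via the admin API
    curl http://localhost:8080/api/admin/index/clear
    curl http://localhost:8080/api/admin/index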


Are there other steps that I need to take? Has anyone done this successfully, and is there an easier way? =) Thanks for any help.

Night Owl

James Myers

Jun 16, 2020, 11:00:07 AM
to dataverse...@googlegroups.com

Deirdre,

I think your backup process looks like it should work. Since 502 is a Bad Gateway error, my guess would be that it's related to the configuration of whatever load balancer/proxy you're using rather than to anything specific to the recovery procedure itself. You should be able to see which server is sending the 502 by looking at the 'Server:' response header in the browser developer console (under the Network tab in Chrome) – when I've seen 502s, the Server: was 'AWS ELB' (or similar) rather than 'Apache'.
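(You can check the same thing from the command line; the hostname is a placeholder:)

    # the Server: response header shows which layer generated the 502
    curl -sI https://dataverse.example.edu/ | grep -i '^server'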


W.r.t. the recovery process itself, the only thing I'd add is to check the settings and jvm-options if your new host changes the domain name where you're running things (the dataverse.fqdn JVM option would be one).
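(Something like the following, assuming a stock Glassfish install; the hostnames are placeholders:)

    # see what fqdn the application thinks it is running under
    asadmin list-jvm-options | grep fqdn

    # swap in the new hostname
    asadmin delete-jvm-options "-Ddataverse.fqdn=old.example.edu"
    asadmin create-jvm-options "-Ddataverse.fqdn=new.example.edu"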


I hope that helps,


-- Jim


Leonid Andreev

Jun 16, 2020, 4:23:08 PM
to Dataverse Users Community
Hi Deirdre, 
To add to what Jim said earlier: what you're seeing is a failure to even access the application, i.e. it's happening well before any errors from trying to access the files on S3 would occur. So it's something much more basic in the setup. It could be that Glassfish fails to start; it could be that the application fails to deploy; if Apache is in the mix, it could be the configuration that redirects requests to Glassfish.
I can give you some pointers on where to begin diagnosing things: how to check whether Dataverse is actually deployed under Glassfish; if it's not, whether that's because it can't talk to the database; how to check the Apache configuration, etc.
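(A few starting points along those lines; the log path assumes the standard installer layout, and the database host/user/name are placeholders:)

    # is the application actually deployed?
    asadmin list-applications

    # look for startup/deployment errors in the server log
    tail -n 200 /usr/local/glassfish4/glassfish/domains/domain1/logs/server.log

    # can this host reach the database at all?
    psql -h rds-endpoint.example.com -U dvnapp -d dvndb -c 'SELECT 1;'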
But since the goal is not necessarily to fix this test instance, but to have a working recovery process in place, maybe we should start by reviewing that process. You mentioned that you "created a new EC2 instance and installed dataverse" – did you really start completely from scratch, with a new CentOS image, then install and configure a new Dataverse instance? There would be quite a few opportunities to miss a configuration step doing it that way.


Deirdre Kirmis

Jun 17, 2020, 1:17:41 PM
to dataverse...@googlegroups.com

Yes, I am building the entire thing from scratch, and trying to automate that process for disaster recovery purposes. =) I am using dataverse-ansible for the initial install of Dataverse, but yes, there are other configuration steps before and after that. I figured out that my earlier issue was caused by not giving the glassfish user AWS credentials to see the S3 bucket. The other thing was that when I imported the database, it still had a pointer to an external Solr server in the settings, and it still had my default authentication set to "shib" even though I am not using Shibboleth in this installation, so I tried to change that back to "builtin" but couldn't.
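(For the record, those two fixes look roughly like this; the credential values are placeholders, and :SolrHostColonPort and :DefaultAuthProvider are the database settings involved:)

    # give the glassfish service user credentials for the bucket
    sudo -u glassfish mkdir -p ~glassfish/.aws
    sudo -u glassfish tee ~glassfish/.aws/credentials <<'EOF'
    [default]
    aws_access_key_id = AKIA_PLACEHOLDER
    aws_secret_access_key = SECRET_PLACEHOLDER
    EOF

    # point the installation back at the local Solr and builtin auth
    curl -X PUT -d 'localhost:8983' http://localhost:8080/api/admin/settings/:SolrHostColonPort
    curl -X PUT -d builtin http://localhost:8080/api/admin/settings/:DefaultAuthProvider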


I have the site loading now, and it sees all of the dataverses/datasets and user files, but it doesn't see the metadata, and I can't run any of the curl commands to change settings, so I've messed something up there. I get a "connection refused" error.

Here are the steps that I've done:

1. Create the entire AWS environment with EC2, S3, an RDS database, security groups, a load balancer, and target groups (Ansible).

2. Import the database from the original site into the new RDS Postgres.

3. Move the S3 data from the old bucket to the new one (didn't want to mess up the original bucket for testing).

4. Use dataverse-ansible (a work of art) to do the initial install of Dataverse (used basically as is … didn't enable its options for S3 or RDS).

5. Change the JVM options for files to S3 and the new bucket (see the sketch after this list).

6. Change the database pointer in domain.xml from localhost to the RDS instance (also sketched below).

7. Update the database with the new "storageidentifier" for user files.

8. Update the database to remove the pointer to the external Solr instance (see below; I had to do it this way because when I tried to change it via the JVM options, I didn't have access to the other Solr instance … and rightly so).
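(Steps 5, 6, and 8 in shell form, with the caveats that the exact JVM option names vary by Dataverse version – the single-store names below predate the 4.20 multi-store syntax – and that dvnDbPool/dvndb are the installer defaults while the bucket and RDS endpoint names are placeholders:)

    # step 5: point file storage at S3 and the new bucket
    asadmin create-jvm-options "-Ddataverse.files.storage-driver-id=s3"
    asadmin create-jvm-options "-Ddataverse.files.s3-bucket-name=new-bucket"

    # step 6: repoint the JDBC connection pool at RDS (instead of hand-editing domain.xml)
    asadmin set resources.jdbc-connection-pool.dvnDbPool.property.serverName=mydb.abc123.us-east-1.rds.amazonaws.com

    # step 8: reset the Solr pointer stored in the database
    psql -d dvndb -c "UPDATE setting SET content = 'localhost:8983' WHERE name = ':SolrHostColonPort';"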


It's when I try to run the curl command to update the default auth that I get the "connection refused" error. Do you know of any other settings that I should be aware of? Thanks for any help.

Night Owl


Deirdre Kirmis

Jun 17, 2020, 2:25:44 PM
to dataverse...@googlegroups.com

Okay, never mind … the metadata is there; it's the previewers that aren't working … and I am missing the custom header and footer images that were uploaded on the original site.

The curl commands worked when I remembered to use my API key!
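(For anyone hitting similar errors: user-level endpoints take an API token, and the /api/admin endpoints can additionally be blocked off and require an unblock key; both look something like this, with the token values as placeholders:)

    # user-level call with an API token
    curl -H 'X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' \
         http://localhost:8080/api/dataverses/root

    # admin call with an unblock key, if :BlockedApiPolicy is set to unblock-key
    curl -X PUT -d builtin \
         'http://localhost:8080/api/admin/settings/:DefaultAuthProvider?unblock-key=KEY_PLACEHOLDER'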


Night Owl

Philip Durbin

Jun 17, 2020, 4:14:32 PM
to dataverse...@googlegroups.com
In my experience, previewers require a valid SSL cert on the Dataverse installation, or you have to accept the self-signed cert. Just something you might want to check.
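(A quick way to eyeball the cert from the command line; the hostname is a placeholder:)

    # show the issuer and validity dates of the cert the site is serving
    echo | openssl s_client -connect dataverse.example.edu:443 -servername dataverse.example.edu 2>/dev/null \
        | openssl x509 -noout -issuer -dates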




James Myers

Jun 18, 2020, 7:14:22 AM
to dataverse...@googlegroups.com

Another easy-to-overlook step: the S3 bucket has to allow cross-site (CORS) calls if you have direct download turned on. If you're specifying hosts instead of using '*', you need to include the globaldataversecommunityconsortium.github.io host to get the previewers to work (in addition to your Dataverse host, for the rest of the UI to work).
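(A minimal CORS configuration along those lines, applied with the AWS CLI; the bucket name and the first origin are placeholders:)

    # cors.json – allow the Dataverse UI host and the previewer host
    cat > cors.json <<'EOF'
    {
      "CORSRules": [
        {
          "AllowedOrigins": ["https://dataverse.example.edu",
                             "https://globaldataversecommunityconsortium.github.io"],
          "AllowedMethods": ["GET", "PUT"],
          "AllowedHeaders": ["*"],
          "ExposeHeaders": ["ETag"]
        }
      ]
    }
    EOF
    aws s3api put-bucket-cors --bucket new-bucket --cors-configuration file://cors.json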


-- Jim
