--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/76597081-1b1a-442f-99df-0b1ecee88476%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Michel,
Phil pointed me at the DV issue and scripts (somewhere in Google Docs) and here’s my summary, with a slight script update.
The key steps are to move the files to S3 and then to update the database to indicate that those files are to be retrieved from S3.
For QDR, I cd’d to the Dataverse files/ directory and ran the following.
Do a dry run:
aws s3 sync ./10.5072 s3://qdr-dataverse-dev/10.5072 --dryrun
Run the sync of everything in the 10.5072 dir tree to the bucket:
aws s3 sync ./10.5072 s3://qdr-dataverse-dev/10.5072
and then did the same thing for our production authority. (The temp file directory is still needed and doesn’t get transferred to S3.) --dryrun is a great way to make sure you’ve got the paths correct and the command is doing what you want before you run it for real.
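In script form, the sync steps look roughly like this. The bucket name is ours, and the second authority (10.5064) is just a placeholder for your production authority; the echo prints each command for review, so remove it to actually run the sync (and drop --dryrun once the output looks right):

```shell
#!/bin/sh
# Sketch: sync each DOI authority directory tree to the S3 bucket.
# Assumptions: bucket "qdr-dataverse-dev"; "10.5064" is a placeholder authority.
BUCKET=qdr-dataverse-dev
for AUTH in 10.5072 10.5064; do
  # Print the command for review; remove "echo" to execute it.
  echo aws s3 sync "./${AUTH}" "s3://${BUCKET}/${AUTH}" --dryrun
done
```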
To be cautious, I just moved the file directories aside rather than deleting the data from the file system until I had everything working in S3.
I then ran the following updates in SQL. The first updates the storageidentifier for DataFiles; the second does Datasets. (As far as I know, the latter isn’t really needed, since the storageidentifier for Datasets is not used, but leaving them as file://... doesn’t make much sense and could cause problems if it ever gets used, or if I’ve missed somewhere it is used.)
You should definitely back up your database first – always a good idea, but it’s especially easy here to cut/paste with the wrong bucket (i.e. from a dev/test server to production), and then it can be tricky to recover. From my notes, I think these queries came from Dataverse, except that I added a check for NULL storageidentifiers (QDR had some from older versions/testing).
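For the backup step, a custom-format pg_dump is handy because you can restore it selectively with pg_restore. A sketch, assuming the common Dataverse defaults of database "dvndb" and role "dvnapp" (yours may differ); the echo prints the command for review – remove it to actually run the dump:

```shell
#!/bin/sh
# Assumptions: database "dvndb", role "dvnapp" (typical Dataverse defaults).
STAMP=$(date +%Y%m%d)
# -F c produces a custom-format archive restorable with pg_restore.
echo pg_dump -U dvnapp -F c -f "dvndb-pre-s3-${STAMP}.dump" dvndb
```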
Then run the following update queries in postgres:
UPDATE dvobject SET storageidentifier = 's3://qdr-dataverse-dev:' || storageidentifier WHERE id IN (SELECT o.id FROM dvobject o, dataset s WHERE o.dtype = 'DataFile' AND s.id = o.owner_id AND s.harvestingclient_id IS NULL AND o.storageidentifier NOT LIKE 's3://%' AND o.storageidentifier IS NOT NULL);
UPDATE dvobject SET storageidentifier = overlay(storageidentifier placing 's3://' from 1 for 7) WHERE id IN (SELECT o.id FROM dvobject o, dataset s WHERE o.dtype = 'Dataset' AND s.id = o.id AND s.harvestingclient_id IS NULL AND o.storageidentifier LIKE 'file://%');
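Since it’s easy to paste the wrong bucket name into queries like these, one cautious pattern is to run the update inside a transaction and spot-check a few rows before committing. A sketch in psql, reusing the DataFile query with our dev bucket name (substitute your own):

```sql
BEGIN;
UPDATE dvobject SET storageidentifier = 's3://qdr-dataverse-dev:' || storageidentifier
 WHERE id IN (SELECT o.id FROM dvobject o, dataset s
              WHERE o.dtype = 'DataFile' AND s.id = o.owner_id
                AND s.harvestingclient_id IS NULL
                AND o.storageidentifier NOT LIKE 's3://%'
                AND o.storageidentifier IS NOT NULL);
-- Eyeball the result before deciding:
SELECT id, storageidentifier FROM dvobject WHERE dtype = 'DataFile' LIMIT 5;
COMMIT;   -- or ROLLBACK; if anything looks wrong
```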
Also, FWIW, since the cached thumbnails and metadata exports will also be in S3, you’ll need an aws command to delete them rather than a file system rm (e.g. to refresh when a new version updates the content of metadata files.) The following S3 command deletes just the cached files:
aws s3 rm s3://qdr-dataverse-dev/ --recursive --exclude "*" --include "*.cached"
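The --dryrun flag mentioned earlier also works with aws s3 rm, so you can see exactly which .cached files would be deleted before doing it for real. Echoed here for review – remove the echo to run it:

```shell
# Same bucket as above; --dryrun prints what would be deleted without deleting.
# Remove "echo" to actually run the command.
echo aws s3 rm s3://qdr-dataverse-dev/ --recursive --exclude "*" --include "*.cached" --dryrun
```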
Hope that helps,
--Jim