--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/fc53d7a5-d01c-49b6-aaab-ae011236599a%40googlegroups.com.
Hello,

We're queuing this up in cron: https://wiki.postgresql.org/wiki/Automated_Backup_on_Linux and pushing the resulting nightly backups into our preservation pipeline (in iRODS).

Donald
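For anyone setting this up fresh, the wiki page above ships shell scripts (pg_backup.sh / pg_backup_rotated.sh) meant to be driven from cron. A minimal crontab entry might look like the following; the script path, user, schedule, and log location are illustrative assumptions, not details from this thread:

```shell
# /etc/cron.d/pg-backup -- illustrative entry; adjust path, user, and time.
# Runs the rotated backup script from the PostgreSQL wiki nightly at 02:00
# as the postgres user and logs output for troubleshooting.
0 2 * * * postgres /usr/local/bin/pg_backup_rotated.sh >> /var/log/pg_backup.log 2>&1
```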
On Thu, Sep 19, 2019 at 2:01 PM Jamie Jamison <jam...@g.ucla.edu> wrote:
The UCLA Dataverse is on AWS, with data in an S3 bucket that is cross-region replicated. For backup of the data I'm running rclone to back up the data to our department Box.

--

I'm reading through the Dataverse backup scripts. My question is how others are backing up their metadata. I need to set up a script to back up the database but wasn't sure how other people are setting this up.

Thank you,
Jamie Jamison
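For context, an S3-to-Box copy with rclone can be a one-line sync once the remotes are configured. The remote and bucket names below are placeholders, not UCLA's actual configuration:

```shell
# Sketch of an S3-to-Box backup with rclone. The remote names "s3remote"
# and "box" are placeholders set up beforehand with `rclone config`.
SRC="s3remote:dataverse-files"
DEST="box:dataverse-backup"
if command -v rclone >/dev/null 2>&1; then
  # --checksum compares file hashes rather than modification times
  # where the backends support it.
  rclone sync "$SRC" "$DEST" --checksum
fi
```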
I know metadata can be exported, but just to clarify: is the dataset metadata in the Postgres database? I'm trying to set up backup and restore procedures and describe where the various pieces are located. The hypothetical scenario is rebuilding our system from backups.
The main pieces you want are:

• user-uploaded datafiles (formerly known as the files.dir hierarchy, now in S3 for you)
• the PostgreSQL database (which indeed contains all metadata and may be backed up via shell script)
• any customization you made beneath /usr/local/glassfish4/glassfish/domains/domain1/docroot or equivalent (in our case, none)

Our datafiles still live on local storage, so we drop our nightly database dumps alongside files.dir and push the whole thing into our preservation pipeline.

Donald
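As a concrete sketch, a nightly job covering those three pieces might look like this. Every path, the database name (dvndb), and the role name (dvnapp) are assumptions for a stock Glassfish install; skip the datafiles step if yours live in replicated S3:

```shell
#!/bin/sh
# Nightly Dataverse backup sketch: database dump, datafiles, and docroot
# customizations. All names and paths are illustrative assumptions.
STAMP=$(date +%F)
DEST="${BACKUP_ROOT:-/tmp/dataverse-backup}/$STAMP"
mkdir -p "$DEST"

# 1. The PostgreSQL database -- this is where all the metadata lives.
if command -v pg_dump >/dev/null 2>&1; then
  pg_dump -U dvnapp dvndb | gzip > "$DEST/dvndb.sql.gz"
fi

# 2. User-uploaded datafiles (the files.dir hierarchy, if stored locally).
FILES_DIR="/usr/local/glassfish4/glassfish/domains/domain1/files"
if [ -d "$FILES_DIR" ]; then
  tar czf "$DEST/files.tar.gz" -C "$FILES_DIR" .
fi

# 3. Any local customizations under the app-server docroot.
DOCROOT="/usr/local/glassfish4/glassfish/domains/domain1/docroot"
if [ -d "$DOCROOT" ]; then
  tar czf "$DEST/docroot.tar.gz" -C "$DOCROOT" .
fi
```

From here, the resulting directory can be pushed into whatever preservation pipeline you use, as Don describes with iRODS.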
My understanding of the Data Curation Tool is that it enriches variable metadata stored in the database, which (as before) is then exported into DDI, for example. I think one of the big things is being able to add weights to variables, but it's probably best to consult Victoria Lubitch's slides about the tool: https://osf.io/a2wtk/
On Sun, Sep 22, 2019 at 9:04 PM Janet McDougall - Australian Data Archive <janet.m...@anu.edu.au> wrote:
Great conversation here, and Phil, this metadata preservation information is very useful. I didn't realise that the metadata 'export' accessed this store rather than the database – I can understand why, as you have described, though.
Will the Data Curation tool 'variable metadata' also be a metadata export when it's in production?
thanks
Janet
Just to be sure: the .cached files are stored in the file system we use as part of our Dataverse installation, in addition to being stored in the PostgreSQL database? I wasn't aware of this. I think this is very useful when it comes to long-term preservation efforts. Do preservation trails in e.g. Archivematica relate to these files in any way?
Best, Philipp
Yes. Out of the box, the "cached" files are stored on the filesystem, but they'll be on S3 or Swift if you use those alternate file storage options (I'm 90% sure of this). I do agree that this is helpful for long-term preservation. I've heard Jon Crabtree from Odum talk about the importance of this feature. I'm still interested in a good, short way to talk about this at https://dataverse.org/software-features, so if any wordsmiths out there have suggestions, please let me know. :) I keep thinking I should give a talk called "hidden features of Dataverse". :)

I'm not sure what "preservation trails" are, but my reading of https://www.archivematica.org/en/docs/archivematica-1.10/user-manual/transfer/dataverse/#dataverse-mets-file is that the "native JSON" format produced by Dataverse (export_dataverse_json.cached in my example above) is transformed by Archivematica into a DDI-based METS file. I'm sorry if I have any of this wrong. Someone else out there knows the details better than I do.

I hope this helps,
Phil
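For anyone who would rather pull those cached exports over HTTP than read them off the filesystem, the Dataverse native API has a dataset export endpoint. The server and DOI below are placeholders, and the exporter names are the ones I believe current releases support; verify them against your installation's guides:

```shell
# Build export URLs for a dataset's metadata formats via the Dataverse
# native API. SERVER and DOI are placeholders; exporter names should be
# checked against your Dataverse version's documentation.
SERVER="https://demo.dataverse.org"
DOI="doi:10.70122/FK2/EXAMPLE"
for EXPORTER in ddi dataverse_json OAI_ORE; do
  URL="$SERVER/api/datasets/export?exporter=$EXPORTER&persistentId=$DOI"
  echo "$URL"
  # curl -s "$URL" -o "export_${EXPORTER}.cached"   # uncomment to fetch
done
```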
--
Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin
Thanks, Phil. The term "preservation trail" is probably not used anywhere in any preservation framework, but was coined by me on the fly in an attempt to distract from my ignorance on this issue. ;-) We are not using Archivematica or any similar tool yet, so like Phil, I'd be interested in any more details.
Philipp
and the datacite.xml file as well.
Right now, though, I think the database is the only complete record, and the only one that would allow you to restore a Dataverse instance. Some of the export formats are partial by design, i.e. they only include the subset of metadata that can be mapped to a particular schema/format.

The JSON and ORE exports are conceptually different – they are nominally intended to be complete (at least the ORE map is, since I was the one with the intention) – but, in practice, I don't think they include everything needed to round-trip yet. The ORE map was in development before variable metadata editing was merged and before provenance (text and file) was done, so it doesn't include everything needed to restore a dataset yet.

It's definitely a goal of the ORE/BagIt effort to make the archive file sufficient to restore a dataset into a working Dataverse instance, and the Dataverse API is sufficient, or close to sufficient, to export and re-ingest everything, so tools like pyDataverse could become a back-up tool. But for now, I think the cached metadata files are more of a preservation option than a back-up one.
-- Jim
Janet,
FWIW: the ORE map being generated is stored as a cached file and is available from the 'export metadata' menu. That said, the main purpose in creating it was to include it in the BagIt file along with all the content (and the other files required by the BagIt spec and, more recently, recommended by the RDA), and to be able to transfer those zipped Bags to long-term storage – and, eventually, to pull them back into Dataverse. For that reason, the initial code doesn't archive things like variable-level metadata that would be regenerated on import. However, that changes now that the variable metadata is editable.
So far, code exists to create the ORE map and the Bag, and to send the Bag to Google's Cloud, to DuraCloud (previously the way into DPN and currently a way to send things to Chronopolis), or to the file system (created by Odum as a way to then manage the Bags in iRODS) – with creation triggered manually by an admin or automatically as part of publication. The initial concept is that the archive represents the state of the dataset in Dataverse and that it is stored but not further processed à la Archivematica.
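To illustrate the manual trigger mentioned above: if I recall the admin API correctly (verify the endpoint name against your version's installation guide), a superuser can submit a single published dataset version to the configured archiver. The host, dataset id, and version number below are placeholders:

```shell
# Manually submit one dataset version to the configured archiver.
# The endpoint name is from memory of the Dataverse admin guide; the
# dataset id and version number are placeholders.
HOST="http://localhost:8080"
DATASET_ID=42
VERSION="1.0"
URL="$HOST/api/admin/submitDatasetVersionToArchive/$DATASET_ID/$VERSION"
echo "$URL"
# curl -X POST "$URL"   # run on the server itself; the admin API is
#                       # typically restricted to localhost
```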
Open issues:
· There is not yet a way to read the Bag automatically back into a Dataverse instance.
· While I believe the Bags with their ORE map files were once sufficient to recreate a dataset in Dataverse (i.e. all the data and metadata that users see would be the same after a round-trip), this is no longer true given the recent work to make variable-level metadata editable (what Dataverse would re-derive is not the same as the edited original) and to allow uploaded provenance files, neither of which are included in the current Bags.
· As noted above, the original use case was archiving/future restoration rather than preservation, so the Bags currently don't include metadata/data that is automatically derived by Dataverse.
The first two are definitely things that we (I/QDR/GDCC/…) want to and plan to address in the near term. This would include adding the edited variable-level metadata to the ORE map and adding any provenance file to the Bag (with a reference in the ORE map). (The Dataverse Uploader already has some placeholder code for reading a Bag, so we have a starting point.)
The latter item – thinking about the Bag in support of preservation – is something that could be done, e.g. a switch to add all derivable data/metadata, including all of the cached metadata export files, to the Bag. I guess one question for the community is whether that's useful. (Should all the export formats be included? Just specified ones? If the ORE map includes the variable-level metadata, and potentially the derived file formats, are the exported metadata files also needed?)
-- Jim