offsite backup/mirror (LOCKSS?)


aussda....@gmail.com

Jul 19, 2018, 6:11:59 AM
to Dataverse Users Community
Hi Everyone,

We're currently exploring options for offsite backups of our data catalog and are curious what other organizations are doing in this regard. Is it possible to use LOCKSS with Dataverse? If not LOCKSS, is there an alternative solution?

Thanks very much for the help! Best, Frank

Philip Durbin

Jul 19, 2018, 7:39:13 AM
to dataverse...@googlegroups.com
Hi Frank,

LOCKSS was supported by DVN 3.x ( http://guides.dataverse.org/en/3.6.2/dataverse-installer-main.html#using-lockss-with-dvn ) but is not supported in Dataverse 4. https://github.com/IQSS/dataverse/issues/2025 is more than three years old and suggests continuing to support LOCKSS, looking into ResourceSync, or other options.

A year and a half later, Pete Meyer from Harvard Medical School offered his idea for a Data Locality Module (DLM) for Dataverse in https://github.com/IQSS/dataverse/issues/3403 and we recently added a new database table for storage sites. It controls which sites around the world the Dataverse GUI lists as holding copies of the datasets that he has replicated via rsync: http://guides.dataverse.org/en/4.9.1/developers/big-data-support.html#configuring-download-via-rsync

There are a number of community efforts underway to push or pull data from Dataverse into archival systems such as the Digital Preservation Network (using the BagIt standard: https://github.com/IQSS/dataverse/issues/4706 ), Archivematica (using Archival Information Packages, AIPs: https://github.com/IQSS/dataverse/issues/4283#issuecomment-392078932 ), or EASY ( https://groups.google.com/d/msg/dataverse-community/HNRb0c92MoA/-b7UfhSzBwAJ ).

As part of Harvard Dataverse moving to S3, we (royal we, not me) wrote a backup script to copy data offsite. You can read about it at http://guides.dataverse.org/en/4.9.1/admin/backups.html and https://github.com/IQSS/dataverse/blob/v4.9.1/scripts/backup/run_backup/README_HOWTO.txt

If you're serious about wanting LOCKSS support specifically, I'd suggest creating a fresh GitHub issue.

There's probably other stuff I'm forgetting. Others should definitely feel free to tell their stories on this topic.

I hope this helps,

Phil

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/7dd22972-efcf-4cfb-89f7-a008eba2e631%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Don Sizemore

Jul 19, 2018, 10:18:24 AM
to dataverse...@googlegroups.com
One more preservation strategy to tack on, in the vein of "the-Internet-is-run-by-bash-and-cron":

Odum takes nightly PostgreSQL database dumps, then uses the iRODS irsync command to push two weeks' worth of database dumps and the entire files.dir hierarchy into our primary iRODS server. From there, these backup copies are replicated (three backup copies total, checksums all around), and we push a copy into S3-east, which replicates to S3-west.

This is in addition to our local virtual machine and filesystem backups.
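
For anyone who wants to try something along these lines, here is a rough Python sketch of the dump-and-push steps described above. It is not Odum's actual tooling, only an illustration under stated assumptions: the database name (dvndb), the dump file naming scheme, the local paths, and the iRODS collection are all placeholders. It only builds and prints the commands (a dry run) and works out which dumps fall outside a two-week window; replication and the S3 copy are left out.

```python
from __future__ import annotations

from datetime import date, timedelta
from pathlib import Path

RETENTION_DAYS = 14  # "two weeks' worth" of dumps, per the description above


def dump_name(day: date, db: str = "dvndb") -> str:
    """File name for the dump taken on a given day (naming scheme is an assumption)."""
    return f"{db}-{day.isoformat()}.dump"


def pg_dump_command(db: str, out_file: Path) -> list[str]:
    """pg_dump in custom format (-Fc), written to out_file (-f)."""
    return ["pg_dump", "-Fc", "-f", str(out_file), db]


def irsync_command(local_dir: Path, irods_collection: str) -> list[str]:
    """Recursive irsync from a local directory into an iRODS collection.

    irsync marks iRODS-side paths with an "i:" prefix.
    """
    return ["irsync", "-r", str(local_dir), f"i:{irods_collection}"]


def expired_dumps(names: list[str], today: date, db: str = "dvndb") -> list[str]:
    """Dump files that fall outside the retention window and can be pruned."""
    keep = {dump_name(today - timedelta(days=n), db) for n in range(RETENTION_DAYS)}
    return sorted(n for n in names if n.startswith(f"{db}-") and n not in keep)


if __name__ == "__main__":
    # All paths below are hypothetical placeholders.
    dump_dir = Path("/var/backups/dataverse/db")
    files_dir = Path("/srv/dataverse/files")
    today = date.today()

    # Dry run: print the commands a nightly cron job might execute.
    print(" ".join(pg_dump_command("dvndb", dump_dir / dump_name(today))))
    print(" ".join(irsync_command(dump_dir, "/exampleZone/backups/db")))
    print(" ".join(irsync_command(files_dir, "/exampleZone/backups/files")))
```

To run the commands for real, each list could be handed to subprocess.run(cmd, check=True) from a nightly cron job, with the files returned by expired_dumps deleted before the irsync step.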

In case this is helpful,
Donald

