What are the Best Practices for Large Data Migration in DSpace?


Dhaval Koradiya

Sep 22, 2023, 7:32:46 AM
to DSpace Community
I need to migrate a significant volume of data (~500 GB and ~300k artifacts) into DSpace from another repository. What are the recommended strategies and tools for handling large data migrations efficiently in DSpace? Any insights or best practices would be greatly appreciated.


DSpace Community

Sep 22, 2023, 11:11:07 AM
to DSpace Community
Hi,

You may already have come across these, but we have a variety of import mechanisms for getting content into DSpace: https://wiki.lyrasis.org/display/DSDOC7x/Ingesting+Content+and+Metadata

Larger amounts of content are probably easiest to bulk upload via either of these

The format you import with may depend on which seems easier for you to get the data into.

My other recommendation would be to start *small* initially, in order to get the hang of it.   Choose a small subset of the content as a "test" to import and get that working.  You may also want to consider importing in batches little by little (e.g. 10 batches of 30K artifacts or some similar approach).
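
To make the batching idea concrete, here is a minimal Python sketch that splits a list of item identifiers into fixed-size batches. The `batched` helper and the `item_ids` list are hypothetical names used for illustration only; they are not part of DSpace, and how each batch is then imported is left open.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Assumption: 300k placeholder identifiers standing in for the real artifacts.
item_ids = [f"item_{n:06d}" for n in range(300_000)]

# 10 batches of 30K, as in the example above.
batches = list(batched(item_ids, 30_000))
print(len(batches), len(batches[0]))
```

Starting with a much smaller `batch_size` for the first test run makes it easier to spot metadata or file problems before committing to the full import.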

You also will want to be aware of our "Performance Tuning" hints/tips: https://wiki.lyrasis.org/display/DSDOC7x/Performance+Tuning+DSpace   With any batch upload, you will require extra memory/CPU... and the larger the batch size, the more it may require (especially memory).

Those are some general guidelines, but feel free to ask more specific questions or share more about your migration.  I'm sure others here could share more specific tips as well based on their own bulk import experiences.

Tim

Dhaval Koradiya

Sep 25, 2023, 7:53:52 AM
to DSpace Community
Thank you Tim for your quick response 🙂

I have gone through both approaches. My questions are:

1. I understand that with SAFBuilder we can create a zip file from a CSV and upload it via the UI for each collection. Are there any options for batch uploads in Simple Archive Format?
2. How do I use AIP and METS? What is the supported structure? How can the packages be generated or created from existing data? I have gone through this, and it looks complex to me. Are any similar documents or practical videos available? They would be a great help in understanding the Archival Information Package process.
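
For context on question 1, here is a minimal Python sketch of what one item directory in Simple Archive Format looks like on disk: a `dublin_core.xml` file with `dcvalue` entries, a `contents` file listing the bitstreams, and the files themselves. The `write_saf_item` function, the sample metadata, and the file names are all hypothetical and for illustration only; this is not SAFBuilder's code, and it covers only the basic layout.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(batch_dir, name, metadata, files):
    """Write one SAF item directory (hypothetical helper, basic layout only).

    metadata maps (element, qualifier) to a list of values;
    use "none" as the qualifier for unqualified elements.
    """
    item_dir = Path(batch_dir) / name
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> per metadata value.
    root = ET.Element("dublin_core")
    for (element, qualifier), values in metadata.items():
        for value in values:
            dcvalue = ET.SubElement(
                root, "dcvalue", element=element, qualifier=qualifier
            )
            dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy the bitstreams and list them in the contents file, one per line.
    names = []
    for src in files:
        src = Path(src)
        shutil.copy2(src, item_dir / src.name)
        names.append(src.name)
    (item_dir / "contents").write_text(
        "".join(n + "\n" for n in names), encoding="utf-8"
    )
    return item_dir


# Demo with placeholder data in a temporary directory.
work = Path(tempfile.mkdtemp())
sample = work / "report.pdf"
sample.write_bytes(b"placeholder bytes, not a real PDF")

item_dir = write_saf_item(
    work / "batch_001",
    "item_000",
    {("title", "none"): ["Sample report"], ("date", "issued"): ["2023-09-25"]},
    [sample],
)
print(sorted(p.name for p in item_dir.iterdir()))
```

Calling this once per CSV row would produce a batch directory with one subdirectory per item, which is the structure the import tools expect as input.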

What are your thoughts? Which one is the recommended way for such a large data migration?