attention heavy ingesters who have migrated from I7 to I2

Sharon Hanna

Dec 18, 2024, 7:47:39 PM
to islandora
Hi All,

We ingest fairly heavily: over the last eight months, we've ingested around 1475 objects/repository items per month (mean). Is there anyone who has migrated to the new Islandora who ingests at a similar rate? How has your process (including time required for image) changed in the new environment?

Thanks in advance for any replies received, and happy holidays!

Sharon Hanna
Programmer Analyst
Special Collections and Archives
The University of British Columbia | Okanagan Campus
Syilx Okanagan Nation Territory

Joe Corall

Dec 19, 2024, 6:14:10 AM
to isla...@googlegroups.com
Hi Sharon,

Here at Lehigh we've been averaging around 7,000 items per month since going live on i2 around July 1.

Before going live, our Special Collections Librarians, our Metadata Librarians, and I met and modified the i7 ingest spreadsheet so it matched our i2 metadata profile. The end result looks like this.

Now, similar to the workflow we had for ingesting content into i7, a batch ingest begins by copying the ingest template and populating the copy with content: one i2 item/object/node per spreadsheet row.

What's different now is we have some tooling around both populating the spreadsheet and actually running the ingest.
  1. While populating the spreadsheet, editors can click a "Check my work" button in Google Sheets. The data in the spreadsheet is then analyzed to ensure that files referenced for upload exist, titles are populated, dates are formatted correctly, etc.
  2. Once the spreadsheet is ready, another button in Google Sheets directs the editor to start the ingest using Islandora Workbench. The ingest is started by supplying the Google Sheets URL to a GitHub Actions workflow. The job then transforms the Google Sheet into the format Workbench expects and executes the ingest. The job status is sent to a group Slack channel to keep everyone in the loop about what's running and whether the ingest job succeeded or failed.
We're continuing to iterate on this process. As we use this new workflow more we're finding things we could be catching with the "Check my work" button to prevent undesirable metadata from being added to the repository. I'm also working on making the tooling more general in the hopes this could eventually be used by other institutions if desired.
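
For readers who want a concrete picture, the kind of checks described in step 1 could be sketched in Python roughly as follows. This is not Lehigh's actual tooling (which runs from a button in Google Sheets and is not shown in this thread). The column names `title`, `file`, and `date`, the accepted date pattern, and the function name `check_rows` are all assumptions made for illustration.

```python
import csv
import io
import os
import re

# Hypothetical column names; the real ingest template is not shown in the thread.
REQUIRED_COLUMNS = ("title", "file", "date")
# Assumed date convention (format only): YYYY, YYYY-MM, or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def check_rows(csv_text, file_exists=os.path.exists):
    """Return a list of human-readable problems found in the sheet."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]
    # Row 1 is the header, so data rows start at 2 (matching the sheet UI).
    for row_number, row in enumerate(reader, start=2):
        title = (row["title"] or "").strip()
        path = (row["file"] or "").strip()
        date = (row["date"] or "").strip()
        if not title:
            problems.append(f"row {row_number}: title is empty")
        if path and not file_exists(path):
            problems.append(f"row {row_number}: file not found: {path}")
        if date and not DATE_PATTERN.match(date):
            problems.append(f"row {row_number}: bad date: {date}")
    return problems
```

The `file_exists` parameter is injectable so the same checks can be pointed at local disk, a network share, or a stub while testing. An empty result means the sheet passed these particular checks, nothing more.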



--
Learn more about Islandora in general at islandora.ca and join the community at https://github.com/Islandora/islandora-community/wiki
---
You received this message because you are subscribed to the Google Groups "islandora" group.
To unsubscribe from this group and stop receiving emails from it, send an email to islandora+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/islandora/a96decd5-46b5-4f3d-bb25-533d565aa2dan%40googlegroups.com.

Sharon Hanna

Dec 19, 2024, 12:50:41 PM
to isla...@googlegroups.com
Hi Joe, thanks so much!! It sounds like you’ve developed some great tools using Google. Has everything gone smoothly with Workbench, and what is the maximum number of media files and the maximum total size you can ingest at a time?

Thanks!

Sharon

Cory Lampert

Dec 19, 2024, 1:12:03 PM
to isla...@googlegroups.com
Hi Sharon,

We regularly ingest large batches using Workbench at UNLV.  

A recent batch was from an archival collection of campus photo services with a total of 290,000 images. We've been successful in large ingests of up to about 2,700 records. A recent ingest created 2,686 new nodes and added 2,622 jpg files as media. It did take about 7 hours to complete.

We did confirm with our server admin before starting that the system should be able to handle it, and it seems like so far, so good!

Hope this helps!
Cory



Sharon Hanna

Dec 19, 2024, 1:17:10 PM
to isla...@googlegroups.com
Thank you, Cory! Good to know!

-Sharon 

Joe Corall

Dec 19, 2024, 2:52:50 PM
to isla...@googlegroups.com
Workbench has been wonderful! We actually used workbench for our migration off i7. For that process, we saw significant speed improvements on our migration jobs after writing a wrapper script around workbench to allow parallel jobs to run. We migrated 380K nodes/media this way. The need for speed was mainly so we could get off i7 as quickly as possible after we were sure our i7<->i2 mappings were working properly; we haven't had a need to improve the speed (i.e. run parallel jobs) since the large batches we ran during the migration.
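
The wrapper script itself is not included in this thread. As a minimal sketch of the general idea only, the snippet below launches several Workbench runs at once, one process per configuration file, and collects their exit codes. It assumes the `./workbench --config <file>` invocation and assumes each config points at a disjoint slice of the input so the jobs do not overlap. The function names and the default worker count are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def workbench_command(config_path, workbench="./workbench"):
    """Build the command for one Workbench run with the given config file."""
    return [workbench, "--config", config_path]


def run_parallel(commands, max_workers=4):
    """Run each command as its own process; return exit codes in input order."""

    def run(command):
        # Threads only wait on child processes here, so a thread pool is enough.
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

For example, `run_parallel([workbench_command(p) for p in ["part1.yml", "part2.yml"]])` would start two jobs side by side (the config file names are placeholders). How many workers a server tolerates is something to agree on with whoever administers it.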

Since migrating to i2 we've uploaded 4K items in a single batch with no problem. For us a batch that size takes about 3 hours. We probably could handle tens of thousands in a single batch, but a spreadsheet that large is pretty unwieldy for any individual to manage.
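
To put the figures reported in this thread side by side, here is a rough back-of-the-envelope calculation. It uses only the numbers quoted above (about 4,000 items in about 3 hours at Lehigh, 2,686 nodes in about 7 hours at UNLV, and roughly 1,475 items per month at UBC Okanagan). It assumes throughput scales linearly, which ignores file sizes, derivative generation, and hardware differences, so treat the results as order-of-magnitude only.

```python
def items_per_hour(items, hours):
    """Average throughput for one reported batch."""
    return items / hours


def hours_for(items, rate):
    """Estimated wall-clock hours to ingest `items` at `rate` items/hour."""
    return items / rate


lehigh_rate = items_per_hour(4000, 3)  # "4K items", "about 3 hours"
unlv_rate = items_per_hour(2686, 7)    # "2,686 new nodes", "about 7 hours"
monthly_items = 1475                   # mean monthly ingest from the first post

print(round(lehigh_rate), round(unlv_rate))
print(round(hours_for(monthly_items, lehigh_rate), 1),
      round(hours_for(monthly_items, unlv_rate), 1))
```

Under those assumptions, a month's worth of items at the original poster's volume would take somewhere between roughly one and four hours of Workbench run time.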

Early on we had some timeout issues with larger videos, but that's not specific to Workbench - it's more a PHP/webserver problem. Our site is behind a reverse proxy (an HAProxy frontend our infrastructure team manages). That reverse proxy has some small timeout settings (i.e. 60s). So we ended up needing to point Workbench directly at our Islandora server (what the reverse proxy considers the "backend"). This ended up being ideal, since with this approach we can create a separate php-fpm pool that is only reachable from on campus (and that only Workbench should ever be pointed at), and give that pool very high timeout, memory, and upload limits. Our normal site traffic has much tighter restrictions.

Sharon Hanna

Dec 19, 2024, 5:44:08 PM
to isla...@googlegroups.com