CSV Importation via Web Interface


Bryan Vasquez

May 17, 2023, 9:18:35 PM
to AtoM Users

We are planning to create a new AtoM instance at our institution to migrate old records. Our main question is this: is there a defined limit for CSV imports via the web interface? We are thinking of uploading 50,000 records at a time. Will that work without any problems?

Thank you so much in advance for your assistance.

Bryan Vásquez.
National Archive from Costa Rica.

rakkitha samaraweera

May 18, 2023, 6:05:07 AM
to ica-ato...@googlegroups.com
Hello, if you find a way to do that, please let me know.

National Archives,
Sri Lanka.


Dan Gillean

May 18, 2023, 10:11:20 AM
to ica-ato...@googlegroups.com
Hi Bryan, 

There is not technically a defined row limit for web imports, since the job scheduler should avoid timeout issues caused by browser limits. For a migration project, however, I would strongly encourage you to use the command-line interface instead. You will have more control (the CLI has options not available in the UI), and there is less complexity that could introduce unexpected errors, because the CLI parses the file directly rather than going through the UI and the job scheduler.

I believe the job scheduler also uses more system resources than the CLI, so with large files you are more likely to exhaust resources such as memory than if you imported from the command line directly. Additionally, the CLI lets you make the import much faster by skipping certain steps that are slow when run per row, but fast when run as separate tasks at the end. For example:
  • By default, unless you add the --index option, the CLI import will not update the search index for every row it imports, making the import MUCH faster and less resource intensive. You can then run the search:populate task once at the end of the import, as described in the documentation. There is a "skip indexing" option in the user interface as well, but you need the CLI to run the search:populate task anyway.
  • The --skip-nested-set-build option has no user interface equivalent and will also dramatically speed up the import. Rebuilding the nested set after the import is quite fast (usually seconds) with php symfony propel:build-nested-set, as documented.
  • If your import CSV includes paths to digital objects (using the digitalObjectPath or digitalObjectURI columns), you can also use the --skip-derivatives option to speed up the import, and then run the task to generate digital object derivatives separately afterward, as described in the documentation.
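Putting the steps above together, a typical large CSV migration from the CLI might look like the following sketch. The file path is illustrative, and the commands assume you run them from the root of your AtoM installation; check the flag names against the documentation for your AtoM version before relying on them.

```shell
# Run from the AtoM installation root, e.g. /usr/share/nginx/atom
# (path is an assumption; adjust for your install).

# 1. Import the CSV, deferring the slow per-row work.
#    Indexing is skipped by default (no --index flag), and we also
#    skip the nested set build and digital object derivatives.
php symfony csv:import /path/to/descriptions.csv \
    --skip-nested-set-build \
    --skip-derivatives

# 2. Rebuild the nested set once, after the import (usually seconds).
php symfony propel:build-nested-set

# 3. Repopulate the search index in one pass.
php symfony search:populate

# 4. If the CSV referenced digital objects, generate derivatives now.
php symfony digitalobject:regen-derivatives
```

Run in this order, each deferred step is performed once over the whole data set instead of once per imported row, which is where most of the speedup comes from.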
In general, we perform all migrations using CLI tools, and for larger CSV files I strongly recommend you do the same. If for some reason the CLI is not available to you for the migration, then I would suggest much smaller files, so you are less likely to exhaust system resources and can do some checking between CSV files to ensure everything is importing as expected.

In general, I recommend doing this type of spot-checking between CSV files no matter which way you import them, and consider making backups along the way, every time you confirm an import has gone well!

Finally, you might want to check out some of the tips and suggestions found in this slide deck.

Good luck!

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
he / him
