Running parallel php symfony import:bulk


Glen Robson

Oct 22, 2015, 8:04:55 PM
to ICA-AtoM Users
Hi,

We are migrating a large number of EAD documents into AtoM. Some of them are very large and take a couple of hours to load. We've increased the memory limit to the point where the load still has 2 GB spare, and added as many indexes to MySQL as we can think of. The load now seems to be CPU-bound and, as I understand it, PHP will only use one CPU rather than spread the load.

Is it possible to run two or more bulk uploads at the same time? I'm hoping each upload will tie up different CPUs. Does this sound like a good idea or should we just be patient and wait for the single process to finish? We are running AtoM 2.2 and have indexing turned off for the load.

Thanks for your help

Glen

Creighton Barrett

Oct 22, 2015, 8:59:31 PM
to ica-ato...@googlegroups.com
Glen, I don't have the answer to your question, but I just wanted to chime in and say that multiple-hour imports are something that we have come to expect with large EAD or CSV files. How big are the EAD files? We have experienced this with EAD and with CSV files with tens of thousands of file-level descriptions.

Our experience with import has been that indexing doesn't add a huge amount of time, and leaving it turned on can save time because you don't have to rebuild the index after the imports (which also takes hours). We've also found that it is best to import large files in small batches, even one at a time, rather than run one import:bulk command that might fail at some point along the way.

I'd be interested to hear if you can run two or more bulk uploads at the same time, but I hope this helps!

Cheers,

Creighton




Glen Robson

Oct 22, 2015, 10:41:03 PM
to ica-ato...@googlegroups.com
Hi Creighton,

The largest one is 64MB, and we’ve got about 220 EAD files ranging between that and 1MB. It’s working out at roughly 16MB per hour. Thanks for the tip about indexing; we’ve been running with it off, but if we have to do this again I’ll turn it on. As you suggest, we are importing the files one at a time rather than storing them in a directory, as I was worried it might stop halfway through, but I don’t know if this causes a performance hit.
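
As a rough worked estimate from those figures: at about 16MB per hour, the 64MB file alone takes around four hours. The sketch below puts that arithmetic into Python and adds a simple lower bound for a parallel run. It assumes each process would keep the same 16MB per hour rate when several run at once, which is optimistic since they would share the same MySQL server. The function names and the example file sizes are hypothetical, not measurements from this migration.

```python
def hours_for(size_mb, rate_mb_per_hour=16.0):
    # Time for one file at the observed single-process rate.
    return size_mb / rate_mb_per_hour


def parallel_lower_bound(sizes_mb, workers, rate_mb_per_hour=16.0):
    # No schedule can finish before the largest single file is done,
    # nor faster than the total work split evenly across the workers.
    # ASSUMPTION: the per-process rate does not drop under parallel load.
    longest = hours_for(max(sizes_mb), rate_mb_per_hour)
    even_split = hours_for(sum(sizes_mb), rate_mb_per_hour) / workers
    return max(longest, even_split)
```

For example, `hours_for(64)` gives 4.0 hours, and for a made-up set of files of 64, 32, 16 and 16MB split over two workers, `parallel_lower_bound([64, 32, 16, 16], 2)` also gives 4.0 hours, because the largest file sets the floor however many processes run.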

Thanks for the pointers.

Glen

David Juhasz

Oct 23, 2015, 1:06:43 PM
to ica-ato...@googlegroups.com
Hi Glen,

As far as I know we've never tried running multiple import processes at the same time, but I can't think of any reason not to give it a try. I think the main risk is running out of memory.
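
A minimal sketch of what such a trial could look like, in Python, and untested against AtoM itself: it starts one `php symfony import:bulk` process per EAD file and caps how many run at once, so that memory use stays bounded. It assumes the task accepts a single file path, as the one-at-a-time imports described in this thread suggest. The helper names, the `run` parameter and the paths in the usage note are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(ead_file):
    # The same task discussed in this thread, pointed at a single file.
    return ["php", "symfony", "import:bulk", str(ead_file)]


def import_one(ead_file, atom_root, run=subprocess.run):
    # Each call starts its own PHP process, so the OS is free to
    # schedule it on a different CPU from the other imports.
    result = run(build_command(ead_file), cwd=atom_root)
    return str(ead_file), result.returncode


def import_parallel(ead_files, atom_root, workers=2, run=subprocess.run):
    # The threads only wait on child processes. The worker count caps
    # how many PHP processes (and their memory) exist at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda f: import_one(f, atom_root, run), ead_files)
        )
    # Return the files whose import exited with a non-zero status.
    return [name for name, code in results if code != 0]
```

A call such as `import_parallel(files, atom_root="/path/to/atom", workers=2)` would return the list of files whose import process failed, so they can be retried one at a time. Starting with two workers and watching memory before raising the count seems the cautious choice, and it would be sensible to try this on a copy of the database first.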

Cheers,
David

--

David Juhasz
Director, AtoM Technical Services Artefactual Systems Inc. www.artefactual.com

Creighton Barrett

Oct 28, 2015, 1:52:59 PM
to ica-ato...@googlegroups.com
I'd be curious to hear how this goes for you, Glen. I spoke with a developer about this and he mentioned that a bulk import failed once when he (or maybe someone else) tried to manually create a new record. He also mentioned that he has noticed slower import and page load performance with large "flat" files (e.g., a collection with thousands of files or items but no series). We are currently testing a CSV file with one collection-level description and almost 60,000 file-level descriptions, and both the import and the tree-view navigation have been much slower than with other large files where, perhaps, the hierarchical structure helps improve performance.

Those are some big EAD files! We imported our larger EAD files one at a time and stored our smaller files in a directory, but we didn't have anything that big.

Good luck,

Creighton

Glen Robson

Oct 28, 2015, 2:04:11 PM
to ica-ato...@googlegroups.com
Hi Creighton and David,

Thanks both for your advice. I’m afraid that, as time was getting short before go-live, we didn’t want to test parallel loading in case something went wrong. It’s still loading and will hopefully finish by the end of the week.

Thanks, and sorry we are not able to test this at the moment.

Glen


Creighton Barrett

Oct 28, 2015, 2:40:26 PM
to ica-ato...@googlegroups.com
Hey, I totally understand. No worries!
