Concurrently run this Bulk Import tool with the BFSIT version embedded in Alfresco?

t.me...@aca-it.be

Feb 8, 2018, 8:42:35 AM

to Alfresco Bulk Import Tool

Hello Peter

I have a question regarding the compatibility between the latest version of your bulk import tool and the version embedded in Alfresco.

We have a customer who already uses the embedded bulk import tool very regularly (read: triggered by several scheduled jobs). They would now like to import a huge number of documents without halting these other scheduled import jobs.

So this is where it gets interesting. Do you know whether it's possible to run the latest version of the bulk import tool concurrently, on the same machine, with the embedded version of the bulk import tool? Or would the two interfere with each other in some way?

Kind regards from a brave Alfresco soul

Tom Metten

Peter Monks

Feb 8, 2018, 12:18:00 PM

to alfresco-bulk-f...@googlegroups.com

G'day Tom,

Yes, the two tools should be compatible in the way you describe (they are independent of one another), and so could technically be used in parallel.

However, doing so may be a bad idea, since imports (especially large ones) are a heavy operation for the repository, and running multiple imports concurrently may over-stress the Alfresco environment.

One thing you didn't mention was whether there is also interactive (i.e. human user) activity in the Alfresco instance at the same time as these imports. That also needs to be factored in, since if Alfresco (or more likely the database) is pegged, users are going to experience potentially unacceptable response times (whereas batch processes in general, and the import tool specifically, aren't sensitive to poor response times).

I'd suggest giving this a solid test in a non-critical environment, ideally using real data (both existing data in the repository, and the input data for the scheduled and one-off imports), as well as simulating end-user activity and monitoring response times for that class of traffic. Not only will this validate whether the approach works at all, it will also (more importantly) give you plenty of opportunity to identify where tuning might be needed (this FAQ item [1] talks briefly about this, but in short the database is almost certainly going to need tuning and/or increases in capacity - it's the ultimate bottleneck in a well-tuned Alfresco environment).
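To make the "monitoring response times" part a bit more concrete, here is a minimal Python sketch of how interactive latency could be sampled while the imports are running. It is only an illustration, not part of the import tool: the function names `measure_latencies` and `summarize` are hypothetical, and the `request` callable stands in for whatever simulated user action you choose (for example, an HTTP GET against a page in your own test environment).

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies(request: Callable[[], object], samples: int) -> List[float]:
    """Call `request` `samples` times and return each call's duration in seconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        request()  # hypothetical simulated user action, supplied by the caller
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Return the median, 95th percentile (nearest rank) and maximum duration."""
    ordered = sorted(durations)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integer arithmetic.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return {
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }
```

Running this once with no imports active and again with the scheduled and one-off imports running gives two summaries to compare; the 95th percentile and maximum are usually more telling than the median for judging what users would actually experience.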

I'd also suggest clustering Alfresco, and running the scheduled jobs, big one-time import, and end user activity on different sets of Alfresco cluster nodes (e.g. scheduled imports on one node, one-time import on another, and user activity on a couple of other nodes). This not only ensures the JVM & Alfresco code isn't a bottleneck, it also allows you to tune those cluster nodes differently, based on the distinct traffic patterns (i.e. batch vs interactive). For example, in my (somewhat out of date) experience, the import tool doesn't need much memory, but interactive requests need a lot (and that requirement scales with the number of concurrent users).

This is a really interesting scenario btw - if you get the chance to do this work, I'd be keen to hear how it went, and I suspect I'm not the only one - perhaps this would make a good blog post?

Cheers,

Peter
