Best practices around configuring a dedicated Alfresco ingestion node


Colin Stephenson

Jul 24, 2013, 12:20:19 PM
to alfresco-bulk-f...@googlegroups.com
Hey Pete,

On my current project we are finally getting around to setting up a dedicated content ingestion node.  We are trying to make this as lightweight as possible.  Have you or anyone else put together some best practices around this, for example disabling Quartz jobs that do not need to run?

I have removed Solr, Share and any AMPs from this node, and am just looking for other areas to trim the fat.

Thanks,
Colin.

Peter Monks

Jul 24, 2013, 2:42:03 PM
to alfresco-bulk-f...@googlegroups.com, alfresco-techn...@googlegroups.com
G'day Colin,

This is more of a general Alfresco question so I've also cc'ed the alfresco-technical discussion group.  Hopefully no one blows a gasket over my blatant cross-posting!  ;-)

To your question, it depends a little on what you're trying to do.  Much of the Alfresco functionality consumes minimal resources if it's not used, so I personally wouldn't spend too much time trying to wring every possible ounce of savings out of the system, unless you're running on really resource-constrained hardware.

That said, there is some low-hanging fruit around protocols: they're trivial to disable and (when running) consume heavyweight OS resources (i.e. sockets).  The file server protocols configuration is described here [1], the IMAP protocol configuration here [2] and the SMTP protocol (inbound) here [3].  In a nutshell, you just need to add these properties to alfresco-global.properties (or set them through JMX, if you prefer):

cifs.enabled=false
ftp.enabled=false
nfs.enabled=false
imap.server.enabled=false
email.inbound.enabled=false

To disable the SharePoint Protocol, simply make sure not to install the optional AMP it's shipped in (vti-module.amp, IIRC).
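
If you want to double-check that the five protocol flags above really are switched off on the ingestion node, something like the following could do it. This is only a rough sketch: the function names are made up for illustration, and the parser handles plain key=value lines only (no ":" separators or line continuations, which Java properties files also allow).

```python
# Protocol flags listed above; each should be "false" on an ingestion node.
PROTOCOL_FLAGS = (
    "cifs.enabled",
    "ftp.enabled",
    "nfs.enabled",
    "imap.server.enabled",
    "email.inbound.enabled",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def flags_not_disabled(props):
    """Return the protocol flags that are missing or not explicitly 'false'."""
    return [name for name in PROTOCOL_FLAGS
            if props.get(name, "").lower() != "false"]
```

To use it, read your alfresco-global.properties into a string, pass it to parse_properties, and hand the result to flags_not_disabled. An empty list means all five flags are explicitly set to false. A flag that is simply absent is reported too, since the sketch makes no assumption about what the default is in your Alfresco version.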

I'm sure there's more low-hanging fruit for trimming down an Alfresco install. If anyone else has other nifty tricks, please join the conversation!



--
You received this message because you are subscribed to the Google Groups "Alfresco Bulk Filesystem Import" group.
To unsubscribe from this group and stop receiving emails from it, send an email to alfresco-bulk-filesys...@googlegroups.com.
To post to this group, send email to alfresco-bulk-f...@googlegroups.com.
Visit this group at http://groups.google.com/group/alfresco-bulk-filesystem-import.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Stephan Marciniak

Jul 24, 2013, 3:29:07 PM
to colinst...@gmail.com, alfresco-bulk-f...@googlegroups.com, alfresco-techn...@googlegroups.com
Hi Colin,

I have been developing a few Alfresco based applications as well.
Very recently, while dealing with import batches targeting a total of 10 million documents, I decided to build a stripped-down Tomcat instance that is more or less dedicated to import tasks. I even removed share.war from the deployment on that server instance, since it is not needed for bulk processing at all. That might save you some resources.

In addition to what Peter described, I disabled the Lucene backup job, since it gave me some trouble once before. This job is scheduled by default to run at 3 a.m. every night and puts the repository into read-only mode while the index backup is created.
If you are running content ingestion at the exact same time, it might result in transactional issues and even Alfresco crashes.
So my suggestion is to disable that job and create Lucene index backups manually on demand, or at least make sure you never run bulk imports around the time the index backup job is scheduled to run.
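
If the job stays enabled, a small guard before kicking off an import can catch the overlap. The following is a minimal sketch, not anything shipped with Alfresco: the function name is made up, the 3 a.m. start is the default mentioned above, and the 60-minute backup duration is purely a guess that you would need to replace with what you observe on your own system.

```python
from datetime import datetime, time, timedelta


def overlaps_backup(start, end, backup_hour=3, backup_minutes=60):
    """Return True if the import window [start, end) overlaps a nightly backup.

    backup_minutes is an assumed duration for the index backup, not a
    value taken from Alfresco.
    """
    # Check every nightly window from the day before the import starts
    # through the day it ends.
    day = start.date() - timedelta(days=1)
    while day <= end.date():
        backup_start = datetime.combine(day, time(hour=backup_hour))
        backup_end = backup_start + timedelta(minutes=backup_minutes)
        if start < backup_end and backup_start < end:
            return True
        day += timedelta(days=1)
    return False
```

For example, an import planned from 22:00 until 03:15 the next morning would be flagged, while one that finishes at 02:30 would not.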

I am describing my experience with Alfresco 3.4.7 in particular. There is a good chance that Alfresco 4.x deals differently with the situation I described.


Regards,
Stephan

Colin Stephenson

Jul 24, 2013, 3:48:43 PM
to alfresco-bulk-f...@googlegroups.com, colinst...@gmail.com, alfresco-techn...@googlegroups.com
This is going to be a dedicated node churning through content over the next few months.  We are using Solr, so that helps in this instance, but I want to make sure this node does not attempt to do any Solr backups, etc.  So removing unnecessary AMPs, Share, Solr, OpenOffice, etc. all helps.

It will be interesting to see what else is happening in Alfresco that may be useful to disable for an ingestion node :)

Thanks for all the comments so far.

Colin.

Peter Monks

Jul 24, 2013, 6:11:20 PM
to alfresco-bulk-f...@googlegroups.com, alfresco-techn...@googlegroups.com
Another thing I just thought of: during large imports the database's index statistics can get badly out of sync, which can steadily hurt database performance as the import progresses.  Figuring out (with the help of a DBA) a way to allow these statistics to be updated while an import is in progress is likely a good idea.  At the very least ensure that database statistics are recomputed after an import completes.
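
To make the pattern concrete, here is a rough sketch of "refresh statistics every so often during the load, and always once at the end". Several things in it are assumptions for illustration only: it uses Python's built-in sqlite3 purely so the example is runnable, the docs table and the function name are hypothetical, and the insert loop merely stands in for the import. With Alfresco the import runs inside the repository, so only the statistics step would be scheduled alongside it, using whatever your database offers (ANALYZE on PostgreSQL, ANALYZE TABLE on MySQL, DBMS_STATS on Oracle) as agreed with your DBA.

```python
import sqlite3


def import_in_batches(conn, rows, batch_size=1000, analyze_every=10):
    """Insert rows in batches, recomputing statistics every few batches.

    Returns how many times statistics were recomputed, including the
    final pass after the import completes.
    """
    analyses = 0
    batches = 0
    for i in range(0, len(rows), batch_size):
        with conn:  # commit one transaction per batch
            conn.executemany("INSERT INTO docs (name) VALUES (?)",
                             rows[i:i + batch_size])
        batches += 1
        if batches % analyze_every == 0:
            conn.execute("ANALYZE")  # refresh statistics mid-import
            analyses += 1
    conn.execute("ANALYZE")  # always recompute once the import is done
    return analyses + 1


# Hypothetical stand-in schema so the sketch can run end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (name TEXT)")
conn.execute("CREATE INDEX docs_name ON docs (name)")
rows = [("doc%d" % n,) for n in range(2500)]
refreshes = import_in_batches(conn, rows, batch_size=100, analyze_every=10)
```

How often to refresh is a trade-off: recomputing statistics has its own cost, so the interval is something to tune with your DBA rather than a fixed rule.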

Cheers,
Peter

 

 

