Bulk import of 6 million images (Alfresco 3.4.d)

Khalil MEZYAOUI

Jun 10, 2014, 10:16:00 AM

to alfresco-bulk-f...@googlegroups.com

Hi Peter,

We intend to test our Alfresco architecture against a large number of image files (to see whether Alfresco continues to work normally).

Since we are using Alfresco 3.4.d (Community), we have no option to use Solr, so I would like to know whether any particular Lucene configuration is needed before running the import.

Here are some details about our architecture:

- We are using an Alfresco 3.4.d cluster with two nodes, synchronizing Ehcache with multicast discovery
- 8 GB of RAM on each node
- Lucene indexes are stored locally; content stores are shared via NFS mounts
- Only metadata is indexed (about 10 properties), since the images are binary files (jpg, jpeg, etc.)
- We have generated about 3 TB of files (6 million images with XML metadata files)
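As a rough sanity check on these figures (treating 3 TB as decimal terabytes and ignoring the comparatively small XML metadata files), the average image size works out to about 500 KB:

```latex
\frac{3\ \text{TB}}{6 \times 10^{6}\ \text{images}}
  = \frac{3 \times 10^{12}\ \text{B}}{6 \times 10^{6}}
  = 5 \times 10^{5}\ \text{B}
  \approx 500\ \text{KB per image}
```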

I'm asking this question because I couldn't find any real Alfresco benchmarks proving that Alfresco (3.4.d) can handle such a large number of objects...

PS: we cannot migrate to another version of Alfresco (4.x, for example), at least for now, as we have a lot of customized code...

Many thanks
Khalil

Peter Monks

Jun 11, 2014, 7:15:53 PM

to alfresco-bulk-f...@googlegroups.com

G’day Khalil,

Comments inline, marked [pmonks].

Cheers,
Peter

On 2014-06-10, at 7:16 AM, Khalil MEZYAOUI <mezy...@gmail.com> wrote:

Hi Peter,

We intend to test our Alfresco architecture against a large number of image files (to see whether Alfresco continues to work normally).

Since we are using Alfresco 3.4.d (Community), we have no option to use Solr, so I would like to know whether any particular Lucene configuration is needed before running the import.

[pmonks] No special Lucene configuration is necessary; however, the default configuration (synchronous indexing) is slower than asynchronous indexing.
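To illustrate why synchronous indexing slows an import, here is a toy Python model. It is not Alfresco's indexer: `ToyIndex`, `import_sync`, `import_async` and the 1 ms per-document cost are all hypothetical. In the synchronous mode the import loop waits for each document to be indexed; in the asynchronous mode it only enqueues work for a background thread.

```python
import queue
import threading
import time

INDEX_COST_S = 0.001  # hypothetical per-document indexing cost (1 ms)


class ToyIndex:
    """Hypothetical stand-in for a search index: records which node ids are searchable."""

    def __init__(self):
        self._ids = set()
        self._lock = threading.Lock()

    def add(self, node_id):
        time.sleep(INDEX_COST_S)  # simulate the cost of indexing one document
        with self._lock:
            self._ids.add(node_id)

    def contains(self, node_id):
        with self._lock:
            return node_id in self._ids

    def size(self):
        with self._lock:
            return len(self._ids)


def import_sync(ids, index):
    """In-transaction indexing: each document is searchable as soon as its import returns."""
    for node_id in ids:
        # ... write content and metadata (omitted) ...
        index.add(node_id)  # the import thread waits for the index


def import_async(ids, index):
    """Background indexing: the import thread only enqueues; a worker thread indexes later."""
    work = queue.Queue()

    def worker():
        while True:
            node_id = work.get()
            if node_id is None:  # sentinel: no more work
                break
            index.add(node_id)

    thread = threading.Thread(target=worker)
    thread.start()
    for node_id in ids:
        work.put(node_id)  # returns immediately
    work.put(None)
    return thread  # caller joins when it needs the index to catch up


if __name__ == "__main__":
    ids = [f"img-{i}" for i in range(200)]

    sync_index = ToyIndex()
    start = time.perf_counter()
    import_sync(ids, sync_index)
    print(f"sync import loop:  {time.perf_counter() - start:.3f}s, indexed {sync_index.size()}")

    async_index = ToyIndex()
    start = time.perf_counter()
    worker = import_async(ids, async_index)
    print(f"async import loop: {time.perf_counter() - start:.3f}s, indexed so far {async_index.size()}")
    worker.join()
    print(f"after catch-up:    indexed {async_index.size()}")
```

The asynchronous loop returns much sooner, but documents only become searchable once the background worker catches up. That lag is the usual trade-off of asynchronous indexing.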

Here are some details about our architecture:

- We are using an Alfresco 3.4.d cluster with two nodes, synchronizing Ehcache with multicast discovery
- 8 GB of RAM on each node
- Lucene indexes are stored locally; content stores are shared via NFS mounts
- Only metadata is indexed (about 10 properties), since the images are binary files (jpg, jpeg, etc.)
- We have generated about 3 TB of files (6 million images with XML metadata files)

I'm asking this question because I couldn't find any real Alfresco benchmarks proving that Alfresco (3.4.d) can handle such a large number of objects…

[pmonks] Back in 2008, Alfresco v2.2 was tested with 100 million documents [1], and I'm aware of several installations that exceed that number (some of them on 3.x versions). I'm also aware of the bulk import tool having been used to import approximately 40 million documents.

Since Alfresco v4.0, we’ve done another significant round of benchmarking, focusing on a number of different dimensions (i.e. not just total document count). That information is available here [2].

One thing to note is that all of these benchmarks and imports were performed on the Enterprise edition rather than Community.

[1] http://www.alfresco.com/news/press-releases/alfresco-benchmark-exceeds-100-million-objects

[2] http://www.alfresco.com/resources/whitepapers/alfresco-scalability-blueprint

PS: we cannot migrate to another version of Alfresco (4.x, for example), at least for now, as we have a lot of customized code….

[pmonks] Are those customisations mostly in Share? I'd be surprised if well-designed and well-implemented repository customisations were difficult to upgrade. The bulk import tool, for example, was originally developed for Alfresco v2.1 and has been readily upgradeable across both 3.x and (thus far) 4.x.

Many thanks
Khalil

--
You received this message because you are subscribed to the Google Groups "Alfresco Bulk Filesystem Import" group.
Visit this group at http://groups.google.com/group/alfresco-bulk-filesystem-import.

Khalil MEZYAOUI

Jun 13, 2014, 4:12:21 AM

to alfresco-bulk-f...@googlegroups.com

Hi Peter,

Many thanks for your clarifications and details.

Kindly find my answers below.

Thanks.

Khalil

[pmonks] No special Lucene configuration is necessary; however, the default configuration (synchronous indexing) is slower than asynchronous indexing.

Does that mean I can use atomic=false in my data model, or use index.tracking.disableInTransactionIndexing (with all the issues that come with it)?

[pmonks] Are those customisations mostly in Share? I'd be surprised if well-designed and well-implemented repository customisations were difficult to upgrade. The bulk import tool, for example, was originally developed for Alfresco v2.1 and has been readily upgradeable across both 3.x and (thus far) 4.x.

Yes, almost all customizations are in Share. In the Repository we have customized FileFolderService (which will be deactivated for the bulk import) to add some controls for images, in addition to other customizations related to external authentication, etc.