Best way to upload huge volumes of data into Alfresco


David Crespo

Mar 8, 2012, 4:37:45 AM
to alfresco-techn...@googlegroups.com
We have a large number of documents stored on the file system of a remote machine, which will hopefully be on the same network as Alfresco. Each document is made up of files and folders that contain more files, and the average size of a document is over 4 GB. We have to move these documents from the remote file system into Alfresco. So here is my question: what is the best way to accomplish this task?
I am considering:

1. Bulk file system import. Most people suggest this tool for my problem, but I don't think it fits my case well, because the bulk tool only works with documents on the same file system where Alfresco is running, or on a file system that is on the same network and accessible from it.

2. Upload servlet. I don't know how to use this option. I guess it would be through a webscript or a web service, and so over HTTP, and I think there may be limits in that case because of the HTTP protocol and the JVM memory, but I am not sure. Does the upload webscript example from the Alfresco wiki use the upload servlet?

3. Set up Alfresco's FTP server and move the content using an FTP client on the remote file system.

I look forward to your suggestions,

Thanks.

esp...@gmail.com

Mar 17, 2012, 10:15:49 PM
to alfresco-techn...@googlegroups.com
The Bulk Import Tool[1] has become the most common way to import huge volumes of data into Alfresco. The version on Google Code[2] has some features and fixes that are not included in the version incorporated in Alfresco 4.

You are correct that the Bulk Import Tool assumes the data is already on one of the Alfresco servers, but that is a hurdle you will have to overcome anyway. Sometimes your fastest option for moving data will be to physically move the drive and connect it to one of your Alfresco servers. However you decide to get the data to the server, the tool supports either copying it into the Alfresco repository or ingesting it into the database while leaving it in place on the file system (presumably you would copy it into the alf_data directory before running the tool).
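If you do copy the data over the network rather than moving the drive, it is worth checking that the staged copy is complete before you point the import tool at it. Here is a minimal Python sketch of that idea. The function names and the directory paths are hypothetical, and it only compares file counts and total bytes, so treat it as a sanity check rather than a real integrity check.

```python
import os
import shutil


def tree_stats(root):
    """Return (file_count, total_bytes) for every file under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            count += 1
            total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


def stage_tree(source_dir, staging_dir):
    """Copy source_dir to staging_dir, then compare counts and sizes."""
    # dirs_exist_ok requires Python 3.8 or later.
    shutil.copytree(source_dir, staging_dir, dirs_exist_ok=True)
    src = tree_stats(source_dir)
    dst = tree_stats(staging_dir)
    if src != dst:
        raise RuntimeError(f"staging mismatch: source={src}, staged={dst}")
    return dst
```

For example, `stage_tree("/mnt/remote/documents", "/srv/staging/documents")` (both paths are placeholders) would return the number of files and bytes staged, or raise if the two trees differ.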

Writing a webscript to which you can post data is sometimes used as a way to wrap the upload with your own logic, but it won't be as fast as the import tool.
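On the client side of such a webscript, the main thing with multi-gigabyte files is to stream them instead of reading them into memory. The sketch below uses only the Python standard library. The URL path is a hypothetical custom webscript, not an endpoint that ships with Alfresco, and it assumes your server accepts a raw request body sent with chunked transfer encoding, which you would need to verify.

```python
import http.client


def iter_chunks(path, chunk_size=1024 * 1024):
    """Yield the file at path in fixed-size byte chunks."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def post_file(conn: http.client.HTTPConnection, url_path, file_path, headers=None):
    """POST file_path to url_path over conn without loading it into memory."""
    request_headers = {"Content-Type": "application/octet-stream"}
    request_headers.update(headers or {})
    # With an iterable body and no Content-Length header, http.client
    # sends the request using Transfer-Encoding: chunked.
    conn.request("POST", url_path, body=iter_chunks(file_path),
                 headers=request_headers)
    response = conn.getresponse()
    response.read()  # drain the body so the connection can be reused
    return response.status


# Usage (host, port, and path are placeholders, not real Alfresco values):
#   conn = http.client.HTTPConnection("alfresco.example.com", 8080)
#   status = post_file(conn, "/alfresco/service/example/upload", "big.iso")
```

Client memory stays bounded by the chunk size this way. Whether the server side buffers the whole request is a separate question that depends on how the webscript is written.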

Options such as FTP, CIFS, or SMTP are also often used. They don't give you much out-of-the-box control over what metadata gets extracted, but they are super simple to leverage. You can ingest content at a pretty fast rate using these interfaces, but it won't be as fast as copying the data directly to the server and using the import tool.
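If you go the FTP route, pushing a whole folder tree can be scripted with a standard FTP client library. Below is a rough sketch using Python's `ftplib`. The function name, host, credentials, and remote path are all hypothetical, and it assumes `remote_root` is given without a trailing slash. It does nothing about metadata, retries, or resuming an interrupted transfer.

```python
import ftplib
import os


def upload_tree(ftp, local_root, remote_root):
    """Mirror local_root under remote_root on an FTP server; return files sent."""
    sent = 0
    # os.walk is top-down, so parent folders are created before their children.
    for dirpath, _dirnames, filenames in os.walk(local_root):
        rel = os.path.relpath(dirpath, local_root)
        if rel == ".":
            remote_dir = remote_root
        else:
            remote_dir = remote_root + "/" + rel.replace(os.sep, "/")
        try:
            ftp.mkd(remote_dir)
        except ftplib.error_perm:
            pass  # most likely the folder already exists
        for name in sorted(filenames):
            with open(os.path.join(dirpath, name), "rb") as handle:
                # storbinary sends the file in blocks instead of loading it whole
                ftp.storbinary("STOR " + remote_dir + "/" + name, handle)
            sent += 1
    return sent


# Usage (all values are placeholders):
#   ftp = ftplib.FTP()
#   ftp.connect("alfresco.example.com", 21)
#   ftp.login("someuser", "somepassword")
#   upload_tree(ftp, "/data/documents/doc-0001", "/target/doc-0001")
#   ftp.quit()
```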

Cheers,

Richard

[1] https://addons.alfresco.com/addons/bulk-filesystem-import-tool
[2] http://code.google.com/p/alfresco-bulk-filesystem-import/

David Crespo

Mar 19, 2012, 3:19:34 PM
to alfresco-techn...@googlegroups.com
Thank you, Richard. Your suggestions are very useful.

--
David Crespo Arroyo
Manager and developer of ECM projects based on Alfresco. Certified Alfresco Instructor.


