Bulk Import is failing while importing files larger than 20MB, our contentstore is S3


Roshan John

Mar 26, 2017, 11:34:31 PM
to Alfresco Bulk Import Tool
Hi,
   I was testing this bulk import for one of our migrations. We are running Alfresco 5.0, with S3 as our contentstore. Whenever I try to upload a file that is larger than 20MB I get the error pasted below. I have successfully tested imports with files ranging from 1MB up to 18MB in size.

This is the file size captured by the dry-run
Bytes imported:226044933983311616 / sec

Here is the error: 
Exception:
org.alfresco.extension.bulkimport.impl.ItemImportException: Unexpected exception:
 class org.alfresco.service.cmr.repository.ContentIOException: 02261554 S3StreamListener: Failed to upload content: contentstore/-system-/70328420-ef57-444c-a76f-0d8c2dba12a5.bin (02261553 Unable to upload multipart file:contentstore/-system-/70328420-ef57-444c-a76f-0d8c2dba12a5.bin)
While importing item: localhost_access_log.2015-08-27.txt (1 version):
HEAD: <content> <metadata>
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl.importItem(BatchImporterImpl.java:230)
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl.importBatchImpl(BatchImporterImpl.java:184)
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl.access$200(BatchImporterImpl.java:69)
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl$2.execute(BatchImporterImpl.java:161)
at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:454)
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl.importBatchInTxn(BatchImporterImpl.java:152)
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl.access$000(BatchImporterImpl.java:69)
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl$1.doWork(BatchImporterImpl.java:130)
at org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(AuthenticationUtil.java:548)
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl.importBatch(BatchImporterImpl.java:124)
at org.alfresco.extension.bulkimport.impl.Scanner.submitCurrentBatch(Scanner.java:333)
at org.alfresco.extension.bulkimport.impl.Scanner.run(Scanner.java:194)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.alfresco.service.cmr.repository.ContentIOException: 02261554 S3StreamListener: Failed to upload content: contentstore/-system-/70328420-ef57-444c-a76f-0d8c2dba12a5.bin (02261553 Unable to upload multipart file:contentstore/-system-/70328420-ef57-444c-a76f-0d8c2dba12a5.bin)
at org.alfresco.integrations.s3store.S3StreamListener.contentStreamClosed(S3StreamListener.java:77)
at org.alfresco.repo.content.AbstractContentAccessor$CallbackFileChannel.fireChannelClosed(AbstractContentAccessor.java:331)
at org.alfresco.repo.content.AbstractContentAccessor$CallbackFileChannel.implCloseChannel(AbstractContentAccessor.java:315)
at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:115)
at java.nio.channels.Channels$1.close(Channels.java:178)
at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
at org.alfresco.repo.content.LimitedStreamCopier.copyStreamsLong(LimitedStreamCopier.java:105)
at org.alfresco.repo.content.AbstractContentWriter.copyStreams(AbstractContentWriter.java:502)
at org.alfresco.repo.content.AbstractContentWriter.putContent(AbstractContentWriter.java:479)
at org.alfresco.extension.bulkimport.source.fs.FilesystemBulkImportItemVersion.putContent(FilesystemBulkImportItemVersion.java:211)
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl.importVersionContent(BatchImporterImpl.java:627)
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl.importVersionContentAndMetadata(BatchImporterImpl.java:488)
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl.importVersion(BatchImporterImpl.java:435)
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl.importFile(BatchImporterImpl.java:390)
at org.alfresco.extension.bulkimport.impl.BatchImporterImpl.importItem(BatchImporterImpl.java:212)
... 12 more
Caused by: org.alfresco.service.cmr.repository.ContentIOException: 02261553 Unable to upload multipart file:contentstore/-system-/70328420-ef57-444c-a76f-0d8c2dba12a5.bin
at org.alfresco.integrations.s3store.S3StreamListener.uploadFileAsMultipart(S3StreamListener.java:158)
at org.alfresco.integrations.s3store.S3StreamListener.uploadFile(S3StreamListener.java:118)
at org.alfresco.integrations.s3store.S3StreamListener.retrieveFileStreamAndUpload(S3StreamListener.java:87)
at org.alfresco.integrations.s3store.S3StreamListener.contentStreamClosed(S3StreamListener.java:62)
... 26 more
Caused by: java.lang.NullPointerException
at org.jets3t.service.multi.s3.ThreadedS3Service$1.fireProgressEvent(ThreadedS3Service.java:108)
at org.jets3t.service.multi.ThreadedStorageService$ThreadGroupManager.run(ThreadedStorageService.java:1938)
at org.jets3t.service.multi.s3.ThreadedS3Service.multipartStartUploads(ThreadedS3Service.java:139)
at org.jets3t.service.utils.MultipartUtils.uploadObjects(MultipartUtils.java:253)
at org.alfresco.integrations.s3store.S3ServiceAdapterImpl.uploadObjects(S3ServiceAdapterImpl.java:63)
at org.alfresco.integrations.s3store.S3StreamListener.uploadFileAsMultipart(S3StreamListener.java:150)
... 29 more


Thanks in advance
Roshan

Roshan John

Mar 29, 2017, 7:48:32 PM
to Alfresco Bulk Import Tool
This does not seem to be an issue. I tried the upload again and it worked fine for large files. I'm not sure whether there was a problem with connectivity to S3 the previous time.

Peter Monks

Mar 29, 2017, 8:14:22 PM
to alfresco-bulk-f...@googlegroups.com
G'day Roshan,

Yeah it looks like an intermittent problem with S3 or the network, that triggered poor error handling in the Alfresco S3 connector.  Might be worth a question or bug report with Alfresco?

Cheers,
Peter
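
For anyone hitting the same intermittent failure, one generic way to tolerate transient S3 or network errors is to retry the failing operation with a backoff. The following is only a minimal sketch of that idea: `RetrySketch` and `withRetry` are hypothetical names, not part of Alfresco, the S3 connector, or the bulk import tool, and nothing in this thread confirms where such a retry could be hooked in.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Alfresco or the S3 connector.
final class RetrySketch {

    /**
     * Runs the task, retrying on any exception up to maxAttempts times.
     * The wait between attempts starts at initialDelayMs and doubles each time.
     */
    static <T> T withRetry(Callable<T> task, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;                      // remember the most recent failure
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);       // back off before the next attempt
                    delay *= 2;
                }
            }
        }
        throw last;                            // all attempts failed
    }
}
```

In practice, simply re-running the import for the failed items (as Roshan did) achieves the same effect without any code.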


--
You received this message because you are subscribed to the Google Groups "Alfresco Bulk Import Tool" group.
To unsubscribe from this group and stop receiving emails from it, send an email to alfresco-bulk-filesystem-import+unsubscribe@googlegroups.com.
To post to this group, send email to alfresco-bulk-filesystem-imp...@googlegroups.com.
Visit this group at https://groups.google.com/group/alfresco-bulk-filesystem-import.
For more options, visit https://groups.google.com/d/optout.

Roshan John

Mar 29, 2017, 11:27:31 PM
to Alfresco Bulk Import Tool
Thanks for the reply Peter,
I had a question about the in-place upload; I'm not sure if this is the right forum to post it in (let me know).
As mentioned, we are using S3 as our content store. Looking through the S3 contentstore, the files appear to be stored differently than when the contentstore is on a filesystem. On a filesystem it's broken down by year/month/day/hour/minute etc., but in S3 the structure seems to be flat. I'm wondering whether we could still use the in-place upload if we copied our files and metadata files into S3 with a folder structure. Does anyone have personal experience running an in-place bulk upload when using S3?
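
To make the difference between the two layouts concrete, here is a small sketch. It assumes the filesystem store buckets content as year/month/day/hour/minute/<uuid>.bin without zero padding, and that the S3 store uses a flat `contentstore/-system-/<uuid>.bin` key as seen in the stack trace above. `ContentPathSketch` and its methods are hypothetical names for illustration only, not Alfresco APIs.

```java
import java.time.LocalDateTime;
import java.util.UUID;

// Hypothetical illustration of the two layouts; not Alfresco code.
final class ContentPathSketch {

    /** Time-bucketed relative path: year/month/day/hour/minute/<uuid>.bin (assumed layout). */
    static String timeBucketedPath(LocalDateTime created, UUID id) {
        return created.getYear() + "/"
                + created.getMonthValue() + "/"
                + created.getDayOfMonth() + "/"
                + created.getHour() + "/"
                + created.getMinute() + "/"
                + id + ".bin";
    }

    /** Flat object key: <prefix>/<uuid>.bin, as in the key shown in the error message. */
    static String flatKey(String prefix, UUID id) {
        return prefix + "/" + id + ".bin";
    }
}
```

For example, with the UUID from the error above and a creation time of 2017-03-26 23:34, the first method gives `2017/3/26/23/34/70328420-ef57-444c-a76f-0d8c2dba12a5.bin`, while the second (with prefix `contentstore/-system-`) gives `contentstore/-system-/70328420-ef57-444c-a76f-0d8c2dba12a5.bin`.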

Also I think I know the answer to this, but just wanted to confirm. In the wiki it was written
"The source directory can be physically local to the server (i.e. stored on a directly attached hard drive, SSD drive, RAID array, etc.), or on a remote device that is mounted into the server's filesystem (e.g. NAS, SAN, iSCSI, etc.)."
I am guessing S3 cannot be used as a source directory. Just asking since one of the options to migrate a large amount of data from NAS or SAN devices to AWS, was to use AWS Snowball, which copies the data to S3.

Thanks for your help
Roshan

Peter Monks

Mar 30, 2017, 11:30:59 AM
to alfresco-bulk-f...@googlegroups.com, Mark Lugert
G'day Roshan,

The default (file system) BulkImportSource can only perform an in-place import if Alfresco is configured to use a filesystem content store.  Obviously the S3 content store is not a filesystem content store, so in-place imports aren't possible with that default BulkImportSource.

That said, it's entirely possible to implement a custom BulkImportSource for S3 that is capable of in-place imports. In fact, IIRC, Simflofy have done exactly that as part of their solution.

Mark - am I recalling that correctly?

Cheers,
Peter

Apologes for speling & gramar erorrs - sent from mobil deivce

Peter Monks

Mar 30, 2017, 12:07:09 PM
to Mark Lugert, alfresco-bulk-f...@googlegroups.com
Any thoughts on open sourcing that S3 BulkImportSource?

Cheers,
Peter

Apologes for speling & gramar erorrs - sent from mobil deivce

On Mar 30, 2017, at 8:37 AM, Mark Lugert <mlu...@simflofy.com> wrote:

Yes Peter, that is correct, we have an in place importer for s3.

Mark

Roshan John

Apr 3, 2017, 11:52:09 AM
to alfresco-bulk-f...@googlegroups.com, Mark Lugert
Thanks for this Peter.

Mark: Can you let us know what our options are for using your tool? Is there a license cost to use it, or would your professional services team need to be part of the migration effort?

Thanks
Roshan




--
Regards
Roshan