Data Ingestion - High Volumes into Alfresco Content Services Installed on an AWS EKS Cluster


Sudatta Mawade

Jul 28, 2020, 12:19:07 PM
to Alfresco Bulk Import Tool
Hi Peter,

I hope you are doing well. To give you some brief background: I have been successfully using your non-embedded fork of the Alfresco Bulk Import Tool for data ingestion into Alfresco.

Below is a summary of my environment.

AWS EKS cluster - 3 worker nodes - m5.4xlarge (16 cores, 64 GB); content store - S3 bucket

My Alfresco Content Services components are all installed in Kubernetes pods (Share, repository, database). The PostgreSQL database is configured with EFS storage, and I have set the throughput for the database to 200 MB/s.

Since my content store is S3 and I have received the data on an AWS EC2 Windows instance, I am using the approach below to ingest the data.

Local setup:

I installed a local Alfresco instance on the EC2 instance along with the bulk import utility, and configured it to use the content store of the AWS EKS environment (i.e. the S3 bucket) and the PostgreSQL database in the Kubernetes pod.

When running the bulk import utility I am able to ingest the data into my original environment. However, I am only getting a throughput of around 100,000 documents/hour (with documents of 15-100 KB each and about 10 metadata attributes per document).

Below are the statistics I am seeing:

Files scanned - 120/sec
Content Streamed - 24/sec
Nodes Imported - 23/sec

I have tried increasing the memory and CPU allocation for each of the Share, repository, and PostgreSQL pods. I also increased the database EFS throughput to 200 MB/s and tried increasing the batch weight from 100 to 200, but to no avail: the stats remain more or less the same.

Can you please help me understand what can be changed to increase the throughput? Also, which factor determines the rate at which nodes are imported into Alfresco, so that I can tune it to achieve better ingestion performance?

Any help or guidance would be really appreciated. I am just trying to get 10 TB of data into Alfresco in less than 30 days. :)
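
To put the target in perspective, here is a rough back-of-the-envelope sketch comparing the import rate I would need against the 23 nodes/sec I am seeing. It assumes decimal units (1 TB = 10^12 bytes, 1 KB = 10^3 bytes) and that the average document size falls somewhere in the 15-100 KB range; the function names are just illustrative and are not part of the bulk import tool.

```python
# Back-of-the-envelope check of the ingestion target.
# Assumptions (not measured): decimal units (1 TB = 10**12 bytes,
# 1 KB = 10**3 bytes) and an average document size somewhere
# between the 15 KB and 100 KB bounds quoted above.

SECONDS_PER_DAY = 86_400


def required_rate(total_bytes, avg_doc_bytes, days):
    """Documents per second needed to load total_bytes within `days` days."""
    doc_count = total_bytes / avg_doc_bytes
    return doc_count / (days * SECONDS_PER_DAY)


def days_at_rate(total_bytes, avg_doc_bytes, docs_per_sec):
    """Days needed to load total_bytes at a steady docs_per_sec."""
    doc_count = total_bytes / avg_doc_bytes
    return doc_count / docs_per_sec / SECONDS_PER_DAY


TOTAL_BYTES = 10 * 10**12   # 10 TB target
OBSERVED_RATE = 23          # "Nodes Imported" per second from the stats above

for avg_kb in (15, 100):
    avg_bytes = avg_kb * 10**3
    need = required_rate(TOTAL_BYTES, avg_bytes, 30)
    took = days_at_rate(TOTAL_BYTES, avg_bytes, OBSERVED_RATE)
    print(f"avg {avg_kb} KB: need {need:.1f} docs/sec for 30 days; "
          f"at {OBSERVED_RATE}/sec it would take {took:.1f} days")
```

Under these assumptions the required rate works out to roughly 39 docs/sec (if the average is 100 KB) up to about 257 docs/sec (if the average is 15 KB), so even in the best case the current 23 nodes/sec falls short of the 30-day goal.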
 

Thank you and have a great day ahead.

Regards,
Sudatta Mawade