Request for help on Improving Scalability for Akka Streams Program


emmanuelo...@gmail.com

Jan 11, 2018, 6:18:47 AM
to Akka User List
Hi Everyone,

I am learning Scala and Akka Streams, and I would like to ask for help making my small Akka Streams program scale as the amount of data grows. The program has two functions:

a. convertTxtToPDF: This function converts raw text files to PDF files
b. setPwdLock: This function sets a password on PDF files generated

Processing currently takes the following times:

i. 1000 text files takes 8 mins (50 parallelism)
ii. 3000 text files takes 14 mins (150 parallelism)
iii. 10000 text files takes 47 mins (500 parallelism)

I noticed that processing time grows with the number of files, and disproportionately so. Could you please help me improve the program so that it scales better, i.e. so that processing time grows more slowly as the number of text files increases?

Please find the attached code for the Program.
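For reference, a pipeline like the one described might look roughly like this (a sketch only -- the attached code may differ, and the signatures of convertTxtToPDF and setPwdLock are assumptions; mapAsyncUnordered runs up to `parallelism` conversions concurrently while keeping the stream backpressured):

```scala
import java.nio.file.Path

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.Future

object PdfPipeline extends App {
  implicit val system: ActorSystem = ActorSystem("pdf-pipeline")
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  import system.dispatcher

  // Hypothetical stand-ins for the two functions described above;
  // the real implementations are in the attached code.
  def convertTxtToPDF(txt: Path): Path = ??? // text file -> PDF file
  def setPwdLock(pdf: Path): Path = ???      // password-protect the PDF

  val inputFiles: List[Path] = List.empty // e.g. listed from an input directory
  val parallelism = 50

  Source(inputFiles)
    // Stage 1: convert each text file to a PDF, up to `parallelism` at once.
    .mapAsyncUnordered(parallelism)(f => Future(convertTxtToPDF(f)))
    // Stage 2: lock each generated PDF with a password.
    .mapAsyncUnordered(parallelism)(p => Future(setPwdLock(p)))
    .runWith(Sink.ignore)
    .onComplete(_ => system.terminate())
}
```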

Kind Regards,
Emmanuel.
Akka Streams example.txt

Viktor Klang

Jan 11, 2018, 6:38:38 AM
to Akka User List
Where is your bottleneck?

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user



--
Cheers,

Rob Crawford

Jan 11, 2018, 12:48:31 PM
to Akka User List
I think you're expecting too much from parallel processing. The files need to be read -- how fast is the disk and its interface? The files also need to be written, so roughly halve that throughput, since reads and writes share the same disk. And there's processing to be done -- how many cores do you have?

Unless the files are extremely small -- as in "can be read with a single I/O operation" -- I would expect you're I/O bound reading and writing 50 files at a time, AND CPU bound trying to convert 50 files at once. Drop the parallelism factor down to the number of cores you have, then increase it gradually to find what the machine can actually handle.
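That tuning advice could be sketched like this (assumptions: the `blocking-io-dispatcher` config entry and the stand-in convertTxtToPDF/setPwdLock signatures are ours, not from the attached code; the idea is to bound parallelism by core count and keep blocking file work off Akka's default dispatcher):

```scala
import java.nio.file.Path

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.{ExecutionContext, Future}

object TunedPipeline extends App {
  implicit val system: ActorSystem = ActorSystem("pdf-pipeline")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // A dedicated pool for blocking file work, defined in application.conf
  // (the dispatcher name is our choice):
  //
  //   blocking-io-dispatcher {
  //     type = Dispatcher
  //     executor = "thread-pool-executor"
  //     thread-pool-executor.fixed-pool-size = 16
  //   }
  implicit val blockingEc: ExecutionContext =
    system.dispatchers.lookup("blocking-io-dispatcher")

  def convertTxtToPDF(txt: Path): Path = ??? // stand-in, see attached code
  def setPwdLock(pdf: Path): Path = ???      // stand-in, see attached code

  // Start with parallelism = number of cores, then raise it step by step
  // and stop when throughput no longer improves.
  val cores = Runtime.getRuntime.availableProcessors()
  val inputFiles: List[Path] = List.empty

  Source(inputFiles)
    .mapAsyncUnordered(cores)(f => Future(convertTxtToPDF(f)))
    .mapAsyncUnordered(cores)(p => Future(setPwdLock(p)))
    .runWith(Sink.ignore)
    .onComplete(_ => system.terminate())(system.dispatcher)
}
```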