AftrRAD Parallel Version Draft

Mike Sovic

Apr 18, 2015, 4:51:45 PM
to aft...@googlegroups.com
Hi All,

Since we've heard from a fair number of folks running AftrRAD on Linux systems (where it is often quite a bit slower than on Macs), I took a stab at adding an option for parallel runs on multi-processor systems.  I've attached a draft version of the updated script.  Currently, the only two steps that run in parallel are the initial demultiplexing step and the ACANA alignment step, which is the one causing the biggest slowdown on Linux.  We may be able to optimize additional steps later, but as it is, this should be a significant improvement in terms of speed on Linux systems (it should also speed up Mac runs if you have multiple processors available).  If you want to give it a try, do the following…

1.)  Add the attached script to your working AftrRAD directory.
2.)  Install Parallel::ForkManager.  I was able to do this on both my Mac and Linux system by simply typing "sudo cpan Parallel::ForkManager" at the command prompt, entering the sudo password, and typing 'yes' whenever prompted.
3.)  There is a command line argument in the new script to specify the number of processors to use, so, where you would have originally run 'perl AftrRAD.pl', simply run 'perl AftrRAD_Parallel_Draft.pl maxProcesses-X', where X is the number of processors you want to use.
4.)  Note the output to the screen in the parallel version may look just slightly different than if running on a single processor, but the results should be the same.
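The steps above can be sketched as a short shell session. This is only an illustration, not part of the attached script: it assumes perl is on your PATH, and NPROC is a variable name introduced here for the example. Note that the module is named Parallel::ForkManager (double colon) on CPAN.

```shell
# Check whether the module is already available to perl;
# if not, print the install command instead of running it.
if perl -MParallel::ForkManager -e 1 2>/dev/null; then
    echo "Parallel::ForkManager is installed"
else
    echo "Parallel::ForkManager not found; try: sudo cpan Parallel::ForkManager"
fi

# Count the processors on this machine (nproc on Linux, sysctl on a Mac).
# Falls back to 1 if neither tool is available.
NPROC=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)

# Print the command to run from the working AftrRAD directory.
echo "perl AftrRAD_Parallel_Draft.pl maxProcesses-${NPROC}"
```

You may want to pass a smaller number than the full processor count if other jobs share the machine.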

If you give it a try, any feedback would be much appreciated - especially if you encounter problems.  Eventually, we'll incorporate this into a new official version of the program, but we want to test it out a bit more first.  Hope it's helpful!

               Mike

 
AftrRAD_Parallel_Draft.pl

Bartosz Ulaszewski

May 27, 2015, 5:32:44 AM
to aft...@googlegroups.com
Dear Mike,
It seems that the parallel script isn't working properly on my virtual machine. Maybe it's the size of the data set: 40 GB. The run is now in its fourth week at the step "Aligning all potentially alleleic read pairs with ACANA"; of 1,829,727 total alignments, it is currently at alignment 1,136,000. I've checked the processor use: it is at the same level as in a single-threaded run, one core. Do you have any suggestions?
---
VM config: Linux Ubuntu 12, 64 bit, Intel® Xeon(R) CPU E5-4640 0 @ 2.40GHz × 16, 64 GB RAM.

Mike Sovic

May 27, 2015, 11:05:54 AM
to aft...@googlegroups.com
Hi Bartosz,

Yeah, you should definitely see multiple processors running during the alignment step, so it seems the parallel part isn't working for you, and that would certainly explain the extremely long run.  Can you first check the Report file and see what the argument for maxProcesses is?  It will be near the top of the file under the heading "Parameters used for this run…".  There may be more than one of these Report files if you have more than one input fastq file - if so, any one will do.  You should find the file in the Output/RunInfo directory.  Let me know what this says and we'll try to go from there.
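One quick way to do this check from the working AftrRAD directory is to search the Report files for the argument. The snippet below is just a sketch: show_max_processes is a helper name made up for this example, and it only assumes that the Report files in Output/RunInfo are plain text that mention maxProcesses somewhere.

```shell
# Print every line mentioning maxProcesses, prefixed with file name and line number.
# The directory defaults to Output/RunInfo; pass another path as the first argument.
show_max_processes() {
    grep -rn "maxProcesses" "${1:-Output/RunInfo}"
}

# Run the check only if the directory exists in the current location.
if [ -d Output/RunInfo ]; then
    show_max_processes Output/RunInfo
fi
```

If nothing is printed, or the value is not the one you intended, the argument probably wasn't picked up from the command line as expected.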

           Mike

Bartosz Ulaszewski

Jun 9, 2015, 2:38:52 PM
to aft...@googlegroups.com
Dear Mike,
I have checked the report files as you recommended, and that gave me the answer. I had made some syntax errors, and that was the main reason for such a long run. After correcting the mistake, the parallel script worked very well. Now I will be working on demultiplexed data for parameter testing, and I will let you know how the script performs.

Mike Sovic

Jun 10, 2015, 8:20:01 AM
to aft...@googlegroups.com
Bartosz,

Great - glad to hear you got it figured out, and that it's working for you.  Thanks for the update, and do keep us posted on how things work going forward.

             Mike

Bartosz Ulaszewski

Jun 10, 2015, 11:46:09 AM
to aft...@googlegroups.com
Hey!
I have done a test run on demultiplexed data; here is a copy of the output from the terminal. I have seen some errors and I'm not sure whether they are critical or can be ignored. If you need any more information, please let me know.
Best regards,
Bartosz
demultiplexed_samples_terminal_log.txt

Mike Sovic

Jun 10, 2015, 12:44:14 PM
to aft...@googlegroups.com
OK, I think I see what's causing the errors (or at least some of them).  Two things…

1.)  It looks like I introduced a bug into the parallel version of the script when it's used on demultiplexed data.  I will make a separate post shortly with an updated version.

2.)  Did you maybe have a Barcodes.txt file in the Barcodes folder when you started this run?  If so, that should be removed.  This folder (and the Data folder) should be empty when starting a demultiplexed run.
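The check in point 2 can be done from the working AftrRAD directory before starting a demultiplexed run. The snippet below is only a sketch: is_empty_dir is a helper name invented for this example, and it assumes the folders are named Barcodes and Data and sit directly in the current directory.

```shell
# Succeed (return 0) only if the given path is an existing, empty directory.
is_empty_dir() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Report the state of the two folders that should be empty for a demultiplexed run.
for dir in Barcodes Data; do
    if is_empty_dir "$dir"; then
        echo "$dir: empty"
    else
        echo "$dir: not empty (or missing); check it before starting a demultiplexed run"
    fi
done
```

The loop only reports what it finds; it does not delete anything, so removing a leftover Barcodes.txt is still up to you.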

I'm actually not sure right now whether these issues would affect the results or not, but even if they didn't, there's a chance they might affect things downstream (i.e. running Genotypes.pl or FilterSNPs.pl), so I would go ahead and re-run just to be safe.  Hopefully the re-run will be error-free, but if not, just let us know and we'll go from there.  

               Mike