Running Cactus efficiently on mammalian genomes

Leo Goodstadt

May 16, 2013, 7:37:57 AM
to cactu...@googlegroups.com
You mentioned that you were scaling up cactus to run efficiently on mammalian scale data.

Do you have any tips on how to get this to run efficiently? We are currently trying to align assemblies of very closely related (subspecies) mammals and this is taking a long time. We are running cactus on a server with 48 CPUs and 512 Mb of memory. Is there some magic dust / command line arguments to get it to take full advantage of the hardware?

Thanks

Leo

Benedict Paten

May 16, 2013, 8:25:11 PM
to cactu...@googlegroups.com, Glenn Hickey
Hi Leo,

So I need to update what little documentation there is to point people at:


This module wraps cactus and allows you to align genomes progressively, given a phylogeny (which need not be binary). It also greatly simplifies the installation, as Glenn has put all the dependencies in nice submodules using a recursive git thingy.

With progressiveCactus you can align complete mammalian genomes. By specifying the number of threads available on your multi-core box, you should be able to do this in around a day per added genome. If this is too expensive, I have parameters that will make it quite a bit faster for closely related genomes. Adding these to progressiveCactus is on my stack of things to do, and should reduce runtime to around 8 or so hours per genome.
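
The thread count can be read from the machine rather than hard-coded. The following is a minimal sketch, assuming a Linux or macOS shell; `detect_threads` is a hypothetical helper name. This thread does not say which progressiveCactus option takes the value, so check the tool's own help output before passing it on.

```shell
#!/bin/sh
# detect_threads is a hypothetical helper, not part of progressiveCactus.
# It prints the number of online CPU cores, or 1 if that cannot be determined.
detect_threads() {
  if command -v nproc >/dev/null 2>&1; then
    # GNU coreutils (most Linux systems)
    nproc
  elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
    # Fallback available on macOS and many other Unix systems
    getconf _NPROCESSORS_ONLN
  else
    echo 1
  fi
}

threads=$(detect_threads)
echo "Detected ${threads} cores"
```

On a shared server it may be safer to use fewer threads than the number of cores reported here.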

I'm going to be away over the weekend, but I'll make it a priority to get these "fast" parameters pushed as an option in progressiveCactus early next week. It should be transparent to you as a user: if you specify small branch lengths in your input phylogeny, it will default to these parameters, though we can make it an explicit option.
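
To see what branch lengths an input phylogeny actually carries, the lengths can be pulled out of the Newick string. This is only a sketch: the tree, the taxon names, and the helper `max_branch_length` are hypothetical, and this thread does not state what cutoff counts as "small", so the sketch just reports the largest length.

```shell
#!/bin/sh
# Hypothetical three-subspecies tree in Newick format; the names and branch
# lengths are made up for illustration.
tree='((subspA:0.002,subspB:0.003):0.001,subspC:0.004);'

# max_branch_length is a hypothetical helper: it prints the largest branch
# length found in the Newick string passed as its first argument.
max_branch_length() {
  printf '%s\n' "$1" |
    grep -oE ':[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?' |
    tr -d ':' |
    awk 'NR == 1 || $1 + 0 > max + 0 { max = $1 } END { if (NR > 0) print max }'
}

max_branch_length "$tree"
```

The pattern assumes plain numeric lengths after each colon and does not handle quoted labels that themselves contain colons.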

If you have more questions, Glenn can help (I think he's on this mailing list, but I added him in case).

Benedict





Benedict Paten

May 29, 2013, 8:28:05 PM
to cactu...@googlegroups.com, Glenn Hickey
Hi Leo,

So I went ahead and introduced some "faster" parameters, which should marginally speed things up when you're aligning close genomes. If you do:

cd progressiveCactus
git pull
git submodule update --init
make ucscClean && make
It should be ready to go. Let me know if you have problems, especially with the parallelism.
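
If the build misbehaves after the update, one thing worth ruling out is a submodule that was not initialized or is out of sync. The following is a sketch that relies on the standard `git submodule status` prefixes (`-` for uninitialized, `+` for a checkout that differs from the recorded commit, `U` for merge conflicts); `count_stale_submodules` is a hypothetical helper and the sample lines are invented.

```shell
#!/bin/sh
# count_stale_submodules is a hypothetical helper: it reads the output of
# `git submodule status` on stdin and prints how many entries are
# uninitialized (-), out of sync (+), or in conflict (U).
count_stale_submodules() {
  grep -cE '^[-+U]' || true
}

# Demonstration on invented status lines (hashes and paths are made up).
printf '%s\n' \
  ' 1111111111111111111111111111111111111111 submodules/one (heads/master)' \
  '-2222222222222222222222222222222222222222 submodules/two' |
  count_stale_submodules

# Inside the progressiveCactus checkout, the real check would be:
#   git submodule status --recursive | count_stale_submodules
```

A result of 0 means every submodule is checked out at the commit the parent repository records.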

Benedict