cactus chaining?

Aaron Mackey

Feb 22, 2012, 12:15:39 PM

to cactus, willmc...@gmail.com

Hi -- we've been using cactus_workflow.py (from the main branch) to
align a known syntenic region of Arabidopsis, Soybean and Medicago
genomes; we are getting lots of nice MAF blocks, but it remains
unclear how to chain these together. The README stops after
MAFgenerator, but it certainly seems from browsing the source that the
workflow is capable of much more. Any advice on where to go next
would be appreciated.

Thanks,
-Aaron

Benedict Paten

Feb 23, 2012, 11:37:42 AM

to cactu...@googlegroups.com

Hi Aaron,

We're currently very embroiled in the Alignathon evaluation:

http://compbio.soe.ucsc.edu/alignathon/

When we're done we will push and support a much better version of
Cactus that will support some of the things you may have noticed we
added to the code.

For that reason I'd prefer to wait before giving a lot of advice on
how to get the best out of the current code base. That said, if you
can be more specific about what you'd like to test then I can perhaps
help more.

On our end, a large part of the problem has been making the whole set
of algorithms scale to mammalian-genome-sized data, which they now
nearly do.

Benedict

Aaron Mackey

Feb 23, 2012, 12:00:11 PM

to cactu...@googlegroups.com

Thanks, I understand completely, and I appreciate any help you can spare.

The one thing we'd really like to know how to do right now is to chain together adjoining MAF blocks into larger (co-)syntenic blocks. We wish to use these blocks as the input for comparative gene structure analyses. Using the "raw" MAF blocks we find that only about 50% of CDS regions are covered by any MAF blocks, although in a graphical dot-plot they clearly line up into long extended syntenic blocks.
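
To make the chaining question concrete, here is a minimal Python sketch of the kind of merge we have in mind. It is not part of Cactus, and the names (`Seg`, `can_chain`, `chain_blocks`, `max_gap`) are hypothetical. It assumes each ungapped block has already been reduced to a mapping from sequence name to (start, size, strand) in MAF strand coordinates, and it only joins blocks that are immediate neighbours along one chosen anchor sequence.

```python
from typing import Dict, List, NamedTuple


class Seg(NamedTuple):
    start: int   # 0-based start in strand coordinates, as on a MAF "s" line
    size: int    # number of aligned bases (blocks are ungapped)
    strand: str  # "+" or "-"


Block = Dict[str, Seg]  # sequence name -> aligned segment


def can_chain(a: Block, b: Block, max_gap: int) -> bool:
    """True if b follows a colinearly in every sequence, within max_gap bases."""
    if a.keys() != b.keys():
        return False
    for name, sa in a.items():
        sb = b[name]
        if sa.strand != sb.strand:
            return False
        gap = sb.start - (sa.start + sa.size)
        if gap < 0 or gap > max_gap:
            return False
    return True


def chain_blocks(blocks: List[Block], anchor: str,
                 max_gap: int = 1000) -> List[List[Block]]:
    """Greedily group blocks that are adjacent along the anchor sequence."""
    ordered = sorted(
        (b for b in blocks if anchor in b),
        key=lambda b: (b[anchor].strand, b[anchor].start),
    )
    chains: List[List[Block]] = []
    for block in ordered:
        if chains and can_chain(chains[-1][-1], block, max_gap):
            chains[-1].append(block)   # extend the current chain
        else:
            chains.append([block])     # start a new chain
    return chains
```

Because this is greedy, a single interleaved block (a small insertion or a paralogous hit, say) breaks a chain, which is part of why we are asking whether the workflow already has something better.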

Thanks,

-Aaron

Benedict Paten

Feb 23, 2012, 12:17:03 PM

to cactu...@googlegroups.com

Well, currently Cactus only dumps ungapped MAF blocks, but you can
get it to order them according to a consensus ordering of the input
sequences. We've recently submitted a paper on this:

http://dl.dropbox.com/u/156669/papers/reference.pdf

The idea is that no individual genome provides the best or most
complete ordering.

You can get cactus_workflow.py to build the reference by adding the
--buildReference flag. When you then build the MAF, you should see an
extra sequence added to all your blocks that orders them and
provides a common coordinate system.
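
As a rough illustration of what the common coordinate system allows, the Python sketch below reads MAF text and sorts the blocks by the start of one named row. This is not Cactus code: `parse_maf`, `order_by_reference` and the row name passed as `ref_name` are hypothetical, and the name the reference row actually has in your output needs to be checked in the MAF itself. The only format detail relied on is the standard MAF "s" line (src, start, size, strand, srcSize, text), where minus-strand starts are given relative to the reverse-complemented sequence.

```python
from typing import List, Tuple

# One "s" line: (src, start, size, strand, src_size, text)
Row = Tuple[str, int, int, str, int, str]


def parse_maf(text: str) -> List[List[Row]]:
    """Collect the "s" rows of every alignment block in a MAF string."""
    blocks: List[List[Row]] = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":            # an "a" line opens a new block
            current = []
            blocks.append(current)
        elif fields[0] == "s" and current is not None and len(fields) >= 7:
            src, start, size, strand, src_size, seq = fields[1:7]
            current.append((src, int(start), int(size), strand, int(src_size), seq))
    return blocks


def forward_start(row: Row) -> int:
    """Start of the row on the forward strand of its source sequence."""
    _, start, size, strand, src_size, _ = row
    return start if strand == "+" else src_size - start - size


def order_by_reference(blocks: List[List[Row]], ref_name: str) -> List[List[Row]]:
    """Keep blocks containing ref_name, sorted by its forward-strand start."""
    keyed = []
    for block in blocks:
        for row in block:
            if row[0] == ref_name:
                keyed.append((forward_start(row), block))
                break
    keyed.sort(key=lambda pair: pair[0])
    return [block for _, block in keyed]
```

Blocks that lack the named row are dropped rather than guessed at, so comparing the input and output counts gives a quick check that the right row name was used.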

Benedict