Multiple vs. all-to-all pairwise alignments

Manfred Grabherr

Jan 4, 2012, 4:46:43 AM

to Alignathon

Hi All,

Maybe this is obvious to everyone, but it would greatly help if
someone could clarify; the web site talks about generating genome-wide
species-species alignments, but is the goal of the Alignathon really
to produce the best possible *multiple* alignments between many
genomes? If so, how do you handle regions that align between e.g.
mouse and rat but have no homologs in human, cow and dog? And what if
the syntenic regions in cow and dog also align to each other, but not
to anything else? This might not be a big issue for genomes that are
as closely related as primates, but a substantial fraction of e.g.
mouse and human cannot be aligned because they are too highly diverged
(that is, on the real genomes).

More specifically, we are developing software to generate all-to-all
pairwise alignments between many genomes, so that one can look at each
genome and compare sequences to all other genomes, as long as they can
be aligned with sufficient specificity. For studying mammalian genomes
(other than human, which seems to have a special place among mammals),
we find this to be a more practical approach than parsing a
real multiple alignment for all genomes, in particular since this
easily extends to more distantly related genomes, for which the
majority of the genome cannot be aligned, e.g. human-chicken. Does our
approach fall within the scope of the Alignathon? We don't generate
any true multiple alignments, yet our all-to-all pairwise alignments
contain the same information (I should say "comparable" rather than
"same" here - one could argue that multiple alignments are more
accurate since local ambiguities can, in principle, be resolved using
more than one genome. However, one can also argue that errors in a
single genome can distort the alignments between all others).

Thanks!

- Manfred

Dent Earl

Jan 4, 2012, 3:56:27 PM

to align...@googlegroups.com

Hi Manfred,

Thanks for asking; I bet other groups will wonder about this too. While the Alignathon is interested in multiple alignments, as you point out, not all positions in these genomes will be homologous to positions in all the other species. So what do you do when species A is homologous to species B at a position but not to C, D, or E? The MAF format allows alignment blocks that contain just a subset of the sequences, so you can have blocks that contain just A and B but not C, D, or E.
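To make the subset-block idea concrete, here is a minimal Python sketch that writes one MAF block containing only A and B. The helper name `format_maf_block`, the sequence names, coordinates and source sizes are all made up for illustration; only the general layout of the `a` and `s` lines follows the MAF format.

```python
def format_maf_block(rows):
    """Format one MAF alignment block (hypothetical helper for illustration).

    rows: list of (src, start, strand, src_size, text) tuples, where text is
    the gapped sequence. All texts in one block must have the same length.
    """
    lengths = {len(text) for _, _, _, _, text in rows}
    if len(lengths) != 1:
        raise ValueError("all rows in a block must have the same aligned length")
    lines = ["a"]  # start of an alignment block
    for src, start, strand, src_size, text in rows:
        size = len(text) - text.count("-")  # aligned bases, gaps excluded
        lines.append(f"s {src} {start} {size} {strand} {src_size} {text}")
    return "\n".join(lines) + "\n"


# Made-up block: A and B align here; C, D and E are simply left out.
block = format_maf_block([
    ("A.chr1", 100, "+", 5000, "ACGT-ACGT"),
    ("B.chr2", 250, "+", 7000, "ACGTTACGT"),
])
print("##maf version=1\n")
print(block)
```

A region that aligns only between C and D would go into its own block in the same file, listing just those two sequences.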

As for your specific question about pairwise alignments and the scope of the Alignathon: sure, we'll take your pairwise alignments and perform the analyses on them. But to be fair to other groups, and to check the consistency of the pairwise alignments, we're also going to compute the transitive closure of the pairwise alignments you submit so that we can test the induced alignments too. For example, if you give us two alignments, one of species A to B (notation: A-B) and one B-C, we will induce A-C from them and combine all the alignments into a single file for testing. If you give us all three pairs, we will create induced alignments for the same three pairs by the same process, holding out one of the three at a time. That is, if you submit A-B, B-C, and A-C, we will induce (A-C)* from A-B and B-C, then similarly produce (B-C)* and (A-B)*, and then include those alignments in the tests along with your A-B, B-C, and A-C.
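As a rough illustration of the hold-out procedure described above, here is a small Python sketch. It is not the actual Alignathon tooling. It assumes, for simplicity, that each pairwise alignment is just a set of aligned position pairs given as plain integers, ignoring chromosome names and strand, and it shows only the single composition step through a shared species rather than a full transitive closure. The function names `induce` and `flip` and the toy coordinates are invented for the example.

```python
from collections import defaultdict


def induce(xy, yz):
    """Compose X-Y and Y-Z aligned position pairs into induced X-Z pairs."""
    y_to_z = defaultdict(set)
    for y, z in yz:
        y_to_z[y].add(z)
    # An X position is linked to a Z position if both align to the same Y position.
    return {(x, z) for x, y in xy for z in y_to_z.get(y, ())}


def flip(pairs):
    """Reverse the direction of a pairwise alignment (X-Y becomes Y-X)."""
    return {(y, x) for x, y in pairs}


# Toy submitted alignments (made-up coordinates).
ab = {(0, 10), (1, 11), (2, 12)}
bc = {(10, 20), (11, 21)}
ac = {(0, 20), (2, 22)}

# Hold out one pair at a time and induce it from the other two.
induced_ac = induce(ab, bc)        # (A-C)* from A-B and B-C
induced_bc = induce(flip(ab), ac)  # (B-C)* from B-A and A-C
induced_ab = induce(ac, flip(bc))  # (A-B)* from A-C and C-B

print(sorted(induced_ac))
print(sorted(induced_bc))
print(sorted(induced_ab))
```

In this toy case the induced (A-C)* contains the pair (1, 21), which is not in the submitted A-C, and lacks (2, 22), which is. Differences of that kind are what a consistency check on the induced alignments would surface.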

The end result will be two analyses: one for the submitted pairwise alignments and one for the total induced alignment.

Hope that's clear and helps!

d

Manfred Grabherr

Jan 5, 2012, 6:52:07 AM

to Alignathon

Hi Dent,

Thank you for the clarification, this is very helpful!

Cheers,

- Manfred
