Alignathon evaluation


Aaron Darling

Jan 4, 2012, 6:31:10 PM
to Alignathon
Hi all,
I am curious about the rationale for sharing the true alignments
generated by the simulation process.
Clearly this is helpful for people who want to contribute new
evaluation metrics, since they will be able to generate alignments and
check that their metrics work correctly on the actual data being used
for evaluation. And there is a great need (in my opinion) for more and
better alignment evaluation metrics.
But giving away the correct alignment with the simulated sequences has
obvious implications for interpreting any submitted alignments. It
gives people submitting alignments the means to tune their aligners to
the correct data. Hopefully most people will not do this, at least not
intentionally; however, even simple summary statistics, such as
checking whether the computed alignment has a similar file size to the
correct alignment, could be "accidentally" used to decide which
alignment to submit for consideration. On the other hand, if the goal
is exactly to get people to tune their aligners to this data, it seems
like it would be extremely helpful to know exactly how the simulation
is done, and how generally applicable the process is, so that it can
be repeated. Then it seems like there should be another set of test
data where the answer is truly unknown to the folks computing the
alignments, so we can get an idea of how the methods work in the
real-world usage case where the true answer is unknown. Is this
Alignathon 2?

I'm guessing Dent & others at UCSC have already discussed this exact
issue; if so, could you relay your thoughts on the matter to the people
you are trying to engage in this project?

Best,
-Aaron

Benedict Paten

Jan 4, 2012, 7:03:14 PM
to align...@googlegroups.com
Hi Aaron,

Part of the issue here is that Dent was an author with me on our Cactus
alignment program, I helped him develop the Evolver scripts,
I'm privileged to work with Dent and, finally, we (myself and others
at UCSC, not Dent) are planning on submitting Cactus entries to the
Alignathon! So, to avoid any real or imagined bias towards UCSC, we
wanted everything to be available to everyone, despite this creating
the problems you outline.

Note, no one knows the true answer to the 12 fly alignments, and as
long as people submit answers for them, these should provide
interesting complementary metrics. That said, I think it would be good
to do a run with novel simulations - perhaps this could be done after
the initial submission stage - as a check for overtraining to the
provided simulations. I also agree that Dent should provide the exact
recipe he used for generating the alignments, so that people can make
their own sets and play around.

Benedict

Dent Earl

Jan 4, 2012, 7:05:58 PM
to align...@googlegroups.com
Hey Aaron,

Great question. We went with the totally open approach to solutions for the Alignathon because, as a lab, we have a strong interest in alignment and some of our lab members will be working on their own submissions. I have no involvement in their submissions. However, we wanted to be totally above-board on everything, so instead of dealing with any doubts about fairness we thought it would be best to let everyone see exactly the same thing: everything.
Incidentally, the size of the "true" alignments will likely throw a person off: those alignments contain oodles more detail than an aligner could hope to recover, since they hold all of the intermediate branch-point species that were simulated in order to get the leaf-node species that are actually provided.
I do share your hope that people don't over-fit to the solutions (or really "fit" at all), but I was more worried about the specter of impropriety.

d

aarond...@ucdavis.edu

Jan 5, 2012, 12:04:28 AM
to align...@googlegroups.com

Thanks for the replies Dent and Benedict.
One of the great things we had going for us in the Assemblathon was that
the people involved in assembly evaluation (you guys at UCSC, myself and
others at Davis) are clearly "users" of the algorithms to be tested rather
than developers. Unfortunately, in the Alignathon that's no longer the case.
I would like to think that most people who choose to participate by
submitting alignments have a strong understanding of the issues surrounding
overfitting and would carefully maintain separation of information in
testing datasets from training datasets. But it would be nice if there were
some guarantees. I wonder if it would be possible to have an alignment
module framework akin to the evaluation module framework, so that people
could submit an alignment program rather than an alignment? Then the
alignment program could be run on a newly simulated, never-before-seen
dataset at evaluation time. People submitting alignments could then run
their aligner module locally and verify that it works in the framework.
This has the added benefit of guaranteeing reproducibility and making
absolutely explicit what is required to achieve a particular alignment
result -- and would resolve a major criticism of the Assemblathon approach.
Perhaps there is some concern that most aligners are difficult enough to
run on this scale of data that they cannot be packaged up in a module? If
so, could this be avoided by having the module specify computational
requirements (> X GB RAM, Y GB scratch disk space, etc.)? Are there other
issues apart from the obvious need for extra coding?
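
For concreteness, here is a rough sketch (in Python) of the kind of
module interface I'm imagining. Everything in it is hypothetical: the
class and method names, the resource manifest, and the "myAligner"
binary are invented for illustration, not an existing Alignathon API.

    import subprocess
    from dataclasses import dataclass

    @dataclass
    class ResourceRequirements:
        """Declared up front so the evaluation host could schedule the job."""
        ram_gb: int      # peak RAM needed, in GB
        scratch_gb: int  # temporary disk space, in GB
        cpus: int = 1

    class AlignerModule:
        """Hypothetical wrapper a team would submit instead of an alignment.

        The evaluators would call run() on a never-before-seen simulated
        dataset; the module has to produce a MAF file from the input FASTA
        sequences and guide tree without ever seeing the true alignment.
        """

        requirements = ResourceRequirements(ram_gb=64, scratch_gb=200, cpus=16)

        def run(self, fasta_paths, tree_path, out_maf, work_dir):
            # Shell out to a made-up aligner binary; a real module would
            # pin the exact version and parameters here so the run is
            # repeatable on any dataset the evaluators choose.
            cmd = ["myAligner", "--tree", tree_path, "--out", out_maf,
                   "--tmp", work_dir] + list(fasta_paths)
            subprocess.run(cmd, check=True)

The framework would then only need to read the declared requirements,
provision a suitable machine, and invoke run() on the hidden dataset.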


Manfred Grabherr

Jan 5, 2012, 5:35:47 AM
to Alignathon
Hi Aaron,

I think you are raising really important issues here. One thing that
was never clear to me in the Assemblathons was whether it is the
software that is to be evaluated, or rather the group's skill in
applying the software in an optimal way. If it is the former, in the
case of the Alignathon it might make sense to require the software to
be run with the exact same parameters for all data sets. While this
is no guarantee that there won't be any data-specific tuning, it
would be a more realistic scenario for future users deciding which
program(s) to use.
I also like the idea of submitting the software with the predictions,
plus a script to run it, if necessary. That would ensure
reproducibility (even if the software is executed locally by each
group for these data sets), and it would be great to use the submitted
software to generate predictions on additional data sets for which the
truth is not made known - but this could be a future step.

Cheers,

Manfred

Dent Earl

Jan 5, 2012, 11:06:30 AM
to align...@googlegroups.com
Hey Aaron (and Manfred),

Thanks for the suggestion! I think this is a great idea and I would love to be able to include such a framework in future Alignathons; doing so would be fairly simple from my point of view. I can envision a submission that is geared specifically to a set of known input sequences, where the code submitted is essentially a recipe script or Makefile that performs all the requisite steps to make the alignment. This sort of submission is something that could be possible under a future Alignathon. As an aside, short of actually providing a Makefile, we are expecting that teams submitting to the Alignathon will produce a written recipe of how they created their alignments.
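
Just to sketch what I mean, such a recipe could be as small as a short driver script like the one below; the tool name, version, paths, and parameters are all invented for the example, not a prescribed format.

    #!/usr/bin/env python
    """Hypothetical recipe script: every step needed to rebuild a submission."""
    import subprocess

    def sh(cmd):
        # Run a shell command and fail loudly, so a partial rebuild is obvious.
        subprocess.run(cmd, shell=True, check=True)

    # 1. Fetch and build the exact aligner version used for the submission.
    sh("git clone --branch v1.2.3 https://example.org/myAligner.git")
    sh("make -C myAligner")

    # 2. Align the provided leaf-node sequences with the parameters used.
    sh("myAligner/bin/myAligner --tree input/tree.nwk --gap-open 12 "
       "--out work/alignment.maf input/*.fa")

    # 3. Copy the result into the agreed submission location.
    sh("cp work/alignment.maf submission/alignment.maf")
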
For the question "What are we evaluating, the tool or the product of the tool?", I know we tried to be careful in the Assemblathon paper to refer to assemblies rather than assemblers, because we weren't looking at the results of the assemblers alone. There were data preprocessing steps, some assemblers did their own scaffolding, some used off-the-shelf scaffolding, and there were data post-processing steps. Some teams consisted of a single individual, while a few teams were staffed by multiple engineers. We thought it would have been misleading to focus on assemblers at that point, so we decided to focus on the product, the assemblies. I imagine we'll be taking a similar approach for the Alignathon.

Best,

d
