We've finally received a decision from Genome Research: revise and resubmit. In summary, the three reviewers asked us to:
* Use definitive speech — we need to equivocate less in the discussion and be more pointed in our conclusions.
* Make a table of runtime numbers — some sort of summary.
We intend to push back somewhat on reviewer two, who we feel asks for too much. We aren't going to create full genome simulations and redo the project, and we aren't going to bar entries that only submitted results for some of the datasets, among other issues we have with reviewer two. A sketch of the runtime table follows below.
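Both reviewer one and reviewer two ask for runtime and resource numbers, so the summary table will likely need wall-clock time, memory, and core counts per submission. Below is a minimal sketch of how that table could be assembled, assuming we collect the figures into a hypothetical `runtimes.csv`; the file name and column names are placeholders, not real data.

```python
# Sketch: summarize per-submission runtimes into one table per dataset.
# Assumes a hypothetical runtimes.csv with columns:
#   aligner, dataset, wall_clock_hours, peak_ram_gb, cpu_cores
import csv
from collections import defaultdict

rows = defaultdict(list)
with open("runtimes.csv") as fh:
    for rec in csv.DictReader(fh):
        rows[rec["dataset"]].append(rec)

for dataset, recs in sorted(rows.items()):
    print(f"\n{dataset}")
    print(f"{'aligner':<20}{'wall clock (h)':>16}{'peak RAM (GB)':>16}{'cores':>8}")
    # List the fastest submissions first within each dataset.
    for rec in sorted(recs, key=lambda r: float(r["wall_clock_hours"])):
        print(f"{rec['aligner']:<20}"
              f"{float(rec['wall_clock_hours']):>16.1f}"
              f"{float(rec['peak_ram_gb']):>16.1f}"
              f"{int(rec['cpu_cores']):>8d}")
```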
To: Benedict Paten <ben...edu>
From: Laura DeMare <lde...edu>
ReplyTo: lde...edu
Subject: Genome Research -- Manuscript Review Completed
Cc:
Dear Dr. Paten,
Your manuscript entitled, "Alignathon: A competitive assessment of whole genome
alignment methods," has been reviewed; please log on to the submission website to
view the referees' comments.
As you will see from the reviews at the website, the referees felt the manuscript
presented material of interest to some of our readers, but they also had considerable
concerns about the work as it stands, which were more strongly stated in their
comments to the Editors. The referees did indicate that with appropriate revision to
increase the depth of the analyses and discussion beyond a summary of the competition
to advance the state of the art, this manuscript might ultimately be suitable for
publication in Genome Research. In this regard, we would be happy to consider a
revised manuscript for publication. In addition to providing deeper insight and more
detail, please improve the presentation significantly, reducing the length by
approximately 20% by streamlining the language, removing redundancy, and focusing on
the main novel points of interest to our readers. Please also reformat the Reference list
in journal style (see instructions to authors). We will have the revised manuscript
seen again by at least one of the referees, possibly all the referees depending on
the changes made to the manuscript. Please read carefully through all the referees'
comments and address each point.
In revising your manuscript also go through your work once more to make sure it is
written in as streamlined a fashion as possible. This is the best way to be sure the
main points of the work are highlighted and the general interest for the work is kept
at its peak. Any redundancies in the text should be removed, and only points of main
interest should be included in results and discussion. Material not essential to the
direct understanding of the manuscript should be considered carefully as potential
material for online-only supplemental material; all lengthy tables should be reduced
to a single page where possible, and the complete table be made available as an
online supplement. If English is your second language, a native English speaker
should proofread the work prior to resubmission.
Please remember to follow all data availability and fair-use rules AND to format your manuscript in
our house style. Putting your manuscript in the appropriate format will serve to
expedite your manuscript for publication as well as save substantially on possible
production fees incurred during the processing of your manuscript by the printer, which
charges for any changes made during galley stages. Pay special attention to the
appropriate ORDER of the sections (or obtain permission from the editor to alter the
format), and also be absolutely sure figures and tables are SEPARATE from the text
and figure legends are SEPARATE from figures. Use only appropriate nomenclature
(including appropriate use of italics and capitalization) in both text and figures.
(Links to nomenclature sites are also available on our websites.) All data utilized
in this manuscript must be made available by posting it in the accepted public
databases. In the absence of an appropriate database, the data must be submitted as
supplementary material to be posted at the Genome Research website (you may also, of
course, post it at your own website).
A reminder that all data or information obtained through personal communication also
requires a letter of approval for use. Authors are responsible for obtaining
permission from the rights holder to adapt or reproduce material previously published
elsewhere (typically, this will be the publisher) and for including any required
permission statement alongside the citation. Please e-mail (preferred) or fax
PERMISSIONS to: Peggy Calicchia, Genome Research, Editorial Secretary, Tel:
When you submit your revised manuscript, please be sure to submit it as a REVISED manuscript rather than a NEW (resubmitted) manuscript.
Resubmitting a manuscript creates a new submission date, and should only be used if
the time available to revise a manuscript has elapsed. Be sure to place your
responses to the referees in the box marked "Response to reviews".
NOTE: We DO NOT accept e-mailed versions of manuscript revisions. These are often
corrupted or can be missed in the large amount of e-mail sent to the editor.
NOTE: We now accept electronic art files created in TIFF, EPS, PDF, JPEG, or AI when
the digital file specifications are adhered to (see parameters online in Submission
of Electronic Figures for Accepted Manuscripts: Detailed Instructions, available in the Instructions to Authors).
To aid you in submitting your digital files in the correct format, we have a digital
art analysis program associated with our online submission system called Digital Expert:
There are no image files for Digital Expert to analyze.
If "Fail" is returned in the report the art file is unusable must be revised to
satisfy the art specifications above and detailed in Instructions to Authors.
I look forward to receiving your revised manuscript.
With kind regards,
Laura DeMare, Ph.D.
Assistant Editor
Genome Research
Reviewer 1 Comments for the Author...
GENOME/2014/174920 - Alignathon: A competitive assessment of whole genome alignment methods. This manuscript describes a quantitative comparison of the ‘whole-genome’ alignments generated by a series of multiple sequence alignment tools. Overall the information provided in the manuscript will be useful to the community at large and can be used as an important benchmark for improving/comparing the performance of genome aligners.
Minor points
1. Some text related to at least the general hardware / compute time needed by the tools would be helpful to include, or at least explicit references pointing to these requirements.
2. The logic behind the decisions to use the specific metrics used in the study to compare the aligners is useful but could be condensed.
3. It was somewhat disappointing that the mammalian input sequences were not ‘complete’ genome sequences, and that the inputs were limited to mammalian and fly genomes.
Reviewer 2 Comments for the Author...
The manuscript by Earl et al. describes the results from the Alignathon, a competitive assessment of whole genome alignment methods similar in design to the high-profile Assemblathon evaluations of recent years. In the competition, 3 sets of genomes (2 simulated and 1 real) were aligned by multiple groups, and the accuracy of the submitted alignments was determined using a variety of metrics. Whole genome alignment is an important topic, and is becoming more important every day, and the results of this competition would be of great interest to many people in the genomics community.
However, the current manuscript is frustrating to read, as many significant questions about the experimental design and performance of the algorithms are left unanswered. The manuscript also lacks depth in explaining the relative performance of the algorithms in different contexts. This would be one of the most significant contributions of the manuscript, as it would enable the reader to understand which aspects of their own alignments are trustworthy and which are noisier.
Major points:
The manuscript enumerates several potential limitations and biases of using EVOLVER to construct the simulations: ancient homologies are not captured, separate transposable element insertions are not considered homologous, and, most significantly, EVOLVER simulations were used throughout the development of Cactus, which may have implicitly trained Cactus to follow the EVOLVER evolutionary model. Each of these potential biases must be resolved, especially the role of the EVOLVER evolutionary model in the performance of Cactus; it is unsettling to think this may explain why it was determined to be the best algorithm for the mammalian dataset by a large margin. Ideally, Cactus would be further evaluated using an ensemble of simulated genomes with radically different parameterizations, or perhaps with a different simulator, to characterize how dependent Cactus is on the specific evolutionary model used. Similarly, more research is needed to quantify whether PSAR-align had an unfair advantage, since its objective function is related to the scoring function used by the competition.
The manuscript is incomplete in its analysis of how the algorithms perform in different contexts, and should be analyzed at a finer resolution than genic, neutral, and repetitive. Do complex gene families perform as well as genes without any paralogs? How are repetitive regions even determined? Does repeat copy number or repeat length influence the performance? Are the algorithms more or less robust to substitutions, deletions, inversions, moves, copies, etc.? Researching and understanding these properties will shed light on the gaps in the technologies and where the alignments can be trusted.
The distinction between evolutionary homology and sequence similarity should be discussed. Unlike evolutionary homology, sequence similarity is not transitive: if A is 90% similar to B, and B is 90% similar to C, it does not imply that A is 90% similar to C. Were participants clearly informed that evolutionary homology would be evaluated? It seems not, since not every entry was transitively closed.
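To make the non-transitivity point concrete, here is a toy sketch; the three sequences are invented for illustration only and are not taken from any Alignathon dataset.

```python
# Toy illustration that percent identity is not transitive.
def percent_identity(a, b):
    """Percent of positions that match between two equal-length sequences."""
    assert len(a) == len(b)
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

A = "ACGTACGTAC"
B = "ACGTACGTAA"  # differs from A at the last position
C = "TCGTACGTAA"  # differs from B at the first position

print(percent_identity(A, B))  # 90.0
print(percent_identity(B, C))  # 90.0
print(percent_identity(A, C))  # 80.0 -- lower than either pairwise value
```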
The incomplete submissions are frustrating, especially for algorithms that were only applied to a single dataset. The manuscript is highly speculative in its discussion of EPO and progressiveMauve. In particular, the conclusion that progressiveMauve is the algorithm of choice for precise alignments should be qualified to state this has only been demonstrated for highly similar genomes. I would encourage the contest organizers to require submissions on all datasets in any future competitions.
Details of how Mugsy and PSAR-align were executed are missing from the supplementary file. The manuscript should note that Mugsy was designed for closely related genomes only (this is in the title of the Mugsy manuscript). As a result it is not surprising that it has poor performance on the more distant genomes, but I suspect it could be improved with modest parameter tuning (assuming defaults were used).
Minor points:
Include citation on promise of future sequencing technologies (page 5)
Include citation for rate of substitutions for vertebrate evolution (page 14)
Typo in the max chromosome length in Table 1 (simHuman / primate).
Runtimes and usability requirements for the different algorithms should be presented and discussed. A modestly worse F-score might be preferred if a method is substantially faster or requires substantially less RAM/disk.
Reviewer 3 Comments for the Author...
This MS assesses multiple whole genome alignment tools following a bake-off conducted approximately 2 years ago. This MS is likely to be of moderate interest to a core set of researchers.
The introduction and results in particular need to be extensively shortened. The results conflate selection of methodological choices (which could likely go into a discussion in the supplement) and interpretation of the results (which belongs in the discussion). What is currently in the discussion can also be shortened. The current discussion also presents new results and analysis. The intermingling of results/methods/discussion makes a paper of this length difficult to read and understand. Removal of flowery and unnecessary language ("If asked to propose a winner of the competition, it is reasonable to claim that it depends upon the requirements of the user." is an egregious example) would also benefit the overall flow of the MS.
The discussion might be made more interesting to readers not directly interested in WGA by addressing why some algorithms outperform others. In particular, why is Cactus's stronger performance on the simulated data not mentioned at all in the conclusions? Further, the authors present multiple limitations of the study but do not present any solutions to these limitations. It would behoove them to present a much "higher-level" discussion to try and address more globally interesting issues in either: MSA, testing, simulation, or evolution.