Dear all,
We need your input! We have received an exciting offer from Manolis
Kellis and Stephen Richards of the modENCODE project regarding our
real data set of 12 flies. The modENCODE project has completed the
sequencing of an additional eight fly species and I think we should
include them in the Alignathon. See the bottom of the email for
details on assembly quality etc.
Including the additional eight species and bumping the real data set
up to 20 flies will increase the impact of everyone's work on
Alignathon. The problem will be a bit harder but the novel data will
certainly increase interest in the results.
Additionally, Manolis has agreed to use the best alignment that
Alignathon produces, giving participants an extra prize to
shoot for.
However, I'm sensitive to the fact that this would require us to
change the competition halfway through. To accommodate the
additional burden this change in data sets may cause for groups, I
propose that we extend the deadline by two weeks, which
would move it to Friday, March 9th.
So, I've had my say, but Alignathon is a community project and I need
your input. What do you guys think? Does this sound like something
you'd like to do? Are the additional two weeks sufficient?
I hope to hear from you soon,
d
cc: Manolis, Stephen
####################
DETAILS:
Here's a link to the modENCODE comparative genomics whitepaper:
http://www.genome.gov/Pages/Research/Sequencing/SeqProposals/modENCODE_ComparativeGenomics_WhitePaper.pdf
Here's Stephen Richards describing the assemblies:
"""
We did all of the sequencing and assembly the same way, and on all of
the species where the lines could be inbred by Artyom Kopp's lab, that
was very successful. For one species (D. rho), where Artyom could not
inbreed, the assembly was just OK. The data was assembled using CABOG.
The input data was 15X 454 fragment coverage plus 30X "clone" coverage
from 3kb and 8kb paired-end libraries, again 454 data.
The assembly stats generally look great.
Species   Contig N50   Scaffold N50   Total bases
D. bia    436kb        3,128kb        180Mb
D. bip    149kb        663kb          166Mb
D. ele    214kb        1,714kb        171Mb
D. eug    224kb        977kb          156Mb
D. fic    276kb        1,049kb        151Mb
D. kik    209kb        911kb          163Mb
D. rho*   19kb         45kb           195Mb
D. tak    125kb        390kb          181Mb
* Could not be sib-sib mated
The assemblies are all now available from GenBank, and the 454 data is in
the SRA.
For example, for D. bipectinata:
http://www.ncbi.nlm.nih.gov/genome/?term=Drosophila%20bipectinata
and
http://www.ncbi.nlm.nih.gov/sra/?term=Drosophila%20bipectinata
and
http://www.ncbi.nlm.nih.gov/bioproject/62313
for the NCBI BioProject page
"""