Important! New data set opportunity, input requested


Dent Earl

Jan 24, 2012, 4:47:41 PM
to Alignathon, Manolis Kellis, Stephen Richards
Dear all,

We need your input! We have received an exciting offer from Manolis
Kellis and Stephen Richards of the modENCODE project regarding our
real data set of 12 flies. The modENCODE project has completed the
sequencing of an additional eight fly species and I think we should
include them in the Alignathon. See the bottom of the email for
details on assembly quality etc.

Including the additional eight species and bumping the real data set
up to 20 flies will increase the impact of everyone's work on
Alignathon. The problem will be a bit harder but the novel data will
certainly increase interest in the results.

Manolis has also agreed to use the best alignment that
Alignathon produces, thereby giving participants an extra prize to
shoot for.

However, I'm sensitive to the fact that this would require us to
change the competition halfway through. To accommodate the
additional burden this change in data sets may cause for groups, I
propose that we add two weeks to the deadline, which
would move it to Friday, March 9th.

So, I've had my say, but Alignathon is a community project and I need
your input. What do you guys think? Does this sound like something
you'd like to do? Are the additional two weeks sufficient?

I hope to hear from you soon,

d

cc: Manolis, Stephen

####################
DETAILS:
Here's a link to the modENCODE comparative genomics whitepaper:
http://www.genome.gov/Pages/Research/Sequencing/SeqProposals/modENCODE_ComparativeGenomics_WhitePaper.pdf

Here's Stephen Richards describing the assemblies:
"""
We did all of the sequencing and assembly the same way, and on all of
the species where the lines could be inbred by Artyom Kopp's lab, that
was very successful. For one species (D. rho), where Artyom could not
inbreed, the assembly was just OK. The data was assembled using CABOG.
The input data was 15X 454 fragment, + 30X "clone" coverage of 3kb and
8kb paired end libraries - again 454 data.
The assembly stats generally look great.

Species   contig N50   scaffold N50   total bases
D. bia    436kb        3,128kb        180Mb
D. bip    149kb        663kb          166Mb
D. ele    214kb        1,714kb        171Mb
D. eug    224kb        977kb          156Mb
D. fic    276kb        1,049kb        151Mb
D. kik    209kb        911kb          163Mb
D. rho*   19kb         45kb           195Mb
D. tak    125kb        390kb          181Mb
* Could not be sib-sib mated
The assemblies are all now available from GenBank, and the 454 data is in
the SRA.
For example, for D. bipectinata:
http://www.ncbi.nlm.nih.gov/genome/?term=Drosophila%20bipectinata
and
http://www.ncbi.nlm.nih.gov/sra/?term=Drosophila%20bipectinata
and
http://www.ncbi.nlm.nih.gov/bioproject/62313
for the NCBI bioproject page
"""
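
For anyone less familiar with the N50 columns in the table above, here is a minimal sketch of how an N50 is usually computed from a list of contig or scaffold lengths. The function name `n50` and the toy lengths are made up for illustration; this is not the script used to produce the modENCODE numbers.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that the sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Hypothetical contig lengths in kb, not taken from any real assembly.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

With the toy lengths the total is 300, so the N50 is the length at which the running sum first reaches 150. A low value relative to the other species, as for D. rho, means the assembly is broken into many short pieces.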

Brian

Jan 24, 2012, 10:58:04 PM
to Alignathon
Is there a species tree?

Aaron Darling

Jan 24, 2012, 11:41:36 PM
to align...@googlegroups.com, Manolis Kellis, Stephen Richards, Artyom
And a related question:

For what fraction of the genome can we expect the species tree to be a
reasonable representation of the ancestry of homologous sites? Seems
like something should be known about this based on studies of the first
12 fly genomes. Any ideas or pointers to exceptional papers on the
topic?

Aaron Darling

Jan 25, 2012, 12:43:17 AM
to Matt Rasmussen, Manolis Kellis, align...@googlegroups.com, Stephen Richards, Artyom
Very interesting, I would not have guessed the phylogenetically
congruent portion to be as low as 38%. I wonder if we can come up with
some way to measure the effect of ILS on genome alignment quality in a
controlled way. Kumar has a missive about the misleading effects of
alignment guide tree bias near the end of this paper:
http://genome.cshlp.org/content/17/2/127.full
Do others think this would be worth investigating, if not this time then
maybe for Alignathon 2? :)


On Wed, 2012-01-25 at 00:20 -0500, Matt Rasmussen wrote:
> For the dmel, dere, dyak species in particular there is another good
> paper about frequency of incomplete lineage sorting.
>
> Pollard DA, Iyer VN, Moses AM, Eisen MB. Widespread Discordance of
> Gene Trees with Species Tree in Drosophila: Evidence for Incomplete
> Lineage Sorting. PLoS Genetics, 2006. 2(10): e173.
> http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.0020173
>
> Matt
>
> On Wed, Jan 25, 2012 at 12:15 AM, Manolis Kellis <man...@mit.edu> wrote:
> > Based on Figure 2 of this paper:
> > http://genome.cshlp.org/content/17/12/1932.long
> > about 38% for gene-length fragments (panel B), higher for longer segments
> > and lower for smaller (panel C).
> > Matt could weigh in. Best, M

Dent Earl

Jan 26, 2012, 1:32:30 PM
to Alignathon
Hey Brian,

I'm still working on getting the Newick file for this, but there is a
species tree in the white paper; it's Figure 1.
http://www.genome.gov/Pages/Research/Sequencing/SeqProposals/modENCODE_ComparativeGenomics_WhitePaper.pdf

d