PacBio data on GIAB FTP for AJ Trio and NA12878

Justin Zook

May 18, 2015, 4:25:41 PM
to genome-in...@googlegroups.com
We are pleased to announce two new PacBio datasets that are now on the GIAB FTP site:

1. In a collaboration between NIST and Mt. Sinai School of Medicine, we have completed PacBio sequencing of the Ashkenazim Jewish trio that is a candidate NIST Reference Material.  Raw data are on the FTP site, with ~69x coverage of the son, 32x of the father, and 30x of the mother.  About 90% of the data for each genome is from P6-C4 chemistry and the remainder is from P5-C3.  The N50 read length is ~11 kb.  All of these data are public without embargo, but if you are interested in analyzing them, we encourage you to participate in our GIAB Analysis Group.  The data and a readme with further information are available at:
We also plan to deposit the h5 files in the SRA and in the data directory in the coming weeks. The analysis group is working on mapping these data, and the resulting bam files will be uploaded as well.

2. Mt. Sinai School of Medicine has also kindly uploaded their PacBio sequencing data for NA12878 to our GIAB FTP site.  They have uploaded the raw h5 files, a vcf of SV calls, error-corrected reads, and a bam file.  Look for their paper in Nature Methods, to be published soon!  Their uploaded files and a readme with more information are at:

Cheers,
Justin Zook
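The coverage and N50 figures in item 1 are summary statistics over read lengths. As a minimal illustrative sketch (not the analysis used to produce these numbers), N50 is the length L such that reads of length L or longer contain at least half of all sequenced bases, and mean fold coverage is total sequenced bases divided by genome size. The function names below are hypothetical, and the genome size must be supplied by the caller.

```python
def n50(read_lengths):
    """Return the N50: the largest length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    lengths = sorted(read_lengths, reverse=True)  # longest reads first
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:  # first read at which we reach half the bases
            return length
    raise ValueError("read_lengths must be non-empty")


def fold_coverage(read_lengths, genome_size):
    """Mean fold coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8: the three longest reads (10 + 9 + 8 = 27 bases) reach half of the 54 total bases.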

Adam Phillippy

May 18, 2015, 4:52:41 PM
to Justin Zook, genome-in...@googlegroups.com
Thanks, Justin!

Sergey and I plan to assemble the PacBio trio data. We should have assemblies of the parents done by the end of this month and the son sometime in June. We will distribute these to the group as soon as we have them.

Best,
-Adam


--
You received this message because you are subscribed to the Google Groups "Genome in a Bottle" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome-in-a-bot...@googlegroups.com.
To post to this group, send email to genome-in...@googlegroups.com.
Visit this group at http://groups.google.com/group/genome-in-a-bottle.
To view this discussion on the web visit https://groups.google.com/d/msgid/genome-in-a-bottle/e3c18c3f-da94-4371-8055-2c73a158ab34%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

michael.schatz

Jun 3, 2015, 1:21:56 AM
to genome-in...@googlegroups.com, justi...@gmail.com, Fritz Sedlazeck
Other than Adam's note about assembling these, has anyone started to work with these data? I'm interested in looking at them for SV calling with my postdoc Fritz, but I don't want to duplicate efforts.

Thank you

Mike

Andrew Carroll

Jun 3, 2015, 1:29:21 AM
to michael.schatz, Brett Hannigan, George Asimenos, genome-in...@googlegroups.com, Justin Zook, Fritz Sedlazeck
We at DNAnexus are working with these data for structural variant calling and genome assembly in various contexts. I suspect our efforts are complementary and do not overlap at this point (I'm adding Brett Hannigan and George Asimenos, who are involved in this work).

We'd be happy to coordinate if you're interested.

---
The contents of this e-mail and any attachments are confidential and only for use by the intended recipient. Any unauthorized use, distribution or copying of this message is strictly prohibited. If you are not the intended recipient please inform the sender immediately by reply e-mail and delete this message from your system. Thank you for your co-operation.

Michael Schatz

Jun 3, 2015, 1:32:30 AM
to Andrew Carroll, Brett Hannigan, George Asimenos, genome-in...@googlegroups.com, Justin Zook, Fritz Sedlazeck
Actually, we have been exploring SV detection from the error-corrected reads as well as the raw reads. Do you already have a FALCON run complete? That would save us a considerable amount of time.

Thank you,

Mike

Bashir, Ali

Jun 3, 2015, 9:20:42 AM
to Michael Schatz, Andrew Carroll, Brett Hannigan, George Asimenos, genome-in...@googlegroups.com, Justin Zook, Fritz Sedlazeck
Hi Mike,

I have a completed FALCON assembly run for the child; I will try to upload it today.  Note that it has not been Quiver-corrected yet.

-Ali

Robert Sebra Gmail

Jun 3, 2015, 9:22:17 AM
to Bashir, Ali, Michael Schatz, Andrew Carroll, Brett Hannigan, George Asimenos, genome-in...@googlegroups.com, Justin Zook, Fritz Sedlazeck
Hi Ali, Mike et al.

I'll try to get the Quiver correction done ASAP and then send it to you, Ali.

B


Adam Phillippy

Jun 3, 2015, 9:38:21 AM
to Robert Sebra Gmail, Bashir, Ali, Michael Schatz, Andrew Carroll, Brett Hannigan, George Asimenos, genome-in...@googlegroups.com, Justin Zook, Fritz Sedlazeck
Hey Mike,
I was planning on looking at SVs as well, but starting from the assemblies. I believe we have both parents assembled and Quivered, and the child should be done soon. I will post those Celera Assembler (CA) assemblies for the group shortly.

-Adam


Lina M. Solis Castillero

Jul 6, 2015, 1:13:08 PM
to Justin Zook, genome-in...@googlegroups.com
Dear Justin,

How can I access the SNP database you already have for the
RM-8398 I just bought? I can't find it.
I really appreciate your help,

Thanks!


Kind regards,

Lina M. Solis Castillero. TM MSc.
Master's in Genetics and Molecular Biology.
General Director
Laboratorio Clinico Genetix, S.A.
La Alameda, Calle Makario 3 de Chipre,
Edif. Plaza San Marcos # 1, Local 6 PB.
www.genetix.com.pa
Tel. (507) 260-2990 (507)260-2999.

Help us serve you better: please answer the satisfaction
survey at the following link:
http://www.surveymonkey.com/s/CY6ZYDP
Thank you!

CONFIDENTIAL NOTE:
The information in this E-mail and any attachments transmitted are
originated by Laboratorio Genetix, is intended to be privileged and/or
confidential and only for use of the individual, entity or company to
whom it is addressed. If you have received this e-mail in error please
destroy it and contact the sender. If you are not the addressee you
may not disclose, copy, distribute or take any action based on the
contents hereof. Any total or partial unauthorized retention,
dissemination, distribution or copying of this message is strictly
prohibited and sanctioned by law. The observations and opinions
expressed in this email are not necessarily those of the
Administration or Managers of Laboratorio Genetix.



michael.schatz

Jul 6, 2015, 2:20:26 PM
to genome-in...@googlegroups.com
Is the readme available for the PacBio AJ Trio data? There is a long list of directories here, but it is not clear which data are good or bad (those with _new versus those without?):

Thank you!

Mike

Justin Zook

Jul 6, 2015, 2:22:21 PM
to michael.schatz, genome-in...@googlegroups.com
Hi Mike,

Yes, there's a readme in the directory just above that: ftp://ftp-trace.ncbi.nih.gov/giab/ftp/technical/pacbio_AJTrio 

Cheers,
Justin
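Justin's reply points to the readme in the parent FTP directory. Below is a minimal sketch for listing that directory with Python's standard-library `ftplib` module (anonymous login); the helper names are hypothetical, and the host and path are copied from the 2015 message and may have changed since.

```python
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(url):
    """Split an ftp:// URL into (host, path); raise ValueError otherwise."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an ftp:// URL: {url!r}")
    return parts.hostname, parts.path or "/"


def list_ftp_dir(url, timeout=60):
    """Log in anonymously and return the entry names in the directory at url."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()  # no arguments: anonymous login
        return ftp.nlst(path)


def find_readmes(names):
    """Keep entries whose name contains 'readme' (case-insensitive)."""
    return [name for name in names if "readme" in name.lower()]
```

For example, `find_readmes(list_ftp_dir("ftp://ftp-trace.ncbi.nih.gov/giab/ftp/technical/pacbio_AJTrio"))` would return any entries whose names contain "readme", assuming the server still serves that path.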

Michael Schatz

Jul 6, 2015, 2:23:27 PM
to Justin Zook, genome-in...@googlegroups.com
Thanks! Sorry I missed it, it was so cleverly hidden :)

Mike 

Justin Zook

Jul 6, 2015, 2:25:28 PM
to lso...@genetix.com.pa, genome-in...@googlegroups.com
Dear Lina,

The locations of the high-confidence vcf and bed files are listed in the Report of Investigation that comes with the DNA, which is also available online here:

Cheers,
Justin

Lina M. Solis Castillero

Jul 6, 2015, 4:01:56 PM
to Justin Zook, genome-in...@googlegroups.com
Dear Justin,

Thank you very much!

