Issue finding the correct data


Eric Smith

Nov 9, 2015, 9:12:30 AM
to ag1000g...@googlegroups.com
Hello,

I’ve downloaded the .vcf of the latest Ag1000g data release and have identified a few individuals that I would like to look at more closely. However, when I go to the SRA to download the raw reads for the samples I’m interested in, I can’t find the files I need. I’m specifically looking for the following 12 samples:

+-----------+------------+
| sample_id | sra_sample |
+-----------+------------+
| AB0151-C  | ERS224003  |
| AB0235-C  | ERS224211  |
| AB0241-C  | ERS224195  |
| AB0261-C  | ERS224656  |
| AC0090-C  | ERS223917  |
| AC0188-C  | ERS223769  |
| AN0127-C  | ERS224555  |
| AN0130-C  | ERS224488  |
| AN0135-C  | ERS224836  |
| AN0230-C  | ERS224619  |
| AN0312-C  | ERS224849  |
| AS0039-C  | ERS224199  |
+-----------+------------+
12 rows in set (0.00 sec)


Any help with finding these would be greatly appreciated.

Thank You,
Eric Smith
--
Eric Smith
PhD Candidate
Genetics, Genomics, and Bioinformatics PhD Program
University of California, Riverside
White lab, Department of Entomology

Alistair Miles

Nov 9, 2015, 9:17:46 AM
to Eric Smith, ag1000g...@googlegroups.com
Hi Eric,

If you're in Panoptes you can click through to ENA from the samples table:


The ENA URL for each sample is something like:


From there you should be able to get fastq etc. Note that BAM/CRAM from ENA are not the same as the alignments we used for variant calling.

Hth,
Alistair


--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Email: alim...@googlemail.com
Tel: +44 (0)1865 287721

Alistair Miles

Nov 10, 2015, 6:42:20 AM
to Eric Smith, ag1000g...@googlegroups.com
Hi Eric,

I have previously written a script to scrape the metadata from ENA and download multiple files; you might be able to do something similar. E.g., if you plug the sample accession into a URL like this:


...you'll get a text file which contains FTP URLs for all the fastqs for this sample. You could parse that in a script, extract the URLs and fetch them.

Hth,
Alistair
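
For anyone following the suggestion above, here is a minimal sketch of the "parse the text file, extract the URLs and fetch them" step. It is not the script mentioned in the thread. It assumes the report is tab-delimited with a header row, that the FASTQ locations sit in a column named `fastq_ftp`, and that multiple files in one cell are separated by semicolons; check these against the file you actually get. The function names `extract_fastq_urls` and `fetch_all` are made up for illustration, and the report URL itself is left out because it is not given in the thread.

```python
import csv
import io
import os
import urllib.request


def extract_fastq_urls(report_text, column="fastq_ftp"):
    """Collect FASTQ URLs from a tab-delimited report with a header row.

    Assumption: `column` names the field holding the FASTQ locations,
    with several files in one cell separated by semicolons.
    """
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    urls = []
    for row in reader:
        cell = row.get(column) or ""
        for entry in cell.split(";"):
            entry = entry.strip()
            if not entry:
                continue  # run with no FASTQ listed
            # Assumption: entries without a scheme are FTP host/path pairs.
            if "://" not in entry:
                entry = "ftp://" + entry
            urls.append(entry)
    return urls


def fetch_all(urls, dest_dir="."):
    """Download each URL into dest_dir, skipping files already present."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in urls:
        target = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)
        paths.append(target)
    return paths
```

To use it, save the text file for one sample accession (e.g. ERS224003 from the table above), read it into a string, pass that to `extract_fastq_urls`, and hand the resulting list to `fetch_all`. Looping over the 12 accessions would cover the whole set.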