Extracting a raw sequence from VCF files

Sebastian

Apr 29, 2013, 5:39:28 AM
to camda201...@googlegroups.com
Hi,
 
Is there a simple way to extract the base sequence (A, C, G, T, N) for each chromosome and each individual from the Korean dataset?
I would assume that, given the reference sequence, there should be an automated way of getting raw sequences from a VCF file.
 
We are trying to implement an algorithm for SNP finding (Dataset 2, Question 1), but so far we have only considered raw base sequence files as input.
 
Thanks
Sebastian

Djork-Arné Clevert

Apr 29, 2013, 10:01:18 AM
to camda201...@googlegroups.com
Hi Sebastian,

The KPGP data is not phased, so it is probably not possible
to infer the raw sequence from the VCF file.
Like it or not, for your purpose you will have to use the sequence
alignment files, which are also available at www.camda.info.
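
To make the phasing point concrete, here is a minimal Python sketch. It is not part of the CAMDA material; the function name and the toy reference are made up for illustration, and it only handles heterozygous SNPs. It enumerates every pair of chromosome copies that is consistent with the same unphased genotype calls:

```python
from itertools import product


def possible_haplotype_pairs(ref, het_sites):
    """Enumerate haplotype pairs consistent with unphased heterozygous SNPs.

    ref       -- reference sequence (toy example, hypothetical)
    het_sites -- dict {0-based position: alternate allele}
    """
    positions = sorted(het_sites)
    pairs = set()
    # For every het site, the alternate allele may sit on either copy.
    for choice in product((0, 1), repeat=len(positions)):
        hap_a, hap_b = list(ref), list(ref)
        for pos, on_a in zip(positions, choice):
            target = hap_a if on_a else hap_b
            target[pos] = het_sites[pos]
        # The two copies are unordered, so store them as a frozenset.
        pairs.add(frozenset(("".join(hap_a), "".join(hap_b))))
    return pairs


pairs = possible_haplotype_pairs("ACGTACGT", {1: "T", 5: "G"})
for pair in sorted(sorted(p) for p in pairs):
    print(pair)
```

With only two heterozygous sites there are already two different answers, and the count doubles with each additional site, so an unphased VCF alone does not pin down one raw sequence per chromosome copy.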

Cheers,
Okko
-- 
Djork-Arné Clevert, PhD
Institute of Bioinformatics
Johannes Kepler University Linz

Phone: +49 30 4432 4702
Fax: +49 30 6883 5307
Email: ok...@clevert.de

Sebastian

Apr 29, 2013, 11:05:47 AM
to camda201...@googlegroups.com, ok...@clevert.de
Dear Okko,

Thanks for the hint! I wasn't aware of the problems that can occur when one ignores phasing.

I guess you mean this BAM file: https://www.wisconsingenomics.org/camda12/KPGP/KPGP1_G_110915_HiSeq_EastAsian_Kor_F.merge.bam.gz

Is there a standard pipeline documented somewhere that we could run on these BAM files to extract a raw sequence? I found something along the lines of:

samtools mpileup -uf reference.fa alignment.bam | bcftools view -cg - | vcfutils.pl vcf2fq

Am I going in the right direction?
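
For readers unfamiliar with what such a consensus step produces, here is a rough Python sketch of the idea. It is not the actual vcf2fq code; the function name and toy data are hypothetical, and it assumes SNP-only genotype calls. Homozygous calls replace the reference base, while heterozygous calls are written as a single IUPAC ambiguity code, which is as far as I know how vcf2fq reports them as well:

```python
# IUPAC ambiguity codes for the six possible two-base heterozygous calls.
IUPAC_HET = {"AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K"}


def consensus(ref, genotypes):
    """Apply unphased SNP genotypes to a reference sequence.

    ref       -- reference sequence (toy example, hypothetical)
    genotypes -- dict {0-based position: (allele1, allele2)}
    """
    seq = list(ref)
    for pos, (a1, a2) in genotypes.items():
        if a1 == a2:
            # Homozygous call: both copies carry the same base.
            seq[pos] = a1
        else:
            # Heterozygous call: one ambiguity code stands for both bases.
            seq[pos] = IUPAC_HET["".join(sorted((a1, a2)))]
    return "".join(seq)


print(consensus("ACGTACGT", {1: ("C", "T"), 5: ("G", "G")}))  # AYGTAGGT
```

The result is one sequence per individual in which heterozygous positions stay ambiguous, so a downstream SNP finder would need to accept IUPAC codes in addition to A, C, G, T and N.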

Sorry if these questions are trivial for you, but I don't have much experience dealing with all these different file formats.

Thanks,
Sebastian

Djork-Arné Clevert

Apr 29, 2013, 12:10:46 PM
to camda201...@googlegroups.com
Dear Sebastian,
my gut feeling is that you are heading in the right direction,
but I can't give you a reliable answer right away.
I will try to gather more information and get back to
you asap.
Cheers,
Okko
