Looking for a script to convert SNP raw data into VCF


Dennis Fink

Aug 26, 2019, 11:07:34 AM
to Circos
Dear group,

For those of you working on visualizing genomic SNP data (e.g. from 23andMe): has anyone found or written a script that converts a raw 23andMe .txt file into a VCF that Circos can use? In case that's unclear, an example raw file can be found here: https://drive.google.com/file/d/1Km9CfLIjIkU_49JMIRceSjBdMLkXFHGV/view?usp=sharing

Thanks a lot,

Dennis

Wayne

Aug 27, 2019, 10:14:33 AM
to Circos

Additionally, see:





The second one can be installed in a session launched from https://github.com/fomightez/circos-binder by running `%pip install 23andme-to-vcf` in a Jupyter notebook with a Python kernel (that is, any notebook other than 'Getting Circos Up and Running', which is backed by a bash kernel).

The other software might take a bit more effort to install there, depending on its dependencies. By the way, I should have said in the other thread where I mentioned Circos-binder that, if you are concerned about security and are looking for ways around the limitations of your Windows machine, you could also set up a remote, private Ubuntu-based machine at Amazon, Google, or another cloud provider and connect to it from your own computer via SSH. I'd be glad to help if you contact me via fomightez at ye olde gmail (pardon the text added for obfuscation against data miners).

Wayne

Aug 28, 2019, 11:07:57 AM
to Circos
To run 23andme-to-vcf, it seems you need the FASTA file of the reference sequence (*.fa file) as well as its index (*.fai file).
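
To illustrate why a reference sequence is needed at all: a raw 23andMe line only carries an rsid, chromosome, position, and genotype, while a VCF record also needs REF and ALT columns, so the reference base at each position has to be looked up somewhere. The sketch below is not how 23andme-to-vcf itself works; the function names are hypothetical, and the small `ref_lookup` dictionary is an assumed stand-in for a real lookup in the FASTA file.

```python
def parse_raw_line(line):
    """Split one tab-separated data line: rsid, chromosome, position, genotype."""
    rsid, chrom, pos, genotype = line.rstrip("\n").split("\t")
    return rsid, chrom, int(pos), genotype


def to_vcf_record(rsid, chrom, pos, genotype, ref_base):
    """Build one VCF data line, or return None for no-calls and indel codes."""
    alleles = list(genotype)
    # Skip no-calls ("--") and insertion/deletion codes ("I", "D").
    if not alleles or any(a not in ("A", "C", "G", "T") for a in alleles):
        return None
    # ALT alleles are the called bases that differ from the reference base.
    alts = []
    for allele in alleles:
        if allele != ref_base and allele not in alts:
            alts.append(allele)
    # Genotype indices: 0 is REF, 1 and up are the ALT alleles in order.
    index = {ref_base: 0}
    for i, allele in enumerate(alts, start=1):
        index[allele] = i
    gt = "/".join(str(i) for i in sorted(index[a] for a in alleles))
    alt_field = ",".join(alts) if alts else "."
    return "\t".join(
        [chrom, str(pos), rsid, ref_base, alt_field, ".", ".", ".", "GT", gt]
    )


if __name__ == "__main__":
    # Hypothetical reference bases keyed by (chromosome, position);
    # in practice these would come from the *.fa / *.fai files.
    ref_lookup = {("1", 1000): "A"}
    rsid, chrom, pos, genotype = parse_raw_line("rs123\t1\t1000\tAG\n")
    print(to_vcf_record(rsid, chrom, pos, genotype, ref_lookup[(chrom, pos)]))
```

A complete VCF would also need the `##fileformat` header and the `#CHROM` column line, which are left out here to keep the sketch short.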

That data can be obtained from the iGenomes site at https://support.illumina.com/sequencing/sequencing_software/igenome.html . Illumina makes available the appropriate files for analyzing many genomes, including Homo sapiens. You just need to confirm which reference genome assembly 23andMe used for the genomes you have and get that version of the data.
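
One possible way to confirm the assembly is to look at the comment lines at the top of the raw file. The sketch below assumes the header lines start with `#` and that one of them mentions the assembly in a form like "build 37"; that wording is an assumption, so check your own file. `detect_build` is a hypothetical helper name.

```python
import re


def detect_build(lines):
    """Return the build number mentioned in the '#' header lines, or None."""
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        match = re.search(r"build\s+(\d+)", line, flags=re.IGNORECASE)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    # Illustrative header text only; the real wording may differ.
    example = [
        "# example comment line\n",
        "# reference human assembly build 37\n",
        "rs123\t1\t1000\tAG\n",
    ]
    print(detect_build(example))
```

If this reports build 37, that corresponds to GRCh37 (hg19), so the matching FASTA and index files would be the ones to download.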