SARS-CoV-2 variation data


Scott Cain

Oct 12, 2020, 11:12:17 AM
to gen...@soe.ucsc.edu
Hi,

In the data access section of this page for SARS-CoV-2 variation data (https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=915469975_LxVYr9SWsCXsc7eIQ5KQKUdKloh5&c=NC_045512v2&g=nextstrainSamples), it says that data for this track can be downloaded from the Download Server (https://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/nextstrain/). However, there do not appear to be any VCF files in that directory. Have those files been temporarily misplaced, or are they gone for good?

Thanks,
Scott


--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

Matthew Speir

Oct 12, 2020, 6:56:23 PM
to Scott Cain, UCSC Genome Browser Discussion List
Hello, Scott.

Thank you for your question about SARS-CoV-2 variation data in the UCSC Genome Browser.

We had to stop offering the variant file for download. SARS-CoV-2 variants displayed by Nextstrain are derived from a subset of GISAID sequences and the GISAID Terms and Conditions prohibit the redistribution of GISAID-derived data. They also require that the submitters of all sequences be acknowledged when the variants are used.

If you are registered with GISAID, you can access GISAID sequences and other downloadable data directly from them. The alignment downloads are labeled on their site as "msa_<date>". Our program faToVcf can extract VCF from a multi-sequence FASTA alignment such as the “nextfasta” download from GISAID. It is available for Linux and MacOSX here: https://hgdownload.soe.ucsc.edu/admin/exe/. It requires at least 4GB of memory. You can run the program without any arguments to see the usage statement and options. Here are some steps to get started with the tool:

# This command enables faToVcf to be run as a program (otherwise the command would say "Permission denied")
chmod a+x faToVcf

# This command shows basic usage instructions and describes the options:
./faToVcf

# This command converts msa fasta to VCF without per-sample genotype columns:
./faToVcf -includeRef \
    -ref='hCoV-19/Wuhan/Hu-1/2019|EPI_ISL_402125|2019-12-31|Asia' \
    -vcfChrom=NC_045512.2 \
    -noGenotypes \
    msa_0925.fasta msa_0925.sites.vcf
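
Once the conversion finishes, a quick sanity check on the output can be useful. The following is only a sketch and not part of faToVcf: the helper names count_vcf_sites and vcf_chroms are hypothetical, and it assumes the output is a plain-text VCF in which header lines start with '#' and data lines are tab-separated with CHROM in the first column.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a VCF file; not part of the UCSC tools.

# Count variant records: every non-empty line that does not start with '#'.
# awk is used so the function prints 0 and still exits 0 on a header-only file.
count_vcf_sites() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# List the distinct values found in the CHROM column (column 1) of data lines.
vcf_chroms() {
    awk -F '\t' '!/^#/ && NF { print $1 }' "$1" | sort -u
}

# Example usage, assuming msa_0925.sites.vcf was produced by the command above:
#   count_vcf_sites msa_0925.sites.vcf
#   vcf_chroms msa_0925.sites.vcf
```

If the -vcfChrom option was applied as intended, vcf_chroms should print the single name NC_045512.2; a count of 0 from count_vcf_sites would suggest that the reference name given to -ref did not match a sequence in the alignment or that the input was not an alignment.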

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Training videos & resources: http://genome.ucsc.edu/training/index.html

Want to share the Browser with colleagues? Host a workshop: http://bit.ly/ucscTraining

---

Matthew Speir

UCSC Cell Browser, Quality Assurance and Data Wrangler

Human Cell Atlas, User Experience Researcher

UCSC Genome Browser, User Support

UC Santa Cruz Genomics Institute

Revealing life’s code.




Scott Cain

Oct 12, 2020, 7:06:42 PM
to Matthew Speir, UCSC Genome Browser Discussion List
Hi Matthew,

Thanks for getting back to me. I had a feeling that was the explanation.  Perhaps you should update the description for that track to let users know the data are no longer available for download.

Thanks,
Scott
