Help finding the correct file version for dbSNP VCF ID replacement

171se

Nov 22, 2023, 1:40:38 PM
to gen...@soe.ucsc.edu
I tried to use dbSNP build 156 with bcftools to replace the ID field in a reference VCF whose IDs are currently in a position-based format. The bcftools commands did not work, apparently because the #CHROM column of the dbSNP file uses a numeric accession format (e.g. NC_000001.11) that bcftools cannot match against the #CHROM column of the target VCF, which uses the regular chr1-22 prefix format.

I found a directory containing other versions and a variety of files at https://ftp.ncbi.nlm.nih.gov/snp/

I am not familiar with the contents and structure of this directory or the files within it, and some of the files are quite large. I need help identifying the file I need.

I would like a link or reference to a dbSNP file that was made with the chr prefix format in the #CHROM field. Failing that, the latest regular version split into one file per chromosome would help, so that I could use bcftools or some other program to rewrite the #CHROM value in every row of each chromosome file to the chr prefix plus chromosome number. I could then check whether the bcftools annotate command replaces the ID in the target reference VCF with the matching rsID. The aim is to get an imputed VCF with this ID type using minimac4.

The reference VCF whose ID content I would like to replace is called:
20220422_3202_phased_SNV_INDEL_SV 1kGP_high_coverage_Illumina.chr1.filtered.SNV_INDEL_SV_phased_panel.vcf.gz


This reference VCF is in this format:
#CHROM POS ID REF ALT
chr1    10390   1:10390:CCCCTAACCC:C    CCCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA    C

These are the bcftools commands that were tried:

bcftools annotate --set-id '%CHROM\_%POS\_%REF\_%ALT' /input.vcf -Ov -o /output.vcf /dbsnp.vcf


bcftools annotate -a /dbsnp.vcf -c ID -o /output.vcf /input.vcf


bcftools annotate -a /dbsnp.vcf -c ID /input.vcf -o /output.vcf


bcftools annotate -c CHROM,FROM,TO,ID -a /dbsnp.vcf -o /output.vcf /input.vcf.gz


Here are some header fields from the dbSNP file used:
##fileDate=20221104
##source=dbSNP
##dbSNP_BUILD_ID=156
##reference=GRCh38.p14
##phasing=partial

Here are the first few data rows of this dbSNP file:
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
NC_000001.11    10001   rs1570391677    T       A,C     .       .       RS=1570391677;dbSNPBuildID=154;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV;R5;GNO;FREQ=KOREAN:0.9891,0.0109,.|SGDP_PRJ:0,1,.|dbGaP_PopFreq:1,.,0;COMMON
NC_000001.11    10002   rs1570391692    A       C       .       .       RS=1570391692;dbSNPBuildID=154;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV;R5;GNO;FREQ=KOREAN:0.9944,0.005597
NC_000001.11    10003   rs1570391694    A       C       .       .       RS=1570391694;dbSNPBuildID=154;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV;R5;GNO;FREQ=KOREAN:0.9902,0.009763
NC_000001.11    10007   rs1639538116    T       C,G     .       .       RS=1639538116;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV;R5;GNO;FREQ=dbGaP_PopFreq:1,0,0
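
For the contig mismatch shown above (NC_000001.11 in dbSNP versus chr1 in the target VCF), one possible approach is to rename the dbSNP contigs with bcftools annotate --rename-chrs before copying the IDs. The following is a minimal sketch, not a tested pipeline. The file names dbsnp.vcf.gz, input.vcf.gz, dbsnp.chr.vcf.gz, output.vcf.gz and chr_map.txt are placeholders. The accession list assumes the GRCh38 RefSeq chromosome versions and should be checked against the contig names in the dbSNP file. It also assumes both VCFs are bgzip-compressed, since bcftools annotate -a expects a compressed, indexed annotation file.

```shell
#!/bin/sh
# Sketch: make dbSNP contig names match a chr-prefixed VCF, then copy rsIDs.
# All file names here are placeholders, not files from the original post.

# Write a two-column "old_name new_name" map for bcftools --rename-chrs.
# Assumption: GRCh38 RefSeq accession versions; verify against your dbSNP file.
write_chr_map() {
    cat > "$1" <<'EOF'
NC_000001.11 chr1
NC_000002.12 chr2
NC_000003.12 chr3
NC_000004.12 chr4
NC_000005.10 chr5
NC_000006.12 chr6
NC_000007.14 chr7
NC_000008.11 chr8
NC_000009.12 chr9
NC_000010.11 chr10
NC_000011.10 chr11
NC_000012.12 chr12
NC_000013.11 chr13
NC_000014.9 chr14
NC_000015.10 chr15
NC_000016.10 chr16
NC_000017.11 chr17
NC_000018.10 chr18
NC_000019.10 chr19
NC_000020.11 chr20
NC_000021.9 chr21
NC_000022.11 chr22
NC_000023.11 chrX
NC_000024.10 chrY
EOF
}

write_chr_map chr_map.txt

# Run the bcftools steps only when the tool and both input files are present.
if command -v bcftools >/dev/null 2>&1 && [ -f dbsnp.vcf.gz ] && [ -f input.vcf.gz ]; then
    # 1. Rename NC_* contigs to chr* and write a bgzip-compressed copy.
    bcftools annotate --rename-chrs chr_map.txt -Oz -o dbsnp.chr.vcf.gz dbsnp.vcf.gz
    # 2. Build a tabix index so the renamed file can be used with -a.
    bcftools index -t dbsnp.chr.vcf.gz
    # 3. Copy the ID column from dbSNP into the target VCF.
    bcftools annotate -a dbsnp.chr.vcf.gz -c ID -Oz -o output.vcf.gz input.vcf.gz
fi
```

Contigs that are not listed in the map keep their original names. Whether every record receives an rsID depends on how the REF and ALT alleles are represented in the two files, so the output is worth spot-checking against a few known positions.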

Jairo Navarro Gonzalez

Dec 8, 2023, 7:17:14 PM
to 171se, gen...@soe.ucsc.edu

Hello,

Thank you for using the UCSC Genome Browser and sending your inquiry.

Unfortunately, you will have to ask NCBI's help desk for more details about your VCF file, as we do not control the data. You can email them using the following online form:

https://support.nlm.nih.gov/support/create-case/

If you still experience issues with bcftools after contacting the NCBI help desk, you should ask the bcftools authors for support with their software.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro
UCSC Genome Browser

