Editing dbSNP.vcf to use in vcfeval


Stéphane Plaisance

Apr 18, 2023, 7:46:19 AM
to rtg-...@realtimegenomics.com

Hi Len and all,

 

Ref: https://www.biostars.org/p/182127/

 

Could you please give a bit more help on how to adapt a dbSNP.vcf file so that it can be used in vcfeval as the -b (baseline) input?

Do we have to engineer a 9th FORMAT column containing GT, plus a 10th column with 'ALT' as the sample name and 1 as the genotype for every record in the file?

 

For example, as this awk command does:

 

```
# Append a FORMAT column (GT) and a pseudo-sample column named ALT with genotype 1
awk 'BEGIN{FS="\t"; OFS="\t"}{if ($0~/^##/) print $0; else if ($0~/^#CHROM/) print $0,"FORMAT","ALT"; else print $0,"GT",1}' gallus_gallus.vcf | bgzip -c > gallus_gallus_alt.vcf.gz
```

 

Thanks in advance,

 

Stephane

 

 

PS my chicken database looks like this:

 

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       249857  rs735818313     C       A       .       .       E_Multiple_observations;TSA=SNV;dbSNP_150
1       289870  rs1060254042    T       C       .       .       TSA=SNV;dbSNP_150
1       318447  rs1060384925    C       A       .       .       TSA=SNV;dbSNP_150
1       394741  rs16728075      G       A       .       .       E_Freq;TSA=SNV;dbSNP_150
1       394744  rs16728074      G       A       .       .       E_Freq;TSA=SNV;dbSNP_150
1       394746  rs16728073      C       T       .       .       E_Freq;TSA=SNV;dbSNP_150
1       394765  rs741618862     T       A       .       .       TSA=SNV;dbSNP_150
1       394776  rs731376058     T       C       .       .       TSA=SNV;dbSNP_150

 

--

Stephane Plaisance – NGS analysis specialist

VIB Nucleomics Core
Herestraat 49 – Post Box 816 – 3000 Leuven – Belgium
O&N4 Building – 8th Floor – Room 08.429
Tel. +32 16 32 00 60
www.nucleomicscore.sites.vib.be


 

Len Trigg

Apr 18, 2023, 4:29:55 PM
to Stéphane Plaisance, rtg-...@realtimegenomics.com
Hi Stéphane,

Hope all is well with you. You should be able to use the dbSNP VCF file directly without having to alter it. The trick though is that you need to use the --sample flag to tell it to work directly from the ALT alleles in the baseline file (-b). You will want either --sample ALT,ALT (for the case where your calls (-c) file also has no sample column), or --sample ALT,mysample (for the case where your calls has a sample column named "mysample"). You probably also want --squash-ploidy when using such database files.
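
The advice above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `sample_arg` and `vcfeval_cmd` are made up for this example, and the file names passed to them are placeholders. The sketch looks at the `#CHROM` header line of the calls file to decide between `--sample ALT,ALT` and `--sample ALT,<name>`, then prints (rather than runs) a matching `rtg vcfeval` command. It also assumes that the first sample column of the calls file is the one to evaluate.

```shell
#!/bin/sh
# Hypothetical helper: choose the --sample value from the calls VCF header.
# Prints "ALT,ALT" when the calls file has no sample column,
# otherwise "ALT,<name of the first sample column>".
sample_arg() {
    # $1: calls VCF, plain text or gzip/bgzip compressed
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk -F'\t' '
        /^#CHROM/ {
            # Columns 1-8 are fixed, column 9 is FORMAT, samples start at 10.
            if (NF >= 10) print "ALT," $10; else print "ALT,ALT"
            exit
        }'
}

# Hypothetical helper: dry run that prints the vcfeval command.
# $1: baseline (dbSNP) VCF, $2: calls VCF, $3: reference SDF, $4: output dir
vcfeval_cmd() {
    echo "rtg vcfeval -b $1 -c $2 -t $3 -o $4 --sample $(sample_arg "$2") --squash-ploidy"
}
```

For instance, `vcfeval_cmd dbsnp.vcf.gz calls.vcf.gz reference.sdf eval_out` prints a command line that can be reviewed before running it. Note that vcfeval expects both VCF inputs to be bgzip-compressed and tabix-indexed, and the reference to be supplied in SDF format via -t.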

Cheers,
Len.
