The CHIPMIX is empty


Jordi Valls

Aug 28, 2018, 6:16:26 AM
to verifyBamID
Dear all,

I ran verifyBamID to check my samples for contamination. All of my samples have been genotyped on an array and whole-genome sequenced (WGS). The command line I used is:

verifyBamID --vcf x.vcf --bam x.bam --out CWGS144 --verbose --ignoreRG

The VCF comes from our genotyping array. Here is an example record:

1       982260  rs41285816      G       A       .       .       PR      GT      0/1

To my surprise, verifyBamID produced this output:

#SEQ_ID RG      CHIP_ID #SNPS   #READS  AVG_DP  FREEMIX    FREELK1   FREELK0    FREE_RH FREE_RA CHIPMIX CHIPLK1 CHIPLK0 CHIP_RH CHIP_RA DPREF   RDPHET  RDPALT
CWGS144 ALL     NA      188001     3639817   19.36      0.00028   17883159.13 17883348.79     NA              NA          NA         NA           NA         NA          NA        NA           NA         NA

Why do I get NA for CHIPMIX? I understand that FREEMIX is for samples whose genotypes are not included in the VCF, and the result shows that my sample is not contaminated. But I expected a CHIPMIX value for my sample, and I don't know why it is empty. Does the VCF need more information?

Thanks for your help.

Jordi

Hyun Min Kang

Aug 28, 2018, 9:57:52 AM
to verif...@googlegroups.com
If CHIPMIX is empty, that means verifyBamID did not find any individual genotypes in your VCF file. Is that the case?


Jordi Valls

Aug 28, 2018, 10:06:19 AM
to verifyBamID
Hi Hyun,

As you can see in my first post, I do have the genotype, because I ran both a microarray and WGS on the same sample. The example VCF record in that post contains GT 0/1, but I don't know whether more information is needed. Here is an example of the genotyping array result:


1       982260  rs41285816      G       A       .       .       PR      GT      0/1

I have the genotype, but I still get no CHIPMIX result. Do I need more fields than GT?
My first post has the command and all the details.

Thanks for your help

Hyun Min Kang

Aug 28, 2018, 10:13:14 AM
to verif...@googlegroups.com
Does the VCF header contain proper sample ID?

Thanks,
Hyun.
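
One way to check this, as a minimal sketch: in a VCF, the sample IDs are the column names that follow FORMAT on the #CHROM header line (column 10 onward), so printing those columns shows which samples the file carries. The helper name vcf_samples and the demo sample name SAMPLE_A are made up for illustration, and the file is assumed to be tab-separated, as the VCF spec requires.

```shell
#!/bin/sh
# Print the sample IDs of a VCF, one per line: these are the column names
# from column 10 onward on the tab-separated #CHROM header line.
vcf_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Demo on a tiny throwaway VCF (the sample name SAMPLE_A is hypothetical).
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\n' >> "$demo_vcf"
printf '1\t982260\trs41285816\tG\tA\t.\t.\tPR\tGT\t0/1\n' >> "$demo_vcf"
vcf_samples "$demo_vcf"
```

If this prints nothing, the VCF has no per-sample genotype columns at all, which is the situation described earlier in the thread for an empty CHIPMIX.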

Hyun Min Kang

Aug 28, 2018, 10:13:55 AM
to verif...@googlegroups.com
Also, make sure that the --chip-mix option is turned on.

Hyun Min Kang

Aug 28, 2018, 10:15:15 AM
to verif...@googlegroups.com
Plus, if you set the --best option, it will try to find the best-matching individual in the VCF, although this can be slow if the VCF contains many individuals.

Jordi Valls

Aug 28, 2018, 11:03:44 AM
to verifyBamID
Hi Hyun, I tried each of these options. When I add the --best option, it does not seem to take effect, and I don't know why. The same happens with --chip-mix. Here is the command I am using now:
verifyBamID --vcf /gpfs/scratch/bsc05/bsc05138/GCAT1/genotype_Samples_1_10/CWGS144.vcf --bam .markdups.bam --out CWGS144_precise --chip-mix --best --self  --verbose --ignoreRG --precise  --maxDepth 30
How can I see the sample ID in the VCF header? The header of my VCF is:
##fileformat=VCFv4.2
##fileDate=20180827
##source=PLINKv1.90
##contig=<ID=1,length=248916509>
##contig=<ID=2,length=242099233>
##contig=<ID=3,length=198145290>
##contig=<ID=4,length=189994496>
##contig=<ID=5,length=181271153>
##contig=<ID=6,length=170581297>
##contig=<ID=7,length=159329970>
##contig=<ID=8,length=145067296>
##contig=<ID=9,length=138199864>
##contig=<ID=10,length=133620800>
##contig=<ID=11,length=135075816>
##contig=<ID=12,length=133261768>
##contig=<ID=13,length=114341522>
##contig=<ID=14,length=106879457>
##contig=<ID=15,length=101858550>
##contig=<ID=16,length=90086939>
##contig=<ID=17,length=83203771>
##contig=<ID=18,length=80214184>
##contig=<ID=19,length=58582770>
##contig=<ID=20,length=64283879>
##contig=<ID=21,length=46661579>
##contig=<ID=22,length=50735904>
##contig=<ID=23,length=155763821>
##contig=<ID=24,length=26586955>
##contig=<ID=25,length=93120357>
##contig=<ID=26,length=16070>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  13_CWGS102

Thanks for your help

Jordi

Hyun Min Kang

Aug 28, 2018, 2:52:46 PM
to verif...@googlegroups.com
It seems that you need to change the sample IDs in the VCF to be consistent with the BAM file. Have you done that?

When you use the --best option, do not use the --self option. --site, --self, and --best are mutually exclusive options.

Hyun.
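
As a rough sketch of what "consistent with the BAM file" can mean in practice: per the SAM spec, a BAM's sample name lives in the SM field of its @RG header lines. That this is the ID compared against the VCF sample column is an assumption here, not something stated in the thread. The helper below reads SAM header text on stdin, so with a real file it would be fed from `samtools view -H x.bam` (assuming samtools is installed). The names sam_header_samples, lane1, lane2, SAMPLE_A, and SAMPLE_B are hypothetical.

```shell
#!/bin/sh
# Print the unique SM (sample) values found on @RG lines of SAM header
# text read from stdin.
sam_header_samples() {
    awk -F'\t' '/^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print substr($i, 4) }' | sort -u
}

# Demo header text; with a real BAM, use: samtools view -H x.bam | sam_header_samples
demo_header=$(printf '@HD\tVN:1.6\tSO:coordinate\n@RG\tID:lane1\tSM:SAMPLE_B\n@RG\tID:lane2\tSM:SAMPLE_B\n')
bam_ids=$(printf '%s\n' "$demo_header" | sam_header_samples)

# Hypothetical sample ID, as it would be read from the VCF #CHROM line.
vcf_id="SAMPLE_A"

# Exact, whole-line comparison of the VCF ID against the BAM sample names.
if printf '%s\n' "$bam_ids" | grep -qxF -- "$vcf_id"; then
    echo "match: $vcf_id"
else
    echo "mismatch: VCF has $vcf_id, BAM has $bam_ids"
fi
```

The comparison is deliberately exact, so a prefix or suffix difference between the two IDs is reported as a mismatch.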

Jordi Valls

Aug 28, 2018, 5:36:16 PM
to verifyBamID
Hi Hyun,

I will try it! But my question is where the sample IDs are. Do you mean the ID column of the VCF (does that mean the ID of the reads?), or the ID of the sample? If it is the sample ID, I don't know how to change it, because I don't know where that ID is stored. Sorry for the question.
Thanks for clarifying the --best option.

Thanks for your help

Jordi


Hyun Min Kang

Aug 28, 2018, 6:34:37 PM
to verif...@googlegroups.com
Jordi, I suggest reading the VCF and SAM specs to figure out how to set the sample IDs properly.


Hyun.
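
For a single-sample VCF, setting the sample ID comes down to rewriting column 10 of the #CHROM line. The sketch below does that with awk; it is one possible approach under stated assumptions, not the procedure the poster ended up using. The helper name rename_vcf_sample and the IDs OLD_ID and NEW_ID are hypothetical, and the input is assumed to be an uncompressed, tab-separated VCF with exactly one sample column. For multi-sample or compressed files, `bcftools reheader --samples` is the more usual tool.

```shell
#!/bin/sh
# Replace the sample name (column 10 of the #CHROM line) in a VCF read
# from stdin and write the result to stdout. All other lines pass through
# unchanged. Usage: rename_vcf_sample NEW_ID < in.vcf > out.vcf
rename_vcf_sample() {
    awk -F'\t' -v OFS='\t' -v id="$1" '
        /^#CHROM/ && NF >= 10 { $10 = id }   # skip sites-only headers
        { print }
    '
}

# Demo on inline text; OLD_ID and NEW_ID are placeholder names.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tOLD_ID\n1\t982260\trs41285816\tG\tA\t.\t.\tPR\tGT\t0/1\n' |
    rename_vcf_sample NEW_ID
```

Writing to a new file rather than editing in place keeps the original array export intact in case the new ID turns out to be wrong.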

Jordi Valls

Aug 31, 2018, 4:35:56 AM
to verifyBamID
Hi Hyun,

Thanks for your answer. I finally got a CHIPMIX value. The problem was the sample ID.

It is solved now.

Jordi