VCF input when genotype information is not available

Kyoungil Min

Feb 26, 2020, 10:58:16 AM
to verifyBamID
Dear Dr. Kang,

Hi, I'm a graduate student from Korea, and I found this tool while studying cross-contamination issues. Thank you for creating such a useful tool.


Since many WGS BAM files are now generated without external genotype data, I was wondering what the best choice of input VCF is when genotypes are unavailable. As I understand it, the VCF must meet the following criteria: 1) contain only SNPs, 2) contain only biallelic sites, 3) contain more than 1,000 variant sites, and 4) carry AF in the INFO field. I'm trying to build one large VCF that can be used across different BAM files and still give consistently accurate results, so I'm pulling variant sites from VCFs released by the 1000 Genomes Project and gnomAD. What would be the best strategy for this? I also have some additional questions.
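
To make the four criteria above concrete, here is a minimal sketch of the filter I have in mind. It is plain Python over tab-separated VCF lines, not anything taken from verifyBamID itself; the function names `is_usable_site` and `filter_sites` are my own, and the 1,000-site threshold is only my reading of the requirements.

```python
def is_usable_site(vcf_line):
    """Return True if a VCF data line is a biallelic SNP with AF in INFO."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # not a complete VCF data line
    ref, alt, info = fields[3], fields[4], fields[7]
    # Biallelic: exactly one ALT allele (no comma-separated list).
    if "," in alt:
        return False
    # SNP: REF and ALT are each a single base.
    bases = {"A", "C", "G", "T"}
    if ref.upper() not in bases or alt.upper() not in bases:
        return False
    # AF must appear in INFO as its own key (EAS_AF= etc. does not count here).
    return any(entry.startswith("AF=") for entry in info.split(";"))


def filter_sites(lines, min_sites=1000):
    """Split VCF lines into header and usable sites.

    min_sites is an assumed threshold; the third return value says
    whether enough sites survived the filter.
    """
    header, kept = [], []
    for line in lines:
        if line.startswith("#"):
            header.append(line)
        elif is_usable_site(line):
            kept.append(line)
    return header, kept, len(kept) >= min_sites
```

Calling `filter_sites(open("sites.vcf"))` (with a hypothetical uncompressed file name) would return the header lines, the retained site lines, and a flag for whether the site count reaches the threshold.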


1) It looks like only one BAM file can be submitted at a time. If I have 4 separate BAM files from 4 different individuals (each BAM file contains a single read group), should I merge them into one BAM file with different read groups? And would that be pointless if I have no genotype VCF giving the genotypes of each sample? In other words, without genotype VCFs, will the tool estimate cross-contamination of one individual against any possible individual rather than against one of the other 3 individuals (so that the result is the same as running the tool 4 times on the separate BAM files)?

2) I know this tool was originally designed for human samples, but since the program does not require a reference file, can it be run on mouse samples if a suitable input VCF is available?

3) I read in another post that the tool also works on WGS (but not WES) tumor samples. A tumor WGS analysis has a tumor BAM and a matched normal BAM. If the tool is applied to the tumor BAM, could the FREEMIX score indicate the percentage of normal tissue contamination in the tumor sample?

4) If I run the tool on a tumor BAM and its matched normal BAM with no genotype VCFs, is it possible to detect a tumor-normal swap from the output?


Thanks again for this wonderful tool.
Kyoung Il Min