Small pipeline I created to use VerifyBAMID. Does it make sense?

134 views

Skip to first unread message

Albert Vilella

unread,

Jan 15, 2018, 4:46:03 AM1/15/18

to verifyBamID

Hi all,

I am trying to put together a QC tool to be able to identify sample swaps or meta data misannotations of my samples.

My data consists of low coverage WGS bam files (2-4x coverage).

I came up with this small pipeline which I would like the VerifyBAMID authors to comment on:

- Run angsd on set of diverse bams, produce mafs.gz file:

angsd -nThreads 4 -b bam.filelist -out CEGX_Run497.bams.46.angsd -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3

This produces a mafs.gz file where column 7 is 'number of individuals'.

In parallel, I run my variant caller on each of the bam files I want to query (currently using GATK Lite Unified Genotyper), obtaining the .vcf.gz files for each bam.

I then subset my vcf.gz files with the SNPs found by angsd using a number of individuals threshold, e.g. the median or mean or a certain number that is big enough (I don't have a good statistical measure to choose this ATM).

With my subset .vcf.gz files, I then run VerifyBAMID on each .vcf.gz/.bam pair combination, producing a semi-matrix of results:

$verifybamid --smID $pair_id --ignoreRG --maxDepth $maxdepth --noPhoneHome --vcf $vcffile --bam $inputfile --out $outpref 2>&1 | grep -v 'WARNING : Skipping'

From my test, when I do the .vcf.gz + .bam pairs of the same individuals (diagonal in the matrix), I am getting contamination values close to 50%. I am unsure why they are not close to 100%. Should they be? Should I change a parameter for this?

For the .vcf.gz / .bam combos outside the diagonal, I am getting values between 0.9% and 7%, although these values seem to be conditional to sequencing depth, where the higher values have higher coverage.

Any comments/feedback from my current pipeline?

Hyun Min Kang

unread,

Jan 15, 2018, 7:44:18 AM1/15/18

to verif...@googlegroups.com

Your input VCF has to be from external source, not from the array-based SNP data.

Hyun.

--
You received this message because you are subscribed to the Google Groups "verifyBamID" group.
To unsubscribe from this group and stop receiving emails from it, send an email to verifybamid...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all

Reply to author

Forward

0 new messages