Insufficient Available markers


haroo...@gmail.com

Dec 25, 2018, 3:20:20 AM
to verifyBamID
I ran VerifyBamID (https://github.com/Griffan/VerifyBamID) to check for contamination and sample swaps in RNA-Seq data, but the tool displays the message 'Insufficient Available markers'.


The command:
$VERIFY_BAM_ID_HOME/bin/VerifyBamID --BamFile $bamfile --Reference $ref --SVDPrefix $SVDfilesPrefix --NumThread 32 --OutputPileup

Here bamfile is the RNA-Seq sample alignment file, and SVDPrefix is the prefix of the auxiliary files generated from a multi-sample genotype file.


NOTICE - Process 1:249190891-249190891...
NOTICE - Process 1:249198692-249198692...
NOTICE - Number of marker in Reference Matrix:60794
NOTICE - Number of marker shared with input file:3901
NOTICE - Mean Depth:24.510126
NOTICE - SD Depth:189.026119
NOTICE - 3882 SNP markers remained after sanity check.

WARNING -
Insufficient Available markers, check input bam depth distribution in output pileup file after specifying --OutputPileup


Any help will be appreciated.
2.6.0.0

griffanz...@gmail.com

Dec 25, 2018, 1:27:23 PM
to verifyBamID
This warning means that too few markers are shared between your input file and the marker set in your reference matrix files.
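To put a number on the overlap, the two marker counts printed in the log can be compared directly. The following is only a sketch: marker_overlap_pct is a made-up helper name (not part of VerifyBamID), and it assumes the log was saved to a file with the NOTICE lines formatted exactly as in the first post.

```shell
# Print the percentage of reference-matrix markers that are shared with
# the input file, based on the NOTICE lines of a saved VerifyBamID log.
# marker_overlap_pct is a hypothetical helper, not a VerifyBamID feature.
marker_overlap_pct() {
    awk -F: '
        /Number of marker in Reference Matrix/    { ref = $NF + 0 }
        /Number of marker shared with input file/ { shared = $NF + 0 }
        END {
            # Fail instead of dividing by zero when the counts are missing.
            if (ref > 0) printf "%.2f\n", 100 * shared / ref
            else         exit 1
        }
    ' "$1"
}
```

For the log quoted in the first post (60794 markers in the reference matrix, 3901 shared), this prints 6.42, i.e. only about 6% of the reference markers are shared with the RNA-Seq BAM.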

You may consider creating your own reference matrix with an updated marker set.
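For RNA-Seq, one possible starting point is to keep only the VCF sites that fall inside exons before building the reference matrix. Below is a minimal sketch, assuming a tab-separated BED file of exon intervals (0-based, half-open) and an uncompressed VCF with matching chromosome names. subset_vcf_to_bed is a hypothetical helper, and the linear scan over intervals will be slow for genome-wide inputs. Whether the resulting marker set is large enough for a stable estimate still has to be checked.

```shell
# Keep VCF header lines plus every record whose POS lies inside a BED interval.
# Usage: subset_vcf_to_bed exons.bed input.vcf > exonic.vcf
# subset_vcf_to_bed is a hypothetical helper written for illustration only.
subset_vcf_to_bed() {
    awk -F'\t' '
        # First file (BED): store the intervals per chromosome.
        NR == FNR {
            n[$1]++
            start[$1, n[$1]] = $2 + 0
            end[$1, n[$1]]   = $3 + 0
            next
        }
        # Second file (VCF): pass header lines through unchanged.
        /^#/ { print; next }
        # BED is 0-based half-open, VCF POS is 1-based: start < POS <= end.
        {
            pos = $2 + 0
            for (i = 1; i <= n[$1]; i++)
                if (pos > start[$1, i] && pos <= end[$1, i]) { print; break }
        }
    ' "$1" "$2"
}
```

Established tools such as bcftools view with a regions file, or bedtools intersect, do the same job more efficiently on compressed and indexed files.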

haroo...@gmail.com

Dec 26, 2018, 2:47:07 AM
to verifyBamID

Thanks for your message, highly appreciated.

 

VerifyBamID outputs the same warning message even with the test data provided by the authors: https://github.com/Griffan/VerifyBamID/tree/master/resource/test. What is the expected outcome for the test data? In general, test data is provided to help users understand the tool's output and is expected to run cleanly as is. Please advise.

 

Also, what is the minimum required overlap between the marker set in the RNA-Seq BAM file and the marker set in the multi-sample genotype (VCF) file? Is it possible to change the "insufficient overlap" requirement? My concern is that most of the markers in the VCF file seem to lie outside the exonic regions (exonic regions comprise only 1-2% of the whole genome). I am testing VerifyBamID to check for contamination and sample swaps in RNA-Seq samples. How can I subset my whole-genome multi-sample VCF file so that it passes the overlap requirement for an RNA-Seq sample?

 

Any advice will be highly appreciated.

 

Regards,


Haroon

2.6.0.0

griffanz...@gmail.com

Dec 27, 2018, 11:02:18 PM
to verifyBamID
The test dataset is a very small BAM file that only serves to validate basic functionality after compiling the source code. (We will consider replacing this BAM file with one that gives a meaningful estimate.)

If you want to run vb2 regardless of the marker-set overlap, you can specify --DisableSanityCheck to ignore this warning. However, the contamination estimate may be less stable than one based on more markers (e.g. 10K or 100K).
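As a sketch, this is the command from the first post with the flag added, wrapped in a shell function. run_vb2_no_sanity_check is a hypothetical name; the flags are the ones already mentioned in this thread, and VERIFY_BAM_ID_HOME is assumed to point at the install directory as in the original command.

```shell
# Run VerifyBamID with the marker sanity check disabled.
# Usage: run_vb2_no_sanity_check sample.bam reference.fa svd_prefix
# The function name is hypothetical; all flags are taken from this thread.
run_vb2_no_sanity_check() {
    "$VERIFY_BAM_ID_HOME/bin/VerifyBamID" \
        --BamFile "$1" \
        --Reference "$2" \
        --SVDPrefix "$3" \
        --NumThread 32 \
        --OutputPileup \
        --DisableSanityCheck
}
```

Keeping --OutputPileup makes it possible to inspect the depth distribution at the shared markers afterwards, as the warning message itself suggests.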

vb2 doesn't include chip-genotype-aided sample swap detection (for now); you may refer to vb1 for this purpose.