Hi all,
I am posting the verifyBamID results for a few of the samples I have been analysing. I have a couple of questions:
1. My understanding is that all samples are clearly mislabelled, since CHIPMIX is close to 1 for every sample, and that the correct labels are given by the bestRG file. Is that correct?
2. The wiki page says that CHIP_ID in the selfSM and selfRG files should always equal SEQ_ID, but in my selfRG file SEQ_ID and CHIP_ID differ. Why is that?
3. Why is the IBD very high for both the self individual and the best-matching individual in the verbose output?
LOGS:
selfSM:
#SEQ_ID CHIP_ID FREEMIX CHIPMIX
sample_841 sample_841 0.00912 0.99553
sample_845 sample_845 0.01125 0.99451
sample_930 sample_930 0.00975 0.99459
sample_931 sample_931 0.01073 0.99449
sample_935 sample_935 0.00865 0.99606
sample_936 sample_936 0.00660 0.99708
sample_937 sample_937 0.00567 0.99744
sample_939 sample_939 0.00555 0.99747
sample_940 sample_940 0.01142 0.99432
sample_941 sample_941 0.00844 0.99617
sample_943 sample_943 0.00832 0.99613
selfRG:
#SEQ_ID CHIP_ID FREEMIX CHIPMIX
sample_841 sample_843 0.00912 0.99553
sample_845 sample_856 0.01125 0.99451
sample_930 sample_833 0.00975 0.99459
sample_931 sample_834 0.01073 0.99449
sample_935 sample_838 0.00865 0.99606
sample_936 sample_839 0.00660 0.99708
sample_937 sample_840 0.00567 0.99744
sample_939 sample_842 0.00555 0.99747
sample_940 sample_844 0.01142 0.99432
sample_941 sample_845 0.00844 0.99617
sample_943 sample_847 0.00832 0.99613
bestRG:
#SEQ_ID CHIP_ID FREEMIX CHIPMIX
sample_841 sample_843 0.00912 0.00846
sample_845 sample_856 0.01125 0.01097
sample_930 sample_833 0.00975 0.00890
sample_931 sample_834 0.01073 0.00994
sample_935 sample_838 0.00865 0.00816
sample_936 sample_839 0.00660 0.00609
sample_937 sample_840 0.00567 0.00559
sample_939 sample_842 0.00555 0.00613
sample_940 sample_844 0.01142 0.01117
sample_941 sample_845 0.00844 0.00793
sample_943 sample_847 0.00832 0.00798
bestSM:
#SEQ_ID CHIP_ID FREEMIX CHIPMIX
sample_841 sample_843 0.00912 0.00846
sample_845 sample_856 0.01125 0.01097
sample_930 sample_833 0.00975 0.00890
sample_931 sample_834 0.01073 0.00994
sample_935 sample_838 0.00865 0.00816
sample_936 sample_839 0.00660 0.00609
sample_937 sample_840 0.00567 0.00559
sample_939 sample_842 0.00555 0.00613
sample_940 sample_844 0.01142 0.01117
sample_941 sample_845 0.00844 0.00793
sample_943 sample_847 0.00832 0.00798
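The reasoning in question 1 (the labelled genotypes fit poorly, with selfSM CHIPMIX near 1, while a different chip sample fits well, with bestSM CHIPMIX near 0) can be applied to the tables mechanically. Below is a minimal Python sketch with only three rows copied from the selfSM and bestSM tables; the names `SMRow`, `parse_sm`, `flag_swaps` and the 0.9 / 0.05 cut-offs are hypothetical choices, not verifyBamID defaults.

```python
from typing import Dict, List, NamedTuple, Tuple

# Three rows copied from the selfSM and bestSM tables above; add the rest as needed.
SELF_SM = """\
#SEQ_ID CHIP_ID FREEMIX CHIPMIX
sample_841 sample_841 0.00912 0.99553
sample_845 sample_845 0.01125 0.99451
sample_930 sample_930 0.00975 0.99459
"""

BEST_SM = """\
#SEQ_ID CHIP_ID FREEMIX CHIPMIX
sample_841 sample_843 0.00912 0.00846
sample_845 sample_856 0.01125 0.01097
sample_930 sample_833 0.00975 0.00890
"""


class SMRow(NamedTuple):
    seq_id: str
    chip_id: str
    freemix: float
    chipmix: float


def parse_sm(text: str) -> Dict[str, SMRow]:
    """Parse the four-column tables shown above, keyed by SEQ_ID (header skipped)."""
    rows: Dict[str, SMRow] = {}
    for line in text.strip().splitlines():
        if line.startswith("#"):
            continue
        seq_id, chip_id, freemix, chipmix = line.split()
        rows[seq_id] = SMRow(seq_id, chip_id, float(freemix), float(chipmix))
    return rows


def flag_swaps(self_sm: Dict[str, SMRow], best_sm: Dict[str, SMRow],
               high: float = 0.9, low: float = 0.05) -> List[Tuple[str, str, str]]:
    """Return (SEQ_ID, labelled CHIP_ID, best CHIP_ID) for likely label swaps.

    Hypothetical rule: the labelled chip fits poorly (CHIPMIX >= high) while a
    different chip sample fits well (bestSM CHIPMIX <= low).
    """
    flagged = []
    for seq_id, s in self_sm.items():
        b = best_sm.get(seq_id)
        if b is None:
            continue
        if s.chipmix >= high and b.chipmix <= low and b.chip_id != s.chip_id:
            flagged.append((seq_id, s.chip_id, b.chip_id))
    return flagged


if __name__ == "__main__":
    for seq_id, labelled, best in flag_swaps(parse_sm(SELF_SM), parse_sm(BEST_SM)):
        print(f"{seq_id}: labelled as {labelled}, best genotype match {best}")
```

With all eleven rows pasted in, every sample in these tables would be flagged, which is the pattern question 1 describes.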
Verbose output:
sample_841.log: Best Matching Individual is sample_843 with IBD = 0.991544
sample_841.log: Self Individual is sample_841 with IBD = 0.995526
sample_845.log: Best Matching Individual is sample_856 with IBD = 0.989027
sample_845.log: Self Individual is sample_845 with IBD = 0.994510
sample_930.log: Best Matching Individual is sample_833 with IBD = 0.991100
sample_930.log: Self Individual is sample_930 with IBD = 0.994592
sample_931.log: Best Matching Individual is sample_834 with IBD = 0.990062
sample_931.log: Self Individual is sample_931 with IBD = 0.994487
sample_935.log: Best Matching Individual is sample_838 with IBD = 0.991842
sample_935.log: Self Individual is sample_935 with IBD = 0.996062
sample_936.log: Best Matching Individual is sample_839 with IBD = 0.993906
sample_936.log: Self Individual is sample_936 with IBD = 0.997082
sample_937.log: Best Matching Individual is sample_840 with IBD = 0.994406
sample_937.log: Self Individual is sample_937 with IBD = 0.997440
sample_939.log: Best Matching Individual is sample_842 with IBD = 0.993873
sample_939.log: Self Individual is sample_939 with IBD = 0.997469
sample_940.log: Best Matching Individual is sample_844 with IBD = 0.988828
sample_940.log: Self Individual is sample_940 with IBD = 0.994324
sample_941.log: Best Matching Individual is sample_845 with IBD = 0.992072
sample_941.log: Self Individual is sample_941 with IBD = 0.996170
sample_943.log: Best Matching Individual is sample_847 with IBD = 0.992024
sample_943.log: Self Individual is sample_943 with IBD = 0.996127
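One observation that may help with question 3: in the values posted here, each "Self Individual" IBD matches the selfSM CHIPMIX, and each "Best Matching Individual" IBD matches 1 - bestSM CHIPMIX, to the five decimals the tables show (e.g. for sample_841, 1 - 0.00846 = 0.99154 vs. 0.991544). Below is a minimal Python sketch that checks this for three samples copied from the logs. The regex and helper names are hypothetical, and the check only compares the printed numbers; it says nothing about how verifyBamID computes either value.

```python
import re
from typing import Dict, Tuple

# Verbose lines and CHIPMIX values copied from the logs above (three samples shown).
VERBOSE = """\
sample_841.log: Best Matching Individual is sample_843 with IBD = 0.991544
sample_841.log: Self Individual is sample_841 with IBD = 0.995526
sample_845.log: Best Matching Individual is sample_856 with IBD = 0.989027
sample_845.log: Self Individual is sample_845 with IBD = 0.994510
sample_930.log: Best Matching Individual is sample_833 with IBD = 0.991100
sample_930.log: Self Individual is sample_930 with IBD = 0.994592
"""
SELF_CHIPMIX = {"sample_841": 0.99553, "sample_845": 0.99451, "sample_930": 0.99459}
BEST_CHIPMIX = {"sample_841": 0.00846, "sample_845": 0.01097, "sample_930": 0.00890}

LINE_RE = re.compile(
    r"^(\S+)\.log: (Best Matching|Self) Individual is (\S+) with IBD = ([0-9.]+)$"
)


def parse_verbose(text: str) -> Dict[str, Dict[str, Tuple[str, float]]]:
    """Map each sample to {"Self": (id, ibd), "Best Matching": (id, ibd)}."""
    parsed: Dict[str, Dict[str, Tuple[str, float]]] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            sample, kind, individual, ibd = m.groups()
            parsed.setdefault(sample, {})[kind] = (individual, float(ibd))
    return parsed


def compare(parsed, tol: float = 1e-5) -> Dict[str, Tuple[bool, bool]]:
    """Per sample: (self IBD ~ selfSM CHIPMIX, best IBD ~ 1 - bestSM CHIPMIX).

    tol covers the rounding of the tables to five decimals.
    """
    result = {}
    for sample, rec in parsed.items():
        self_ibd = rec["Self"][1]
        best_ibd = rec["Best Matching"][1]
        result[sample] = (
            abs(self_ibd - SELF_CHIPMIX[sample]) <= tol,
            abs(best_ibd - (1.0 - BEST_CHIPMIX[sample])) <= tol,
        )
    return result


if __name__ == "__main__":
    for sample, (self_ok, best_ok) in compare(parse_verbose(VERBOSE)).items():
        print(sample, "self IBD ~ selfSM CHIPMIX:", self_ok,
              "| best IBD ~ 1 - bestSM CHIPMIX:", best_ok)
```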
PS: On the VCF processing: this is RNA-Seq data, so I kept only SNPs that overlap an annotated exon, since SNPs outside exons are rarely covered by RNA-Seq reads and so carry little information. This reduces the run time and should give more accurate estimates.
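For reference, below is a minimal Python sketch of this kind of exon-based SNP filtering. It assumes a BED file of annotated exons and a plain-text VCF, and it is not necessarily how the VCF above was prepared; tools such as bedtools intersect do the same job in practice. The function names are hypothetical. Note the coordinate conversion: BED intervals are 0-based and half-open, while VCF POS is 1-based.

```python
import bisect
from typing import Dict, Iterable, Iterator, List, Tuple


def load_exons(bed_lines: Iterable[str]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Read BED intervals (0-based, half-open) and merge overlaps per chromosome."""
    raw: Dict[str, List[Tuple[int, int]]] = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw.setdefault(chrom, []).append((int(start), int(end)))
    merged: Dict[str, Tuple[List[int], List[int]]] = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        starts: List[int] = []
        ends: List[int] = []
        for s, e in intervals:
            if ends and s <= ends[-1]:
                ends[-1] = max(ends[-1], e)  # overlapping or adjacent: extend
            else:
                starts.append(s)
                ends.append(e)
        merged[chrom] = (starts, ends)
    return merged


def in_exon(exons, chrom: str, pos: int) -> bool:
    """True if the 1-based VCF position lies inside a merged exon interval."""
    if chrom not in exons:
        return False
    starts, ends = exons[chrom]
    p0 = pos - 1  # VCF POS is 1-based; BED coordinates are 0-based
    i = bisect.bisect_right(starts, p0) - 1  # last interval starting at or before p0
    return i >= 0 and p0 < ends[i]


def filter_vcf(vcf_lines: Iterable[str], exons) -> Iterator[str]:
    """Yield header lines unchanged and only records whose POS falls in an exon.

    Only POS is checked, which is sufficient for single-base SNPs.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if in_exon(exons, chrom, int(pos)):
            yield line
```

Usage would look like `filter_vcf(open("in.vcf"), load_exons(open("exons.bed")))` with hypothetical file names, writing the yielded lines to a new VCF. Merging the intervals first keeps each lookup a single binary search.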