How do I run JointSNVMix 0.8 properly?


vue um

Oct 13, 2013, 10:38:58 PM
to jointsnvmix...@googlegroups.com
Suppose I have six pairs of normal-tumor bams. Is this the right way to train and call SNPs?

jsm.py train --model snvmix2 --priors_file config/priors.cfg --initial_parameters_file config/params.cfg hg19.fa 1N.bam 1T.bam snvmix2.cfg
jsm.py train --model snvmix2 hg19.fa 2N.bam 2T.bam snvmix2.cfg
jsm.py train --model snvmix2 hg19.fa 3N.bam 3T.bam snvmix2.cfg
jsm.py train --model snvmix2 hg19.fa 4N.bam 4T.bam snvmix2.cfg
jsm.py train --model snvmix2 hg19.fa 5N.bam 5T.bam snvmix2.cfg
jsm.py train --model snvmix2 hg19.fa 6N.bam 6T.bam snvmix2.cfg

jsm.py classify --model snvmix2 --parameters_file snvmix2.cfg --out_file 1.jsm hg19.fa 1N.bam 1T.bam
jsm.py classify --model snvmix2 --parameters_file snvmix2.cfg --out_file 2.jsm hg19.fa 2N.bam 2T.bam
jsm.py classify --model snvmix2 --parameters_file snvmix2.cfg --out_file 3.jsm hg19.fa 3N.bam 3T.bam
jsm.py classify --model snvmix2 --parameters_file snvmix2.cfg --out_file 4.jsm hg19.fa 4N.bam 4T.bam
jsm.py classify --model snvmix2 --parameters_file snvmix2.cfg --out_file 5.jsm hg19.fa 5N.bam 5T.bam
jsm.py classify --model snvmix2 --parameters_file snvmix2.cfg --out_file 6.jsm hg19.fa 6N.bam 6T.bam

I presume in the output, rows with high p_AA_AB, p_AA_BB, p_AB_AA, p_AB_BB, p_BB_AA, p_BB_AB values are somatic mutations, right? 

aroth

Oct 15, 2013, 12:37:14 PM
to jointsnvmix...@googlegroups.com
Hi Vue,
The right way to run JointSNVMix (JSM) on one sample is to `train` and then `classify`, and you need to do this for each tumour/normal pair separately. Following your example, you would have:

# Analyse sample 1
jsm.py train --model snvmix2 --priors_file config/priors.cfg --initial_parameters_file config/params.cfg hg19.fa 1N.bam 1T.bam snvmix2.sample_1.cfg
jsm.py classify --model snvmix2 --parameters_file snvmix2.sample_1.cfg --out_file 1.jsm hg19.fa 1N.bam 1T.bam

# Analyse sample 2
jsm.py train --model snvmix2 --priors_file config/priors.cfg --initial_parameters_file config/params.cfg hg19.fa 2N.bam 2T.bam snvmix2.sample_2.cfg
jsm.py classify --model snvmix2 --parameters_file snvmix2.sample_2.cfg --out_file 2.jsm hg19.fa 2N.bam 2T.bam

.
.
.

Notice that each sample learns its own unique parameter file (snvmix2.sample_1.cfg, snvmix2.sample_2.cfg, etc.). It is important to train on each sample separately, as this is JSM's only mechanism for dealing with tumour content and other sample-specific effects that alter allelic abundance.
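
For six pairs, the per-sample pattern can be generated rather than typed out by hand. The following is only a sketch: it builds the command lines and prints them without running anything. It assumes the file naming from the question (1N.bam, 1T.bam, ...) and the same priors and initial parameter files for every sample; the helper name `jsm_commands` is made up for illustration, and only flags already shown in this thread are used.

```python
import shlex


def jsm_commands(sample_id, reference="hg19.fa"):
    """Build the train and classify command lines for one tumour/normal pair.

    File names follow the pattern used in the question (assumption):
    <id>N.bam for the normal and <id>T.bam for the tumour.
    """
    normal = f"{sample_id}N.bam"
    tumour = f"{sample_id}T.bam"
    # One parameter file per sample, so training never mixes samples.
    params = f"snvmix2.sample_{sample_id}.cfg"

    train = [
        "jsm.py", "train", "--model", "snvmix2",
        "--priors_file", "config/priors.cfg",
        "--initial_parameters_file", "config/params.cfg",
        reference, normal, tumour, params,
    ]
    classify = [
        "jsm.py", "classify", "--model", "snvmix2",
        "--parameters_file", params,
        "--out_file", f"{sample_id}.jsm",
        reference, normal, tumour,
    ]
    return train, classify


if __name__ == "__main__":
    # Print the commands for samples 1..6; nothing is executed here.
    for sample_id in range(1, 7):
        for command in jsm_commands(sample_id):
            print(shlex.join(command))
```

The printed lines can be reviewed and then pasted into a shell or a job script. Keeping the parameter file name and the output file name derived from the same sample ID avoids mismatches between the two steps.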

In theory, the probability of a somatic mutation is the sum over all states where the genotype changes, i.e. p_AA_AB + p_AA_BB + p_AB_AA + p_AB_BB + p_BB_AA + p_BB_AB. In practice I usually only consider p_AA_AB + p_AA_BB, because mutations at positions that are already SNPs (e.g. AB->BB) are very rare and quite likely to be sequencing artifacts.
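
As a rough illustration of that sum, here is a sketch that filters a .jsm file by somatic probability. Several things are assumptions and should be checked against your own output: that the file is tab-delimited with a header row, that the position columns are named `chrom` and `position`, and that the probability columns carry the p_XX_YY names used above. The 0.9 threshold and the example rows are invented for the demonstration, not recommended values or real results.

```python
import csv
import io

# All states where the genotype changes between normal and tumour.
SOMATIC_ALL = ["p_AA_AB", "p_AA_BB", "p_AB_AA", "p_AB_BB", "p_BB_AA", "p_BB_AB"]
# The narrower sum: normal is homozygous reference, tumour is not.
SOMATIC_FROM_HOM_REF = ["p_AA_AB", "p_AA_BB"]


def somatic_probability(row, columns=SOMATIC_FROM_HOM_REF):
    """Sum the chosen joint-genotype probabilities for one output row."""
    return sum(float(row[name]) for name in columns)


def somatic_calls(handle, threshold=0.9, columns=SOMATIC_FROM_HOM_REF):
    """Yield (chrom, position, probability) for rows at or above threshold.

    Assumes a tab-delimited file with a header row and 'chrom' and
    'position' columns (assumed names, check your file).
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        probability = somatic_probability(row, columns)
        if probability >= threshold:
            yield row["chrom"], int(row["position"]), probability


# Invented rows, only to show the mechanics.
HEADER = ["chrom", "position", "p_AA_AA", "p_AA_AB", "p_AA_BB",
          "p_AB_AA", "p_AB_AB", "p_AB_BB", "p_BB_AA", "p_BB_AB", "p_BB_BB"]
ROWS = [
    ["chr1", "100", "0.0", "0.75", "0.25", "0", "0", "0", "0", "0", "0"],
    ["chr1", "200", "0", "0", "0", "0", "0.5", "0.5", "0", "0", "0"],
]
EXAMPLE_TSV = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"

if __name__ == "__main__":
    for call in somatic_calls(io.StringIO(EXAMPLE_TSV)):
        print(call)
```

With a real file, pass `open("1.jsm", newline="")` instead of the `io.StringIO` wrapper, and pass `columns=SOMATIC_ALL` to use the full theoretical sum.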

Cheers,
Andy