Hi Andy,
Thanks for your input. I am afraid that the larger the set, the more
problematic the mutation-seq run will get. That is, mutation-seq will
choke on larger sets, mainly when it constructs the features
(construct_feature.pl). Specifically, here:
`$SAMPATH/samtools view -b $normal $region | $SAMPATH/samtools
mpileup -f $ref -E -g -| $SAMPATH/bcftools/bcftools view - | grep -v
'$pattern1' | grep -v '$pattern2' | grep -v '$pattern3' | grep -v
'INDEL' > $normal_sam_file`;
`$SAMPATH/samtools view -b $tumour $region | $SAMPATH/samtools
mpileup -f $ref -E -g -| $SAMPATH/bcftools/bcftools view - | grep -v
'$pattern1' | grep -v '$pattern2' | grep -v '$pattern3' | grep -v
'INDEL' > $tumour_sam_file`;
Changing that code to run on smaller chunks of $region causes the program
to take a very long time to finish, if it finishes at all.
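
To illustrate the chunking idea only, here is a minimal Python sketch that
splits a samtools-style region string (chrom:start-end, 1-based and
inclusive) into fixed-size windows. It is not the code in
construct_feature.pl; the function name split_region and the chunk_size
parameter are hypothetical names introduced for this example.

```python
def split_region(region, chunk_size):
    """Split 'chrom:start-end' (1-based, inclusive) into smaller regions.

    Hypothetical helper for illustration; not part of mutation-seq.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Split on the last ':' so contig names containing ':' still work.
    chrom, span = region.rsplit(":", 1)
    start_text, end_text = span.split("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if start > end:
        raise ValueError("region start is greater than region end")

    chunks = []
    pos = start
    while pos <= end:
        # The last window is truncated at the region end.
        stop = min(pos + chunk_size - 1, end)
        chunks.append("%s:%d-%d" % (chrom, pos, stop))
        pos = stop + 1
    return chunks


if __name__ == "__main__":
    for chunk in split_region("chr1:1-250", 100):
        print(chunk)
```

Each chunk would then take the place of $region in the two pipelines above.
Note that every chunk starts its own samtools and bcftools processes, so
very small windows add a lot of start-up overhead, which may be part of why
the chunked run is so slow.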
On a different issue, I am trying to train both models
(joint_snv_mix_one and joint_snv_mix_two), and I am noticing that
jsm.py train joint_snv_mix_one finishes with no errors, while
jsm.py train joint_snv_mix_two, using the same data and the same
parameters, always fails with the following error:
Traceback (most recent call last):
  File "/lustre/home/user/bin/jsm.py", line 224, in <module>
    args.func(args)
  File "/lustre/home/user/sw/lib/python2.6/joint_snv_mix/runners/train.py", line 61, in joint_snv_mix_two_train
    train(model, sample, args)
  File "/lustre/home/user/sw/lib/python2.6/joint_snv_mix/runners/train.py", line 110, in train
    model.fit(sample, args.convergence_threshold, args.max_iters)
  File "joint_snv_mix.pyx", line 388, in joint_snv_mix.trainers.joint_snv_mix.JointSnvMixModel.fit (joint_snv_mix/trainers/joint_snv_mix.c:7567)
  File "joint_snv_mix.pyx", line 450, in joint_snv_mix.trainers.joint_snv_mix.JointSnvMixModelTrainer.train (joint_snv_mix/trainers/joint_snv_mix.c:8106)
  File "joint_snv_mix.pyx", line 486, in joint_snv_mix.trainers.joint_snv_mix.JointSnvMixModelTrainer._check_convergence (joint_snv_mix/trainers/joint_snv_mix.c:8413)
Exception: Lower bound decreased exiting.
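
The exception is raised from a convergence check. In iterative training
of this kind, the lower bound on the likelihood is normally expected not
to decrease from one iteration to the next, so a check like this
presumably compares the last two values. The Python sketch below shows
that general technique only. It is an assumption, not the actual code in
joint_snv_mix.pyx; the function name check_convergence and the
rel_tolerance parameter are hypothetical.

```python
def check_convergence(lower_bounds, threshold, rel_tolerance=0.0):
    """Return True once the relative change in the lower bound is small.

    Raises RuntimeError if the bound drops by more than rel_tolerance
    (relative to the previous value). Illustrative only; this is an
    assumption about how such a check might work, not JointSNVMix code.
    """
    if len(lower_bounds) < 2:
        return False  # Need two iterations to measure a change.

    previous, current = lower_bounds[-2], lower_bounds[-1]
    change = current - previous
    scale = max(abs(previous), 1e-300)  # Avoid dividing by zero.

    # A strict check (rel_tolerance=0.0) fails on any decrease, including
    # tiny ones caused by floating-point rounding.
    if change < -rel_tolerance * scale:
        raise RuntimeError("Lower bound decreased exiting.")

    return abs(change) / scale < threshold
```

With a strict check, even a rounding-level drop aborts training, so it can
help to print the lower bound at each iteration and see whether the
decrease is tiny or substantial.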
I checked the size of the sub-sample, and it seems that the
joint_snv_mix_one sub-sample is always bigger than the joint_snv_mix_two
sub-sample. Here are the sizes for one of the runs I tried:
joint_snv_mix_one: Total sub-sample size is 18886504
joint_snv_mix_two: Total sub-sample size is 6266558
Any thoughts on that?
Thanks,
-Raad