mutation-seq on JointSNVMix output


RG

Feb 9, 2012, 2:52:53 PM
to JointSNVMix User Group
Hi,
Any recommendations/guidelines on how to run mutation-seq on
JointSNVMix output?

Thanks,
~Raad

aroth

Feb 10, 2012, 5:34:50 PM
to JointSNVMix User Group
Hi Raad,
I personally have no experience running mutation-seq, so I can't
comment on the low-level details. Your best bet might be to contact
the mutation-seq authors.

What I can suggest is to use a relatively low threshold on the somatic
probabilities called by JointSNVMix, so as to generate a large set of
candidates to pass to mutation-seq. In essence, what you want to do is
use JointSNVMix to quickly generate a large set of promising candidate
positions for mutation-seq to analyse further.
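
For example, a post-filter along these lines is what I have in mind.
This is only a sketch in Python: it assumes the classify output is
tab-delimited with a header that includes p_AA_AB and p_AA_BB columns
(somatic probability = p_AA_AB + p_AA_BB), so check the column names
against your own output before using it.

#!/usr/bin/env python
# Sketch: keep positions whose somatic probability exceeds a deliberately
# low threshold, to build a large candidate set for mutation-seq.
# ASSUMPTION: tab-delimited jsm.py output with a header line containing
# 'p_AA_AB' and 'p_AA_BB' columns; somatic probability is their sum.
import sys

THRESHOLD = 0.1  # intentionally permissive

header = sys.stdin.readline().rstrip('\n').split('\t')
i_ab = header.index('p_AA_AB')
i_bb = header.index('p_AA_BB')

for line in sys.stdin:
    fields = line.rstrip('\n').split('\t')
    if float(fields[i_ab]) + float(fields[i_bb]) >= THRESHOLD:
        sys.stdout.write(line)

Run it as: python filter_somatic.py < jsm_output.tsv > candidates.tsv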

I hope to incorporate mutation-seq directly into the JointSNVMix
software in the next release which should make things easier.
Unfortunately finding time to update the package has been tough.

Cheers,
Andy

RG

Feb 13, 2012, 10:44:13 AM
to JointSNVMix User Group
Hi Andy,
Thanks for your input. I am afraid that the larger the set, the more
problematic the mutation-seq run will get. That is, mutation-seq will
choke on larger sets, mainly when it constructs the features
(construct_feature.pl). Specifically, here:

# Extract reads for $region from each BAM, pile them up against the
# reference, convert to VCF with bcftools, and drop the unwanted
# header and INDEL lines.
`$SAMPATH/samtools view -b $normal $region | $SAMPATH/samtools mpileup -f $ref -E -g - | $SAMPATH/bcftools/bcftools view - | grep -v '$pattern1' | grep -v '$pattern2' | grep -v '$pattern3' | grep -v 'INDEL' > $normal_sam_file`;
# The same pipeline for the tumour BAM.
`$SAMPATH/samtools view -b $tumour $region | $SAMPATH/samtools mpileup -f $ref -E -g - | $SAMPATH/bcftools/bcftools view - | grep -v '$pattern1' | grep -v '$pattern2' | grep -v '$pattern3' | grep -v 'INDEL' > $tumour_sam_file`;

Changing that code to run on smaller chunks of $region causes the
program to take a very long time to finish, if it finishes at all.
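
To be clear, by smaller chunks I mean windowing $region roughly like
the sketch below (illustrative Python only, not the actual
construct_feature.pl code):

def split_region(region, window=1000000):
    # Split a samtools-style region string ('chr1:1-50000000') into
    # fixed-size windows that can be fed through the pipeline above.
    chrom, span = region.split(':')
    start, end = (int(x) for x in span.split('-'))
    while start <= end:
        stop = min(start + window - 1, end)
        yield '%s:%d-%d' % (chrom, start, stop)
        start = stop + 1

# e.g. list(split_region('chr1:1-2500000'))
# -> ['chr1:1-1000000', 'chr1:1000001-2000000', 'chr1:2000001-2500000']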

On a different issue, I am trying to train both models
(joint_snv_mix_one and joint_snv_mix_two), and I am noticing that
jsm.py train joint_snv_mix_one finishes with no errors, while on the
same data with the same parameters, jsm.py train joint_snv_mix_two
always fails with the following error:
Traceback (most recent call last):
  File "/lustre/home/user/bin/jsm.py", line 224, in <module>
    args.func(args)
  File "/lustre/home/user/sw/lib/python2.6/joint_snv_mix/runners/train.py", line 61, in joint_snv_mix_two_train
    train(model, sample, args)
  File "/lustre/home/user/sw/lib/python2.6/joint_snv_mix/runners/train.py", line 110, in train
    model.fit(sample, args.convergence_threshold, args.max_iters)
  File "joint_snv_mix.pyx", line 388, in joint_snv_mix.trainers.joint_snv_mix.JointSnvMixModel.fit (joint_snv_mix/trainers/joint_snv_mix.c:7567)
  File "joint_snv_mix.pyx", line 450, in joint_snv_mix.trainers.joint_snv_mix.JointSnvMixModelTrainer.train (joint_snv_mix/trainers/joint_snv_mix.c:8106)
  File "joint_snv_mix.pyx", line 486, in joint_snv_mix.trainers.joint_snv_mix.JointSnvMixModelTrainer._check_convergence (joint_snv_mix/trainers/joint_snv_mix.c:8413)
Exception: Lower bound decreased exiting.

I checked the size of the sub-sample, and it seems that the
joint_snv_mix_one sub-sample is always bigger than the
joint_snv_mix_two sub-sample. Here are the sizes for one of the runs I tried:
joint_snv_mix_one: Total sub-sample size is 18886504
joint_snv_mix_two: Total sub-sample size is 6266558

Any thoughts on that?
Thanks,
-Raad

RG

Feb 13, 2012, 6:16:06 PM
to JointSNVMix User Group
Oh, I just saw your response to the same problem here:
http://code.google.com/p/joint-snv-mix/issues/detail?id=8

Is a fix coming anytime soon?

Thanks for all the help Andy.
~Raad

aroth

Feb 13, 2012, 8:35:40 PM
to JointSNVMix User Group
Hi Raad,
I am testing a beta version which should fix this and some other bugs.
Unfortunately, I cannot replicate this bug, so it is hard to know if
the fix works. If you want to try the beta, that would be very helpful.

You can find the beta version on the downloads page <http://
code.google.com/p/joint-snv-mix/downloads/list>. There are a few
changes in this version which you should be aware of.

1) The license will change from MIT to GPL2.
2) The interface is changing slightly for the train and classify
commands.
3) The Independent SNVMix, Fisher and threshold methods are being
dropped.
4) By default, classify only outputs positions with at least one
variant allele in the tumour.
5) By default, output goes to STDOUT.

If you want to try it out, you can install as normal: unzip and run
"python setup.py install".

To see all options and usage, type:
- For the training command : jsm.py train -h
- For the classify command : jsm.py classify -h

The new training command for running a JointSNVMix2 analysis,
subsampling every 100th position, is:
>>> jsm.py train REF_GENOME_FASTA NORMAL_BAM TUMOUR_BAM OUTPUT_FILE --skip_size=100 --model=snvmix2

Assuming you trained and produced a parameters file called
PARAMS_FILE, the new classify command is
>>> jsm.py classify REF_GENOME_FASTA NORMAL_BAM TUMOUR_BAM --parameters_file=PARAMS_FILE

Please let me know if that fixes the problem.

Cheers,
Andy

aroth

Feb 13, 2012, 8:36:40 PM
to JointSNVMix User Group
Forgot to mention it is version 0.8-b1 I am referring to.

RG

Feb 14, 2012, 10:19:13 AM
to JointSNVMix User Group
Hi Andy,
I will give 0.8-b1 a try and will let you know how it goes.

Thank you,
~Raad

RG

Feb 17, 2012, 6:38:23 PM
to JointSNVMix User Group
Hi Andy,
I installed 0.8-b1 with no issues. Training the model on both tumor
and normal samples with --skip_size of 20, 40, 60, 80 and 100 went
smoothly.
I am in the process of running jsm.py classify.

Thank you for all the help.
~Raad

aroth

Feb 17, 2012, 6:51:26 PM
to JointSNVMix User Group
Great to hear. It might be prudent to run 0.8-b1 on a case that has
been analysed with 0.7.5, just to make sure the results are similar.

If you don't mind my asking, how was the performance? I noticed you
used a skip_size of 20, which is quite small and, I would have
thought, memory intensive.

One other thing worth mentioning with the new version is that there is
a --chrom flag for both the train and classify commands. You can use
it to run the software on a single chromosome at a time, so jobs can
be split up by chromosome for speed; see the sketch below. Strictly
speaking, the trained parameters will then be per chromosome, but that
might actually help performance.
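
For example, fanning training out across chromosomes could look
something like this. It is just a sketch: the file paths are
placeholders, and the chromosome names ('1'..'22', 'X', 'Y') are an
assumption you should match to your reference FASTA.

import subprocess

# ASSUMPTION: chromosome names match the reference ('1'..'22','X','Y').
chroms = [str(i) for i in range(1, 23)] + ['X', 'Y']

# Launch one jsm.py train job per chromosome via the new --chrom flag,
# writing a separate parameters file for each.
jobs = [subprocess.Popen(['jsm.py', 'train',
                          'REF_GENOME_FASTA', 'NORMAL_BAM', 'TUMOUR_BAM',
                          'params_%s.txt' % c,
                          '--chrom=%s' % c, '--skip_size=100'])
        for c in chroms]

# Wait for all jobs; on a cluster you would hand these commands to the
# scheduler instead of running them all locally.
for job in jobs:
    job.wait()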

Cheers,
Andy

RG

Mar 16, 2012, 7:05:22 PM
to JointSNVMix User Group
Hi Andy,
I apologize for the late reply; I got distracted with many side
projects. I will run a comparison and see how both versions perform.
The skip_size of 20 took a long time, and I didn't notice any other
issues, but I was running it on a machine with 90G of RAM.
I will update you as soon as I get the comparison done.

Thanks,
~Raad