The second beta of version 0.8.0 is now posted on the downloads page:
http://code.google.com/p/joint-snv-mix/downloads/list
This version brings several changes. The most notable are:
1) JointSNVMix will now be under the GPL2 license.
2) MutationSeq-like post-processing is now included via the
--post_process flag. See below for more details.
3) Default output for classify is the screen. To output to a file use
the --out_file flag.
4) Training and classifying by chromosome is possible with the
--chromosome flag.
5) The Fisher, Threshold and Independent SNVMix models have been
removed.
6) A BetaBinomial model is now available. Models can be selected via
the --model flag for both training and classifying. Note that not all
models are JointSNVMix-type models.
7) Pysam and Cython are no longer dependencies. ALGLIB is now a
dependency.
This version is still a beta because of a lack of testing and
documentation. Installation information is available on the wiki
http://code.google.com/p/joint-snv-mix/wiki/Installation. If you want
to try the beta, help can be obtained by running 'jsm.py train -h' and
'jsm.py classify -h'.
The most important feature is the new post-processing module. For a
description of the basic idea see the MutationSeq paper "Feature-based
classifiers for somatic mutation detection in tumour–normal paired
sequencing data" by J. Ding et al. The current implementation in
JointSNVMix uses a random forest, hence the new dependency on ALGLIB.
If the --post_process flag is passed, an extra column is appended to
the normal JointSNVMix output, containing the random forest's
probability that the site is somatic. This value should be more
accurate than the JointSNVMix probabilities because it uses many
features beyond count data, such as the presence of homopolymer runs.
Passing sites through the post-processor is a bit slow, so it is
recommended to set --somatic_threshold to 0.01 or higher. With a
threshold of 0.01, only sites with a JointSNVMix somatic probability
of 0.01 or higher are post-processed and printed to screen.
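Because the post-processing probability is appended as an extra
column, downstream filtering can be done with a few lines of Python.
The sketch below is only illustrative. It assumes the classify output
is tab-delimited with a single header line and that the random forest
probability is the last column; none of that is confirmed here. The
function name, the column names in the demo rows and the 0.9 cutoff
are all made up for the example.

```python
import io


def filter_by_last_column(lines, min_prob):
    """Yield the header and every row whose last column is >= min_prob.

    Assumptions (not confirmed by the release notes): rows are
    tab-delimited, the first line is a header, and the last column
    holds the post-processing probability.
    """
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        return  # empty input: nothing to yield
    yield header.rstrip("\n")
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        try:
            prob = float(fields[-1])
        except ValueError:
            continue  # skip rows whose last column is not numeric
        if prob >= min_prob:
            yield line


if __name__ == "__main__":
    # Made-up rows purely to show the mechanics; real output will differ.
    demo = io.StringIO(
        "chrom\tposition\tp_somatic\tpost_process_p_somatic\n"
        "22\t100\t0.40\t0.95\n"
        "22\t200\t0.30\t0.10\n"
    )
    for row in filter_by_last_column(demo, 0.9):
        print(row)
```

In practice the lines would come from a file written with --out_file
rather than from an in-memory string.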
The examples below assume the reference genome is ref_genome.fa, the
normal BAM is normal.bam and the tumour BAM is tumour.bam.
Example: To train JointSNVMix2 on chromosome 22 only, subsampling
every 10th site, and save the parameters in params_22.cfg
jsm.py train ref_genome.fa normal.bam tumour.bam params_22.cfg --chrom=22 --model=snvmix2 --skip_size=10
Example: To classify using JointSNVMix1 on chromosome 22, using a
custom parameters file and post-processing sites with p_somatic>=0.2
jsm.py classify ref_genome.fa normal.bam tumour.bam --parameters_file=params_22.cfg --chromosome=22 --post_process --somatic_threshold=0.2
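For anyone who wants to run the two steps over several chromosomes, a
small Python helper can assemble the command lines. This is a rough
sketch, not part of JointSNVMix: the helper names are hypothetical, it
uses only the flags mentioned in this post, and it spells the training
flag as --chromosome (as in the change list) rather than the --chrom
form used in the training example. It only prints the commands; how
you launch them is up to you.

```python
import shlex


def train_command(ref, normal, tumour, params, chromosome,
                  model="snvmix2", skip_size=10):
    """Build the argument list for a per-chromosome 'jsm.py train' run."""
    return [
        "jsm.py", "train", ref, normal, tumour, params,
        "--chromosome={}".format(chromosome),
        "--model={}".format(model),
        "--skip_size={}".format(skip_size),
    ]


def classify_command(ref, normal, tumour, params, chromosome,
                     somatic_threshold=0.2):
    """Build the argument list for a post-processed 'jsm.py classify' run."""
    return [
        "jsm.py", "classify", ref, normal, tumour,
        "--parameters_file={}".format(params),
        "--chromosome={}".format(chromosome),
        "--post_process",
        "--somatic_threshold={}".format(somatic_threshold),
    ]


def to_shell(command):
    """Render an argument list as a shell-safe string for logging."""
    return " ".join(shlex.quote(arg) for arg in command)


if __name__ == "__main__":
    # The chromosome names are examples; use whatever your BAM files use.
    for chrom in ["21", "22"]:
        params = "params_{}.cfg".format(chrom)
        print(to_shell(train_command(
            "ref_genome.fa", "normal.bam", "tumour.bam", params, chrom)))
        print(to_shell(classify_command(
            "ref_genome.fa", "normal.bam", "tumour.bam", params, chrom)))
```

Building the arguments as a list rather than a single string means the
same value can be passed to subprocess.run without shell quoting
problems.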
I would appreciate any feedback.
Cheers,
Andy