Installation went well.
I am using ubuntu 10.4.
Even though I had python 2.7 there was some small glitch at the
beginning
username@server:/JointSNVMix-0.7.4# python setup.py install
running install
running build
running build_py
running build_ext
building 'joint_snv_mix.counters.counter' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -
Wstrict-prototypes -fPIC -Ijoint_snv_mix/counters -Iinclude/pysam -
Iinclude/samtools -I/usr/include/python2.7 -c joint_snv_mix/counters/
counter.c -o build/temp.linux-x86_64-2.7/joint_snv_mix/counters/
counter.o
joint_snv_mix/counters/counter.c:4:20: fatal error: Python.h: No such
file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1
**sudo apt-get python2.7-dev install fixed this glitch.
As for the dependencies, cython and pysam, I just downloaded and
unzipped those...and ran "python setup.py install" in while in each
directory.
I then put these to directories in my PATH. This can be done by
editting .bash_profile or export to PATH
Then, I created my sample data...i just wanted to use subset 20K
positions on chr3 as follows:
samtools view -h -u normal.rmdup.bam chr3:178940000:178960000 >
normal_chr3_178940000-178960000.bam (2.9 GB)
samtools view -h -u tumor.rmdup.bam chr3:178940000:178960000 >
tumor_chr3_178940000-178960000.bam (2.9GB)
After the fact, realized that I probably could have done the chr3
subsetting from the whole genome bam files with the positions.file
...will explore that later.
Make sure to index whatever bam files you are using for input to the
program.
samtools index normal_chr3_178940000-178960000.bam
samtools indextumor_chr3_178940000-178960000.bam
Then, I reviewed the arguments for the program. First need to train
the program on the data:
jsm.py train joint_snv_mix_two reference_genome normal_bam tumour_bam
priors_file_name initial_parameter_file_name parameter_file_name
Other parameters (not required) [--positions_file POSITIONS_FILE] [--
min_normal_depth MIN_NORMAL_DEPTH] [--min_tumour_depth
MIN_TUMOUR_DEPTH] [--max_iters MAX_ITERS] [--skip_size SKIP_SIZE] [--
convergence_threshold CONVERGENCE_THRESHOLD]
I wanted to use only the required paramters at first.
Default priors_file and initial_parameter files can be found in the
config directory that comes with the software.
joint_priors.cfg and joint_parameters.cfg
parameter_file_name is for the user to specify...it is where the
results of the training are output and will be used for the classify
mode.
So, I ran it...
username@server:/JointSNVMix-0.7.4# jsm.py train joint_snv_mix_two /
home/shared/Downloads/chromFa/hg19.fa ./
normal_chr3_178940000-60000.bam ./tumor_chr3_178940000-178960000.bam ./
config/joint_priors.cfg ./config/joint_params.cfg ./config/
joint_new_params.txt
Worked well.
Now, to call SNPs
JointSNVMix classify joint_snv_mix_two reference_genome normal_bam
tumour_bam parameter_file_name out_file_name
jsm.py classify joint_snv_mix_two /home/shared/Downloads/chromFa/
hg19.fa ./normal_chr3_178940000-60000.bam ./
tumor_chr3_178940000-178960000.bam ./config/joint_new_params.txt ./
JointSNVMix_two_chr3:178940000-178940000.output.txt
Sure, enough...it ran great.
Here is a link to output file format....http://code.google.com/p/joint-
snv-mix/wiki/output
So, I wanted to see the most likely SNPs to be AA_AB (normal_tumor)
awk '$10 ~ /^1.0000$/'
JointSNVMix_two_chr3:178940000-178940000.output.txt
chr3 178952085 A G 50 2 28 16 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 184641098 A C 37 5 34 13 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 185003943 T G 46 0 18 6 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000
chr3 187034293 G T 39 2 16 7 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000
chr3 187336402 C T 44 6 27 12 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 188855139 T G 35 0 35 13 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 189726254 G T 33 7 17 9 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000
chr3 190808008 G T 39 12 8 10 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 193247965 C T 45 0 25 8 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000
chr3 195684404 G A 61 2 55 10 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 195785596 T G 41 2 7 5 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000
chr3 197113200 T C 72 3 97 16 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 197117059 G T 60 6 71 13 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
Can quickly see that using min_tumor_depth would eliminate the two
most dubious calls and I'm looking forward to MutationSeq to help
filter out false positives.
Now, will explore some of the other parameters.
Thanks Andrew and Shah lab, very much, for this program.
Trip
chr3 197117059 G T 60 6 71
13 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000
may have quite different count data if a filter on base or mapping
quality was used such as by the JointSNVMix1. When post-processing
JointSNVMix2 results, it is important to keep this in mind.
Cheers,
Andy