sharing intro JointSNVMix experience

53 views

Skip to first unread message

rts

unread,

Nov 21, 2011, 12:13:20 AM11/21/11

to JointSNVMix User Group

Hi All,
Just wanted to pass on my introductory experience.
No specific questions at present, just providing this for reference to
others.
I am an intermediate linux user.
First objective to see if I could run the program on a 20K base pair
alignment sample of my data with basic, default options.

Installation went well.
I am using ubuntu 10.4.
Even though I had python 2.7 there was some small glitch at the
beginning
username@server:/JointSNVMix-0.7.4# python setup.py install
running install
running build
running build_py
running build_ext
building 'joint_snv_mix.counters.counter' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -
Wstrict-prototypes -fPIC -Ijoint_snv_mix/counters -Iinclude/pysam -
Iinclude/samtools -I/usr/include/python2.7 -c joint_snv_mix/counters/
counter.c -o build/temp.linux-x86_64-2.7/joint_snv_mix/counters/
counter.o
joint_snv_mix/counters/counter.c:4:20: fatal error: Python.h: No such
file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

**sudo apt-get python2.7-dev install fixed this glitch.

As for the dependencies, cython and pysam, I just downloaded and
unzipped those...and ran "python setup.py install" in while in each
directory.
I then put these to directories in my PATH. This can be done by
editting .bash_profile or export to PATH

Then, I created my sample data...i just wanted to use subset 20K
positions on chr3 as follows:
samtools view -h -u normal.rmdup.bam chr3:178940000:178960000 >
normal_chr3_178940000-178960000.bam (2.9 GB)
samtools view -h -u tumor.rmdup.bam chr3:178940000:178960000 >
tumor_chr3_178940000-178960000.bam (2.9GB)

After the fact, realized that I probably could have done the chr3
subsetting from the whole genome bam files with the positions.file
...will explore that later.

Make sure to index whatever bam files you are using for input to the
program.
samtools index normal_chr3_178940000-178960000.bam
samtools indextumor_chr3_178940000-178960000.bam

Then, I reviewed the arguments for the program. First need to train
the program on the data:
jsm.py train joint_snv_mix_two reference_genome normal_bam tumour_bam
priors_file_name initial_parameter_file_name parameter_file_name

Other parameters (not required) [--positions_file POSITIONS_FILE] [--
min_normal_depth MIN_NORMAL_DEPTH] [--min_tumour_depth
MIN_TUMOUR_DEPTH] [--max_iters MAX_ITERS] [--skip_size SKIP_SIZE] [--
convergence_threshold CONVERGENCE_THRESHOLD]

I wanted to use only the required paramters at first.
Default priors_file and initial_parameter files can be found in the
config directory that comes with the software.
joint_priors.cfg and joint_parameters.cfg
parameter_file_name is for the user to specify...it is where the
results of the training are output and will be used for the classify
mode.

So, I ran it...

username@server:/JointSNVMix-0.7.4# jsm.py train joint_snv_mix_two /
home/shared/Downloads/chromFa/hg19.fa ./
normal_chr3_178940000-60000.bam ./tumor_chr3_178940000-178960000.bam ./
config/joint_priors.cfg ./config/joint_params.cfg ./config/
joint_new_params.txt

Worked well.

Now, to call SNPs

JointSNVMix classify joint_snv_mix_two reference_genome normal_bam
tumour_bam parameter_file_name out_file_name

jsm.py classify joint_snv_mix_two /home/shared/Downloads/chromFa/
hg19.fa ./normal_chr3_178940000-60000.bam ./
tumor_chr3_178940000-178960000.bam ./config/joint_new_params.txt ./
JointSNVMix_two_chr3:178940000-178940000.output.txt

Sure, enough...it ran great.
Here is a link to output file format....http://code.google.com/p/joint-
snv-mix/wiki/output

So, I wanted to see the most likely SNPs to be AA_AB (normal_tumor)
awk '$10 ~ /^1.0000$/'
JointSNVMix_two_chr3:178940000-178940000.output.txt

chr3 178952085 A G 50 2 28 16 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 184641098 A C 37 5 34 13 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 185003943 T G 46 0 18 6 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000
chr3 187034293 G T 39 2 16 7 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000
chr3 187336402 C T 44 6 27 12 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 188855139 T G 35 0 35 13 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 189726254 G T 33 7 17 9 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000
chr3 190808008 G T 39 12 8 10 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 193247965 C T 45 0 25 8 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000
chr3 195684404 G A 61 2 55 10 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 195785596 T G 41 2 7 5 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000
chr3 197113200 T C 72 3 97 16 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000
chr3 197117059 G T 60 6 71 13 0.0000 1.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000

Can quickly see that using min_tumor_depth would eliminate the two
most dubious calls and I'm looking forward to MutationSeq to help
filter out false positives.
Now, will explore some of the other parameters.
Thanks Andrew and Shah lab, very much, for this program.

Trip

aroth

unread,

Dec 5, 2011, 4:17:18 PM12/5/11

to JointSNVMix User Group

Just one thing to note when looking at the results of the JointSNVMix2
model. The count data shows all counts observed regardless of base and
mapping quality. However, not all counts contribute equally to
genotype call as they are weighted by qualities. Thus a line such as

chr3 197117059 G T 60 6 71
13 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000

may have quite different count data if a filter on base or mapping
quality was used such as by the JointSNVMix1. When post-processing
JointSNVMix2 results, it is important to keep this in mind.