Running the QIIME denoiser on a large-scale computer

MajorBytes

Jan 27, 2012, 8:45:35 AM

to Qiime Forum

I am trying to set up a QIIME run on a large-scale computer
https://portal.xsede.org/web/guest/tacc-ranger#running
and am not having much luck getting a successful run. I have a known-good
run that I have run on my 24-core server, and I am now trying to set up
the same run on Ranger as a 4096-core run. I have QIIME installed on
Ranger, and all of the print_qiime_config.py tests pass.

Any tips would be greatly appreciated.

Jose Carlos Clemente

Jan 27, 2012, 9:46:43 AM

to qiime...@googlegroups.com

Hi,
what exactly is the problem you are running into? Note that using 4096
cores is probably overkill. How many input sequences do you have?
Jose

MajorBytes

Jan 27, 2012, 11:46:22 AM

to Qiime Forum

This is what the split_library_log shows:

Number raw input seqs 304951

Length outside bounds of 200 and 1000 1635
Num ambiguous bases exceeds limit of 0 3162
Missing Qual Score 0
Mean qual score below minimum of 25 104
Max homopolymer run exceeds limit of 6 29267
Num mismatches in primer exceeds limit of 0: 146414

Number of sequences with identifiable barcode but without identifiable reverse primer: 59407

There are two of those logs, one per split_libraries.py run.
==============================================

Here is the QIIME script they are running:
===========================================
#!/bin/bash
#interactive commands are commented out
clear
print_qiime_config.py

#Split library Fwd
echo "Demultiplexing"
rm -rf split_library_output ; split_libraries.py -b 10 -m mapping_file1.txt \
    -f ZERT.fna -q ZERT.qual -z truncate_only -o output_split_lib1/

#Split library Rev
echo "Demultiplexing"
rm -rf split_library_output ; split_libraries.py -b 10 -m mapping_file2.txt \
    -f ZERT.fna -q ZERT.qual -z truncate_only -n 1000000 -o output_split_lib2/

#Orient
echo "orient sequences in same direction"
adjust_seq_orientation.py -i output_split_lib2/seqs.fna \
    -o output_split_lib2/seqs_rc.fna -r

#Combine lib
echo "Combine fwd and rev libraries"
rm -fr output_combined
mkdir output_combined
cat output_split_lib1/seqs.fna output_split_lib2/seqs_rc.fna > output_combined/combined_seqs.fna

#Convert sff file
echo "convert sff to sff.txt"
process_sff.py -i ZERT.sff -o sff_format/ -f

# qsub template
#requires format string (walltime, ncpus, nodes, queue, job_name, keep_output, command)
#Denoise
echo "Denoise dataset"
rm -fr denoised
denoise_wrapper.py -n 256 -v -i sff_format/ZERT.txt \
    -f output_combined/combined_seqs.fna -o denoised/ -m mapping_file1.txt

#Inflate
echo "Inflating denoiser results"
rm -fr inflate
mkdir inflate
inflate_denoiser_output.py -c denoised/centroids.fasta \
    -s denoised/singletons.fasta -f output_combined/combined_seqs.fna \
    -d denoised/denoiser_mapping.txt -o inflate/denoised_seqs.fasta

#otus
echo "Pick OTUs through OTU table"
rm -fr otus
pick_otus_through_otu_table.py -i inflate/denoised_seqs.fasta -o otus/
per_library_stats.py -i otus/otu_table.txt

#OTU Heatmap
echo "OTU Heatmap"
make_otu_heatmap_html.py -i otus/otu_table.txt -o otus/OTU_Heatmap/

#OTU Network
echo "OTU Network"
make_otu_network.py -m mapping_file1.txt -i otus/otu_table.txt -o otus/OTU_Network

#Make Taxa Summary Charts
echo "Summarize taxa"
rm -rf wf_taxa_summary
summarize_taxa_through_plots.py -i otus/otu_table.txt -o wf_taxa_summary \
    -m mapping_file1.txt

echo "Alpha rarefaction"
alpha_diversity.py -h
echo "alpha_diversity:metrics shannon,PD_whole_tree,chao1,observed_species" > alpha_params.txt
rm -rf wf_arare ; alpha_rarefaction.py -i otus/otu_table.txt -m mapping_file1.txt \
    -o wf_arare/ -p alpha_params.txt -t otus/rep_set.tre

echo "Beta diversity and plots"
rm -rf wf_bdiv_even146 ; beta_diversity_through_plots.py -i otus/otu_table.txt \
    -m mapping_file1.txt -o wf_bdiv_even146/ -t otus/rep_set.tre -e 146

echo "Jackknifed beta diversity"
rm -rf wf_jack ; jackknifed_beta_diversity.py -i otus/otu_table.txt \
    -t otus/rep_set.tre -m mapping_file1.txt -o wf_jack -e 110

echo "Make Bootstrapped Tree"
make_bootstrapped_tree.py -m wf_jack/unweighted_unifrac/upgma_cmp/master_tree.tre \
    -s wf_jack/unweighted_unifrac/upgma_cmp/jackknife_support.txt \
    -o wf_jack/unweighted_unifrac/upgma_cmp/jackknife_named_nodes.pdf

=========================================================================

This is the job script

#!/bin/tcsh
# Use the tcsh shell (see the shebang above)
#$ -V # Inherit the submission environment
#$ -cwd # Start job in submission directory
#$ -N myQIIMErun # Job Name
#$ -j y # combine stderr & stdout into stdout
#$ -o $JOB_NAME.o$JOB_ID # Name of the output file (eg. myMPI.oJobID)
#$ -pe 16way 256 # Requests 16 cores/node, 256 cores total
#$ -q normal # Queue name
#$ -l h_rt=23:00:00 # Run time (hh:mm:ss) - 23 hours
#$ -M <dyer....@umontana.edu> # Email notification address (UNCOMMENT)
#$ -m be # Email at Begin/End of job (UNCOMMENT)

#{echo cmds, use "set echo" in csh}
set echo
##{4 threads/task}
##setenv OMP_NUM_THREADS 4
# Launch the QIIME script through ibrun
# Use TACC's numactl cmd to position
# one task on each socket
ibrun tacc_affinity ./RunQiimeBi.sh

=============================================================
print_qiime_config.py -t

System information
==================
Platform: linux2
Python version: 2.7.2 (default, Jun 24 2011, 11:24:26) [GCC 4.0.2 20051125 (Red Hat 4.0.2-8)]
Python executable: /share/home/01504/awarren/python/bin/python

Dependency versions
===================
PyCogent version: 1.5.1
NumPy version: 1.6.1
matplotlib version: 1.1.0
QIIME library version: 1.4.0
QIIME script version: 1.4.0
PyNAST version (if installed): 1.1
RDP Classifier version (if installed): rdp_classifier-2.2.jar

QIIME config values
===================
blastmat_dir: /share/home/01504/awarren/bin/blast-2.2.22/data
topiaryexplorer_project_dir: None
pynast_template_alignment_fp: /share/home/01504/awarren/data/core_set_aligned.fasta.imputed
cluster_jobs_fp: /share/home/01504/awarren/bin/start_parallel_jobs.py
pynast_template_alignment_blastdb: None
assign_taxonomy_reference_seqs_fp: None
torque_queue: batch
template_alignment_lanemask_fp: None
jobs_to_start: 1
cloud_environment: False
qiime_scripts_dir: /share/home/01504/awarren/bin/
denoiser_min_per_core: 50
working_dir: /share/home/01504/awarren/tmp/
python_exe_fp: python
temp_dir: /tmp/
blastall_fp: blastall
seconds_to_sleep: 60
assign_taxonomy_id_to_taxonomy_fp: None

running checks:

test_FastTree_supported_version (__main__.Qiime_config)
FastTree is in path and version is supported ... ok
test_INFERNAL_supported_version (__main__.Qiime_config)
INFERNAL is in path and version is supported ... ok
test_ampliconnoise_install (__main__.Qiime_config)
AmpliconNoise install looks sane. ... ok
test_blast_supported_version (__main__.Qiime_config)
blast is in path and version is supported ... ok
test_blastall_fp (__main__.Qiime_config)
blastall_fp is set to a valid path ... ok
test_blastmat_dir (__main__.Qiime_config)
blastmat_dir is set to a valid path. ... ok
test_cdbtools_supported_version (__main__.Qiime_config)
cdbtools is in path and version is supported ... ok
test_cdhit_supported_version (__main__.Qiime_config)
cd-hit is in path and version is supported ... ok
test_chimeraSlayer_install (__main__.Qiime_config)
no obvious problems with ChimeraSlayer install ... ok
test_clearcut_supported_version (__main__.Qiime_config)
clearcut is in path and version is supported ... ok
test_cluster_jobs_fp (__main__.Qiime_config)
cluster_jobs_fp is set to a valid path and is executable ... ok
test_denoiser_supported_version (__main__.Qiime_config)
denoiser aligner is ready to use ... ok
test_for_obsolete_values (__main__.Qiime_config)
local qiime_config has no extra params ... ok
test_matplotlib_suported_version (__main__.Qiime_config)
maptplotlib version is supported ... ok
test_mothur_supported_version (__main__.Qiime_config)
mothur is in path and version is supported ... ok
test_muscle_supported_version (__main__.Qiime_config)
muscle is in path and version is supported ... ok
test_numpy_suported_version (__main__.Qiime_config)
numpy version is supported ... ok
test_pynast_suported_version (__main__.Qiime_config)
pynast version is supported ... ok
test_pynast_template_alignment_blastdb_fp (__main__.Qiime_config)
pynast_template_alignment_blastdb, if set, is set to a valid path ... ok
test_pynast_template_alignment_fp (__main__.Qiime_config)
pynast_template_alignment, if set, is set to a valid path ... ok
test_python_exe_fp (__main__.Qiime_config)
python_exe_fp is set to a working python env ... ok
test_python_supported_version (__main__.Qiime_config)
python is in path and version is supported ... ok
test_qiime_scripts_dir (__main__.Qiime_config)
qiime_scripts_dir, if set, is set to a valid path ... ok
test_raxmlHPC_supported_version (__main__.Qiime_config)
raxmlHPC is in path and version is supported ... ok
test_temp_dir (__main__.Qiime_config)
temp_dir, if set, is set to a valid path ... ok
test_template_alignment_lanemask_fp (__main__.Qiime_config)
template_alignment_lanemask, if set, is set to a valid path ... ok
test_uclust_supported_version (__main__.Qiime_config)
uclust is in path and version is supported ... ok
test_working_dir (__main__.Qiime_config)
working_dir, if set, is set to a valid path ... ok

----------------------------------------------------------------------
Ran 28 tests in 3.662s

OK


Jose Carlos Clemente

Jan 27, 2012, 12:19:21 PM

to qiime...@googlegroups.com

So apparently 256 cores are being requested. That sounds more
reasonable, given that you have ~300K seqs before split_libraries. Do the
jobs get submitted, and can you see them running? Use qstat to check
this.
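
A back-of-the-envelope sketch of why the core count matters here: it divides the raw read count from the posted split_library_log evenly across the workers. This only gives an upper bound on the per-core share, since quality filtering removes reads before denoising, and the helper name reads_per_core is made up for illustration.

```shell
# Reads per core if the raw input were split evenly (integer division).
# 304951 is the "Number raw input seqs" value from the posted log.
reads_per_core() {
    echo $(( $1 / $2 ))
}

for cores in 24 256 4096; do
    echo "$cores cores: $(reads_per_core 304951 "$cores") reads per core"
done
```

With 4096 cores each worker would be handed only about 74 reads, against about 1191 with 256 cores.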

Without knowing the particulars of your cluster, it is difficult to
figure out what might be going wrong. The easiest way would be to
contact your sysadmin and ask them about it. Maybe the info in the
link below will be useful:

http://qiime.org/tutorials/parallel_qiime.html

Jose

MajorBytes

Jan 27, 2012, 12:48:28 PM

to Qiime Forum

This is actually a test run on 256 cores of the Ranger system. I would
like to get it to run on the maximum number of cores.
The job is getting submitted and is running, but my denoiser logs
contain only this:
===================================
login4$ more bsvxigprworker141.log
Client /tmp/bsvxigprworker141 trying to connect to
129.114.112.29:50893
client connected to ('129.114.112.29', 50893)
Server connection established
====================================

On my 24-core HPC boxes the logs have a lot more info.
Here is a brief example:

[root@elephants denoised]# more efqgyjazworker19.log
Client /tmp/efqgyjazworker19 trying to connect to 10.8.79.13:43056
client connected to ('10.8.79.13', 43056)
Server connection established
Data for round 0 received: 1327525442.15
/tmp/efqgyjazworker19_0... done!
Data for round 1 received: 1327526266.82
/tmp/efqgyjazworker19_1... done!
Data for round 2 received: 1327527021.52
/tmp/efqgyjazworker19_2... done!
Data for round 3 received: 1327527752.92
/tmp/efqgyjazworker19_3... done!
Data for round 4 received: 1327529350.81
/tmp/efqgyjazworker19_4... done!
Data for round 5 received: 1327530036.45
/tmp/efqgyjazworker19_5... done!
Data for round 6 received: 1327531442.3
/tmp/efqgyjazworker19_6... done!
Data for round 7 received: 1327532081.88
/tmp/efqgyjazworker19_7... done!
Data for round 8 received: 1327532724.39
/tmp/efqgyjazworker19_8... done!
Data for round 9 received: 1327533324.54
/tmp/efqgyjazworker19_9... done!
Data for round 10 received: 1327533910.3
/tmp/efqgyjazworker19_10... done!
Data for round 11 received: 1327534506.3
/tmp/efqgyjazworker19_11... done!
========================================
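
As a rough sketch, the time each round takes can be read off a worker log like the one above. This assumes the number after "received:" is a Unix timestamp in seconds and that it sits on the same line as "Data for round"; the function name round_gaps is hypothetical.

```shell
# Print the wall-clock gap between consecutive "Data for round N received" lines.
# Assumes the trailing number on those lines is a Unix timestamp in seconds.
round_gaps() {
    awk '/^Data for round/ {
        if (seen) printf "round %d: %.0f s\n", $4, $NF - prev
        prev = $NF
        seen = 1
    }'
}

# Rounds 2-4 copied from the efqgyjazworker19.log excerpt in this thread.
round_gaps <<'EOF'
Data for round 2 received: 1327527021.52
/tmp/efqgyjazworker19_2... done!
Data for round 3 received: 1327527752.92
/tmp/efqgyjazworker19_3... done!
Data for round 4 received: 1327529350.81
/tmp/efqgyjazworker19_4... done!
EOF
```

For these three rounds the gaps come out at roughly 12 and 27 minutes on the 24-core box.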

If I am using a cluster, can I still use start_parallel_jobs, or
should I be using make_cluster_jobs instead?
I am not the researcher; I am just the sysadmin trying to get it to
run in a clustered environment.

I currently have no issues running it on my 24-core servers, apart from
the runs taking a week and a half to finish, so I would really like
to get them down to a 24-hour time frame at most. You mentioned that a
4096-core run is overkill, so are you getting OK run times on fewer
cores?

If I can get this to run properly on the Ranger system, I plan on
setting it up on Kraken and Condor.


Jens Reeder

Jan 27, 2012, 2:33:26 PM

to qiime...@googlegroups.com

Note that the denoiser main loop only starts after all workers have connected.
Depending on your queue scheduler, this might take a while.
If you asked for 256 workers, check whether the file bsvxigprworker256.log
is there and what it says.
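
A minimal sketch of that check across all worker logs at once. It assumes the logs are named like the excerpts in this thread (for example bsvxigprworker141.log), that a connected worker logs "Server connection established", and that a worker taking part in the main loop logs "Data for round"; the function name summarize_worker_logs is made up for illustration.

```shell
# Count worker logs and how far each worker got (hypothetical helper).
# $1 = directory holding the *worker*.log files
summarize_worker_logs() (
    total=0; connected=0; working=0
    for f in "$1"/*worker*.log; do
        [ -e "$f" ] || continue    # the glob matched nothing
        total=$((total + 1))
        if grep -q "Server connection established" "$f"; then
            connected=$((connected + 1))
        fi
        if grep -q "Data for round" "$f"; then
            working=$((working + 1))
        fi
    done
    echo "logs=$total connected=$connected working=$working"
)
```

Called as `summarize_worker_logs denoised/`, a logs or connected count below the number of workers requested would suggest that some workers are still waiting in the queue, and working=0 that the main loop has not started yet.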

By the way, you seem to lose a lot of seqs in split_libraries:

> Num mismatches in primer exceeds limit of 0: 146414

> Number of sequences with identifiable barcode but without identifiable
> reverse primer: 59407

You might want to check this out first, before proceeding with your analysis.
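
To put those counts in proportion, here is a small sketch that expresses them as a share of the raw input. The numbers are copied from the split_library_log posted in this thread; the helper name pct is an assumption for illustration, and the figures describe only that one log.

```shell
# Share of the raw input that each reported count represents.
# Counts are copied from the split_library_log posted in this thread.
raw=304951

pct() {
    # $1 = count, $2 = total; prints a percentage with one decimal place
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * n / t }'
}

echo "primer mismatches:            $(pct 146414 "$raw")% of raw seqs"
echo "no identifiable rev primer:   $(pct 59407 "$raw")% of raw seqs"
```

The primer-mismatch count alone corresponds to 48.0% of the raw input.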

Jens

Jeffrey Bllanchard

Dec 6, 2012, 8:34:58 PM

to qiime...@googlegroups.com

Hi,

I am looking into setting up QIIME on XSEDE. Were you successful in getting QIIME to run?

Cheers, Jeff Blanchard
