RDP classifier errors: similar but different to previously asked questions

Alexis Walker

May 6, 2016, 4:07:36 PM
to Qiime 1 Forum
Hello, 

I am having trouble running the RDP classifier in my script:

pick_open_reference_otus.py -i /center/w/amwalker8/qiime1.8.0_test/Run_1/testseqs2500.fasta -o /center/w/amwalker8/qiime1.8.0_test/rdp_run -r /center/w/amwalker8/silva123_97_16S.fasta -p/center/w/amwalker8/qiime1.8.0_test/rdp_run/rdp_params.txt -f --prefilter_percent_id 0.97

My parameters file looks like:
pick_otus:enable_rev_strand_match True
align_seqs:template_fp /center/w/amwalker8/core_alignment_SILVA123.fasta
assign_taxonomy:reference_seqs_fp /center/w/amwalker8/silva123_97_16S.fasta
assign_taxonomy:id_to_taxonomy_fp /center/w/amwalker8/taxonomy_all_levels.txt
assign_taxonomy:assignment_method rdp
assign_taxonomy:rdp_max_memory 100000

I was able to set the maximum memory to 100 GB since I am running this on a supercomputer. I am using just a small subset of my data (2,500 sequences).
Attached are the print_qiime_config output from the high-performance computer (HPC) and the error produced.

Thanks in advance!
Alexis





print_qiime_config_HPC
HPC_RDP_error

Colin Brislawn

May 6, 2016, 5:52:06 PM
to Qiime 1 Forum
Hello Alexis,

Did you look for clues in your log file? I noticed this line:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Looks like it ran out of memory. Try reducing your memory for the RDP classifier, say to 4000, and try running it again.

Colin 

Alexis Walker

May 6, 2016, 6:11:15 PM
to Qiime 1 Forum
Hi Colin,

It is interesting that you recommend decreasing the max memory, as I have seen the opposite advice in other threads. I think I am confused about exactly what --rdp_max_memory is. I lowered it to 4000 and received the same error output. The error file I attached to my initial question is actually from my log file, and I cannot work out exactly what the issue is from it. Only one of the errors is "java.lang.OutOfMemoryError: Java heap space", but there seem to be other errors leading up to it.

Thanks,
Alexis

Colin Brislawn

May 8, 2016, 8:14:16 PM
to Qiime 1 Forum
Hello Alexis,

Thanks for trying out that other setting for me. I'm still not totally sure what's going on...

I noticed something else when I went back and read the qiime config file again. The log shows you are using qiime 1.8.0, which is a few years old. I'm not sure if this is a problem or not, but it's possible we are running into a bug in qiime 1.8.0 that was fixed in qiime 1.9.1. Just a thought.
Bigger problem: you are running python 3, while qiime needs python 2. Try loading a python 2 module (or talk to your admins about python 2) before running qiime. 

Have you run other qiime scripts successfully? I'm still looking for the right way to approach this and worry that I'm missing something... 
Thank you for your time,
Colin Brislawn

Kyle Bittinger

May 9, 2016, 4:31:34 PM
to Qiime 1 Forum
Alexis, 

Another thing to try would be to reduce the size of your reference database to a few thousand sequences, and see if you get the same error.  This would allow us to see if you are running out of memory due to a bug, or if the problem is with the format of your reference files.
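One way to build such a reduced reference set is sketched below. It is only an illustration: the function names and the output file name are hypothetical, and it assumes a plain FASTA file in which the sequence ID is the first token of each header line. It uses only the standard library, so it should run under Python 2 or 3.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subsample_fasta(lines, n, seed=0):
    """Return up to n randomly chosen (header, sequence) records."""
    records = list(read_fasta(lines))
    if len(records) <= n:
        return records
    # A fixed seed keeps the subset reproducible between runs.
    return random.Random(seed).sample(records, n)


def write_subset(in_path, out_path, n=2000, seed=0):
    """Write a random subset of in_path to out_path; return the kept IDs."""
    with open(in_path) as handle:
        subset = subsample_fasta(handle, n, seed)
    with open(out_path, "w") as out:
        for header, seq in subset:
            out.write(">%s\n%s\n" % (header, seq))
    return [header.split()[0] for header, _ in subset]
```

For example, `write_subset("silva123_97_16S.fasta", "silva_subset.fasta", n=2000)` would write 2,000 randomly chosen reference sequences to a new file (the output name is made up). The returned IDs could then be used to trim the taxonomy file to the same entries, if that turns out to be necessary.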

Also, I see this message in the standard error:
Picked up _JAVA_OPTIONS: -Xmx2048M

Do you think that this environment variable might be overriding the value set on the command line?  It shouldn't happen, but I've seen stranger things.  You might try setting this environment variable to a larger value in your job script, and see if that helps.
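To check what heap limit the environment is asking for from inside a job, a small helper along these lines may be useful. It is a sketch only: the function name is hypothetical, and it simply reads the last -Xmx entry in the variable's value, treating a bare number as bytes and k/m/g suffixes as binary multiples.

```python
import os
import re

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def max_heap_bytes(java_options):
    """Return the last -Xmx value in an options string as bytes, or None."""
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)(?=\s|$)", java_options)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * _UNITS[unit.lower()]


if __name__ == "__main__":
    # Report whatever the current environment sets, if anything.
    limit = max_heap_bytes(os.environ.get("_JAVA_OPTIONS", ""))
    if limit is None:
        print("_JAVA_OPTIONS does not set -Xmx")
    else:
        print("_JAVA_OPTIONS limits the heap to %d MB" % (limit // 1024 ** 2))
```

With the value from the log, `max_heap_bytes("-Xmx2048M")` gives 2048 * 1024**2 bytes, i.e. 2 GB, which is far below the 100000 MB requested in the parameters file.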

Best,
Kyle

Alexis Walker

May 9, 2016, 8:26:45 PM
to Qiime 1 Forum
Hi Colin and Kyle,

Colin:
I was trying to run my script using rdp to assign taxonomy in our supercomputing environment since it seemed that memory was the issue. However the most recent versions on the HPC weren't downloaded correctly, so I am using 1.8.0. I have been trying the same thing on macqiime using my laptop with 16GB of memory but still not getting it to work. 

Kyle:
Those sound like good ideas, but before I implement them, I was wondering if you could explain to me in plain English (more so than what I can find :)) what the RDP classifier does exactly and the general concept behind it. I am quite confused about what it is doing and what re-training the RDP classifier means. Even after reading all the READMEs in the rdp directory and looking at the source website, I am having a hard time understanding exactly what it is doing and what it needs to do it. I think if I could understand it a bit better I might be better at troubleshooting on my end.

Thanks so much!
Alexis

Alexis Walker

May 10, 2016, 2:59:40 PM
to Qiime 1 Forum
Quick update: I was able to run the RDP classifier on macqiime 1.9.1 with Greengenes after OTU picking with my rep_set.fna:

assign_taxonomy.py -i /Users/alexiswalker/Desktop/16S/Beaufort/Silva123/rep_set.fna -o /Users/alexiswalker/Desktop/16S/Beaufort/Silva123/rdp_assign_tax_gg/ -m rdp

I am currently running it with SILVA, and I suspect that the SILVA reference database is far too large. I would like to use SILVA as it is currently better curated than Greengenes. Is there a way to use SILVA with RDP? Is there another database that you would recommend?

Thanks,
Alexis

Kyle Bittinger

unread,
May 12, 2016, 10:47:00 PM5/12/16
to Qiime 1 Forum
Alexis, I suspect you're having Java memory issues and I think the steps outlined above will help you to use the SILVA database with the RDP Classifier if that is your wish.

As for how the RDP Classifier works, their 2007 paper is the ultimate reference as far as I know:

In plain English, I'll probably say it wrong, but here goes: the RDP Classifier first tallies all the overlapping 8 bp sequences in your read; these are called "words". So if your read was "AGCTCCGATG", the Classifier would use the words "AGCTCCGA", "GCTCCGAT", and "CTCCGATG". The Classifier compares the words from your read to words found in the reference database, does some math, and makes the assignment. When the Classifier is trained, it counts words from the reference database and builds the tables that it will use to look up words when you run the program on your reads.
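The word idea can be illustrated with a few lines of Python. This is a toy sketch, not the RDP Classifier's code: the function names and the example taxa are made up, and the final "score" is just the fraction of a read's words seen in each taxon, standing in for the real math described in the paper.

```python
from collections import defaultdict


def words(read, k=8):
    """Return every overlapping k-letter word in a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def train(references, k=8):
    """Build a word table from (taxon, sequence) pairs.

    The table maps taxon -> word -> number of that taxon's reference
    sequences containing the word.
    """
    table = defaultdict(lambda: defaultdict(int))
    for taxon, seq in references:
        for word in set(words(seq, k)):
            table[taxon][word] += 1
    return table


def shared_word_fraction(read, table, k=8):
    """Toy score: fraction of the read's words seen in each taxon."""
    read_words = set(words(read, k))
    if not read_words:
        return {}
    return {
        taxon: len(read_words & set(counts)) / float(len(read_words))
        for taxon, counts in table.items()
    }
```

Calling `words("AGCTCCGATG")` reproduces the three words listed above. "Training" here means running `train` once over the reference sequences; the resulting table is then reused for every read, which is why a very large reference database makes that step memory-hungry.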
