Problem with results of denoise_wrapper.py


Cesar Alejandro Perez Fernandez

Jul 8, 2017, 9:11:52 PM
to Qiime 1 Forum
Hi!

I'm using denoise_wrapper.py on my 454 data. It produced the files denoiser.log, prefix_dereplicated.fasta, prefix_dereplicated.sff.txt, and prefix_mapping.txt. It is supposed to return centroids and singletons for the next step, so I think these are intermediate files in the process.
My question is whether the denoiser is finishing the process or being interrupted by a lack of memory. I'm running the script on my laptop, which has an i5 CPU, Ubuntu 14, and 6 GB of RAM. My FASTA file contains approximately 14,000 sequences.

Thanks for the help!

Jens Reeder

Jul 10, 2017, 12:06:12 PM
to Qiime 1 Forum
Hi Cesar,

14,000 sequences is not much, so you should be able to process them even on your laptop.

Can you copy and paste the exact command you were running and any messages that were sent to stdout? There should also be a denoiser.log file, which has useful information for debugging. Can you attach that to this post as well?

This will help us figure out what is going on.
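
If the console messages scroll by too quickly to copy, one option is to write them to files. Below is a minimal Python sketch of that idea; the helper name `run_and_capture` and the output file names are made up for illustration, and the stand-in command would be replaced by the actual denoise_wrapper.py call.

```python
import os
import subprocess
import sys
import tempfile


def run_and_capture(cmd, stdout_path, stderr_path):
    """Run cmd (a list of strings), saving stdout and stderr to files.

    Returns the exit code of the process. The helper name is illustrative
    and not part of QIIME.
    """
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        completed = subprocess.run(cmd, stdout=out, stderr=err)
    return completed.returncode


# Demo with a stand-in command; replace it with the denoise_wrapper.py call.
demo_dir = tempfile.mkdtemp()
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('progress'); sys.stderr.write('warning')",
]
exit_code = run_and_capture(
    demo_cmd,
    os.path.join(demo_dir, "stdout.txt"),
    os.path.join(demo_dir, "stderr.txt"),
)
print("exit code:", exit_code)
```

A non-zero exit code together with the contents of the stderr file is usually the most useful thing to paste into a post.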

Jens

Cesar Alejandro Perez Fernandez

Jul 10, 2017, 6:33:07 PM
to Qiime 1 Forum
Hi Jens,

I have multiple .sff files, so I'm running a shell script to demultiplex and denoise the data. My script is below (the denoise_wrapper.py call was marked in red):

for file in $(<names.txt)
do
    ## create fasta, sff.txt, and qual from sff
    process_sff.py -i ${file}.sff -f -o fasta_sff/

    ##demultiplex paired reads (remove primers, barcodes, qual, etc)
    split_libraries.py -f fasta_sff/${file}.fna -m ${file}map.txt -q fasta_sff/${file}.qual -b 0 -l 100 -o demultiplexed/${file}/
    mv demultiplexed/${file}/seqs.fna demultiplexed/${file}_demultiplexed.fna

    ##denoise 454
    denoise_wrapper.py -i fasta_sff/${file}.txt -f demultiplexed/${file}_demultiplexed.fna -m ${file}map.txt -n 3 -o denoised/${file}/
    inflate_denoiser_output.py -c denoised/${file}/centroids.fasta -s denoised/${file}/singletons.fasta -f demultiplexed/${file}_demultiplexed.fna -d denoised/${file}/denoiser_mapping.txt -o denoised/${file}_denoised.fna

done

I'm attaching the denoiser.log. I don't have any stdout or stderr files.

Thanks!
denoiser.log

Jens Reeder

Jul 10, 2017, 7:52:18 PM
to Qiime 1 Forum
Hi Cesar,

The first thing you should do is add the -v flag to the denoise_wrapper.py call, so that we see some actual progress in the log file.
Secondly, I see you are trying to use three CPUs for the computation. This requires QIIME to be set up correctly for parallel use.
Have you verified with another script that this works for you?
Have you followed the instructions here?

Jens


Cesar Alejandro Perez Fernandez

Jul 10, 2017, 9:48:42 PM
to Qiime 1 Forum
I re-ran the script without the -n flag (that was my mistake). I'm re-attaching the .log file.
denoiser.log

Jens Reeder

Jul 11, 2017, 12:35:37 AM
to Qiime 1 Forum
ok, it looks like it is doing something now. The last lines right now are:

Round 1:
Rounds remaining in worst case: 666
Filtering with IIN13JX02GWPTO: 2780 flowgrams


I would monitor that file, e.g. with
tail -f denoiser.log

and it should add more info the further it goes along.
If not, let me know.

Jens

Cesar Alejandro Perez Fernandez

Jul 11, 2017, 1:42:57 AM
to Qiime 1 Forum
I monitored the .log file while the script was running, and the result is the same. It is not doing anything more.

Jens Reeder

Jul 11, 2017, 12:44:37 PM
to Qiime 1 Forum
OK, I assume you are sure that the process has actually stopped, correct? When you run top, is there no lingering python process anymore? If you don't know what 'top' is, let's skip this step.

Next, let's see if we can find any bugs in your setup.
Can you run these two commands and copy their output?

print_qiime_config.py -tf

which FlowgramAli_4frame



Cesar Alejandro Perez Fernandez

Jul 11, 2017, 3:44:57 PM
to Qiime 1 Forum
Yes, the process stopped. I confirmed it using htop.

The result of which FlowgramAli_4frame:
/usr/lib/qiime/support_files/denoiser/bin//FlowgramAli_4frame

I'm attaching the output of print_qiime_config.py -tf
print_qiime_config.txt

Jens Reeder

Jul 12, 2017, 1:44:07 AM
to Qiime 1 Forum
Hm, nothing obviously wrong there.
Since it appears to finish the first phase of the process just fine but fails to do anything in the second stage, let's inspect the files that have been produced so far.
Could you paste the first 10-20 lines of the prefix_dereplicated.fasta file and run wc on the other files, so we can make sure they look all right?


And just to rule out other obscure errors, could you run the following and make sure that it works:
/usr/lib/qiime/support_files/denoiser/bin//FlowgramAli_4frame -h

Jens

Cesar Alejandro Perez Fernandez

Jul 12, 2017, 2:50:47 AM
to Qiime 1 Forum
I'm attaching a screenshot of the first 20 lines of prefix_dereplicated.fasta.

These are the results of wc:
$ wc prefix_dereplicated.fasta
5560    8340 1164168 prefix_dereplicated.fasta

$ wc prefix_dereplicated.sff.txt  
8353 1650839 8108640 prefix_dereplicated.sff.txt

$ wc prefix_mapping.txt                  
2780  5765 89255 prefix_mapping.txt

$ wc tmp1Ltdot.dat          
2782 1651491 8088136 tmp1Ltdot.dat
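
A quick sanity check on these counts can be sketched in Python. It assumes each FASTA record starts with a ">" header line and that prefix_mapping.txt holds one line per dereplicated sequence (an assumption, not something confirmed in this thread); the function names, read IDs, and mapping-line layout in the example are made up. If each record takes two lines, the 5560 lines of the FASTA file correspond to 2780 records, which would match the 2780 lines of the mapping file and the 2780 flowgrams reported in the log.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_nonempty_lines(lines):
    """Count lines that contain something other than whitespace."""
    return sum(1 for line in lines if line.strip())


# Tiny made-up example: two records and two mapping lines (IDs are invented).
fasta_lines = [">read_a\n", "ACGT\n", ">read_b\n", "ACGG\n"]
mapping_lines = ["read_a:\tread_c\n", "read_b:\n"]

n_records = count_fasta_records(fasta_lines)
n_mapped = count_nonempty_lines(mapping_lines)
print(n_records, n_mapped, n_records == n_mapped)
```

On real files, an open file handle can be passed instead of a list, since iterating over a file yields its lines.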

Additionally, this is the help output of FlowgramAli_4frame:
$ /usr/lib/qiime/support_files/denoiser/bin//FlowgramAli_4frame -h                  
Usage: FlowgramAli_4frame mode error_profile input_file
where mode:
    -align            Align all flowgrams in input against first flowgram in input
    -score            Only compute alignment score
    -flow-lengths         print translated nucleotide length if flowgrams in input
    -relscore         Compute lenghth normalized alignment score
    -relscore_pairid     Compute length normalized alignment score and report %pair id of aligned seqs
    -self             Fast self-alignment score for all flowgrams in input
    -gapless         Fast gapless alignment of first flowgram in input against rest in input





Captura de pantalla de 2017-07-12 02-43-11.png

Jens Reeder

Jul 12, 2017, 9:22:12 PM
to Qiime 1 Forum
OK, since the run actually started to produce a tmp*.dat file, let's use that one to test the step that it seems to be hanging on.
Can you execute this single step:

/usr/lib/qiime/support_files/denoiser/bin//FlowgramAli_4frame  -relscore_pairid /usr/local/lib/python2.7/dist-packages/qiime/support_files/denoiser/Data/FLX_error_profile.dat tmp1Ltdot.dat

Can you see if this produces an error?


Cesar Alejandro Perez Fernandez

Jul 13, 2017, 4:09:39 PM
to Qiime 1 Forum
It did not produce any errors; the result is a list of numbers.
temp.txt.save

Jens Reeder

Jul 13, 2017, 6:16:42 PM
to Qiime 1 Forum
A list of numbers is good - those are the alignment scores.
So everything works just as it should and I don't see any obvious errors.

Are you sure that it is not you or your machine somehow terminating the job?
Denoising takes a while, and you need to keep the job running for several hours without closing the shell it is running in.

If there is an actual problem with the code, you will see an error message either in the log file or right in the console.
Without it, there isn't anything that I can do remotely, sorry.

Jens



Cesar Alejandro Perez Fernandez

Jul 14, 2017, 11:32:14 AM
to Qiime 1 Forum
I re-ran denoise_wrapper.py on its own (without the .sh file) and it showed the following error:

$ denoise_wrapper.py -i fasta_sff/green.txt -f demultiplexed/green_demultiplexed.fna -m greenmap.txt -o denoised/green

Traceback (most recent call last):
  File "/usr/local/bin/denoise_wrapper.py", line 150, in <module>
    main()
  File "/usr/local/bin/denoise_wrapper.py", line 136, in main
    titanium=opts.titanium)
  File "/usr/local/lib/python2.7/dist-packages/qiime/denoise_wrapper.py", line 37, in fast_denoiser
    verbose=verbose, titanium=titanium)
  File "/usr/local/lib/python2.7/dist-packages/qiime/denoiser/flowgram_clustering.py", line 656, in denoise_seqs
    checkpoint_fp=checkpoint_fp)
  File "/usr/local/lib/python2.7/dist-packages/qiime/denoiser/flowgram_clustering.py", line 536, in greedy_clustering
    error_profile=error_profile, spread=spread)
  File "/usr/local/lib/python2.7/dist-packages/qiime/denoiser/flowgram_clustering.py", line 320, in filter_with_flowgram
    error_profile=error_profile)
  File "/usr/local/lib/python2.7/dist-packages/qiime/denoiser/flowgram_clustering.py", line 236, in get_flowgram_distances
    scores = [map(float, (s.split())) for s in scores_fh if s != "\n"]
ValueError: could not convert string to float: Usage:

As before, the script produces temporary and intermediate files, and the log doesn't show any error.

Jens Reeder

Aug 18, 2017, 12:18:17 AM
to Qiime 1 Forum
Hi Cesar,

Sorry for dropping the ball. I went on vacation and forgot about this.

Unfortunately, there isn't much left for me to troubleshoot.

It looks like the process starts all right and then fails to call the FlowgramAli_4frame program properly.
The message you are seeing:

"ValueError: could not convert string to float: Usage: "
tells me that instead of the expected output, the program returns its help string, which starts with "Usage: ".
That only happens when the program is not called correctly, namely with the wrong number of arguments.
Since those arguments are created internally, I can't say what went wrong.
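
To illustrate the mechanism, here is a minimal Python 3 sketch modelled on the parsing line shown in the traceback (flowgram_clustering.py, line 236). The function name `parse_scores` and the sample score values are made up, and this is not QIIME's actual code; the point is only that lines of numbers parse fine, while a line beginning with "Usage:" cannot be converted to float.

```python
def parse_scores(lines):
    """Convert whitespace-separated numbers on each non-empty line to floats.

    Modelled on the list comprehension shown in the traceback; the function
    name is illustrative.
    """
    return [list(map(float, line.split())) for line in lines if line != "\n"]


# Expected kind of output: lines of numbers (the values here are invented).
good_output = ["0.12 0.98\n", "0.34 0.87\n"]
print(parse_scores(good_output))

# What arrives when the aligner prints its help text instead of scores.
usage_output = ["Usage: FlowgramAli_4frame mode error_profile input_file\n"]
try:
    parse_scores(usage_output)
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```

The first token of the help text, "Usage:", is the first thing passed to float, which is why it is the string named in the error message.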

Grasping at straws, can you drop the /green from the -o option and see what that does?

Also, I just noticed that your data looks like 454 Titanium and not FLX, so you should use the --titanium option to activate the correct error profile.

Jens



Cesar Alejandro Perez Fernandez

Aug 18, 2017, 12:09:56 PM
to qiime...@googlegroups.com
Hi Jens,

I ran the command as follows:

denoise_wrapper.py -i fasta_sff/green.txt -f demultiplexed/green_demultiplexed.fna -m greenmap.txt -o denoised/ --titanium

And the result is the same:


Traceback (most recent call last):
  File "/usr/local/bin/denoise_wrapper.py", line 150, in <module>
    main()
  File "/usr/local/bin/denoise_wrapper.py", line 136, in main
    titanium=opts.titanium)
  File "/usr/local/lib/python2.7/dist-packages/qiime/denoise_wrapper.py", line 37, in fast_denoiser
    verbose=verbose, titanium=titanium)
  File "/usr/local/lib/python2.7/dist-packages/qiime/denoiser/flowgram_clustering.py", line 656, in denoise_seqs
    checkpoint_fp=checkpoint_fp)
  File "/usr/local/lib/python2.7/dist-packages/qiime/denoiser/flowgram_clustering.py", line 536, in greedy_clustering
    error_profile=error_profile, spread=spread)
  File "/usr/local/lib/python2.7/dist-packages/qiime/denoiser/flowgram_clustering.py", line 320, in filter_with_flowgram
    error_profile=error_profile)
  File "/usr/local/lib/python2.7/dist-packages/qiime/denoiser/flowgram_clustering.py", line 236, in get_flowgram_distances
    scores = [map(float, (s.split())) for s in scores_fh if s != "\n"]
ValueError: could not convert string to float: Usage:

Is it possible that there are errors in the inputs?

Jens Reeder

Aug 22, 2017, 1:38:52 AM
to Qiime 1 Forum
By errors in the inputs, I mean whitespace or tabs in filenames, or something odd like that.
I don't see any of that in your input, so I doubt there is anything wrong.

As a last attempt, I suggest running QIIME in a VM. That way you have a pre-installed environment where we are sure there are no issues.
Since you have only 14,000 sequences, it might be powerful enough to crunch through them.
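
A check for that kind of problem could look like the Python sketch below. The helper name `paths_with_whitespace` is made up for illustration, and the paths are the ones from the denoise_wrapper.py command earlier in this thread.

```python
def paths_with_whitespace(paths):
    """Return the paths that contain any whitespace (spaces, tabs, newlines)."""
    return [p for p in paths if any(ch.isspace() for ch in p)]


# Paths taken from the denoise_wrapper.py command in this thread.
inputs = [
    "fasta_sff/green.txt",
    "demultiplexed/green_demultiplexed.fna",
    "greenmap.txt",
    "denoised/",
]
print(paths_with_whitespace(inputs))

# A made-up path with a space, to show what would be flagged.
print(paths_with_whitespace(["my data/green.txt"]))
```

This only inspects the path strings themselves; it does not look inside the files.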

Here is a link to the qiime VM install:
http://qiime.org/install/virtual_box.html

Jens
