Re: SAMSA2 - python script issue


Sam Westreich

Aug 24, 2018, 4:58:30 PM
to Jordyn Bergsveinson, SAMSA bioinformatics group
Hi Jordyn,

Ah, I just solved this issue!  You're the second person to catch this, and you emailed just after I pushed a fix for the first person.  

If you grab the latest version of this script from the GitHub repo (https://github.com/transcript/samsa2/blob/master/python_scripts/DIAMOND_analysis_counter.py), it should work properly.

Let me know if this doesn't work, and I can help you with more troubleshooting.

Best,
Sam Westreich

On Fri, Aug 24, 2018 at 11:51 AM, Jordyn Bergsveinson <jbergs...@gmail.com> wrote:
Hello,

My apologies for the disruption to what I am sure is a busy day for you.

I am currently using SAMSA2 to process metatranscriptome data and thus far it has been wonderful to work with. I am at the aggregation step and keep running into issues with the DIAMOND_analysis_counter.py script (error output below).

I have tried changing both the version of Python that I am calling and the parentheses around the lambda arguments, but without much luck. Is this an issue that you have run into previously, and can you recommend a way to fix the error/syntax?

Thank you for your time and any assistance you might be able to provide. Please let me know if you require any further information from me.

Thanks again,

Jordyn
____________
Dictionary database assembled.
Time elapsed: 334.103634 seconds.
Number of errors: 51149170

Top ten organism matches:
Traceback (most recent call last):
  File "/home/imss/tools/SAMSA/python_scripts/DIAMOND_analysis_counter.py", line 230, in <module>
    for k, v in sorted(condensed_RefSeq_hit_db.items(), key=lambda k,v: -v)[:10]:
TypeError: <lambda>() takes exactly 2 arguments (1 given)
'python2 /home/imss/tools/SAMSA/python_scripts/DIAMOND_analysis_counter.py -I /home/imss/tools/SAMSA/step_4_output/Control_01.RefSeq_annotated -D /home/imss/tools/SAMSA/full_databases/RefSeq_bac.fa -O' exited with non-zero status 1
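
Note on the traceback above: sorted() calls its key function with a single argument, here one (key, value) tuple from .items(), so a two-parameter lambda fails with exactly this kind of TypeError. A minimal sketch of the failure and one way around it follows. The dictionary contents are made up for illustration, and this is not necessarily the exact change that was pushed to the repository.

```python
# Hypothetical stand-in for condensed_RefSeq_hit_db (organism -> hit count).
condensed_RefSeq_hit_db = {"Organism A": 120, "Organism B": 450, "Organism C": 75}

# Fails: sorted() hands each (key, value) tuple to the key function as ONE argument.
try:
    sorted(condensed_RefSeq_hit_db.items(), key=lambda k, v: -v)
    two_arg_lambda_failed = False
except TypeError:
    two_arg_lambda_failed = True

# Works in Python 2 and 3: accept the tuple as one parameter and index into it.
top_hits = sorted(condensed_RefSeq_hit_db.items(), key=lambda kv: -kv[1])[:10]

for organism, count in top_hits:
    print("%s\t%d" % (organism, count))
```

Indexing into the tuple keeps the line valid under both interpreters, which matters here because the failing command was run with python2.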




--
Sam Westreich
Microbiome Scientist, DNAnexus, 

Sam Westreich

Sep 7, 2018, 2:29:49 AM
to Jordyn Bergsveinson, SAMSA bioinformatics group
Hi Jordyn,

Apologies for not getting back to you sooner; work stuff got in the way.

I haven't had anyone else encounter this issue, but I'm looking into it.  Since I've made some upstream parsing improvements, the lapply may not be needed any longer.  You can try deleting lines 115-116 of the R script run_DESeq_stats.R, as that makes the script run without issue on my test machine.

I'll keep looking to see if these lines are still needed, and if so, how I'll restructure them.

Best,
Sam

On Tue, Sep 4, 2018 at 10:33 AM, Jordyn Bergsveinson <jbergs...@gmail.com> wrote:
Hi Again Sam, 

I forgot to update you: that was indeed the issue with the subsystem annotation step.

I've caught another snag, which may or may not be the result of a recent update to run_DESeq_stats.R to reduce duplicates (according to GitHub, around a month ago).

Using R version 3.4.4, the issue appears to be with line 115: 
complete_table <- complete_table[, lapply(.SD, sum), by = complete_table$Row.names]
Error:
[1] "USAGE: $ run_DESeq_stats.R -I working_directory/ -O save.filename"
Working directory is  /home/imss/tools/SAMSA/step_5_output/RefSeq_results/org_results 
Error in `[.data.frame`(complete_table, , lapply(.SD, sum), by = complete_table$Row.names) : 
  unused argument (by = complete_table$Row.names)
Calls: [
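
For readers following along: the error names `[.data.frame`, which suggests complete_table was a plain data frame when line 115 ran, whereas the .SD and by = arguments belong to data.table's bracket syntax. The line appears intended to sum the count columns of rows that share a Row.names value. The Python sketch below illustrates that aggregation only. The rows and the helper name sum_duplicate_rows are hypothetical, and it is not a replacement for the R code.

```python
from collections import OrderedDict


def sum_duplicate_rows(rows):
    """Collapse rows sharing a name by summing their count columns.

    rows: iterable of (row_name, [count, count, ...]) pairs.
    Returns an OrderedDict of row_name -> summed counts, in first-seen order.
    """
    merged = OrderedDict()
    for name, counts in rows:
        if name in merged:
            # Add this row's counts column by column to the running totals.
            merged[name] = [a + b for a, b in zip(merged[name], counts)]
        else:
            merged[name] = list(counts)
    return merged


# Made-up example table in which "Organism A" appears twice.
table = [
    ("Organism A", [10, 0, 3]),
    ("Organism B", [5, 2, 1]),
    ("Organism A", [1, 4, 0]),
]
merged_table = sum_duplicate_rows(table)
```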

I am slightly more R-literate, so I’ve tried changing the brackets and the data frame handling, but to no avail so far.

I thought I would check in with you to see if anyone else had encountered this issue. 

Thanks very much again for all your help - much appreciated. 

Jordyn

On Aug 27, 2018, at 3:37 PM, Sam Westreich <swest...@gmail.com> wrote:

Hey Jordyn,

No worries, glad to keep troubleshooting.

I see the issue here - it looks like the script is going recursive.  The input file for this (crashed) run is Control_01.subsys_annotated.hierarchy, which is the output of a previous DIAMOND_subsystems_analysis_counter.py run!

Did you get this from the master script?  If so, you may need to go through and remove all the "*.hierarchy" files.  Alternatively, you can pull the most updated master_script.sh from GitHub (https://github.com/transcript/samsa2/blob/master/bash_scripts/master_script.sh) - I've deleted the single asterisk that causes this recursive behavior.

This should fix the issue (I just tested the -P flag when pointed to a proper annotation file, i.e. XXX.subsys_annotated), but let me know if you still see issues.
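
To illustrate how one stray asterisk can make a pipeline feed on its own output, here is a small Python sketch using shell-style patterns. The two patterns are assumptions chosen for illustration, not the actual contents of master_script.sh. The file names follow the ones in this thread.

```python
import fnmatch

# Files that might sit in step_4_output after one counting pass.
files = [
    "Control_01.subsys_annotated",
    "Control_01.subsys_annotated.hierarchy",
    "Control_01.subsys_annotated.receipt",
]

# Hypothetical pattern with a trailing asterisk: it also matches earlier outputs,
# so a second pass would try to analyze the .hierarchy and .receipt files.
too_broad = fnmatch.filter(files, "*.subsys_annotated*")

# Hypothetical pattern without it: only the DIAMOND annotation file matches.
inputs_only = fnmatch.filter(files, "*.subsys_annotated")

print(too_broad)
print(inputs_only)
```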

Best,
Sam

On Mon, Aug 27, 2018 at 11:20 AM, Jordyn Bergsveinson <jbergs...@gmail.com> wrote:
Hi Sam, 

So sorry to bug you again and so soon.

I have moved on to attempting to aggregate the subsystem annotation results, and am once again running into issues, but with an entirely different error. With this one I am even more out of my depth than with the lambda issue.

Below is the output (note that all scripts were pulled recently and are up to date). Again, any help is greatly appreciated.

I know this is certainly the downside of developing and maintaining a tool; however, I greatly appreciate the overall package that you and your co-authors developed, as well as your assistance.

Thanks again, 

Jordyn

Analysis of /home/imss/tools/SAMSA/step_4_output/Control_01.subsys_annotated.hierarchy complete.
Number of total lines: 2609
Number of unique sequences: 228
Time elapsed: 0.00442 seconds.

Starting database analysis now.
1000000 lines processed so far in 4.437465 seconds.
2000000 lines processed so far in 8.969947 seconds.
3000000 lines processed so far in 13.508576 seconds.
4000000 lines processed so far in 18.53037 seconds.
5000000 lines processed so far in 23.043998 seconds.
6000000 lines processed so far in 27.590813 seconds.
7000000 lines processed so far in 31.997657 seconds.

Success!
Time elapsed: 35.980884 seconds.
Number of lines: 7939855
Number of errors: 0
Traceback (most recent call last):
  File "/home/imss/tools/SAMSA/python_scripts/DIAMOND_subsystems_analysis_counter.py", line 155, in <module>
    partial_outfile.write(entry + "\t" + read_id_db[entry] + "\t" + db_hier_dictionary[read_id_db[entry]] + "\n")
KeyError: '277'
'python2 /home/imss/tools/SAMSA/python_scripts/DIAMOND_subsystems_analysis_counter.py -I /home/imss/tools/SAMSA/step_4_output/Control_01.subsys_annotated.hierarchy -D /home/imss/tools/SAMSA/full_databases/subsys_db.fa -O /home/imss/tools/SAMSA/step_4_output/Control_01.subsys_annotated.hierarchy.hierarchy -P /home/imss/tools/SAMSA/step_4_output/Control_01.subsys_annotated.hierarchy.receipt' exited with non-zero status 1
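
Note on the traceback above: the KeyError comes from the chained lookup on line 155, where read_id_db[entry] yields a value (here '277') that is not a key in db_hier_dictionary. A minimal sketch of that lookup with a guard is below. The dictionary contents and the helper build_output_lines are hypothetical; only the two dictionary names come from the traceback, and skipping unknown IDs is one possible policy rather than what the script itself does.

```python
def build_output_lines(read_id_db, db_hier_dictionary):
    """Return (lines, missing).

    lines: tab-separated output lines for IDs found in db_hier_dictionary.
    missing: hit IDs that had no entry in db_hier_dictionary.
    """
    lines, missing = [], []
    for entry in read_id_db:
        hit_id = read_id_db[entry]
        # .get() returns None for an unknown ID instead of raising KeyError.
        hierarchy = db_hier_dictionary.get(hit_id)
        if hierarchy is None:
            missing.append(hit_id)
            continue
        lines.append(entry + "\t" + hit_id + "\t" + hierarchy + "\n")
    return lines, missing


# Made-up data: "277" stands in for a value that is not a database ID.
read_id_db = {"read_1": "hit_A", "read_2": "277"}
db_hier_dictionary = {"hit_A": "Level 1\tLevel 2"}
lines, missing = build_output_lines(read_id_db, db_hier_dictionary)
```

Collecting the unresolved IDs makes it easier to notice when the input is the wrong kind of file, since nearly every lookup would then land in the missing list.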

On Aug 24, 2018, at 3:30 PM, Jordyn Bergsveinson <jbergs...@gmail.com> wrote:

Hi Sam, 

Thank you so much for the speedy reply! The new script is working like a charm, I am off and running again.

Much appreciated, 

Jordyn 