Error with DIAMOND_analysis_counter.py?

Teresa Mccarrell

Jun 7, 2022, 5:08:19 PM
to SAMSA bioinformatics group
I'm trying to jump into the samsa pipeline with some outputs from diamond, but I'm having an issue with the first script I'm trying, DIAMOND_analysis_counter.py.

I'm trying to run this line: 
python ../samsa2/python_scripts/DIAMOND_analysis_counter.py –I $31T0A1_S1_megahit.contigs.dmdout.m8 –D nr –O
But I get this error: WARNING: need to specify either organism results (with -O flag in command), reference IDs (with -R flag in command), or functional results (with -F flag in command).

I tried the -F and -R flags, and they cause the same error.

As for file locations, I have a folder called RNAdata that contains both the samsa2 folder and the folder holding the Diamond outputs and reference. I'm calling the command from the folder with the Diamond outputs and reference, using the non-Diamond-indexed version of the reference.

The diamond command I ran was: 
diamond blastx -p 32 -d nr -q 31T0A1_S1_megahit.contigs.fa -o 31T0A1_S1_megahit.contigs.dmdout -f 6 -t dmnd_tmp -k 10 --id 85 --query-cover 65 --min-score 60
And I renamed the output to 31T0A1_S1_megahit.contigs.dmdout.m8 since I didn't see an option in the diamond page to export directly to m8. 

The diamond output looks like this:
Screenshot (861).png

Sam Westreich

Jun 7, 2022, 9:22:06 PM
to Teresa Mccarrell, SAMSA bioinformatics group
Hi Teresa,

Really quick check - you have "nr" for the name of the database.  Is this correct?  It's saved as a file that is literally "nr", no extension?

Other question - it looks like you have en dashes (–) in front of the O in the command.  Is this because you copied from somewhere else?  Could you try this again with regular hyphens (-)?

I'm curious as to how en dashes got there.  I should maybe add some sanity checking commands to look for either em or en dashes...
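
A minimal sketch of what such a sanity check could look like. It is not part of SAMSA2; the function name `normalize_flags` and the list of look-alike characters are assumptions made for illustration. The warning in the first message is consistent with the script searching its arguments for the literal ASCII strings -O, -R, and -F, which an argument such as "–O" (typed with an en dash) would never match.

```python
import sys

# Unicode characters that look like a hyphen but are not ASCII "-" (U+002D).
# Illustrative list; not exhaustive.
DASH_LOOKALIKES = {
    "\u2010",  # hyphen
    "\u2011",  # non-breaking hyphen
    "\u2012",  # figure dash
    "\u2013",  # en dash
    "\u2014",  # em dash
    "\u2212",  # minus sign
}


def normalize_flags(argv):
    """Return a copy of argv with leading dash look-alikes replaced by '-'.

    Only the leading run of dash-like characters in each argument is
    rewritten, so dashes elsewhere in a file name are left alone.
    """
    fixed = []
    for arg in argv:
        # Count how many characters at the start are "-" or a look-alike.
        n = 0
        while n < len(arg) and (arg[n] == "-" or arg[n] in DASH_LOOKALIKES):
            n += 1
        # Rewrite only if at least one of them is a non-ASCII look-alike.
        if any(ch in DASH_LOOKALIKES for ch in arg[:n]):
            new_arg = "-" * n + arg[n:]
            print(f"WARNING: replaced non-ASCII dash: {arg!r} -> {new_arg!r}",
                  file=sys.stderr)
            arg = new_arg
        fixed.append(arg)
    return fixed


if __name__ == "__main__":
    # Arguments as they would arrive when pasted from a web page.
    pasted = ["script.py", "\u2013I", "results.m8", "\u2013D", "nr", "\u2013O"]
    print(normalize_flags(pasted))
```

A script could call `sys.argv = normalize_flags(sys.argv)` before doing any flag lookups. Whether to fix the arguments silently, warn, or exit with an error is a design choice; warning keeps the run going while still telling the user what was changed.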

Best,
Sam

--
Sam Westreich, PMP, PhD
Microbiome Scientist, DNAnexus

Teresa Mccarrell

Jun 8, 2022, 12:06:30 PM
to SAMSA bioinformatics group
I had downloaded this database as a file called "nr.gz," which became "nr" with no extension following unzipping. Despite having no extension, it worked in Diamond. 

Yep, I did copy the command from a tutorial-style webpage. After changing the type of dashes, it started to run, but gave this error:

Now reading through the m8 results infile.

Analysis of 730B_1_S19_megahit.contigs.dmdout.m8 complete.
Number of total lines: 65453
Number of unique sequences: 14386
Time elapsed: 0.1435084342956543 seconds

Starting database analysis now.
Traceback (most recent call last):
  File "../samsa2/python_scripts/DIAMOND_analysis_counter.py", line 151, in <module>
    if split_db_org[1] == "sp.":
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../samsa2/python_scripts/DIAMOND_analysis_counter.py", line 157, in <module>
    db_org = split_db_org[1]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../samsa2/python_scripts/DIAMOND_analysis_counter.py", line 162, in <module>
    db_org = split_db_org[1] + " " + split_db_org[2]
IndexError: list index out of range
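
All three IndexErrors are raised on `split_db_org[1]`, which means that for at least one reference entry `split_db_org` holds fewer than two items. Below is a minimal sketch of a length-checked alternative. It is not the actual SAMSA2 code: the function name `organism_label`, the fallback label, and the assumption that the organism name sits in square brackets at the end of an nr-style header line are all illustrative guesses about what the script does around lines 151-162.

```python
def organism_label(header, fallback="Unclassified"):
    """Build an organism label from a FASTA header without raising IndexError.

    Assumes an nr-style header such as
    ">WP_000001.1 hypothetical protein [Escherichia coli]".
    """
    # Take the last bracketed name on the line, if there is one.
    start = header.rfind("[")
    end = header.rfind("]")
    if start == -1 or end < start:
        return fallback

    split_db_org = header[start + 1:end].split()

    # Check the length before indexing, instead of nesting try/except blocks.
    if not split_db_org:
        return fallback
    if len(split_db_org) == 1:
        return split_db_org[0]
    if split_db_org[1] == "sp." and len(split_db_org) >= 3:
        # Keep the strain token for names like "Bacillus sp. ABC123".
        return " ".join(split_db_org[:3])
    return " ".join(split_db_org[:2])


if __name__ == "__main__":
    for line in [
        ">WP_000001.1 hypothetical protein [Escherichia coli]",
        ">X1 kinase [Bacillus sp. ABC123]",
        ">X2 transposase [Archaea]",
        ">X3 protein with no organism field",
    ]:
        print(organism_label(line))
```

Printing the offending header line just before the failing statement would confirm whether one-word or empty organism names in the reference are actually the trigger here.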
