Hello, I am having issues with the standardized_DIAMOND_analysis_counter.py script.
I am getting an IndexError at line 138.
I've modified the samsa2 master_script_for_sample_files.bash file, and the error appears after STEP 4 completes.
Output from the samsa2 pipeline script:
STEP 5: AGGREGATING WITH ANALYSIS_COUNTER
Now reading through the m8 results infile.
Analysis of /projects/bact.fun.unmapped.RefSeq_annotated complete.
Number of total lines: 574668
Number of unique sequences: 574668
Time elapsed: 1.8101940155 seconds.
Then the "Starting database analysis now." message appears, and processing continues until:
198M lines processed so far in 2025.08801007 seconds.
Then I get this error:
Traceback (most recent call last):
File "/projects/tools/samsa2/python_scripts/standardized_DIAMOND_analysis_counter.py", line 138, in
db_entry = db_entry[1][:-1]
IndexError: list index out of range
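For context, an IndexError on a line of the form db_entry = db_entry[1][:-1] means the list being indexed has fewer than two elements. Here is a minimal sketch of how that can happen while parsing FASTA headers. The function name parse_description and the split characters ("[" and then the first space) are my assumptions for illustration, not code copied from the script; it only shows that a header with nothing after the ID yields a one-element list.

```python
def parse_description(header_line):
    """Hypothetical header-parsing step (NOT the actual samsa2 code).

    Returns the description between the sequence ID and the '[organism]' part.
    """
    # Keep only the text before the first '[' (drops the organism name, if any).
    before_bracket = header_line.split("[", 1)[0]
    # Split off the ID; the description is whatever follows the first space.
    parts = before_bracket.split(" ", 1)
    # parts[1] exists only if the header has a space after the ID;
    # for a header like ">MT1.1\n" this raises IndexError.
    return parts[1][:-1]  # [:-1] drops the trailing space before '['
```

With the NCBI-style headers this returns "LysE family translocator" and "envelope glycoprotein L", while ">MT1.1\n" raises IndexError: list index out of range.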
Here is a snapshot of your script from line 127 to 138:
I generated a database containing viral, fungal, and bacterial sequences.
The bacterial and viral sequences were downloaded from NCBI, but the fungal sequences were downloaded from Zenodo.org, where the Samsa2 creators uploaded their data:
https://zenodo.org/record/3737678#.Y5uzSS-B2_c
I've checked whether there might be an issue with the sequence names from each database, and I couldn't find any.
Here are examples from each source:
Bacterial:
>WP_206150240.1 LysE family translocator [Burkholderia sp. Tr-20390]
MSLSALLAFALILSVGVATPGPTVLLAMSNGSRYGLRHAMVGMLGAVTADVVLVALVGCGLGMLLDASETAFVTLKLAGAAWLAYVGVRMLLSSGGSAAAQALDHATPDHRTAFLKSFFVAMSNPKYYLFMSALLPQFVDRSHAIAPQYAILAATIVAIDVIGMTGYALLGVHSVRVWKAAGEKWLNRVSGSLLLMLAGYVALYRKAAN
Viral:
>YP_009137152.1 envelope glycoprotein L [Human alphaherpesvirus 2]
MGFVCLFGLVVMGAWGAWGGSQATEYVLRSVIAKEVGDILRVPCMRTPADDVSWRYEAPSVIDYARIDGIFLRYHCPGLDTFLWDRHAQRAYLVNPFLFAAGFLEDLSHSVFPADTQETTTRRALYKEIRDALGSRKQAVSHAPVRAGCVNFDYSRTRRCVGRRDLRPANTTSTWEPPVSSDDEASSQSKPLATQPPVLALSNAPPRRVSPTRGRRRHTRLRRN
Fungi:
>MT1.1
SSIYTITCYPRRTFLPLYVYGTLSHRSYKFILFSNLSNIKAHLVSYPALTSLYGTSLKYFSVGILFTFNPIILLIFVYSIRESFYSVFSSLTSGMLSIIISEALLFFTYFWGILHFSLSPYPLSNEGIIITSSRMLILTITFILASASCMTACLQVFIEKGMSFEISSIICIIYLLGECFASLQTTEYLHLSYHINDTVYTTLFYCVTGLHFSHVVIGLLLLIIYFIRIIEIYDTSTEWFINSFGISYIVIPHTDQITILYWHFVEIVWLFIEFLFYSE
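To check the headers across the whole combined database rather than by eye, a sketch like the one below could flag every header whose shape differs from the NCBI ones: no description after the ID, or no trailing [organism] field. Which fields the counter script actually requires is my assumption here; the function name find_suspect_headers is hypothetical. It streams the file line by line, so it should cope with a large database.

```python
def find_suspect_headers(lines):
    """Yield (line_number, header) for FASTA headers that lack a description
    after the ID or lack a trailing '[organism]' field.

    The two checks below are assumptions about what the counter script expects.
    """
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line.rstrip("\r\n")
        # Description present if there is any text after the ID.
        has_description = len(header[1:].split(None, 1)) > 1
        # Organism present if the header ends with a bracketed field.
        has_organism = "[" in header and header.endswith("]")
        if not (has_description and has_organism):
            yield number, header
```

For example, with a hypothetical path, iterating over find_suspect_headers(open("database.fa")) yields (line number, header) pairs; on the three examples above it would flag only >MT1.1.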
I look forward to hearing from you. Thanks in advance.