error with standardized_DIAMOND_analysis_counter.py

Arian Lundberg

Dec 18, 2022, 11:38:40 PM

to SAMSA bioinformatics group

Hello, I am having issues with the standardized_DIAMOND_analysis_counter.py script.
I am getting an IndexError at line 138.

I've modified the samsa2 master_script_for_sample_files.bash file, and the error comes after STEP 4 is done.

Command from the samsa2 pipeline script:

STEP 5: AGGREGATING WITH ANALYSIS_COUNTER

for file in $starting_files_location/step_4_output/*RefSeq_annotated*
do
    python $python_programs/standardized_DIAMOND_analysis_counter.py -I $file -D $RefSeq_db -O
    python $python_programs/standardized_DIAMOND_analysis_counter.py -I $file -D $RefSeq_db -F
done

Error:

Now reading through the m8 results infile.

Analysis of /projects/bact.fun.unmapped.RefSeq_annotated complete.
Number of total lines: 574668
Number of unique sequences: 574668
Time elapsed: 1.8101940155 seconds.

Then the "Starting database analysis now." message appears, and processing continues until:

198M lines processed so far in 2025.08801007 seconds.

Then I get this error:

Traceback (most recent call last):
  File "/projects/tools/samsa2/python_scripts/standardized_DIAMOND_analysis_counter.py", line 138, in <module>
    db_entry = db_entry[1][:-1]
IndexError: list index out of range

Here is a snapshot of your script from lines 127 to 138:

for line in db:
    if line.startswith(">") == True:
        db_line_counter += 1
        splitline = line.split("[",1)
        # ID, the hit returned in DIAMOND results
        db_id = str(splitline[0].split()[0])[1:]
        # name and functional description
        db_entry = line.split("[", 1)
        db_entry = db_entry[0].split(" ", 1)
        db_entry = db_entry[1][:-1]
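For reference, the parsing steps in this snapshot can be reproduced on a single header line. The sketch below is a minimal illustration, not the SAMSA2 code: parse_header is a hypothetical helper written for Python 3 that follows the same split logic, but returns an empty description when the header has no space after the ID. That is the one case in which db_entry[1] does not exist and the original line raises IndexError.

```python
def parse_header(line):
    """Split a FASTA header line into (db_id, db_entry).

    Hypothetical helper (not part of SAMSA2) that mirrors the split
    logic of the snapshot above.
    """
    # Everything before the first "[" (the organism name is dropped)
    before_bracket = line.split("[", 1)[0]
    # ID: first whitespace-delimited token, without the leading ">"
    db_id = before_bracket.split()[0][1:]
    # Name and functional description: whatever follows the first space
    parts = before_bracket.split(" ", 1)
    if len(parts) < 2:
        # No space after the ID, so there is no parts[1];
        # this is where the original code raises IndexError
        return db_id, ""
    # Drop the trailing character (space before "[" or newline), like [:-1]
    return db_id, parts[1][:-1]


print(parse_header(">WP_206150240.1 LysE family translocator [Burkholderia sp. Tr-20390]\n"))
# ('WP_206150240.1', 'LysE family translocator')
print(parse_header(">MT1.1\n"))
# ('MT1.1', '')
```

Whether an empty description is acceptable to the later steps of the pipeline is an assumption that would need checking.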

I generated a database containing viral, fungal, and bacterial sequences.

The bacterial and viral sequences were downloaded from NCBI, but the fungal sequences were downloaded from Zenodo.org, where the Samsa2 creators uploaded their data.

https://zenodo.org/record/3737678#.Y5uzSS-B2_c

I've checked whether there might be an issue with the sequence names from each database, and I couldn't find any.

Here are examples from each:

Bacterial:

>WP_206150240.1 LysE family translocator [Burkholderia sp. Tr-20390]

MSLSALLAFALILSVGVATPGPTVLLAMSNGSRYGLRHAMVGMLGAVTADVVLVALVGCGLGMLLDASETAFVTLKLAGAAWLAYVGVRMLLSSGGSAAAQALDHATPDHRTAFLKSFFVAMSNPKYYLFMSALLPQFVDRSHAIAPQYAILAATIVAIDVIGMTGYALLGVHSVRVWKAAGEKWLNRVSGSLLLMLAGYVALYRKAAN

Viral:

>YP_009137152.1 envelope glycoprotein L [Human alphaherpesvirus 2]

MGFVCLFGLVVMGAWGAWGGSQATEYVLRSVIAKEVGDILRVPCMRTPADDVSWRYEAPSVIDYARIDGIFLRYHCPGLDTFLWDRHAQRAYLVNPFLFAAGFLEDLSHSVFPADTQETTTRRALYKEIRDALGSRKQAVSHAPVRAGCVNFDYSRTRRCVGRRDLRPANTTSTWEPPVSSDDEASSQSKPLATQPPVLALSNAPPRRVSPTRGRRRHTRLRRN

and Fungi:

>MT1.1

SSIYTITCYPRRTFLPLYVYGTLSHRSYKFILFSNLSNIKAHLVSYPALTSLYGTSLKYFSVGILFTFNPIILLIFVYSIRESFYSVFSSLTSGMLSIIISEALLFFTYFWGILHFSLSPYPLSNEGIIITSSRMLILTITFILASASCMTACLQVFIEKGMSFEISSIICIIYLLGECFASLQTTEYLHLSYHINDTVYTTLFYCVTGLHFSHVVIGLLLLIIYFIRIIEIYDTSTEWFINSFGISYIVIPHTDQITILYWHFVEIVWLFIEFLFYSE
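Of the three example headers above, only the fungal one (>MT1.1) has no space and no description after the ID, so it is the only one on which line.split("[", 1)[0].split(" ", 1) returns a single-element list. That may be what triggers the IndexError, although I have not confirmed which database line the script stopped on. A minimal Python 3 sketch for scanning a combined database for such headers follows; find_headers_without_description is a hypothetical helper, not part of SAMSA2.

```python
def find_headers_without_description(fasta_path, limit=10):
    """Count FASTA headers that have no space after the ID.

    Hypothetical helper (not part of SAMSA2). Returns (count, examples),
    where examples holds up to `limit` (line_number, header) tuples.
    """
    count = 0
    examples = []
    with open(fasta_path) as db:
        for line_number, line in enumerate(db, start=1):
            if not line.startswith(">"):
                continue  # sequence line, not a header
            # Same split as the script: text before "[" cut at the first space
            parts = line.split("[", 1)[0].split(" ", 1)
            if len(parts) < 2:
                count += 1
                if len(examples) < limit:
                    examples.append((line_number, line.rstrip("\n")))
    return count, examples
```

Calling it with the path to the combined database (the file passed as $RefSeq_db) returns the number of affected headers and the first few line numbers, which can be compared against the point where the script stops.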