fixspf.py


S. Kose

Jul 30, 2024, 10:39:04 AM
to Microbiome Helper
Hi there,

I've had a bit of trouble with SPF files and tried fix_spf.py. I get the error below. Is this a Python/conda version error, or something else, such as a delimiter issue?

python3 fix_spf.py -i merged_metaphlan_profile.spf -o merged_metaphlan_profile_fixed.spf
Traceback (most recent call last):
  File "fix_spf.py", line 396, in <module>
    main()
  File "fix_spf.py", line 382, in main
    input_spf = pd.read_csv(filepath_or_buffer=args.input, sep='\t')
  File "/home/mh_user/anaconda2/envs/picrust2-dev/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/mh_user/anaconda2/envs/picrust2-dev/lib/python3.6/site-packages/pandas/io/parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "/home/mh_user/anaconda2/envs/picrust2-dev/lib/python3.6/site-packages/pandas/io/parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "/home/mh_user/anaconda2/envs/picrust2-dev/lib/python3.6/site-packages/pandas/io/parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 67 fields in line 135, saw 199


Andre Comeau

Jul 30, 2024, 3:02:39 PM
to Microbiome Helper
Hmm...that error does seem to "Google out" to a delimiter error. Could you show us the first few lines of your SPF file (using the head command)? You should also be running the script with the option that tells it which taxonomy string format you are using, so it should be something like:

fix_spf.py -i deblur_output_exported/feature-table_w_tax.spf \
           -o deblur_output_exported/feature-table_w_tax_final.spf \
           --replace_ambig_letter_format
...you are missing the last option (either the one above or "--replace_ambig_D_format") on your command below.
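
Editor's note: the "Expected 67 fields in line 135, saw 199" message means pandas found a row with more tab-separated fields than the rows it had already read. Below is a minimal sketch, not part of fix_spf.py, for locating the first such row before handing the file to pandas. The function name `first_mismatch` and the demo data are made up for illustration, and the sketch assumes the first line of the file sets the expected width.

```python
import io


def first_mismatch(handle, sep="\t"):
    """Return (line_number, n_fields, expected) for the first line whose
    field count differs from the first line, or None if all lines agree.

    Hypothetical helper for illustration; it is not part of fix_spf.py.
    """
    expected = None
    for lineno, line in enumerate(handle, start=1):
        n_fields = len(line.rstrip("\n").split(sep))
        if expected is None:
            expected = n_fields  # assumption: the first line sets the width
        elif n_fields != expected:
            return lineno, n_fields, expected
    return None


# Tiny in-memory demo; with a real file, pass open("your_file.spf") instead.
demo = io.StringIO("ID\tA\tB\nx\t1\t2\ny\t1\t2\t3\t4\n")
print(first_mismatch(demo))
```

If this reports a mismatch, the file is not a rectangular tab-separated table, which points to a problem with how the file was built rather than with the Python or conda version.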



ANDRÉ M. COMEAU, PhD
Manager Integrated Microbiome Resource (IMR)
T: 902.494.2684 | E: andre....@dal.ca 

Address for deliveries:
Dept. of Pharmacology
Tupper Med. Bldg., room 5D
Dalhousie University
5850 College St.
Halifax NS B3H 4R2
 

Research Associate (Lab Manager)

Morgan Langille Lab  Dept. of Pharmacology
ResearchGate Profile GoogleScholar Publications


"Without fantasy, there is no science. Without fact, there is no art." - Nabokov
"The good thing about science is that it's true whether or not you believe in it." - Neil deGrasse Tyson 


--
You received this message because you are subscribed to the Google Groups "Microbiome Helper" group.
To unsubscribe from this group and stop receiving emails from it, send an email to microbiome-hel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/microbiome-helper/e5a2d7cd-ca85-4e26-9782-7b6c53a3a141n%40googlegroups.com.

S. Kose

Jul 31, 2024, 6:51:59 AM
to Microbiome Helper
Dear Andre,

Thank you for your response. I did try those options and got the same error. The file looks okay to me. When I use checkHierarchy.py I also get a "cannot parse line 2" error. This didn't happen with the example set, so I'm a bit flummoxed here.


spf file header:

(picrust2-dev) mh_user@MicrobiomeHelper:~/Desktop/Metaphlan_Bugs_List$ head merged_metaphlan_profile.spf
ID 02V2 02V5 02V7 06V2 06V5 06V7 10V2 10V5 10V7 12V2 12V5 12V7 13V2 13V5 13V7 14V2 14V5 14V7 15V2 15V5 15V7 17V2 17V5 17V7 18V2 18V5 18V7 23V2 23V5 23V7 35V2 35V5 35V7 38V2 38V5 38V7 39V2 39V5 39V7 40V2 40V5 40V7 41V2 41V5 41V7 45V2 45V5 45V7 46V2 46V5 46V7 47V2 47V5 47V7 48V2 48V5 48V7 49V2 49V5 49V7 50V2 50V5 50V7 51V2 51V5 51V7
#/home/sk/miniconda3/bin/metaphlan /scratch/sk/preprocessing/merg_done/02V2.fastq -t rel_ab -o /scratch/sk/preprocessing/done_rmhost/remvdh/merg_done/hm_out/02V2/02V2_humann_temp/02V2.tsv --input_type fastq --bowtie2out /scratch/sk/hm_out/02V2/02V2_metaphlan_bowtie2.txt --nproc 18 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

(....some more info past the header....)

#90000616 reads processed       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
#92084389 reads processed       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
#94317359 reads processed       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
#SampleID       Metaphlan_Analysis      Metaphlan_Analysis      Metaphlan_Analysis      Metaphlan_Analysis      Metap
#clade_name     NCBI_tax_id     relative_abundance      additional_species      NCBI_tax_id     relative_abundance
#mpa_vJun23_CHOCOPhlAnSGB_202307
UNCLASSIFIED    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
k__Archaea      0.0     0.0     0.0     0.0     2157    0.05766         2157    0.04036         2157    0.01889
k__Archaea|p__Candidatus_Thermoplasmatota       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
k__Archaea|p__Candidatus_Thermoplasmatota|c__Thermoplasmata     0.0     0.0     0.0     0.0     0.0     0.0     0.0
k__Archaea|p__Candidatus_Thermoplasmatota|c__Thermoplasmata|o__Methanomassiliicoccales  0.0     0.0     0.0     0.0
k__Archaea|p__Candidatus_Thermoplasmatota|c__Thermoplasmata|o__Methanomassiliicoccales|f__Candidatus_Methanomethyloph
k__Archaea|p__Candidatus_Thermoplasmatota|c__Thermoplasmata|o__Methanomassiliicoccales|f__Candidatus_Methanomethyloph
k__Archaea|p__Candidatus_Thermoplasmatota|c__Thermoplasmata|o__Methanomassiliicoccales|f__Candidatus_Methanomethyloph
k__Archaea|p__Candidatus_Thermoplasmatota|c__Thermoplasmata|o__Methanomassiliicoccales|f__Candidatus_Methanomethyloph
k__Archaea|p__Euryarchaeota     0.0     0.0     0.0     0.0     2157|28890      0.05766         2157|28890      0.040
k__Archaea|p__Euryarchaeota|c__Methanobacteria  0.0     0.0     0.0     0.0     2157|28890|183925       0.05766
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales    0.0     0.0     0.0     0.0     2157|28890|18
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae     0.0     0.0     0.0
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__
k__Bacteria     2       100.0           2       100.0           2       100.0           2       100.0           2
k__Bacteria|p__Actinobacteria   2|201174        4.30251         2|201174        1.57887         2|201174        0.606
k__Bacteria|p__Actinobacteria|c__Actinomycetia  2|201174|1760   4.21583         2|201174|1760   1.27228         2|201

Andre Comeau

Jul 31, 2024, 11:57:23 AM
to Microbiome Helper
What did you do to create this file? There is a whole lot of stuff at the top of the file that should not be there, and the rest appears to be a hierarchical file type from MetaPhlAn or Kraken in which the lines of taxonomy are fully "stratified out" (i.e., Kingdom numbers, then Phylum numbers, then Class, etc.). SPF files, by contrast, simply have one line per taxon at whatever final identity level could be achieved (usually all are down to the species level, or padded to be so).
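
Editor's note: to make the "stratified" versus "one line per taxon" distinction concrete, here is a minimal sketch that keeps only the deepest lineages from a list of "|"-separated taxonomy strings. The helper name `leaf_clades` is hypothetical and this is not something fix_spf.py does. Whether dropping the parent rows is appropriate for a given analysis is an assumption to check, since a parent's abundance can include reads that were never assigned to any child.

```python
def leaf_clades(clades, sep="|"):
    """Return the clades that are not a parent of any other clade in the list.

    Hypothetical helper for illustration only. A parent is any proper prefix
    of another lineage, e.g. "k__Archaea" is a parent of
    "k__Archaea|p__Euryarchaeota".
    """
    parents = set()
    for clade in clades:
        parts = clade.split(sep)
        # Register every proper prefix of this lineage as a parent.
        for i in range(1, len(parts)):
            parents.add(sep.join(parts[:i]))
    # Preserve the input order while dropping the parent rows.
    return [clade for clade in clades if clade not in parents]


stratified = [
    "k__Archaea",
    "k__Archaea|p__Euryarchaeota",
    "k__Archaea|p__Euryarchaeota|c__Methanobacteria",
    "k__Bacteria",
]
for clade in leaf_clades(stratified):
    print(clade)
```

In a stratified file most rows are parents of later rows, so a large drop in row count after this filter is a quick sign that the input is not in the one-line-per-taxon layout.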

This script is intended to be used in the context of our Microbiome Helper SOPs, where you create BIOM ASV tables from QIIME 2, convert them to the SPF format, and then "fix" them with this script, which pads ambiguous taxonomic levels down to the species level because STAMP needs a "full" taxonomy string. It is not intended for other uses.
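
Editor's note: the idea of padding a lineage down to the species level can be sketched as follows. This is an illustration of the general idea only and is not the logic of fix_spf.py or of either of its `--replace_ambig_*` options. The helper name `pad_lineage`, the "|" separator, and the k/p/c/o/f/g/s prefixes are assumptions chosen to match the MetaPhlAn-style strings posted earlier in this thread.

```python
# Assumed rank prefixes, kingdom through species (illustrative only).
RANK_PREFIXES = ["k", "p", "c", "o", "f", "g", "s"]


def pad_lineage(lineage, sep="|"):
    """Split a lineage string and pad it to seven levels.

    Missing ranks reuse the last known name with the missing rank's prefix.
    Hypothetical helper; not the padding rule used by fix_spf.py.
    """
    levels = lineage.split(sep)[: len(RANK_PREFIXES)]
    # Strip the "x__" prefix (if any) from the deepest known level.
    last_name = levels[-1].split("__", 1)[-1]
    for prefix in RANK_PREFIXES[len(levels):]:
        levels.append(f"{prefix}__{last_name}")
    return levels


print(pad_lineage("k__Bacteria|p__Actinobacteria|c__Actinomycetia"))
```

Every row then has the same number of taxonomy columns, which is what a strictly rectangular table needs.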

The SPF file format looks like the below (starting from line 1), for example:

Level_1	Level_2	Level_3	Level_4	Level_5	Level_6	Level_7	E_Aut22_VAL	E_Aut22_VDL
d__Eukaryota	p__Cryptomycota	c__Incertae_Sedis_dup0	o__Incertae_Sedis_dup0	f__Incertae_Sedis_dup0	g__Paramicrosporidium	Unclassified	180	0
d__Eukaryota	p__Chlorophyta	c__Chlorophyceae	o__Chlorophyceae	f__Chlorophyceae	g__Chlorophyceae	Unclassified	0	16
d__Eukaryota	p__Chlorophyta	c__Chlorophyceae	o__Chlorophyceae	f__Chlorophyceae	g__Chlorophyceae	s__Ankistrodesmus_falcatus	0	0
d__Eukaryota	p__Chlorophyta	c__Chlorophyceae	o__Chlorophyceae	f__Chlorophyceae	g__Chlorophyceae	Unclassified	0	247
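
Editor's note: the layout in the example table above (seven taxonomy columns followed by one count column per sample, tab-separated, header on line 1) can be sketched with Python's standard csv module. The helper names `write_spf` and `is_rectangular` are hypothetical, and the rows are copied from the example above. This only illustrates the shape of the file; it is not a replacement for the SOP's BIOM-to-SPF conversion.

```python
import csv
import io

header = [f"Level_{i}" for i in range(1, 8)] + ["E_Aut22_VAL", "E_Aut22_VDL"]
rows = [
    ["d__Eukaryota", "p__Cryptomycota", "c__Incertae_Sedis_dup0",
     "o__Incertae_Sedis_dup0", "f__Incertae_Sedis_dup0",
     "g__Paramicrosporidium", "Unclassified", 180, 0],
    ["d__Eukaryota", "p__Chlorophyta", "c__Chlorophyceae", "o__Chlorophyceae",
     "f__Chlorophyceae", "g__Chlorophyceae", "s__Ankistrodesmus_falcatus", 0, 0],
]


def write_spf(header, rows):
    """Serialize header and rows as tab-separated text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def is_rectangular(text):
    """True if every tab-separated line has the same number of fields."""
    widths = {len(row) for row in csv.reader(io.StringIO(text), delimiter="\t")}
    return len(widths) == 1


spf_text = write_spf(header, rows)
print(is_rectangular(spf_text))
```

A file that passes a check like `is_rectangular` should at least get past the pandas tokenizing step reported in the traceback; whether its taxonomy columns form a valid hierarchy is a separate question.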





S. Kose

Aug 1, 2024, 10:32:09 AM
to microbio...@googlegroups.com
Thanks for your help, Andre. I'll have a look and get back to you if need be.

Best wishes,
SK
