Failed Job - FBMN GC workflow - Library Search - MSDIAL

stefano papazian

Jun 7, 2020, 12:27:07 PM
to GNPS Discussion Forum and Bug Reports
Hello,

I am trying the new GC FBMN workflow in GNPS with data already processed in MS-DIAL, but my jobs keep failing.
The issue is possibly the format of the files and tables. I find the instructions a bit confusing to follow across the different documentation pages, so I have a couple of questions.

Specifically:

1) Spectra: should the mgf from MS-DIAL be exported as "centroided" or "deconvoluted"? I am using the deconvoluted one.
2) Quantification table: should this be in .csv or .txt format? I started with .csv and also tried .txt, but both failed.
Also, should it be in the format exported by MS-DIAL, or reformatted like MZmine output, i.e. with "row ID" and "row m/z"?
I am using the MS-DIAL format (area.txt), edited to remove the statistics part, basically as in this example: https://raw.githubusercontent.com/lfnothias/GNPSDocumentation/master/docs/tutorials/AG_tutorial_files/MS-DIAL-GNPS_AG_test_featuretable.txt
But here it sounds like I should reformat it like MZmine output: https://ccms-ucsd.github.io/GNPSDocumentation/gc-ms-library-molecular-network/

(To reduce the possible sources of error, I did not include the metadata table here.)

Thank you very much for your feedback

Best,

Stefano

stefano papazian

Jun 7, 2020, 12:51:54 PM
to GNPS Discussion Forum and Bug Reports
Regarding the MS-DIAL pre-processing:
The data were acquired on a GC Q-Exactive in profile mode and converted from Thermo .raw to .abf. The .abf (profile) data were then processed in MS-DIAL.

Does it matter for the GNPS FBMN GC library search workflow that the raw files were converted to profile ABF rather than to centroided mzXML?

Thanks

Stefano 

Aksenov Alexander

Jun 8, 2020, 3:12:19 PM
to GNPS Discussion Forum and Bug Reports
Hi Stefano
The short answer is yes: you need to format both the spectra .mgf and the quant .csv table the same way as the MZmine output. Make sure the column headers in the csv file match those shown in the tutorial, and that you have the row ID, row m/z and row retention time columns:
[Attached image: image.png]

We have a script that converts MS-DIAL output into the correct format, so this won't have to be done manually. It will be linked from the tutorial, which we will update shortly.
Best
Alexander
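
To illustrate the header reformatting described in the reply above, here is a minimal Python sketch. It is not the GNPS conversion script mentioned in the reply. The MS-DIAL column names ("Alignment ID", "Quant mass", "Average Rt(min)"), the sample column "sample_A" and the example values are assumptions for illustration only; check them against your own export and the headers shown in the tutorial. Only the three columns named in the reply are renamed, and all other columns are passed through unchanged.

```python
import csv
import io

# Assumed MS-DIAL column names (hypothetical; verify against your own export).
MSDIAL_TO_MZMINE = {
    "Alignment ID": "row ID",
    "Quant mass": "row m/z",
    "Average Rt(min)": "row retention time",
}


def msdial_to_mzmine_table(msdial_text):
    """Rename the MS-DIAL header columns and return comma-separated text."""
    reader = csv.reader(io.StringIO(msdial_text), delimiter="\t")
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        raise ValueError("input table is empty")
    header = rows[0]
    missing = [name for name in MSDIAL_TO_MZMINE if name not in header]
    if missing:
        raise ValueError("missing expected columns: %s" % missing)
    # Rename the mapped columns; leave every other column name untouched.
    rows[0] = [MSDIAL_TO_MZMINE.get(name, name) for name in header]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Tiny made-up table: one feature, one sample column.
example = (
    "Alignment ID\tAverage Rt(min)\tQuant mass\tsample_A\n"
    "0\t5.123\t91.0542\t15000\n"
)
converted = msdial_to_mzmine_table(example)
print(converted)
```

The function raises an error rather than guessing when an expected column is absent, which surfaces a header mismatch before the job is submitted.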

Aksenov Alexander

Jun 8, 2020, 3:13:18 PM
to GNPS Discussion Forum and Bug Reports
Stefano - 
if you run a library search/networking job, it doesn't matter what the raw files were or how they were processed, as long as the deconvolution results are formatted to be compatible with GNPS. The format of the raw data matters only if you want to run the deconvolution job on GNPS. In that case your files need to be cdf or mzML, but it shouldn't matter how you converted them from the vendor's format (as long as there is no issue with the conversion). I strongly recommend using centroid data: even though the GNPS deconvolution workflow accepts profile data, there is a higher chance of running into errors (also, Orbitrap data in profile mode would be huge).
Best
Alexander

Aksenov Alexander

Jun 18, 2020, 1:14:01 PM
to GNPS Discussion Forum and Bug Reports
Stefano -
the library search workflow has been updated to accept MS-DIAL results directly. You can give it a try.
Best
Alexander

stefano papazian

Jun 23, 2020, 11:13:36 AM
to GNPS Discussion Forum and Bug Reports
Hi Alexander,

I tried again just now. The job failed with an error that refers to the carbon marker file (csv). I am using RT in minutes, as floats with a dot as the decimal separator (as in the MS-DIAL quant table).
The error is below. I will now try the same job without the carbon marker file.

Carbon_Marker_File/Carbon_Marker_File-00000.csv
Traceback (most recent call last):
  File "/data/ccms-gnps/tools/molecularsearch-gc/release_23/calculate_kovats.py", line 69, in <module>
    main()
  File "/data/ccms-gnps/tools/molecularsearch-gc/release_23/calculate_kovats.py", line 66, in main
    args.output_library_identifications)
  File "/data/ccms-gnps/tools/molecularsearch-gc/release_23/calculate_kovats.py", line 39, in calculate_kovats
    identifications_df = pd.read_csv(input_identifications_filename, sep="\t")
  File "/data/ccms-gnps/tools/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/data/ccms-gnps/tools/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/data/ccms-gnps/tools/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/data/ccms-gnps/tools/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/data/ccms-gnps/tools/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 565, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Tool execution terminates abnormally with exit code [1]

stefano papazian

Jun 23, 2020, 11:37:39 AM
to GNPS Discussion Forum and Bug Reports
Now, without the carbon marker file, it worked. What do you think the problem with the format could be?
This is the failed job ID: 18ea6230aa9843cd9c2ea34336368cc7

Aksenov Alexander

Jun 23, 2020, 2:01:38 PM
to GNPS Discussion Forum and Bug Reports
Stefano -
the error indicates that the carbon marker file is not formatted correctly. It should be formatted like this example: https://ccms-ucsd.github.io/GNPSDocumentation/static/gc_kovats_skin.csv
Best
Alexander
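
One way to check a carbon marker file against the linked example before submitting is to compare the header line and delimiter of the two files. The sketch below is only an illustration of that idea, not part of GNPS. The function names are hypothetical, and the template text used here is a made-up placeholder, not the contents of gc_kovats_skin.csv; in practice you would pass in the text of the downloaded example file.

```python
def header_fields(text, delimiters=",;\t"):
    """Return (delimiter, fields) guessed from the first line of the text."""
    lines = text.lstrip("\ufeff").splitlines()  # drop a byte-order mark if present
    if not lines or not lines[0].strip():
        raise ValueError("file is empty or has no header line")
    first = lines[0]
    # Pick the candidate delimiter that occurs most often in the header line.
    delimiter = max(delimiters, key=first.count)
    return delimiter, [field.strip() for field in first.split(delimiter)]


def compare_to_template(template_text, candidate_text):
    """List the differences between the candidate header and the template header."""
    problems = []
    t_delim, t_fields = header_fields(template_text)
    c_delim, c_fields = header_fields(candidate_text)
    if t_delim != c_delim:
        problems.append("delimiter %r differs from template %r" % (c_delim, t_delim))
    if t_fields != c_fields:
        problems.append("header %s differs from template %s" % (c_fields, t_fields))
    return problems


# Placeholder texts for illustration only (NOT the real GNPS example file).
template = "marker,carbon_number,rt\nC10,10,300.0\n"
good = "marker,carbon_number,rt\nC12,12,412.5\n"
bad = "marker\tcarbons\trt_min\nC12\t12\t6.9\n"

print(compare_to_template(template, good))
print(compare_to_template(template, bad))
```

This check only covers the header and delimiter; it does not validate the units or values in the rows.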

stefano papazian

Jun 24, 2020, 3:06:32 AM
to GNPS Discussion Forum and Bug Reports
Hi Alexander,

so I guess the Rt (mine is in minutes) MUST be in seconds?
That's also what is written in the documentation, but from our private conversation I understood that Rt in minutes was also fine.
Can the Rt entries in the MS-DIAL quant table be kept in minutes?

Thanks

Stefano

stefano papazian

Jun 24, 2020, 3:29:51 AM
to GNPS Discussion Forum and Bug Reports
Nope, it failed again. I kept Rt in minutes in the original quant table and now used seconds in the RI (carbon marker) file. Same error, below.
ID=a7b11d597711495284a2c3c172bef2a3

In addition, the jobs that worked (without the RI file) produced a very small network, with no hits against the GNPS library.
I have about 400 features, from a high-res GC-Orbitrap, without derivatization.
ID=bace3f16c4614bc88767c80d841378ec


Carbon_Marker_File/Carbon_Marker_File-00000.csv
Traceback (most recent call last):
  File "/data/ccms-gnps/tools/molecularsearch-gc/release_23/calculate_kovats.py", line 69, in <module>
    main()
  File "/data/ccms-gnps/tools/molecularsearch-gc/release_23/calculate_kovats.py", line 66, in main
    args.output_library_identifications)
  File "/data/ccms-gnps/tools/molecularsearch-gc/release_23/calculate_kovats.py", line 39, in calculate_kovats
    identifications_df = pd.read_csv(input_identifications_filename, sep="\t")
  File "/data/ccms-gnps/tools/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/data/ccms-gnps/tools/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/data/ccms-gnps/tools/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/data/ccms-gnps/tools/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/data/ccms-gnps/tools/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 565, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Tool execution terminates abnormally with exit code [1]

Aksenov Alexander

Jun 24, 2020, 7:54:01 PM
to GNPS Discussion Forum and Bug Reports
Stefano -
take the example RI file and replace the values with your own; this way the formatting will be correct.
Regarding the job that worked: your Fragment Ion Mass Tolerance is much too low. Even though you use high-res data, you are searching against low-res libraries, so the difference between your exact masses and the library masses, which are rounded to nominal values, will often exceed the value you have set (0.025 Da). Make it 0.4 or so, and then you can filter the matches post hoc. Another solution would be to convert your high-res library to a GNPS-compatible format (https://ccms-ucsd.github.io/GNPSDocumentation/batchupload/) and search against it with a narrow mass tolerance.
Best
Alexander
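
The tolerance argument above can be checked with a little arithmetic. The sketch below computes the exact m/z of two common hydrocarbon fragment cations, chosen here purely as illustrations (they are not taken from the poster's data), and compares them with their nominal masses, as a low-res library would store them. It assumes standard monoisotopic masses for C and H and the electron mass, and a simple absolute-difference match, which may differ from how GNPS scores peaks.

```python
# Monoisotopic atomic masses (Da) and the electron mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503}
ELECTRON = 0.00054858


def cation_mz(composition):
    """Exact m/z of a singly charged cation with the given element counts."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in composition.items())
    return neutral - ELECTRON


def within_tolerance(exact_mz, library_mz, tolerance):
    """True if the two m/z values differ by no more than the tolerance (Da)."""
    return abs(exact_mz - library_mz) <= tolerance


# Illustrative fragments only.
fragments = {
    "C4H9+": {"C": 4, "H": 9},
    "C7H7+": {"C": 7, "H": 7},
}

results = {}
for name, composition in fragments.items():
    exact = cation_mz(composition)
    nominal = round(exact)  # mass as stored in a nominal-mass library
    results[name] = (
        exact - nominal,
        within_tolerance(exact, nominal, 0.025),
        within_tolerance(exact, nominal, 0.4),
    )
    print(name, "offset from nominal: %.4f Da" % (exact - nominal),
          "| match at 0.025:", results[name][1],
          "| match at 0.4:", results[name][2])
```

For both fragments the offset from the nominal mass is larger than 0.025 Da but well below 0.4 Da, which is consistent with the advice to widen the tolerance when searching a nominal-mass library.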