Question on Mascot.dat file and database.fasta file


Fay Wang

Jun 18, 2010, 2:09:25 PM
to spctools-discuss
Hi all,

I posted my questions yesterday, but they were a bit too long to read,
so I have shortened them (they still seem long).

I am trying to use TPP (4.3.1) to process some cICAT data I collected a
while ago. All of these cICAT data have been searched with Mascot (I have
all the .dat files). So far I have not been able to run my data through
TPP on my computer. Here are some questions.

1. If I have one mzXML file, a Mascot.dat file, and a database.fasta
file, are they enough to run TPP? (I guess so.)

2. Does the database.fasta file have to be exactly the same
database.fasta file I used for the Mascot searches (my data are a bit
old)? I use the SwissProt database. As I recall, when you update it, the
newer database overwrites the older one. So, can I use the Mascot.dat
files (searched against an older version of SwissProt) along with
the current database (newer version) to get TPP to run? If not, where can
I download those older databases? I cannot find them online.

3. I redid the Mascot search anyway. With this new .dat file and the
SwissProt database.fasta file (used to generate this .dat file), TPP
still did not run. I got the following message:

c:/Inetpub/wwwroot/ISB/data/SSM4-1/F030337.pep.xml [ Unreadable! ]

command "c:\Inetpub\tpp-bin\Mascot2XML c:/Inetpub/wwwroot/ISB/data/SSM4-1/F030337.dat -Dc:/Inetpub/wwwroot/ISB/data/SwissProt_57.12.fasta -Etrypsin" failed: Operation not permitted
Command FAILED

I do not know how to fix it. Is it some kind of permission issue? I looked
through a lot of old posts, and it seems no one has mentioned this problem.
Maybe it is too simple to ask?

Any suggestions? Thanks a lot!

Fay

Brian Pratt

Jun 18, 2010, 5:52:05 PM
to spctools...@googlegroups.com
Hard to say what's going on. Perhaps you could upload your new Mascot result file to the files area at http://groups.google.com/group/spctools-discuss?hl=en and someone can have a look.
 
Brian




Jing Wang

Jun 18, 2010, 8:57:02 PM
to spctools...@googlegroups.com
Hi Brian,

Thanks!

I have uploaded my mascot search file (F030337.dat).

Fay

Brian Pratt

Jun 22, 2010, 9:06:47 AM
to spctools...@googlegroups.com
I don't see it there...

Jing Wang

Jun 22, 2010, 2:02:14 PM
to spctools...@googlegroups.com
I wasn't sure where I actually uploaded this .dat file, since I saw two different "incoming" folders under ftp://ftp.systemsbiology.net/pub. They seem to be two different directories (one is shown in red letters and the other in black). I could only upload my file to one of them, so that is what I did.

Anyway, I have attached the .dat file to this email. Please take a look.

Thanks a lot,

Jing


F030337.dat

Jing Wang

Jun 24, 2010, 5:14:43 PM
to spctools...@googlegroups.com
Brian,

Thanks for looking into my search file.

Is there anything I can do now to fix the first problem?

Sorry for uploading the file to the wrong place. I followed what the link "how to upload files to the SPC team" describes at http://groups.google.com/group/spctools-discuss?hl=en. I can't believe I overlooked the "+upload file" button several times last week. I don't recall seeing this button at all. It looks so OBVIOUS now.

Thanks,

Jing 


Jing Wang

Jun 28, 2010, 7:37:28 PM
to spctools...@googlegroups.com
Hi Brian,

I ran the Mascot search again with "trypsin" as the enzyme instead of trypsin/P. The .dat file was converted to a .pep.xml file successfully (cool!). However, there is a warning at the end of the command output, and I don't know how important it is.

".......

333.   opening 334spectrum.0000.0000.3.out   

 

 warning: cannot open "c:/Inetpub/wwwroot/ISB/data/SSM411/F030382.mzXML" for reading MS instrument info.

Command Successful"


When I ran "Analyze Peptides" as the next step, it failed. It seems it can't find the .mzXML file (the MS data?), even though the mzXML file is in the same folder.

See the following error message:

# Commands for session NYDW4ZIZ7 on Mon Jun 28 16:24:38 2010
# BEGIN COMMAND BLOCK
###### BEGIN Command Execution ######
[Mon Jun 28 16:24:38 2010] EXECUTING: run_in c:/Inetpub/wwwroot/ISB/data/SSM411; c:\Inetpub\tpp-bin\xinteract  -NSSM411.pep.xml -p0.05 -l1 -Oi -X-m1.0-nC,9.0 -A-lC-r0.5-mC9.0 c:/Inetpub/wwwroot/ISB/data/SSM411/F030382.pep.xml 
OUTPUT:
 
c:\Inetpub\tpp-bin\xinteract (TPP v4.3 JETSTREAM rev 1, Build 200909091257 (MinGW))
 
running: "C:/Inetpub/tpp-bin/InteractParser "SSM411.pep.xml" "c:/Inetpub/wwwroot/ISB/data/SSM411/F030382.pep.xml" -L"1""
 file 1: c:/Inetpub/wwwroot/ISB/data/SSM411/F030382.pep.xml
 processed altogether 238 results
  
 results written to file c:/Inetpub/wwwroot/ISB/data/SSM411/SSM411.pep.xml
 
 direct your browser to http://localhost/ISB/data/SSM411/SSM411.pep.shtml
   
command completed in 0 sec 
 
running: "C:/Inetpub/tpp-bin/PeptideProphetParser "SSM411.pep.xml" MINPROB=0.05 ICAT"
 (MASCOT) (icat)
results for charge 1: 0 id tot and 0 adj scores
results for charge 2: 189 id tot and 55 adj scores
results for charge 2: 21.2543 adj_ion_mean and 25.4733 adj_ion_hom mean 23.7547id mean0.908354 correlation (r) 
2+ ion - id = 0.988351*(ion - hom) + -3.92233 with error = 4.05281
mean ion - id: 21.2543, mean ion - hom: 25.4733
results for charge 3: 49 id tot and 16 adj scores
results for charge 4: 0 id tot and 0 adj scores
results for charge 5: 0 id tot and 0 adj scores
results for charge 6: 0 id tot and 0 adj scores
results for charge 6: -0 adj_ion_mean and -0 adj_ion_hom mean 0id meannan correlation (r) 
results for charge 7: 0 id tot and 0 adj scores
init with MASCOT trypsin 
MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN
 
 PeptideProphet  (TPP v4.3 JETSTREAM rev 1, Build 200909091257 (MinGW)) AKeller@ISB
 read in 0 1+, 188 2+, 49 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
Initialising statistical models ...
Iterations: .........10.........20....
model complete after 25 iterations
command completed in 0 sec 
 
running: "C:/Inetpub/tpp-bin/ProphetModels.pl -i SSM411.pep.xml"
Analyzing SSM411.pep.xml ...
Parsing search results "c:/Inetpub/wwwroot/ISB/data/SSM411/F030382 (MASCOT)"...
  => Total of 154 hits.
command completed in 1 sec 
 
running: "C:/Inetpub/tpp-bin/XPressPeptideParser "SSM411.pep.xml" -m1.0 -nC,9.0"
WARNING: Found more than one variable mod on 'C'.
XPRESS error - cannot open file from basename c:/Inetpub/wwwroot/ISB/data/SSM411/F030382, will try to derive from scan names
scan-derived scan file c:/Inetpub/wwwroot/ISB/data/SSM411/.mzXML (from ) not found, cannot proceed...
 
command "C:/Inetpub/tpp-bin/XPressPeptideParser "SSM411.pep.xml" -m1.0 -nC,9.0" failed: Operation not permitted
 
command "C:/Inetpub/tpp-bin/XPressPeptideParser "SSM411.pep.xml" -m1.0 -nC,9.0" exited with non-zero exit code: 1
QUIT - the job is incomplete
 
command "c:\Inetpub\tpp-bin\xinteract -NSSM411.pep.xml -p0.05 -l1 -Oi -X-m1.0-nC,9.0 -A-lC-r0.5-mC9.0 c:/Inetpub/wwwroot/ISB/data/SSM411/F030382.pep.xml" failed: Operation not permitted
END OUTPUT
RETURN CODE:256
###### End Command Execution ######
# All finished at Mon Jun 28 16:24:39 2010
# END COMMAND BLOCK

 

Any suggestion for fixing this?

Thanks in advance

Jing




On Thu, Jun 24, 2010 at 8:06 AM, Brian Pratt <brian...@insilicos.com> wrote:
Not everyone has read access to the ISB's FTP server (including me), which is why I said "upload your new mascot result file to the files area at http://groups.google.com/group/spctools-discuss?hl=en. "
 
Anyway, there are two problems:
1) 'error: enzyme in search constraint "Trypsin/P" is not recognized'
2) the above message doesn't appear by default and the program fails silently
 
I'll check in a fix for the second problem.

Brian Pratt

Jun 28, 2010, 7:42:16 PM
to spctools...@googlegroups.com
Try renaming the mzXML file to "F030382.mzXML", maybe?

Jimmy Eng

Jun 28, 2010, 8:10:51 PM
to spctools...@googlegroups.com
Unfortunately based on the error messages, I don't believe that
naming/renaming the mzXML will solve all of the problems.

One convention to try and stick to is to use the same base name for
all of the files. That means either
F030382.dat, F030382.pep.xml, F030382.mzXML
or
SSM411.dat, SSM411.pep.xml, SSM411.mzXML
or better yet
original_name.mzXML, original_name.dat, original_name.pep.xml (where
you keep the base name the same as the original mzXML base name).

Sorry if I'm mixing up those two file base names but I see references
to pep.xml files for both.

I see you're trying to run XPRESS quantitation, and it looks like it's
reading an empty base name (to which it appends ".mzXML" to find the
corresponding mzXML file) from the pep.xml file. After sticking to
the convention listed above, assuming you still get a failure in the
tools, what are the values of the two "base_name" attributes in the
pep.xml file?

Did you start with an mzXML file to generate the mgf file using
MzXML2Search? If not, I don't believe you'll be able to run XPRESS, as
the resulting .dat file (and converted pep.xml file) won't have the
scan number encodings in the convention that the TPP tools (such as
XPRESS) require in order to properly access the spectral data.

- Jimmy
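
For anyone who wants to check those "base_name" values without opening the pep.xml in an editor, here is a minimal Python sketch. It is not part of TPP; the function name `list_base_names` is made up for illustration, and it simply reports every element that carries a `base_name` attribute rather than assuming which elements those are.

```python
import xml.etree.ElementTree as ET


def list_base_names(pepxml_path):
    """Return (element name, base_name value) pairs found in a pep.xml file.

    Hypothetical helper for illustration only; it does not validate the file
    against the pepXML schema.
    """
    found = []
    # The "start" event fires as soon as an opening tag (with its attributes)
    # has been read, so attributes are available here.
    for _event, elem in ET.iterparse(pepxml_path, events=("start",)):
        if "base_name" in elem.attrib:
            tag = elem.tag.split("}")[-1]  # drop the XML namespace prefix, if any
            found.append((tag, elem.attrib["base_name"]))
    return found
```

Calling `list_base_names("SSM411.pep.xml")` and printing the pairs should make an empty or mismatched base name easy to spot; each value plus ".mzXML" is the file the tools will look for, as described above.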

David Shteynberg

Jun 28, 2010, 8:24:02 PM
to spctools...@googlegroups.com
When charge states are not listed in the mzXML file you should run the command:

MzXML2Search -mgf -c1-3 *.mzXML

This will generate the mgf files to submit to Mascot.

Once you get the .dat files back you should rename that to the
original basename of the mzXML file with the .dat extension.

Now you should run Mascot2XML and the scan numbers should be ok.

-David

David Shteynberg

Jun 28, 2010, 8:46:05 PM
to spctools...@googlegroups.com
Actually, after a discussion on the phone with Jimmy, I think that the
-c1-3 option is not needed in this case and will cause some extra
searches to be run. Just use

MzXML2Search -mgf *.mzXML


And then rename the resulting .dat file to match the input basename.

-David
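
The renaming step can be scripted if there are many fractions. The following is only a sketch under the assumption that the Mascot .dat file should end up next to the mzXML file with the same base name; `matching_dat_name` and `rename_dat` are hypothetical helper names, not TPP tools.

```python
from pathlib import Path


def matching_dat_name(mzxml_path):
    """Return the .dat path that shares the mzXML file's base name."""
    return Path(mzxml_path).with_suffix(".dat")


def rename_dat(dat_path, mzxml_path):
    """Rename a Mascot result file (e.g. F030382.dat) to match the mzXML file.

    Refuses to overwrite an existing file so an earlier result is not lost.
    """
    target = matching_dat_name(mzxml_path)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    Path(dat_path).rename(target)
    return target
```

For example, `rename_dat("F030382.dat", "SSM411.mzXML")` would produce `SSM411.dat`, after which Mascot2XML can be run on the renamed file.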

Jing Wang

Jun 29, 2010, 7:09:40 PM
to spctools...@googlegroups.com
Hi Jimmy,

Yes, you are right. SSM411 is how I named my sample, and F030382 is the .DAT file name generated by Mascot.
I have renamed them all to SSM411.*.

I didn't generate the .mgf file with MzXML2Search earlier. I used the pkl files generated by PLGS (Waters) from the .raw data for the Mascot searches and got .DAT files. I converted these .DAT files directly to pep.xml files, but I couldn't see any spectral data under the "IONS" column in the PepXML viewer. When I went "mzXML -> mgf -> dat -> pepXML", I could see the spectral data (but so far only one file has worked).

I would never have thought of the naming issue. Thanks,

Jing

Jing Wang

Jun 29, 2010, 7:12:32 PM
to spctools...@googlegroups.com
Hi Brian, Jimmy, David,

Thanks for all the suggestions! 

I did everything you suggested: generated the .mgf file, re-ran the Mascot search, renamed the .dat file, and so on, and it all worked for the file I have been trying so far (SSM411, as Jimmy pointed out). But when I tried different mzXML files (SSM411 is just one of the fractions), I got stuck at the Mascot search. I have tried another 4 mzXML files, and they all gave me similar error messages:

Max number of ions is 10000. Ignoring ms-ms set starting at line 139813 [M00031]
Your search is continuing...
Warning:
.............................(similar warnings with different line numbers)
.......................
.......................
Max number of ions is 10000. Ignoring ms-ms set starting at line 154038 [M00031]
Your search is continuing...
Warning:

Your search is continuing...

Sorry, your search could not be performed due to the following mistake entering data.
Missing ion intensity value on line 3857088 of input file [M00430]
Please press the back button on your browser, correct the fault and retry the search.



Another problem is that although the SSM411.mgf file worked in the Mascot search, the results are a bit different from what I got earlier when searching with the pkl files generated by PLGS (Waters). The search from the .mgf file (converted by MzXML2Search) gives 33 identified proteins, and the one from the .pkl file (converted by PLGS) gives 39. The .mgf result also has fewer "peptide matches above identity threshold" than the .pkl result (104 vs. 135).
Out of curiosity, I also generated a .pkl file with MzXML2Search. It gives a very similar result to the search from the .mgf file; the only difference is that "peptide matches above identity threshold" is 103 instead of 104. The search from the .pkl file converted by PLGS didn't give any warning message during the search, while the .mgf and .pkl files converted by MzXML2Search both gave warning messages like the following:

Max number of ions is 10000. Ignoring ms-ms set starting at line 275395 [M00031]
Your search is continuing...

....................... (similar warnings with different line numbers)

.......................

Your search is continuing...
Finished uploading search details and file...
Searching....
Warning:
Error 31 has been detected 26 times and only the first 10 messages have been output [M00999]
Your search is continuing...

.20% complete
..50% complete


Any suggestions for fixing?

Thanks in advance,

Jing
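
The two Mascot complaints above (too many ions in an ms-ms set, and a peak line with no intensity) can be looked for in the .mgf file before uploading it. This is a rough Python sketch, not a Mascot or TPP tool: `check_mgf` is a hypothetical name, the 10000 limit is taken from the warning text in this thread, and the assumption that every non-empty line without "=" inside a BEGIN IONS / END IONS block is a peak line is mine.

```python
def check_mgf(lines, max_ions=10000):
    """Return (line number, message) pairs for suspicious MGF content.

    Assumes peak lines are "m/z intensity" pairs and header lines inside a
    spectrum block have the form KEY=VALUE.
    """
    problems = []
    in_ions = False
    peak_count = 0
    start_line = 0
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if line == "BEGIN IONS":
            in_ions, peak_count, start_line = True, 0, line_no
        elif line == "END IONS":
            if in_ions and peak_count > max_ions:
                problems.append(
                    (start_line, f"{peak_count} peaks exceeds {max_ions}")
                )
            in_ions = False
        elif in_ions and line and "=" not in line:
            peak_count += 1
            if len(line.split()) < 2:
                problems.append((line_no, "peak line has no intensity value"))
    return problems
```

It can be run as `check_mgf(open("SSM411.mgf"))`; the reported line numbers are those of the spectrum's BEGIN IONS line or of the offending peak line, which may help when comparing against the line numbers Mascot prints.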







Jimmy Eng

Jun 29, 2010, 7:29:06 PM
to spctools...@googlegroups.com
There's no surprise that generating mgf files using other tools can
cause Mascot to perform (much) better as those tools apply things like
peak picking which MzXML2Search doesn't do. MzXML2Search pretty much
just takes the input spectral data and writes it back out into the
chosen output format.

Anyways, what you describe below includes two issues: lower number of
identifications and max ions cutoff. There's no real near term
solution to directly address the first issue. That fix would entail
some developer to spend time implementing a peak picking routine in
the tool that's validated to work well with Mascot. The second issue
can be mitigated by using the '-N<num>' option in MzXML2Search. That
command line option specifies the maximum peak count to export for any
given spectrum. Use a command like the following:

MzXML2Search -mgf -N100 input.mzXML

This will cause only the 100 most intense m/z values for each spectrum
to be printed out. I just use 100 as an example. Because this does
reduce the peak count, it will have some effect on the resulting
Mascot identifications. And I'm sure there has to be some peak count
value that will give you the optimal number of identifications; whether
or not that number of identifications approaches what you get by PLGS
data export is unknown though.

If you're motivated to do so, I would suggest that you generate
SSM411.mgf using various peak counts (50, 100, 150, 200, 400, etc.)
and run them through Mascot to see which gives the most
identifications and see if the results approach the PLGS results.

- Jimmy
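
To make the effect of a maximum peak count concrete, here is a small Python sketch of the general idea of keeping only the N most intense peaks of a spectrum. It is an illustration only and not the MzXML2Search implementation; the function name `keep_top_n` and the (m/z, intensity) tuple layout are assumptions.

```python
def keep_top_n(peaks, n):
    """Keep the n most intense (mz, intensity) peaks, returned in m/z order."""
    if n <= 0:
        return []
    # Rank by intensity, highest first, and cut the list at n entries.
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    # Peak lists are conventionally written in ascending m/z order.
    return sorted(strongest, key=lambda p: p[0])
```

With a low N, weak fragment peaks that Mascot could have matched are dropped along with the noise, which is why the peak count is worth scanning over several values as suggested above.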

Fay Wang

Jun 29, 2010, 8:02:46 PM
to spctools-discuss

I ran "MzXML2Search -mgf -N100 input C:\Inetpub\wwwroot\ISB\data\SSM411\SSM411.mzXML" (was my command right?),
and it gave the message:

output mode selected: Mascot Generic Format
ERROR - option not recognized: -N100

I didn't see -N<num> in the options either?

Jing

Jimmy Eng

Jun 29, 2010, 8:19:52 PM
to spctools...@googlegroups.com
Sorry, that's an option I added to the program in April which means
it's not available in the current release. Here's a Windows binary
that includes that option until the next TPP release occurs:
http://tinyurl.com/3634hpz

- Jimmy

Jing Wang

Jun 30, 2010, 1:20:56 PM
to spctools...@googlegroups.com
Thanks! It worked! Cool!

I used a different fraction, and tried several numbers for N (i.e., 50, 100, 150, 200, 300, 400, 500, 800, 1000).

The results are listed as:
N number (# of identified proteins / # of peptide matches above identity threshold / # of peptide matches above homology or identity threshold)

The search done by pkl (generated by PLGS) gives result ( 69/ 201/ 210).

The ones done by mgf (by MzXML2Search) give results: (there are 4436 scans in this data)

N = 50,    (45 / 99 / 115)
N = 100,   (49 / 107 / 118)
N = 150,   (51 / 114 / 126)
N = 200,   (49 / 114 / 126)
N = 300,   (51 / 125 / 130)
N = 400,   (50 / 129 / 134)
N = 500,   (53 / 130 / 135)
N = 800,   (56 / 133 / 137)
N = 1000,  (56 / 134 / 139)

Jing
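
To see how closely each peak count approaches the PLGS result, the numbers above can be expressed as fractions of the PLGS values. This is just a Python sketch over the figures reported in this thread; the variable and function names are made up for illustration.

```python
# (proteins, matches above identity, matches above homology or identity)
PLGS_REFERENCE = (69, 201, 210)

MGF_RESULTS = {
    50: (45, 99, 115),
    100: (49, 107, 118),
    150: (51, 114, 126),
    200: (49, 114, 126),
    300: (51, 125, 130),
    400: (50, 129, 134),
    500: (53, 130, 135),
    800: (56, 133, 137),
    1000: (56, 134, 139),
}


def fraction_of_reference(results, reference):
    """Map each N to its three counts divided by the reference counts."""
    return {
        n: tuple(value / ref for value, ref in zip(counts, reference))
        for n, counts in results.items()
    }


for n, fractions in sorted(fraction_of_reference(MGF_RESULTS, PLGS_REFERENCE).items()):
    print(f"N = {n:<5}" + "  ".join(f"{f:6.1%}" for f in fractions))
```

Even at N = 1000 the identity-threshold matches are 134 of 201, about two thirds of the PLGS figure, and the gain from N = 800 to N = 1000 is a single match, so raising the peak count further seems unlikely to close the gap on its own.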



