The problem with running xinteract on OMSSA pepxml

Ping

Jun 30, 2009, 1:29:02 PM
to spctools-discuss
Hi,

I am trying to run xinteract on OMSSA pep.xml output files. My OMSSA
version is 2.1.4 and my TPP version is 4.2.1, but I couldn't get it to
run through. I searched the old posts and found a similar one, but there
the problem was solved by specifying the enzyme to xinteract.

I tried that, but it is still not working. InteractParser went through,
but PeptideProphetParser stopped with a segmentation fault.

Any help would be greatly appreciated!

Many Thanks,

Ping

***** output for interactParser and PeptideProphetParser

InteractParser 'interact.pep.xml' 'omssa.pep.xml' -L'7' -E'trypsin' -C -P
file 1: ParoSaliv_SHAM_03.pep.xml
processed altogether 2623 results

PeptideProphetParser 'interact.pep.xml' DECOY=DECOY MINPROB=0 NONPARAM
Using Decoy Label "DECOY".
Using non-parametric distributions
(OMSSA) (minprob 0)
WARNING!! The discriminant function for OMSSA is not yet complete. It
is presented here to help facilitate trial and discussion. Reliance
on this code for publishable scientific results is not recommended.
init with OMSSA Trypsin
MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization:
UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN

PeptideProphet (TPP v4.2 JETSTREAM rev 1, Build 200905131510
(linux)) AKeller@ISB
read in 75 1+, 1790 2+, 749 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
Initialising statistical models ...
WARNING: No decoys with label DECOY were found in this dataset.
reverting to fully unsupervised method.
Iterations: .........10.........20
Segmentation fault

Zhi Sun

Jun 30, 2009, 2:08:56 PM
to spctools...@googlegroups.com
Hi all,

I just checked in two programs but did not include a log message.

I will give a short description here:

createChargeFile.pl: this program reads the ms2 file generated by CPM
(charge prediction program), or a directory containing a list of dta
files, and creates a .charge file. The .charge file has two columns:
<spectrum_id>\t<charges>.

mergeCharges.pl: this program reads the .charge file produced by the
above program and the original mzXML file, creates a new mzXML file from
the old one using the charge info in the .charge file, and recreates the
mzXML index.

These programs are used to process ETD data.

Thanks,
Zhi
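
For readers who want to inspect a .charge file, here is a minimal Python sketch of reading the two-column layout described above. It is not part of createChargeFile.pl or mergeCharges.pl. The function name read_charge_file is hypothetical, and because the post does not say how multiple charges are encoded, the second column is kept as an uninterpreted string.

```python
def read_charge_file(lines):
    """Map spectrum_id -> raw charges field from tab-separated lines.

    Assumes exactly the layout quoted in the post:
    <spectrum_id>\t<charges>. A non-blank line without a tab
    raises ValueError.
    """
    charges = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        spectrum_id, charge_field = line.split("\t", 1)
        charges[spectrum_id] = charge_field
    return charges
```

It accepts any iterable of lines, so an open file handle can be passed directly.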

Ping

Jun 30, 2009, 6:23:03 PM
to spctools-discuss
Is this about something else? Was it supposed to be a new post?

YP

Ping

Jul 2, 2009, 1:38:44 PM
to spctools-discuss
Does anyone have a suggestion on this problem? I used the omssacl -op
option to generate the pepXML result. Will this cause problems?

Thanks,

Ping Yan


David Shteynberg

Jul 2, 2009, 2:20:28 PM
to spctools...@googlegroups.com
Hi Ping,

The problem is this message:
WARNING: No decoys with label DECOY were found in this dataset.
reverting to fully unsupervised method.
Iterations: .........10.........20

The OMSSA modelling relies on a nonparametric model that is built from
decoy IDs from the database. Are you searching your data against a
database that contains decoys? Are you specifying the correct decoy_tag
to TPP? The decoy_tag is a unique string that identifies all the decoy
sequences in the database; all decoy names must begin with this string.

-David
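
A quick way to check the point above before running PeptideProphet is to count how many search_hit entries carry a protein name that begins with the decoy tag. The Python sketch below is only an illustration: count_decoy_hits is a hypothetical helper, it scans the text with a regular expression instead of a real XML parser, and it assumes a case-sensitive prefix match, which has not been verified against the TPP source.

```python
import re

# Captures the value of the protein="" attribute inside each
# <search_hit ...> opening tag. Regex scanning is a shortcut that
# assumes the attribute value contains no double quote.
HIT_PROTEIN_RE = re.compile(r'<search_hit\b[^>]*?\bprotein="([^"]*)"')


def count_decoy_hits(pepxml_text, decoy_tag):
    """Return (decoy_hits, total_hits) for the given pepXML text."""
    proteins = HIT_PROTEIN_RE.findall(pepxml_text)
    decoys = sum(1 for name in proteins if name.startswith(decoy_tag))
    return decoys, len(proteins)
```

If the first number comes back as zero for the tag passed as DECOY=..., the "No decoys with label" warning is to be expected.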

Jimmy Eng

Jul 2, 2009, 7:44:48 PM
to spctools...@googlegroups.com
Ping,

I just downloaded OMSSA 2.1.4 and tried the direct pep.xml export
myself. I do see a problem with the resulting pep.xml file that the
"-op" option generates that's causing the problem you're seeing.

The key error message in your output is this:
WARNING: No decoys with label DECOY were found in this dataset.

Looking at the generated pep.xml files, OMSSA seems to be placing some
number in the protein="" attribute of each search_hit element, whereas
PeptideProphet expects this protein attribute to contain a protein
identifier that includes the DECOY string for decoy matches. In
the converters we use, the value of the protein attribute is the first
word of the protein definition line.

As for a fix, we need someone at NCBI to address this, and hopefully
someone here will contact them about it. In the short term,
you're going to need a developer to modify your pep.xml files to replace
the value in the "protein" attribute with the first word from the
"protein_descr" attribute of each search_hit entry.

- Jimmy
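
The short-term workaround described above could be sketched in Python roughly as follows. This is an assumption-laden illustration, not a TPP or OMSSA tool: fix_pepxml_text and fix_tag are hypothetical names, the rewrite works on raw text with regular expressions so the rest of the file is left byte-for-byte unchanged, and alternative_protein elements are treated the same way as search_hit on the assumption that they carry the same two attributes.

```python
import re

# Opening tag of a search_hit or alternative_protein element.
TAG_RE = re.compile(r'<(search_hit|alternative_protein)\b[^>]*>', re.DOTALL)
DESCR_RE = re.compile(r'\bprotein_descr="([^"]*)"')
# Matches protein="..." but not protein_descr="..." or num_tot_proteins="...".
PROTEIN_RE = re.compile(r'\bprotein="[^"]*"')


def fix_tag(match):
    """Set protein="" to the first word of protein_descr="" in one tag."""
    tag = match.group(0)
    descr = DESCR_RE.search(tag)
    if descr is None or not descr.group(1).split():
        return tag  # nothing to copy from; leave the tag untouched
    first_word = descr.group(1).split()[0]
    # A function replacement avoids backslash handling in the new value.
    return PROTEIN_RE.sub(lambda m: 'protein="%s"' % first_word, tag, count=1)


def fix_pepxml_text(text):
    """Apply fix_tag to every matching opening tag in the pepXML text."""
    return TAG_RE.sub(fix_tag, text)
```

To use it, read the whole pep.xml file into a string, pass it through fix_pepxml_text, and write the result to a new file so the original stays available for comparison.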

David Shteynberg

Jul 2, 2009, 10:15:34 PM
to spctools...@googlegroups.com
If you are running the TPP on the command line, you can correct this
problem in the OMSSA output by running the steps of xinteract
separately and enabling the InteractParser option -P.

Ping

Jul 3, 2009, 1:08:01 AM
to spctools-discuss
Jimmy,

Thanks a lot for your explanation. I am going to write a script to fix
the pep.xml file and see how it goes.

Ping

Ping

Jul 3, 2009, 1:12:11 AM
to spctools-discuss
David,

Thanks a lot for your response. I had generated the decoy database and
rerun OMSSA earlier.

I ran the command lines manually and, as you suggested, used option -P
for InteractParser, but when I ran PeptideProphetParser I got this error:

ERROR: NAN probability density detected. Please alert the developer !!!

I will try Jimmy's suggestion and see how it goes.

Thanks again!

Ping

Jake W

Jul 3, 2009, 6:46:25 AM
to spctools-discuss
One solution to this is to use the indexing option ("-o T") in
formatdb when formatting your BLAST database for OMSSA searching. In
my experience, that will result in OMSSA writing the first word of the
protein definition line to the "protein" attribute in the pepXML.
Without the -o option, OMSSA just puts a number in the "protein="
attribute indicating that the hit came from the nth entry in the
database.

Ping

Jul 3, 2009, 5:16:25 PM
to spctools-discuss
Hi Jake,

Thanks for your response. I tried the -o T option with formatdb and reran
omssacl, but got this error message:

"omssacl.cpp", line 279: Fatal: Exception in COMSSA::Run: NCBI C++ Exception:
"/net/napc02/vol/pubchem/Users/lewisg/checkin/cxx/src/serial/serialobject.cpp", line 228: Error: NCBI-BlastDL::Blast-def-line.title

Ping

Ping

Jul 6, 2009, 3:31:27 PM
to spctools-discuss
Jimmy,

As you suggested, I replaced the value in "protein" with the first
word from "protein_descr" in each search_hit entry. But when I run
PeptideProphetParser, it still gives this error message:

ERROR: NAN probability density detected. Please alert the developer !!!

Is there anything else I need to fix?

Thanks!

Ping


Here is an example of the result after I replaced "protein" with the
first word of "protein_descr".

For a non-decoy hit, it looks like this:

<search_hit hit_rank="1" peptide="YPSRPLPPPPPFGLGFVPPPPPPYGPGR"
peptide_prev_aa="R" peptide_next_aa="I"
protein="gi|73975149|ref|XP_862242.1|" num_tot_proteins="3"
num_matched_ions="15" tot_num_ions="54"
calc_neutral_pep_mass="2949.569" massdiff="0.59985990298992"
is_rejected="0"
protein_descr="gi|73975149|ref|XP_862242.1| PREDICTED: hypothetical protein XP_857149 isoform 2 [Canis familiaris]"
num_tol_term="2" num_missed_cleavages="0">
<alternative_protein protein="gi|73975151|ref|XP_862273.1|"
protein_descr="gi|73975151|ref|XP_862273.1| PREDICTED: hypothetical protein XP_857180 isoform 3 [Canis familiaris]"/>
<alternative_protein protein="gi|73975153|ref|XP_862298.1|"
protein_descr="gi|73975153|ref|XP_862298.1| PREDICTED: hypothetical protein XP_857205 isoform 4 [Canis familiaris]"/>
<search_score name="pvalue" value="0.000002382663488"/>
<search_score name="expect" value="0.028320338214958"/>
</search_hit>

For a decoy hit, it looks like this:

<search_hit hit_rank="2" peptide="QESARYSAKVTVAGLEESATEAQQQIR"
peptide_prev_aa="K" peptide_next_aa="S" protein="decoy_20982"
num_tot_proteins="1" num_matched_ions="17" tot_num_ions="52"
calc_neutral_pep_mass="2949.481" massdiff="0.972859907170005"
is_rejected="0" protein_descr="decoy_20982">
<search_score name="pvalue" value="0.000014221885248"/>
<search_score name="expect" value="0.168173793062284"/>
</search_hit>


Jimmy Eng

Jul 6, 2009, 4:55:06 PM
to spctools...@googlegroups.com
Has anyone been able to feed OMSSA native pepXML output (using the -op
option) through PeptideProphet? I don't know if anyone has tested this
or if OMSSA has only been tested using a separate converter.

Replacing the protein word was meant to address the error message about
having no DECOY entries, which it seems to have done. To address the
follow-up problem, you'll minimally want to include the diagnostic output
from PeptideProphetParser in case anything obvious shows up from that
info. Possibly you'll need to have a developer, who knows more than I do,
take a closer look at your data.

- Jimmy

Ping

Jul 9, 2009, 1:21:40 PM
to spctools-discuss
Jimmy,

Thanks for the information.

I debugged the TPP source code. There is a small bug in
NonParametricDistribution.cxx that caused the NAN error in my
previous tries. I fixed it and PeptideProphetParser is working fine
now.

Also, after enabling the InteractParser option -P as David suggested, I
no longer need to replace the value in "protein" with the first word
from "protein_descr" in each search_hit entry.

Thanks again,

Ping

GATTACA

Jul 14, 2009, 11:46:08 AM
to spctools-discuss
Ping,

Could you post what changes you made to the
NonParametricDistribution.cxx to fix the NAN error?

Thanks.

Ping

Jul 14, 2009, 12:28:40 PM
to spctools-discuss
Sure, though I am not completely sure that my fix is correct.

In two functions:

NonParametricDistribution::densityFit
NonParametricDistribution::varBWdensityFit

I changed:

(*d)[i] = (*d)[i] / k->size();

into:

if (k->size() == 0)
    (*d)[i] = 0;
else
    (*d)[i] = (*d)[i] / k->size();


That is where the NAN error first came from.

Thanks,

Ping
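
The guard in the patch above can be transcribed to Python purely as an illustration of the idea. normalise_density and kernel_size are hypothetical names, not TPP identifiers, and the real code is the C++ in NonParametricDistribution.cxx. On IEEE 754 platforms a C++ floating-point 0.0 / 0 evaluates to NaN, which is presumably how an empty k produced the error. Python would raise ZeroDivisionError instead, so the same check is needed in either language.

```python
def normalise_density(density, kernel_size):
    """Divide each density value by kernel_size, guarding against zero.

    Mirrors the patched C++ logic: with an empty kernel the density
    is set to 0 instead of dividing by zero.
    """
    if kernel_size == 0:
        return [0.0] * len(density)
    return [value / kernel_size for value in density]
```

Whether 0 is the statistically right value for an empty kernel is a separate question, which is the uncertainty Ping mentions.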

David Shteynberg

Jul 14, 2009, 12:47:52 PM
to spctools...@googlegroups.com
Thanks for the fix. I already checked a similar one into SVN a few
weeks back. The upcoming release will avoid this problem.

-David