How is the xtandem fval calculated?

Bill Nelson

Sep 14, 2009, 4:16:10 PM
to spctools-discuss
Can you please explain how PeptideProphet calculates the fval from the
xtandem output?
It doesn't seem to be tracking the xtandem E-value; what else is
included?
Thanks,
Bill

bill

Sep 15, 2009, 5:30:14 PM
to spctools-discuss
If this is not easily available can you direct me to the class/script/
method that creates the xtandem fval?
Thanks,
Bill

Brendan

Sep 16, 2009, 8:52:02 AM
to spctools-discuss
Hi Bill,
I am the one who wrote it several years back. I derived the fval
somewhat by voodoo, using analysis tools I created in CPAS. It tested
quite well, and if anything produces slightly conservative
probabilities, where I felt the k-score fval was slightly optimistic
in its estimates (but only slightly). Note that the k-score fval was
derived for the scoring function originally published by Keller, et
al, before the score was incorporated into X! Tandem, and therefore
makes no use of X! Tandem's expect value.

These fval calculations can be found in
<tpproot>/src/Validation/DiscriminateFunction/Tandem (for native scoring)
and Comet (for k-score).

I remember that a peptide length correction (I think sqrt(length)
actually) was important in the final native fval. Because of how
aggressively X! Tandem weights the presence of matching ions, a larger
peptide is more likely to produce a wider spread between its best and
second best score, the key factor in the expectation value.
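
A minimal sketch of the length correction described above, assuming it takes the same form as in the getDiscriminantScore code quoted later in this thread: the raw score is divided by a length weight times sqrt(peptide length). The helper name lengthCorrected and the numbers in the note below are hypothetical and chosen only to show the effect; they are not TPP values.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical helper (not part of the TPP): scale a raw discriminant
// score by peptide length. A len_wt of zero means "no length correction",
// so the score passes through unchanged.
double lengthCorrected(double disc, double len_wt, const std::string& peptide)
{
    assert(!peptide.empty());
    if (len_wt == 0.0)
        return disc;
    return disc / (len_wt * std::sqrt(static_cast<double>(peptide.size())));
}
```

With an illustrative weight of 1.0, a raw score of 12.0 becomes 4.0 for a 9-residue peptide and 3.0 for a 16-residue peptide, so a longer peptide needs a larger raw score to reach the same corrected value.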

Hope that helps.

--Brendan

Jimmy Eng

Sep 16, 2009, 12:36:31 PM
to spctools...@googlegroups.com
Bill,

If you're interested in Tandem's E-value, there is an option that
David added a long time ago that allows Tandem's E-value to be
used in place of the discriminant function. The xinteract option to
invoke this is "-OE". It looks like this adds "EXPECTSCORE" to the
PeptideProphetParser command line.

bill

Sep 16, 2009, 12:40:35 PM
to spctools-discuss
Brendan and Jimmy,
Thanks for the information. Very helpful.
Bill

Brendan

Sep 17, 2009, 7:54:55 AM
to spctools-discuss
Hi Jimmy and Bill,
I am sure using EXPECTSCORE in place of the fval has received less
testing for the veracity of its probabilities, something Alexei was
adamant about when I developed the fval for the X! Tandem native
score. You have to run searches with decoys and look at q-q plots for
this. Also, Alexei was initially dubious whether PeptideProphet would
work at all on X! Tandem native, because the expect scoring
distribution has a significant left skew, and is far from normal,
which I believe I was able to mitigate somewhat with the variables I
added in the fval. And, finally, my own ROC plots showed this fval
doing a better job at discriminating between true- and false-positive
hits.

So, go cautiously into throwing that switch, and make your own
estimations of benefit v. cost.

--Brendan

David Shteynberg

Sep 17, 2009, 1:05:48 PM
to spctools...@googlegroups.com
The EXPECTSCORE option was something that was added after the f-val at
Alexey's request. I believe the reason for the inclusion of this option
is in the usage statement:

E [only use Expect Score as the Discriminant(applies only to X!Tandem data,
helpful for data with homologous top hits e.g. phospho or glyco)]


-David

bill

Oct 5, 2009, 3:19:32 PM
to spctools-discuss
Thanks again for the feedback! We compared plain old expect values to
those with Brendan's discriminant scoring, and our samples
performed better with Brendan's scoring. So, now I'm back to figuring
out how fval is calculated.

I'm trying to manually work through calculating the fval for an xtandem
(k-score) result.

My stripped down pep.xml entry looks like this:

<spectrum_query spectrum="010319_f16.00923.00923.1"
<search_result>
<search_hit hit_rank="1" peptide="AISDAMFANPK" >
<search_score name="hyperscore" value="253"/>
<search_score name="nextscore" value="221"/>
<search_score name="expect" value="0.79"/>
<analysis_result analysis="peptideprophet">
<peptideprophet_result probability="0.1732"
all_ntt_prob="(0.0000,0.0000,0.1732)">
<search_score_summary>
<parameter name="fval" value="1.0108"/>


I'm plugging the values into the getDiscriminantScore method in
TandemDiscrimFunction:

double TandemDiscrimFunction::getDiscriminantScore(SearchResult* result)
{
  TandemResult* tresult = (TandemResult*)(result);
  double tot = const_;
  double disc = score_wt_ * log((double)tresult->hyper_)
    + expect_wt_ * (0-log((double)tresult->expect_))
    + delta_wt_ * (1.0 - (tresult->next_ / tresult->hyper_));
  if (len_wt_)
    disc /= len_wt_ * sqrt((double)strlen(tresult->peptide_));
  tot += disc;
  if (use_expect_) {
    tot = 3 * tot - 8;
  }
  return tot;
}

I'm initializing through TandemKscoreDF.cxx with TandemKscoreDF(1,
false)

static double consts[] = {-13.287, -28.708, -31.083, -31.083,
-31.083};
static double score_wts[] = {2.256, 4.91, 4.983, 4.983, 4.983};
static double delta_wts[] = {14.346, 10.882, 18.091, 18.091, 18.091};

if (!use_expect)
{
const_ = consts[charge];
score_wt_ = score_wts[charge];
delta_wt_ = delta_wts[charge];
}

So my manual calculation looks like this:
tot = -28.708
disc = 4.91 * ln(253) + 0 * (0 - ln(0.79)) + 10.882 * (1.0 -
(210/253)) =
4.91 * 5.533 + 0 * 0.236 + 10.882 * 0.170 =
27.169 + 0 + 1.8495 =
29.0185

tot = -28.708 + 29.0185 = 0.311

But, I'm trying to reproduce the fval of 1.0108. I'm not an
experienced C++ programmer. Am I missing where expect_wt_ and len_wt_
are initialized? Is there another step to the calculation?

Thanks,
Bill



Brendan

Oct 6, 2009, 8:44:24 AM
to spctools-discuss
Hi Bill,
I think you are going to have to build it yourself, and do some printf
debugging to get yourself through this one. I looked over what you
wrote, and the only issue I can see is that next_score appears to be
221 and not 210, but that only makes the final value further from what
is reported.

At this point I'd start having the code print out values for me to
check my assumptions. I am afraid I can't help you with that, though.

By the way, I can't take any credit for the k-score discriminant
function. It is the same function created for the original Keller
OMICS paper where the score was introduced, and mimics the Comet
discriminant score. I created the discriminant score for X! Tandem
native scoring.

The expect_wt_ and len_wt_ values are initialized to zero in the
TandemDiscrimFunction base class.
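
As a way to check the arithmetic without building the TPP, here is a minimal standalone sketch that restates the quoted getDiscriminantScore formula. The Weights struct and the discriminantScore function are hypothetical names introduced for illustration, the use_expect rescaling branch is left out, and the zero values for expect_wt and len_wt follow the description of the base class above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical container for the per-charge weights. For the k-score
// case, expect_wt and len_wt are assumed to stay at zero, as described
// for the TandemDiscrimFunction base class.
struct Weights {
    double constant;
    double score_wt;
    double delta_wt;
    double expect_wt;
    double len_wt;
};

// Restates the arithmetic of the quoted getDiscriminantScore,
// omitting the use_expect rescaling branch.
double discriminantScore(const Weights& w, double hyper, double next,
                         double expect, std::size_t peptideLength)
{
    assert(hyper > 0.0 && expect > 0.0);
    double disc = w.score_wt * std::log(hyper)
                + w.expect_wt * (0.0 - std::log(expect))
                + w.delta_wt * (1.0 - next / hyper);
    if (w.len_wt != 0.0)
        disc /= w.len_wt * std::sqrt(static_cast<double>(peptideLength));
    return w.constant + disc;
}
```

With the array position 1 constants from Bill's post (-28.708, 4.91, 10.882) and both extra weights at zero, this gives about 0.310 for nextscore 210 and about -0.163 for nextscore 221. The expect value and peptide length have no effect in that configuration, and correcting nextscore does move the result further from the reported 1.0108.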

--Brendan

bill

Oct 6, 2009, 9:13:33 AM
to spctools-discuss
Thanks for your help. My next step was to run it in a debugger but I
don't have a Visual Studio license. I hate to do it but I guess I'll
have to give Microsoft more money. I'll post the results if I figure it
out.
Bill

Natalie Tasman

Oct 6, 2009, 1:37:47 PM
to spctools...@googlegroups.com
Hi Bill,

While the currently supported Visual Studio version for the TPP is the
(not-free) 2005 Professional version, you might want to first give the
(free) 2008 Express version a try. You would not be able to build the
SPC raw file converters (readw, etc), and you will probably turn up a
few places in the code that you'd need to change to build under 2008
Express.

Hope this helps,

Natalie

Brendan

Oct 7, 2009, 9:36:49 AM
to spctools-discuss
Hi Bill,
Depending on how much you hate it, you could also just use some simple
printf statements to better understand how the values are being
calculated. I am a huge fan of debuggers, but this doesn't seem, yet,
like a complicated enough problem to truly require one. And using
print statements is surely your quickest, least intrusive route to
getting more information, since it can be done immediately, if you can
already build successfully.

--Brendan


bill

Oct 8, 2009, 9:53:38 AM
to spctools-discuss
Thanks again for your feedback. The printf is a good idea but I have a
new computer and need to get the TPP development environment set up
anyway.
-Bill

Natalie Tasman

Oct 8, 2009, 3:21:36 PM
to spctools...@googlegroups.com
Hi Bill,

We build and distribute the Windows TPP version under the free mingw
system. Visual Studio will become unimportant to us once we switch
completely to ProteoWizard. But some really do like the Visual Studio
debugger too, so we'll probably continue to maintain that build
system. Just following up on Brendan's advice-- you really shouldn't
have to pay money to Microsoft to solve this one :)

Natalie

Bill Nelson

Nov 24, 2009, 4:22:14 PM
to spctools-discuss
Here's what the problem was: the formula is correct, but I was using the
constants for the wrong charge state.
The spectrum was a +1, but I was using the constants array position 1
instead of 0.
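
For anyone repeating this by hand, here is a minimal sketch of the corrected calculation. The constants are copied from the TandemKscoreDF snippet earlier in the thread. The function name kscoreFval is hypothetical, and the mapping "array position = charge - 1" is an assumption generalized from the +1 case (position 0); only that one case is confirmed here. The expect and length terms are omitted because their weights are zero for k-score.

```cpp
#include <cassert>
#include <cmath>

// Constants copied from the TandemKscoreDF snippet quoted earlier.
static const double consts[]    = {-13.287, -28.708, -31.083, -31.083, -31.083};
static const double score_wts[] = {2.256, 4.91, 4.983, 4.983, 4.983};
static const double delta_wts[] = {14.346, 10.882, 18.091, 18.091, 18.091};

// Hypothetical helper. Assumption: array position = precursor charge - 1,
// so a +1 spectrum uses position 0.
double kscoreFval(int precursorCharge, double hyper, double next)
{
    assert(precursorCharge >= 1 && precursorCharge <= 5);
    assert(hyper > 0.0);
    const int idx = precursorCharge - 1;
    return consts[idx]
         + score_wts[idx] * std::log(hyper)
         + delta_wts[idx] * (1.0 - next / hyper);
}
```

With hyperscore 253 and nextscore 221, kscoreFval(1, 253.0, 221.0) comes out at about 1.0108, matching the fval in the pep.xml entry, while the position 1 constants (kscoreFval(2, 253.0, 221.0)) give about -0.163.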
Thanks,
Bill

