protein group probabilities


LW

Aug 12, 2011, 7:40:23 PM
to spctools-discuss
Hi,

I have a question about the prot.xml file. It seems that each protein
group has a probability, and each subgroup (those with
group_sibling_id="a", "b", etc.) also has its own probability.

<protein_group group_number="1" probability="1.0000">
<protein protein_name="DECOY_40330" n_indistinguishable_proteins="1"
probability="1.0000" percent_coverage="2.9"
unique_stripped_peptides="LMVSNQFK+NMMTIETNSSTSVVSPRASTAR"
group_sibling_id="a" total_number_peptides="8"
pct_spectrum_ids="0.019" confidence="0.004">


How is the probability for the protein_group determined? I came across
cases where all the
subgroup probabilities are 0 but the protein_group probability is
1.0.
How do I explain this?

Thanks,
LW

GATTACA

Aug 13, 2011, 1:02:41 PM
to spctools-discuss
So in these cases, all the proteins in the protein group share at
least one peptide.
The different sub groups occur because certain "clusters" of proteins
share peptides that are specific to the cluster.

As an example, imagine a group that consists of 3 sibling groups: a, b,
and c. All of the protein identifiers in the group correspond to
histones. Sibling group 'a' contains peptides that are unique to
Histone2A, while sibling group 'b' contains Histone3 and sibling group
'c' has Histone4A.

All 3 sibling groups share at least some peptides in common, but each
sibling group also has some peptides that are unique to itself.


Because peptide probabilities in ProteinProphet are adjusted based
upon the number of sibling peptides (nsp) and how the peptides are
shared among various proteins (wt), the probability for a sibling
group can be different from the probability of the group as a whole.

I don't know how clear that is, but that's my attempt at explaining
it.

LW

Aug 15, 2011, 1:25:51 PM
to spctools-discuss
Thanks for the explanation.
If I understand correctly, the protein_group probability is computed
using all the peptides from the subgroups, while the protein subgroup
probability uses only the peptides from that subgroup (both unique and
non-unique).

However, what about the case where the protein_group probability is
1.0 even though all the individual subgroup probabilities are 0?
In this case the sibling groups have several entries in the
"unique_stripped_peptides" attribute but total_number_peptides="0".
Any explanation?
In general, which is better: filtering by the protein_group
probability or by the probability of each subgroup?

Thanks.
LW

GATTACA

Aug 16, 2011, 12:36:55 PM
to spctools-discuss
I believe that in the XML file, total_number_peptides refers to the
total number of SPECTRA contributing to the sibling group, not the
actual number of peptides.
The individual subgroup probabilities can all be zero because their
peptides are all shared with each other, so the weight factor is close
to zero for all peptides.
However, when all of the peptides are taken together for the group,
the weight factor is not considered and the group probability reaches
1.0.
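
To make the weight argument concrete, here is a small Python sketch. It
is a toy model, not ProteinProphet's actual code: it assumes a
probability is combined as 1 - product(1 - w * p) over peptides, and
the peptide probabilities, the weights, and the function name
combined_probability are all made up for illustration.

```python
from math import prod


def combined_probability(peptide_probs, weights=None):
    """Toy model: chance that at least one (weighted) peptide is correct.

    Assumption for illustration only: P = 1 - prod(1 - w_i * p_i).
    With weights=None every weight is 1.0, i.e. the weight is ignored.
    """
    if weights is None:
        weights = [1.0] * len(peptide_probs)
    return 1.0 - prod(1.0 - w * p for w, p in zip(weights, peptide_probs))


# Hypothetical peptide probabilities for peptides shared by all siblings.
peptide_probs = [0.95, 0.90, 0.99]

# Sibling view: the peptides are fully shared, so assume weights of zero.
sibling_probability = combined_probability(peptide_probs, [0.0, 0.0, 0.0])

# Group view: same peptides, weights not considered.
group_probability = combined_probability(peptide_probs)

print(f"sibling: {sibling_probability:.4f}")
print(f"group:   {group_probability:.4f}")
```

Under these assumptions the sibling value is 0 while the group value is
very close to 1, which is the pattern described in the question.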

If you look at the protein group in question, I think you'll see that
all of the peptides in the group are shared among the sibling groups
and with no other protein groups.
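
If it helps to find such groups, a rough Python sketch along these
lines could list every protein_group whose own probability is high
while all of its protein entries have probability 0. The sample XML,
the protein names, the 0.9 threshold, and the helper names are
assumptions made up for the example; only the element and attribute
names are taken from the snippet quoted earlier in the thread.

```python
import xml.etree.ElementTree as ET

# Made-up miniature document using the attribute names seen in prot.xml.
SAMPLE = """<protein_summary>
<protein_group group_number="1" probability="1.0000">
<protein protein_name="PROT_A" probability="0.0000"
 group_sibling_id="a" total_number_peptides="0"/>
<protein protein_name="PROT_B" probability="0.0000"
 group_sibling_id="b" total_number_peptides="0"/>
</protein_group>
<protein_group group_number="2" probability="1.0000">
<protein protein_name="PROT_C" probability="1.0000"
 group_sibling_id="a" total_number_peptides="8"/>
</protein_group>
</protein_summary>"""


def local_name(tag):
    """Drop a leading '{namespace}' so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def groups_with_zero_siblings(root, min_group_prob=0.9):
    """Return group_number of groups that pass the threshold while
    every protein (sibling) entry inside them has probability 0."""
    hits = []
    for group in root.iter():
        if local_name(group.tag) != "protein_group":
            continue
        group_prob = float(group.get("probability"))
        sibling_probs = [
            float(child.get("probability"))
            for child in group
            if local_name(child.tag) == "protein"
        ]
        if (group_prob >= min_group_prob and sibling_probs
                and all(p == 0.0 for p in sibling_probs)):
            hits.append(group.get("group_number"))
    return hits


root = ET.fromstring(SAMPLE)
print(groups_with_zero_siblings(root))
```

For a real file, ET.parse(path).getroot() could replace the
ET.fromstring(SAMPLE) call, with path pointing at your own prot.xml.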