"Failed translating Percolator objects into Crux objects"

William Barshop

Jul 9, 2015, 1:01:26 AM

to crux-...@googlegroups.com

Hello everyone,
As per title-- I'm hitting an error after percolator finishes running. Can't exactly figure out what's going on, so I'd appreciate any advice I can get.

Here's the situation: Standard DDA dataset-- searched with MSGF+, msgf2pin giving me a tab delimited file. Fed this file to crux percolator with the following command line...

crux percolator --feature-in-file T --only-psms T --decoy-prefix Reverse_ 2015-05-26-wb-HEK293-Std-rtid2.tab

INFO: CPU: SLICKBOX
INFO: Wed Jul  8 21:46:32 PDT 2015
INFO: Reading file 2015-05-26-wb-HEK293-Std-rtid2.tab
INFO: Percolator version 2.09, Build Date Jul  7 2015 18:35:23
INFO: Copyright (c) 2006-9 University of Washington. All rights reserved.
INFO: Written by Lukas Käll (lukall@u.washington.edu) in the
INFO: Department of Genome Sciences at the University of Washington.
INFO: Issued command:
INFO: percolator -r crux-output/percolator.target.peptides.txt -v 2 -P Reverse_ --seed 1 -p 0.01 -n 0 --trainFDR 0.01 --testFDR 0.01 --maxiter 10 --train-ratio 0.6 -s 2015-05-26-wb-HEK293-Std-rtid2.tab
INFO: Started Wed Jul  8 21:46:32 2015
INFO:  on SLICKBOX
INFO: Hyperparameters selectionFdr=0.01, Cpos=0.01, Cneg=0, maxNiter=10
INFO: Reading Tab delimited input from datafile 2015-05-26-wb-HEK293-Std-rtid2.tab
INFO: Features:
INFO: RawScore DeNovoScore ScoreRatio Energy lnEValue IsotopeError lnExplainedIonCurrentRatio lnNTermIonCurrentRatio lnCTermIonCurrentRatio lnMS2IonCurrent Mass PepLen dM absdM MeanErrorTop7 sqMeanErrorTop7 StdevErrorTop7 Charge2 Charge3 Charge4 Charge5 ptm A-Freq C-Freq D-Freq E-Freq F-Freq G-Freq H-Freq I-Freq K-Freq L-Freq M-Freq N-Freq P-Freq Q-Freq R-Freq S-Freq T-Freq V-Freq W-Freq Y-Freq B-Freq Z-Freq J-Freq X-Freq U-Freq O-Freq
INFO: Train/test set contains 29487 positives and 29368 negatives, size ratio=1.00405 and pi0=1
INFO: selecting cneg by cross validation
INFO: Estimating 12964 over q=0.01 in initial direction
INFO: Reading in data and feature calculation took 1.47248 cpu seconds or 2 seconds wall time
INFO: ---Training with Cpos=0.01, Cneg selected by cross validation, fdr=0.01
INFO: Iteration 1 :     After the iteration step, 14175 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 2 :     After the iteration step, 14237 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 3 :     After the iteration step, 14255 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 4 :     After the iteration step, 14254 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 5 :     After the iteration step, 14256 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 6 :     After the iteration step, 14258 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 7 :     After the iteration step, 14256 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 8 :     After the iteration step, 14255 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 9 :     After the iteration step, 14256 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 10 :    After the iteration step, 14255 target PSMs with q<0.01 were estimated by cross validation
INFO: Obtained weights (only showing weights of first cross validation set)
INFO: # first line contains normalized weights, second line the raw weights
INFO: RawScore  DeNovoScore     ScoreRatio      Energy  lnEValue        IsotopeError    lnExplainedIonCurrentRatio      lnNTermIonCurrentRatio  lnCTermIonCurrentRatio  lnMS2IonCurrent Mass    PepLen  dM      absdM   MeanErrorTop7   sqMeanErrorTop7      StdevErrorTop7  Charge2 Charge3 Charge4 Charge5 ptm     A-Freq  C-Freq  D-Freq  E-Freq  F-Freq  G-Freq  H-Freq  I-Freq  K-Freq  L-Freq  M-Freq  N-Freq  P-Freq  Q-Freq  R-Freq  S-Freq  T-Freq  V-Freq  W-Freq  Y-Freq  B-Freq       Z-Freq  J-Freq  X-Freq  U-Freq  O-Freq  m0
INFO: 0.425     -0.4378 0.0018  -0.6383 1.2930  -0.0329 -0.1571 0.3448  0.2002  0.0161  0.2719  0.1906  -0.0147 -0.2634 0.2381  -0.1462 -0.1908 -0.0669 0.0539  0.0448  0.0193  0.0000  0.0569  0.0628  0.0322  -0.0386 -0.0178 0.1063  0.0441       0.0080  -0.1868 -0.0301 0.0681  0.0528  0.1207  0.0151  -0.1964 0.0491  0.0372  0.0159  0.0451  0.0239  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  -1.7650
INFO: 0.0106    -0.0105 0.0000  -0.0116 0.1842  -0.0356 -0.1206 0.1135  0.1322  0.0147  0.0017  0.0692  -1491.8835      -43293.1169     0.0079  -0.0004 -0.0062 -0.1612 0.1338  0.3421  1.7652  0.0000  0.8245  1.0884  0.4664  -0.3676 -0.2671      1.7539  0.8338  0.1212  -2.2033 -0.3209 1.5320  0.9139  1.8958  0.2038  -1.8771 0.6937  0.6108  0.2330  0.9415  0.3778  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  -1.3972
INFO: After all training done, 14182 target PSMs with q<0.0100 were found when measuring on the test set
INFO: Found 14182 target PSMs scoring over 1.0000% FDR level on testset
INFO: Merging results from 3 datasets
INFO: Tossing out "redundant" PSMs keeping only the best scoring PSM for each unique peptide.
INFO: Selecting pi_0=0.4966
INFO: Calibrating statistics - calculating q values
INFO: New pi_0 estimate on merged list gives 13335 peptides over q=0.0100
INFO: Calibrating statistics - calculating Posterior error probabilities (PEPs)
INFO: Processing took 27.98 cpu seconds or 28 seconds wall time
WARNING: Failed translating Percolator objects into Crux objects
FATAL: ProteinMatchCollection was null

I've uploaded my tab delimited file ( http://1drv.ms/1fpsJe3 )-- hopefully that may provide some hints as to what may be the root of this error.

If there's anything else/any other information which could help to elucidate the problem, let me know and I'll be happy to provide.
Thanks so much-- and I might add that crux is a phenomenal tool for the proteomics community.
Best,
William Barshop

William Barshop

Jul 9, 2015, 3:32:03 AM

to crux-...@googlegroups.com

Hello all,
I was able to sidestep this issue by compiling the 2.1 release myself. I have worked myself into another problem, though, illustrated below.

crux percolator --only-psms T --decoy-prefix Reverse_ 2015-05-26-wb-HEK293-Std-rtid2.pin --overwrite T
INFO: Beginning percolator.
WARNING: The output directory 'crux-output' already exists.
Existing files will be overwritten.
WARNING: The file 'crux-output/percolator.log.txt' already exists and will be overwritten.
INFO: CPU: SLICKBOX
INFO: Thu Jul  9 00:27:06 PDT 2015
WARNING: The file 'crux-output/percolator.params.txt' already exists and will be overwritten.
INFO: Percolator version 2.07, Build Date Jul  8 2015 21:08:41
INFO: Copyright (c) 2006-9 University of Washington. All rights reserved.
INFO: Written by Lukas Käll (lukall@u.washington.edu) in the
INFO: Department of Genome Sciences at the University of Washington.
INFO: Issued command:
INFO: percolator -r crux-output/percolator.target.txt -B crux-output/percolator.decoy.txt -v 2 -P Reverse_ --seed 1 -p 0.01000000 -n 0.00000000 --trainFDR 0.01000000 --testFDR 0.01000000 --maxiter 10 --train-ratio 0.60000000 -s 2015-05-26-wb-HEK293-Std-rtid2.pin
INFO: Started Thu Jul  9 00:27:06 2015
INFO:  on SLICKBOX
INFO: Hyperparameters fdr=0.01, Cpos=0.01, Cneg=0, maxNiter=10
INFO: Reading Tab delimited input from datafile 2015-05-26-wb-HEK293-Std-rtid2.pin
INFO: Features:
INFO: ExpMass CalcMass RawScore DeNovoScore ScoreRatio Energy lnEValue IsotopeError lnExplainedIonCurrentRatio lnNTermIonCurrentRatio lnCTermIonCurrentRatio lnMS2IonCurrent Mass PepLen dM absdM MeanErrorTop7 sqMeanErrorTop7 StdevErrorTop7 Charge2 Charge3 Charge4 Charge5 ptm A-Freq C-Freq D-Freq E-Freq F-Freq G-Freq H-Freq I-Freq K-Freq L-Freq M-Freq N-Freq P-Freq Q-Freq R-Freq S-Freq T-Freq V-Freq W-Freq Y-Freq B-Freq Z-Freq J-Freq X-Freq U-Freq O-Freq
INFO: Train/test set contains 29487 positives and 29368 negatives, size ratio=1.00405 and pi0=1
INFO: selecting cneg by cross validation
INFO: Estimating 2 over q=0.01 in initial direction
INFO: Reading in data and feature calculation took 1.0315 cpu seconds or 1 seconds wall time
INFO: ---Training with Cpos=0.01, Cneg selected by cross validation, fdr=0.01
INFO: Iteration 1 :     After the iteration step, 13329 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 2 :     After the iteration step, 14143 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 3 :     After the iteration step, 14257 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 4 :     After the iteration step, 14279 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 5 :     After the iteration step, 14278 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 6 :     After the iteration step, 14280 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 7 :     After the iteration step, 14281 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 8 :     After the iteration step, 14276 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 9 :     After the iteration step, 14275 target PSMs with q<0.01 were estimated by cross validation
INFO: Iteration 10 :    After the iteration step, 14278 target PSMs with q<0.01 were estimated by cross validation
INFO: Obtained weights (only showing weights of first cross validation set)
INFO: # first line contains normalized weights, second line the raw weights
INFO: ExpMass   CalcMass        RawScore        DeNovoScore     ScoreRatio      Energy  lnEValue        IsotopeError    lnExplainedIonCurrentRatio      lnNTermIonCurrentRatio  lnCTermIonCurrentRatio  lnMS2IonCurrent Mass    PepLen  dM  absdM    MeanErrorTop7   sqMeanErrorTop7 StdevErrorTop7  Charge2 Charge3 Charge4 Charge5 ptm     A-Freq  C-Freq  D-Freq  E-Freq  F-Freq  G-Freq  H-Freq  I-Freq  K-Freq  L-Freq  M-Freq  N-Freq  P-Freq  Q-Freq  R-Freq  S-Freq  T-Freq  V-Freq       W-Freq  Y-Freq  B-Freq  Z-Freq  J-Freq  X-Freq  U-Freq  O-Freq  m0
INFO: 0.128     0.1277  0.4663  -0.4751 0.0015  -0.6964 1.2725  -0.0563 -0.1725 0.3598  0.2353  0.0468  0.1283  0.1661  0.0001  -0.2911 0.1984  -0.1482 -0.1874 -0.0808 0.0638  0.0584  0.0195  0.0000  0.0626  0.0186  0.0495  -0.0308 -0.0305      0.1215  0.0402  -0.0084 -0.1369 -0.0347 0.0745  0.0195  0.0983  -0.0168 -0.1484 0.0552  0.0299  0.0140  0.0289  0.0063  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  -1.7900
INFO: 0.0008    0.0008  0.0117  -0.0114 0.0000  -0.0126 0.1812  -0.0608 -0.1325 0.1185  0.1553  0.0428  0.0008  0.0603  10.8723 -47858.6686     0.0066  -0.0004 -0.0060 -0.1947 0.1583  0.4461  1.7902  0.0000  0.9071  0.3223  0.7173  -0.2929      -0.4568 2.0037  0.7597  -0.1284 -1.6146 -0.3689 1.6745  0.3371  1.5440  -0.2265 -1.4188 0.7797  0.4911  0.2057  0.6037  0.0999  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  -1.9621
INFO: After all training done, 14195 target PSMs with q<0.0100 were found when measuring on the test set
INFO: Found 14195 target PSMs scoring over 1.0000% FDR level on testset
INFO: Merging results from 3 datasets
FATAL: PSMID should be target_fileidx_scan_charge_rank

My PSMIDs are apparently not meeting the formatting criteria. I have been unable to find clear examples of what the "target_fileidx_scan_charge_rank" PSMID string should look like (for example, what does the first segment look like for a decoy? "false"? "decoy"?). Unless I'm totally misinterpreting my situation, this should be an easy fix, provided I know the expected PSMID format.
Could anyone share with me either an example tab delimited input for crux percolator, or even just a few lines of target and decoy data from such a file?

All the best, and many thanks!
-William Barshop

Kaipo Tamura

Jul 9, 2015, 11:11:02 AM

to William Barshop, crux-...@googlegroups.com

Hi William,
An example of an expected ID is: "target_0_35_1_5" (file 0, scan 35, charge 1, rank 5) or "decoy_1_35_1_1". You can see an entire pin file example by taking the demo.ms2 and small-yeast.fasta files from the test/smoke-tests/ directory and running:
crux tide-index small-yeast.fasta yeast-index
crux tide-search demo.ms2 yeast-index
crux make-pin crux-output/tide-search.target.txt

A pin file will be created as crux-output/make-pin.pin.
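
For anyone building these IDs by hand, here is a minimal Python sketch of the layout described above. The helper names (`make_psm_id`, `parse_psm_id`) and the `PsmId` tuple are made up for illustration and are not part of Crux; the only things taken from this thread are the `target_fileidx_scan_charge_rank` pattern and the two example IDs.

```python
from typing import NamedTuple


class PsmId(NamedTuple):
    """Fields of a PSM ID (hypothetical container, not a Crux type)."""
    is_decoy: bool
    file_idx: int
    scan: int
    charge: int
    rank: int


def make_psm_id(is_decoy: bool, file_idx: int, scan: int, charge: int, rank: int) -> str:
    """Build an ID such as 'target_0_35_1_5' or 'decoy_1_35_1_1'."""
    label = "decoy" if is_decoy else "target"
    return f"{label}_{file_idx}_{scan}_{charge}_{rank}"


def parse_psm_id(psm_id: str) -> PsmId:
    """Split an ID back into its fields, rejecting anything that is not
    exactly <target|decoy>_<int>_<int>_<int>_<int>."""
    parts = psm_id.split("_")
    if len(parts) != 5 or parts[0] not in ("target", "decoy"):
        raise ValueError(f"unexpected PSM ID layout: {psm_id!r}")
    # int() raises ValueError if a field is not an integer (e.g. a run name).
    file_idx, scan, charge, rank = (int(p) for p in parts[1:])
    return PsmId(parts[0] == "decoy", file_idx, scan, charge, rank)
```

Running every PSMID column value through something like `parse_psm_id` before handing the file to crux percolator should catch non-integer file IDs early.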
Hope that helps, let us know if you have any questions or issues.

Thanks,
Kaipo

William Barshop

Jul 10, 2015, 12:24:25 AM

to crux-...@googlegroups.com, wbar...@g.ucla.edu

Kaipo,
Thanks for the prompt response! I went ahead and did as you said, and I've been looking through the pin file. I did notice from the source code that crux requires that the file ID be provided as an integer.
Does it make a difference for joint (multi-acquisition) analyses in which order the run/file IDs are labeled? Should one pair targets and decoys from the same acquisition to the same file ID, or is the file ID more of a unique ID for search result inputs (that is to say, every search regardless of target or decoy database is given a unique integer ID)?
Is there a way to carry a run-name (string) file origin for each PSM, or will I need to track which files each file ID originated from and reconstruct this information after crux percolator finishes running?
My intention is to end up building a spectral library from the filtered data, or potentially to feed the mzIdentML output into Skyline either directly (if Skyline doesn't choke on it) or via conversion into a simple ssl file for Skyline to ingest.

Cheers,
William

Kaipo Tamura

Jul 10, 2015, 2:33:43 PM

to William Barshop, crux-...@googlegroups.com

Hi William,
For your purposes, I think the best method is to assign a unique ID per spectrum file, then when generating the .ssl file convert the IDs back into the spectrum file names.
Regarding Skyline, unfortunately it will not be able to read the mzIdentML files directly. It may be able to read .perc.xml (--pout-output T) if you have search results containing the modification information in .sqt format.
Hope that all makes sense, let us know how it goes.
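
The ID bookkeeping suggested above could look roughly like this in Python. It is only a sketch: `assign_file_ids` and `invert_file_ids` are hypothetical helper names, and the first-seen numbering order is an arbitrary choice rather than anything Crux requires.

```python
from typing import Dict, Iterable


def assign_file_ids(spectrum_files: Iterable[str]) -> Dict[str, int]:
    """Give each distinct spectrum file one integer ID, numbered from 0
    in first-seen order. Targets and decoys from the same spectrum file
    therefore share the same ID."""
    ids: Dict[str, int] = {}
    for name in spectrum_files:
        # len(ids) is the next unused integer; it is ignored for repeats.
        ids.setdefault(name, len(ids))
    return ids


def invert_file_ids(ids: Dict[str, int]) -> Dict[int, str]:
    """Reverse lookup, for turning IDs in the output back into file names."""
    return {idx: name for name, idx in ids.items()}
```

Saving the mapping next to the pin file (for example as JSON) keeps the reverse lookup available when the .ssl file is written later.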

Thanks,
Kaipo

William Barshop

Jul 10, 2015, 3:14:13 PM

to crux-...@googlegroups.com, wbar...@g.ucla.edu

Kaipo,
I'll give it a try.
From my understanding, the file ID for percolator should be then given per spectrum file and not per search result output (the only difference being when one performs non-concatenated target/decoy searches)-- is this correct?

Any idea how Skyline would fare with percolator scores in a PepXML? I assume it may choke when it doesn't find PeptideProphet scores-- but I could always do a hacky fix of masquerading the score as a PeptideProphet score by taking the percolator scores and replacing them with (1-q) to mimic the "higher is better" PeptideProphet scoring system. Do you think that might work?

Unfortunately, I have no SQT files, as I'm generating my percolator inputs from MSGF+ outputs (mzid files). I am a little hesitant to try to parse everything into SSL files, as I'm not sure how to handle multiple modifications on a single amino acid-- for example, a SILAC experiment where a heavy Lysine residue is also methylated. Is there a correct annotation form for this?

All the best,
William

Kaipo Tamura

Jul 10, 2015, 4:18:01 PM

to William Barshop, crux-...@googlegroups.com

Hi William,
That is right, the file IDs will then be propagated to the output where you can use it to match back to the spectrum file.

Hacking together a PepXML that Skyline can read should certainly be possible. You can find a list of the accepted formats here:
https://skyline.gs.washington.edu/labkey/wiki/home/software/BiblioSpec/page.view?name=BlibBuild
PepXML files from multiple sources are supported - when reading a PepXML file, Skyline first attempts to determine the source so it knows what score type to use (in the case of PeptideProphet, it determines the source by finding the "peptideprophet_summary" element and uses the value of the "probability" attribute in "peptideprophet_result" elements).
You may instead want to try to mimic an MSGF+ PepXML by setting the "search_engine" attribute of "search_summary" to "MS-GFDB", then setting a "qvalue" attribute of "search_score" elements. This way you can avoid the 1-q conversion.

For .ssl files, I believe the correct way to specify modifications is a format like: AM[+16.0][+16.0]KRHGLDNY

Thanks,
Kaipo

William Barshop

Jul 12, 2015, 7:09:43 AM

to crux-...@googlegroups.com

Hey Kaipo,

Just a little bit more trouble with the crux pin requirements... I went ahead and wrote some Python scripts to handle renaming and tracking filenames to file integers without issue-- but ran into another problem at the post-percolator run stage.

How should modifications be specified? They had been slipped into my pin file as UNIMOD identified modifications, which it appears is not supported--

FATAL: UNIMOD modifications currently not supported:HYAHTDC[UNIMOD:4]PGHADYVK

Should mods be specified in the same way as in SSL files ( ABC[+57.072]DEFK )?

Thanks again,

William

Kaipo Tamura

Jul 13, 2015, 2:20:12 PM

to William Barshop, crux-...@googlegroups.com

Hi William,
Yes, the format with the mass change inside brackets is currently the standard in Crux.
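
As a rough illustration of that conversion, the Python sketch below rewrites UNIMOD tags as bracketed mass shifts. The function name and the small lookup table are assumptions made for this example: the table lists only a few common modifications and needs extending for real data, and the default of two decimal places is a guess at a reasonable precision, not a documented Crux requirement.

```python
import re

# Monoisotopic mass shifts for a few common UNIMOD accessions.
# Illustrative and deliberately incomplete; extend as needed.
UNIMOD_MASS = {
    1: 42.010565,   # Acetyl
    4: 57.021464,   # Carbamidomethyl
    21: 79.966331,  # Phospho
    35: 15.994915,  # Oxidation
}

_UNIMOD_TAG = re.compile(r"\[UNIMOD:(\d+)\]")


def unimod_to_mass_brackets(peptide: str, decimals: int = 2) -> str:
    """Rewrite 'C[UNIMOD:4]' style tags as signed mass shifts, e.g. 'C[+57.02]'."""

    def repl(match: "re.Match[str]") -> str:
        accession = int(match.group(1))
        if accession not in UNIMOD_MASS:
            # Fail loudly rather than silently dropping a modification.
            raise KeyError(f"no mass known for UNIMOD:{accession}")
        return f"[{UNIMOD_MASS[accession]:+.{decimals}f}]"

    return _UNIMOD_TAG.sub(repl, peptide)
```

Applied to the peptide from the error message, `unimod_to_mass_brackets("HYAHTDC[UNIMOD:4]PGHADYVK")` gives `HYAHTDC[+57.02]PGHADYVK`.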

Thanks,
Kaipo

William Barshop

Jul 14, 2015, 8:36:26 AM

to crux-...@googlegroups.com, wbar...@g.ucla.edu

Kaipo,
Great. That seems to be working acceptably. Thanks for all the help-- it really shows that the projects from Noble/MacCoss groups get amazing support. Cheers to all of you.

I've run into an interesting situation now. You can tell me if I'm crazy, but this is what I've set up so far.

I've been feeding my tweaked inputs/outputs from crux percolator to crux spectral-counts.

Counting raw spectral counts and NSAF values works without complaint, but I cannot get dNSAF values calculated.
I have tried both the .mzid outputs from crux percolator and the psms.txt files; neither works for dNSAF.

Here is the problem I'm running into:

Running command... crux spectral-counts --measure dNSAF --protein-database /home/galaxy/galaxy/database/files/000/dataset_226.dat --overwrite T --fileroot 2015-05-05-wb-HEK293-STD-rtid2-WOHL-column-SINGLE-COLUMN-700bar psm-split-output/2015-05-05-wb-HEK293-STD-rtid2-WOHL-column-SINGLE-COLUMN-700bar.psms.txt
--------------------------------------
INFO: Beginning spectral-counts.
WARNING: The output directory 'crux-output' already exists.
Existing files will be overwritten.
WARNING: The file 'crux-output/2015-05-05-wb-HEK293-STD-rtid2-WOHL-column-SINGLE-COLUMN-700bar.spectral-counts.log.txt' already exists and will be overwritten.
INFO: CPU: SLICKBOX
INFO: Tue Jul 14 05:09:20 PDT 2015
WARNING: The file 'crux-output/2015-05-05-wb-HEK293-STD-rtid2-WOHL-column-SINGLE-COLUMN-700bar.spectral-counts.params.txt' already exists and will be overwritten.
INFO: Reached protein 1000
INFO: Reached protein 2000
INFO: Reached protein 3000
INFO: Reached protein 4000
INFO: Reached protein 5000
INFO: Reached protein 6000
INFO: Reached protein 7000
INFO: Reached protein 8000
INFO: Reached protein 9000
INFO: Reached protein 10000
INFO: Reached protein 20000
INFO: Reached protein 30000
INFO: Reached protein 40000
INFO: Total proteins found: 40538
WARNING: num target matches=0, suppressing warning
WARNING: No modification identifier found for mass shift 57.02.
Warning Suppressed, others may exist
INFO: Creating modification for 57.020000
INFO: parsed PSM: 1000
INFO: parsed PSM: 2000
INFO: parsed PSM: 3000
INFO: parsed PSM: 4000
INFO: parsed PSM: 5000
INFO: parsed PSM: 6000
INFO: parsed PSM: 7000
INFO: parsed PSM: 8000
INFO: parsed PSM: 9000
INFO: parsed PSM: 10000
INFO: parsed PSM: 11000
INFO: parsed PSM: 12000
INFO: parsed PSM: 13000
INFO: parsed PSM: 14000
INFO: parsed PSM: 15000
INFO: parsed PSM: 16000
INFO: parsed PSM: 17000
INFO: parsed PSM: 18000
INFO: parsed PSM: 19000
INFO: parsed PSM: 20000
INFO: parsed PSM: 21000
INFO: parsed PSM: 22000
INFO: parsed PSM: 23000
INFO: parsed PSM: 24000
INFO: parsed PSM: 25000
INFO: parsed PSM: 26000
INFO: parsed PSM: 27000
INFO: parsed PSM: 28000
INFO: parsed PSM: 29000
INFO: Number of matches:29868
INFO: Number of matches passed the threshold 14976
INFO: Number of peptides 13768
WARNING: Normalized protein scores do not add up to one!:0.000000
INFO: Number of proteins 0

You'll notice in that output that despite matches passing our threshold, crux claims that the num target matches=0, and ultimately no proteins are found.

Sorry to keep bothering with questions, but I'd love to have dNSAF calculations working in this setting.

Let me know if there's anything from my end that could help to track this down.

All the best,
William

William Barshop

Jul 14, 2015, 8:49:06 AM

to crux-...@googlegroups.com

Figured this input file might help.

Best,
Will

2015-05-05-wb-HEK293-STD-rtid2-WOHL-column-SINGLE-COLUMN-700bar.psms.txt

Kaipo Tamura

Jul 14, 2015, 1:38:45 PM

to William Barshop, crux-...@googlegroups.com

Hi Will,

All the values for "total matches/spectrum" in the input file are 0, which may be the source of the problem.

I think what is happening is that your Percolator input file probably did not have a column with this information - usually we get this information from a column in the pin file named "lnNumSP" which has values ln(x), where x is the number of matches for that particular spectrum and charge. Are these values accessible to you in a way that you could put them into the input file?
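
If the per-spectrum candidate counts can be recovered from the search results, computing the column is straightforward. The sketch below assumes a hypothetical dictionary keyed by (scan, charge); the helper names are invented here, and only the definition lnNumSP = ln(x) comes from the description above.

```python
import math
from typing import Dict, Tuple

SpectrumKey = Tuple[int, int]  # (scan, charge)


def ln_num_sp(candidate_counts: Dict[SpectrumKey, int]) -> Dict[SpectrumKey, float]:
    """Return ln(x) for each spectrum/charge, where x is the number of
    candidate matches for that spectrum and charge."""
    out: Dict[SpectrumKey, float] = {}
    for key, n in candidate_counts.items():
        if n < 1:
            # ln is undefined for zero candidates, so refuse rather than guess.
            raise ValueError(f"candidate count for {key} must be >= 1, got {n}")
        out[key] = math.log(n)
    return out


def matches_from_ln(ln_value: float) -> int:
    """Invert the transform: recover x from a stored lnNumSP value."""
    return round(math.exp(ln_value))
```

The resulting values would go into a pin column named "lnNumSP", one per PSM row.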

Thanks,
Kaipo

William Barshop

Jul 14, 2015, 8:08:44 PM

to crux-...@googlegroups.com, wbar...@g.ucla.edu

Hrm. I'm sure I can figure out a way to slip them back in... I'll get back to you once I've made my attempt!

Thanks for the heads up! I guess I'm a bit surprised that NSAF worked without this value-- as did RAW spec counts.

Best,
Will

William Barshop

Jul 14, 2015, 8:28:46 PM

to crux-...@googlegroups.com

Kaipo,
I took another look here.
I see in the output from crux percolator there exists a column labeled "total matches/spectrum" which I assume must be derived from the original input lnNumSP feature.
I also see that they are all zero values in my data-- which would make sense, since I have no lnNumSP feature in my pin file.

I am a bit confused on what the value of the "total matches/spectrum" represents in the context of spectral counting.

Is this the number of spectral matches to the same peptide sequence which pass the q-value filter (i.e., the number of accepted spectral counts for that PEPTIDE sequence, not really for the spectrum itself)?

Best,
William

William S Noble

Jul 14, 2015, 8:42:59 PM

to William Barshop, crux-users

Hi William,

The total matches/spectrum column indicates the number of peptides from the database that the spectrum was scored against. Note that Comet reports this number without removing redundancy (i.e., if the same peptide occurs in two proteins, it is counted twice). Tide reports "distinct matches / spectrum", the total number of distinct peptides (i.e., duplicated peptides are counted only once).

I think the spectral counting code will use whichever value it is given. In principle, I think "distinct matches / spectrum" is a better number to use, if it's available.
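
To make the difference between the two counts concrete, here is a toy Python sketch. It is not how Comet or Tide compute these values internally; the function name and the (peptide, protein) input layout are assumptions made purely for illustration.

```python
from typing import Iterable, Tuple


def match_counts(candidates: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """candidates: (peptide_sequence, protein_id) pairs scored against one spectrum.

    Returns (total, distinct): total counts a peptide once per protein it
    occurs in, distinct counts each peptide sequence only once.
    """
    pairs = list(candidates)
    total = len(pairs)
    distinct = len({peptide for peptide, _protein in pairs})
    return total, distinct
```

For a spectrum scored against PEPTIDEK from two proteins and ELVISK from a third, `match_counts` returns 3 total matches but 2 distinct matches.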