Crux error when searching huge space


hapen

Apr 12, 2012, 12:38:42 PM
to crux-...@googlegroups.com
Dear Colleagues,

I'm a new user of Crux.
I'm running an experiment that searches an expanded database by specifying a PTM on all amino acids.
The parameters are set as follows:
--------
precursor-window=5
precursor-window-type=ppm
isotopic-mass=mono
fragment-mass=mono
mz-bin-width=0.36
mz-bin-offset=0.11
mod=11.050561:ACDEFGHIKLMNOPQRSTUVWY:10
enzyme=trypsin
missed-cleavages=TRUE
digestion=full-digest
num-decoys-per-target=0
--------
Crux aborted with the following error message:
--------
INFO: Searching spectrum number 3706 (1+), search number 9160
WARNING: No matches found for spectrum 3706, charge 1.
terminate called after throwing an instance of 'std::bad_alloc'
 what():  St9bad_alloc

--------
Do you know what caused this?

Previously I got an error message that said:
ERROR: peptide count of 10000000 exceeds max match limit: 10000000
I traced it to "_MAX_NUMBER_PEPTIDES" in MatchCollection.h,
so I changed that value to 1000000000, and it works well for other similar searches.

I also have some questions about the INFO and WARNING messages from Crux.
... ...
INFO: Reached peptide 500000
INFO: Reached peptide 600000
INFO: Printing index
INFO: Sorting index
INFO: number of peptides(not unique): 1000000  # Does this mean the indexer generated 1000000 peptides? (I'm very curious about this special number)
INFO: Elapsed time: 5.88 s
--------
INFO: Created 11 peptide mods, keeping 11 with 255 or fewer aa mods # Could you explain the meaning of this line?
WARNING: SP Scoring: bin:4033 is greater than max:4000 # Is bin:4033 the actual number of bins when binning peaks? How can I avoid this warning in this case?
... ...
WARNING: No matches found for spectrum 1178, charge 2. # This spectrum has no candidate peptides matched in database within the mass tolerance, right?
... ...
INFO: Searching spectrum number 17374 (2+), search number 73570
INFO: Searching spectrum number 17377 (2+), search number 73580
INFO: Searching spectrum number 17382 (2+), search number 73590 # What's the search number? I noticed it increases in steps of 10.
... ...

Thanks for your contributions to the proteomics community.

William Noble

Apr 12, 2012, 7:39:21 PM
to crux-...@googlegroups.com

Hi Hapen,

Your error message indicates that you have run out of memory.  You might try running your search on a machine with more memory or restricting your modifications.  If you'd like us to look into this in more detail, we'd need to see the sample input files.  Answers to your other questions are below:
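INFO: number of peptides(not unique): 1000000  # Does this mean the indexer generated 1000000 peptides? (I'm very curious about this special number)
Actually, this line gets printed every 1,000,000 peptides, so all it means is that you have between 1,000,000 and 2,000,000 peptides.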
INFO: Elapsed time: 5.88 s
--------
INFO: Created 11 peptide mods, keeping 11 with 255 or fewer aa mods # Could you explain the meaning of this line?
Crux distinguishes between amino acid modifications and peptide modifications.  The latter are just combinations of amino acid modifications.  You can set a limit on the number of amino acid modifications per peptide, but it doesn't look like you've done so, so it's using the default of 255.
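A minimal sketch of the counting, based on our own reading of the explanation above rather than on Crux's code: if a single amino-acid mod may appear up to K times per peptide, the possible peptide mods are 0, 1, ..., K copies of it, i.e. K + 1 of them, which would match "Created 11 peptide mods" for `mod=11.050561:ACDEFGHIKLMNOPQRSTUVWY:10`. Assuming further that every residue of an n-residue peptide is in the mod's amino-acid list and carries at most one copy, a peptide mod with k copies can be placed in C(n, k) ways. The function names are hypothetical.

```python
from math import comb


def peptide_mod_count(max_per_peptide: int) -> int:
    # One amino-acid mod allowed 0..K times gives K + 1 peptide mods
    # (our interpretation of the "Created N peptide mods" log line).
    return max_per_peptide + 1


def positional_variants(length: int, max_per_peptide: int) -> int:
    # Assumption: every residue is eligible and holds at most one copy,
    # so k copies can be placed on C(length, k) residue subsets.
    upper = min(length, max_per_peptide)
    return sum(comb(length, k) for k in range(upper + 1))
```

For a 12-residue peptide, `positional_variants(12, 10)` gives 4083 placements versus 794 for `positional_variants(12, 4)`, which is consistent with Haipeng's later report that lowering the limit to 4 let the search finish.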
WARNING: SP Scoring: bin:4033 is greater than max:4000 # Is bin:4033 the actual number of bins when binning peaks? How can I avoid this warning in this case?
The maximum number of bins appears to be hard-coded at the moment.  If it's important to be able to control this range, we could provide a parameter file option.
... ...
WARNING: No matches found for spectrum 1178, charge 2. # This spectrum has no candidate peptides matched in database within the mass tolerance, right?
Right.

... ...
INFO: Searching spectrum number 17374 (2+), search number 73570
INFO: Searching spectrum number 17377 (2+), search number 73580
INFO: Searching spectrum number 17382 (2+), search number 73590 # What's the search number? I noticed it increases in steps of 10.
The search number is just the number of searches that have been carried out thus far.  Every spectrum/charge combination is counted as one search.

Bill

Charles Grant

Apr 13, 2012, 2:57:25 AM
to hapen, crux-...@googlegroups.com
Hi Hapen,

On Apr 12, 2012, at 9:38 AM, hapen wrote:

> WARNING: SP Scoring: bin:4033 is greater than max:4000 # Is bin:4033 the actual number of bins when binning peaks? How can I avoid this warning in this case?

I wanted to offer a brief addition to Bill's note. This warning is related to the calculation of the SP score. For the purposes of SP scoring, Crux assumes that no peak will have an mz larger than 4000. Peaks with higher mz are ignored in the calculation of the SP score.

An undocumented parameter, "max-mz", can be used to override this. You could try adding this parameter to your parameter file. We haven't tested this, so we can't promise that it will improve your results. The line in your parameter file would look like this:

max-mz=4500
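The behaviour described above can be pictured with a small sketch. It assumes, purely for illustration, one SP bin per unit of m/z (peak m/z rounded to the nearest integer), which is what the "max:4000" limit in the warning suggests; Crux's actual binning may differ. `sp_peak_bins` is a hypothetical name, and its `max_mz` argument only mimics the role of the "max-mz" parameter.

```python
from typing import Iterable, List, Tuple


def sp_peak_bins(mzs: Iterable[float], max_mz: int = 4000) -> Tuple[List[int], List[int]]:
    """Split peaks into bins used for SP scoring and bins that are ignored.

    Assumption (not Crux's code): one bin per unit m/z, rounding half up.
    """
    kept: List[int] = []
    ignored: List[int] = []
    for mz in mzs:
        b = int(mz + 0.5)  # round half up; m/z values are positive
        if b > max_mz:
            ignored.append(b)  # the case that would trigger the warning
        else:
            kept.append(b)
    return kept, ignored
```

With peaks at 500.3, 1999.6 and 4033.2, the default limit keeps bins 500 and 2000 and ignores 4033; raising the limit to 4500 keeps all three.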

Charles

hapen

Apr 13, 2012, 9:59:16 AM
to crux-...@googlegroups.com, hapen
Thanks for the information, Charles!
It seems my previous reply to Bill wasn't posted successfully.

I noticed there are 3 fields in the parameter file where the number of mods can be specified:
mod <mass change>:<aa list>:<max per peptide>
max-mods <int>
max-aas-modified <int>

What's the difference between these parameters?
Do the latter two include fixed mods and N/C-terminal fixed/variable mods?
Suppose I have the following modified peptide:
 x x  z  z
 | |  |  |
ANONYMOUSPEP  (x modify MN, y:NO, z:OP)
 ||   |
 yy   y
How will Crux count each of these three numbers for this peptide?

Thanks,
Haipeng 

hapen

Apr 13, 2012, 10:15:21 AM
to crux-...@googlegroups.com
Hi Bill,

Thanks a lot for your answers.
I reduced the number of modifications (mod=11.050561:ACDEFGHIKLMNOPQRSTUVWY:4) and now it works.
For the memory usage, if I understand correctly, my error message came from the place in "MatchCollection.cpp" where Crux tries to allocate a large space for the candidate peptides of one spectrum. If only the top-K candidates are needed, say the top 500, how about maintaining a max-heap to keep these candidates? That would greatly reduce the memory requirement.

BRs/Haipeng
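Haipeng's idea of bounding memory by keeping only the best K candidates per spectrum can be sketched as follows. This is an illustration, not Crux's implementation: it assumes higher scores are better, so it keeps a min-heap of size K whose root is the weakest candidate retained so far (the mirror image of the max-heap he mentions, which suits a lower-is-better score). Memory then stays O(K) no matter how many candidates stream past. `top_k_candidates` is a hypothetical name.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k_candidates(scored: Iterable[Tuple[float, str]], k: int) -> List[Tuple[float, str]]:
    """Keep the k highest-scoring (score, peptide) pairs from a stream."""
    heap: List[Tuple[float, str]] = []  # min-heap: heap[0] is the weakest kept
    for score, peptide in scored:
        if len(heap) < k:
            heapq.heappush(heap, (score, peptide))
        elif score > heap[0][0]:
            # Evict the weakest kept candidate in favour of the better one.
            heapq.heapreplace(heap, (score, peptide))
    return sorted(heap, reverse=True)  # best first
```

For example, streaming `(1.2, "PEPTIDEK")`, `(3.4, "ANONYMOUSPEP")`, `(0.5, "SAMPLER")`, `(2.0, "MYPEPK")` with `k=2` returns `[(3.4, "ANONYMOUSPEP"), (2.0, "MYPEPK")]` while never holding more than two entries.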

B. Frewen

Apr 13, 2012, 2:06:51 PM
to hapen, crux-...@googlegroups.com
Hi Hapen,

Here is how each of the modification limits works.

> mod <mass change>:<aa list>:<max per peptide>

The 'max per peptide' value is how many times this modification can be
applied to a peptide.

> max-mods <int>

This is the maximum number of modifications to look for on a peptide.
Suppose you have these parameters:

mod=79.9:STY:1 #phospho
mod=16:M:1 #oxidation
max-mods=3

The search will look for unmodified peptides, peptides with one phospho,
peptides with one oxidation, and peptides with one phospho and one
oxidation. Even though max-mods is 3, neither of those mods can be
applied more than once, so the de facto maximum number of mods per peptide
is 2.

Suppose a different set of parameters:

mod=79.9:STY:4 #phospho
mod=16:M:4 #oxidation
max-mods=3

Now there are many more combinations possible: no mods, one phospho, two
phosphos, three phosphos (but not four), one oxi, two oxi, three oxi, one
phospho and one oxi,...and so on.
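The two parameter sets above can be checked with a small enumeration of how many copies of each mod a peptide may carry. This sketch is a simplification: it ignores where on the peptide each mod lands (positional placement multiplies the count further), and `mod_count_combinations` and the mod names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def mod_count_combinations(max_per_peptide: Dict[str, int], max_mods: int) -> List[Tuple[int, ...]]:
    """List per-mod copy counts allowed by the per-mod limits and max-mods.

    Each tuple gives the number of copies of each mod, in dict order.
    """
    ranges = [range(limit + 1) for limit in max_per_peptide.values()]
    return [counts for counts in product(*ranges) if sum(counts) <= max_mods]
```

The first parameter set, `{"phospho": 1, "oxidation": 1}` with `max_mods=3`, yields 4 combinations with at most 2 mods in total; the second, `{"phospho": 4, "oxidation": 4}` with `max_mods=3`, yields 10, including three phosphos but never four.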

> max-aas-modified <int>

In crux, an amino acid can have more than one modification applied to it.
This value is similar to 'max-mods', but counts each modified amino acid
as one instead of counting each modification. To use your example:

> Suppose I have the following modified peptide:
>  x x  z  z
>  | |  |  |
> ANONYMOUSPEP  (x modify MN, y:NO, z:OP)
>  ||   |
>  yy   y

There are 7 total modifications on this peptide, so max-mods must be at
least 7. There are 5 modified amino acids on this peptide, so
max-aas-modified must be at least 5.
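Both counts can be reproduced from the diagram with a short sketch. Each modification is recorded as a (0-based residue position, mod name) pair read off the picture, and the residues allowed for x, y and z come from the example's legend; `count_mods`, `ALLOWED` and `EXAMPLE` are hypothetical names, not Crux internals.

```python
from typing import Dict, List, Tuple

# Residues each example mod may modify, from "(x modify MN, y:NO, z:OP)".
ALLOWED: Dict[str, str] = {"x": "MN", "y": "NO", "z": "OP"}


def count_mods(peptide: str, mods: List[Tuple[int, str]]) -> Tuple[int, int]:
    """Return (total modifications, distinct modified residues).

    The first number is what max-mods limits; the second is what
    max-aas-modified limits.
    """
    for pos, name in mods:
        if peptide[pos] not in ALLOWED[name]:
            raise ValueError(f"mod {name} cannot modify {peptide[pos]} at {pos}")
    total = len(mods)
    modified_residues = len({pos for pos, _ in mods})
    return total, modified_residues


# Read off the diagram: x above N1 and N3, z above O6 and P9,
# y below N1, O2 and O6.
EXAMPLE = [(1, "x"), (3, "x"), (6, "z"), (9, "z"), (1, "y"), (2, "y"), (6, "y")]
```

`count_mods("ANONYMOUSPEP", EXAMPLE)` returns `(7, 5)`: seven modifications on five distinct residues, because N1 and O6 each carry two.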

> Do the latter two include fixed mods and N/C-terminal fixed/variable mods?

All variable mods and fixed terminal mods are included in these counts
(parameters mod, cmod, nmod, cmod-fixed, nmod-fixed). Fixed mods with no
positional information (e.g. C=57.0) are not included.

Let me know if I can further clarify any of that.

Thanks,
Barbara

B. Frewen

Apr 13, 2012, 2:12:53 PM
to hapen, crux-...@googlegroups.com
Hi Hapen,

I'll jump in and answer this one for Bill. There is such a filter if you
use Sp scoring. For crux search-for-matches, set compute-sp=true. By
default 500 of the top-scoring candidates will be kept and then scored by
xcorr. You can change that number to, say 200, with
max-rank-preliminary=200.
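The two-stage scheme described above (rank candidates by Sp, keep the top max-rank-preliminary of them, then rescore those by xcorr) can be sketched as below. The scoring functions are placeholders supplied by the caller and `two_stage_rank` is a hypothetical name; this illustrates the control flow only, not Crux's scoring.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def two_stage_rank(
    candidates: Iterable[T],
    sp_score: Callable[[T], float],
    xcorr_score: Callable[[T], float],
    max_rank_preliminary: int = 500,
) -> List[T]:
    # Stage 1: keep only the best candidates by the cheaper preliminary score.
    preliminary = heapq.nlargest(max_rank_preliminary, candidates, key=sp_score)
    # Stage 2: rescore the survivors with the more expensive score, best first.
    return sorted(preliminary, key=xcorr_score, reverse=True)
```

With toy scores `sp_score=len` and `xcorr_score=lambda p: p.count("K")`, ranking `["AK", "AAAK", "AAAAAA", "KK"]` with `max_rank_preliminary=2` returns `["AAAK", "AAAAAA"]`. "KK" has the best toy xcorr but never reaches stage 2 because it did not survive the Sp cut, which is the trade-off of a preliminary filter.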

This is the default behavior for crux sequest-search. See the FAQ for
more information on the differences between the two.
http://noble.gs.washington.edu/proj/crux/crux-faq.html

Thanks,
Barbara

On Fri, 13 Apr 2012, hapen wrote:

> Hi Bill,
> Thanks a lot for your answers. I reduced the number of modifications
> (mod=11.050561:ACDEFGHIKLMNOPQRSTUVWY:4) and now it works. For the memory
> usage, if I understand correctly, my error message came from the place in
> "MatchCollection.cpp" where Crux tries to allocate a large space for the
> candidate peptides of one spectrum. If only the top-K candidates are needed,
> say the top 500, how about maintaining a max-heap to keep these candidates?
> That would greatly reduce the memory requirement.
>
> BRs/Haipeng
