Performance issues on Windows 7

84 views
Skip to first unread message

Minglism

unread,
Nov 15, 2012, 6:24:59 PM11/15/12
to crux-...@googlegroups.com
Hi colleagues,

I am a new user of Crux. I installed the crux 1.39 (Cygwin 1.7.16) on Windows 7. But I encountered several issues:

1. search-for-matches runs very slow. I created indexes for the fasta file, but  It still took me 29 hrs to search 88,000 spectra in Ecoli database (4339 protein hits) with my computer (Intel Core i5-480M 2.66GHz). I don't know what I did wrong. I attached my parameter setting file in the email. Hope you guys can give me some advice.

2. I want to specify variable modifications on the N-terminal Q or E of a peptide. Is there any way to do that?

3. Our searches crashed when I tried to specify modification masses with more than 3 decimal places. But it works with 3 or less. Any idea why or how to fix it?

4. crux percolator reports errors when processing modified peptides. Here is an example:

WARNING: No modification identifier found for mass shift 16.00.
ERROR: There is an unidentifiable modification in sequence <ITPVVFIM[16]K> at position 10.
WARNING: No modification identifier found for mass shift 42.00.
ERROR: There is an unidentifiable modification in sequence <A[42]LAGELPLLK> at position 3.

Thanks in advance for your advice and help.

Mingguo

Minglism

unread,
Nov 15, 2012, 6:26:55 PM11/15/12
to crux-...@googlegroups.com
It doesn't allow me to attach a file, so I pasted it the setting as follows:

####################################################################
# Sample parameter file
#
# Lines starting with '#' will be ignored.  Don't leave any space
# before or after a parameter setting.  The format is
#
# <parameter-name>=<value>
#
#####################################################################

# Which isotopes to use in calculating fragment ion mass.
# <string>=average|mono. Default=mono.
# Parameter file only.  Used by crux-search-for-matches and
# crux-predict-peptide-ions.
fragment-mass=mono

# Print to stdout additional information about the spectrum.
# Avaliable only for crux-get-ms2-spectrum.  Does not affect contents of
# the output file.
stats=false

# Specifies the operator that is used to compare an entry in the
# specified column to the value given on the command line. 
# (eq|gt|gte|lt|lte|neq). Default: eq.
# Available for crux extract-rows
comparison=eq

# Minimum number of points for estimating the Weibull parameters. 
# Default=4000.
# Available for crux search-for-xlinks
min-weibull-points=4000

# Number of consecutive MS1 scans over which a peptide must be observed
# to be considered real. Gaps in persistence are allowed when setting
# --gap-tolerance. Default = 3.
# Available for crux bullseye
scan-tolerance=3

# Use mzid's passThreshold attribute to filter matches. Default false.
# Used when parsing mzIdentML files.
mzid-use-pass-threshold=false

# Predict the given number of isotope peaks (0|1|2). Default=0.
# Only available for crux-predict-peptide-ion.  Automatically set to 0
# for Sp scoring and 1 for xcorr scoring.
isotope=0

# Window type to use for selecting decoy peptides from precursor mz.
# <string>=mass|mz|ppm. Default=mass.
# Available for crux search-for-matches
precursor-window-type-decoy=ppm

# The ion series to predict (b,y,by,bya). Default='by' (both b and y
# ions).
# Only available for crux-predict-peptide-ions.  Set automatically to
# 'by' for searching.
primary-ions=by

# Specifies whether to do optimization at the protein, peptide or psm
# level. Default = protein.
# Available for barista.
optimization=protein

# Perform parsimony analysis on the proteins and report a parsimony rank
# column in output file. Default=none. Can be
# <string>=none|simple|greedy
# Available for spectral-counts.
parsimony=none

# Which isotopes to use in calcuating peptide mass.
# <string>=average|mono. Default=average.
# Used from command line or parameter file by crux-create-index and
# crux-generate-peptides.  Parameter file only for
# crux-search-for-matches.
isotopic-mass=mono

# Include self-loop peptides in the database.  Default=T.
# Available for crux search-for-xlinks program.
xlink-include-selfloops=true

# Quantification at protein or peptide level (PROTEIN,PEPTIDE).
# Default=PROTEIN.
# Available for spectral-counts and either NSAF and SIN.
quant-level=protein

# Window type to use for selecting candidate peptides. 
# <string>=mass|mz|ppm. Default=mass.
# Available for search-for-matches, search-for-xlinks.
precursor-window-type=ppm

# Search peptides within +/- 'precursor-window' of the spectrum mass. 
# Definition of precursor window depends upon precursor-window-type.
# Default=3.0.
# Available from the parameter file only for crux-search-for-matches,
# crux-create-index, and crux-generate-peptides.
precursor-window=30.0

# What type of threshold to use when parsing matches none|qvalue|custom.
# Default=qvalue.
# used for crux spectral-counts
threshold-type=qvalue

# Set the precision for masses and m/z written to sqt and .txt files. 
# Default=4
# Available from parameter file for all commands.
mass-precision=4

# Specify rules for in silico digestion of proteins. See HTML
# documentation for syntax. Default is trypsin.
# Overrides the enzyme option.  Two lists of residues are given enclosed
# in square brackets or curly braces and separated by a |. The first
# list contains residues required/prohibited before the cleavage site
# and the second list is residues after the cleavage site.  If the
# residues are required for digestion, they are in square brackets, '['
# and ']'.  If the residues prevent digestion, then they are enclosed in
# curly braces, '{' and '}'.  Use X to indicate all residues.  For
# example, trypsin cuts after R or K but not before P which is
# represented as [RK]|{P}.  AspN cuts after any residue but only before
# D which is represented as [X]|[D].
custom-enzyme=__NULL_STR

# Predict flanking peaks for b and y ions (T,F). Default=F.
# Only available for crux-predict-peptide-ion.
flanking=false

# Include peaks +/- 1da around b/y ions in theoretical spectrum. 
# sequest-search and search-for-xlinks default=T. search-for-matches
# default=F.
# Available in the paramter file for all search commands.
use-flanking-peaks=false

# Change the mass of all amino acids 'A' by the given amount.
# For parameter file only.  Default=no mass change.
A=0.0

# Change the mass of all amino acids 'C' by the given amount.
# For parameter file only.  Default=+57.0214637206.
C=57.021

# Change the mass of all amino acids 'D' by the given amount.
# For parameter file only.  Default=no mass change.
D=0.0

# Change the mass of all amino acids 'E' by the given amount.
# For parameter file only.  Default=no mass change.
E=0.0

# Specify the width of the bins used to discretize the m/z axis.  Also
# used as tolerance for assigning ions.  Default=1.0005079 for
# monoisotopic mass or 1.0011413 for average mass.
# Available for crux-search-for-matches and xlink-assign-ions.
mz-bin-width=0.2

# Change the mass of all amino acids 'F' by the given amount.
# For parameter file only.  Default=no mass change.
F=0.0

# Change the mass of all amino acids 'G' by the given amount.
# For parameter file only.  Default=no mass change.
G=0.0

# Change the mass of all amino acids 'H' by the given amount.
# For parameter file only.  Default=no mass change.
H=0.0

# Change the mass of all amino acids 'I' by the given amount.
# For parameter file only.  Default=no mass change.
I=0.0

# Change the mass of all amino acids 'K' by the given amount.
# For parameter file only.  Default=no mass change.
K=0.0

# Change the mass of all amino acids 'L' by the given amount.
# For parameter file only.  Default=no mass change.
L=0.0

# Change the mass of all amino acids 'M' by the given amount.
# For parameter file only.  Default=no mass change.
M=0.0

# Change the mass of all amino acids 'N' by the given amount.
# For parameter file only.  Default=no mass change.
N=0.0

# Change the mass of all amino acids 'P' by the given amount.
# For parameter file only.  Default=no mass change.
P=0.0

# Change the mass of all amino acids 'Q' by the given amount.
# For parameter file only.  Default=no mass change.
Q=0.0

# Change the mass of all amino acids 'R' by the given amount.
# For parameter file only.  Default=no mass change.
R=0.0

# Change the mass of all amino acids 'S' by the given amount.
# For parameter file only.  Default=no mass change.
S=0.0

# Change the mass of all amino acids 'T' by the given amount.
# For parameter file only.  Default=no mass change.
T=0.0

# Change the mass of all amino acids 'V' by the given amount.
# For parameter file only.  Default=no mass change.
V=0.0

# Change the mass of all amino acids 'W' by the given amount.
# For parameter file only.  Default=no mass change.
W=0.0

# Change the mass of all amino acids 'Y' by the given amount.
# For parameter file only.  Default=no mass change.
Y=0.0

# Print version number and quit.
# Available for all crux programs.  On command line use '--version T'.
version=false

# Set the resolution of the observed spectra at m/z 400. Used in
# conjunction with --instrument The default is 100000.
# Available for crux hardklor
resolution=100000.0

# The format to write the output spectra to. By default, the spectra
# will be output in the same format as the MS/MS input.
# Available for crux bullseye
spectrum-format=__NULL_STR

# The minimum length of peptides to consider. Default=6.
# Used from the command line or parameter file by crux-create-index and
# crux-generate-peptides.  Parameter file only for
# crux-search-for-matches.
min-length=6

# Predict the precursor ions, and all associated ions (neutral-losses,
# multiple charge states) consistent with the other specified options.
# (T,F) Default=F.
# Only available for crux-predict-peptide-ions.
precursor-ions=false

# Degree of digestion used to generate peptides.
# <string>=full-digest|partial-digest. Either both ends or one end of a
# peptide must conform to enzyme specificity rules. Default=full-digest.
# Used in conjunction with enzyme option when enzyme is not set to to
# 'no-enzyme'.  Available from command line or parameter file for
# crux-generate-peptides and crux create-index.  Available from
# parameter file for crux search-for-matches.  Digestion rules are as
# follows: enzyme name [cuts after one of these residues][but not before
# one of these residues].  trypsin [RK][P], elastase [ALIV][P],
# chymotrypsin [FWY][P].
digestion=full-digest

# Replace existing files (T) or exit if attempting to overwrite (F).
# Default=F.
# Available for all crux programs.  Applies to parameter file as well as
# index, search, and analysis output files.
overwrite=false

# Choose the algorithm for analyzing combinations of multiple peptide or
# protein isotope distributions. (basic | fewest-peptides |
# fast-fewest-peptides | fewest-peptides-choice |
# fast-fewest-peptides-choice) Default=fast-fewest-peptides.
# Available for crux hardklor
hardklor-algorithm=fast-fewest-peptides

# Parser to use for reading in spectra <string>=pwiz|mstoolkit|crux.
# Default=crux.
# Available for search-for-matches, search-for-xlinks.
spectrum-parser=pwiz

# Set additional options with values in the given file.
# Available for all crux programs. Any options specified on the command
# line will override values in the parameter file.
parameter-file=__NULL_STR

# The minimum number of peaks a spectrum must have for it to be
# searched. Default=20.
# Parameter file only for search-for-matches and sequest-search.
min-peaks=10

# Predict peaks with the given maximum number of nh3 neutral loss
# modifications. Default=0.
# Only available for crux-predict-peptide-ions.
nh3=0

# Type of analysis to make on the match results:
# (RAW|NSAF|dNSAF|SIN|EMPAI). Default=NSAF. 
# Available for spectral-counts.  RAW is raw counts, NSAF is Normalized
# Spectral Abundance Factor, dNSAF is Distributed Spectral Abundance
# Factor, SIN is Spectral Index Normalized and EMPAI is Exponentially
# Modified Protein Abundance Index
measure=NSAF

# Direction of threshold for matches.  If true, then all matches whose
# value is <= threshold will be accepted.  If false, then all matches >=
# threshold will be accepted.  Used with custom-threshold parameter
# (default True).
# Available for spectral-counts.
custom-threshold-min=true

# Number of psms per spectrum to score with xcorr after preliminary
# scoring with Sp. Set to 0 to score all psms with xcorr. Default=500.
# Used by crux-search-for-matches.  For positive values, the Sp
# (preliminary) score acts as a filter; only high scoring psms go on to
# be scored with xcorr.  This saves some time.  If set to 0, all psms
# are scored with both scores. 
max-rank-preliminary=5

# The maximum number of modifications that can be applied to a single
# peptide.  Default=no limit.
# Available from parameter file for crux-search-for-matches.
max-mods=10

# Set the tolerance (+/-ppm) for finding persistent peptides. Default =
# 10.0.
# Available for crux bullseye
persist-tolerance=10.0

# Spectrum charge states to search. <string>=1|2|3|all. Default=all.
# Used by crux-search-for-matches to limit the charge states considered
# in the search.  With 'all' every spectrum will be searched and spectra
# with multiple charge states will be searched once at each charge
# state.  With 1, 2 ,or 3 only spectra with that that charge will be
# searched.
spectrum-charge=all

# MS2 file corresponding to the psm file. Required for SIN.
# Available for spectral-counts with measure=SIN.
input-ms2=__NULL_STR

# Show search progress by printing every n spectra searched. 
# Default=10.
# Set to 0 to show no search progress.  Available for crux
# search-for-matches from parameter file.
print-search-progress=100

# Restrict analysis to only a small window in each segment ( (min-max)
# in m/z). The user must specify the starting and ending m/z values
# between which the analysis will be performed. By default the whole
# spectrum is analyzed.
# Available for crux hardklor
mz-window=__NULL_STR

# Set the signal-to-noise threshold. Any integer or decimal value
# greater than or equal to 0.0 is valid. The default value is 1.0.
# Available for crux hardklor
signal-to-noise=1.0

# The estimated percent of target scores that are drawn from the null
# distribution.
# Used by compute-q-values, percolator and q-ranker
pi-zero=1.0

# Reports peptide intensities as the distribution area. Default false.
# Available for crux hardklor
distribution-area=false

# Prefix added to output file names. Default=none. 
# Used by crux search-for-matches, crux sequest-search, crux percolator
# crux compute-q-values, and crux q-ranker.
fileroot=__NULL_STR

# If true, Hardklor will calculate the local noise levels across the
# spectrum using --sn-window, then select a floor of this set of noise
# levels to apply to the whole spectrum.
# Available for crux hardklor
static-sn=true

# Sort in ascending order.  Otherwise, descending. Default: True.
# Available for sort-by-column
ascending=true

# Compute the Sp score for all candidate peptides.  Default=F
# Available for search-for-matches.  Sp scoring is always done for
# sequest-search.
compute-sp=T

# Set the correlation threshold [0,1.0] to accept a predicted isotope
# distribution.  Default=0.85
# Available for crux hardklor
corr=0.85

# The maximum mass of peptides to consider. Default=8000.
# Available from command line or parameter file for crux bullseye
bullseye-max-mass=8000.00

# If the target and decoy searches were run separately, rather than
# using a concatenated database, then Q-ranker will assume that the
# database search results provided as a required argument are from the
# target database search. This option then allows the user to specify
# the location of the decoy search results. Like the required arguments,
# these search results can be provided as a single file, a list of files
# or a directory. However, the choice (file, list or directory) must be
# consistent for the MS2 files and the target and decoy tab-delimited
# files. Also, if the MS2 and tab-delimited files are provided in
# directories, then Q-ranker will use the MS2 filename (foo.ms2) to
# identify corresponding target and decoy tab-delimited files with names
# like foo*.target.txt and foo*.decoy.txt. This naming convention allows
# the target and decoy txt files to reside in the same directory.
#  Available for q-ranker and barista.
separate-searches=__NULL_STR

# Set level of output to stderr (0-100).  Default=30.
# Available for all crux programs.  Each level prints the following
# messages, including all those at lower verbosity levels: 0-fatal
# errors, 10-non-fatal errors, 20-warnings, 30-information on the
# progress of execution, 40-more progress information, 50-debug info,
# 60-detailed debug info.
verbosity=30

# Specify the location of the left edge of the first bin used to
# discretize the m/z axis. Default=0.68
# Available for crux-search-for-matches.
mz-bin-offset=0.68

# Create an ASCII version of the peptide list.  Default=F.
# Creates an ASCII file in the output directory containing one peptide
# per line.
peptide-list=false

# Choose the charge state determination method. (B|F|P|Q|S). Default=Q.
# Available for crux hardklor
cdm=Q

# The minimum mass of peptides to consider. Default=600.
# Available from command line or parameter file for crux bullseye
bullseye-min-mass=400.00

# Print the theoretical spectrum
# Available for xlink-predict-peptide-ions (Default=F).
print-theoretical-spectrum=false

# Set the tolerance (+/-units) around the retention time over which a
# peptide can be matched to the MS/MS spectrum. The unit of time is
# whatever unit is used in your data file (usually minutes). Default =
# 0.5.
# Available for crux bullseye
retention-tolerance=0.50

# Set the maximum charge state to look for when analyzing a spectrum.
# Default=5.
# Available for crux hardklor
max-charge=5

# Number of decoy peptides to search for every target peptide
# searched.Only valid for fasta searches when --decoys is not none.
# Default=0.
# Use --decoy-location to control where they are returned (which
# file(s)) and --decoys to control how targets are randomized. 
# Available for search-for-matches and sequest-search when searching a
# fasta file. 
num-decoys-per-target=1

# Search only select spectra specified as a single scan number or as a
# range as in x-y.  Default=search all.
# The search range x-y is inclusive of x and y.
scan-number=__NULL_STR

# The maximum number of modified amino acids that can appear in one
# peptide.  Each aa can be modified multiple times.  Default=no limit.
# Available from parameter file for search-for-matches.
max-aas-modified=10

# Include peptides with up to n missed cleavage sites. Default=0.
# Available from command line or parameter file for crux-create-index
# and crux-generate-peptides.  Parameter file only for
# crux-search-for-matches.  When used with
# enzyme=<trypsin|elastase|chymotrpysin>  includes peptides containing
# one or more potential cleavage sites.
missed-cleavages=2

# Output filename for complete list of decoy p-values. 
# Default='search.decoy.p.txt'
# Only available for crux search-for-matches. The location of this file
# is controlled by --output-dir.
search-decoy-pvalue-file=search.decoy.p.txt

# Search decoy-peptides within +/-  'mass-window-decoy' of the spectrum
# mass.  Default=20.0.
# Available for crux search-for-xlinks. 
precursor-window-decoy=30.0

# Set the depth of combinatorial analysis. Default 3.
# Available for crux hardklor
depth=3

# Re-run a previous Q-ranker analysis using a previously computed set of
# lookup tables.
#  Available for q-ranker and barista.
re-run=__NULL_STR

# Specify "no base" averagine. Only modified averagine models will be
# used in the analysis. Default = F 
# Available for crux hardklor
no-base=false

# Print the header line of the tsv file. Default=T.
# Available for crux extract-columns and extract-rows
header=true

# Search result can be as a file or a list of files. This option allows
# users to specify the search results are provided as a list of files by
# setting the --list-of-files option to T. Defualt= false.
# Availabe for barista.
list-of-files=false

# Minimum mass of spectra to be searched. Default=0.
# Available for crux-search-for-matches.
spectrum-min-mass=50.00

# Folder to which results will be written. Default='crux-output'.
# Used by crux create-index, crux search-for-matches, crux
# compute-q-values, and crux percolator.
output-dir=crux-output

# Include dead-end peptides in the database.  Default=T.
# Available for crux search-for-xlinks program.
xlink-include-deadends=true

# Gap size tolerance when checking for peptides across consecutive MS1
# scans. Used in conjunction with --scan-tolerance. Default = 1.
# Available for crux bullseye
gap-tolerance=1

# Type of instrument (fticr|orbi|tof|qit) on which the data was
# collected. Used in conjuction with --resolution. The default is fticr.
# Available for crux hardklor
instrument=fticr

# The maximum length of peptides to consider. Default=50.
# Available from command line or parameter file for crux-create-index 
# and crux-generate-peptides. Parameter file only for
# crux-search-for-matches.
max-length=50

# Set a filter for mzXML files. Default=none
# Available for crux hardklor
mzxml-filter=none

# Generate peptides only once, even if they appear in more than one
# protein (T,F).  Default=F.
# Available from command line or parameter file for
# crux-genereate-peptides. Returns one line per peptide when true or one
# line per peptide per protein occurence when false.  
unique-peptides=true

# Q-ranker analysis begins with a pre-processsing step that creates a
# set of lookup tables which are then used during training. Normally,
# these lookup tables are deleted at the end of the Q-ranker analysis,
# but setting this option to T prevents the deletion of these tables.
# Subsequently, the Q-ranker analysis can be repeated more efficiently
# by specifying the --re-run option. Default = F.
# Available for q-ranker and barista.
skip-cleanup=false

# Predict ions up to max charge state (1,2,...,6) or up to the charge
# state of the peptide (peptide).  If the max-ion-charge is greater than
# the charge state of the peptide, then the max is the peptide charge.
# Default='peptide'.
# Available for predict-peptide-ions and search-for-xlinks. Set to
# 'peptide' for search.
max-ion-charge=peptide

# The maximum mass of peptides to consider. Default=7200.
# Available from command line or parameter file for crux-create-index
# and crux-generate-peptides. Parameter file only for
# crux-search-for-matches.
max-mass=7200.00

# Ignore peptides that persist for this length. The unit of time is
# whatever unit is used in your data file (usually minutes). These
# peptides are considered contaminants. Default = 2.0.
# Available for crux bullseye
max-persist=2.00

# The minimum mass of peptides to consider. Default=200.
# Available from command line or parameter file for crux-create-index
# and crux-generate-peptides. Parameter file only for
# crux-search-for-matches.
min-mass=300.000

# Set the maximum number of peptides or proteins that are estimated from
# the peaks found in a spectrum segment. The default value is 10.
# Available for crux hardklor
max-p=10

# Set the precision for scores written to sqt and text files. Default=8.
# Available from parameter file for crux search-for-matches, percolator,
# and compute-q-values.
precision=8

# Specify location of decoy search results.
# <string>=target-file|one-decoy-file|separate-decoy-files.
# Default=separate-decoy-files.
# Applies when decoys is not none.  Use 'target-file' to mix target and
# decoy search results in one file. 'one-decoy-file' will return target
# results in one file and all decoys in another. 'separate-decoy-files'
# will create as many decoy files as num-decoys-per-target.
decoy-location=separate-decoy-files

# The threshold to use for filtering matches. Default=0.01.
# Available for spectral-counts.  All PSMs with higher (or lower) than
# this will be ignored.
threshold=0.010

# Set the tolerance (+/-ppm) for exact match searches. Default = 10.0.
# Available for crux bullseye
exact-tolerance=10.0

# Specify a fixed modification to apply to the N-terminus of peptides.
# Available from parameter file for crux sequest-search and
# search-for-matches.
nmod-fixed=NO MODS

# Q-ranker uses enriched feature set derived from the spectra in ms2
# files. It can be forced to use minimal feature set by setting the
# --use-spec-features option to F. Default T.
# Available for q-ranker and barista.
use-spec-features=true

# Set the maximum width of any set of peaks in a spectrum when computing
# the results (in m/z). Thus, if the value was 5.0, then sets of peaks
# greater than 5 m/z are divided into smaller sets prior to analysis.
# The default value is 4.0.
# Available for crux hardklor
max-width=4.00

# Specify a fixed modification to apply to the C-terminus of peptides.
# Available from parameter file for crux sequest-search and
# search-for-matches.
cmod-fixed=NO MODS

# Optional file into which psm features are printed. Default=F.
# Available for percolator and q-ranker.  File will be named
# <fileroot>.percolator.features.txt or <fileroot>.qranker.features.txt.
feature-file=T

# Set the signal-to-noise window length (in m/z). Because noise may be
# non-uniform across a spectra, this value adjusts the segment size
# considered when calculating a signal-over-noise ratio. The default
# value is 250.0.
# Available for crux hardklor
sn-window=250.00

# Enzyme to use for in silico digestion of proteins.
# <string>=trypsin|chymotrypsin|elastase|clostripain|
# cyanogen-bromide|iodosobenzoate|proline-endopeptidase|
# staph-protease|aspn|modified-chymotrypsin|no-enzyme. Default=trypsin.
# Used in conjunction with the options digestion and missed-cleavages.
# Use 'no-enzyme' for non-specific digestion.  Available from command
# line or parameter file for crux-generate-peptides and crux
# create-index.  Available from parameter file for crux
# search-for-matches.   Digestion rules: enzyme name [cuts after one of
# these residues]|{but not before one of these residues}.  trypsin
# [RK]|{P}, elastase [ALIV]|{P}, chymotrypsin [FWY]|{P}, clostripain
# [R]|[], cyanogen-bromide [M]|[], iodosobenzoate [W]|[],
# proline-endopeptidase [P]|[], staph-protease [E]|[],
# modified-chymotrypsin [FWYL]|{P}, elastase-trypsin-chymotrypsin
# [ALIVKRWFY]|{P},aspn []|[D] (cuts before D).
enzyme=trypsin

# Specifies the data type the column contains (int|real|string) Default:
# string
# Available for crux extract-rows
column-type=string

# Predict peaks with the given maximum number of h2o neutral loss
# modifications. Default=0.
# Only available for crux-predict-peptide-ions.
h2o=0

# Use a name of custom threshold rather than (default NULL)
# Available for spectral-counts.
custom-threshold-name=__NULL_STR

# Ignore peptides with multiple mappings to proteins (T,F). Default=F.
# Available for spectral-counts.
unique-mapping=false

# Set the sensitivity level. There are four levels, 0 (low), 1
# (moderate), 2 (high), and 3 (max). The default value is 2.
# Available for crux hardklor
sensitivity=2

# Compute p-values for the main score type. Default=F.
# Currently only implemented for XCORR.
compute-p-values=false

# Require an exact match to the precursor ion. Rather than use wide
# precursor boundaries, this flag forces Bullseye to match precursors to
# the base isotope peak identified in Hardklor. The tolerance is set
# with the --persist-tolerance flag. Default = F.
# Available for crux bullseye
exact-match=false

# Include alternative averagine models in the analysis that  incorporate
# additional atoms or isotopic enrichments.
# Available for crux hardklor
averagine-mod=__NULL_STR

# The number of PSMs per spectrum writen to the output  file(s). 
# Default=5.
# Available from parameter file for crux-search-for-matches.
top-match=5

# Maximum mass of spectra to search. Default no maximum.
# Available for crux-search-for-matches.
spectrum-max-mass=8000.00

# Include linear peptides in the database.  Default=T.
# Available for crux search-for-xlinks program (Default=T).
xlink-include-linears=true

# Specifies the prefix of the protein names that indicates a decoy.
# Default = rand_.
#  Available for q-ranker and barista.
decoy-prefix=rand_

# Include a decoy version of every peptide by shuffling or reversing the
# target sequence. 
# <string>=none|reverse|protein-shuffle|peptide-shuffle. Use 'none' for
# no decoys.  Default=protein-shuffle.
# For create-index, store the decoys in the index.  For search, either
# use decoys in the index or generate them from the fasta file.
decoys=reverse

# Set the minimum charge state to look for when analyzing a spectrum.
# Default=1.
# Available for crux hardklor
min-charge=1

# Print peptide sequence (T,F). Default=F.
# Available only for crux-generate-peptides.
output-sequence=false

# Print the masses of modified sequences in one of three ways
# 'mod-only', 'total' (residue mass plus modification), or 'separate'
# (for multiple mods to one residue): Default 'mod-only'.
# Available in the parameter file for search-for-matches, sequest-search
# and generate-peptides.
mod-mass-format=mod-only

# Specify a variable modification to apply to peptides.  <mass
# change>:<aa list>:<max per peptide>:<prevents cleavage>:<prevents
# cross-link>.  Sub-parameters prevents cleavage and prevents cross-link
# are optional (T|F).Default=no mods.
# Available from parameter file for crux-generate-peptides and
# crux-search-for-matches and the the same must be used for crux
# compute-q-value.
mod=15.995:M:10
mod=-18.011:E:1

# Specify a variable modification to apply to N-terminus of peptides. 
# <mass change>:<max distance from protein n-term (-1 for no max)>
# Available from parameter file for crux-generate-peptides and
# crux-search-for-matches and the the same must be used for crux
# compute-q-value.
nmod=42.011:0
nmod=-17.027:C:5000
#nmod=-18.011:E:5000
# nmod=-17.027:-1

# Specify a variable modification to apply to C-terminus of peptides.
# <mass change>:<max distance from protein c-term (-1 for no max)>.
# Default=no mods.
# Available from parameter file for crux-generate-peptides and
# crux-search-for-matches and the the same must be used for crux
# compute-q-value.
cmod=NO MODS

William S Noble

unread,
Nov 16, 2012, 5:35:21 PM11/16/12
to Minglism, crux-users
On Thu, Nov 15, 2012 at 3:24 PM, Minglism <mingg...@gmail.com> wrote:
Hi colleagues,

I am a new user of Crux. I installed the crux 1.39 (Cygwin 1.7.16) on Windows 7. But I encountered several issues:

1. search-for-matches runs very slow. I created indexes for the fasta file, but  It still took me 29 hrs to search 88,000 spectra in Ecoli database (4339 protein hits) with my computer (Intel Core i5-480M 2.66GHz). I don't know what I did wrong. I attached my parameter setting file in the email. Hope you guys can give me some advice.


There are several issues at play here.  First, you are using several variable modifications, which will slow down your search relative to a search without mods.  However, we ran your analysis on a comparable machine running Windows 7 + Cygwin, and the search only required approximately 1.5 hours.  There are some Cygwin interactions with anti-virus software that can really slow things down. If you are using an AV program, then you can add the Cygwin directory to the 'exclude from checking list.' This may yield some speedup.

Longer term, we intend to add an option to encode variable modifications into the peptide index, which will require more disk space but will yield much faster searching.

 
2. I want to specify variable modifications on the N-terminal Q or E of a peptide. Is there any way to do that?


No, we do not currently support modifications of specific residues at N- or C-termini.  If this is important to you, then let us know, and we can add it to our list of requested features.

3. Our searches crashed when I tried to specify modification masses with more than 3 decimal places. But it works with 3 or less. Any idea why or how to fix it?


We have not been able to reproduce this problem.  Simply adding digits to the modification masses in the file that you sent does not cause any problems.  Can you send us the parameter file that causes this, along with the resulting log file?
 
4. crux percolator reports errors when processing modified peptides. Here is an example:

WARNING: No modification identifier found for mass shift 16.00.
ERROR: There is an unidentifiable modification in sequence <ITPVVFIM[16]K> at position 10.
WARNING: No modification identifier found for mass shift 42.00.
ERROR: There is an unidentifiable modification in sequence <A[42]LAGELPLLK> at position 3.


This is a known issue, related to the fact that we have changed our method of reporting modifications from the old SEQUEST style (which used various punctuation marks to represent different modifications) to the more widely used "bracketed mass value" representation.  Unfortunately, the old version of Percolator that still exists in Crux does not support this changed format.  We are a few weeks away from releasing a version of Crux that will fix this problem. Meanwhile, if you are interested in hacking your output file so that it will be readable by the current version of crux percolator, we can send instructions for how to do that.

Bill

 
Thanks in advance for your advice and help.

Mingguo

--
 
 

William S Noble

unread,
Nov 20, 2012, 12:49:06 PM11/20/12
to mingguo xu, crux-users
On Mon, Nov 19, 2012 at 8:50 AM, mingguo xu <mingg...@gmail.com> wrote:


On Fri, Nov 16, 2012 at 3:35 PM, William S Noble <thab...@gmail.com> wrote:
On Thu, Nov 15, 2012 at 3:24 PM, Minglism <mingg...@gmail.com> wrote:
Hi colleagues,

I am a new user of Crux. I installed the crux 1.39 (Cygwin 1.7.16) on Windows 7. But I encountered several issues:

1. search-for-matches runs very slow. I created indexes for the fasta file, but  It still took me 29 hrs to search 88,000 spectra in Ecoli database (4339 protein hits) with my computer (Intel Core i5-480M 2.66GHz). I don't know what I did wrong. I attached my parameter setting file in the email. Hope you guys can give me some advice.


There are several issues at play here.  First, you are using several variable modifications, which will slow down your search relative to a search without mods.  However, we ran your analysis on a comparable machine running Windows 7 + Cygwin, and the search only required approximately 1.5 hours.  There are some Cygwin interactions with anti-virus software that can really slow things down. If you are using an AV program, then you can add the Cygwin directory to the 'exclude from checking list.' This may yield some speedup.

Longer term, we intend to add an option to encode variable modifications into the peptide index, which will require more disk space but will yield much faster searching.

The *.ms2 file I sent to you was just a small portion (3200 spectra) of a large file (88k spectra). So in terms of time, it is similar. But just for comparison, with the same small file I sent you, X!Tandem only took about 10 min to finish under the same setting. Mascot took even shorter, less than 5 min. But I still would love to be able use Crux in the future. Hope the coming new version will increase the search speed. 

Thanks for the comparison numbers.  Would you be willing to send us a copy of your Mascot and X!Tandem command lines / configuration files?  I'd like to investigate the source of this discrepancy in running times.
 
PS: I noticed that crux only uses one core. I think nowadays multi-core computers are very common. It would be great to include multi-thread calculation. Just a suggestion.

This is definitely on the list.
 

 
2. I want to specify variable modifications on the N-terminal Q or E of a peptide. Is there any way to do that?


No, we do not currently support modifications of specific residues at N- or C-termini.  If this is important to you, then let us know, and we can add it to our list of requested features.

It is important to me. I think this would be a good option to have anyways, considering that it can reduce the search space considerably.   
 
OK, I have added it to our list of requested features.
 

3. Our searches crashed when I tried to specify modification masses with more than 3 decimal places. But it works with 3 or less. Any idea why or how to fix it?


We have not been able to reproduce this problem.  Simply adding digits to the modification masses in the file that you sent does not cause any problems.  Can you send us the parameter file that causes this, along with the resulting log file?

Here is the parameter file (At line 753, if I change the mod=15.994919:M:10 to mod=15.995:M:10, it has no problem. ): https://docs.google.com/open?id=0B7QE4Zq1OtXLVnBRei1pbm4wU1k
Here is the search log (with verbosity = 60):
 
 
4. crux percolator reports errors when processing modified peptides. Here is an example:

WARNING: No modification identifier found for mass shift 16.00.
ERROR: There is an unidentifiable modification in sequence <ITPVVFIM[16]K> at position 10.
WARNING: No modification identifier found for mass shift 42.00.
ERROR: There is an unidentifiable modification in sequence <A[42]LAGELPLLK> at position 3.


This is a known issue, related to the fact that we have changed our method of reporting modifications from the old SEQUEST style (which used various punctuation marks to represent different modifications) to the more widely used "bracketed mass value" representation.  Unfortunately, the old version of Percolator that still exists in Crux does not support this changed format.  We are a few weeks away from releasing a version of Crux that will fix this problem. Meanwhile, if you are interested in hacking your output file so that it will be readable by the current version of crux percolator, we can send instructions for how to do that.

I would love to know the instruction. Thanks very much.

OK, it turns out that the two issues above are related.  Crux actually requires that all of the modifications be specified to the same level of precision.  Apparently, if your parameter file contains modifications with different precisions, it causes the error you saw.  This is a bug, and will be fixed, but the workaround for now is simply to make sure that you add extra 0s to some of your specified mods so that they all have the same precision.

I suspect that this issue is also causing problems for Percolator.  If your modifications are specified in the output file as [16.00], then Percolator needs to parse the mods using the same precision.  In theory, Percolator should figure this out, but in case it doesn't do so, you can use the "hidden" parameter named "mod-precision" to specify the number of digits after the decimal.

Please let us know if this does not solve your problem.

Bill

 


Thanks for your help.

Mingguo

 

Bill

 
Thanks in advance for your advice and help.

Mingguo

--
 
 




--
徐明果
Mingguo Xu

B. Frewen

unread,
Nov 20, 2012, 2:30:17 PM11/20/12
to mingguo xu, crux-users
Hi Mingguo,

A quick thought on search speed. You give one of your modification
parameters as

mod=15.995:M:10

Ten seems like a lot of modifications for one peptide. If you can get by
with fewer, that could speed up your search time dramatically. Three is
often sufficient.

Just a thought.

Barbara
> ???
> Mingguo Xu
>
>
> --
>
>
>
>

Charles Grant

unread,
Dec 4, 2012, 1:38:41 PM12/4/12
to Minglism, crux-users, William S Noble
Hi Minguo,

On Nov 16, 2012, at 2:35 PM, William S Noble <thab...@gmail.com> wrote:

>
> 3. Our searches crashed when I tried to specify modification masses with more than 3 decimal places. But it works with 3 or less. Any idea why or how to fix it?
>

We're able to reproduce this problem now, but only when running Crux under Cygwin. Unfortunately our ability to debug the Cygwin version of Crux is currently limited because of bugs in the current Cygwin version of the debugger. We will try to address this problem in a future version, but in the meantime the workaround it to limit modifications masses to 3 decimal places.

Sorry for the inconvenience. We do appreciate your taking the time to report the problem.

Charles
Reply all
Reply to author
Forward
0 new messages