Hi,
Sorry to barge into this out of the blue, but the topic prompted me to
suggest that perhaps we could consider adding our DBToolkit
(https://code.google.com/p/dbtoolkit/) at some point.
It knows how to do nearly any trick to a database, and (like SearchGUI
and PeptideShaker) runs in command line mode as well as GUI mode. It's
also a long-standing piece of software (ten years old :)) that has
become very feature-rich and rock-solid stable over the years.
Some features of DBToolkit:
- Digestion, including a full predefined enzyme list + custom
definable enzymes (including regular expression-based enzymes)
+ 'bifunctional enzymes' (creates peptides where the N-terminus is
cleaved according to a different enzyme specificity than the C-terminus;
great for studying in vivo protease cleavage followed by trypsin)
- Replacing residues in sequences based on regular expressions
- Filtering of entries based on their location, weight, or composition
(custom format [e.g. ">2R AND (L OR I) AND !W"] or regular expression)
- 'ragging' of databases
- Creating shuffled or reversed versions
- Lossless removal of sequence-level redundancy
- Ability to 'mature' sequences from UniProtKB/Swiss-Prot (i.e.,
removing all annotated pre- and pro-peptides)
- Sensible parsing of a wide variety of FASTA header formats
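
To make two of the features above a bit more concrete, here is a minimal
Python sketch of in silico tryptic digestion and of building a reversed
entry. This is not DBToolkit code and does not use its API: the function
names, the "REV_" prefix, and the "cleave after K or R unless followed by
P" rule are assumptions picked purely for illustration. It needs
Python 3.7+ because it splits on zero-width matches.

```python
import re


def digest_trypsin(sequence, missed_cleavages=0):
    """Return peptides from an illustrative tryptic digest.

    Assumed rule: cleave C-terminal to K or R, except before P.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` extra neighbouring fragments.
        last = min(start + missed_cleavages, len(fragments) - 1)
        for end in range(start, last + 1):
            peptides.append("".join(fragments[start:end + 1]))
    return peptides


def reverse_entry(header, sequence, prefix="REV_"):
    """Return a reversed (decoy-style) copy of one entry.

    The prefix is a hypothetical naming convention, not DBToolkit's.
    """
    return prefix + header, sequence[::-1]


print(digest_trypsin("MKRPAKLLR"))
# ['MK', 'RPAK', 'LLR']
print(reverse_entry("sp|P12345|TEST", "MKRPAK"))
# ('REV_sp|P12345|TEST', 'KAPRKM')
```

A regular expression is used for the cleavage rule here only because it
keeps the sketch short; it says nothing about how DBToolkit implements
its enzymes internally.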
Should you be interested, the manual tells you everything :).
https://code.google.com/p/dbtoolkit/source/browse/trunk/src/main/resources/about.txt
Cheers,
lnnrt.