parsing the peptides

lwott

Mar 30, 2006, 8:20:12 AM

to spctools-discuss

Which *.cxx or *.h file is responsible for parsing the peptide out of the
.xml file? I am working on a way to get the TPP to work with the GIST
label. I have ASAPRatio successfully generating peptide ratios.
However, the DatabaseParser is failing to look up the protein name
correctly. To get this to work I had to add a Z to all my peptides.
The light peptides look like 'ZXXXXX' and the heavy look like
'Z(subscript 45)XXXXX'. I just need the parsers to ignore the Z. I
can't seem to find the correct file in the source code. Once I get the
peptides correctly parsed, I would like to submit the changes I have
made, as I think they will be helpful for anyone trying to specify
modified termini.
Thanks,
Lee

David Shteynberg

Mar 30, 2006, 11:11:17 AM

to spctools...@googlegroups.com

Hi Lee,

I'm not exactly sure what you are trying to do, but the tools already have the ability to represent terminal modifications (without the use of special placeholders). I don't think there is one specific tool for parsing a peptide out of a pepXML file; you may have to write one that does exactly what you want. DatabaseParser's job is to parse the database name out of the pepXML. It will not be able to look up a protein name, because it wasn't written for that purpose. But with open source code you are free to change the code to suit your own purpose. In any case, I am pretty sure the tools are already flexible enough to handle your workflow, and with a little discussion I think we can massage the tools to work correctly on your data. Can you send me the specifics of this labelling method?

Thanks,
-David

Ott, Lee William

Mar 30, 2006, 11:48:37 AM

to spctools...@googlegroups.com

The GIST label modifies both the N-terminus and all lysines. If you could show me how to do this, it would be great. I have run the TPP without these special placeholders and have been unsuccessful.
Thanks,
Lee

Ott, Lee William

Mar 30, 2006, 12:01:19 PM

to spctools...@googlegroups.com

David,
I looked at my interact.xml file before running ASAPRatio and 'Non Existent' appears under the protein column. I don't think that DatabaseParser is running correctly, but I don't get any error messages. Any suggestions?
Thanks,
Lee

Ott, Lee William

Mar 30, 2006, 1:12:19 PM

to spctools...@googlegroups.com

David,
DatabaseParser is not working correctly with the peptides that I modified by adding the Z. All the protein names are 'Non Existent'. Any suggestions as to why adding 'Z' would disrupt the function of the parser?
Thanks,
Lee

Jimmy Eng

Mar 30, 2006, 1:22:42 PM

to spctools...@googlegroups.com

Likely because the program is looking for proteins that contain the
peptide sequence ZXXXXXXX and no protein in the database has that sequence.
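
To illustrate the point above, here is a minimal Python sketch of a protein lookup done by exact substring match. It is not the TPP's actual code: the function names, the toy database, and the handling of the 'Z' placeholder are all assumptions made for illustration.

```python
# Toy sketch (not TPP code): protein lookup by exact substring match.

def find_proteins(peptide, database):
    """Return the names of proteins whose sequence contains the peptide."""
    return [name for name, seq in database.items() if peptide in seq]


def strip_placeholder(peptide, placeholder="Z"):
    """Drop a leading placeholder residue before the lookup (hypothetical helper)."""
    if peptide.startswith(placeholder):
        return peptide[len(placeholder):]
    return peptide


# Hypothetical one-protein database.
database = {"PROT1": "MKTAYIAKQRQISFVK"}

with_placeholder = find_proteins("ZTAYIAK", database)
without_placeholder = find_proteins(strip_placeholder("ZTAYIAK"), database)
print(with_placeholder)     # no protein contains the literal 'Z'
print(without_placeholder)  # the lookup succeeds once 'Z' is removed
```

If the real lookup behaves this way, stripping the placeholder before the database search would be the change to make.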

Ott, Lee William

Mar 30, 2006, 1:39:15 PM

to spctools...@googlegroups.com

David,
The reason I am making these changes is so that I only have to do one Sequest search. I then run the .html files through a series of filters that make sure the peptides are either all heavy or all light. I then add a Z onto all peptides and change the '[' indicating a heavy N-terminus to '#'. I think I have most of the kinks worked out; I just need to get the parser to correctly parse out the protein name.
Thanks,
Lee

Ott, Lee William

Mar 30, 2006, 1:45:13 PM

to spctools...@googlegroups.com

I would like to change the source code to ignore the Z when looking in the database. Where do I need to change this in the source code?
Thanks,
Lee

David Shteynberg

Mar 30, 2006, 2:24:27 PM

to spctools...@googlegroups.com

Hi Lee,

I believe we’ve already been successful quantifying this type of labeling with the existing tools.

You need to do two searches, heavy and light. Assume you have a directory …/data.

1. Create two directories: …/data/heavy and …/data/light.
2. In each directory, run a search that specifies the corresponding label as a static modification.
3. Convert the resulting summary.html files to pepXML separately in the two directories.
4. From the …/data directory, run the following command: 'xinteract -A-lnK-S heavy/<your pepxml files>*.xml light/<your pepxml files>*.xml'
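
The directory layout and the final command can be sketched in Python as follows. This is only an illustration: the helper names and the file name run1.xml are hypothetical, and the only part taken from the steps above is the 'xinteract -A-lnK-S' invocation itself. The sketch composes the argument list without running anything.

```python
# Sketch: create the heavy/light layout and compose (not run) the xinteract call.
from pathlib import Path
import tempfile


def make_layout(data_dir):
    """Create the heavy/ and light/ subdirectories under data_dir."""
    for label in ("heavy", "light"):
        (Path(data_dir) / label).mkdir(parents=True, exist_ok=True)


def xinteract_args(data_dir):
    """Build the argument list, with paths relative to data_dir."""
    data_dir = Path(data_dir)
    pepxml = sorted(
        p.relative_to(data_dir).as_posix()
        for label in ("heavy", "light")
        for p in (data_dir / label).glob("*.xml")
    )
    # '-A-lnK-S' is the option string quoted in the steps above.
    return ["xinteract", "-A-lnK-S", *pepxml]


with tempfile.TemporaryDirectory() as tmp:
    make_layout(tmp)
    # Stand-ins for the pepXML files produced by the two separate conversions.
    (Path(tmp) / "heavy" / "run1.xml").touch()
    (Path(tmp) / "light" / "run1.xml").touch()
    args = xinteract_args(tmp)

print(" ".join(args))
```

The resulting list would be passed to the shell (or to subprocess.run) with the working directory set to the data directory, since the paths are relative to it.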

Let me know if you need me to further elaborate on any of these steps.

Thanks,

-David

Ott, Lee William

Mar 30, 2006, 2:37:55 PM

to spctools...@googlegroups.com

I am running a 12-step MudPIT. If I run two searches, it would take twice as long to run Sequest. It would then take twice as long to analyze the 24 .html files. It would take at least a couple of weeks to analyze this dataset. It would seem more efficient if only one search needs to be performed.
I would like to try to make it work with only one Sequest search.
Thanks,
Lee

David Shteynberg

Mar 30, 2006, 2:48:58 PM

to spctools...@googlegroups.com

Actually, a single search with the labels specified as variable modifications would probably take longer (roughly twice as long), because in that case the search considers all possible permutations of the variable mods in your peptides, which increases the effective database size by roughly a factor of two for every variable modification you use. Are you searching with proteolytically (enzyme) unconstrained settings? You may be able to speed things up by requiring one terminus of each peptide to be proteolytically constrained. I think if you run a benchmark on a small file, you will see that one search with 2 variable mods takes longer than two searches with static mods.
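
The rule of thumb in the paragraph above can be written out as a back-of-the-envelope calculation. The Python sketch below only encodes that rough estimate (a factor of two per variable modification); the function names are made up, and real search times depend on much more than this.

```python
# Back-of-the-envelope sketch of the rule of thumb above, not a benchmark.

def relative_cost_variable(n_variable_mods):
    """One search; each variable mod roughly doubles the effective database size."""
    return 2 ** n_variable_mods


def relative_cost_static(n_searches):
    """Separate static-mod searches; each one costs roughly one unit."""
    return n_searches


one_variable_search = relative_cost_variable(2)  # N-terminus and lysine labels
two_static_searches = relative_cost_static(2)    # one heavy search, one light search
print(one_variable_search, two_static_searches)
```

Under this estimate, the single variable-mod search costs about twice as much as the two static-mod searches combined.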

-David

Ott, Lee William

Mar 30, 2006, 3:00:57 PM

to spctools...@googlegroups.com

I do specify tryptic peptides. I agree that the variable modifications would take longer to do the Sequest search. However, I would imagine the pipeline analysis would be much slower. ASAPRatio would have to look through twice as many peptides. In my experience, ASAPRatio is the slowest part of the entire process. Decreasing the number of input peptides should decrease the amount of time ASAPRatio takes.
Thanks,
Lee

Jimmy Eng

Mar 30, 2006, 3:05:55 PM

to spctools...@googlegroups.com

I believe you want RefreshParser.cxx; look around line 194, where
peptides are read in from the pepXML file. Beyond those hints, good luck.

David Shteynberg

Mar 30, 2006, 4:24:25 PM

to spctools...@googlegroups.com

Hi Lee,

ASAPRatio works with validated peptides, so many of the false identifications should not get passed to this stage. I think that by doing two searches you gain the benefit of not retaining false positives that contain both light and heavy labels and are therefore not likely to be correct. I don't believe that by doing one search you are actually reducing the number of correct peptide IDs input to ASAPRatio. By doing two searches with static mods you should recover roughly the same total number of correct IDs as by doing a single search with variable mods. I hope that this makes sense. The best way to test whether my thoughts are correct would be to run some benchmarks (with small datasets) and compare the running times.

Ott, Lee William

Apr 3, 2006, 9:15:06 AM

to spctools...@googlegroups.com

David,
I tried running ASAPRatio with the two searches and ran into the same problem that I had before. Here is a copy of my post from April 2005.

"I was running my data through the new pipeline. When I look at the interact.XML file, I see the modifications setup correctly for the K and C(carboxyamidomethylation). However, there is a listing for amino acid N which isn't specified anywhere in the sequest params file. There also is no "n" for the N-termini as specified in the params file. "

At this point I can't get my GIST data through the pipeline. I am stuck with the one search method I developed because I cannot get DatabaseParser to correctly parse out the protein name. I am stuck with the TPP because of the problem with converting my .html files to .xml.

Thanks,
Lee

David Shteynberg

Apr 3, 2006, 12:17:33 PM

to spctools...@googlegroups.com

Hi Lee,

What is the static N term modification that you are using in the Sequest params file? I’ve seen this error before (and it should be corrected in the latest version) with Sequest2XML having a problem because there are a few ways that N-term static mods can be specified. Here are two ways to specify this parameter:

1:
add_Cterm_peptide = 0.0000 ; added to each peptide C-terminus
add_Cterm_protein = 0.0000 ; added to each protein C-terminus
add_Nterm_peptide = xxxxx ; added to each peptide N-terminus
add_Nterm_protein = 0.0000 ; added to each protein N-terminus

2:
add_C_terminus = 0.0000 ; added to each peptide C-terminus
add_N_terminus = xxxxx ; added to each peptide N-terminus

The first way is the new way to specify this param, which will not work unless you are running the latest version of TPP (2.8.0 or above). I suspect your params are specified as in option 1. I suggest you specify these with option 2 and rerun the TPP from the Sequest2XML step. (There is no need to re-search the spectra.) Let me know if that still doesn’t work (or if you have version 2.8.0 or above and modification is still not getting correctly recorded, in which case we still need to fix this problem.)
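
For anyone who wants to script the switch from option 1 to option 2, here is a hedged Python sketch that rewrites the terminal-modification keys in the text of a params file. The function name, the choice to drop the protein-level lines (assumed to be 0.0000), and the 45.0000 sample value are all assumptions for illustration. It is not part of the TPP or Sequest.

```python
# Sketch: rewrite option-1 terminal keys to option-2 keys in params text.

# Key names are taken from the two options listed above.
RENAME = {
    "add_Cterm_peptide": "add_C_terminus",
    "add_Nterm_peptide": "add_N_terminus",
}
# Assumption: the protein-level lines are 0.0000 and can be left out,
# since option 2 as shown has no counterpart for them.
DROP = {"add_Cterm_protein", "add_Nterm_protein"}


def to_option_2(params_text):
    """Return params_text with option-1 terminal keys renamed or dropped."""
    out = []
    for line in params_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in DROP:
            continue
        if key in RENAME:
            line = line.replace(key, RENAME[key], 1)
        out.append(line)
    return "\n".join(out)


# Illustrative input; 45.0000 is a sample value, not a recommendation.
sample = "\n".join([
    "add_Cterm_peptide = 0.0000 ; added to each peptide C-terminus",
    "add_Cterm_protein = 0.0000 ; added to each protein C-terminus",
    "add_Nterm_peptide = 45.0000 ; added to each peptide N-terminus",
    "add_Nterm_protein = 0.0000 ; added to each protein N-terminus",
])
converted = to_option_2(sample)
print(converted)
```

Lines that are not terminal-modification settings pass through unchanged, so the rest of the params file is left as it was.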

Ott, Lee William

Apr 3, 2006, 1:14:31 PM

to spctools...@googlegroups.com

David,
That worked. Why is the mass specified in the xml file as 46 when I specified it as 45 in the parameters file?
Thanks,
Lee

David Shteynberg

Apr 3, 2006, 1:29:53 PM

to spctools...@googlegroups.com

Hi Lee,

That’s great! I might be wrong here, but I think that the off-by-one difference has to do with accounting for the extra hydrogen on the N terminus. Anyways, let me know if that is preventing you from getting good quantitation peaks.
