how to specify heavy label for oxidised methionine in ASAPRatio


Alastair Skeffington

Aug 5, 2020, 2:02:11 PM
to spctools-discuss
Hello,

I'm trying to run a 14N/15N labelling experiment through ASAPRatio. I've been running the first steps like this - here for the results of a database search with heavy masses:

InteractParser sample_interact.pep.xml sample.pep.xml

PeptideProphetParser sample_interact.pep.xml

RefreshParser sample_interact.pep.xml ./EhuxAllproteins_MCC_decoy.fasta

ASAPRatioPeptideParser sample_interact.pep.xml  -lACDEFGHIKLMNPQRSTVWY -r8 -mA72.0779R160.1857N116.1026D116.0874C104.1429E130.1140Q130.1292G58.0513H140.1393I114.1576L114.1576K130.1723M132.1961F148.1739P99.1152S89.0773T102.1038W188.0793Y164.0633V101.1311

At this point I get a warning:

WARNING: Found more than one variable mod on 'M'. Please make sure to specify a heavy mass for this residue

So I have two questions:

1) How do I specify the heavy mass for oxidised methionine? Mox ? And is phosphorylated serine coded Sp ?

2) I've used -r8 instead of the default 0.5. My reasoning is that a medium-sized heavy peptide could easily differ from its 14N counterpart by 16 Da. Assuming a charge of +2, that corresponds to an m/z difference of 8. Does this sound remotely sensible?

Many thanks!
Alastair

David Shteynberg

Aug 5, 2020, 2:37:16 PM
to spctools-discuss
Hi Alastair,

If your search results are either all heavy or all light (not variable mod searched) then you should also use option -S.  

1). You cannot specify anything but single amino acids in this string.  Your quantitation will be based on peptides without PTMs in this dataset.

2). -r8 is MUCH too wide a window to recover the MS1 signal in RT space. The lower this number, the more selective the tool is at isolating your target signal. With -r8 you will not be quantifying the correct signal, unless you have a very bare sample.

If you are able to share this data I can try running it to help you optimize your settings.

Thanks,
-David

--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spctools-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spctools-discuss/4853fda5-cc2f-48e2-a9e2-2ae93225b288o%40googlegroups.com.

Alastair Skeffington

Aug 6, 2020, 4:17:00 PM
to spctools-discuss
Hi David,

Many thanks for your reply.

So any peptides with modifications will simply be ignored for quantification, and I can ignore the warning message?

Yes - each search was either light or heavy as defined in the static modifications. 

And the -r parameter is then the window to search in the RT dimension? Because the command line option says 'range around precursor m/z to search for peak' I assumed this was for the m/z dimension. I thought that the shift of the peak would also mostly be in the m/z dimension - but I'm no mass spectrometrist!

It would be amazing if you had a moment to have a look at the data - thanks very much for offering. I've put two example pairs of files here: https://we.tl/t-cVDD1MVDOr 

For each sample there is a light 'L' version of the search results and a heavy 'H' version. I've also included the search database (it is based on some PacBio data, so I'm afraid it's quite big, with a lot of isoforms).

Many thanks,
Alastair


Alastair Skeffington

Aug 17, 2020, 2:20:59 PM
to spctools-discuss
Hi David,

The link to the data has now expired. Let me know if you are still willing to have a look and I can send it again.

Otherwise maybe you could briefly describe the process you would go through?

Many thanks,
Alastair

David Shteynberg

Aug 17, 2020, 2:51:54 PM
to spctools-discuss
Hello Alastair,

I downloaded the file you shared with me, however, there were no mzML files to match the pepXML files so I couldn't actually try the analysis.   I will make another attempt if you can provide the mass spec data. 

Thanks,
-David


Alastair Skeffington

Aug 17, 2020, 3:23:55 PM
to spctools-discuss
Hi David,

Ah sorry - my mistake. Here's a new link with a tarball including the mzXML files.


Many thanks!
Alastair

Alastair Skeffington

Aug 26, 2020, 5:48:59 AM
to spctools-discuss
Hi David,

Did you manage to grab the data before the link expired?

Thanks,
Alastair

David Shteynberg

Aug 26, 2020, 4:47:57 PM
to spctools-discuss
Hello Alastair,

Unless there is a mistake, I think the N15 mass string should be:

 -mA72.03415R160.10111N116.04293D116.02694C161.0307E130.04259Q130.05858G58.02146H140.05891I114.08406L114.08406K130.09496M132.04049F148.06841P98.05276S88.03203T102.04768W188.07931Y164.06333V100.06841


See the following post:


However, I am not detecting correct results in your heavy 'H' search results.  I would start your troubleshooting there, beginning with the heavy mass of proline.
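One way to sanity-check a -m string before running ASAPRatio is to compare each heavy mass with the light residue mass and see how many nitrogens the difference implies. The sketch below is only an illustration, not part of the TPP: the helper names are made up, it assumes monoisotopic light masses, and it assumes cysteine carries a fixed carbamidomethyl mass of 57.02146 Da (which is what the C value in David's string suggests). Rounding to whole nitrogens is deliberately coarse, so strings built from average masses are also tolerated.

```python
import re

# Monoisotopic residue masses in Da (unmodified residues).
LIGHT_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Nitrogen atoms per residue: one backbone N plus any side-chain N.
N_ATOMS = {aa: 1 for aa in LIGHT_MONO}
N_ATOMS.update({"R": 4, "H": 3, "K": 2, "N": 2, "Q": 2, "W": 2})


def parse_mass_string(mass_string):
    """Split a string such as 'A72.03415R160.10111' into {'A': 72.03415, ...}."""
    pairs = re.findall(r"([A-Z])(\d+(?:\.\d+)?)", mass_string)
    return {aa: float(mass) for aa, mass in pairs}


def suspicious_residues(mass_string, static_mods=None):
    """Return {residue: implied N count} where it differs from the expected count.

    static_mods is a hypothetical {residue: mass} dict for fixed modifications
    that are already included in the heavy masses.
    """
    static_mods = static_mods or {}
    flagged = {}
    for aa, heavy in parse_mass_string(mass_string).items():
        light = LIGHT_MONO[aa] + static_mods.get(aa, 0.0)
        implied = round(heavy - light)  # each 15N adds roughly 1 Da
        if implied != N_ATOMS[aa]:
            flagged[aa] = implied
    return flagged


# The two -m strings quoted in this thread.
FIRST = ("A72.0779R160.1857N116.1026D116.0874C104.1429E130.1140Q130.1292"
         "G58.0513H140.1393I114.1576L114.1576K130.1723M132.1961F148.1739"
         "P99.1152S89.0773T102.1038W188.0793Y164.0633V101.1311")
DAVID = ("A72.03415R160.10111N116.04293D116.02694C161.0307E130.04259"
         "Q130.05858G58.02146H140.05891I114.08406L114.08406K130.09496"
         "M132.04049F148.06841P98.05276S88.03203T102.04768W188.07931"
         "Y164.06333V100.06841")

CARBAMIDOMETHYL = {"C": 57.02146}  # assumed fixed modification on cysteine
print(suspicious_residues(FIRST, CARBAMIDOMETHYL))
print(suspicious_residues(DAVID, CARBAMIDOMETHYL))
```

Under these assumptions the string David suggests passes cleanly, while the first string in the thread is flagged at P, S and V (each implies two nitrogens instead of one) and at C (no carbamidomethyl mass included).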

Sorry this has taken me some time to get to, and keep me posted on your progress.

Cheers,
-David







Alastair Skeffington

Aug 31, 2020, 8:25:46 AM
to spctools-discuss
Hi David,

Thanks for pointing out the wrong mass - there were a couple of others as well. So I've got quite a lot further with the analysis, but I'm not convinced that it's working as it should. Maybe it's easiest if I detail the steps I've gone through and ask some questions as I go. It would be great if you could answer these / annotate with any other thoughts you have:

Step 1: Search using standard 14N labelled masses. The 15N search seems to be fairly useless (see later). No variable modifications in search, because otherwise ASAPRatio complains and fails to run in 'static' mode.

Step 2: InteractParser $intout $in

Step 3: PeptideProphetParser $intout DECOY=XXX_ NONPARAM

Note that I get a lot of model failures for this data. I get fewer model failures when I use fully parametric modelling, but then the decoy search hits are not properly taken into account, meaning that in the final prot.xml file I have single-protein protein groups consisting only of a decoy hit. So I continued with the semi-parametric options.

Step 4: ASAPRatioPeptideParser $in  -lACDEFGHIKLMNPQRSTVWY -r0.5 -S -mA72.04581R160.1359N116.0603D116.0356C161.0393E130.0513Q130.076G58.03016H140.085I114.0928L114.0928K130.1124M132.0492F148.0771P98.06146S88.04073T102.0564W188.0967Y164.072V100.0771

So the -m string specifies the modified masses (i.e. the 15N masses) for comparison with the 14N search results. 

This doesn't work the other way round (taking the 15N search results and using the 14N masses in the -m string). This is because of cysteine carbamidomethylation: ASAPRatio sees the C mass as a 'heavy' static modification and the other residues as 'light' modifications, which results in an error message that there is a mixture of heavy and light modifications.

Step 5: ProteinProphet $light $out ASAP_PROPHET

I get ratios for reasonable numbers of proteins - but unfortunately I'm not convinced these are reliable. To take one example:
> L/H ratio given as 2.98
> Based on peptide: CTTSAAATSTSSGR  
> Looking at the details in the viewer I see "light +2 m/z 679.3" and "heavy +2 m/z 679.8"
> There are 17 N atoms in the peptide, 16 with one neutral loss. m/z for the mass difference between light and heavy in the latter case will then be 8. So ASAPRatio should be looking for a mass of about 687.
> When I look in the 3D data viewer there are candidate peaks in this region.
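The arithmetic in this example can be written out as a short sketch. It assumes full 15N incorporation, counts one backbone nitrogen per residue plus side-chain nitrogens, and ignores any nitrogen added by chemical modification after labelling. The function names are hypothetical and not part of any TPP tool.

```python
# Mass difference between 15N and 14N in Da.
N15_SHIFT = 0.9970349

# Side-chain nitrogens; every residue also has one backbone nitrogen.
EXTRA_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}


def count_nitrogens(peptide):
    """Number of nitrogen atoms in an unmodified peptide sequence."""
    return sum(1 + EXTRA_N.get(aa, 0) for aa in peptide)


def heavy_mz(light_mz, peptide, charge):
    """Expected m/z of the fully 15N-labelled partner of a light precursor."""
    return light_mz + count_nitrogens(peptide) * N15_SHIFT / charge


peptide = "CTTSAAATSTSSGR"
print(count_nitrogens(peptide))
print(round(heavy_mz(679.3, peptide, 2), 2))
```

For CTTSAAATSTSSGR this gives 17 nitrogens and a shift of about 8.47 m/z units at charge +2, which places the heavy partner near m/z 687.8 rather than the 679.8 shown in the viewer.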

I also tried XPRESS and got quantifications that were quite different (L/H is 0.66 in this example). The trouble with the XPRESS output is that there doesn't seem to be a way to click through to the underlying data for inspection. In fact, using Petunia I see no way of finding out what peptide or heavy peak identification the ratio is based on. Presumably this information is buried somewhere in the prot.xml / pep.xml files?
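If the per-peptide XPRESS values are stored in the pep.xml, they could be pulled out with a generic XML scan along these lines. This is a sketch under assumptions: it expects each spectrum_query to contain a search_hit with an xpressratio_result element somewhere beneath it, and it simply copies whatever attributes that element carries. The toy document and its single attribute are illustrative only, so check the element and attribute names against a real file.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{uri}name' down to 'name'."""
    return tag.rsplit("}", 1)[-1]


def xpress_rows(pepxml_source):
    """Collect one dict per xpressratio_result, with its spectrum and peptide."""
    rows = []
    for _, elem in ET.iterparse(pepxml_source, events=("end",)):
        if local(elem.tag) != "spectrum_query":
            continue
        # First search_hit under this spectrum_query (assumed to be the top hit).
        hit = next((e for e in elem.iter() if local(e.tag) == "search_hit"), None)
        for res in elem.iter():
            if local(res.tag) == "xpressratio_result":
                rows.append({
                    "spectrum": elem.get("spectrum"),
                    "peptide": hit.get("peptide") if hit is not None else None,
                    **res.attrib,  # copy all attributes without assuming names
                })
        elem.clear()  # free memory; real pep.xml files can be large
    return rows


# Toy document for illustration only; not taken from the data in this thread.
TOY = b"""<msms_pipeline_analysis xmlns="http://regis-web.systemsbiology.net/pepXML">
 <msms_run_summary>
  <spectrum_query spectrum="run1.00042.00042.2" assumed_charge="2">
   <search_result>
    <search_hit hit_rank="1" peptide="CTTSAAATSTSSGR">
     <analysis_result analysis="xpress">
      <xpressratio_result decimal_ratio="0.66"/>
     </analysis_result>
    </search_hit>
   </search_result>
  </spectrum_query>
 </msms_run_summary>
</msms_pipeline_analysis>"""

rows = xpress_rows(io.BytesIO(TOY))
print(rows)
```

Passing a file path instead of the BytesIO object should work the same way, since ET.iterparse accepts either.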

So my questions are:
1. Should -r be bigger to allow ASAPRatio to find the correct heavy peak?
2. Why does ASAPRatio accept a heavy peak that is so obviously the wrong mass given the static modifications? It seems to be completely ignoring them!
3. Ideally I would use the 15N search results to validate the identity of the heavy peak where possible. Is there a way to do this other than looking it up manually?

I've uploaded a pdf with the above example, as well as some data files to: https://we.tl/t-uaqpZ4gy7J 

Many thanks for the help!
Alastair

Alastair Skeffington

Sep 7, 2020, 7:07:48 AM
to spctools-discuss

Further updates:

I haven't got any further with getting ASAPRatio to identify what I think are the correct mass differences between heavy and light.

The samples I am analysing are from a cellular fractionation experiment comparing two conditions where a particular structure is either present or absent. Thus, if the data from the two conditions were hypothetically analysed separately, I wouldn't necessarily expect a particularly good alignment between runs. There may be a lot of proteins that are only identified in one condition. So my plan to analyse the data and be sure of my answers is now:

1. Do separate 15N and 14N searches and run PeptideProphet
2. Combine in iProphet to ensure that 14N peptides are not erroneously assigned to 15N spectra and vice versa. 
3. Run ProteinProphet
4. Run ASAPRatio to get peak areas for the 14N and 15N samples individually
5. Post-process the ProteinProphet output using custom scripts to:
a) identify proteins that are only found with 14N PSMs or only 15N PSMs
b) for proteins with mixed IDs, identify peptides identified in both a 14N and a 15N version. Use the ASAPRatio results to make a ratio for these peptides and subsequently calculate a protein ratio.
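The post-processing step of this plan could look roughly like the sketch below. It assumes the peak areas have already been exported into simple (protein, peptide, label, area) records, with peptide sequences stripped of modifications so that light and heavy identifications match. The record layout, the function name and the use of the median as the protein-level summary are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def summarise(records):
    """Split proteins into light-only, heavy-only and quantified groups.

    records: iterable of (protein, peptide, label, area), label 'L' or 'H'.
    If a (protein, peptide, label) combination occurs twice, the last area wins.
    """
    areas = defaultdict(lambda: defaultdict(dict))  # protein -> peptide -> label -> area
    for protein, peptide, label, area in records:
        areas[protein][peptide][label] = area

    light_only, heavy_only, ratios = [], [], {}
    for protein, peptides in areas.items():
        labels = {lab for by_label in peptides.values() for lab in by_label}
        if labels == {"L"}:
            light_only.append(protein)
        elif labels == {"H"}:
            heavy_only.append(protein)
        else:
            # Use only peptides seen in both a light and a heavy version.
            paired = [p["L"] / p["H"] for p in peptides.values()
                      if "L" in p and "H" in p and p["H"] > 0]
            if paired:
                ratios[protein] = median(paired)  # protein-level L/H ratio
    return light_only, heavy_only, ratios


# Made-up records, purely to show the input shape.
example = [
    ("P1", "AAK", "L", 10.0), ("P1", "AAK", "H", 5.0),
    ("P1", "CCR", "L", 3.0), ("P1", "CCR", "H", 1.0),
    ("P2", "DDK", "L", 1.0),
    ("P3", "EEK", "H", 2.0),
]
print(summarise(example))
```

Proteins with mixed identifications but no peptide seen in both forms end up in none of the three outputs, so they would need to be handled separately.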

Does this sound sensible?

Thanks,
Alastair