RefreshParser within xinteract unable to find all proteins corresponding to identified peptides


Vidya Venkatraman

Jun 5, 2019, 2:15:18 PM
to spctools-discuss
Hi Everyone,

I am trying to run xinteract with the -D option so that RefreshParser can use the fasta database to find all proteins corresponding to identified peptides.

This is the command I am using:
xinteract -OARPwd -dDECOY_ -PPM -nR -D/scratch/venkatramanv/iphronesis/reference_files/Databases/UP_Human_Rev_Can+Iso_20190410_DECOY.fasta -N/scratch/venkatramanv/iphronesis/platform_workspace/jobs/SJOB2122/peptideprophet/2122_PDAY_DDA_Isoforms.comet.interact.pep.xml *.comet.pep.xml

Despite using the -D database option, I noticed that some peptides in the interact.pep.xml are matched uniquely to a specific isoform, even though the peptide is not proteotypic or unique to any isoform.

For example:

<spectrum_query spectrum="XL_150506_IDA_PDAY_F8_proteinpilot.08799.08799.2" start_scan="8799" end_scan="8799" precursor_neutral_mass="1096.592165" assumed_charge="2" index="20868" retention_time_sec="2254.3">
<search_result>
<search_hit hit_rank="1" peptide="VVEHPEFLK" peptide_prev_aa="M" peptide_next_aa="A" protein="sp|P06396-2|GELS_HUMAN" num_tot_proteins="1" num_matched_ions="11" tot_num_ions="16" calc_neutral_pep_mass="1096.591696" massdiff="0.000470" num_tol_term="1" num_missed_cleavages="0" num_matched_peptides="6184">
<search_score name="xcorr" value="2.205"/>
<search_score name="deltacn" value="0.575"/>
<search_score name="deltacnstar" value="0.000"/>
<search_score name="spscore" value="350.4"/>
<search_score name="sprank" value="1"/>
<search_score name="expect" value="4.94E-03"/>
<analysis_result analysis="peptideprophet">
<peptideprophet_result probability="1.0000" all_ntt_prob="(0.0000,1.0000,1.0000)">
<search_score_summary>
<parameter name="fval" value="5.5754"/>
<parameter name="ntt" value="1"/>
<parameter name="nmc" value="0"/>
<parameter name="massd" value="0.429"/>
<parameter name="isomassd" value="0"/>
<parameter name="RT" value="1372.21"/>
<parameter name="RT_score" value="0.02"/>
</search_score_summary>
</peptideprophet_result>
</analysis_result>
</search_hit>
</search_result>
</spectrum_query>

See the attached sequence alignment of all isoforms of GELS_HUMAN. This peptide matches the canonical form as well as all isoforms, so num_tot_proteins should be "4" and protein should be "sp|P06396|GELS_HUMAN", with the other three isoforms listed as alternatives:
<alternative_protein protein="sp|P06396-2|GELS_HUMAN"/>
<alternative_protein protein="sp|P06396-3|GELS_HUMAN"/>
<alternative_protein protein="sp|P06396-4|GELS_HUMAN"/>
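
The expectation described here amounts to a plain substring match of the peptide against every FASTA entry. A minimal Python sketch of that check follows. The helper names and the toy sequences are made up for illustration (they are not the real gelsolin isoforms), and this is not how RefreshParser itself is implemented:

```python
def read_fasta(text):
    """Parse FASTA text into {first_header_token: sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Keep only the accession token, drop the description.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def proteins_containing(peptide, records):
    """Return every entry whose sequence contains the peptide string."""
    return [name for name, seq in records.items() if peptide in seq]


# Toy sequences (assumption): shortened stand-ins, not real UniProt entries.
TOY_FASTA = """\
>sp|P06396|GELS_HUMAN toy canonical
MAAAKGGSMVVEHPEFLKAGK
>sp|P06396-2|GELS_HUMAN toy isoform 2
MVVEHPEFLKAGK
>DECOY_sp|P06396|GELS_HUMAN toy decoy
KGAKLFEPHEVVMSGGKAAAM
"""

if __name__ == "__main__":
    for name in proteins_containing("VVEHPEFLK", read_fasta(TOY_FASTA)):
        print(name)
```

Running the same check against the real database would show how many entries contain the peptide as a string, which can then be compared with the num_tot_proteins value in the pepXML.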

I am happy to share all the comet.pep.xml & interact.pep.xml files along with the fasta database via Box folder. 

Can someone please help me understand whether this is expected behavior? If so, how can I fix it?

Regards
Vidya

VVEHPEFLK.JPG

Jimmy Eng

Jun 5, 2019, 2:27:54 PM
to spctools...@googlegroups.com
Without more info, here's my educated guess at an explanation: that peptide is not a fully tryptic peptide in the proteins you list, as it is preceded by a methionine residue. For isoform P06396-2, the preceding methionine is the first residue of the sequence, and I'm assuming you did a Comet search with the "clip_nterm_methionine" parameter set to "1". With that start methionine removed, the peptide becomes fully tryptic in that particular isoform, hence only that isoform is listed as a matched protein. The enzyme context does matter with RefreshParser; it's not simply a peptide string match.
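
To make the enzyme-context point concrete, here is a rough Python sketch that counts the number of tolerable termini (NTT) for a peptide within a protein, with and without initiator-methionine clipping. The function name, the toy sequences, and the simplified trypsin rule (cleave after K or R, but not before P) are assumptions for illustration; this is not RefreshParser's or Comet's actual code:

```python
def num_tol_term(protein, peptide, clip_nterm_met=False):
    """Return the best NTT (0-2) over all occurrences, or None if absent."""
    best = None
    start = protein.find(peptide)
    while start != -1:
        end = start + len(peptide)

        # N-terminal side: protein start, clipped initiator Met, or K/R before.
        if start == 0:
            n_ok = True
        elif clip_nterm_met and start == 1 and protein[0] == "M":
            n_ok = True
        else:
            n_ok = protein[start - 1] in "KR" and peptide[0] != "P"

        # C-terminal side: protein end, or peptide ends in K/R not followed by P.
        if end == len(protein):
            c_ok = True
        else:
            c_ok = peptide[-1] in "KR" and protein[end] != "P"

        ntt = int(n_ok) + int(c_ok)
        best = ntt if best is None else max(best, ntt)
        start = protein.find(peptide, start + 1)
    return best


# Toy sequences (assumption): the peptide follows an internal Met in one
# entry and the initiator Met in the other.
TOY_CANONICAL = "MAAAKGGSMVVEHPEFLKAGK"
TOY_ISOFORM_2 = "MVVEHPEFLKAGK"

if __name__ == "__main__":
    for label, seq in (("canonical", TOY_CANONICAL), ("isoform 2", TOY_ISOFORM_2)):
        for clip in (False, True):
            print(label, "clip" if clip else "no clip",
                  num_tol_term(seq, "VVEHPEFLK", clip_nterm_met=clip))
```

Under these assumptions, only the toy isoform 2 reaches NTT 2, and only when the initiator methionine is clipped. A mapper that prefers the fully tryptic context could therefore end up reporting just that one entry.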

--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spctools-discu...@googlegroups.com.
To post to this group, send email to spctools...@googlegroups.com.
Visit this group at https://groups.google.com/group/spctools-discuss.
To view this discussion on the web visit https://groups.google.com/d/msgid/spctools-discuss/6b292366-55e1-4fe2-9994-ea9fff3d97a9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Vidya Venkatraman

Jun 7, 2019, 5:10:33 PM
to spctools-discuss
Thanks Jimmy for your prompt response. 

I do see that for isoform P06396-2, the preceding methionine is the first residue of the sequence and I used both comet & tandem for the database search. 

For the Comet search, I did have the "clip_nterm_methionine" parameter set to "1", so your explanation makes sense.

For the Tandem search, I am not sure whether Tandem has an equivalent parameter, but I can see that the Tandem PeptideProphet pepXML also identifies this peptide as unique to isoform P06396-2.

<spectrum_query spectrum="XL_150506_IDA_PDAY_F7_proteinpilot.09460.09460.2" start_scan="9460" end_scan="9460" precursor_neutral_mass="1096.5930" assumed_charge="2" index="28557" retention_time_sec="2248.41">
<search_result>
<search_hit hit_rank="1" peptide="VVEHPEFLK" peptide_prev_aa="M" peptide_next_aa="A" protein="sp|P06396-2|GELS_HUMAN" protein_descr="Isoform 2 of Gelsolin OS=Homo sapiens OX=9606 GN=GSN" num_tot_proteins="1" num_matched_ions="17" tot_num_ions="16" calc_neutral_pep_mass="1096.5917" massdiff="0.001" num_tol_term="2" num_missed_cleavages="0" is_rejected="0">

Do you know if there is a parameter similar to 'clip_nterm_methionine' in Tandem as well?
Does RefreshParser have any parameters to control this behavior?

Many Thanks
Vidya

Jimmy Eng

Jun 7, 2019, 5:19:22 PM
to spctools...@googlegroups.com
I believe this X!Tandem parameter affects the initiator methionine cleavage:  https://www.thegpm.org/TANDEM/api/pqa.html  
And I'm not aware of any RefreshParser parameters that control this behavior; I am using an older version (5.0.0) of the TPP though.

Vidya Venkatraman

Jun 7, 2019, 5:59:08 PM
to spctools-discuss
That's interesting. I went back to check my X!Tandem parameters, and this parameter is set to 'no', so it seems some other native logic within X!Tandem controls this behavior.

<note type="input" label="protein, quick acetyl">no</note>
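
To double-check which value a given X!Tandem input file actually carries for a label such as "protein, quick acetyl", the note elements can be read programmatically. This is a small Python sketch, assuming the usual bioml wrapper with note entries of type "input"; the sample file content and the function name are made up for illustration:

```python
import xml.etree.ElementTree as ET


def tandem_params(xml_text):
    """Collect {label: value} from <note type="input" label="..."> entries."""
    root = ET.fromstring(xml_text)
    params = {}
    for note in root.iter("note"):
        label = note.get("label")
        if note.get("type") == "input" and label:
            # Later entries with the same label overwrite earlier ones.
            params[label] = (note.text or "").strip()
    return params


# Hypothetical parameter file content, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<bioml>
  <note type="input" label="protein, quick acetyl">no</note>
  <note type="input" label="protein, cleavage site">[RK]|{P}</note>
</bioml>"""

if __name__ == "__main__":
    params = tandem_params(SAMPLE)
    print(params.get("protein, quick acetyl"))
```

X!Tandem typically combines a default parameters file with the run-specific input file, so it may be worth running a check like this on both.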

Regards
Vidya
