Hi Everyone,
I am trying to run xinteract with the -D option so that RefreshParser can use the FASTA database to find all proteins that contain each identified peptide.
This is the command I am using:
xinteract -OARPwd -dDECOY_ -PPM -nR -D/scratch/venkatramanv/iphronesis/reference_files/Databases/UP_Human_Rev_Can+Iso_20190410_DECOY.fasta -N/scratch/venkatramanv/iphronesis/platform_workspace/jobs/SJOB2122/peptideprophet/2122_PDAY_DDA_Isoforms.comet.interact.pep.xml *.comet.pep.xml
Despite using the -D option, I noticed that some peptides in interact.pep.xml are matched uniquely to a specific isoform, even though the peptide is not proteotypic (i.e. it is not unique to any one isoform).
For example:
<spectrum_query spectrum="XL_150506_IDA_PDAY_F8_proteinpilot.08799.08799.2" start_scan="8799" end_scan="8799" precursor_neutral_mass="1096.592165" assumed_charge="2" index="20868" retention_time_sec="2254.3">
<search_result>
<search_hit hit_rank="1" peptide="VVEHPEFLK" peptide_prev_aa="M" peptide_next_aa="A" protein="sp|P06396-2|GELS_HUMAN" num_tot_proteins="1" num_matched_ions="11" tot_num_ions="16" calc_neutral_pep_mass="1096.591696" massdiff="0.000470" num_tol_term="1" num_missed_cleavages="0" num_matched_peptides="6184">
<search_score name="xcorr" value="2.205"/>
<search_score name="deltacn" value="0.575"/>
<search_score name="deltacnstar" value="0.000"/>
<search_score name="spscore" value="350.4"/>
<search_score name="sprank" value="1"/>
<search_score name="expect" value="4.94E-03"/>
<analysis_result analysis="peptideprophet">
<peptideprophet_result probability="1.0000" all_ntt_prob="(0.0000,1.0000,1.0000)">
<search_score_summary>
<parameter name="fval" value="5.5754"/>
<parameter name="ntt" value="1"/>
<parameter name="nmc" value="0"/>
<parameter name="massd" value="0.429"/>
<parameter name="isomassd" value="0"/>
<parameter name="RT" value="1372.21"/>
<parameter name="RT_score" value="0.02"/>
</search_score_summary>
</peptideprophet_result>
</analysis_result>
</search_hit>
</search_result>
</spectrum_query>
As the attached sequence alignment of all GELS_HUMAN isoforms shows, this peptide matches the canonical form as well as every isoform. I would therefore expect num_tot_proteins="4" and protein="sp|P06396|GELS_HUMAN", with the other isoforms listed as alternatives:
<alternative_protein protein="sp|P06396-2|GELS_HUMAN"/>
<alternative_protein protein="sp|P06396-3|GELS_HUMAN"/>
<alternative_protein protein="sp|P06396-4|GELS_HUMAN"/>
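To check which database entries actually contain a peptide, and compare that with what a search_hit in interact.pep.xml reports (protein plus any alternative_protein children), a quick script along these lines may help. It is a minimal sketch under stated assumptions: the first word of each FASTA header is the protein name used in pepXML, decoys carry the DECOY_ prefix passed via -d, and matching is exact substring matching (it ignores any I/L equivalence or enzyme rules the TPP tools may apply). The helper names are hypothetical, and the sequences are made-up toy data, not the real GELS_HUMAN isoforms.

```python
import xml.etree.ElementTree as ET


def read_fasta(text):
    """Parse FASTA text into {first header word: sequence}."""
    proteins = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                proteins[name] = "".join(chunks)
            name = line[1:].split()[0]  # e.g. "sp|P06396-2|GELS_HUMAN"
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        proteins[name] = "".join(chunks)
    return proteins


def proteins_containing(peptide, proteins, decoy_prefix="DECOY_"):
    """Target (non-decoy) proteins whose sequence contains the peptide exactly."""
    return sorted(name for name, seq in proteins.items()
                  if peptide in seq and not name.startswith(decoy_prefix))


def reported_proteins(search_hit):
    """Proteins listed on a pepXML search_hit: protein plus alternative_protein."""
    names = [search_hit.get("protein")]
    for child in search_hit.iter():
        # Compare the local tag name so namespaced pepXML files also work
        if child.tag.rsplit("}", 1)[-1] == "alternative_protein":
            names.append(child.get("protein"))
    return sorted(names)


# Hypothetical toy database: made-up sequences, NOT the real GELS_HUMAN isoforms
toy_fasta = (
    ">sp|P06396|GELS_HUMAN toy\n"
    "GGGGKVVEHPEFLKAGGG\n"
    ">sp|P06396-2|GELS_HUMAN toy\n"
    "MVVEHPEFLKAGGG\n"
    ">DECOY_sp|P06396|GELS_HUMAN toy\n"
    "GGGAKLFEPHEVVKGGGG\n"
)

# A search_hit mapped to a single isoform, as in the example above
hit = ET.fromstring('<search_hit peptide="VVEHPEFLK" '
                    'protein="sp|P06396-2|GELS_HUMAN" num_tot_proteins="1"/>')

expected = proteins_containing(hit.get("peptide"), read_fasta(toy_fasta))
reported = reported_proteins(hit)
missing = sorted(set(expected) - set(reported))

print("proteins in database:", expected)
print("proteins reported:   ", reported)
print("missing from hit:    ", missing)
print("num_tot_proteins:", hit.get("num_tot_proteins"), "vs", len(expected))
```

On the real files, the same helpers could be pointed at the FASTA given to -D and at each search_hit in interact.pep.xml to list every hit whose reported proteins differ from the database matches.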
I am happy to share all the comet.pep.xml and interact.pep.xml files, along with the FASTA database, via a Box folder.
Can someone please help me understand whether this is expected behavior and, if so, how I can fix it?
Regards
Vidya