Error in fitted models for MSgf+ search


Asif Ahmed

Feb 9, 2021, 8:00:25 AM
to spctools-discuss
Hi,

I ran an MS-GF+ search (with decoys) on my samples (a mixture of N14 and N15 proteins; I want to detect the N14 proteins in the sample), converted the resulting .mzid file to pepXML using IDconvert, changed the paths using "update path", and ran PeptideProphet (Use accurate mass binning, using PPM; Use decoy hits to pin down the negative distribution; Decoy Protein names begin with: DECOY_; Use Non-parametric model; and report decoy hits), followed by iProphet and ProteinProphet combined (default settings) in Petunia.

The run completed without any errors, but the "Learned NSP distribution" model, as well as others, shows some abnormality. Based on the fitted models, can you advise whether I can accept the results in the 0.99 to 1 probability range?

ASIF 

msgf1.PNG

msgf2.PNG

David Shteynberg

Feb 9, 2021, 1:43:35 PM
to spctools-discuss
Hello Asif,

Unfortunately, this analysis tells me that the DECOY-estimated FDR (error rate) is about 50%-60% amongst the highest scoring proteins in this analysis. I don't believe these are "acceptable" results. The problem is likely somewhere upstream of the ProteinProphet analysis; I cannot tell exactly where without seeing more of the dataset.
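
For readers unfamiliar with decoy-estimated error rates, here is a minimal sketch of the general idea, not the exact calculation the TPP performs. It assumes the target and decoy databases are the same size, so the number of decoy hits above a threshold approximates the number of false target hits. The function name `decoy_fdr` and the example protein list are hypothetical.

```python
def decoy_fdr(entries, threshold, decoy_prefix="DECOY_"):
    """Estimate the FDR among entries with probability >= threshold.

    entries: iterable of (protein_name, probability) pairs.
    Assumption: equal-sized target and decoy databases, so decoy hits
    approximate the number of false hits among the targets.
    """
    passing = [name for name, prob in entries if prob >= threshold]
    decoys = sum(1 for name in passing if name.startswith(decoy_prefix))
    targets = len(passing) - decoys
    if targets == 0:
        return None  # no target hits, so the ratio is undefined
    return decoys / targets


# Hypothetical protein list, for illustration only (not the thread's data).
example = [
    ("ORF1_A", 1.00),
    ("ORF3_C", 0.99),
    ("ORF4_D", 0.995),
    ("ORF5_E", 1.00),
    ("DECOY_ORF7_B", 0.99),
    ("DECOY_ORF2_F", 0.995),
    ("ORF9_G", 0.50),
]
print(decoy_fdr(example, 0.99))
```

In this made-up list, two decoys and four targets pass the 0.99 cutoff, which is the kind of ratio that would indicate a problem upstream.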

Best,
-David 

--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spctools-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spctools-discuss/0649dd4c-223f-4928-94f2-a17e3c7f0e76n%40googlegroups.com.

Asif Ahmed

Feb 9, 2021, 4:29:50 PM
to spctools...@googlegroups.com
Hi David, 

Thanks for your reply; I appreciate your interpretation.

How can I share the dataset with you? I assume you might need the mzXML file (~1.2 GB), the mzid file (or pepXML file) and the database file?

Asif

--
Khandaker Asif Ahmed
PhD Student| Applied BioSciences, Macquarie University, NSW, Australia
Postgraduate Research Student| CSIRO Land and Water Flagship, Black Mountain, ACT, Australia
Address: Room No: S1.03, Building No: 101, CSIRO Clunies Ross Street, Black Mountain, ACT 2601, Australia
Mobile: (+61)0434018803
Skype ID: kh.asifratul

David Shteynberg

Feb 9, 2021, 4:48:19 PM
to spctools-discuss
You can compress the directory and post your dataset in the cloud, and I will pull it down. Perhaps you can start with your search parameters. N15 labelling creates mass shifts on every amino acid; how are you setting these? What PeptideProphet options are you using? Are you setting any unusual options to get this data to process?
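
To illustrate why N15 labelling shifts every amino acid, here is a small sketch that computes the expected shift per residue from its nitrogen count. The isotope masses are approximate monoisotopic values, and the helper names `residue_shift` and `peptide_shift` are hypothetical. How these shifts are then entered into a particular search engine's configuration is not shown here.

```python
# Approximate mass difference between 15N and 14N, in Da.
N15_MINUS_N14 = 15.0001088982 - 14.0030740048

# Number of nitrogen atoms in each of the 20 standard amino acid residues.
NITROGENS = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1, "I": 1,
    "D": 1, "E": 1, "M": 1, "F": 1, "Y": 1,
    "N": 2, "Q": 2, "K": 2, "W": 2,
    "H": 3,
    "R": 4,
}


def residue_shift(aa):
    """Mass shift (Da) of one fully 15N-labelled residue."""
    return NITROGENS[aa] * N15_MINUS_N14


def peptide_shift(sequence):
    """Total mass shift (Da) of a fully 15N-labelled unmodified peptide."""
    return sum(residue_shift(aa) for aa in sequence)


for aa in "GKHR":
    print(aa, round(residue_shift(aa), 4))
```

Because the shift depends on the nitrogen count, it differs between residues (roughly 1 Da for glycine but roughly 4 Da for arginine), so a single fixed offset cannot describe a labelled peptide.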

Thanks,
-David


Asif Ahmed

Feb 9, 2021, 6:27:42 PM
to spctools-discuss
Hi David,

In my sample, an N15-fed (heavy) female fly was mated with an N14 (normal) male fly. After mating, I dissected the female reproductive organs and processed the sample using an S-Trap kit.
So, "theoretically", the mated female reproductive tract proteome should contain plenty of female proteins (which would be N15) and a tiny amount of male proteins (N14).
Our aim is to identify male-derived proteins in the samples, so for now I have focused on a normal search rather than an N15-labelling search.
The protocol worked well for PD's SequestHT, Comet and Tandem searches, giving ~150 hits, and I am now trying to add MS-GF+ to the analysis.
For the database, we are using a Trinity assembly of male reproductive organ RNA-seq data; we made a 6-frame translation of the assembly and added decoys (with the prefix DECOY_).

For PeptideProphet in Petunia, I used the following parameters, as shown in the tutorial (no unusual settings at all).

Use accurate mass binning, using PPM,  
Use decoy hits to pin down the negative distribution. Decoy Protein names begin with: DECOY_,  
Use Non-parametric model and 
Report decoy hits with a computed probability

Please find the datasets containing:
Thermo RAW data file (201010_P31528_Sample_S1_QE_HFX_HpH_7_5p2.raw), 
mzML file (201010_P31528_Sample_S1_QE_HFX_HpH_7_5p2.mzML), 
mzid file with conf (asiftest_instrument2_S1.mzid and MSGFPlus_conf.txt) , 
Trinity database FASTA file (with decoys), and the resulting file from PeptideProphet (MsGF_inst1_interact.ipro.pep.xml).


ASIF

David Shteynberg

Feb 9, 2021, 6:55:02 PM
to spctools-discuss
It appears you forgot to include the _DECOY version of the database.  Can you check?

Asif Ahmed

Feb 9, 2021, 7:26:41 PM
to spctools-discuss

David Shteynberg

Feb 10, 2021, 1:55:21 PM
to spctools-discuss
Hi Asif,

I am not sure what happened, but the decoys are tagged XXX_ in your search results and DECOY_ in the database.


<search_hit hit_rank="1" peptide="GGGGGGGGGGGWGWVGGWGRGGGGER" peptide_prev_aa="-" peptide_next_aa="K" protein="XXX_ORF1_TRINITY_DN327608_c2_g1_i1:114:212_UNMAPPED" num_tot_proteins="0" calc_neutral_pep_mass="2241.979927335605" massdiff="0.033447265625" num_tol_term="2" num_missed_cleavages="1" protein_descr="originally identified as XXX_ORF1_TRINITY_DN327608_c2_g1_i1:114:212 in database e:/TPP_DataFolder/dbase/Trinity_TEA_ORFfinder_Proteins_1oct2020_DECOY.fasta">


I think you have to, first, carefully check that your search parameters are compatible with N15 labeling and, second, verify that you are using the correct database in your search. If you plan to use DECOY_ in the TPP analysis, the search algorithm should not "know" about the decoys (they should be hidden from the search algorithm because they will be used to validate its performance).
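
A quick way to spot this kind of prefix mismatch is to tally the prefixes of the `protein` attributes on the `search_hit` elements of the pepXML. The following is a rough sketch that uses a regular expression rather than a full XML parser; the function name `prefix_counts` is hypothetical, and the sample string is a shortened version of the hit quoted above.

```python
import re
from collections import Counter

# Capture the protein="..." attribute of each <search_hit ...> tag.
# "protein_descr=" is not matched because "=" must directly follow "protein".
PROTEIN_ATTR = re.compile(r'<search_hit\b[^>]*?\bprotein="([^"]*)"')


def prefix_counts(pepxml_text, prefixes=("DECOY_", "XXX_")):
    """Count search hits by decoy prefix; anything else is labelled 'target'."""
    counts = Counter()
    for name in PROTEIN_ATTR.findall(pepxml_text):
        label = next((p for p in prefixes if name.startswith(p)), "target")
        counts[label] += 1
    return counts


# Shortened version of the search_hit line quoted in this message.
sample = (
    '<search_hit hit_rank="1" peptide="GGGGGGGGGGGWGWVGGWGRGGGGER" '
    'protein="XXX_ORF1_TRINITY_DN327608_c2_g1_i1:114:212_UNMAPPED" '
    'num_tot_proteins="0">'
)
print(prefix_counts(sample))
```

For a real file, the text could be read with `open(path).read()`. If every decoy is counted under XXX_ and none under DECOY_, a downstream tool told that decoys begin with DECOY_ will treat all of them as targets.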

Cheers,
-David

Asif Ahmed

Feb 12, 2021, 11:01:17 PM
to spctools-discuss
Hi David, 

Sorry for the late reply. It took some time to re-run the files.
I agree with you; there was a problem with my "Decoy_" search.

For the decoy-based MS-GF+ search, I followed the steps described on the MS-GF+ GitHub page (https://github.com/MSGFPlus/msgfplus/issues/98) and re-ran the search. After running the search, converting the output to pepXML (using msconvert) and renaming it ".pep.xml", I combined the different fractions of the same sample (before running on the mass spec, I split my samples into 8 fractions) in iProphet, and subsequently ran PeptideProphet and ProteinProphet. Now the models look good to me, and I am planning to filter the hits in the 0.9-1 probability range.

Thanks for your support. Just a question - 

In one of my Tandem searches in TPP, I did not get any protein hits (although the same parameters give hits for other replicates), but I got ~100 hits in the separate Comet and MS-GF+ searches for the same sample. Is this normal?

ASIF
msgf.PNG

Asif Ahmed

Feb 12, 2021, 11:04:04 PM
to spctools-discuss
Correction: I combined the fractions at the PeptideProphet stage and subsequently ran iProphet and ProteinProphet.
Sorry for the writing mistake.

ASIF