Output sequence coverage info


Yasir Ahmed

Mar 28, 2023, 5:03:14 PM
to spctools-discuss
Greetings Friends,

Is there a command line tool within the TPP that can output sequence residue coverage percentages (Obs/Tot)? I can get those for individual proteins using the ProtXMLViewer GUI (see attached image), but would like to do this for every protein in my database.

Cheers,
Yasir

Screen Shot 2023-03-28 at 4.29.18 PM.png

Yasir Ahmed

Mar 28, 2023, 5:23:06 PM
to spctools-discuss
Never mind, got it with ProtXMLViewer.pl (duh). 

Yasir Ahmed

Mar 28, 2023, 5:50:53 PM
to spctools-discuss
I have another question about coverage: in ProtXMLViewer (whose output is similar to ProteinProphet's), the coverage values are almost always zero. For instance, the protein shown above has an Obs/Dig of 82%, but the ProteinProphet output reports its percent coverage as zero. Any idea why? And is there a way to get those Obs/Dig values?

Luis Mendoza

Mar 28, 2023, 6:46:58 PM
to spctools...@googlegroups.com
Hello Yasir,

We have recently identified a bug in ProteinProphet that mis-reports the coverage as zero for all proteins.  This affects TPP versions 6.1.0 and 6.2.0.  We will be releasing an update soon that corrects this and other bugs.

Even then, the value that is reported in protXML is the ratio of observed amino acids to total.  There is no simple way to get the coverage based on the digestible portion of the protein, though we could think about adding that as a feature.
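
To make the "observed amino acids to total" ratio concrete, here is a minimal Python sketch. It is an illustration only, not necessarily how ProteinProphet or batchcoverage computes the value; the function name `residue_coverage` is hypothetical, and it assumes peptides are matched to the protein by exact substring search.

```python
def residue_coverage(protein_seq, peptides):
    """Fraction of residues in protein_seq covered by at least one peptide.

    Assumption: a peptide covers every position where it occurs as an
    exact substring of the protein sequence.
    """
    if not protein_seq:
        return 0.0
    covered = [False] * len(protein_seq)
    for pep in peptides:
        if not pep:
            continue
        start = protein_seq.find(pep)
        while start != -1:
            # Mark each residue spanned by this occurrence as observed.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein_seq.find(pep, start + 1)
    return sum(covered) / len(protein_seq)


if __name__ == "__main__":
    # 7 of 10 residues are covered by the single peptide.
    print(residue_coverage("MKPEPTIDEK", ["PEPTIDE"]))
```

Overlapping peptides only count each residue once, which is why the sketch tracks positions rather than summing peptide lengths.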

Let me know if you would like a preview version of just ProteinProphet in advance of our release and I can post it separately; just specify whether you use Linux or Windows.  Alternatively, if you have a pre-6.1.0 version of TPP installed, you can run ProteinProphet from that version and get the coverage values reported.

Cheers,
--Luis



Yasir Ahmed

Mar 28, 2023, 6:59:42 PM
to spctools-discuss
Thanks Luis, that's helpful. For now I'll just use an older version of TPP to get the coverage values.

Cheers,
Yasir

Luis Mendoza

Mar 28, 2023, 9:57:59 PM
to spctools...@googlegroups.com
Sure thing.

And to answer your original question: there is a command-line tool in TPP called batchcoverage (look in the bin/ directory) that calculates this residue coverage -- and is the one used by ProteinProphet to populate that attribute in protXML.

It needs an input file of the form:
>PROTEIN_1
PEPTIDE
ANOTHERONE
ANDTHISONE
>PROTEIN_2
ELVIS
WASHERE
>PROTEIN_3
...
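
If the peptide lists are already grouped by protein, a file in the format above could be written with a short script. This is a sketch only: the function name `write_batchcoverage_input` and the dictionary layout are hypothetical, and it assumes one peptide per line with nothing else on the line, as in the example above.

```python
def write_batchcoverage_input(protein_to_peptides, path):
    """Write a '>PROTEIN' header followed by one peptide per line.

    protein_to_peptides: dict mapping protein name -> list of peptide strings
    (a hypothetical in-memory layout chosen for this example).
    """
    with open(path, "w") as fh:
        for protein, peptides in protein_to_peptides.items():
            fh.write(f">{protein}\n")
            for pep in peptides:
                fh.write(f"{pep}\n")


if __name__ == "__main__":
    write_batchcoverage_input(
        {
            "PROTEIN_1": ["PEPTIDE", "ANOTHERONE", "ANDTHISONE"],
            "PROTEIN_2": ["ELVIS", "WASHERE"],
        },
        "coverage_input.txt",  # hypothetical file name
    )
```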

You run it by providing the reference FASTA database, the input file, and an output file name, like this:
batchcoverage <database> <inputfile> <outputfile>

The output file will contain a 2-column list of <protein> <coverage> values.
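
For downstream use, the 2-column output can be loaded into a dictionary. The sketch below assumes the columns are whitespace-separated and the second column is numeric; the thread does not state the delimiter or whether the value is a fraction or a percentage, so check a real output file first. The function name `parse_coverage_lines` is hypothetical.

```python
def parse_coverage_lines(lines):
    """Parse '<protein> <coverage>' lines into a dict of protein -> float.

    Assumptions: columns are separated by whitespace, the protein name is
    the first field, and the coverage value is the last field.
    Lines that do not fit that shape are skipped.
    """
    result = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            result[parts[0]] = float(parts[-1])
        except ValueError:
            continue
    return result


if __name__ == "__main__":
    # Made-up example lines, not real batchcoverage output.
    example = ["PROTEIN_1 0.82", "PROTEIN_2\t0.5"]
    for protein, cov in parse_coverage_lines(example).items():
        print(protein, cov)
```

To read an actual file, pass the open file handle, for example `parse_coverage_lines(open("coverage_output.txt"))`, where the file name is whatever was given as `<outputfile>`.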

Hope this helps, and stay tuned for updates!
--Luis


