spectral-counts fails to process percolator output


Daryl W-M

May 6, 2021, 7:51:58 PM
to crux-users
Hello. Thank you for the new 4.0 update. I'm using the latest build and trying to use spectral-counts to process Percolator output, but spectral-counts seems to find a float in the Percolator target PSMs file where it expects the sequence string.

./crux-4.0.Linux.x86_64/bin/crux spectral-counts --overwrite T --output-dir /media/big-ssd/experiments/P3830/spectral-counts-output --protein-database ~/otf-peak-detect/fasta/ups1-ups2-yeast.fasta --verbosity 30 --fileroot P3830 /media/big-ssd/experiments/P3830/percolator-output-pasef-recalibrated/P3830.percolator.target.psms.txt

WARNING: The output directory '/media/big-ssd/experiments/P3830/spectral-counts-output' already exists.

Existing files will be overwritten.

WARNING: The file '/media/big-ssd/experiments/P3830/spectral-counts-output/P3830.spectral-counts.log.txt' already exists and will be overwritten.

INFO: CPU: deep-thought-home

INFO: Crux version: 4.0-fbfabf9-2021-04-06

INFO: Fri 7 May 09:43:48 AEST 2021

INFO: Beginning spectral-counts.

INFO: Reached protein 1000

INFO: Reached protein 2000

INFO: Reached protein 3000

INFO: Reached protein 4000

INFO: Reached protein 5000

INFO: Reached protein 6000

INFO: Total proteins found: 6097

FATAL: An exception occurred: Could not convert string '611.078'

The number appears to come from the "total matches/spectrum" column of the PSMs file.
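Crux's parsing code is not shown in this thread, but the error suggests spectral-counts reads this column as an integer count. As a rough Python analogue (not Crux's actual C++ code; `parse_count` is a hypothetical helper), a strict integer parse accepts `17` but rejects `611.078`:

```python
def parse_count(field: str) -> int:
    """Strictly parse a per-spectrum count (hypothetical helper for illustration)."""
    try:
        # int() accepts "17" but rejects any string containing a decimal point.
        return int(field)
    except ValueError:
        # Re-raise with a message shaped like the FATAL line in the log above.
        raise ValueError(f"Could not convert string '{field}'") from None


print(parse_count("17"))
try:
    parse_count("611.078")
except ValueError as err:
    print(err)
```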

Is there a command I'm missing?

Regards,
Daryl.

Rita Chupalov

May 7, 2021, 5:06:42 PM
to Daryl W-M, crux-users
Hi Daryl,
Sorry about the problem. Could you share your input files (the FASTA and the Percolator output) so that I can reproduce it?
Thank you,
Rita

--
Rita Chupalov

Software Developer
at UW Genome Sciences Department

Daryl W-M

May 7, 2021, 7:35:10 PM
to crux-users
Hello Rita, thanks for your reply. The FASTA and the Percolator target PSMs files are attached. This is the first time I've tried spectral-counts, so it could well be something I've done wrong. Thanks for looking into it.

Regards,
Daryl.
ups1-ups2-yeast.fasta.zip
P3830.percolator.target.psms.txt.zip

Rita Chupalov

May 19, 2021, 5:12:03 PM
to Daryl W-M, crux-users
Daryl,
It looks like the "matches per spectrum" column in your matches file contains a single float value, 611.078. That doesn't look right: it should be an integer, and it probably shouldn't be the same for all peptides. Could you provide the command line you used to generate this file?
Sorry for the late reply.
Rita
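To check a PSM file for the problem Rita describes before running spectral-counts, a minimal Python sketch could scan the matches column and flag non-integer values, as well as a column that holds one value for every row. It assumes a tab-delimited file whose header is `total matches/spectrum` (the name Daryl quotes above); `check_matches_column` and the sample rows are hypothetical.

```python
import csv
import io
from typing import Iterable, List

# Header name as described in this thread (assumption); pass another name if your file differs.
MATCHES_COLUMN = "total matches/spectrum"


def check_matches_column(lines: Iterable[str], column: str = MATCHES_COLUMN) -> List[str]:
    """Return human-readable problems found in a tab-delimited PSM table's matches column."""
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        return [f"missing column: {column!r}"]
    problems: List[str] = []
    seen = set()
    rows = 0
    # Row 1 is the header, so the first data row is reported as row 2.
    for row_number, row in enumerate(reader, start=2):
        rows += 1
        raw = (row.get(column) or "").strip()
        seen.add(raw)
        # A count should be a whole number of digits such as "12".
        if not raw.isdigit():
            problems.append(f"row {row_number}: non-integer value {raw!r}")
    # As Rita points out, the same value on every PSM is also suspicious.
    if rows > 1 and len(seen) == 1:
        problems.append(f"all rows share the same value {seen.pop()!r}")
    return problems


# Hypothetical in-memory sample shaped like the problem file.
sample = io.StringIO(
    "scan\ttotal matches/spectrum\tsequence\n"
    "101\t611.078\tPEPTIDEK\n"
    "102\t611.078\tSAMPLER\n"
)
for problem in check_matches_column(sample):
    print(problem)
```

For a real file, pass an open handle such as `open(path, newline="")` in place of the in-memory sample.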

Rita Chupalov

May 20, 2021, 2:00:31 AM
to Daryl Wilding-McBride, crux-users
Thank you, Daryl,
I would also need the .fasta and the .params files to reproduce the issue.
Rita

On Wed, May 19, 2021 at 8:11 PM Daryl Wilding-McBride <daryl...@gmail.com> wrote:
Hi Rita,

Here's the Percolator command I use to process the Comet output (it's actually from a different experiment, but the issue is the same):
crux percolator --overwrite T --subset-max-train 1000000 --klammer F --maxiter 10 --output-dir ./P3856/percolator-output-pasef-recalibrated --picked-protein ./fasta/Human_Yeast_Ecoli.fasta --protein T --protein-enzyme trypsin --search-input auto --verbosity 30 --fileroot P3856 ./comet-output-pasef-recalibrated/P3856_YHE211_1_Slot1-1_1_5104.comet.pin

The comet command I use is this:
crux comet --parameter-file ./comet/TimsTOF-recalibration.params --output-dir ./comet-output-pasef-recalibrated --fileroot "P3856_YHE211_1_Slot1-1_1_5104" ./exp-P3856-run-P3856_YHE211_1_Slot1-1_1_5104-features-pasef-recalibrated.mgf ./fasta/Human_Yeast_Ecoli.fasta

Attached is the MGF.
Thanks,
Daryl.

Rita Chupalov

Jun 22, 2021, 5:43:24 PM
to Daryl Wilding-McBride, crux-users
Hi Daryl,
I experimented with your data a bit, and it looks like the culprit is the Percolator --subset-max-train option you are using. If it is removed, the matches-per-spectrum column becomes correct (integer-valued) and spectral-counts runs normally. Note, though, that you will probably need to use Human_Yeast_Ecoli.fasta, because the other FASTA file doesn't contain all the proteins.
The Comet output is not very large, and Percolator runs reasonably fast without limiting the training-set size, even on my modest development machine.
I will dig into it a bit more to find out why this happens, but at least you have a workaround for now.
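Rita's workaround amounts to rerunning Percolator with the same arguments minus `--subset-max-train` and its value. A minimal Python sketch of that edit, applied to Daryl's Percolator command from earlier in the thread (`drop_option` is a hypothetical helper, not part of Crux):

```python
import shlex
from typing import List


def drop_option(argv: List[str], option: str = "--subset-max-train") -> List[str]:
    """Return a copy of argv without `option` and the single value that follows it."""
    result: List[str] = []
    skip_value = False
    for token in argv:
        if skip_value:
            skip_value = False  # this token is the option's value, e.g. 1000000
            continue
        if token == option:
            skip_value = True
            continue
        result.append(token)
    return result


original = (
    "crux percolator --overwrite T --subset-max-train 1000000 --klammer F --maxiter 10 "
    "--output-dir ./P3856/percolator-output-pasef-recalibrated "
    "--picked-protein ./fasta/Human_Yeast_Ecoli.fasta --protein T --protein-enzyme trypsin "
    "--search-input auto --verbosity 30 --fileroot P3856 "
    "./comet-output-pasef-recalibrated/P3856_YHE211_1_Slot1-1_1_5104.comet.pin"
)
print(shlex.join(drop_option(shlex.split(original))))
```

Every other option is passed through unchanged; `shlex.join` needs Python 3.8 or later.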

Rita

Daryl Wilding-McBride

Jun 22, 2021, 8:52:19 PM
to Rita Chupalov, crux-users
Hi Rita. Thank you for the workaround; much appreciated.
