Aziz Alnakli
Master of Research Candidate (Bioinformatics Research Group)
Faculty of Science and Engineering | Level 1 Room 120, F7B Building (4 Wally's Walk)
Macquarie University, NSW 2109, Australia
Hi Aziz, there is no specific maximum size. In general, the larger your database, the greater the memory and CPU time required by the software tools and the lower your sensitivity, although it depends on the degree of redundancy in your database. Hundreds of MB and hundreds of thousands of protein sequences are routine.
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spctools-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spctools-discuss/CANqhZV4npzoZMuEkNiivFHpcdpENUgzdJLyHb0rANiUFeksceA%40mail.gmail.com.
Hi Aziz, I can’t give an estimate because it depends on so many things. How many spectra? Fully tryptic, semi-tryptic? How many PTMs? Are you using Comet? How many cores/threads does your machine have, and how many did you let Comet use? Etc. If you use Comet, you can usually look at the output to see its progress. You should see something like:
Search start: 09/29/2020, 10:35:28 AM
- Load spectra: 15009
- Search progress: 29%
- Post analysis: done
- Load spectra: 15009
- Search progress: 54%
Etc.
Those progress percentages give you a sense of how quickly the search is going.
Also, check the RAM usage on your machine to make sure the computer isn’t swapping due to low memory conditions.
Eric
To view this discussion on the web visit https://groups.google.com/d/msgid/spctools-discuss/CANqhZV4DaJAuLp%3DVkWK%3DgHufegU_qPi1mZZhr05SN%2B6WsHc2pQ%40mail.gmail.com.
Hi Eric
Thank you so much for your help, Eric. Shoba (my supervisor) actually knows you and says hi.
I am doing the search with the X!Tandem pipeline instead of Comet, and maybe that is why I am unable to view the progress of the X!Tandem search.
I would appreciate it if you could comment on the screenshot of the job I am currently running. Does it look healthy?
You may also look at the second screenshot below for the specs of the desktop I am using to perform the analysis.
And one more question: can I run multiple jobs at the same time? Would that affect how quickly the results are retrieved?
Thanks for your support.
I don’t recall the X!Tandem progress output well enough to say what your screenshot shows. Maybe someone else knows.
Regarding the CPU usage: it looks like you have plenty of memory, so that’s good. But it does look like X!Tandem is not using all of the cores on the machine. You could set the number of threads as high as 8 on that machine.
The parameter is "spectrum, threads":
https://www.thegpm.org/TANDEM/api/st.html
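For reference, X!Tandem reads its parameters as note elements in the input XML file. A sketch of what the threads setting might look like (only this fragment is shown; the value 8 matches the 8 cores mentioned above):

```xml
<!-- Fragment of an X!Tandem input file; the rest of the file and all
     other parameters are omitted. This label/value pair sets the
     number of worker threads X!Tandem will use. -->
<note type="input" label="spectrum, threads">8</note>
```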
Of course, if you use all 8 threads, it might make the machine sluggish for anyone trying to use it interactively at the same time.
You could also run another search in parallel, yes. But keep an eye on the memory, because if you run too many at once, you will run out of memory.
Regards,
Eric
From: spctools...@googlegroups.com <spctools...@googlegroups.com> On Behalf Of AZIZ ALNAKLI
Sent: Thursday, October 22, 2020 5:48 PM
To: spctools...@googlegroups.com
Subject: Re: [spctools-discuss] maximum fasta file size
Hi, Comet does have a parameter for generating an internal decoy database:
http://comet-ms.sourceforge.net/parameters/parameters_201901/decoy_search.php
but I confess I’ve never used it, so I have no experience with that. Maybe someone else does.
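For anyone who wants to try it, the relevant lines in a comet.params file look roughly like this. This is a sketch based on the parameter page linked above; check the documentation for your Comet version for the exact values:

```
# comet.params fragment (not a complete file)
decoy_search = 1        # 0 = no decoys, 1 = concatenated target+decoy search, 2 = separate decoy search
decoy_prefix = DECOY_   # prefix prepended to internally generated decoy protein names
```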
I always create my own appended decoy database.
But if your database is enormous, there is good reason to avoid that, since appending decoys doubles the database size.
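Building your own appended decoy database can be as simple as reversing each sequence and appending it under a tagged name. Here is a minimal sketch; the DECOY_ prefix and the reversal strategy are common conventions, not requirements of any particular tool:

```python
def append_reversed_decoys(records, prefix="DECOY_"):
    """Given FASTA records as (header, sequence) tuples, return the
    originals followed by reversed-sequence decoys.

    Note that this doubles the database size, which is the cost
    mentioned above for enormous databases.
    """
    decoys = [(prefix + header, seq[::-1]) for header, seq in records]
    return records + decoys


# Example: one target protein yields one decoy with its sequence reversed.
db = append_reversed_decoys([("sp|P12345|TEST", "MKVLA")])
```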
Hi Aziz, thanks for your questions. A few comments inline:
Can I process multiple mzML files together?
After you process all the mzML files separately, you would use the TPP tools (PeptideProphet) to merge the results into a single result set, yes, if that’s the question.
Would that affect the specificity of the results?
Generally, merging the results of multiple runs will lead to better models and better specificity of results.
I have 24 mzml files (~5 GB each)
Are these files centroided or in profile mode? 5 GB per file suggests profile mode. You may benefit from centroiding your data first, but it depends on what instrument you’re using and many other factors.
each of which I need to run for 3 hours, 10 times, so this will take me around 270 hours to analyse. Therefore, I thought of batch-running them, and I wanted to get some suggestions before proceeding.
I suggest optimizing your search on one file first before processing them all, lest you expend 270 hours of compute time only to find a problem. Make sure you’re using all available cores on your machine. Maybe you can also spread the task across multiple machines, either local computers or cloud computing servers.
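One way to batch the runs while capping concurrency (so you don’t exhaust RAM, per the earlier advice about parallel searches) is a small driver script. This is only a sketch: the search command and the one-input-file-per-job layout are assumptions, so substitute your actual X!Tandem invocation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_searches(input_files, command, max_parallel=2):
    """Run one search subprocess per input file, at most max_parallel
    at a time; returns {input_file: exit_code}.

    Capping max_parallel bounds total memory use, since each
    simultaneous search holds its own copy of the database in RAM.
    """
    def run_one(path):
        # The input file is passed as the command's final argument
        # (an assumption; adjust to how your search tool is invoked).
        proc = subprocess.run(list(command) + [path])
        return path, proc.returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(pool.map(run_one, input_files))
```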
I hope you find these answers helpful.
Regards,
Eric