Error 396514 - is it a parse rule problem?

rhodea

Apr 20, 2010, 5:21:26 AM

to PEAKS_forum

Dear friends,

When I run a protein ID search against a homemade protein database, I
get the following errors:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Start doing protein database search...
Filtering protein database using spectra 3329 - 3828
Filtering protein database using spectra 3829 - 4328
Filtering protein database using spectra 4329 - 4828
Filtering protein database using spectra 4829 - 5328
Filtering protein database using spectra 5329 - 5828
Filtering protein database using spectra 5829 - 6328
Filtering protein database using spectra 6329 - 6828
Filtering protein database using spectra 6829 - 7009
Doing protein ID using spectra 3329 - 5169
Doing protein ID using spectra 5170 - 7009
Writing results...
finish updating 2011 promatch time = 0.31299999356269836 seconds
finish updating pepmMatch time = 0.765999972820282 seconds
Protein database search done.
Start cleaning up internal results...
Cleaning done.
Start doing protein database search...
Filtering protein database using spectra 3329 - 3828
Filtering protein database using spectra 3829 - 4328
Error: -396514!
batchRawSearch task fails.
Error: -396514!
batchRawSearch task fails.
Filtering protein database using spectra 4329 - 4828
Filtering protein database using spectra 4829 - 5328
Error: -396514!
Error: -396514!
batchRawSearch task fails.
batchRawSearch task fails.
Filtering protein database using spectra 5329 - 5828
Filtering protein database using spectra 5829 - 6328
Error: -396514!
batchRawSearch task fails.
Error: -396514!
batchRawSearch task fails.
Filtering protein database using spectra 6329 - 6828
Filtering protein database using spectra 6829 - 7009
Error: -396514!
batchRawSearch task fails.
Error: -396514!
batchRawSearch task fails.
generateProteinCandidate task fails.
Doing protein ID using spectra 3329 - 5169
Doing protein ID using spectra 5170 - 7009
batchSearch task fails.
batchSearch task fails.
SummarizeDBSearch task fails.
Protein database search fails.
Start cleaning up internal results...
Cleaning done.
++++++++++++++++++++++++++++++++++++++++++++

I am not sure whether this is a parse rule problem: searches against
the NCBI or UniProt databases work, but searches against this
particular database fail. My database entries look like this:
+++++++++++++++++++++++++++++++++++++++++++++
>fig|4442701.3.peg.44762
RTENMELKTTVDKSFLELFYKEIECDIENTNSKLNLHVLRIENNKFCYHELIQKLYNCFI
TYSLSRVEVQTYVDKGRWGELYTKAASKFRNFDENDGEAGELLLYCFLESHLNAPKILTK
LEIKLSSNDYAKGSDGIHLLKIKDGEYQLIFGESKLDKKLTTSISEAFKSIHEFVTRDKN
NVTDEIGLINSQLFKEAFD
+++++++++++++++++++++++++++++++++++++++++++++

How should I set the parse rule in this case? Or is there some other cause?


Dan

Apr 22, 2010, 11:27:14 AM

to PEAKS_forum

Hello rhodea,

I think the problem you're facing is related to the parse rules. It
looks like your database is set up somewhat like a Swissprot
database. In the database configuration page try entering the
following:

rule to parse accession/id from the FASTA title: >(fig\|)*\(\S*\)
rule to parse description from the FASTA title: >(fig\|)*\(\S*\)

I suggest parsing both the description and the accession id in the
same way because the normal rules look for a space before the
description, and your FASTA titles contain no space. Another option
would be to add a space before the description and keep the old
description parse rule: \s+\(.*\)
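
To illustrate what the accession rule is meant to capture, here is a minimal Python sketch. It assumes the intent is to skip an optional "fig|" prefix and capture the remaining non-space characters of the title. The pattern is written in Python's regex syntax, which is not the PEAKS rule syntax, and the function name parse_accession is hypothetical.

```python
import re

# Python-syntax analogue of the suggested rule (an assumption about its intent).
# PEAKS parse rules use their own syntax, so do not paste this pattern into PEAKS.
ACCESSION = re.compile(r"^>(?:fig\|)?(\S+)")


def parse_accession(title):
    """Return the accession from a FASTA title line, or None if it does not match."""
    match = ACCESSION.match(title)
    return match.group(1) if match else None


print(parse_accession(">fig|4442701.3.peg.44762"))  # 4442701.3.peg.44762
```

Because these titles contain no whitespace, a description rule that starts with \s+ has nothing to match, which is why the same rule is suggested for both fields.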

I hope this helps. If you still have trouble feel free to contact me
at d...@bioinfor.com

Kind Regards,
Dan Maloney

吕凡

May 19, 2010, 9:32:43 AM

to peaks...@googlegroups.com

Dear Dan Maloney,

Thank you for your kind answer. With your suggested parse rules, the protein ID search works only when I extract part of the database; it fails when the whole database is used. So I suspect the problem may lie in the database itself. I downloaded it from the MG-RAST website:

http://metagenomics.nmpdr.org/metagenomics.cgi?page=DownloadFile&job=2624&file=4442701.3.protein.fa.gz

I have tried the same de novo dataset with other databases, and it worked. I have also tried other datasets with the same MG-RAST database, and they failed. When I search this database with the XTandem algorithm, it works. So where do you think the problem lies?

I look forward to your helpful suggestions.

Sincerely,

Rhodea

吕凡

May 19, 2010, 9:55:30 AM

to peaks...@googlegroups.com

I found that the protein ID search fails when a protein sequence contains "*", as in the following entry. How should I deal with this?

>fig|MM46789
PESITQGKLITALKQMGFDKVFDANFFEDISISEEIDELLYRIKNSGTLPMVSGCSPGLC
KFVENGFPDLKNHLTARGSSGKIFGALAKENCVSVSFMPCIAKKFETRVLENGSSPNVDF
SLTPRELAQMIRAAGITFDNLPESPFDTLSIPIQRGTQFDTSLINIHEEEAPKGGQKGIT
ERVLNIQGTNVKVMRVRGLSNARNVLESIRNGKCDADLIKIMSCPGGCMLRL*TPRQPSA
RNRA*R
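
For a large database, one way to see how many entries are affected is to scan for "*" in the sequence lines. The following is a minimal Python sketch under that assumption; the function name entries_with_stop and the sample list are illustrative only, and in practice the lines would come from the opened FASTA file.

```python
def entries_with_stop(lines):
    """Yield (title, count) for each FASTA entry whose sequence contains '*'."""
    header, count = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new entry starts: report the previous one if it had any '*'.
            if header is not None and count:
                yield header, count
            header, count = line, 0
        elif header is not None:
            count += line.count("*")
    # Report the last entry in the input.
    if header is not None and count:
        yield header, count


# Illustrative input built from the entries quoted in this thread.
sample = [
    ">fig|MM46789",
    "ERVLNIQGTNVKVMRVRGLSNARNVLESIRNGKCDADLIKIMSCPGGCMLRL*TPRQPSA",
    "RNRA*R",
    ">fig|4442701.3.peg.44762",
    "NVTDEIGLINSQLFKEAFD",
]
for header, count in entries_with_stop(sample):
    print(header, count)
```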

Dan

May 19, 2010, 4:59:48 PM

to PEAKS_forum

Dear Rhodea,

I have tested that database and we don't recognize the '*' symbol.
What does this represent in the database?

吕凡

May 19, 2010, 5:08:19 PM

to peaks_forum

I think it marks translation termination, i.e. a stop codon (http://www.ebi.ac.uk/Tools/emboss/transeq/help.html), because this database was originally created from metagenome data. It also sometimes appears when I translate an EST (expressed sequence tag) DNA database to protein.

Dan

May 20, 2010, 10:04:16 AM

to PEAKS_forum

Dear Rhodea,

That symbol is what makes the database unsearchable. One option I
tested is to open the database in Notepad and use the Edit -> Replace
tool to delete the '*' symbols. I tested this only on a small subset;
deleting every instance in the full database may take a long time. I
will mention this symbol to the development team.
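
For a database too large to edit comfortably by hand, the same deletion could be scripted. This is a minimal Python sketch, not a PEAKS feature; the function name strip_stop_symbols is hypothetical. It leaves title lines untouched and removes '*' only from sequence lines.

```python
def strip_stop_symbols(src_path, dst_path):
    """Copy a FASTA file, removing '*' from sequence lines only.

    Returns the number of '*' characters removed.
    """
    removed = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if not line.startswith(">"):
                removed += line.count("*")
                line = line.replace("*", "")
                if not line.strip():
                    continue  # skip sequence lines that are empty after removal
            dst.write(line)
    return removed
```

Calling strip_stop_symbols with the path of the original file and the path of a new output file writes the cleaned copy and leaves the original unchanged. Note that deleting '*' joins the residues on either side of a stop, so peptides spanning that position would not correspond to a real translation product; splitting entries at '*' instead may be worth considering.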

吕凡

May 20, 2010, 10:17:22 AM

to peaks...@googlegroups.com

Thank you!
