GLAM2SCAN finds no hits


Assa Yeroslaviz

Oct 4, 2018, 4:16:06 PM
to MEME Suite Q&A
Hi,
I have a question concerning the GLAM2SCAN results.
I am trying to run a motif search against the human genome with a regular expression for a motif I calculated using GLAM2. I used 18 different possible patterns to calculate the motif; for 14 of them, a regular expression was calculated.

I then tried to scan the alignment using GLAM2SCAN. When I run the analysis against the human genome from the Ensembl genomes and proteomes (version 92), I get no hits at all.
But when I run the analysis against the human genome from the Ensembl Ab Initio Predicted Proteins DB, I get multiple hits. 
The problem with the Ab Initio DB is that I can't connect the GENSCAN* IDs to any protein, so they are not really helpful to me.

I was wondering how this can be. What is the big difference between the Ab Initio DB and the "standard" Ensembl Proteome, such that I can't find any hits in the latter?

Am I doing something wrong with the analysis?

Thanks
Assa

cegrant

Oct 5, 2018, 4:19:26 PM
to meme-...@googlegroups.com
The Ensembl Ab Initio Predicted Proteins DB contains proteins that have been predicted from the raw DNA sequence using the tool Genscan. The proteins in the standard Ensembl Proteome have been experimentally verified. The missing proteins may be Genscan false positives or they may simply have yet to be experimentally verified.

The steps needed to link the GENSCAN IDs to a genomic region that can be checked for verified proteins are described in this earlier answer:


Assa Yeroslaviz

Oct 8, 2018, 8:58:02 AM
to MEME Suite Q&A
Hi,

There is now a new version for human, version 94. With it I also don't find any results when comparing my data with the human proteome, which is very strange, as I do find results when comparing it with the mouse proteome.
Just as an example, I took one protein I found in my search against mouse: Celsr2 (ENSMUSP00000088046).
When I look at the FASTA sequence of this protein in the two organisms, I can see the exact motif sequence in both. I still think there is some kind of error in the search. The two protein sequences are very similar.

Here is a link to the sequence of human and of mouse Celsr2. Both have the exact same motif I am searching for:

Screenshot 2018-10-08 14.55.11.png

So how can it be that it is not found in the search?

I would appreciate your help.

thanks in advance

Assa

Assa Yeroslaviz

Oct 8, 2018, 9:06:31 AM
to MEME Suite Q&A
Just to add to my message above: with version 94 I also don't find any hits in the ab initio database when searching against the human proteome. But I did find some when I used the older version (93). What has changed?

cegrant

Oct 8, 2018, 10:38:50 AM
to MEME Suite Q&A
I'm afraid I'm confused. You seem to be asking two different questions. One question seems to be why you find matches in the ab initio protein database, but not in the Ensembl proteome databases. As I explained, comparing results between a reference proteome database and the ab initio database is problematic. The former only contains experimentally validated proteins, while the latter contains "candidate proteins" identified only by software. Many "proteins" in the ab initio database will turn out to be false positives, and the program may miss some proteins that already have experimental verification.

In your other question you seem to be asking why a motif you think should match in two sequences in two different databases is only found in one. To have a shot at answering that I'd need to have a copy of the motif that you used to scan with, and any of the advanced options you set.

With version 94 I also don't find any hits in the ab initio database when searching against the human proteome. But I did find some when I used the older version (93). What has changed?

Note that we don't create these databases. We simply make them available on our website for the convenience of users. You'd have to go to the Ensembl website to see what changes there have been in the two editions.  

Assa Yeroslaviz

Oct 9, 2018, 4:22:19 AM
to MEME Suite Q&A
Sorry about the confusion.

What you called my first question is not important any more. I understood what you explained in the answers above. I just found it a bit weird that the new version (94) is so different that no results were found.

What I do want to understand better is how it is possible that almost exactly the same sequence from mouse and human was found only in the mouse search and not in the human one.
 
In your other question you seem to be asking why a motif you think should match in two sequences in two different databases is only found in one. To have a shot at answering that I'd need to have a copy of the motif that you used to scan with, and any of the advanced options you set.

With version 94 I also don't find any hits in the ab initio database when searching against the human proteome. But I did find some when I used the older version (93). What has changed?

Note that we don't create these databases. We simply make them available on our website for the convenience of users. You'd have to go to the Ensembl website to see what changes there have been in the two editions.  
 
I didn't try just one input, but a bigger list of them. But the motif I am talking about is this: [AG]WLPPNL.

When scanning the mouse genome I found over 25 proteins, but none in the human one.
But when I look at, for example, the Celsr2 gene (these three proteins were identified: ENSMUSP00000088046.3, ENSMUSP00000122329.1, ENSMUSP00000044261.8), the human and mouse proteins have almost exactly the same sequence. The image I attached in my previous message shows the human sequence, which clearly contains the query sequence.
This is why I was asking. I hope you can help me understand it.

thanks
Assa

CharlesEGrant

Oct 22, 2018, 5:23:08 PM
to MEME Suite Q&A
But the motif I am talking about is this - [AG]WLPPNL.

GLAM2Scan doesn't accept regular expressions as motifs. It only accepts motifs in GLAM2 format. Could you post the motif file you used for your search?
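
Because the pattern quoted above is a plain, ungapped regular expression, one way to sanity-check whether it occurs at all in a given proteome FASTA file, independently of GLAM2Scan, is a direct regex scan. The following is only a minimal sketch in Python: the function names `read_fasta` and `find_motif` and the toy sequences are hypothetical and not part of the MEME Suite, and the scan reports exact matches only, with no scoring.

```python
import re

# Pattern quoted in the thread; assumed here to be an ungapped protein motif.
MOTIF = re.compile(r"[AG]WLPPNL")


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_motif(text, pattern=MOTIF):
    """Return (sequence id, 1-based start, matched text) for every exact match."""
    hits = []
    for header, seq in read_fasta(text):
        fields = header.split()
        seq_id = fields[0] if fields else ""
        for match in pattern.finditer(seq.upper()):
            hits.append((seq_id, match.start() + 1, match.group()))
    return hits


if __name__ == "__main__":
    # Toy sequences made up for illustration; these are not real Ensembl records.
    demo = ">protA hypothetical\nMKTAWLPPNLQQ\nGGWLPPNLA\n>protB hypothetical\nMKTLLPPNLQQ\n"
    for hit in find_motif(demo):
        print(hit)
```

To run this on a real file, the contents of the downloaded proteome FASTA would be read into a string and passed to `find_motif`. If the human file yields exact matches here while the scan reports none, the motif file or the scan settings are the more likely place to look.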

I'd also note that GLAM2 and GLAM2Scan are specifically designed for motifs with gaps, but it appears the motif you are looking at doesn't have gaps. Have you tried your search with FIMO instead?
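
For an ungapped pattern like this, FIMO would need the motif as a probability matrix rather than a regular expression. As a rough illustration only, the Python sketch below turns a simple pattern of letters and bracketed classes into text in what is assumed to be the MEME minimal motif format. The helper names, the motif name `motif1`, the `nsites= 20` value, and the choice of equal probabilities within a bracketed class are all assumptions made for the example; the output should be checked against the MEME Suite motif format documentation before use.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_simple_pattern(pattern):
    """Split a pattern such as '[AG]WLPPNL' into one set of allowed letters per column.

    Only plain letters and [..] classes are handled: no gaps, ranges, or quantifiers.
    """
    columns, i = [], 0
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i)  # raises ValueError if the bracket is unclosed
            letters = pattern[i + 1:end]
            i = end + 1
        else:
            letters = pattern[i]
            i += 1
        if not letters or any(c not in AMINO_ACIDS for c in letters):
            raise ValueError(f"unsupported pattern element: {letters!r}")
        columns.append(set(letters))
    return columns


def to_meme_text(pattern, name="motif1"):
    """Render the pattern as MEME-style minimal motif text (format assumed, see lead-in)."""
    columns = parse_simple_pattern(pattern)
    lines = [
        "MEME version 4",
        "",
        f"ALPHABET= {AMINO_ACIDS}",
        "",
        f"MOTIF {name}",
        # nsites and E are placeholder values chosen for this sketch.
        f"letter-probability matrix: alength= {len(AMINO_ACIDS)} "
        f"w= {len(columns)} nsites= 20 E= 0",
    ]
    for allowed in columns:
        p = 1.0 / len(allowed)  # equal weight for each letter allowed in this column
        lines.append(" ".join(f"{(p if aa in allowed else 0.0):.6f}" for aa in AMINO_ACIDS))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_meme_text("[AG]WLPPNL"))
```

A matrix built this way carries less information than one estimated from real aligned sites, so it is best treated as a starting point for testing rather than a substitute for the motif file GLAM2 produced.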