Pseudogenes

Meh...@mskcc.org

Aug 23, 2022, 3:34:43 PM
to gen...@soe.ucsc.edu

To Whom It May Concern,

I was hoping to get some help identifying whether particular genes have a pseudogene and, if so, what that pseudogene’s name/transcript is. I saw in the archives that the appropriate track to use is the GENCODE track, and I see that I have to set it to show and full to view it. However, I’m having trouble 1) getting data to show up in the track, even with the display settings turned on, and 2) understanding the data that is there.

For example, I know that PMS2’s pseudogene is called PMS2CL.  However, when I search for PMS2, turn on the GENCODE track, and select the sub-filter of “pseudo”, nothing is displayed for PMS2.

When I search for PMS2CL, then I see transcripts under the pseudogene track.

The same thing happens when I search for CHEK2 versus CHEK2P2.

Is there a way to use UCSC Genome Browser to identify a gene’s pseudogene or pseudogenes?  Is there also a way to use the UCSC Genome Browser, especially since GENCODE gives transcripts, to determine where the pseudogene overlaps with the real gene?

Thank you in advance for any help you can provide!


Sincerely,

Nikita M.

Nikita Mehta, MS, CGC

Genetic Analysis Specialist, Sr

Diagnostic Molecular Genetics Laboratory, Department of Pathology

Memorial Sloan Kettering Cancer Center

1250 First Ave., New York, NY 10065

Schwartz Building

Meh...@mskcc.org

Please consider the environment before printing this page or its attachments.

=====================================================================

Please note that this e-mail and any files transmitted from
Memorial Sloan Kettering Cancer Center may be privileged, confidential,
and protected from disclosure under applicable law. If the reader of
this message is not the intended recipient, or an employee or agent
responsible for delivering this message to the intended recipient,
you are hereby notified that any reading, dissemination, distribution,
copying, or other use of this communication or any of its attachments
is strictly prohibited. If you have received this communication in
error, please notify the sender immediately by replying to this message
and deleting this message, any attachments, and all copies and backups
from your computer.

Daniel Schmelter

Aug 24, 2022, 6:29:22 PM
to Meh...@mskcc.org, gen...@soe.ucsc.edu

Hello Nikita,

Thank you for contacting the Genome Browser support team with your question about pseudogenes.

You are right to be looking in the GENCODE Pseudogenes track, but it seems there is no data there for the PMS2 region. The Pseudogenes track only reports the positions of the pseudogene annotations themselves, not a list of all regions similar to the current range. For an alignment of the current region against the entire genome, which should reveal pseudogenes and includes a base-by-base alignment, you can turn on the Self-Chain track. You could also use BLAT to search for similar sequences and use those results to find pseudogenes.

A slightly broader method of finding pseudogenes with UCSC tools is a Table Browser query on the Pseudogenes table's name2 field using a wildcard search, such as PMS*. To do this, select the Pseudogenes table and then click the "create" button next to Filter. A test of this method returned dozens of hits to many PMS transcripts, though they may need to be reviewed individually.
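To make the wildcard behavior concrete, here is a minimal Python sketch of what a name2 filter such as PMS2* does to a table export. It is not UCSC code: the rows are hypothetical placeholders rather than real GENCODE records, and treating the match as case-insensitive is an assumption about the Table Browser.

```python
from fnmatch import fnmatchcase

# Hypothetical rows standing in for a Pseudogenes table export; only the
# name2 (gene name) column is modeled, and the values are placeholders.
ROWS = [
    {"name2": "PMS2CL"},
    {"name2": "CHEK2P2"},
    {"name2": "PMS2_EXAMPLE_P"},
    {"name2": "UNRELATED_EXAMPLE"},
]


def filter_name2(rows, pattern):
    """Keep rows whose name2 matches a shell-style wildcard pattern.

    Both sides are upper-cased, i.e. the match is assumed to be
    case-insensitive; adjust if your filter behaves differently.
    """
    pat = pattern.upper()
    return [row for row in rows if fnmatchcase(row["name2"].upper(), pat)]


if __name__ == "__main__":
    for row in filter_name2(ROWS, "PMS2*"):
        print(row["name2"])
```

A prefix such as PMS2* returns every name that starts with it, which is why the hits still need to be reviewed one by one.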


I hope this was helpful! If you have any more questions, please reply-all to our public support email at gen...@soe.ucsc.edu. For private communication, please reply-all to genom...@soe.ucsc.edu.
All the best,

--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/BL0PR18MB2163288046BE5D1C1DCCFB8CF5709%40BL0PR18MB2163.namprd18.prod.outlook.com.

Meh...@mskcc.org

Aug 25, 2022, 5:41:28 PM
to dsch...@ucsc.edu, gen...@soe.ucsc.edu

Hi Daniel,

Thank you for this response. I think the Table Browser search might be easier and more likely to work on a first try. I had initially tried the filter search there, but didn’t know how to set it up correctly.

Following your directions, I set the track to GENCODE V41lift37 and the table to Pseudogenes, and then created the filter with a * after the gene name. My follow-up questions are: 1) why does the gene name have to go into the name2 field of the filter instead of name (I tested name, and it doesn’t work after I click get output), and 2) can I search for multiple genes at once, perhaps using the free-form query? I tried the latter but couldn’t get it to work, though I’m not sure whether my syntax for that section is simply wrong.

Thanks again for your help!

Nikita

PS - I’ll definitely keep Self-Chain and BLAT searches in mind for actual sequence alignment. As you said, I’m finding several pseudogenes for some genes, and to determine whether any of them interfere with a clinical assay, the sequence search will probably be more useful.

Gerardo Perez

Sep 2, 2022, 8:44:29 PM
to Meh...@mskcc.org, gen...@soe.ucsc.edu

Hello, Nikita.

Thank you for your follow-up questions.

1) why does this gene name have to go into the name2 field of the filter instead of name (I tested it and it doesn’t work after hitting get output)

The name field in the GENCODE V41lift37 track consists of transcript identifiers that start with ENST. The name2 field consists of gene names such as PMS2CL and CHEK2P2. You can check the fields of a track by clicking the "describe table schema" button next to the table: option:

[Image: table_schema.png]
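The difference between the two fields can be reproduced offline. The minimal Python sketch below reads a small tab-separated excerpt shaped like a Table Browser export (assumed here to be a header line starting with # followed by rows) and runs the same wildcard against name and name2. The excerpt, its reduced column set, and the ENST_placeholder identifiers are invented for illustration; a real export has more columns.

```python
import csv
from fnmatch import fnmatchcase

# Invented excerpt in the shape assumed for a Table Browser export:
# a '#'-prefixed header line, then tab-separated rows. Real exports
# have many more columns; the ENST values here are placeholders.
EXPORT = (
    "#name\tname2\n"
    "ENST_placeholder_1\tPMS2CL\n"
    "ENST_placeholder_2\tCHEK2P2\n"
)


def read_export(text):
    """Parse the excerpt into dicts keyed by column name."""
    lines = text.splitlines()
    header = lines[0].lstrip("#").split("\t")
    return list(csv.DictReader(lines[1:], fieldnames=header, delimiter="\t"))


def matching_values(rows, field, pattern):
    """Return the values in one column that match a wildcard pattern."""
    return [r[field] for r in rows if fnmatchcase(r[field].upper(), pattern.upper())]


if __name__ == "__main__":
    rows = read_export(EXPORT)
    # 'name' holds transcript IDs starting with ENST, so a gene prefix finds nothing.
    print(matching_values(rows, "name", "PMS2*"))   # []
    # 'name2' holds the gene names the filter is looking for.
    print(matching_values(rows, "name2", "PMS2*"))  # ['PMS2CL']
```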

2) can I search for multiple genes at once, perhaps somehow using the free-form query?

Yes, you can use the free-form query option to search for an additional gene. For example, on the Filter on Fields page, you can do the following:

name2 does match PMS2*
OR Free-form query: name2 like "CHEK2*"

The free-form query takes in SQL syntax, such as:

name2 like "CHEK2%" 
name2 like "CHEK2*" 
name2 like 'CHEK2%'
name2 like 'CHEK2*'
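Because the free-form box takes SQL, several genes can in principle be combined in one expression joined with OR. The Python sketch below is a hypothetical helper, not part of the Table Browser: build_free_form writes such an expression, and like_to_regex/any_like emulate how it would match. It assumes the box accepts a standard WHERE-clause expression with OR (the reply says it takes SQL syntax but does not show OR), that matching is case-insensitive, and that * behaves like %, as the examples above suggest; in standard SQL only % and _ are wildcards.

```python
import re


def build_free_form(prefixes):
    """Hypothetical helper: one LIKE clause per gene prefix, joined with OR."""
    return " OR ".join(f'name2 like "{p}%"' for p in prefixes)


def like_to_regex(pattern):
    """Translate a LIKE-style pattern into a compiled regular expression.

    % matches any run of characters and _ matches exactly one, as in SQL.
    * is also mapped to "any run" to mirror the examples in the reply; that
    mapping is an assumption about the Table Browser, not standard SQL.
    """
    parts = []
    for ch in pattern:
        if ch in "%*":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Assumed case-insensitive, as LIKE usually is under MySQL's default collations.
    return re.compile("".join(parts), re.IGNORECASE)


def any_like(value, patterns):
    """True if value matches at least one pattern (an OR of LIKE clauses)."""
    return any(like_to_regex(p).fullmatch(value) for p in patterns)


if __name__ == "__main__":
    print(build_free_form(["PMS2", "CHEK2"]))
    # name2 like "PMS2%" OR name2 like "CHEK2%"
    names = ["PMS2CL", "CHEK2P2", "UNRELATED_EXAMPLE"]  # placeholder gene names
    print([n for n in names if any_like(n, ["PMS2%", "CHEK2*"])])
    # ['PMS2CL', 'CHEK2P2']
```

If a combined OR expression is rejected, the approach shown above (one field filter plus one free-form clause) still covers two genes at once.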

Also, we are working on improving the display under the main gene, where you will see the locations of all pseudogenes for the current gene in different colors.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute


Mehta, Nikita

May 2, 2024, 3:54:22 PM
to Gerardo Perez, gen...@soe.ucsc.edu

Hello Gerardo,

I’m looking at pseudogenes again, and I was wondering whether UCSC was able to implement a track that “improv[es] the display under the main gene, where you will see the locations of all pseudogenes for the current gene in different colors” as you previously mentioned.

Thanks!

Nikita

Gerardo Perez

May 13, 2024, 8:25:28 PM
to Mehta, Nikita, gen...@soe.ucsc.edu

Hello, Nikita.

There has been some progress. We recently got the data from the Gerstein group but still need some additional data fields. If you are interested, we would be happy to share an early version so you can give us some feedback. For example, would you be interested in seeing the actual base-pair alignments, or would the exon annotation itself be good enough?

I hope this is helpful. Please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute

Mehta, Nikita

Jul 8, 2024, 6:00:23 PM
to Gerardo Perez, gen...@soe.ucsc.edu

Hi Gerardo,

I’m so sorry for never responding and letting this sit for 2 months.  If it would help to have someone test things, I’d be happy to see an early version (if it’s still early and not already in production) of this pseudogene track.

From my perspective as a variant curator, the exon structure is sufficient. However, I know my colleagues would probably appreciate base-pair alignments (for panel design, primer design, etc.). For example, I’ve seen them use Clustal to align sequences manually.

Please let me know what I can do!

Thanks,

Nikita

Jairo Navarro Gonzalez

Jul 11, 2024, 6:30:31 PM
to Mehta, Nikita, Gerardo Perez, gen...@soe.ucsc.edu

Hello,

Thank you for using and helping improve the UCSC Genome Browser.

I have added a note to contact you once the track is developed enough for you to review it. Unfortunately, I cannot give an estimated date for when it will be available for review.

If you have any further questions, please reply to gen...@soe.ucsc.edu.

All messages sent to that address are archived on a publicly accessible Google Groups forum.


If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro
UCSC Genome Browser