To Whom It May Concern,
I was hoping to get some help identifying whether particular genes have a pseudogene and what that pseudogene’s name/transcript is. I saw in the archives that the appropriate track to use is the GENCODE track, and I do see that I have to turn it to show and full to view it in the tracks. However, I’m having trouble 1) getting data to show up in the track even with things turned on to display and 2) understanding any data that is in there.
For example, I know that PMS2’s pseudogene is called PMS2CL. However, when I search for PMS2, turn on the GENCODE track, and select the sub-filter of “pseudo”, nothing is displayed for PMS2.

When I search for PMS2CL, then I see transcripts under the pseudogene track.

The same thing happens when I search for CHEK2 versus CHEK2P2.
Is there a way to use UCSC Genome Browser to identify a gene’s pseudogene or pseudogenes? Is there also a way to use the UCSC Genome Browser, especially since GENCODE gives transcripts, to determine where the pseudogene overlaps with the real gene?
Thank you in advance for any help you can provide!
Sincerely,
Nikita M.
Nikita Mehta, MS, CGC
Genetic Analysis Specialist, Sr
Diagnostic Molecular Genetics Laboratory, Department of Pathology
Memorial Sloan Kettering Cancer Center
1250 First Ave., New York, NY 10065
Schwartz Building
Please consider the environment before printing this page or its attachments.
Hello Nikita,
Thank you for contacting the Genome Browser support team with your question about pseudogenes.
You are right to be looking in the GENCODE Pseudogenes track, but it seems there is no data there for the PMS2 region. The Pseudogenes track only reports the positions of the pseudogene annotations themselves, not a list of all regions similar to the current range. For an alignment of the current region to the entire genome, which should reveal pseudogenes and includes a base-by-base alignment, you can turn on the Self-Chain track. You could also use BLAT to search for similar sequences and use those results to find pseudogenes.
A slightly broader method of finding pseudogenes using UCSC tools would be a Table Browser query of the Pseudogenes dataset's name2 field using a wildcard search, such as PMS*. You can do this by selecting the Pseudogenes dataset and then clicking the filter "Create" button. A test of this method returned dozens of hits to many PMS transcripts, though they may need to be looked through individually.
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/BL0PR18MB2163288046BE5D1C1DCCFB8CF5709%40BL0PR18MB2163.namprd18.prod.outlook.com.
Hi Daniel,
Thank you for this response. I think I might find it easier and more successful at a first try to use the Table Browser search. I was initially in there trying to use the filter search, but didn’t know how to do it correctly.
Following your directions, I set the track to GENCODE V41lift37, the table to Pseudogenes, and then made the filter using the * after the gene. My follow-up questions are 1) why does this gene name have to go into the name2 field of the filter instead of name (I tested it and it doesn’t work after hitting get output) and 2) can I search for multiple genes at once, perhaps somehow using the free-form query? I tested the latter two, but couldn’t get it to work, but I’m not sure if my syntax is just wrong for that section.
Thanks again for your help!
Nikita
PS - I’ll definitely keep Self-Chain and BLAT searches in mind for actual sequence alignment. I think I’m finding several pseudogenes for some genes as you said, and to determine if there is any interference from those in a clinical assay, the sequence search probably will be more useful.
Hello, Nikita.
Thank you for your follow-up questions.
1) why does this gene name have to go into the name2 field of the filter instead of name (I tested it and it doesn’t work after hitting get output)
The name field in the GENCODE V41lift37 track consists of transcript identifiers that start with ENST. The name2 field consists of gene symbols such as PMS2CL and CHEK2P2. You can check the fields of a track by clicking the table schema button next to the "table:" option.
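The difference between the two fields can be sketched offline. This is only an illustration, assuming a table has been exported as (name, name2) pairs; the transcript identifiers below are placeholders rather than real accessions, and `filter_rows` is a hypothetical helper, not a UCSC function.

```python
from fnmatch import fnmatchcase

# Hypothetical rows in (name, name2) order. The transcript IDs are
# placeholders for illustration, not real Ensembl accessions.
rows = [
    ("ENST_EXAMPLE_1", "PMS2CL"),
    ("ENST_EXAMPLE_2", "PMS2P1"),
    ("ENST_EXAMPLE_3", "CHEK2P2"),
]


def filter_rows(rows, field_index, pattern):
    """Keep rows whose chosen field matches a shell-style wildcard pattern."""
    return [row for row in rows if fnmatchcase(row[field_index], pattern)]


# Filtering the transcript-ID column with a gene pattern finds nothing,
# because no transcript identifier starts with "PMS2".
by_name = filter_rows(rows, 0, "PMS2*")

# Filtering the gene-symbol column finds the PMS2-related entries.
by_name2 = filter_rows(rows, 1, "PMS2*")

print(by_name)
print(by_name2)
```

The same pattern applied to the first column returns no rows, which mirrors why the filter has to go on name2 rather than name.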
2) can I search for multiple genes at once, perhaps somehow using the free-form query?
Yes, you can use the free-form query option to search for an additional gene. For example, on the Filter on Fields page, you can do the following:
name2 does match PMS2*
OR
Free-form query: name2 like "CHEK2*"
The free-form query takes in SQL syntax, such as:
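The example that followed in the original reply is not preserved here. As a rough stand-in only, the sketch below shows two gene patterns combined with OR in SQL, run against a toy in-memory SQLite table rather than the UCSC database. The table name, column layout, and transcript IDs are assumptions for illustration. Standard SQL `LIKE` uses `%` as its wildcard; whether the Table Browser free-form field also accepts `*` is not something this sketch verifies.

```python
import sqlite3

# Toy in-memory table standing in for a pseudogene table.
# Table name and transcript IDs are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pseudogenes (name TEXT, name2 TEXT)")
conn.executemany(
    "INSERT INTO pseudogenes VALUES (?, ?)",
    [
        ("ENST_EXAMPLE_1", "PMS2CL"),
        ("ENST_EXAMPLE_2", "PMS2P1"),
        ("ENST_EXAMPLE_3", "CHEK2P2"),
        ("ENST_EXAMPLE_4", "OTHERP1"),
    ],
)

# Two prefix patterns joined with OR, so rows for either gene are returned.
query = (
    "SELECT name2 FROM pseudogenes "
    "WHERE name2 LIKE 'PMS2%' OR name2 LIKE 'CHEK2%' "
    "ORDER BY name2"
)
matches = [row[0] for row in conn.execute(query)]
conn.close()

print(matches)
```

Joining the conditions with AND instead would return nothing here, since no single gene symbol starts with both prefixes.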
Also, we are working on improving the display under the main gene, where you will see the locations of all pseudogenes for the current gene in different colors.
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Gerardo Perez
UCSC Genomics Institute
Hello Gerardo,
I’m looking at pseudogenes again, and I was wondering if UCSC was able to implement a track that “improv[es] the display under the main gene, where you will see the locations of all pseudogenes for the current gene in different colors” as you previously mentioned.
Thanks!
Nikita
From: Gerardo Perez <gpe...@ucsc.edu>
Sent: Friday, September 2, 2022 8:44 PM
To: Mehta, Nikita N./Pathology <Meh...@mskcc.org>
Cc: gen...@soe.ucsc.edu
Subject: [EXTERNAL] Re: Re: [genome] Pseudogenes
Hello, Nikita.
Thank you for your follow-up questions.
Hello, Nikita.
There has been some progress. We recently got the data from the Gerstein group but still need some additional data fields. If you are interested, we would be happy to share an early version and you can provide us some feedback. For example, would you be interested to see the actual base pair alignments, or would the exon annotation itself be good enough?
I hope this is helpful. Please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Gerardo Perez
UCSC Genomics Institute
Hi Gerardo,
I’m so sorry for never responding and letting this sit for 2 months. If it would help to have someone test things, I’d be happy to see an early version (if it’s still early and not already in production) of this pseudogene track.
I think from my perspective as a variant curator, the exon structure is sufficient. However, I know my colleagues would probably appreciate base pair alignments (for panel design, primer design, etc.). For example, I’ve seen them use Clustal to align sequences manually.
Please let me know what I can do!
Thanks,
Nikita
From: Gerardo Perez <gpe...@ucsc.edu>
Sent: Monday, May 13, 2024 8:25 PM
To: Mehta, Nikita <Meh...@mskcc.org>
Cc: gen...@soe.ucsc.edu
Subject: [EXTERNAL] Re: [genome] Pseudogenes
Hello, Nikita. There has been some progress. We recently got the data from the Gerstein group but still need some additional data fields. If you are interested, we would be happy to share an early version and you can provide us some feedback.
Hello,
Thank you for using and helping improve the UCSC Genome Browser.
I have added a note to contact you once the track is developed enough for you to review it. Unfortunately, I cannot give an estimated date for when it will be available for review.
Jairo Navarro
UCSC Genome Browser
Oh wow! Thank you so much! This will make tracking complications due to pseudogenes much easier. I’m sure others will find this track useful too.
Thanks again!
Nikita
From: Gerardo Perez <gpe...@ucsc.edu>
Sent: Monday, March 31, 2025 7:54 PM
To: Mehta, Nikita <Meh...@mskcc.org>
Cc: gen...@soe.ucsc.edu
Subject: [EXTERNAL] Re: [genome] Pseudogenes
Hello, Nikita. We wanted to follow up to let you know that we have released the Pseudogenes track for hg38. Here is the link to the track announcement: https://genome.ucsc.edu/goldenPath/newsarch.html#033125. I hope this is helpful. If you have
Hi Gerardo,
I’m trying to play around with the track a little bit, and I’m not sure if I’m not doing something correctly or I just need to understand the information presented better.
Using PMS2 as an example, I loaded that gene into GRCh38 and the track shows that it is a parent gene (purple) with pseudogenes (gray lines below). However, when I click on any of those gray lines, representing the pseudogenes, it’s hard to tell which one it is. Let’s say I’m looking for PMS2CL: the PGOHUMTID is hard to correlate (as I’m unfamiliar with this ID). I think “PMS2CL” is a HUGO ID, so could that be given as well in the hover-over and detailed views?
On the other hand, when I look up PMS2CL in GRCh38, it does not come up at all in the pseudogene subtrack and PMS2 is listed there as opposed to the parent gene subtrack. PMS2 is also color-coded in blue for being processed, which I guess means that PMS2CL is a processed pseudogene, but then when you click on PMS2, it says unprocessed_pseudogene (which I assume is also meant to refer to PMS2CL even though the gene in that track is labeled PMS2). Either I’m not using this track properly or perhaps these are errors?
Finally, I think by searching for a specific pseudogene, the view does show which exons overlap with the parent gene. However, when you search for the parent gene, this exon-level detail is not present, which I think would also be useful to determine when a variant or CNV call could be complicated by a pseudogene. Furthermore, since the pseudogenes aren’t easily identifiable, I thought perhaps I could get the PGOHUMTID by searching for a specific pseudogene and then go back to the PMS2 search to identify the gray line that corresponds to PMS2CL, but I do not see the ID for PMS2CL when I do that search.
Could you let me know if I’m not using the track in the ideal way and if I’m missing features?
Thanks,
Nikita
Hello, Nikita.
Thank you for following up with us and for sharing your feedback on the Pseudogenes track.
There was an error in how we labeled the color-coded pseudogene types on both the track description page and the news announcement. These have now been corrected to reflect that unprocessed pseudogenes are shown in blue and processed pseudogenes in olive green. We appreciate you bringing this to our attention.
We have created an internal ticket to consider incorporating your suggestions, such as adding HUGO IDs to the item details page and the mouse hover text for the gray pseudogene indicators. We are also looking into why the track displays PMS2 but not PMS2CL.
It would be helpful if you could take a look at the track once the updates are in place and share any feedback you may have.
Regarding your note that "the view does show which exons overlap with the parent gene," could you clarify what you meant? The track currently shows the full pseudogene as annotated by Yale but does not indicate which specific exons overlap with the parent gene.
I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Gerardo Perez
UCSC Genomics Institute
Hello,
Sorry for the delay, but thanks for the updates! I was looking at this again today and noticed the following after I searched for PMS2 in the search bar.
Coloring


Regarding requests and suggestions (adding HUGO IDs to the item details page and the mouse hover text for the gray pseudogene indicators), appreciate that!
Thank you for looking into why the search for a pseudogene doesn’t seem to work properly.
Regarding exon view, please see my initial question with screenshots, which will hopefully clarify.


I hope that clarifies things! Please let me know if and when there are any more things to try out. I’m more than happy to as I really appreciate this effort!!
Thanks again,
Nikita
From: Gerardo Perez <gpe...@ucsc.edu>
Sent: Friday, April 4, 2025 11:13 PM
To: Mehta, Nikita <Meh...@mskcc.org>
Cc: gen...@soe.ucsc.edu
Subject: [EXTERNAL] Re: [genome] Pseudogenes
Hello, Nikita. Thank you for following up with us and for sharing your feedback on the Pseudogenes track. There was an error in how we labeled the color-coded pseudogene types on both the track description page and the news announcement. These
Hello Nikita,
Thank you for sharing your feedback on the Pseudogenes track. We have relayed your message to the engineer working on the track development. Please let us know if you have any other suggestions or ways we can improve the track to better serve our users.
Jairo Navarro
UCSC Genome Browser
Hello, Nikita.
Thank you again for your helpful feedback on the Pseudogenes track.
We have made several updates based on your comments:
Could you take a look at the track on our development site and let us know if you have any feedback? Here is a session that shows PMS2CL on our development site: https://genome-test.gi.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Gerardo&hgS_otherUserSessionName=PSM2
Please include genom...@soe.ucsc.edu in any replies to ensure visibility by the team.
Gerardo Perez
UCSC Genomics Institute
Hello Gerardo and Team,
Thank you so much for taking my feedback into consideration! I had some time to look at your most recent updates today. My comments are below.
The updated description page is helpful, so first, I want to make sure I understand the purpose of the Pseudogene Parent track and Pseudogene track properly. Is it correct that the Pseudogene Parent track lists all the pseudogenes that exist for the gene that you search for? That is in part what is said by this sentence: “These indicators do not show pseudogene locations directly but instead indicate how many pseudogenes are associated with each gene and link to their genomic regions in the Pseudogenes track.” The Pseudogenes track is meant to show you the type of pseudogene and its structure, but only if you search for that specific pseudogene (e.g., PMS2 search will only give you information in the parent track whereas PMS2CL search will only give you information in the pseudogene track). Both subtracks allow you to link between the gene and pseudogenes (essentially allowing you to toggle between using the parent and pseudogene tracks).
The HUGO IDs are immensely helpful. I like that they are present on the side on the parent track and on mouseover. I did notice that the parent gene is at the top (as expected) but also listed again amongst the pseudogenes; I don’t necessarily mind that, but when I hover over the top one, only PMS2CL is listed as the pseudogene position and I think when I hover over the bottom one, the rest of the pseudogenes are listed. I’m also not clear as to why the structure looks different for PMS2 between the top row and the bottom row (looks shifted).

Super helpful, especially if there is a pseudogene that one has not heard of and then you want to look at it. Or even if you just want structure of a pseudogene that you do know about and don’t want to go through a multistep process to get to it.
Oh ok, I understand that this is a limitation based on the data you have. I wanted to see if I could manually figure this out. The following steps seem to work, but I’m no programming expert. Do you think there might be some way to use an algorithm to do all of this and somehow get an exon overlap track to work?
Thanks again for all of your time! I’m going to test this out some more as I do some pseudogene investigations for my lab.
Happy Weekend!
Nikita
From: Gerardo Perez <gpe...@ucsc.edu>
Sent: Wednesday, May 7, 2025 5:04 PM
To: Mehta, Nikita <Meh...@mskcc.org>
Cc: gen...@soe.ucsc.edu
Subject: [EXTERNAL] Re: [genome] Pseudogenes
Hello, Nikita. Thank you again for your helpful feedback on the Pseudogenes track. We have made several updates based on your comments: HUGO IDs: We added HUGO gene symbols to the item details page and the mouseover text for the gray indicators
Hello, Nikita.
We apologize for the delay in our response. We will address your questions below:
Is it correct that the Pseudogene Parent track lists all the pseudogenes that exist for the gene that you search for?
The Pseudogenes track is meant to show you the type of pseudogene and its structure, but only if you search for that specific pseudogene (e.g., PMS2 search will only give you information in the parent track whereas PMS2CL search will only give you information in the pseudogene track).
When you search for a gene, the search results will include the pseudogenes available from the Yale Pseudogenes track (not from the Yale Pseudogene Parents track). For example, searching for PMS2 shows pseudogene results listed under Yale Pseudogenes on the search results page. The following screenshot shows the PMS2 pseudogene results:

A search for PMS2CL lists the PMS2CL pseudogene under the Yale Pseudogenes track, as shown in the following screenshot:


Both subtracks allow you to link between the gene and pseudogenes (essentially allowing you to toggle between using the parent and pseudogene tracks).
Yes, both subtracks allow you to link between the gene and its pseudogenes.
I did notice that the parent gene is at the top (as expected) but also listed again amongst the pseudogenes; I don’t necessarily mind that, but when I hover over the top one, only PMS2CL is listed as the pseudogene position and I think when I hover over the bottom one, the rest of the pseudogenes are listed. I’m also not clear as to why the structure looks different for PMS2 between the top row and the bottom row (looks shifted).
The two parent gene entries appear because PMS2 has two transcripts. Each transcript is associated with different pseudogenes, which may appear above or below the transcript display. The ENST00000441476.6 PMS2 transcript is associated with the PMS2CL pseudogene, while the ENST00000643595.1 PMS2 transcript is associated with PMS2P1–PMS2P12, AC004980.8, and CH17-264B6.3.


Do you think there might be some way to use an algorithm to do all of this and somehow get an exon overlap track to work?
Our engineer shared that your approach of taking a parent exon and using BLAT to find where it maps in the genome does precisely what you want, namely, it finds the pseudogenes that contain it. This approach will also find orthologs and gene family members, which is why it has not been applied on a genome-wide scale. However, if you perform this manually and then check the pseudogene track for hits, or at least check that no protein-coding genes are annotated in the same region, it should work fine for your purpose.
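The manual check described above (take the BLAT hits for a parent exon, then see which ones fall inside annotated pseudogenes and which fall inside protein-coding genes) can be sketched as a simple interval-overlap test. This is a minimal sketch under stated assumptions: all coordinates and labels are invented for illustration, intervals are treated as half-open (BED-style), and `Interval`, `overlaps`, and `classify_hits` are hypothetical names, not part of any UCSC tool.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed BED-style)
    end: int    # exclusive
    label: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_hits(hits, pseudogenes, coding_genes):
    """For each hit, list the pseudogene and coding-gene annotations it overlaps."""
    result = {}
    for hit in hits:
        result[hit.label] = {
            "pseudogenes": [p.label for p in pseudogenes if overlaps(hit, p)],
            "coding_genes": [g.label for g in coding_genes if overlaps(hit, g)],
        }
    return result


# Invented example data: none of these coordinates are real annotations.
hits = [
    Interval("chr7", 1000, 1200, "hit_1"),
    Interval("chr7", 5000, 5150, "hit_2"),
    Interval("chr3", 1000, 1200, "hit_3"),
]
pseudogenes = [Interval("chr7", 900, 2000, "pseudogene_A")]
coding_genes = [Interval("chr7", 4800, 6000, "coding_gene_B")]

report = classify_hits(hits, pseudogenes, coding_genes)
for label, found in report.items():
    print(label, found)
```

In this toy data, a hit that overlaps only a pseudogene annotation is the case of interest, while a hit inside a protein-coding gene would point to a gene family member or similar rather than a pseudogene.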
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Gerardo Perez
UCSC Genomics Institute