Hi
I have a list of CpGs that I want to annotate to genes. My files look like this:
Chr Start End
chr1 10526 10527
chr1 10572 10573
chr1 10580 10581
I downloaded a reference file from UCSC (please find it attached); excerpts are shown below.
I have two points of confusion regarding this reference file:
1) For the same region, 9339402-9340995, there are 8 NM identifiers. What does this mean?
Chr | base | end | GeneName | strand
chr4 | 9339402 | 9340995 | NM_001242330 | +
chr4 | 9339402 | 9340995 | NM_001242328 | +
chr4 | 9339402 | 9340995 | NM_001242327 | +
chr4 | 9339402 | 9340995 | NM_001242332 | +
chr4 | 9339402 | 9340995 | NM_001242331 | +
chr4 | 9339402 | 9340995 | NM_001242329 | +
chr4 | 9339402 | 9340995 | NM_001256867 | +
chr4 | 9339402 | 9340995 | NM_001242326 | +
2) Besides issue 1, there is another issue here. The 3rd and 4th rows overlap by 102748 bp: 6785323-6888201 (which corresponds to a specific NR identifier) overlaps with 6785453-7769706 (which corresponds to a specific NM identifier). What does this mean?
chr1 | 6785323 | 6888201 | NR_038934 | +
chr1 | 6785323 | 6888201 | NM_001242701 | +
chr1 | 6785323 | 6888201 | NR_146202 | +
chr1 | 6785453 | 7769706 | NM_015215 | +
chr1 | 6785453 | 7769706 | NM_001349610 | +
chr1 | 6785453 | 7769706 | NM_001349609 | +
chr1 | 6785453 | 7769706 | NM_001349612 | +
chr1 | 6785453 | 7769706 | NM_001349608 | +
Waiting for your reply
Regards
Shrinka Sen
PDF, Vancouver, Canada
Hello, Shrinka.
Thank you for your interest in the Genome Browser.
NM_ accessions are RefSeq identifiers for protein-coding transcripts, and NR_ accessions are RefSeq identifiers for non-protein-coding transcripts (https://www.ncbi.nlm.nih.gov/books/NBK21091/table/ch18.T.refseq_accession_numbers_and_mole/?report=objectonly).
Every transcript has a unique identifier (accession), a gene that it is assigned to, a sequence, and a list of exon chrom/start/end coordinates on a chromosome. Most genes have multiple transcripts associated with them, and these transcripts often have overlapping coordinates.
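Because one position can fall inside several transcripts at once, annotating a CpG usually means collecting every overlapping accession rather than expecting a single hit. The following is only a minimal sketch of that idea, not a UCSC tool: the transcript rows are copied from the question, the CpG positions are made up for illustration, the function names are hypothetical, and the intervals are assumed to be half-open (start included, end excluded).

```python
from collections import defaultdict

# A few rows copied from the question: (chrom, start, end, accession, strand).
TRANSCRIPTS = [
    ("chr1", 6785323, 6888201, "NR_038934", "+"),
    ("chr1", 6785323, 6888201, "NM_001242701", "+"),
    ("chr1", 6785453, 7769706, "NM_015215", "+"),
    ("chr4", 9339402, 9340995, "NM_001242330", "+"),
]


def overlap_bp(start_a, end_a, start_b, end_b):
    """Length of the overlap of two half-open intervals (0 if they are disjoint)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def annotate(cpgs, transcripts):
    """Map each CpG (chrom, start, end) to every transcript accession it overlaps."""
    by_chrom = defaultdict(list)
    for chrom, start, end, accession, _strand in transcripts:
        by_chrom[chrom].append((start, end, accession))
    hits = {}
    for chrom, start, end in cpgs:
        hits[(chrom, start, end)] = [
            accession
            for t_start, t_end, accession in by_chrom[chrom]
            if overlap_bp(start, end, t_start, t_end) > 0
        ]
    return hits


# Hypothetical CpG positions, chosen only to illustrate the three cases.
cpgs = [
    ("chr1", 6800000, 6800001),  # inside both chr1 regions
    ("chr1", 7000000, 7000001),  # inside the longer chr1 region only
    ("chr4", 100, 101),          # outside every listed transcript
]
result = annotate(cpgs, TRANSCRIPTS)
for cpg, accessions in result.items():
    print(cpg, accessions)

# The 102748 bp overlap mentioned in the question:
print(overlap_bp(6785323, 6888201, 6785453, 7769706))
```

The linear scan per CpG is fine for a small example; for a full CpG list against a whole-genome reference, a dedicated interval tool or a sorted-interval approach would likely be more practical.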
Transcripts with the same coordinates, as in your first example, are isoforms: a collection of related RNA transcript sequences from the same gene. These transcripts share the same first and last exons but have a different set of internal exons. Other transcripts may instead stop in the middle of a coding exon, which changes the protein, and some may even put the exons together in a different way or skip some exons entirely.
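To see how many accessions share one region, the rows can be grouped by their coordinates. This is a rough sketch using the eight chr4 rows from the question; the function name is hypothetical, and identical coordinates are only a heuristic for "same gene". If your download includes a gene-symbol column, that would be a more reliable grouping key.

```python
from collections import defaultdict

# The eight chr4 rows from the question: (chrom, start, end, accession, strand).
ROWS = [
    ("chr4", 9339402, 9340995, "NM_001242330", "+"),
    ("chr4", 9339402, 9340995, "NM_001242328", "+"),
    ("chr4", 9339402, 9340995, "NM_001242327", "+"),
    ("chr4", 9339402, 9340995, "NM_001242332", "+"),
    ("chr4", 9339402, 9340995, "NM_001242331", "+"),
    ("chr4", 9339402, 9340995, "NM_001242329", "+"),
    ("chr4", 9339402, 9340995, "NM_001256867", "+"),
    ("chr4", 9339402, 9340995, "NM_001242326", "+"),
]


def group_by_region(rows):
    """Collect accessions that share the same chrom, start, end, and strand."""
    groups = defaultdict(list)
    for chrom, start, end, accession, strand in rows:
        groups[(chrom, start, end, strand)].append(accession)
    return dict(groups)


regions = group_by_region(ROWS)
for region, accessions in regions.items():
    print(region, len(accessions), accessions)
```

With these rows, all eight accessions collapse into a single region, so a CpG inside it would be annotated once per region (or per gene) instead of eight times.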
You may find our genes FAQ page helpful, where genes and transcripts are explained: https://genome.ucsc.edu/FAQ/FAQgenes.html#gene
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Gerardo Perez
UCSC Genomics Institute