INFORMATION NEEDED-URGENT


Shrinka Sen

Jan 16, 2023, 2:22:42 PM
to genome-www, UCSC Genome Browser Discussion List

Hi

I have a list of CpGs that I want to annotate to genes. My files look like this:

Chr   Start  End
chr1  10526  10527
chr1  10572  10573
chr1  10580  10581

 

I downloaded a reference file from UCSC (attached); excerpts from it are shown below.

I have two questions about this reference file:

 

1) For the same region, 9339402-9340995, there are 8 NM identifiers. What does this mean?

 

Chr   base     end      GeneName      strand
chr4  9339402  9340995  NM_001242330  +
chr4  9339402  9340995  NM_001242328  +
chr4  9339402  9340995  NM_001242327  +
chr4  9339402  9340995  NM_001242332  +
chr4  9339402  9340995  NM_001242331  +
chr4  9339402  9340995  NM_001242329  +
chr4  9339402  9340995  NM_001256867  +
chr4  9339402  9340995  NM_001242326  +

 

2) Besides issue 1, there is another issue. The 3rd and 4th lines below overlap by 102748 bp: 6785323-6888201 (which corresponds to an NR identifier) overlaps 6785453-7769706 (which corresponds to an NM identifier). What does this mean?

 

chr1  6785323  6888201  NR_038934     +
chr1  6785323  6888201  NM_001242701  +
chr1  6785323  6888201  NR_146202     +
chr1  6785453  7769706  NM_015215     +
chr1  6785453  7769706  NM_001349610  +
chr1  6785453  7769706  NM_001349609  +
chr1  6785453  7769706  NM_001349612  +
chr1  6785453  7769706  NM_001349608  +

 

Waiting for your reply


Regards

Shrinka Sen

PDF, Vancouver, Canada




WholeGenomeUCSCNCBIRef.txt

Gerardo Perez

Jan 18, 2023, 3:26:09 PM
to Shrinka Sen, genome-www, UCSC Genome Browser Discussion List

Hello, Shrinka.

Thank you for your interest in the Genome Browser.

Accessions beginning with NM_ are RefSeq identifiers for protein-coding transcripts, and those beginning with NR_ are RefSeq identifiers for non-protein-coding transcripts (https://www.ncbi.nlm.nih.gov/books/NBK21091/table/ch18.T.refseq_accession_numbers_and_mole/?report=objectonly).
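As a rough illustration (not a UCSC tool), the prefix rule above can be applied to the accession column of a file like yours. The function name `refseq_kind` is hypothetical, and only the two prefixes discussed here are handled; anything else is reported as "other".

```python
def refseq_kind(accession: str) -> str:
    """Classify a RefSeq accession by its prefix (the part before the underscore)."""
    prefix = accession.split("_", 1)[0]
    kinds = {
        "NM": "protein-coding transcript",
        "NR": "non-protein-coding transcript",
    }
    # Prefixes not covered in this thread fall through to "other".
    return kinds.get(prefix, "other")


# Accessions taken from the rows quoted in the question.
for acc in ("NM_015215", "NR_038934"):
    print(acc, "->", refseq_kind(acc))
```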

Every transcript has a unique identifier (accession), a gene that it is assigned to, a sequence, and a list of exon chrom/start/end coordinates on a chromosome. Most genes have multiple transcripts associated with them, and these transcripts often have overlapping coordinates.
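Because transcripts overlap, a single CpG can fall inside more than one transcript, so an annotation step should expect several hits per position. Here is a minimal sketch using two rows from your file. It assumes the start/end columns are half-open intervals (start included, end excluded), which you should verify for your download; the CpG positions and the names `Transcript`, `overlap_bp`, and `transcripts_at` are made up for illustration.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    chrom: str
    start: int
    end: int
    accession: str
    strand: str


# Two rows copied from the question.
TRANSCRIPTS = [
    Transcript("chr1", 6785323, 6888201, "NR_038934", "+"),
    Transcript("chr1", 6785453, 7769706, "NM_015215", "+"),
]


def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals (0 if they are disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def transcripts_at(chrom: str, start: int, end: int, transcripts=TRANSCRIPTS) -> list:
    """Accessions of every transcript on `chrom` that overlaps [start, end)."""
    return [
        t.accession
        for t in transcripts
        if t.chrom == chrom and overlap_bp(start, end, t.start, t.end) > 0
    ]


# The overlap between the two transcript spans quoted in the question.
print(overlap_bp(6785323, 6888201, 6785453, 7769706))  # 102748

# A hypothetical CpG inside both spans is reported for both transcripts.
print(transcripts_at("chr1", 6800000, 6800001))
```

For a whole-genome file, a dedicated interval tool or an indexed lookup would be faster than this linear scan; the sketch only shows the overlap logic.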

Multiple transcripts listed with the same coordinates are isoforms. Isoforms are a collection of related RNA transcript sequences from the same gene. These transcripts share the same first and last exons, but have a different set of internal exons. Other transcripts may instead stop in the middle of a coding exon, which changes the protein. Some transcripts may even put the exons together in a different way or skip some exons entirely.
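To see how many transcripts share each span in a file like yours, the rows can be grouped by chromosome, start, end, and strand. This is only a sketch with a hypothetical helper name, `group_by_span`, using the eight chr4 rows from the question. Note that sharing a span does not by itself tell you which gene the transcripts belong to; for that you would need a gene-symbol column, which the five columns shown here do not include.

```python
from collections import defaultdict

# The eight chr4 rows quoted in the question: (chrom, start, end, accession, strand).
ROWS = [
    ("chr4", 9339402, 9340995, "NM_001242330", "+"),
    ("chr4", 9339402, 9340995, "NM_001242328", "+"),
    ("chr4", 9339402, 9340995, "NM_001242327", "+"),
    ("chr4", 9339402, 9340995, "NM_001242332", "+"),
    ("chr4", 9339402, 9340995, "NM_001242331", "+"),
    ("chr4", 9339402, 9340995, "NM_001242329", "+"),
    ("chr4", 9339402, 9340995, "NM_001256867", "+"),
    ("chr4", 9339402, 9340995, "NM_001242326", "+"),
]


def group_by_span(rows):
    """Map each (chrom, start, end, strand) span to the accessions listed for it."""
    groups = defaultdict(list)
    for chrom, start, end, accession, strand in rows:
        groups[(chrom, start, end, strand)].append(accession)
    return dict(groups)


for span, accessions in group_by_span(ROWS).items():
    print(span, len(accessions), "transcripts")
```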

You may find our genes FAQ page helpful, where genes and transcripts are explained: https://genome.ucsc.edu/FAQ/FAQgenes.html#gene

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute

