INFORMATION NEEDED-URGENT

Shrinka Sen

Jan 16, 2023, 2:22:42 PM

to genome-www, UCSC Genome Browser Discussion List

I have a list of CpGs that I want to annotate to genes. My files look like this:

Chr	Start	End
chr1	10526	10527
chr1	10572	10573
chr1	10580	10581

I downloaded a reference file from UCSC (attached); excerpts from it are shown below.

I have two points of confusion about this reference file:

1) For the same region, 9339402-9340995, there are 8 NM identifiers. What does this mean?

Chr	base	end	GeneName	strand
chr4	9339402	9340995	NM_001242330	+
chr4	9339402	9340995	NM_001242328	+
chr4	9339402	9340995	NM_001242327	+
chr4	9339402	9340995	NM_001242332	+
chr4	9339402	9340995	NM_001242331	+
chr4	9339402	9340995	NM_001242329	+
chr4	9339402	9340995	NM_001256867	+
chr4	9339402	9340995	NM_001242326	+

2) Besides issue 1, there is a second issue here. The region 6785323-6888201 on the 3rd line (which corresponds to a specific NR identifier) overlaps the region 6785453-7769706 on the 4th line (which corresponds to a specific NM identifier) by 102748 bp. What does this mean?

chr1	6785323	6888201	NR_038934	+
chr1	6785323	6888201	NM_001242701	+
chr1	6785323	6888201	NR_146202	+
chr1	6785453	7769706	NM_015215	+
chr1	6785453	7769706	NM_001349610	+
chr1	6785453	7769706	NM_001349609	+
chr1	6785453	7769706	NM_001349612	+
chr1	6785453	7769706	NM_001349608	+

Waiting for your reply

Regards

Shrinka Sen

PDF, Vancouver, Canada

WholeGenomeUCSCNCBIRef.txt

Gerardo Perez

Jan 18, 2023, 3:26:09 PM

to Shrinka Sen, genome-www, UCSC Genome Browser Discussion List

Hello, Shrinka.

Thank you for your interest in the Genome Browser.

Identifiers beginning with NM_ are RefSeq accessions for protein-coding transcripts, and those beginning with NR_ are RefSeq accessions for non-protein-coding transcripts (https://www.ncbi.nlm.nih.gov/books/NBK21091/table/ch18.T.refseq_accession_numbers_and_mole/?report=objectonly).
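As a small illustration of the prefix convention, the sketch below labels an accession by its prefix. The function name `refseq_transcript_type` is hypothetical and not part of any UCSC or NCBI tool; it only covers the two prefixes discussed in this thread.

```python
def refseq_transcript_type(accession):
    """Label a RefSeq accession by its prefix (illustrative helper only)."""
    if accession.startswith("NM_"):
        return "protein-coding"   # NM_ = mRNA
    if accession.startswith("NR_"):
        return "non-coding"       # NR_ = non-coding RNA
    return "other"                # any prefix not covered in this thread


# Accessions taken from the rows in the question.
for acc in ("NM_015215", "NR_038934"):
    print(acc, refseq_transcript_type(acc))
```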

Every transcript has a unique identifier (accession), a gene that it is assigned to, a sequence, and a list of exon chrom/start/end coordinates on a chromosome. Most genes have multiple transcripts associated with them, and these transcripts often have overlapping coordinates.
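Because transcripts overlap, a single CpG can legitimately match several accessions. The following is a minimal sketch of that lookup, not a UCSC tool: the transcript rows are copied from the question, the CpG position is made up for illustration, and the coordinates are assumed to be half-open intervals (start included, end excluded).

```python
# Transcript rows copied from the question: (chrom, start, end, accession, strand).
TRANSCRIPTS = [
    ("chr1", 6785323, 6888201, "NR_038934", "+"),
    ("chr1", 6785323, 6888201, "NM_001242701", "+"),
    ("chr1", 6785453, 7769706, "NM_015215", "+"),
]


def overlap_length(start_a, end_a, start_b, end_b):
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def annotate(chrom, start, end, transcripts=TRANSCRIPTS):
    """Return the accessions of every transcript overlapping the interval."""
    return [acc for c, s, e, acc, _strand in transcripts
            if c == chrom and overlap_length(start, end, s, e) > 0]


# Overlap between the two regions from the question.
print(overlap_length(6785323, 6888201, 6785453, 7769706))  # 102748

# Hypothetical CpG position (not from the original file).
print(annotate("chr1", 6785400, 6785401))  # ['NR_038934', 'NM_001242701']
```

For a whole-genome CpG list, a dedicated interval tool would be faster than this linear scan; the sketch only shows why one position can return more than one identifier.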

Multiple transcripts of the same gene that have the same coordinates are isoforms. Isoforms are a collection of related RNA transcript sequences from the same gene. Such transcripts can share the same first and last exons but have a different set of internal exons. Some transcripts may instead stop in the middle of a coding exon, which changes the protein. Some transcripts may even put the exons together in a different way or skip some exons entirely.
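To see how many accessions share one set of coordinates in a file like the one attached to the question, the rows can be grouped by region. This is a rough sketch using the chr4 rows quoted in the question; `group_by_region` is a hypothetical helper name, and whether the grouped accessions belong to a single gene would still need to be checked against a gene-name column, which the excerpt does not include.

```python
from collections import defaultdict

# Rows copied from the question: (chrom, start, end, accession, strand).
ROWS = [
    ("chr4", 9339402, 9340995, "NM_001242330", "+"),
    ("chr4", 9339402, 9340995, "NM_001242328", "+"),
    ("chr4", 9339402, 9340995, "NM_001242327", "+"),
    ("chr4", 9339402, 9340995, "NM_001242332", "+"),
    ("chr4", 9339402, 9340995, "NM_001242331", "+"),
    ("chr4", 9339402, 9340995, "NM_001242329", "+"),
    ("chr4", 9339402, 9340995, "NM_001256867", "+"),
    ("chr4", 9339402, 9340995, "NM_001242326", "+"),
]


def group_by_region(rows):
    """Map each (chrom, start, end, strand) to the accessions listed for it."""
    groups = defaultdict(list)
    for chrom, start, end, accession, strand in rows:
        groups[(chrom, start, end, strand)].append(accession)
    return dict(groups)


for region, accessions in group_by_region(ROWS).items():
    print(region, len(accessions), "transcripts")
```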

You may find our genes FAQ page helpful, where genes and transcripts are explained: https://genome.ucsc.edu/FAQ/FAQgenes.html#gene

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute
