Dear Bogdan Tanasa,
Thank you for your question about the UCSC Genome Browser.
You can find information on the BED format here:
https://genome.ucsc.edu/FAQ/FAQdownloads.html#download35
Scroll down to the section titled “Name of fourth column in BED output”.
The “0” in FAM138F_exon_1_0_chr1_35277_r indicates the number of bases added to each end of the requested regions. For example, if you added 100 bases, the output line would read:
chr1 35276 35481 FAM138F_exon_1_100_chr1_35377_r 0 -
The “chr1_35277” in “chr1_35277_r” indicates the position of the first base of the feature. If you have added bases to the requested features (for example, exons plus 100 bases on each end), then columns 2 and 3 of the output are not the exact coordinates of the exon; they start 100 bases before and end 100 bases after it. This part of the name is therefore an easy way to see where the actual feature starts as displayed in the browser. It is “as displayed in the browser” because the coordinates in our tables almost always have 0-based starts (as they do in columns 2 and 3 of this output) but are displayed as 1-based in the browser (for more information, see the FAQ). The start position in the fourth column is 1-based, so it is the exact coordinate on which the feature starts as displayed in the browser.
The “r” in “chr1_35277_r” indicates the strand of the item: “r” represents the reverse strand and “f” represents the forward strand.
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
-Chris V
UCSC Genome Browser
--
Thank you for using the UCSC Genome Browser and for your question about RefFlat files.
Yes, you are correct: in this example, there are 20 transcripts.
If you go to the gateway page, search for the gene "ZNF41" in hg38, and look at the RefSeq track, you will see 20 transcripts for this gene.
18 of your rows display this exon region:
chrX 47445177 47449474 ZNF41_exon_0_0_chrX_47445178_r 0 -
2 of your rows display this smaller exon region:
chrX 47445177 47449366 ZNF41_exon_0_0_chrX_47445178_r 0 -
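If you want to confirm this breakdown in your own output, a short Python sketch like the one below tallies identical coordinate and name combinations. The list `bed_lines` is only a stand-in built from the two rows and counts quoted above; in practice you would read the lines from your own BED file, and the function name `count_regions` is hypothetical.

```python
from collections import Counter

# Stand-in for the downloaded BED output: the two rows quoted above,
# repeated 18 and 2 times as described in this message.
row_a = "chrX\t47445177\t47449474\tZNF41_exon_0_0_chrX_47445178_r\t0\t-"
row_b = "chrX\t47445177\t47449366\tZNF41_exon_0_0_chrX_47445178_r\t0\t-"
bed_lines = [row_a] * 18 + [row_b] * 2


def count_regions(lines):
    """Count how many rows share the same chrom, start, end, and name."""
    counts = Counter()
    for line in lines:
        chrom, start, end, name = line.split()[:4]
        counts[(chrom, int(start), int(end), name)] += 1
    return counts


counts = count_regions(bed_lines)
for (chrom, start, end, name), n in counts.most_common():
    print(chrom, start, end, name, n)
```

Rows that share a name but differ in column 3 show up as separate entries, which is how the two shorter exons become visible.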
You can visually see the two smaller exons in the session below:
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=jnavarr5&hgS_otherUserSessionName=MLQ.18086.hg38.RefSeq
Here is more information about RefSeq:
http://www.ncbi.nlm.nih.gov/books/NBK21091/
For example, from the intro, "Be aware, however, that the RefSeq collection does include alternatively spliced transcripts encoding the same protein or distinct protein isoforms, in addition to orthologs, paralogs, and alternative haplotypes for some organisms, which will affect the outcome of a database query."
Jairo Navarro
UCSC Genome Browser