gene location file

Hong-Bi Kim [student] (College of Pharmacy, Department of Oriental Pharmacy)

Aug 26, 2019, 11:32:56 AM
to gen...@soe.ucsc.edu
To whom it may concern,
Hello, I'm Hong-Bi Kim, and I study bioinformatics at Kyung Hee University in Korea.
I am emailing to ask for your help.
I have attached two files.
They contain the locations of genes on chromosome 10.
I downloaded these files from the Genome Browser,
but it was so long ago that I don't remember where I downloaded them from.
I would also like the same information for other genes.
Please let me know where I can get these files.

I'm looking forward to hearing from you.
Yours sincerely,

Hong-Bi Kim

order_chr10-.txt
order_chr10+.txt

Luis Nassar

Aug 27, 2019, 11:39:53 AM
to Hong-Bi Kim [student] (College of Pharmacy, Department of Oriental Pharmacy), UCSC Genome Browser Discussion List

Hello Hong-Bi,

Thank you for your interest in the Genome Browser.

We were not able to identify what file the information you provided came from. The file lists unique gene symbols with a single exon start/end coordinate for each entry, but does not exactly match any schemas we provide. It seems there were some post-download modifications.

If you are looking for unique gene data on the hg38 assembly, however, we can help you get that. Using the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables), you can make the following selections to get one transcript per gene symbol for the whole genome:

  • clade: mammal
  • genome: Human
  • assembly: hg38
  • group: Genes and Gene Predictions
  • track: UCSC Genes
  • table: knownCanonical
  • region: genome
  • output format: selected fields from primary and related tables
  • output file: enter a filename here to prompt a download, e.g. knownCanonical.txt
  • get output

On the next page, make the following selections:

  • chrom
  • chromStart
  • chromEnd
  • geneSymbol (from the "hg38.kgXref fields" section)
  • get output

Your output file should look like this:

#hg38.knownCanonical.chrom    hg38.knownCanonical.chromStart    hg38.knownCanonical.chromEnd    hg38.kgXref.geneSymbol
chr1    11106534    11262507    MTOR
chr1    11143897    11149537    MTOR-AS1
chr1    11152349    11152452    RNU6-537P
chr1    11189340    11195981    ANGPTL7
chr1    11226253    11226360    RNU6-291P
chr1    11232962    11233112    RPL39P6

It includes the chromosome, the transcription start (beginning of first exon), the transcription end (end of final exon), and the gene symbol. You may also wish to add strand information so that the output covers all the data in the files you provided.
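
As a rough illustration of working with this output, the sketch below reads a few of the rows shown above into Python dictionaries. It assumes the downloaded file is tab-separated with a single `#`-prefixed header line; the function name `read_gene_table` and the inline sample are made up for this example and are not part of any UCSC tool.

```python
import csv
import io

# A few rows copied from the sample output above. The tab separators are an
# assumption about how the Table Browser writes the real file.
SAMPLE = (
    "#hg38.knownCanonical.chrom\thg38.knownCanonical.chromStart\t"
    "hg38.knownCanonical.chromEnd\thg38.kgXref.geneSymbol\n"
    "chr1\t11106534\t11262507\tMTOR\n"
    "chr1\t11143897\t11149537\tMTOR-AS1\n"
    "chr1\t11189340\t11195981\tANGPTL7\n"
)


def read_gene_table(handle):
    """Yield one dict per data row, keyed by the last part of each column name."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # "#hg38.knownCanonical.chrom" becomes "chrom", and so on.
    keys = [name.lstrip("#").split(".")[-1] for name in header]
    for row in reader:
        if not row:  # skip blank lines
            continue
        record = dict(zip(keys, row))
        record["chromStart"] = int(record["chromStart"])
        record["chromEnd"] = int(record["chromEnd"])
        yield record


genes = list(read_gene_table(io.StringIO(SAMPLE)))
for gene in genes:
    # Print symbol, chromosome, and the length of the reported interval.
    print(gene["geneSymbol"], gene["chrom"], gene["chromEnd"] - gene["chromStart"])
```

To read a real download, the `io.StringIO(SAMPLE)` argument could be replaced with an open file handle, for example `open("knownCanonical.txt", newline="")`.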

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute

Training videos & resources: http://genome.ucsc.edu/training/index.html
Want to share the Browser with colleagues?
Host a workshop: http://bit.ly/ucscTraining



Luis Nassar

Sep 2, 2019, 12:34:33 PM
to Hong-Bi Kim [student] (College of Pharmacy, Department of Oriental Pharmacy), UCSC Genome Browser Discussion List
Hello Hong-Bi,

I am forwarding your message to our mailing list (gen...@soe.ucsc.edu) so the team can chime in as well. We will send you back a reply soon. 

Lou Nassar
UCSC Genomics Institute

On Mon, Sep 2, 2019 at 1:58 AM Hong-Bi Kim [student] (College of Pharmacy, Department of Oriental Pharmacy) <tir4r...@khu.ac.kr> wrote:
The data I want is exactly the gene transcript positions, including UTRs.
image.png
I think I should change the following selections, shouldn't I?
image.png
Please let me know. Thank you.
Yours sincerely,

Hong-Bi Kim

On Mon, Sep 2, 2019 at 5:45 PM, Hong-Bi Kim [student] (College of Pharmacy, Department of Oriental Pharmacy) <tir4r...@khu.ac.kr> wrote:
Thank you for your reply.
But I had a problem finding the data.
You said I should select UCSC Genes as the track, but I couldn't find it. As you can see in the pictures below, there is no UCSC Genes track. What should I do in this situation?
  • track: UCSC Genes
image.png image.png

Yours sincerely,

Hong-Bi Kim

On Wed, Aug 28, 2019 at 12:39 AM, Luis Nassar <lrna...@ucsc.edu> wrote:


--
Hong Bi Kim
 
Undergraduate Student
Department of Life and Nanopharmaceutical Science
B210 College of Pharmacy, Kyung Hee University
26 Kyunghee-daero, Dongdaemoon-gu
Seoul 02447
Republic of Korea



Luis Nassar

Sep 3, 2019, 5:26:07 PM
to Hong-Bi Kim [student] (College of Pharmacy, Department of Oriental Pharmacy), UCSC Genome Browser Discussion List

Hello Hong-Bi,

I apologize for mentioning the UCSC Genes track. That is the track that corresponds to these steps for the hg19 assembly. For the hg38 assembly, you will want to select GENCODE v29.

The steps would be the same as my previous response except for the track name. These steps will generate one transcript per gene symbol for the whole genome. These coordinates are transcription start/end, so they include UTRs:

  • clade: mammal
  • genome: Human
  • assembly: hg38
  • group: Genes and Gene Predictions
  • track: GENCODE v29
  • table: knownCanonical
  • region: genome
  • output format: selected fields from primary and related tables
  • output file: enter a filename here to prompt a download, e.g. knownCanonical.txt
  • get output

On the next page, make the following selections:

  • chrom
  • chromStart
  • chromEnd
  • geneSymbol (from the "hg38.kgXref fields" section)
  • get output

If you would instead like to get all transcripts, and not just one representative per gene, you can make the following selections. These will also give you transcription start/end, along with the Ensembl ID and associated gene symbol:

  • clade: mammal
  • genome: Human
  • assembly: hg38
  • group: Genes and Gene Predictions
  • track: GENCODE v29
  • table: knownGene
  • region: genome
  • output format: selected fields from primary and related tables
  • output file: enter a filename here to prompt a download, e.g. knownCanonical.txt
  • get output

On the next page, make the following selections:

  • name
  • chrom
  • strand
  • txStart
  • txEnd
  • geneSymbol (from the "hg38.kgXref fields" section)
  • get output

Those results should look as follows:

#hg38.knownGene.name    hg38.knownGene.chrom    hg38.knownGene.strand    hg38.knownGene.txStart    hg38.knownGene.txEnd    hg38.kgXref.geneSymbol
ENST00000456328.2    chr1    +    11868    14409    DDX11L1
ENST00000450305.2    chr1    +    12009    13670    DDX11L1
ENST00000488147.1    chr1    -    14403    29570    WASH7P
ENST00000619216.1    chr1    -    17368    17436    MIR6859-1
ENST00000473358.1    chr1    +    29553    31097    MIR1302-2HG
...
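
Because this table has one row per transcript, a gene symbol can appear several times (DDX11L1 appears twice above). The sketch below shows one possible way to collapse such rows into a single interval per gene by taking the smallest txStart and the largest txEnd. Note that this merged span is an illustrative choice and is not necessarily the same interval that the knownCanonical table reports; the function name `gene_spans` is hypothetical.

```python
from collections import defaultdict

# Rows copied from the sample output above:
# (name, chrom, strand, txStart, txEnd, geneSymbol)
ROWS = [
    ("ENST00000456328.2", "chr1", "+", 11868, 14409, "DDX11L1"),
    ("ENST00000450305.2", "chr1", "+", 12009, 13670, "DDX11L1"),
    ("ENST00000488147.1", "chr1", "-", 14403, 29570, "WASH7P"),
]


def gene_spans(rows):
    """Merge transcripts into one (start, end) per (symbol, chrom, strand)."""
    grouped = defaultdict(list)
    for _name, chrom, strand, tx_start, tx_end, symbol in rows:
        grouped[(symbol, chrom, strand)].append((tx_start, tx_end))
    spans = {}
    for key, coords in grouped.items():
        # Outermost coordinates across all transcripts of the gene.
        start = min(s for s, _ in coords)
        end = max(e for _, e in coords)
        spans[key] = (start, end)
    return spans


spans = gene_spans(ROWS)
for (symbol, chrom, strand), (start, end) in spans.items():
    print(symbol, chrom, strand, start, end)
```

Keying on chromosome and strand as well as symbol keeps apart any symbols that occur in more than one place in the genome.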
          

Lou Nassar
UCSC Genomics Institute


Luis Nassar

Sep 4, 2019, 12:09:39 PM
to Hong-Bi Kim [student] (College of Pharmacy, Department of Oriental Pharmacy), UCSC Genome Browser Discussion List

Hello Hong-Bi,

In the knownCanonical table (first example), the strand information is not displayed by default. If you look at the schema in the field selection, you will see the following information:

chromStart    Start position (0 based). Represents transcription start for + strand genes, end for - strand genes
chromEnd      End position (non-inclusive). Represents transcription end for + strand genes, start for - strand genes

So chromStart is always the lower coordinate, and whether it marks the start or the end of transcription depends on the strand. If you would like to add an additional field for strand, however, you can do that as well. On the page where you select your output fields (chromStart, chromEnd, geneSymbol, etc.) there is a section below titled Linked Tables. Select the following box:

hg38    knownGene    Transcript from default gene set in UCSC browser

Then click the "allow selection from checked tables" button at the bottom. You will now see a new section on the page: hg38.knownGene fields

From here you can select strand:

strand    + or - for strand

Select it along with the other fields. Your output will now include strand, and will look as follows:

#hg38.knownCanonical.chrom    hg38.knownCanonical.chromStart    hg38.knownCanonical.chromEnd    hg38.kgXref.geneSymbol    hg38.knownGene.strand
chr1    169853073    169893959    SCYL3    -
chr1    169795048    169854080    C1orf112    +
chr1    27612063    27635277    FGR    -
chr1    196651877    196747504    CFH    +
chr1    24357004    24413725    STPG1    -
chr1    24415793    24469307    NIPAL3    +
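
If you want coordinates written in the direction of transcription (larger number first for - strand genes, as in the files you described), a small conversion along the following lines may help. This is only a sketch: the function name `oriented_span` is made up, and the shift to 1-based positions is an assumption about the format you want, based on chromStart being 0-based and chromEnd being non-inclusive.

```python
def oriented_span(chrom_start, chrom_end, strand):
    """Return (tx_start, tx_end) as 1-based positions in transcription order.

    chrom_start is 0-based and chrom_end is non-inclusive, so the same
    interval in 1-based inclusive coordinates is [chrom_start + 1, chrom_end].
    """
    first, last = chrom_start + 1, chrom_end
    if strand == "+":
        return first, last
    if strand == "-":
        # Transcription runs from the higher coordinate to the lower one.
        return last, first
    raise ValueError(f"unexpected strand: {strand!r}")


# Two rows copied from the output above: (chrom, chromStart, chromEnd, symbol, strand)
ROWS = [
    ("chr1", 169853073, 169893959, "SCYL3", "-"),
    ("chr1", 169795048, 169854080, "C1orf112", "+"),
]

for chrom, start, end, symbol, strand in ROWS:
    tx_start, tx_end = oriented_span(start, end, strand)
    print(symbol, chrom, strand, tx_start, tx_end)
```

For SCYL3 on the - strand this puts 169893959 first and 169853074 second, while C1orf112 on the + strand keeps the lower coordinate first.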

I hope this is helpful.

Lou Nassar
UCSC Genomics Institute

On Wed, Sep 4, 2019 at 2:46 AM Hong-Bi Kim [student] (College of Pharmacy, Department of Oriental Pharmacy) <tir4r...@khu.ac.kr> wrote:
Thank you for your help.
I was able to get the gene location file by following your emails.
I also need the direction in which genes are encoded, but the file you described only includes the location of each gene.
Let me explain what I want.

For example, DNA is double-stranded.
The exon start of PRR26 is 649947 and its exon end is 665169. Its direction is from the smaller number to the larger number. I will call this the (+) strand.
The exon start of TUBB8 is 49504 and its exon end is 46436. Its direction is from the larger number to the smaller number. I will call this the (-) strand.

But in the file you described, every end is larger than its start. I think it does not take direction into account.

Please let me know. Thank you.
Yours sincerely,

Hong-Bi Kim



On Wed, Sep 4, 2019 at 6:26 AM, Luis Nassar <lrna...@ucsc.edu> wrote:

Hong-Bi Kim [student] (College of Pharmacy, Department of Oriental Pharmacy)

Sep 5, 2019, 12:02:56 PM
to Luis Nassar, UCSC Genome Browser Discussion List
Thank you very much.
I sincerely appreciate your help. You are very kind and always reply so thoughtfully.
Your help has been very useful to me.

Yours sincerely,
Hong-Bi Kim

On Thu, Sep 5, 2019 at 1:09 AM, Luis Nassar <lrna...@ucsc.edu> wrote: