kgTxInfo.txt for HG38

Klaus Schmitz

Aug 14, 2020, 11:22:43 AM
to gen...@soe.ucsc.edu
Dear UCSC:

Thank you for the fantastic job you do for all researchers in genetics with your Genome Browser.

I have used many of your annotations for human diagnostics at Children's Hospital in Boston.

I have a question that I hope you can help me with.

For many years I have used the annotation file “kgTxInfo.txt”. It is available for hg18 and hg19 but not for hg38. I am working to move all my WES data to hg38, but I need this file. Could you let me know when it will be available, or whether it exists under a different name?

Thank you. I look forward to your reply,

Klaus

------------------------------------------------------------------------
Klaus Schmitz Abe, PhD
Research Instructor at:
Division of Genetics and Genomics, Children's Hospital Boston
Harvard Medical School
Manton Center for Orphan Disease Research
Tel: +1 617- 919- 4798
Cell: +1 617- 319- 8182
Email: Klaus.Sc...@childrens.harvard.edu
Email: kl...@broadinstitute.org
3 Blackfan Circle CLS15072, Boston MA 02115
------------------------------------------------------------------------------


Matthew Speir

Aug 14, 2020, 2:53:44 PM
to Klaus Schmitz, UCSC Genome Browser Discussion List
Hello, Klaus. 

Thank you for your question about the kgTxInfo table for the human genome assembly hg38.

This table was discontinued when we transitioned to using GENCODE as the basis for the gene models in the knownGene, known*, and kg* tables in 2016. Can you tell us which fields you were using in the previous versions? We might be able to point you to the same or similar information in other tables.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Training videos & resources: http://genome.ucsc.edu/training/index.html

Want to share the Browser with colleagues? Host a workshop: http://bit.ly/ucscTraining

---

Matthew Speir

UCSC Cell Browser, Quality Assurance and Data Wrangler

Human Cell Atlas, User Experience Researcher

UCSC Genome Browser, User Support

UC Santa Cruz Genomics Institute

Revealing life’s code.




Klaus Schmitz

Aug 17, 2020, 5:07:28 PM
to Matthew Speir, UCSC Genome Browser Discussion List
Dear Matthew:

Thank you so much for your email.

It is a mystery to me how UCSC decides whether a transcript is coding or noncoding. In hg19, the kgTxInfo annotation has two columns (“category” and “Description”) that help me decide whether a transcript is coding or noncoding.

For the refGene and Ensembl gene annotations, this is easy using the “cdsStartStat” and “cdsEndStat” columns.

Is there another table for hg38 that I can use to determine exactly whether a transcript is coding or noncoding?

My best regards, and thank you again,
KSA

Matthew Speir

Aug 19, 2020, 4:06:54 PM
to Klaus Schmitz, UCSC Genome Browser Discussion List
Hello, Klaus. 

For any gene track (refGene, ncbiRefSeq, ensGene, knownGene, etc.), you can tell whether a transcript is non-coding by checking whether the thickStart column is equal to the thickEnd column. If thickStart == thickEnd, the transcript is non-coding.
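
A minimal Python sketch of this check for a single BED line, offered as an illustration rather than a UCSC tool. It assumes the standard BED column order, in which thickStart and thickEnd are the 7th and 8th fields. The function name and the two sample lines are made up for this example.

```python
def bed_is_noncoding(bed_line):
    """Return True when a BED line has thickStart == thickEnd."""
    fields = bed_line.split()
    if len(fields) < 8:
        # thickStart/thickEnd only exist in BED lines with 8 or more fields
        raise ValueError("need at least 8 BED fields to read thickStart/thickEnd")
    thick_start = int(fields[6])  # 7th field
    thick_end = int(fields[7])    # 8th field
    return thick_start == thick_end


# Made-up BED lines, for illustration only (not real annotations)
coding_line = "chr1 1000 5000 txA 0 + 1200 4800"
noncoding_line = "chr1 1000 5000 txB 0 - 1000 1000"

print(bed_is_noncoding(coding_line), bed_is_noncoding(noncoding_line))
# prints: False True
```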



Klaus Schmitz

Aug 19, 2020, 6:21:14 PM
to Matthew Speir, UCSC Genome Browser Discussion List
Hi Matthew:

Thank you again for your quick reply. If the answer is that easy, I will be very happy. Please help me understand.

For refGene.txt or ensGene.txt I use the following column names, since the files do not contain a header line:

c("bin","name","chrom","strand","txStart","txEnd","cdsStart","cdsEnd","exonCount","exonStarts","exonEnds","score","name2","cdsStartStat","cdsEndStat","exonFrames")

Which of these correspond to thickStart and thickEnd?

Sorry if this question is annoying for you,

Thank you again,
KSA

Brian Lee

Aug 19, 2020, 7:16:37 PM
to Klaus Schmitz, Matthew Speir, UCSC Genome Browser Discussion List

Dear Klaus,

Thank you for using the UCSC Genome Browser and for your question about non-coding genes.

You can treat cdsStart equal to cdsEnd as equivalent to thickStart = thickEnd for identifying non-coding genes.

In the Browser, the start of the coding region in an annotation is drawn as the beginning of a thick solid block, which explains the thickStart/thickEnd naming in the BED format: http://genome.ucsc.edu/FAQ/FAQformat.html#format1

In the gene prediction (genePred) format, the thickStart/thickEnd names were replaced by the clearer cdsStart/cdsEnd: http://genome.ucsc.edu/FAQ/FAQformat.html#format9

In the data tables, setting these two columns to the same coordinate (cdsStart = cdsEnd) means, in essence, that the annotation has no coding region and would not be displayed as a thick item (which designates coding).

To illustrate this, here is a session in which a non-coding gene, INE2/ENST00000630399.1, is highlighted: http://genome.ucsc.edu/s/brianlee/nonCoding

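The INE2/ENST00000630399.1 record is stored as shown below, with cdsStart equal to cdsEnd (the 6th and 7th fields of this line; the 7th and 8th in tables that begin with a bin column, such as the refGene.txt layout listed earlier in this thread), which explains why no thick item is drawn in the Browser to designate a coding region: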

ENST00000630399.1 chrX - 15785715 15787589 15785715 15785715 1 15785715, 15787589, uc004cxf.1
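
The cdsStart = cdsEnd test can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `is_noncoding` and the `has_bin` flag are made up for this example, the leading column names are taken from the refGene.txt column list given earlier in this thread, and the row with a bin column is invented. Real table dumps are tab-separated; splitting on any whitespace works here only because none of the leading fields is expected to be empty.

```python
# Leading genePred columns, as named earlier in this thread
LEADING_COLUMNS = ["name", "chrom", "strand", "txStart", "txEnd",
                   "cdsStart", "cdsEnd"]


def is_noncoding(line, has_bin=False):
    """Return True when a genePred-style row has cdsStart == cdsEnd.

    has_bin=True is for tables such as refGene.txt whose rows start
    with an extra "bin" column.
    """
    columns = (["bin"] if has_bin else []) + LEADING_COLUMNS
    fields = line.split()
    if len(fields) < len(columns):
        raise ValueError("row is too short to contain cdsStart/cdsEnd")
    row = dict(zip(columns, fields))  # extra trailing fields are ignored
    return int(row["cdsStart"]) == int(row["cdsEnd"])


# The INE2 row quoted in this thread (no leading bin column)
ine2 = ("ENST00000630399.1 chrX - 15785715 15787589 15785715 15785715 "
        "1 15785715, 15787589, uc004cxf.1")
print(is_noncoding(ine2))  # prints: True

# Invented row with a leading bin column, for illustration only
with_bin = "585 NM_EXAMPLE chr1 + 1000 5000 1200 4800"
print(is_noncoding(with_bin, has_bin=True))  # prints: False
```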

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further public questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,





Klaus Schmitz

Aug 21, 2020, 11:57:20 AM
to Brian Lee, Matthew Speir, UCSC Genome Browser Discussion List
Dear Brian Lee:

Your answer is excellent, thank you so much. I have been looking for this answer for years.

Best regards,
Klaus