genePredTogtf


Ji Xiangjun

Sep 28, 2015, 12:20:22 PM
to genome
To whom it may concern,

The genePredToGtf usage message says 'use a refFlat table or extended genePred table or file to include the gene_name attribute in the output'. However, it also says 'This will not work with a refFlat table dump file.' Do these two statements contradict each other?
Also, I don't know which genePredToGtf parameter takes the refFlat table or extended genePred table.
I would appreciate it if you could give me an answer.

Best regards

Matthew Speir

Oct 6, 2015, 2:48:10 PM
to Ji Xiangjun, genome
Hi Ji,

Thank you for your question about the genePredToGtf utility. One of our engineers notes that this is not a contradiction. The refFlat and genePred table structures are presented at http://genome.ucsc.edu/FAQ/FAQformat.html#format9. By default, the tool pulls data from a table in a MySQL database. When pulling data from a database, it asks for specific columns by name, which means it doesn't matter what order these columns are in. This is why the tool can work with a genePred, extended genePred, or refFlat table.

If, however, the tool is reading from a file, it cannot look columns up by name and has to rely on their position. It assumes the file is in genePred format. It doesn't work with a refFlat file because the columns are arranged differently: note the extra "geneName" column at the start of each row in the description of the refFlat format. You may be able to remove this extra "geneName" column from the refFlat file and then convert it to GTF using the genePredToGtf tool.
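As a rough sketch of the column-removal idea described above, the standard UNIX cut command can drop the first tab-separated column. The sample row and the file name refFlat.sample.txt are made up purely for illustration, and the sketch assumes geneName is the first column, as in the refFlat description on the format page. Note that the gene names are discarded by this step.

```shell
# Hypothetical one-row refFlat file (tab-separated), for illustration only:
# geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd,
# exonCount, exonStarts, exonEnds
printf 'GENE1\tNM_000001\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,\n' > refFlat.sample.txt

# Drop the first column (geneName); the ten remaining columns
# follow the genePred column order.
cut -f2- refFlat.sample.txt > myGenePredFile.txt

# The result could then be passed to the tool in file mode
# (left commented out here; assumes genePredToGtf is installed and on PATH):
# genePredToGtf file myGenePredFile.txt myOutput.gtf
```

On a real refFlat dump, it is worth checking the first few rows of the output by eye before converting, since any additional leading column would shift the positions the tool expects.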

You can use genePredToGtf with a genePred file by specifying "file" as your database. For example:

    genePredToGtf file myGenePredFile.txt myOutput.gtf

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group