Possible bug when converting a GTF file to BED12 using gtfToGenePred and genePredToBed


Keren Zhou

Dec 8, 2015, 11:38:29 AM
to genome
Dear UCSC Genome Team,

I'm a PhD student, and I want to convert the current version of the mouse GENCODE annotation set (vM5) to a BED12 file. I know that two programs, gtfToGenePred and genePredToBed, can do this.

When I used these two programs to convert the GENCODE annotation file to BED12 format, I found what looks like a bug. After the conversion finished, I noticed that for some records the 7th and 8th columns (thickStart and thickEnd) are both set to the chromEnd position, whereas I would expect them to be set to the chromStart position. Is there a bug in these two programs?

Thank you! I look forward to hearing from you.

Best wishes,
Zhoukr

Luvina Guruvadoo

Dec 14, 2015, 2:32:36 PM
to Keren Zhou, genome
Hello Keren,

Thank you for your question. As it turns out, this is not a bug; the programs are running as expected. If cdsStart and cdsEnd are the same in the genePred (as in the example below), then the resulting thickStart and thickEnd in the BED file will also be the same. In that case it does not actually matter which value is placed in those columns (txStart/chromStart or txEnd/chromEnd), because the thick region has zero length either way. It is safer to use one of these values because the code requires thickStart and thickEnd to be within the bounds of the item. For example, the first line of the GTF looks like this:
chr1    HAVANA  gene    3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; gene_type "TEC"; gene_status "KNOWN"; gene_name "4933401J01Rik"; level 2; havana_gene "OTTMUSG00000049935.1";

Converting it to genePred (note cdsStart=cdsEnd):

ENSMUST00000193812.1 chr1 + 3073252 3074322 3074322 3074322 1 3073252, 3074322,
And finally in BED format:
chr1    3073252 3074322 ENSMUST00000193812.1    0       +       3074322 3074322 0       1       1070,   0,
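
To make the column mapping above concrete, here is a minimal Python sketch of how the ten genePred fields line up with the twelve BED columns. It is only an illustration under stated assumptions, not the genePredToBed source: the function names `genepred_to_bed12` and `has_empty_thick_region` are hypothetical, and score and itemRgb are simply written as 0 to match the output shown above.

```python
def genepred_to_bed12(gp_line: str) -> str:
    """Map one 10-column genePred line to a BED12 line (illustrative only)."""
    (name, chrom, strand, tx_start, tx_end,
     cds_start, cds_end, exon_count, exon_starts, exon_ends) = gp_line.split()

    tx_start_i = int(tx_start)
    starts = [int(s) for s in exon_starts.strip(",").split(",")]
    ends = [int(e) for e in exon_ends.strip(",").split(",")]

    # BED block sizes are exon lengths; block starts are relative to chromStart.
    block_sizes = ",".join(str(e - s) for s, e in zip(starts, ends)) + ","
    block_starts = ",".join(str(s - tx_start_i) for s in starts) + ","

    # cdsStart/cdsEnd are copied straight into thickStart/thickEnd, so
    # cdsStart == cdsEnd in genePred gives thickStart == thickEnd in BED.
    return "\t".join([chrom, tx_start, tx_end, name, "0", strand,
                      cds_start, cds_end, "0", exon_count,
                      block_sizes, block_starts])


def has_empty_thick_region(bed_line: str) -> bool:
    """True when thickStart == thickEnd, i.e. the thick region has zero length."""
    fields = bed_line.split("\t")
    return fields[6] == fields[7]


gp = ("ENSMUST00000193812.1 chr1 + 3073252 3074322 "
      "3074322 3074322 1 3073252, 3074322,")
bed = genepred_to_bed12(gp)
print(bed)
print(has_empty_thick_region(bed))
```

A check like `has_empty_thick_region` is one way to pick out such records in the converted file, rather than relying on whether thickStart happens to equal chromStart or chromEnd.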

I hope this helps. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Regards,
Luvina
--
Luvina Guruvadoo
UCSC Genome Browser

http://genome.ucsc.edu



