bed12 for pygenomeTracks to show gene name

Ashley S Doane

Dec 15, 2019, 2:36:42 AM
to deepTools
Hi,

Is it possible to have pyGenomeTracks display gene names instead of transcript IDs for a BED12 file or another gene annotation track? As I understand it, the name column of a BED12 is supposed to hold a transcript or exon ID (otherwise the names would be duplicated), so perhaps a different format is needed?

thanks,
Ashley

Vivek Bhardwaj

Dec 16, 2019, 7:41:49 AM
to Ashley S Doane, deepTools
Hi

pyGenomeTracks uses the "name" column of the BED/GTF file provided as input. So if you'd like to show gene names, you can use a BED file with the gene names in that column.
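A minimal sketch of that substitution, assuming you already have a transcript-to-gene mapping (for example built from the `gene_name` attribute of a GTF). The function name `rename_bed12` and the mapping are hypothetical. The point is to touch only the 4th (name) field and rejoin with tabs, so the block columns stay exactly as they were:

```python
def rename_bed12(lines, transcript_to_gene):
    """Replace the BED name column (4th field) with a gene name.

    `transcript_to_gene` is a dict such as {"TX1": "GeneA"}; IDs missing
    from it are left unchanged. All other columns, including
    blockCount/blockSizes/blockStarts, are written back untouched and the
    line stays tab-separated.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Pass header and blank lines through as-is.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            out.append(line)
            continue
        fields = line.split("\t")
        fields[3] = transcript_to_gene.get(fields[3], fields[3])
        out.append("\t".join(fields))
    return out
```

Rows for different transcripts of the same gene will then share a name, which is expected.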

The name will then be repeated for every transcript (or exon) of the same gene. If you want to collapse multiple transcripts into one gene, there's a newly added feature that does this from a GTF file. The config file would look something like this:

[test gtf collapsed]
file = dm3_subset_BDGP5.78.gtf.gz
height = 10
title = gtf from ensembl one entry per gene
merge_transcripts = true
prefered_name = gene_name
fontsize = 12
file_type = bed

It takes quite a while to load though.
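To illustrate what "one entry per gene" means, here is a conceptual sketch of collapsing transcripts: exons of all transcripts sharing a gene name are pooled and overlapping intervals are unioned. This is an assumption about the idea behind `merge_transcripts`, not pyGenomeTracks' actual code, which may treat strands, gene IDs or coordinates differently:

```python
from collections import defaultdict


def merge_transcripts(exons):
    """Collapse transcripts into one exon model per gene.

    `exons` is an iterable of (gene_name, chrom, start, end) tuples, one per
    exon of any transcript (0-based, half-open). Overlapping or touching
    intervals of the same gene on the same chromosome are unioned.
    """
    by_gene = defaultdict(list)
    for gene, chrom, start, end in exons:
        by_gene[(gene, chrom)].append((start, end))

    merged = {}
    for key, intervals in by_gene.items():
        intervals.sort()
        union = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= union[-1][1]:
                # Overlaps the previous block: extend it.
                union[-1][1] = max(union[-1][1], end)
            else:
                union.append([start, end])
        merged[key] = [tuple(iv) for iv in union]
    return merged
```

For example, exons (100, 200) and (150, 250) from two isoforms of the same gene become a single block (100, 250).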

Best Wishes
Vivek

--------

Vivek Bhardwaj
Post-doc Researcher
Hubrecht Institute
3584CT Utrecht 
The Netherlands


--
You received this message because you are subscribed to the Google Groups "deepTools" group.
To unsubscribe from this group and stop receiving emails from it, send an email to deeptools+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/deeptools/5f72e22b-ad8a-4364-9cc3-b354fc49d8a9%40googlegroups.com.

Ashley S Doane

Dec 17, 2019, 3:41:52 PM
to deepTools
Thanks, I will try this!

Best,
Ashley

Ashley S Doane

Dec 17, 2019, 4:27:38 PM
to deepTools
Btw, I got this warning:
UserWarning: It appears you have a transcript feature in your GTF file. You may want to use the `disable_infer_transcripts` option to speed up database creation


It's still building the database, and I will try this option when it finishes. I'm not sure what it would need to infer about transcripts from a complete GTF file; I'm using the basic annotation GTF from GENCODE.

Also, I previously tried your suggestion of replacing each transcript ID with the corresponding gene_name in the BED12 file (so that multiple rows share the same name). Doing this, I got the following error:
AssertionError: The number of blocks: 3 does not correspond tothe number of blocks sizes
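The error text suggests that a row's blockCount (10th column) did not match the number of entries in its blockSizes list. A small checker like the hypothetical `check_bed12_line` below can locate the offending rows after editing; it assumes standard tab-separated BED12 with comma-separated block lists (a trailing comma is tolerated):

```python
def check_bed12_line(line):
    """Return a list of problems found in one BED12 line (empty if OK)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        return [f"expected 12 tab-separated columns, found {len(fields)}"]
    try:
        block_count = int(fields[9])
    except ValueError:
        return [f"blockCount is not an integer: {fields[9]!r}"]
    # blockSizes/blockStarts are comma-separated, often with a trailing comma.
    sizes = [s for s in fields[10].split(",") if s]
    starts = [s for s in fields[11].split(",") if s]
    problems = []
    if len(sizes) != block_count:
        problems.append(f"blockCount={block_count} but {len(sizes)} blockSizes")
    if len(starts) != block_count:
        problems.append(f"blockCount={block_count} but {len(starts)} blockStarts")
    return problems
```

Running it over every line of the edited file (with `enumerate` for line numbers) and printing non-empty results shows whether the renaming step shifted or merged any columns.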

Ashley S Doane

Dec 17, 2019, 7:49:40 PM
to deepTools
Hi again,

So this worked: it generated a plot with genes showing intron/exon structure, labeled with gene names. It does take a while (about an hour for one plot). That's not so bad if the gene database it generates can be saved and reused for subsequent plots. Is this possible?

Also, I'm not sure what to make of these warnings:
UserWarning: It appears you have a gene feature in your GTF file. You may want to use the `disable_infer_genes` option to speed up database creation
and 
UserWarning: It appears you have a transcript feature in your GTF file. You may want to use the `disable_infer_transcripts` option to speed up database creation

I tried adding these options to the config file and on the command line, with no effect (the command line gave an error).

Also, I tried plotting a very small region, and the time was the same. It seems to be building a genome-wide database, so I'm hoping the plan is to save the database for reuse.
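The `disable_infer_*` warnings appear to come from the gffutils library, whose `create_db` accepts those options; that may explain why pyGenomeTracks' config and command line did not recognize them. One workaround, sketched here as an assumption rather than an official pyGenomeTracks feature, is to convert the GTF once into a BED12 file whose name column is `gene_name`, then point the track at that file. It uses only the standard library, keeps one row per transcript (so names repeat per gene), and sets thickStart/thickEnd to the full transcript span because it ignores CDS records:

```python
import gzip
import re
import sys
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_name "GAPDH";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_to_bed12(gtf_lines):
    """Convert GTF exon records to BED12 lines, one per transcript,
    using the gene_name attribute (else gene_id) as the BED name."""
    exons = defaultdict(list)  # transcript_id -> [(start0, end), ...]
    info = {}                  # transcript_id -> (chrom, strand, name)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(f[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        # GTF is 1-based inclusive; BED is 0-based half-open.
        exons[tid].append((int(f[3]) - 1, int(f[4])))
        info[tid] = (f[0], f[6], attrs.get("gene_name", attrs.get("gene_id", tid)))

    bed = []
    for tid, blocks in exons.items():
        blocks.sort()
        chrom, strand, name = info[tid]
        start, end = blocks[0][0], max(e for _, e in blocks)
        sizes = ",".join(str(e - s) for s, e in blocks)
        starts = ",".join(str(s - start) for s, _ in blocks)
        bed.append("\t".join(map(str, [chrom, start, end, name, 0, strand,
                                       start, end, 0, len(blocks), sizes, starts])))
    return bed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python gtf_to_bed12.py annotation.gtf.gz genes.bed
    opener = gzip.open if sys.argv[1].endswith(".gz") else open
    with opener(sys.argv[1], "rt") as fh, open(sys.argv[2], "w") as out:
        for row in gtf_to_bed12(fh):
            out.write(row + "\n")
```

The conversion cost is paid once; each later plot only reads a plain BED file, which should load much faster than rebuilding a database from the GTF, although this has not been benchmarked here.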