BED12 for pyGenomeTracks to show gene names

Ashley S Doane

Dec 15, 2019, 2:36:42 AM

to deepTools

Hi,

Is it possible with pyGenomeTracks to print the gene name instead of the transcript ID from a BED12 file, or from another annotation track for genes? I think the name field of a BED12 is supposed to hold a transcript ID or exon ID, since otherwise the names will be duplicates, so perhaps a different format is needed?

thanks,

Ashley

Vivek Bhardwaj

Dec 16, 2019, 7:41:49 AM

to Ashley S Doane, deepTools

Hi

pyGenomeTracks uses the "name" column of the bed/gtf file provided as input. So if you'd like to show gene names, use a bed file that has the gene names in that column.
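A minimal Python sketch of that approach: rewrite column 4 (name) of each BED12 line from a transcript ID to a gene name. The `rename_bed12` function and the `transcript_to_gene` mapping are hypothetical; how you build the mapping (for example from the GTF's `transcript_id` and `gene_name` attributes) is left open.

```python
def rename_bed12(lines, transcript_to_gene):
    """Return BED12 lines with column 4 (name) mapped from transcript ID to gene name.

    transcript_to_gene is a hypothetical dict {transcript_id: gene_name};
    IDs missing from it are kept unchanged.
    """
    renamed = []
    for line in lines:
        line = line.rstrip("\n")
        # Leave blank, comment, track and browser lines untouched.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            renamed.append(line)
            continue
        # Split on tabs only, so the other 11 columns (including the
        # comma-separated block lists) are written back exactly as they were.
        fields = line.split("\t")
        fields[3] = transcript_to_gene.get(fields[3], fields[3])
        renamed.append("\t".join(fields))
    return renamed


if __name__ == "__main__":
    bed = ["chr1\t100\t400\ttx1\t0\t+\t100\t400\t0\t2\t100,100,\t0,200,"]
    for row in rename_bed12(bed, {"tx1": "GENE_A"}):
        print(row)
```

Several transcripts of the same gene will then share a name, which is expected; each row still describes one transcript.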

The names will be duplicated wherever the gene/exon/transcript field is duplicated. If you want to collapse multiple transcripts into one gene, there's a newly added feature that does this from the GTF. The config file would look something like this:

[test gtf collapsed]
file = dm3_subset_BDGP5.78.gtf.gz
height = 10
title = gtf from ensembl one entry per gene
merge_transcripts = true
prefered_name = gene_name
fontsize = 12
file_type = bed

It takes quite a while to load though.
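To illustrate what collapsing several transcripts into one gene entry can mean, here is a small Python sketch that takes the union of all exon intervals of a gene. This is an assumption about the general idea, not pyGenomeTracks' own `merge_transcripts` code, and `merge_exons` is a hypothetical name.

```python
def merge_exons(transcripts):
    """Collapse the exons of several transcripts of one gene into their union.

    transcripts: list of transcripts, each a list of (start, end) half-open
    exon intervals. Returns sorted, non-overlapping (start, end) intervals.
    """
    exons = sorted(e for tx in transcripts for e in tx)
    merged = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous block: extend that block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    t1 = [(100, 200), (300, 400)]
    t2 = [(150, 250), (500, 600)]
    print(merge_exons([t1, t2]))  # [(100, 250), (300, 400), (500, 600)]
```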

Best Wishes

Vivek

--------

Vivek Bhardwaj

Post-doc Researcher

Hubrecht Institute

3584CT Utrecht

The Netherlands

--
You received this message because you are subscribed to the Google Groups "deepTools" group.
To unsubscribe from this group and stop receiving emails from it, send an email to deeptools+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/deeptools/5f72e22b-ad8a-4364-9cc3-b354fc49d8a9%40googlegroups.com.

Ashley S Doane

Dec 17, 2019, 3:41:52 PM

to deepTools

Thanks, I will try this!

Best,

Ashley

Ashley S Doane

Dec 17, 2019, 4:27:38 PM

to deepTools

Btw, I got this warning:

UserWarning: It appears you have a transcript feature in your GTF file. You may want to use the `disable_infer_transcripts` option to speed up database creation

It's still building the database, and I will try this option when it finishes. I'm not sure what it would need to infer about transcripts from a complete GTF file; I'm using the basic annotation GTF from GENCODE.
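The warning text matches the gffutils library, whose `create_db()` accepts `disable_infer_genes` and `disable_infer_transcripts`. By default gffutils infers gene and transcript extents from exon lines, which older GTF files needed because they lacked explicit gene/transcript records. Assuming pyGenomeTracks builds its database through gffutils (not confirmed in this thread), a quick Python check of which feature types a GTF already contains could look like this (`gtf_feature_counts` is a hypothetical helper):

```python
import gzip
from collections import Counter


def gtf_feature_counts(lines):
    """Count feature types (GTF column 3) in an iterable of GTF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def gtf_feature_counts_from_path(path):
    """Same as above, reading a plain or gzip-compressed GTF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return gtf_feature_counts(fh)
```

If both `counts["gene"]` and `counts["transcript"]` are non-zero, the file already contains the records that would otherwise be inferred, which is what the warnings point out.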

Also, I previously tried your suggestion of replacing the transcript ID with the corresponding gene_name in the BED12 file (so that multiple rows have the same name). I got the following error when doing this:

AssertionError: The number of blocks: 3 does not correspond tothe number of blocks sizes
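In BED12, `blockCount` (column 10) must match the number of entries in `blockSizes` (column 11) and `blockStarts` (column 12); the first block must start at 0 and the last block must end at `chromEnd - chromStart`. A hedged Python sketch for finding lines that break these rules, which may help locate the row behind an error like the one above (`check_bed12_line` and `check_bed12_file` are hypothetical helpers, not part of pyGenomeTracks):

```python
def check_bed12_line(line):
    """Return a list of problems found in one tab-separated BED12 line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        return [f"expected 12 tab-separated fields, found {len(fields)}"]
    problems = []
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    block_count = int(fields[9])
    # UCSC-style lists often end with a trailing comma, so drop empty items.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != block_count:
        problems.append(f"blockCount is {block_count} but {len(sizes)} blockSizes given")
    if len(starts) != block_count:
        problems.append(f"blockCount is {block_count} but {len(starts)} blockStarts given")
    if starts and starts[0] != 0:
        problems.append("first blockStart must be 0")
    if starts and sizes and starts[-1] + sizes[-1] != chrom_end - chrom_start:
        problems.append("last block does not end at chromEnd")
    return problems


def check_bed12_file(path):
    """Print every problem found in a BED12 file, with its line number."""
    with open(path) as fh:
        for lineno, line in enumerate(fh, 1):
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            for problem in check_bed12_line(line):
                print(f"line {lineno}: {problem}")
```

Splitting on tabs only is deliberate: if an editing step turned tabs into spaces, the line is reported as having the wrong number of fields.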

Ashley S Doane

Dec 17, 2019, 7:49:40 PM

to deepTools

Hi again,

So this worked and generated a plot with genes that have intron/exon information and are named by gene name. It does take a while (about an hour for one plot). That's not so bad if the gene database it generates can be saved and reused for subsequent plots. Is this possible?

Also, not sure what to make of the warnings:

UserWarning: It appears you have a gene feature in your GTF file. You may want to use the `disable_infer_genes` option to speed up database creation

and

UserWarning: It appears you have a transcript feature in your GTF file. You may want to use the `disable_infer_transcripts` option to speed up database creation

I tried adding these options both to the config file and on the command line, with no effect (the command line gave an error).

Also, I tried plotting a very small region and the time was the same. It seems to build a genome-wide database, so I'm hoping the plan is to save the database for reuse.
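Whether pyGenomeTracks can cache that database is not answered here. One workaround, sketched in Python under the assumption that a plain BED12 track is acceptable: convert the GTF once into a BED12 file with one line per gene, named by `gene_name`, with the exons of all transcripts merged, and point the track's `file` at that BED (with `file_type = bed`). The function names and file paths are hypothetical, and thickStart/thickEnd simply span the whole gene, so no CDS information is kept.

```python
import gzip
import re
from collections import defaultdict

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')  # matches key "value" pairs in GTF column 9


def gtf_to_gene_bed12(gtf_lines):
    """Return sorted BED12 lines, one per gene, named by gene_name, exons merged."""
    exons = defaultdict(list)  # gene_id -> [(start, end)] in 0-based half-open coords
    info = {}                  # gene_id -> (chrom, strand, gene_name)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        # GTF is 1-based inclusive; BED is 0-based half-open.
        exons[gene_id].append((int(fields[3]) - 1, int(fields[4])))
        info.setdefault(gene_id, (fields[0], fields[6], attrs.get("gene_name", gene_id)))
    rows = []
    for gene_id, intervals in exons.items():
        merged = []  # union of all exons of all transcripts of this gene
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        chrom, strand, name = info[gene_id]
        g_start, g_end = merged[0][0], merged[-1][1]
        sizes = ",".join(str(e - s) for s, e in merged)
        starts = ",".join(str(s - g_start) for s, e in merged)
        line = "\t".join(map(str, [chrom, g_start, g_end, name, 0, strand,
                                   g_start, g_end, 0, len(merged), sizes, starts]))
        rows.append((chrom, g_start, line))
    return [row[2] for row in sorted(rows)]


def convert(gtf_path, bed_path):
    """Read a plain or gzipped GTF and write the gene-level BED12 file once."""
    opener = gzip.open if gtf_path.endswith(".gz") else open
    with opener(gtf_path, "rt") as fin, open(bed_path, "w") as fout:
        for bed_line in gtf_to_gene_bed12(fin):
            fout.write(bed_line + "\n")


if __name__ == "__main__":
    convert("annotation.gtf.gz", "genes.bed12")  # hypothetical paths
```

The conversion reads the whole GTF once; after that each plot only has to load the much smaller BED file.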
