Using genome annotation


Genomeo

Mar 5, 2014, 12:39:03 PM
to igv-...@googlegroups.com
Hi,

I am using IGV version 2.3.31 under Ubuntu 12.04 LTS, with -Xmx4000m.

1) I want to load the Ensembl gene set file Homo_sapiens.GRCh37.75.sorted.gtf with File > Load from File, but it takes extremely long and then hangs. I have the index file in the same directory. Ensembl does not seem to provide per-chromosome sub-files, which I guess would make this easier. Do users commonly have to analyse only one portion of the genome at a time?

2) If I open Genomes > Load Genome from File and then hit cancel, IGV starts doing something in the background and freezes.

3) File > Load from Server > Tutorials seems to contain empty data sets. Am I missing anything fundamental about how these should be used?

Thanks,

G.



Hubert Rehrauer

Oct 20, 2014, 7:40:43 AM
to igv-...@googlegroups.com
Similar problem here. I can't get IGV started with any .gtf file from Ensembl. It always uses all CPUs for a long time, and then nothing seems to happen.

I understand that this seems to be related to the number of features and the size of the files. Can someone provide more specific advice on how to make Ensembl GTF files display in IGV?

Many thanks for your help.

Best regards
Hubert

Jim Robinson

Oct 20, 2014, 7:56:29 AM
to igv-...@googlegroups.com
Hi Hubert,

It could be the size. Could you send me a URL to one of the files you
are trying to load?

Large annotation files should be indexed, which you can do with igvtools
or tabix.
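
A minimal sketch of the sort-and-index step, assuming htslib's bgzip and tabix (or igvtools) are on the PATH. The function names and file names are placeholders for illustration, not something from this thread. The file has to be sorted by sequence name and start position before either tool can index it.

```shell
# Sort a GTF by sequence name (column 1), then numerically by start (column 4).
# Comment lines starting with '#' are kept at the top of the output.
# Usage: sort_gtf INPUT.gtf OUTPUT.gtf
sort_gtf() {
    {
        grep '^#' "$1" || true
        grep -v '^#' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n
    } > "$2"
}

# Option A: compress with bgzip, then build a tabix index for the .gz file.
index_with_tabix() {
    bgzip -f "$1" && tabix -p gff "$1.gz"
}

# Option B: build an index with igvtools, leaving the GTF uncompressed.
index_with_igvtools() {
    igvtools index "$1"
}

# Example (hypothetical file names):
#   sort_gtf genes.gtf genes.sorted.gtf
#   index_with_tabix genes.sorted.gtf     # or: index_with_igvtools genes.sorted.gtf
```

In either case the index file should sit next to the data file so that IGV can find it when the GTF is loaded from the File menu.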

Jim

Jim Robinson

Oct 20, 2014, 8:00:31 AM
to igv-...@googlegroups.com
Also, to expand on the original question: a gtf file is an annotation file, which should be loaded from the File menu (not the Genomes menu). If it is loaded from the Genomes menu, IGV will try to treat it as a fasta file, which is probably what leads to the freeze.


--

---
You received this message because you are subscribed to the Google Groups "igv-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to igv-help+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/igv-help/3bb6893b-0c3b-472c-90ab-9140c94fdd88%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hubert Rehrauer

Oct 21, 2014, 7:12:51 AM
to igv-...@googlegroups.com
Hi Jim

Many thanks for your quick reply. Yes, it is probably the size: the file is nearly 1 GB. The file is here:

http://fgcz-gstore.uzh.ch/public/genes.gtf

I have now sorted and indexed it and loaded it into IGV, which was fast. But now I have lost the ability to browse to genes by name. If I enter a gene name in the locus field, IGV does not recognize it.

Best regards
Hubert

Hubert Rehrauer

Oct 21, 2014, 7:24:51 AM
to igv-...@googlegroups.com
Hi Jim

A further update: if I put the sorted gtf and the index in a .genome archive, it again fails to load. It uses all CPUs at 100% but nothing happens.

It seems that it only works if I load the gtf separately.

best regards
hubert

Jim Robinson

Oct 21, 2014, 7:51:00 AM
to igv-...@googlegroups.com
Right, you can't put an index in a .genome file because it's zipped, so the index would be invalid. How did you manage to do that? The .genome file predates indexed fastas; there's really no reason to use them anymore unless you like having the annotation file loaded automatically.

Jim

Jim Robinson

Oct 21, 2014, 8:01:18 AM
to igv-...@googlegroups.com
Hi Hubert,

I'm reading my email today in reverse order, so I just saw this. Indexed files don't support search by name because that would require loading the entire file, defeating the purpose of the index. You could use grep to extract just the gene records into a new gtf file, then load that non-indexed or put it in the .genome file. The command would be:

grep '    gene    ' genes.gtf > genes.reduced.gtf

The gap on each side of gene in the pattern is a single tab character, not spaces. The tabs are important, or you will get every line. Type "ctrl-v" before typing each tab. I'm downloading your file now to try it, but it is taking a long time.
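
A possible alternative that avoids typing literal tabs, assuming awk is available, is to compare the third (feature type) column directly. This is only a sketch and not the command used in this thread; the function name is made up for illustration.

```shell
# Keep only records whose feature type (column 3) is exactly "gene".
# Usage: extract_gene_records INPUT.gtf OUTPUT.gtf
extract_gene_records() {
    awk -F '\t' '$3 == "gene"' "$1" > "$2"
}

# Example, using the same file names as the grep command:
#   extract_gene_records genes.gtf genes.reduced.gtf
```

Because the comparison is made on the column rather than on the whole line, attribute text such as gene_id or gene_name in column 9 cannot cause a match.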

As you noted, this is a problem with all Ensembl GTFs. I've made a note to look into a better solution, but a 1 GB file on a client machine is a difficult problem. "bed" files are much more efficient, often 1/10th the size; however, this comes at the cost of some information.
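
To illustrate that trade-off, here is a rough sketch of reducing the gene records to a six-column BED file with awk. It assumes the records carry a gene_name attribute (falling back to gene_id), and the function name is hypothetical. Everything else in the attribute column is dropped, which is the kind of information loss meant here. BED start coordinates are 0-based, so the GTF start is reduced by one.

```shell
# Convert GTF "gene" records to BED6: chrom, start (0-based), end, name, score, strand.
# Usage: gtf_genes_to_bed INPUT.gtf OUTPUT.bed
gtf_genes_to_bed() {
    awk -F '\t' -v OFS='\t' '
        $3 == "gene" {
            name = "."
            if (match($9, /gene_name "[^"]+"/)) {
                # Skip the 11 characters of: gene_name "  and drop the closing quote.
                name = substr($9, RSTART + 11, RLENGTH - 12)
            } else if (match($9, /gene_id "[^"]+"/)) {
                # Skip the 9 characters of: gene_id "  and drop the closing quote.
                name = substr($9, RSTART + 9, RLENGTH - 10)
            }
            # The score column is not meaningful here, so it is set to 0.
            print $1, $4 - 1, $5, name, 0, $7
        }' "$1" > "$2"
}

# Example (hypothetical file names):
#   gtf_genes_to_bed genes.gtf genes.bed
```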

Jim




Jim Robinson

Oct 21, 2014, 9:26:32 AM
to igv-...@googlegroups.com
Hi, I downloaded your genes.gtf file and extracted just the "gene" records as described above. The resulting file is small enough to load into IGV and allows searching by gene name. You lose the exon information, but you can load that separately with the indexed gtf you already have.

The file is small enough to attach here. You don't need to gunzip it before loading; it can be loaded as is. However, if you use it in a ".genome" file, you should gunzip it first.

Jim

genes.only.gtf.gz

Hubert Rehrauer

Oct 22, 2014, 2:47:17 PM
to igv-...@googlegroups.com
Hi Jim

many thanks for your help. That makes things clearer.

From your previous post, I understand that you consider .genome files as outdated since now indexed fasta can be used. But I can't see how we can do without .genome files.

Our situation is:
We are a bioinformatics core and run data analyses for many research groups, some of them with custom genomes.
Therefore we host a "genome server" for IGV with the corresponding .genome files. Users then get a link to a jnlp file that specifies our server and a session XML file to be loaded. That way the users see their data just with one (double-)click. My understanding is that I can only provide this if I rely on .genome files.

regards
hubert

Jim Robinson

Oct 22, 2014, 9:29:20 PM
to igv-...@googlegroups.com
Hi Hubert,

For your purposes, .genome files might well be more convenient.
However, to answer your question, another approach would be to include
the URL to the fasta file in the session XML, followed by one or more
annotation tracks (again in the session XML). In this way you aren't
limited to a single annotation track per genome. For example, you
could include both the abbreviated "gene only" annotation file to
support searching and the full indexed file. This would save
you the trouble of creating .genome files. However, we will continue
to support them; I didn't mean to imply we would not.
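
A rough sketch of what such a session file might look like, written out with a shell here-document. All URLs are hypothetical placeholders, and the layout (a Session element whose genome attribute points at the fasta, plus one Resource element per track) is an assumption based on the general shape of IGV session files. It is worth comparing against a session saved from your own IGV version.

```shell
# Write a minimal IGV session file.
# Usage: write_session OUTPUT.xml FASTA_URL GENE_ONLY_GTF_URL INDEXED_GTF_URL
# Assumes the fasta has its .fai index next to it and the URLs contain no
# characters that need XML escaping (such as '&').
write_session() {
    cat > "$1" <<EOF
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Session genome="$2" version="8">
    <Resources>
        <Resource path="$3"/>
        <Resource path="$4"/>
    </Resources>
</Session>
EOF
}

# Example (placeholder URLs):
#   write_session session.xml \
#       http://example.org/genomes/custom.fa \
#       http://example.org/annotation/genes.only.gtf \
#       http://example.org/annotation/genes.sorted.gtf
```

The first track is the small, non-indexed "gene only" file that supports searching by name; the second is the full indexed file with the exon detail.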


Jim