Loading large reference genome


Marisa Miller

Dec 10, 2014, 6:53:09 PM
to igv-...@googlegroups.com
Hello,
  I am trying to load a large reference genome (hexaploid wheat genome, ~10 GB file) and it is taking quite a while. IGV's memory usage climbs to 9320M while it loads. Are there any strategies to make loading faster, or to increase the memory available to IGV? My laptop has 16 GB of memory in total.

   Thank you,
      Marisa Miller

Jim Robinson

Dec 10, 2014, 7:15:26 PM
to igv-...@googlegroups.com
Hi,

When you say genome, what are you referring to exactly? A fasta file, a gff or gtf file, or something else?

The fasta file is not the issue, as it is indexed. It is most likely the annotation file (gff, gtf, bed). If that is the case, there are 2 strategies:

(1) Reduce the size of the file by extracting a subset of the key features; what you keep depends on the file type to some extent.
(2) Don't create a ".genome" file; instead, load the fasta and an INDEXED annotation file separately.

I recommend tabix for indexing the annotation file; it's available in the samtools package, but you can also use IGV. To do it in IGV, first load the fasta (Genomes > Load genome from file... > select fasta), then open the igvtools window from the Tools menu. Use the pulldown to select "Index" and select your annotation file. Running this will produce an index file with an ".idx" extension. Keep this file co-located with the annotation file, then load the annotation file from the File menu as you normally would. The index will be found automatically.
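Both tabix and igvtools index expect the annotation file to be sorted by sequence name and then by start position. A minimal Python sketch of that sort step, assuming a tab-delimited GTF/GFF with the sequence name in column 1 and the start coordinate in column 4; `sort_annotation_lines` is a hypothetical helper name, and for multi-gigabyte files `igvtools sort` or Unix `sort` would be a better fit than holding every line in memory.

```python
from typing import Iterable, List


def sort_annotation_lines(lines: Iterable[str]) -> List[str]:
    """Sort GTF/GFF lines by sequence name, then by numeric start.

    Comment/header lines (starting with '#') stay at the top in their
    original order; blank lines are dropped.
    """
    headers: List[str] = []
    features: List[str] = []
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped:
            continue
        if stripped.startswith("#"):
            headers.append(stripped)
        else:
            features.append(stripped)

    def key(feature: str):
        fields = feature.split("\t")
        # Column 1 = seqname, column 4 = start; compare start as an integer,
        # otherwise "120" would sort before "20".
        return (fields[0], int(fields[3]))

    return headers + sorted(features, key=key)


if __name__ == "__main__":
    example = [
        '#example annotation\n',
        'chr2\tsrc\tgene\t500\t900\t.\t+\t.\tgene_id "g3";\n',
        'chr1\tsrc\tgene\t1200\t1500\t.\t-\t.\tgene_id "g2";\n',
        'chr1\tsrc\tgene\t100\t400\t.\t+\t.\tgene_id "g1";\n',
    ]
    for row in sort_annotation_lines(example):
        print(row)
```

After sorting, the tabix route is typically `bgzip` on the sorted file followed by `tabix -p gff` on the resulting `.gz`; the IGV route is the igvtools "Index" step described above.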

Jim

--

---
You received this message because you are subscribed to the Google Groups "igv-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to igv-help+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/igv-help/b4a5309c-b58b-46bb-9280-d5423333068e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Marisa Miller

Dec 13, 2014, 11:17:05 AM
to igv-...@googlegroups.com
Hi Jim,
  I had originally created a .genome file from a fasta file (sorted and indexed using samtools sort and faidx), and I have an annotation file in .gtf format. As you suggested, I indexed the .gtf file using samtools and am now trying to load the genome (the .fa file) and then the annotation file separately. However, when I try to load the .fa file (~10 GB), IGV just times out and the file never loads. Do you have any suggestions for loading this file?

   Thanks,
     Marisa

Jim Robinson

Dec 13, 2014, 12:29:18 PM
to igv-...@googlegroups.com
Is the fasta index in the same directory as the fasta file, with the standard name (fasta file name + .fai)? How are you loading the fasta file (exact menu option; send a screenshot if needed)?
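The naming rule is the full fasta file name with `.fai` appended (`wheat.fa` becomes `wheat.fa.fai`, not `wheat.fai`). A small Python sketch of that check; `find_fasta_index` is a hypothetical helper, and it verifies only the location and name of the index, not that its contents are valid.

```python
import os
import sys
from typing import Optional


def find_fasta_index(fasta_path: str) -> Optional[str]:
    """Return the co-located .fai path if it exists, else None."""
    # Append ".fai" to the whole file name; do not replace the extension.
    fai_path = fasta_path + ".fai"
    return fai_path if os.path.isfile(fai_path) else None


if __name__ == "__main__":
    for path in sys.argv[1:]:
        index = find_fasta_index(path)
        print(f"{path}: {'index found at ' + index if index else 'NO index found'}")
```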


Marisa Miller

Dec 14, 2014, 9:25:31 AM
to igv-...@googlegroups.com
Hi Jim,
  It is in the same directory, and the file name is exactly the same as the .fa file but with .fai appended (so .fa.fai). I am loading the file (not a .genome file) from Genomes > Load genome from file.

    Thanks,
     Marisa

Jim Robinson

Dec 14, 2014, 4:19:41 PM
to igv-...@googlegroups.com
There's nothing obvious that would cause that; the size of the file is not an issue. How many sequences are in this genome (chromosomes, contigs, whatever)?

Marisa Miller

Dec 15, 2014, 2:09:19 PM
to igv-...@googlegroups.com
Hi Jim,
  In the fasta file, it looks like there are 10,776,707 (at least based on grep -c ">").
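`grep -c ">"` counts every line containing a `>` character; in a well-formed fasta file only header lines do, so the result is the number of sequences. An equivalent Python sketch that counts lines starting with `>` (slightly stricter than the grep pattern) and streams the file, so memory use stays flat even for a ~10 GB fasta; `count_fasta_records` is a hypothetical name.

```python
import sys


def count_fasta_records(path: str) -> int:
    """Count fasta header lines (lines beginning with '>') without loading the file."""
    count = 0
    with open(path, "rb") as handle:  # binary mode avoids text decoding overhead
        for line in handle:
            if line.startswith(b">"):
                count += 1
    return count


if __name__ == "__main__":
    for fasta in sys.argv[1:]:
        print(fasta, count_fasta_records(fasta))
```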

I have another question for you as well. I have successfully loaded the wheat mitochondrial genome into IGV (http://www.ncbi.nlm.nih.gov/nuccore/81176508), but when I load the .gff file (converted from GenBank to gff), no annotation appears. Do you have any suggestions on how to get the .gff file to load properly?

     Thanks,
       Marisa

Jim Robinson

Dec 15, 2014, 3:31:35 PM
to igv-...@googlegroups.com
That might be the issue. How large is the fai file? IGV wasn't really designed for this many contigs.

Marisa Miller

Dec 15, 2014, 4:47:31 PM
to igv-...@googlegroups.com
The fai file is ~500MB.
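Each line of a `.fai` index describes one sequence in five tab-separated columns: name, length, byte offset, bases per line, and bytes per line. A ~500 MB index covering 10,776,707 sequences works out to roughly 500,000,000 / 10,776,707 ≈ 46 bytes per entry, i.e. the size comes from the sheer number of contigs rather than from long records. A Python sketch that summarises an index; `summarize_fai` is a hypothetical helper.

```python
import sys
from typing import Tuple


def summarize_fai(path: str) -> Tuple[int, int]:
    """Return (number of sequences, total bases) from a samtools-style .fai index.

    Columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH (tab-separated).
    """
    n_sequences = 0
    total_bases = 0
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            n_sequences += 1
            total_bases += int(fields[1])
    return n_sequences, total_bases


if __name__ == "__main__":
    for fai in sys.argv[1:]:
        count, bases = summarize_fai(fai)
        print(f"{fai}: {count} sequences, {bases} bases")
```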

Jim Robinson

Dec 15, 2014, 4:49:17 PM
to igv-...@googlegroups.com
That's huge. Again, IGV isn't designed for this case; however, other people have used it with some success. I suggest you start IGV with as much memory as possible. How are you launching it now?

Jim
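IGV runs on the Java virtual machine, and the maximum heap it may use is set with the standard JVM `-Xmx` option. A sketch of building and launching such a command from Python; the jar location `igv.jar`, and the assumption that your IGV release can be started with `java -jar`, are hypothetical and depend on the installed version (releases also ship launcher scripts such as `igv.sh` that set `-Xmx` themselves).

```python
import subprocess
from typing import List


def build_igv_command(jar_path: str, heap_gb: int) -> List[str]:
    """Build a java command line that starts IGV with a maximum heap of heap_gb GB."""
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    return ["java", f"-Xmx{heap_gb}g", "-jar", jar_path]


def launch_igv(jar_path: str, heap_gb: int) -> subprocess.Popen:
    """Start IGV in the background (assumes `java` is on PATH)."""
    return subprocess.Popen(build_igv_command(jar_path, heap_gb))


if __name__ == "__main__":
    # Only print the command; "igv.jar" is a placeholder path.
    print(" ".join(build_igv_command("igv.jar", 10)))
```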

Marisa Miller

Dec 15, 2014, 4:59:59 PM
to igv-...@googlegroups.com
I am on a Mac, so I have tried both the Mac app and the "launch with 10 GB" Java Web Start option. I have 16 GB of total memory on my computer, so I'm not sure it will be enough for this task.

   Thanks,
    Marisa

Jim Robinson

Dec 15, 2014, 5:04:06 PM
to igv-...@googlegroups.com
The "launch with 10 GB" option should definitely provide more than enough memory to load the fasta.

I am a bit confused, however. The genome referenced below has a single sequence; is this what you are trying to load?


> ....  mitochondrial genome into IGV (http://www.ncbi.nlm.nih.gov/nuccore/81176508),

Marisa Miller

Dec 15, 2014, 5:24:14 PM
to igv-...@googlegroups.com
Hi Jim,
  I am working with both the nuclear and mt genomes of wheat. The one you linked below is a different genome than the large nuclear one (ftp://ftpmips.helmholtz-muenchen.de/plants/wheat/IWGSC/genome_assembly/genome_arm_assemblies_CLEANED/).

The question I asked a couple of messages ago was about loading the .gff file for the mt genome. When I convert the GenBank file for the mt genome into a .gff file and try to load it, I am still not able to see any of the annotation in IGV.

   Thanks,
     Marisa

Jim Robinson

Dec 15, 2014, 6:52:19 PM
to igv-...@googlegroups.com
OK. Re the mt genome, just use the GenBank file (extension .gbk; rename if necessary). It includes both sequence and annotation. Load it from the Genomes menu.
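Following that advice, a GenBank file saved under another extension (for example `.gb` or `.txt`) would be renamed to `.gbk` before loading it from the Genomes menu. A minimal Python sketch using `pathlib`; `ensure_gbk_extension` is a hypothetical helper, and the `LOCUS` check only confirms the file looks like a GenBank flat file (whose first line starts with `LOCUS`).

```python
from pathlib import Path


def ensure_gbk_extension(path: str) -> Path:
    """Give a GenBank flat file a .gbk extension and return its (possibly new) path."""
    source = Path(path)
    with source.open() as handle:
        first_line = handle.readline()
    if not first_line.startswith("LOCUS"):
        raise ValueError(f"{source} does not look like a GenBank file (no LOCUS line)")
    if source.suffix.lower() == ".gbk":
        return source  # already named correctly
    target = source.with_suffix(".gbk")
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    source.rename(target)
    return target
```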

Re the wheat assembly, IGV might not be the right tool for unfinished assemblies like this one with that many contigs.

Jim


Marisa Miller

Feb 25, 2015, 12:42:39 PM
to igv-...@googlegroups.com
Hi Jim,

Sorry to reply to the same e-mail after quite a while, but I am just getting back to using IGV and have run into an issue.

When I load a .gbk mitochondrial genome from the Genomes menu (I do not create a .genome file) and then try to load a sorted and indexed BAM file, I get the error "1_cs_mt.ab.sorted.bam does not contain any sequence names which match the current genome." I know this can be fixed by placing an alias file in users/admin/igv/genomes, which I have done. However, this is not fixing the issue.

Do you have any recommendations for how to fix this problem? I have attached all the relevant files here for your information.

Thank you,
Marisa


Attached files:

1_cs_mt.ab.sorted.bam.bai
tr_aestivum_cs_mt_NC_007579.gbk
tr_aestivum_mt_alias.tab
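For reference, an IGV alias file is, as far as the format goes, a tab-delimited text file in which each line groups names that should be treated as the same sequence (for example the name in the genome and the name in the BAM header). A Python sketch that writes such lines; `write_alias_file` is hypothetical and the names in the usage example are placeholders, not values taken from the attached files. It shows only the file format, not where IGV expects the file to live.

```python
from typing import Iterable, Sequence


def write_alias_file(path: str, groups: Iterable[Sequence[str]]) -> None:
    """Write one tab-separated line per group of equivalent sequence names."""
    with open(path, "w") as handle:
        for names in groups:
            if len(names) < 2:
                raise ValueError("each alias line needs at least two names")
            if any("\t" in name or "\n" in name for name in names):
                raise ValueError("sequence names must not contain tabs or newlines")
            handle.write("\t".join(names) + "\n")


if __name__ == "__main__":
    # Placeholder names: genome sequence name, then the name used in the BAM.
    write_alias_file("example_alias.tab", [("NC_007579", "bam_name")])
```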

Jim Robinson

Mar 4, 2015, 10:10:22 AM
to igv-...@googlegroups.com
Change the sequence name in your gbk file to match the name in your bam file.

Jim
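To apply this fix you need the reference names stored in the BAM header (for example from `samtools view -H`, whose `@SQ` lines carry each name in the `SN:` field) and then the matching name in the GenBank file. A Python sketch operating on text, assuming IGV takes the sequence name from the second field of the GenBank `LOCUS` line (an assumption; check which name IGV displays after loading). `bam_reference_names` and `rename_locus` are hypothetical helpers.

```python
import re
from typing import List


def bam_reference_names(sam_header: str) -> List[str]:
    """Extract reference names from the @SQ lines of a SAM/BAM header."""
    names: List[str] = []
    for line in sam_header.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def rename_locus(genbank_text: str, new_name: str) -> str:
    """Replace the name on every LOCUS line, keeping the rest of each line unchanged.

    A name longer than the original shifts the remaining columns, which
    strict fixed-column GenBank parsers may reject.
    """
    return re.sub(
        r"^(LOCUS\s+)\S+",
        lambda match: match.group(1) + new_name,  # lambda avoids escape handling in new_name
        genbank_text,
        flags=re.MULTILINE,
    )
```

Usage would be to run `samtools view -H` on the BAM, pass that text to `bam_reference_names`, and write the result of `rename_locus` to a new `.gbk` file before reloading it.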
