Genbank .gbk file gives error when loading


WaltL

Sep 19, 2013, 4:33:23 PM
to igv-...@googlegroups.com
I have tried to load .gbk files with different IGV versions... so far no luck.

1)  I am using the Linux distribution of IGV on an Ubuntu (12.04 LTS) box, and I have tried this with IGV versions 2.3.11 and 2.3.18
2)  I am trying to load a .gbk file for the MTB strain H37Rv, specifically NC_000962.3
3)  I obtained 2 different .gbk files: the first from the above NCBI fasta accession page by selecting "Genbank full" and saving the file locally, (13.4 MB file), and the second from the NCBI ftp server (NC_000962.gbk, 17.1MB)... not positive this is for an identical accession, i.e. ...2.3.gbk
4) Using the "Load genome from file" route, both files were tried, and both IGV versions give the same error message:

Using the first file, I get:
ERROR [2013-09-19 16:01:22,086]  [MessageUtils.java:60] [AWT-EventQueue-0]  For input string: "order3593369"
Using the second file, I get:
ERROR [2013-09-19 16:03:32,216]  [MessageUtils.java:60] [AWT-EventQueue-0]  For input string: "order625"

5)  I am able to load the fasta and gff with no issues, so there's the workaround, but I am curious as to why the supported .gbk format will not load.

I'll be happy to provide the Java console output for these errors, if that would help.

Thanks,

Walt
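
Both input strings in the errors above read like the location keyword `order` fused to a coordinate (`order` + `625`), which suggests the failing features are those whose location starts with `order(`. A minimal Python sketch for listing such feature lines in a GenBank flat file follows. The function name `find_operator_features` and the sample record are made up for illustration, and the sketch assumes the standard flat-file layout (feature keys indented five spaces, between the `FEATURES` and `ORIGIN` lines).

```python
import re

# A feature key line: exactly five spaces, the key, spaces, then the location.
# Qualifier lines are indented further, so they do not match.
FEATURE_LINE = re.compile(r"^ {5}(\S+) +(\S.*)$")


def find_operator_features(lines, operator="order"):
    """Yield (line_number, feature_key, location) for every feature whose
    location starts with `operator(`. Only the FEATURES section is scanned."""
    in_features = False
    for number, line in enumerate(lines, start=1):
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
            continue
        if not in_features:
            continue
        match = FEATURE_LINE.match(line.rstrip("\n"))
        if match and match.group(2).startswith(operator + "("):
            yield number, match.group(1), match.group(2)


# Made-up miniature record, for illustration only.
sample = [
    "LOCUS       EXAMPLE                 1000 bp    DNA     linear",
    "FEATURES             Location/Qualifiers",
    "     gene            1..100",
    "     misc_feature    order(625..648,811..813,910..912)",
    '                     /note="made-up qualifier"',
    "ORIGIN",
    "        1 acgtacgtac",
    "//",
]

hits = list(find_operator_features(sample))
for number, key, location in hits:
    print(number, key, location)
```

To scan a real file, pass an open file handle instead of the list, for example `find_operator_features(open("NC_000962.gbk"))`.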


 

Jim Robinson

Sep 19, 2013, 8:17:15 PM
to igv-...@googlegroups.com
Could you give us the full ftp path to these .gbk files?

Thanks

Jim

WaltL

Sep 20, 2013, 8:55:20 AM
to igv-...@googlegroups.com

Jim-


The gbk I downloaded from the accession page (the specific/latest genome version that I now want to use) is here: http://www.ncbi.nlm.nih.gov/nuccore/NC_000962.3

Thanks,

Walt 
 

Jim Robinson

Sep 25, 2013, 4:38:48 PM
to igv-...@googlegroups.com
Hi Walt,

Thanks for the test file.  The problem was with IGV's treatment of the "order" operator,  as in 
   
     misc_feature    order(625..648,811..813,910..912)

We are now treating "order" as a synonym for "join" for display purposes.    This change will be reflected in tonight's build, and also in the next bug-fix release at the end of the week.

-- Jim
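
To illustrate what treating "order" as a synonym for "join" means, here is a minimal Python sketch of a location parser. It is not IGV's code (IGV is written in Java); the names `parse_location` and `SEGMENT` are made up for this example, and only simple locations are handled: plain ranges, single bases, `join(...)`, `order(...)`, and an outer `complement(...)`. Anything else raises an error rather than being guessed at.

```python
import re

# One segment: "625..648", a single base "700", optionally with < or > markers.
SEGMENT = re.compile(r"^[<>]?(\d+)(?:\.\.[<>]?(\d+))?$")


def parse_location(location):
    """Return (strand, [(start, end), ...]) for a simple GenBank location.

    Coordinates stay 1-based and inclusive, as written in the file.
    `order(...)` is handled exactly like `join(...)`.
    """
    text = location.strip()
    strand = "+"
    if text.startswith("complement(") and text.endswith(")"):
        strand = "-"
        text = text[len("complement("):-1]
    for operator in ("join(", "order("):
        if text.startswith(operator) and text.endswith(")"):
            text = text[len(operator):-1]
            break
    segments = []
    for part in text.split(","):
        match = SEGMENT.match(part.strip())
        if match is None:
            raise ValueError(f"unsupported location segment: {part!r}")
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        segments.append((start, end))
    return strand, segments


print(parse_location("order(625..648,811..813,910..912)"))
```

A parser that only knows `join` and then tries to read the leftover text as an integer would fail on a string such as `order625`, which matches the shape of the reported error.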

WaltL

Sep 26, 2013, 12:35:09 PM
to igv-...@googlegroups.com
Thanks, Jim, for the explanation and quick resolution.  Much appreciated!

Best,

Walt 

WaltL

Dec 11, 2014, 5:53:09 PM
to igv-...@googlegroups.com
Hi Jim,

I haven't worked on this project in a while and I'm having a slightly related problem, so I figured I'd just add to this thread.  I am currently running Xubuntu 14.04 LTS and IGV ver. 2.3.36 (44).  When I first started this post, I was variant calling with GATK against the H37Rv genome above.  After the .gbk issue was fixed last year I wrapped things up and moved on. I recently revisited this same dataset and noticed that if I loaded the .gbk file followed by my .vcf file, no variants were shown and I got a "no variants found" message. This was definitely not the case the last time I worked with these same files.

To make sure nothing had been corrupted, I downloaded the .gb file again from NCBI (as an aside, I guess most users know that the NCBI extension .gb must be changed to .gbk in order for it to load in IGV).  This loaded fine, but I still couldn't see any variants.  However, if I load the .fasta and the .vcf, I can see the variants, as expected.  If I load the fasta, gff and vcf, I see variants and their cognate genes.  If I create a .genome file using that same fasta and gff, I can load it and see the variants, but now the gene field is empty, i.e. it says "gene" rather than "annotations" on the sidebar and the field isn't populated.

So, I'm curious about this behavior since I used to be able to load the .gbk and .vcf and voila... I could see everything!  I have 2 other, older variant projects where this same problem occurs.  There's obviously a workaround (I load 3 files), but I was wondering if you had any thoughts as to what may have changed in these later IGV versions that would cause this, and how I might go about fixing/testing the issue.

Best,
Walt

Jim Robinson

Dec 11, 2014, 6:13:23 PM
to igv-...@googlegroups.com
Hi Walt,

All of these issues sound like mismatches in sequence names.  Are you sure the sequence names in the .gb (I didn't realize the extension had been changed) are exactly the same as those in the fasta and vcf files?  You should be able to verify this visually in IGV by just examining the names in the chromosome/sequence dropdown.

I'll investigate if you can send me links to the gbk and fasta files, and a small sample of your vcf.   You can send the vcf sample privately to igv-...@broadinstitute.org

Jim
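
The name check suggested above can also be done outside IGV. The following Python sketch collects candidate sequence names from each file type and reports VCF names that have no exact match. All function names and sample lines are made up for illustration, including the fake `gi|000000000|...` identifier. The thread does not say which GenBank header field IGV uses as the sequence name, so the sketch simply gathers the LOCUS, ACCESSION and VERSION values.

```python
def genbank_names(lines):
    """Collect the first value on each LOCUS, ACCESSION and VERSION line."""
    names = set()
    for line in lines:
        if line.startswith(("LOCUS", "ACCESSION", "VERSION")):
            fields = line.split()
            if len(fields) >= 2:
                names.add(fields[1])
    return names


def fasta_names(lines):
    """Return the first whitespace-delimited token of each FASTA header."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def vcf_names(lines):
    """Return the distinct CHROM values of a VCF, in order of appearance."""
    names = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        if chrom not in names:
            names.append(chrom)
    return names


def missing_from(reference_names, vcf_chroms):
    """List VCF sequence names with no exact match in the reference."""
    return [name for name in vcf_chroms if name not in reference_names]


# Made-up sample lines, for illustration only.
gbk_lines = [
    "LOCUS       NC_000962",
    "ACCESSION   NC_000962",
    "VERSION     NC_000962.3",
]
fasta_lines = [">gi|000000000|ref|NC_000962.3| made-up description", "ACGT"]
vcf_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "gi|000000000|ref|NC_000962.3|\t100\t.\tA\tG",
]

chroms = vcf_names(vcf_lines)
print("not in gbk:  ", missing_from(genbank_names(gbk_lines), chroms))
print("not in fasta:", missing_from(fasta_names(fasta_lines), chroms))
```

In this made-up case the VCF name matches the FASTA header but none of the GenBank header values. That is one way a .vcf could display against a fasta but not against a .gbk; it is an illustration of the kind of mismatch meant here, not a diagnosis of the files in this thread.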
