bcbio Java error, specific to gatk-framework


Tanner Koomar

May 24, 2017, 12:49:21 PM
to biovalidation
I have a bit of a mystery that I'm hoping someone can help me diagnose. TL;DR: the wrong version of Java is being invoked for `gatk-framework` on compute nodes (in both interactive sessions and qsub jobs) of a new HPC cluster. However, bcbio-nextgen calls to `gatk` use the correct Java version. Strangest of all, the correct Java is used for both `gatk` and `gatk-framework` on login nodes.

Details: 
While testing bcbio-nextgen on a new HPC cluster, I was surprised to run into this java version error: 
CalledProcessError: Command 'set -o pipefail; unset JAVA_HOME && export PATH=/shared/wdata/bcbio/anaconda/bin:$PATH && /shared/wdata/bcbio/anaconda/bin/gatk-framework -Xms1500m -Xmx22932m -XX:+UseSerialGC -Djava.io.tmpdir=/shared/genome/SLI_WGS/joint/work/bcbiotx/tmpxQ_6NS -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T CombineVariants -R /shared/wdata/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa --out /shared/genome/SLI_WGS/joint/work/bcbiotx/tmpxQ_6NS/JOINT-joint-effects-combined.vcf.gz --variant:v0 /shared/genome/SLI_WGS/joint/work/joint/gatk-haplotype-joint/JOINT/JOINT-joint-effects-snp-SNPfilter.vcf.gz --variant:v1 /shared/genome/SLI_WGS/joint/work/joint/gatk-haplotype-joint/JOINT/JOINT-joint-effects-indel-INDELfilter.vcf.gz --rod_priority_list v0,v1 --genotypemergeoption PRIORITIZE --suppressCommandLineHeader --setKey null -nt 2
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/broadinstitute/gatk/engine/CommandLineGATK : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:803)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:442)
at java.net.URLClassLoader.access$100(URLClassLoader.java:64)
at java.net.URLClassLoader$1.run(URLClassLoader.java:354)
at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
' returned non-zero exit status 1
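
For readers decoding the message above: `Unsupported major.minor version 52.0` means the classes were compiled for Java 8 but were loaded by an older JVM. A minimal sketch of the mapping, assuming only the standard class-file numbering (the function name `class_major_to_java` is made up for illustration):

```shell
# Map a class-file major version (the "52" in "major.minor version 52.0")
# to the Java release that produces it. Major 45 corresponds to Java 1.x and
# each later release adds one, so release = major - 44 (49 -> 5, 51 -> 7, 52 -> 8).
class_major_to_java() {
    echo $(( $1 - 44 ))
}

class_major_to_java 52   # prints 8: the GATK classes need a Java 8 runtime
class_major_to_java 51   # prints 7
```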

I made sure that bcbio's Java was the first in my path:
[login]$ which java
/usr/bin/java

[login]$ ln -s /shared/wdata/bcbio/anaconda/bin/java /shared/wdata/bcbio/bin/java

[login]$ which java
/shared/wdata/bcbio/bin/java

[login]$ java -version
openjdk version "1.8.0_102"
OpenJDK Runtime Environment (Zulu 8.17.0.3-linux64) (build 1.8.0_102-b14)
OpenJDK 64-Bit Server VM (Zulu 8.17.0.3-linux64) (build 25.102-b14, mixed mode)

Got the same error. Noticing that `gatk` and `gatk-framework` utilize different versions (3.7 vs 3.6), I thought to try `gatk` instead (`/shared/wdata/bcbio/anaconda/bin/gatk-framework` -> `/shared/wdata/bcbio/anaconda/bin/gatk`), and submit that command as a standalone job: 
INFO  11:20:47,472 HelpFormatter - --------------------------------------------------------------------------------- 
INFO  11:20:47,482 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18 
INFO  11:20:47,482 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
. . . etc. etc. etc. 

So the problem is in `gatk-framework` specifically, but when I test both commands directly, there is no Java error:
[login]$ gatk -version
3.7-0-gcfedb67
[login]$ gatk-framework -version
3.6-24-g59fd391

After an embarrassingly long time, an update to the newest development version of bcbio, and reinstalling `gatk-framework` as described here (https://github.com/chapmanb/bcbio-nextgen/issues/1524#issuecomment-241426147), I finally thought to try again by logging into a compute node with an interactive session:
[qlogin]$ gatk -version
3.7-0-gcfedb67
[qlogin]$ gatk-framework -version
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/broadinstitute/gatk/engine/CommandLineGATK : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:803)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:442)
at java.net.URLClassLoader.access$100(URLClassLoader.java:64)
at java.net.URLClassLoader$1.run(URLClassLoader.java:354)
at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)


Does the compute node honor my path's Java? It appears so:
[qlogin]$ which java
/shared/wdata/bcbio/bin/java

[qlogin]$ java -version
openjdk version "1.8.0_102"
OpenJDK Runtime Environment (Zulu 8.17.0.3-linux64) (build 1.8.0_102-b14)
OpenJDK 64-Bit Server VM (Zulu 8.17.0.3-linux64) (build 25.102-b14, mixed mode)


Which leaves me perplexed. I assume the problem is with the HPC environment, not bcbio or `gatk-framework` -- but I'm at a loss for next steps to diagnose or rectify the situation. Any insights on what to try next would be helpful.

Brad Chapman

May 24, 2017, 4:32:48 PM
to Tanner Koomar, biovalidation

Tanner;
Thanks for the detailed report and sorry for all the problems. We've been
trying to make the Java usage as easy as possible by shipping java with bcbio
but apparently are still running into issues.

The current behavior for both gatk-framework (at 3.6) and gatk (at 3.7) is to
use the java installed with anaconda by default. The only exception is that if
you have `JAVA_HOME` set (check with `echo $JAVA_HOME`) it will use that Java.
It doesn't use the java on your PATH at all. Is it possible that you have
different settings for JAVA_HOME on your login and compute nodes that explain
the differences when running from the commandline?
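
One way to run that comparison is to execute the same small report on a login node and on a compute node and compare the output. This is a rough sketch, not part of bcbio; the function name `report_java_env` is made up for illustration:

```shell
# Print the Java-related settings of the current shell so that output from a
# login node and a compute node can be compared line by line.
report_java_env() {
    echo "host=$(uname -n)"
    # Show a visible placeholder instead of a blank line when JAVA_HOME is unset or empty.
    echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
    # First java found on PATH, if any.
    echo "java_on_path=$(command -v java || echo '<none>')"
}

report_java_env
```

Running it inside a qsub job as well as an interactive session would show whether the batch environment differs from both.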

For bcbio, we unset JAVA_HOME so shouldn't run into that issue. For your
initial error -- was that from an older version of gatk-framework from last
year? If so, I believe the anaconda java issue was only fixed around July of
last year. You should have 3.6.24_1 which has this fix:

$ bcbio_conda list | grep gatk-framework
gatk-framework 3.6.24 1 bioconda

Hope that helps explain what is going on and gets things running cleanly for
you. Please let us know if you are still having issues,
Brad


Tanner Koomar

May 24, 2017, 5:13:40 PM
to biovalidation, tanner...@gmail.com
Thank you for the (as always) quick reply, Brad. I should have included in my original post that I did check the `$JAVA_HOME` on both the login and compute node, and it is not set:
[login]$ echo $JAVA_HOME


[qlogin]$ echo $JAVA_HOME



Similarly, `gatk-framework` looks up to date:
[login]$ bcbio_conda list | grep gatk-framework 
gatk-framework            3.6.24                        1    bioconda

[qlogin]$ bcbio_conda list | grep gatk-framework 
gatk-framework            3.6.24                        1    bioconda


I suppose this leaves me with two questions:
1) Would it be safe/recommended to replace calls to `gatk-framework` with calls to `gatk` (since it is apparently unaffected)?
2) Is there a way to add some debugging to `gatk-framework` to determine which Java version it is attempting to use?
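
For question 2, one option that does not require editing the installed file is to trace the wrapper with `bash -x`. This sketch assumes `gatk-framework` is a bash wrapper script; the helper name `trace_java` is made up for illustration:

```shell
# List the java-related lines from a bash wrapper's execution trace.
# Usage: trace_java <path-to-wrapper> [wrapper arguments...]
trace_java() {
    wrapper="$1"
    shift
    # bash -x writes each executed command to stderr prefixed with "+";
    # merge stderr into stdout and keep only the lines that mention java.
    bash -x "$wrapper" "$@" 2>&1 | grep -i 'java'
}

# Example (needs gatk-framework on PATH):
#   trace_java "$(command -v gatk-framework)" -version
```

The trace should include the variable assignment or command line that picks the java binary, which can then be compared between login and compute nodes.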

Brad Chapman

May 25, 2017, 6:21:30 AM
to Tanner Koomar, biovalidation, tanner...@gmail.com

Tanner;
Thank you for following up and the additional details. I'm not sure what is
going on and the best way might be to manually debug what Java gets resolved.
If you edit the gatk-framework shell wrapper script to add `echo $java` here:

https://github.com/bioconda/bioconda-recipes/blob/master/recipes/gatk-framework/gatk-framework#L28

That should let us know which java it's using and we can try to debug more.

We can't replace gatk-framework with gatk right now, unfortunately, since the
licensing on those is different. This will change when we integrate the new
GATK4 which has one consistent open source license across all tools.

Hope the debugging helps us identify what is going on. Thanks for the help
digging into this,
Brad


Tanner Koomar

May 25, 2017, 6:42:36 PM
to biovalidation, tanner...@gmail.com
This truly is odd. After adding a `print()` statement to `gatk` and an `echo` to `gatk-framework`, it looks like `gatk-framework` is never using the proper Java version -- but it only throws the error on the compute node:

[login]$ gatk -version
/shared/wdata/bcbio/anaconda/bin/java
3.7-0-gcfedb67

[login]$ gatk-framework -version
/bin/java
3.6-24-g59fd391

[qlogin]$ gatk -version
/shared/wdata/bcbio/anaconda/bin/java
3.7-0-gcfedb67

[qlogin]$ gatk-framework -version
/bin/java

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/broadinstitute/gatk/engine/CommandLineGATK : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:803)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:442)
at java.net.URLClassLoader.access$100(URLClassLoader.java:64)
at java.net.URLClassLoader$1.run(URLClassLoader.java:354)
at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)



I really appreciate the help, but I also don't want my anomalous situation to be a distraction. I have gone back to running the pipeline on the older HPC cluster, which is still working fine. If I have to wait until GATK 4 is integrated into bcbio, it will not be the end of the world (in fact, my colleagues will probably appreciate not having to compete with my jobs on the newer cluster).

Brad Chapman

May 26, 2017, 9:24:39 AM
to Tanner Koomar, biovalidation, tanner...@gmail.com

Tanner;
Thanks much for following up with these details. This helped me identify the
problem in the gatk-framework bash script. It was checking for the existence
of `$JAVA_HOME/bin/java` to determine whether to use a non-anaconda installed
java. Since $JAVA_HOME was unset, this led to checking `/bin/java`, finding it
on your machines and using it. `/bin/java` is not a standard java install
location, which is likely why we never hit it before. You're probably seeing
the difference between the login nodes and compute nodes because they have
different versions of java installed (Java 8 on the login node, Java 7 on the
compute nodes).
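
The failure mode can be reproduced in a few lines. This is a sketch of the pattern only, not the actual wrapper code or the actual fix; both function names are made up for illustration:

```shell
# Unguarded pattern: with JAVA_HOME unset, "$JAVA_HOME/bin/java" expands to
# "/bin/java", so an existence check silently tests a system path.
buggy_java_candidate() {
    echo "${JAVA_HOME}/bin/java"
}

# Guarded pattern: consult JAVA_HOME only when it is set and non-empty;
# otherwise fall back to the bundled java passed as the first argument.
guarded_java_candidate() {
    if [ -n "${JAVA_HOME:-}" ] && [ -x "${JAVA_HOME}/bin/java" ]; then
        echo "${JAVA_HOME}/bin/java"
    else
        echo "$1"
    fi
}

unset JAVA_HOME
buggy_java_candidate                                          # prints /bin/java
guarded_java_candidate /shared/wdata/bcbio/anaconda/bin/java  # prints the bundled path
```

On a machine with no `/bin/java` the unguarded check fails and the wrapper falls through to the bundled Java, which would explain why the problem stayed hidden elsewhere.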

I pushed a fix to the wrapper script to avoid this, so if you update bcbio
tools or do:

bcbio_conda install -c conda-forge -c bioconda gatk-framework

It should grab the new version and now work cleanly. Thanks again for the help
tracking down the problem,
Brad
