Could Not Load Hsdis-amd64.dll Library Not Loadable Printassembly Is Disabled

Consuela Ellett

Aug 3, 2024, 11:59:47 AM8/3/24
to inparawea

I wrote a simple Java program and I am getting a fatal error: Could not load hsdis-amd64.dylib; library not loadable; PrintAssembly is disabled. I tried to read several blog posts but did not find a solution. Can anyone please help me? Here is the complete error message:

The PrintAssembly HotSpot option is a diagnostic JVM flag that lets us capture the assembly instructions generated by the JIT compiler. This requires the latest OpenJDK release or a newer version of HotSpot, update 14 or above.

Java HotSpot(TM) 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled

You can see that the places and names you were trying are definitely among those the JDK searches (in my case, it probably would have searched more places, but stopped since the last location above is where it found the shared object).

Hey,
I am using Axon in my project. Today, when I restarted the Axon service, I was prompted with an OpenJDK-related error: Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled. How should I solve it?
Help will be appreciated,
Regards

So you are packaging and running your application in Docker.
I forgot to ask which version of the framework you are using, and whether you are connecting your application to an AxonServer instance or running on another FOSS stack.


This article briefly explains the purpose of vectorization, how it currently works in Java, and how to check whether it is applied in a Java program. This knowledge can be turned into top-notch performance optimizations for arithmetic algorithms.
These techniques are low-level and suitable for special cases only. If you have a standard Java program that you want to optimize for performance, you should first apply the other optimization techniques available. Only if you have already optimized your Java code with those techniques, profiled it afterwards, and concluded that a part focused on arithmetic calculations might run even faster with parallelization, is this article likely to be useful to you.

Typically, a program's code is executed serially: individual commands or statements are executed in sequence, one after another. Arithmetically focused programs are programs that do lots of calculations on numbers. Usually, these programs process big amounts of data, and many pieces of information get processed in the same way, one after another. E.g., a simulation of 1000 particles will have a step which updates the position s of each particle with its current velocity v: s = s + v. This has to be done for every particle, i.e. for an array of particles s[0] to s[999].
If you can combine a few particles into a batch and process the batch in one go, this speeds things up. For the above example and a group of 4 particles, this means grouping the calculations as follows:
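The per-particle update described above can be sketched as a plain scalar loop (the array names s and v follow the text; the class name is illustrative):

```java
public class Particles {
    // Scalar update: one particle position per loop iteration.
    // s[i] is the position, v[i] the velocity of particle i.
    static void updatePositions(float[] s, float[] v) {
        for (int i = 0; i < s.length; i++) {
            s[i] = s[i] + v[i];
        }
    }
}
```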
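A sketch of that grouping in Java (the class name is illustrative): each batch of 4 additions is independent, which is exactly the shape a SIMD instruction can handle in one go.

```java
public class BatchedParticles {
    // Process the positions in batches of 4. Written out like this, the
    // four additions per batch are independent of each other and map onto
    // one SIMD addition over 4 packed floats.
    static void updatePositions(float[] s, float[] v) {
        int i = 0;
        for (; i + 4 <= s.length; i += 4) {
            s[i]     = s[i]     + v[i];
            s[i + 1] = s[i + 1] + v[i + 1];
            s[i + 2] = s[i + 2] + v[i + 2];
            s[i + 3] = s[i + 3] + v[i + 3];
        }
        // Tail: remaining particles when the length is not a multiple of 4.
        for (; i < s.length; i++) {
            s[i] = s[i] + v[i];
        }
    }
}
```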

More generally speaking, this performs one kind of operation on multiple data elements at once. At the level of assembly code there are instructions specifically for these grouped operations. Therefore, this concept is called single instruction, multiple data, abbreviated SIMD.

The SIMD instruction for + is called addps (SSE instruction set) or vaddps (AVX instruction set) on x86 CPUs. It takes two groups as operands where each group has either 4 elements (SSE) or 8 elements (AVX). It adds each element of one group to the corresponding element of the other group. In the above example, s[0..3] is one group and v[0..3] is the other group. The resulting x86 assembly code is:
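The original assembly listing is not preserved in this copy; a minimal sketch of the grouped addition s[0..3] += v[0..3] could look like the following (register choice and addressing are illustrative, Intel syntax):

```nasm
vmovups xmm0, [rsi]       ; load s[0..3], 4 packed single-precision floats
vmovups xmm1, [rdx]       ; load v[0..3]
vaddps  xmm0, xmm0, xmm1  ; add all 4 pairs in one instruction
vmovups [rsi], xmm0       ; store the result back to s[0..3]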

SIMD is the name of the concept from the perspective of the instruction designers, namely the CPU manufacturers. But that's not the only perspective. In mathematics, ordered groups of a fixed number of elements (such as s[0..3] and v[0..3]) are called vectors. Therefore, SIMD instructions are also called vector instructions. This is just another perspective on the same thing, this time from the users of the instructions.

Vectorization is the use of vector instructions to speed up program execution. Vectorization can be done by a programmer, or the opportunities for vectorization can be recognized and realized automatically by a compiler. In the latter case it's called auto vectorization.

After you write a Java program, the Java source code in .java files gets compiled to bytecode and saved in .class files. Then, before or during the execution of the program, the bytecode is usually compiled again, this time from bytecode to native machine code. This latter compilation usually happens while the program is executing, hence it is a just-in-time (JIT) compilation.

In Java, vectorization is currently not done by the programmer but automatically by the compiler. The compiler takes standard Java bytecode and automatically determines which parts can be transformed into vector instructions. Common Java environments like OpenJDK or Oracle's JDK can produce vectorized machine code.

In Snippet 1, the statement a[i] = a[i] * a[i]; is executed on many consecutive elements of the array a. The compiler can detect this and, instead of performing every * individually, use a vector instruction to calculate multiple results at once.
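Snippet 1 itself is not reproduced in this copy; based on the description, it is a loop of this shape (the array name a and the statement are taken from the text, the surrounding method and class are a sketch):

```java
public class Snippet1 {
    // Squares each element of a in place. The multiplications in
    // consecutive iterations are independent of each other, so the JIT
    // compiler can auto-vectorize this loop.
    static void square(float[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = a[i] * a[i];
        }
    }
}
```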

To see the generated vector instructions as assembly code, we first have to create a compilable and runnable Java program which can benefit from vector instructions. For this, take the above for loop and put it into a Java file, into a square(...) method, along with a main(...) method. Write the code so that square(...) is executed a million times, or at least a few hundred thousand times. This convinces the compiler that square(...) is a method worth optimizing to the fullest; square(...) is then said to be "running hot" or to contain a "hot loop". This running hot is achieved by the for loop in main(...). So we have two loops, one in main(...) and one in square(...). The hot loop is the one in square(...).
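A minimal version of such a program could look like this (class name and array size are illustrative; the point is only that main(...) invokes square(...) often enough to trigger full JIT optimization):

```java
public class VectorizationDemo {

    // The method containing the hot loop that we want vectorized.
    static void square(float[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = a[i] * a[i];
        }
    }

    public static void main(String[] args) {
        float[] a = new float[1024];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        // Call square(...) a million times so the JIT compiler treats it as
        // hot and eventually compiles it with all optimizations switched on,
        // including auto vectorization. (The values overflow to Infinity
        // after a few iterations; that is irrelevant for this demo.)
        for (int n = 0; n < 1_000_000; n++) {
            square(a);
        }
        System.out.println(a[1]);
    }
}
```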

Step 7 prints lots of information to the console, part of it being the disassembled native machine code. If you see lots of messages but no assembly instructions like mov, push, add, etc., you may find the following message somewhere in the output:
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
If you see this message, it means that Java couldn't find the file hsdis-amd64.so - it's not in the right directory or it doesn't have the right name. On Linux, this also happens when you create a symlink; Java doesn't accept symlinks here. Instead, you have to copy the file.

hsdis-amd64.so is the disassembler required for showing the resulting native machine code. After the JIT compiler compiles the Java bytecode to native machine code, hsdis-amd64.so is used to disassemble that machine code to make it human-readable. You can find more information on how to get and install it in How to see JIT-compiled code in JVM.
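On Linux, the resulting setup is roughly the following sketch. The target directory varies by JDK version and layout, so the path below is an assumption; the flags themselves are real HotSpot flags, and -XX:+UnlockDiagnosticVMOptions must come first because PrintAssembly is a diagnostic flag. Main is a placeholder class name.

```shell
# Copy (do not symlink!) the disassembler next to libjvm.so.
# The exact directory depends on your JDK layout; this is an example path.
cp hsdis-amd64.so "$JAVA_HOME/lib/server/"

# Unlock diagnostic flags, then enable PrintAssembly.
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Main
```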

After finding assembly instructions in the output, you might be surprised to find not just one version of the assembly code of the square(...) method, but several. This is because the JIT compiler does not fully optimize the method on the first run. After some invocations of the method, it compiles it to native code without optimizations. After more invocations, it compiles the method again with some optimizations, but not all. And only after several thousand invocations is the compiler convinced that the method is so important that it needs to be compiled with all optimizations switched on, including vectorization. So the best compiled version is usually the last one in the output.

vmulss multiplies only one float by another, so this is not what we want. (Here, scalar means just one, and single-precision means 32-bit, i.e. float rather than double.) We instead want an instruction which multiplies many floats by many other floats in one go. So keep looking. You will eventually find this:
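The listing itself is not preserved in this copy; in the output, the instruction looks roughly like this (registers and operands vary between compilations):

```nasm
vmulps ymm0, ymm0, ymm1   ; multiply 8 packed single-precision floats at once
```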

vmulps is a true SIMD instruction (reminder: SIMD = single instruction, multiple data = vectorized instruction). Here, packed means multiple elements packed together in one register. This shows that auto vectorization was applied.
