Steve, I can reproduce the following behaviors:
- When /opt/rocm-4.0.0/lib is in my LD_LIBRARY_PATH, the PAPI
utilities work as expected.
- When /opt/rocm-4.1.1/lib or /opt/rocm-4.2.0/lib is in my
LD_LIBRARY_PATH, the rocm component is disabled with the error
message: ROCM hsa_init() failed with error 4096. (4096 is 0x1000,
the generic HSA_STATUS_ERROR code.)
- When /opt/rocm-4.3.0/lib or /opt/rocm-4.4.0/lib is in my
LD_LIBRARY_PATH, the utilities dump core, just as you
experienced. When the core dump occurs, gdb shows the following
backtrace:
#0 0x0000155554d9c70f in raise () from /lib64/libc.so.6
#1 0x0000155554d86b25 in abort () from /lib64/libc.so.6
#2 0x0000155553d625e3 in ?? () from /cm/local/apps/gcc/9.2.0/lib64/libstdc++.so.6
#3 0x0000155553d6e006 in ?? () from /cm/local/apps/gcc/9.2.0/lib64/libstdc++.so.6
#4 0x0000155553d6e051 in std::terminate() () from /cm/local/apps/gcc/9.2.0/lib64/libstdc++.so.6
#5 0x0000155553d6dffb in std::rethrow_exception(std::__exception_ptr::exception_ptr) () from /cm/local/apps/gcc/9.2.0/lib64/libstdc++.so.6
#6 0x0000155554925326 in rocr::AMD::handleException() () from /opt/rocm-4.4.0/lib/libhsa-runtime64.so
#7 0x0000155554922f10 in rocr::HSA::hsa_init() [clone .cold.46] () from /opt/rocm-4.4.0/lib/libhsa-runtime64.so
#8 0x000000000041a4d3 in _rocm_init_private () at components/rocm/linux-rocm.c:733
#9 0x0000000000403244 in PAPI_get_component_info (cidx=cidx@entry=2) at papi.c:1354
#10 0x00000000004029f8 in main (argc=<optimized out>, argv=<optimized out>) at papi_component_avail.c:115
Because PAPI loads vendor libraries through dlopen (instead of
linking against them at compile/link time), the run-time environment,
here LD_LIBRARY_PATH, matters more than the environment at compile time.
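
To see which ROCm runtime a path-based lookup would likely pick up first, the sketch below walks LD_LIBRARY_PATH in order and reports the first directory that contains libhsa-runtime64.so (the library named in the backtrace). This is a simplified illustration and not PAPI code: the function name first_match is hypothetical, and the real dynamic loader also consults RPATH/RUNPATH, the ld.so cache, and the default system directories.

```python
import os


def first_match(lib_name, search_path):
    """Return the first directory in a colon-separated search path that
    contains lib_name, or None if no directory does.

    Simplification (assumption): only the given path is searched, in
    order. The real loader also checks RPATH/RUNPATH, the ld.so cache,
    and default directories.
    """
    for directory in search_path.split(":"):
        if not directory:
            continue  # ignore empty entries in this sketch
        if os.path.isfile(os.path.join(directory, lib_name)):
            return directory
    return None


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    found = first_match("libhsa-runtime64.so", path)
    print(found or "libhsa-runtime64.so not found on LD_LIBRARY_PATH")
```

Running this before papi_component_avail gives a quick hint as to which /opt/rocm-*/lib directory is first in line for the current shell.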
In summary, rocm-4.0 currently works with PAPI, and we will work with
AMD to address the problems with more recent versions of rocm.
> On Thu, Sep 9, 2021 at 8:50 AM Kaufmann, Steve <steven....@hpe.com> wrote:
>>
>> We are having all sorts of issues with the ROCM component. No doubt this may be more of an AMD ROCM/HIP/HSA issue than anything, but the latest occurs when building and executing with ROCM 4.3.0. Commands such as papi_component_avail and papi_native_avail dump core when run on an MI60 node with:
>>
>> terminate called after throwing an instance of 'rocprofiler::util::exception'
>> what(): OnLoad(), code objects tracking without intercept mode enabled
>>
>> Have you been able to build and test with the latest ROCM? Thanks, Steve