Bug with PAPI on ALCF Theta


Kevin Huck

Jan 15, 2021, 5:40:17 PM
to ptools-perfapi, sup...@alcf.anl.gov
PAPI team (and ALCF help desk) -

I am trying to profile an application (FLASH5) with TAU + PAPI on Theta, using the installed PAPI module (claims to be version 6.0.0.1), but with my own build of TAU (current master branch from git).  When I run on a non-GPU allocation, I get a deadlock during initialization.  I got an interactive allocation, ran under gdb4hpc, and captured the following callstack:


dbg all> bt
a{0..63}: #27 main at /gpfs/mira-home/khuck/src/FLASH5/source/Simulation/Flash.F90:43
a{0..63}: #26 flash at /gpfs/mira-home/khuck/src/FLASH5/source/Simulation/Flash.F90:47
a{0..63}: #25 driver_initparallel at /gpfs/mira-home/khuck/src/FLASH5/source/Driver/DriverMain/Driver_initParallel.F90:87
a{0..63}: #24 mpi_init_ at TauFMpi.c:3161
a{0..63}: #23 MPI_Init at TauMpi.c:2258
a{0..63}: #22 Tau_profile_c_timer at TauCAPI.cpp:2156
a{0..63}: #21 Tau_init_initializeTAU at TauInit.cpp:539
a{0..63}: #20 TauMetrics_init at TauMetrics.cpp:941
a{0..63}: #19 initialize_functionArray at TauMetrics.cpp:674
a{0..63}: #18 PapiLayer::initializePapiLayer at PapiLayer.cpp:647
a{0..63}: #17 PapiLayer::initializePAPI at PapiLayer.cpp:560
a{0..63}: #16 Tau_initialize_papi_library at PapiLayer.cpp:94
a{0..63}: #15 PAPI_library_init at papi.c:1074
a{0..63}: #14 PAPI_library_init at papi.c:1178
a{0..63}: #13 _papi_hwi_init_global at papi_internal.c:1951
a{0..63}: #12 _cray_cuda_init_component at components/cray_cuda/cray_cuda.c:227
a{0..63}: #11 cuInit
a{0..63}: #10 
a{0..63}: #9  
a{0..63}: #8  
a{0..63}: #7  
a{0..63}: #6  
a{0..63}: #5  
a{0..63}: #4  
a{0..63}: #3  
a{0..63}: #2  
a{0..63}: #1  
a{0..63}: #0  waitpid


It would appear that the PAPI module was built with the CUDA component (judging from the papi_component_avail output), but I am not using CUDA.  I am emailing both ALCF and the PAPI team because it seems to me there are two possible solutions:

1) if the node doesn’t have CUDA support, don’t try to initialize the CUDA component (or at least don’t deadlock), or
2) there should be 2 PAPI modules, one for thetaGPU and one for regular ol’ theta.

I built my own PAPI library with vanilla settings (no components) and it worked fine.

Thanks!
Kevin
--
Kevin Huck, PhD
Research Associate / Computer Scientist
OACISS - Oregon Advanced Computing Institute for Science and Society
University of Oregon
kh...@cs.uoregon.edu
http://tau.uoregon.edu

Scott, Robert R.

Jan 20, 2021, 4:57:19 PM
to Kevin Huck, ALCF Support, ptools-perfapi
Hi Kevin,

A subject matter expert is looking at your PAPI problem. You should get a follow-up soon.

Best,

Robert
ALCF User Experience Team

On 1/15/21, 4:41 PM, "Kevin Huck" <kh...@cs.uoregon.edu> wrote:

User info for kh...@cs.uoregon.edu
=================================
Username: khuck
Full Name: Kevin A. Huck
Projects: BGQtools_esp,CSC249ADCD01,EarlyPerf_theta,I12_PEACEndStation,PEACEndStation*,PEACEndStation_2*,TAU,Tools,cca-tools,pilot-khuck
('*' denotes INCITE projects)
=================================

Scott, Robert R.

Jan 27, 2021, 6:55:21 PM
to Kevin Huck, ptools-perfapi, ALCF Support
Hi Kevin,

Could you provide more detailed information about how you built TAU with the existing PAPI, along with a quick reproducer for the deadlock issue? We will be able to investigate the cause once we have that information.

Thanks,

Robert
ALCF User Experience Team

> On Jan 15, 2021, at 4:40 PM, Kevin Huck <kh...@cs.uoregon.edu> wrote:
>
> User info for kh...@cs.uoregon.edu
> =================================
> Username: khuck
> Full Name: Kevin A. Huck
> Projects: BGQtools_esp,CSC249ADCD01,EarlyPerf_theta,I12_PEACEndStation,PEACEndStation*,PEACEndStation_2*,TAU,Tools,cca-tools,pilot-khuck
> ('*' denotes INCITE projects)
> =================================
>
>

Scott, Robert R.

Mar 5, 2021, 4:32:07 PM
to Kevin Huck, ptools-perfapi
Hi Kevin,

I’m just following up on a ticket that seems to have stalled. Could you provide more information about how you built TAU with the existing PAPI, along with a quick reproducer for the deadlock issue?

Thanks,
Robert

John Mellor-Crummey

Mar 5, 2021, 4:45:52 PM
to Scott, Robert R., John Mellor-Crummey, Kevin Huck, ptools-perfapi, Dejan Grubisic, Mark W. Krentel, Xiaozhu Meng
I think this error report from Kevin and the callstack below illustrate a design problem with component PAPI.

My understanding is that when PAPI is compiled with GPU components included, the GPU capabilities are initialized EAGERLY when PAPI is initialized.

At sites that have some systems with GPUs and some without, we would like to have one installation of HPCToolkit that can be used on both. When HPCToolkit is not using a PAPI build with a GPU component, it only turns on capabilities for GPU measurement and loads a GPU tool library if the user indicated on the HPCToolkit command line that GPU performance measurement should be performed in that execution.

The callstack shown below is consistent with my understanding of component PAPI: if a GPU component is enabled when PAPI is built, the GPU component will be initialized  at runtime regardless of whether a particular program execution will make use of a GPU.

Here is what we want to do with HPCToolkit:

We want one build of HPCToolkit that is capable of using GPUs but
(1) doesn’t require that GPUs are present on a node where HPCToolkit is being used
(2) doesn’t require that GPU software (e.g., NVIDIA’s CUDA and CUPTI libraries) are available on a node if HPCToolkit has not been directed to measure GPU performance.

 GPU tool libraries (e.g., CUPTI or roctracer) should only be loaded into a program’s address space if the user requested GPU measurements.
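
As a rough illustration of the kind of on-demand loading I have in mind (a sketch, not HPCToolkit's actual code; the library name and the measure_gpu flag are placeholders):

#include <dlfcn.h>
#include <stdio.h>

/* Load a GPU tool library only if GPU measurement was requested on the
 * tool's command line.  On non-GPU nodes, or when GPU measurement is off,
 * the GPU runtime is never touched. */
static void *maybe_load_gpu_support(int measure_gpu)
{
    if (!measure_gpu)
        return NULL;    /* GPU measurement not requested */
    void *handle = dlopen("libcupti.so", RTLD_NOW | RTLD_GLOBAL);
    if (!handle)
        fprintf(stderr, "GPU measurement requested but the library is unavailable: %s\n", dlerror());
    return handle;
}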

I agree with Kevin’s request:

1) if the node doesn’t have CUDA support, don’t try to initialize the CUDA component 


--
John Mellor-Crummey Professor
Dept of Computer Science Rice University
email: joh...@rice.edu phone: 713-348-5179


Heike Jagode

Mar 5, 2021, 5:49:34 PM
to Kevin Huck, ptools-perfapi, sup...@alcf.anl.gov
Hi Kevin,

If PAPI is configured (built and installed) with the cuda component, e.g. ./configure --with-components=cuda, but there is no NVIDIA device on the node, then this component will be disabled, and PAPI should work as usual. The same policy applies to all other PAPI components.

I tried to reproduce this on our local machine, and it looks like the following:

(1)
I clone PAPI and configure, build, install on the login node (where there are no GPUs):
    ./configure --prefix=$PWD/install  --with-components="cuda nvml"
    make && make install
(2)
I run papi_component_avail to see what components are enabled / disabled:
    [jagode@login bin]$ ./papi_component_avail
    ....
    Compiled-in components:
    Name:   perf_event              Linux perf_event CPU counters
    Name:   perf_event_uncore       Linux perf_event CPU uncore and northbridge
      \-> Disabled: No uncore PMUs or events found
    Name:   cuda                    CUDA events and metrics via NVIDIA CuPTI interfaces
      \-> Disabled: CUDA initialization (cuInit) failed: no CUDA-capable device is detected
    Name:   nvml                    NVML provides the API for monitoring NVIDIA hardware (power usage, temperature, fan speed, etc)
      \-> Disabled: The NVIDIA management library failed to initialize.

    Active components:
    Name:   perf_event              Linux perf_event CPU counters
                                    Native: 179, Preset: 65, Counters: 6
                                    PMUs supported: nhm_ex, ix86arch, perf, perf_raw


As you can see, the cuda and nvml components are disabled.

(3)
I continue using that PAPI installation to monitor a non-GPU event:

    [jagode@login bin]$ ./papi_command_line PAPI_TOT_INS
    Successfully added: PAPI_TOT_INS
    PAPI_TOT_INS : 200623004



(4)
Now, if I run papi_component_avail (from the same PAPI installation) on a node that has GPUs, then these components become enabled and cuda events can be collected:

[jagode@b04 bin]$ ./papi_component_avail
...
Compiled-in components:
Name:   perf_event              Linux perf_event CPU counters
Name:   perf_event_uncore       Linux perf_event CPU uncore and northbridge
Name:   cuda                    CUDA events and metrics via NVIDIA CuPTI interfaces
Name:   nvml                    NVML provides the API for monitoring NVIDIA hardware (power usage, temperature, fan speed, etc)

Active components:
Name:   perf_event              Linux perf_event CPU counters
                                Native: 162, Preset: 56, Counters: 10
                                PMUs supported: ix86arch, perf, perf_raw, hsw_ep

Name:   perf_event_uncore       Linux perf_event CPU uncore and northbridge
                                Native: 850, Preset: 0, Counters: 112
                                PMUs supported: rapl, hswep_unc_cbo0, hswep_unc_cbo1, hswep_unc_cbo2, hswep_unc_cbo3
                                                hswep_unc_cbo4, hswep_unc_cbo5, hswep_unc_cbo6, hswep_unc_cbo7, hswep_unc_cbo8
                                                hswep_unc_cbo9, hswep_unc_ha0, hswep_unc_ha1, hswep_unc_imc0, hswep_unc_imc1
                                                hswep_unc_imc4, hswep_unc_imc5, hswep_unc_pcu, hswep_unc_qpi0, hswep_unc_qpi1
                                                hswep_unc_ubo, hswep_unc_r2pcie, hswep_unc_r3qpi0, hswep_unc_r3qpi1
                                                hswep_unc_sbo0, hswep_unc_sbo1, hswep_unc_sbo2, hswep_unc_sbo3

Name:   cuda                    CUDA events and metrics via NVIDIA CuPTI interfaces
                                Native: 792, Preset: 0, Counters: 792

Name:   nvml                    NVML provides the API for monitoring NVIDIA hardware (power usage, temperature, fan speed, etc)
                                Native: 72, Preset: 0, Counters: 72



Since you mention that you built your own PAPI library, could you reconfigure your local PAPI so that it includes the cuda component, and report back if you are still running into issues? I'm asking because in your backtrace I see "_cray_cuda_init_component at components/cray_cuda", and we don't have a cray_cuda component. So I don't know if there is a difference between the PAPI cuda component and the "cray_cuda" component from your bt.

Thanks,
Heike







--
______________________________________
Heike Jagode, Ph.D., Research Asst. Professor
Innovative Computing Laboratory, University of Tennessee Knoxville
http://icl.utk.edu/~jagode/

Kevin Huck

Mar 5, 2021, 6:31:29 PM
to Scott, Robert R., ptools-perfapi
Robert -

Sorry I didn’t reply sooner…I didn’t have the bandwidth to write up a “quick reproducer” at the time, and then forgot about it.

To reproduce this:

module unload darshan
module swap PrgEnv-intel PrgEnv-gnu
module load gcc cray-hdf5-parallel cray-python
module unload perftools-base
module load papi

cd tau2
./configure -dwarf=download -bfd=download -mpi -pthread -papi=/opt/cray/pe/papi/6.0.0.1
make -j8 install

# Make the example
cd examples/mm
export CRAYPE_LINK_TYPE=dynamic
export TAU_OPTIONS="-optShared -optCompInst"
make clean; make

# Request an allocation…
qsub -t 60 -n 1 -q debug-flat-quad --attrs mcdram=flat:numa=quad -I

# Once allocation starts...
cd $HOME/src/tau2/examples/mm   # or wherever you cloned TAU
# re-load the modules in the interactive session
module unload darshan
module swap PrgEnv-intel PrgEnv-gnu
module load gcc cray-hdf5-parallel cray-python
module unload perftools-base
module load papi
export TAU_METRICS=TIME:PAPI_RES_STL:PAPI_TOT_INS:PAPI_L1_DCM  # to enable PAPI
export TAU_VERBOSE=1  # just to see if there is some progress...
aprun -n 1 -N 1 -d 4 -j 4 -cc depth -e OMP_NUM_THREADS=4 ./matmult

…and it will deadlock.  

If you repeat those instructions but leave off the `-papi=…` configuration option for TAU, the program will run fine (it takes about 25 seconds).  The same is true if you don’t set TAU_METRICS; in that case TAU bypasses the PAPI initialization.

Thanks!
Kevin

Kevin Huck

Mar 5, 2021, 6:43:30 PM
to Heike Jagode, ptools-perfapi, sup...@alcf.anl.gov
Heike -

Thanks for looking into it.  Based on what you found, you might be right - the problem likely isn’t the generic CUDA component that comes with PAPI, but rather a special Cray CUDA component.

I ran `aprun -n 1 -N 1 papi_component_avail` on one of the KNL compute nodes of Theta, and got:

Compiled-in components:
Name:   perf_event              Linux perf_event CPU counters
Name:   perf_event_uncore       Linux perf_event CPU uncore and northbridge
Name:   cray_npu                Cray network interconnect performance counters
Name:   cray_cuda               Nvidia GPU hardware counters
   \-> Disabled: No CUDA-capable device available.
Name:   cray_rapl               Cray RAPL energy measurements
Name:   cray_pm                 Cray Power Management counters

So papi_component_avail seems to be doing the right thing, but the regular PAPI component initialization process isn’t (because the cray_cuda component is causing the problem).

Thanks!
Kevin

John Mellor-Crummey

Mar 5, 2021, 6:48:38 PM
to Heike Jagode, John Mellor-Crummey, Kevin Huck, ptools-perfapi, sup...@alcf.anl.gov, Xiaozhu Meng, Mark W. Krentel, Dejan Grubisic
Heike,

You indicated that the PAPI cuda component will be turned off on a non-GPU machine. That’s good.

However, on a machine with a GPU, I think we see that when we initialize PAPI, it calls hsa_init or cuInit regardless of whether our program is using the GPU. Specifically, I think it EAGERLY turns on GPU features rather than turning them on in some LAZY fashion. Do I understand this correctly?

--
John Mellor-Crummey Professor
Dept of Computer Science Rice University
email: joh...@rice.edu phone: 713-348-5179

Heike Jagode

Mar 5, 2021, 7:21:22 PM
to John Mellor-Crummey, Kevin Huck, ptools-perfapi, sup...@alcf.anl.gov, Xiaozhu Meng, Mark W. Krentel, Dejan Grubisic
John,

Yes, that is correct .. and this is done by design, so that the PAPI utility papi_native_avail lists *all* the events a user can choose from on a system. Papi_native_avail doesn't list events based on whether or not an application uses e.g. CUDA. It lists events based on what hardware (CPUs, GPUs, networks, etc.) is available on a system.

That said, we could look into changing this .. for instance, by having the PAPI utility papi_native_avail take care of initialization of all hardware support at its runtime so that it is still able to list all possible events. But when an application calls PAPI_init(), then it doesn't have to initialize support for all the hardware on a system. The initialization could be done "per hardware" based on the events that are added to an eventset. For example, when a cuda event is added, the cuda component could be initialized at that time (instead of during the time when PAPI_init() is called). The same applies for other components.

Do you have a strong reason for this?  Specifically, could you elaborate on why the current "call to cuInit" during PAPI_init()---regardless of whether an application uses GPUs---is troublesome for your scenarios?  I'm asking because the above-mentioned redesign will take some effort, so it would help to have strong reasons for it.

Thanks a lot,
Heike

John Mellor-Crummey

Mar 6, 2021, 2:08:44 PM
to Heike Jagode, Kevin Huck, ptools-perfapi, sup...@alcf.anl.gov, Xiaozhu Meng, Mark W. Krentel, Dejan Grubisic
Heike,

You may have seen bug reports sent back to your team related to problems with GPU components that arise in processes that fork. Initializing GPU components is not something that should be done unless absolutely necessary.

We would be happy to call an extra function to demand initialization of a GPU component, and even to call a function prior to PAPI init that sets a flag indicating to PAPI that it should not initialize any components during PAPI init.
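
To make the shape of that concrete, here is a purely hypothetical sketch (neither function below exists in the PAPI API; the names are invented only to illustrate the two options):

#include <papi.h>

int PAPI_skip_component_init(void);              /* hypothetical: flag set before library init */
int PAPI_component_init_by_name(const char *n);  /* hypothetical: demand-init a single component */

void tool_setup(int want_gpu)
{
    PAPI_skip_component_init();                  /* tell PAPI not to init non-CPU components */
    PAPI_library_init(PAPI_VER_CURRENT);         /* CPU-only initialization */
    if (want_gpu)
        PAPI_component_init_by_name("cuda");     /* initialize a GPU component only on demand */
}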

(sent from my phone)

On Mar 5, 2021, at 6:21 PM, Heike Jagode <jag...@icl.utk.edu> wrote:



Heike Jagode

Mar 9, 2021, 7:50:40 PM
to John Mellor-Crummey, Damien Genet, Kevin Huck, ptools-perfapi, Xiaozhu Meng, Mark W. Krentel, Dejan Grubisic
John,

We are working on this, and we will change the current behavior based on what I mentioned in my previous email. Meaning, we will not have a separate call or a flag for it. Instead, we will take care of it internally by moving the initialization of any non-CPU component out of PAPI_init()---regardless of whether a non-CPU component has been compiled into PAPI. The initialization of non-CPU components will be done as soon as something is done with specific component events, such as when one is (a) added to an eventset, or (b) passed to event_name_to_code(), etc.
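
From an application's point of view, the behavior we are aiming for looks roughly like the following sketch (the cuda event name below is only an example; real names come from papi_native_avail):

#include <papi.h>
#include <stdio.h>

int main(void)
{
    /* With this change, library init itself no longer touches CUDA/ROCm/NVML. */
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return 1;

    int eventset = PAPI_NULL;
    PAPI_create_eventset(&eventset);

    /* A CPU-only run adds CPU events and never reaches GPU initialization. */
    PAPI_add_named_event(eventset, "PAPI_TOT_INS");

    /* Only when a component event is actually referenced would the cuda
     * component be initialized; the event name here is illustrative. */
    int code;
    if (PAPI_event_name_to_code("cuda:::illustrative_event", &code) == PAPI_OK)
        printf("cuda component was initialized on demand\n");

    return 0;
}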

Damien Genet (CCed to this email) is working on this. We will keep you posted.

Heike

Damien Genet

Apr 27, 2021, 10:38:51 AM
to John Mellor-Crummey, Heike Jagode, Kevin Huck, Xiaozhu Meng, Mark W. Krentel, Dejan Grubisic, ptools-perfapi, Anthony Danalis
Hello,

We recently updated the branch (https://bitbucket.org/icl/papi, feature/init_private) with the latest fixes made to the CUDA component.

If you have the opportunity to give it a try, feedback is welcome.

Regards,

On Wed, Apr 7, 2021 at 11:08 AM Damien Genet <dge...@icl.utk.edu> wrote:
Hello,

We implemented a solution where the initialization is in two phases: the regular init_component and the init_private.
Calls to add events to an eventset or to get component info should trigger the private init.
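
For example, a minimal check along these lines should exercise the private init for the cuda component (a sketch using standard PAPI calls, not one of our test utils):

#include <papi.h>
#include <stdio.h>

int main(void)
{
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return 1;

    int cidx = PAPI_get_component_index("cuda");
    if (cidx < 0) {
        printf("cuda component not compiled into this PAPI\n");
        return 0;
    }

    /* Querying the component info is one of the calls expected to trigger
     * the component's private (second-phase) initialization. */
    const PAPI_component_info_t *info = PAPI_get_component_info(cidx);
    printf("cuda component is %s\n",
           info->disabled ? info->disabled_reason : "enabled");
    return 0;
}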

This is located in the "feature/init_private" branch on the PAPI repo (https://bitbucket.org/icl/papi)
It has been tested for the cuda component using our utils. It is also implemented for nvml, rocm, and rocm_smi, but not thoroughly tested.
If you have the opportunity to give it a try, feedback is welcome.

Regards,

On Tue, Mar 9, 2021 at 9:56 PM John Mellor-Crummey <joh...@rice.edu> wrote:


On Mar 9, 2021, at 6:50 PM, Heike Jagode <jag...@icl.utk.edu> wrote:

We are working on this, and we will change the current behavior based on what I mentioned in my previous email. Meaning, we will not have a separate call or a flag for it. Instead, we will take care of it internally by moving the initialization of any non-CPU component out of PAPI_init()---regardless of whether a non-CPU component has been compiled into PAPI. The initialization of non-CPU components will be done as soon as something is done with specific component events, such as when one is (a) added to an eventset, or (b) passed to event_name_to_code(), etc.

Damien Genet (CCed to this email) is working on this. We will keep you posted.

Heike,

Thanks for the update. This is a welcome change. FYI: we had to refactor our code to cope with PAPI_library_init initializing an AMD GPU.

HPCToolkit gets initialized the first time a thread gets created. Here, the thread gets created by hsa_init. HPCToolkit then calls PAPI_library_init, which in turn initializes ROCm by calling hsa_init again. You can see the complete callstack at the bottom, and you can see why this is awkward for us.
--
John Mellor-Crummey Professor
Dept of Computer Science Rice University
email: joh...@rice.edu phone: 713-348-5179


#0  0x00007f3d85d0465d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f3d85cfd979 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f3d82a93e69 in rocr::os::AcquireMutex(void*) () from /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1
#3  0x00007f3d82ad7c7b in rocr::core::Runtime::Acquire() () from /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1
#4  0x00007f3d82ab9d4a in rocr::HSA::hsa_init() () from /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1
#5  0x00007f3d830a6be9 in _rocm_init_component (cidx=2) at components/rocm/linux-rocm.c:532
#6  0x00007f3d8308fca2 in _papi_hwi_init_global (PE_OR_PEU=PE_OR_PEU@entry=0) at papi_internal.c:1955
#7  0x00007f3d8308cdc3 in PAPI_library_init (version=<optimized out>) at papi.c:1162
#8  0x00007f3d8638d653 in init (self=0x7f3d86606140 <_papi_obj>) at ../../../../src/tool/hpcrun/sample-sources/papi-c.c:355
#9  0x00007f3d863961c4 in hpcrun_registered_sources_init () at ../../../../src/tool/hpcrun/sample_sources_registered.c:142
#10 0x00007f3d86382aa4 in monitor_init_process (argc=0x7f3d86348210 <monitor_argc>, argv=0x0, data=0x0) at ../../../../src/tool/hpcrun/main.c:929
#11 0x00007f3d86138221 in monitor_begin_process_fcn (user_data=user_data@entry=0x0, is_fork=is_fork@entry=0) at main.c:279
#12 0x00007f3d8613ae74 in pthread_create (thread=0x12abe80, attr=0x7ffcfda0f490, start_routine=0x7f3d82a93d70 <rocr::os::ThreadTrampoline(void*)>, arg=0x12abea0) at pthread.c:1056
#13 0x00007f3d82a9400d in rocr::os::CreateThread(void (*)(void*), void*, unsigned int) () from /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1
#14 0x00007f3d82ad29c5 in rocr::core::Runtime::SetAsyncSignalHandler(hsa_signal_s, hsa_signal_condition_t, long, bool (*)(long, void*), void*) () from /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1
#15 0x00007f3d82ad7bca in rocr::core::Runtime::Load() () from /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1
#16 0x00007f3d82ad7d2c in rocr::core::Runtime::Acquire() () from /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1
#17 0x00007f3d82ab9d4a in rocr::HSA::hsa_init() () from /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1
#18 0x00007f3d84ba4535 in ?? () from /opt/rocm-4.0.0/hip/lib/libamdhip64.so.4
#19 0x00007f3d84b647bf in ?? () from /opt/rocm-4.0.0/hip/lib/libamdhip64.so.4
#20 0x00007f3d84b78e7e in ?? () from /opt/rocm-4.0.0/hip/lib/libamdhip64.so.4
#21 0x00007f3d84a6ef05 in ?? () from /opt/rocm-4.0.0/hip/lib/libamdhip64.so.4
#22 0x00007f3d85d02cd7 in __pthread_once_slow () from /lib64/libpthread.so.0
#23 0x00007f3d84b0fb12 in __hipRegisterFunction () from /opt/rocm-4.0.0/hip/lib/libamdhip64.so.4
#24 0x00000000004049ed in __hip_module_ctor () at /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/ext/new_allocator.h:125
#25 0x000000000040691d in __libc_csu_init ()
#26 0x00007f3d842bb73e in __libc_start_main () from /lib64/libc.so.6
#27 0x00007f3d86137dee in __libc_start_main (main=<optimized out>, argc=3, argv=0x7ffcfda0faf8, init=0x4068d0 <__libc_csu_init>, fini=0x406940 <__libc_csu_fini>, rtld_fini=0x7f3d86dfb180 <_dl_fini>, stack_end=0x7ffcfda0fae8)
    at main.c:564
#28 0x000000000040204e in _start ()



--

Damien


--

Damien