Problem using PAPI CUDA


Talita

Sep 17, 2019, 5:06:58 PM
to ptools-perfapi
Hi,

I've been trying to use PAPI with CUDA, but I ran into problems when running the tests in papi/src/components/cuda/tests. I'm running Ubuntu 16.04 with two TITAN Xp GPUs and CUDA 10.1.

I compiled PAPI with the cuda component enabled, as explained in the README: ./configure --with-components=cuda
I also set CUDA_DIR and CUPTI_DIR.

Here is the output of HelloWorld:

PAPI_VERSION     :    5      7       0
Name cuda:::event:active_cycles_pm:device=0 --- Code: 0x4000002f
PAPI_add_events failed
Failed to add events to eventset: Not supported by component
PAPI_start failed
END: Hello World!
PAPI_stop failed
           0 --> cuda:::event:active_cycles_pm:device=0 
PASSED


When I run papi_native_avail I can see all the CUDA events, including the one used in the example, but I still get this "Not supported by component" message. Sorry if I'm missing something. Any thoughts?

Thanks!
Talita

Anthony Castaldo

Sep 18, 2019, 5:20:45 PM
to Talita, Heike Jagode, ptools-perfapi
Hi Talita,

I have an updated version in our bitbucket repository, which you can clone with
git clone https://bitbucket.org/icl/papi.git

This requires a different setup for the cuda component: instead of CUDA_DIR and CUPTI_DIR, you just need
to export PAPI_CUDA_ROOT (set to the same path as CUDA_DIR).

The rest is the same: ./configure --with-components="cuda", then make.

The problem was that we were trying to set a collection mode that only works on Tesla devices (where it is desirable),
but we shouldn't have failed your add_event when it didn't work. I fixed that.

I tested it, but I don't have your exact hardware, so let me know if it works (or doesn't work).

-Tony


--
You received this message because you are subscribed to the Google Groups "ptools-perfapi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ptools-perfap...@icl.utk.edu.
To view this discussion on the web visit https://groups.google.com/a/icl.utk.edu/d/msgid/ptools-perfapi/3f24b1ed-29d7-445c-a2e5-0a784f4c554d%40icl.utk.edu.

Talita Perciano

Sep 19, 2019, 1:01:19 PM
to Anthony Castaldo, Heike Jagode, ptools-perfapi
Hi Tony,

Thanks for your reply. I got the updated version and just tested it. Unfortunately the test still doesn't work, but now the error is in PAPI_start:

PAPI_VERSION     :    5      7       1
Name cuda:::event:elapsed_cycles_sm:device=0 --- Code: 0x4000002f
PAPI_start failed -14: Unknown error code

END: Hello World!
PAPI_stop failed
           0 --> cuda:::event:elapsed_cycles_sm:device=0
PASSED

Any thoughts?

Thanks,
Talita
--
Dr. Talita Perciano
Research Scientist - CRD, Lawrence Berkeley National Laboratory
Data Analytics & Visualization Group
Computational Biosciences Group
Center for Advanced Mathematics for Energy Research Applications
One Cyclotron Road
Berkeley, CA 94720
059-3034B  M/S 59R3103

Anthony Castaldo

Sep 19, 2019, 1:15:24 PM
to Talita Perciano, Heike Jagode, ptools-perfapi
Talita,

Dang. I will have to look into it, but I can't until Monday.

-Tony

Talita Perciano

Sep 19, 2019, 1:20:12 PM
to Anthony Castaldo, Heike Jagode, ptools-perfapi
No problem, thanks!

Talita

Anthony Castaldo

Sep 23, 2019, 5:39:15 PM
to ptools-perfapi, tonyca...@icl.utk.edu, jag...@icl.utk.edu
Hi Talita,

I MIGHT have reproduced the problem with our hardware. We do have a Titan, and on it I get a privilege-related error: CUPTI_ERROR_INSUFFICIENT_PRIVILEGES.

It doesn't happen on any other device; on the Titan I checked three events at random, and it happens for all of them.

You might be able to work around this by running your code with sudo.

BUT TO DOUBLE-CHECK, I have attached a code file that will print the error when you run it.

Nothing has been changed to fix it; this will just report the error message in more detail (along with the line number of the call that produced it).

Copy the attached file to replace papi/src/components/cuda/linux-cuda.c.

Then rebuild PAPI (just a 'make' in papi/src should suffice) and reproduce your error. The lines I enabled print to stderr.

Thanks for the help!

-Tony

[Attachment: linux-cuda.c]

John Mellor-Crummey

Sep 23, 2019, 9:08:42 PM
to Anthony Castaldo, John Mellor-Crummey, ptools-perfapi, jag...@icl.utk.edu
Perhaps the page below may be of use:

NVIDIA Development Tools Solutions - CUPTI_ERROR_INSUFFICIENT_PRIVILEGES: CUPTI Permission issue with Performance Counters




--
John Mellor-Crummey Professor
Dept of Computer Science Rice University
email: joh...@rice.edu phone: 713-348-5179


Brian Van Straalen

Sep 23, 2019, 9:33:32 PM
to John Mellor-Crummey, Anthony Castaldo, ptools-perfapi, Heike Jagode
In what way are supercomputers to be considered an HPC environment when I have to run as administrator to see any hardware counters? Are we ever going to fix this for modern supercomputer centers?

Brian Van Straalen




--
Brian Van Straalen         Lawrence Berkeley Lab
BVStr...@lbl.gov        Computational Research
(510) 486-4976            Division (crd.lbl.gov)

John Mellor-Crummey

Sep 23, 2019, 11:41:42 PM
to Brian Van Straalen, John Mellor-Crummey, Anthony Castaldo, ptools-perfapi, Heike Jagode

> On Sep 23, 2019, at 8:33 PM, Brian Van Straalen <bvstr...@lbl.gov> wrote:
>
> In what way are supercomputers to be considered an HPC environment when I have to run as administrator to see any hardware counters? Are we ever going to fix this for modern supercomputer centers?

Brian,

I complain about this all of the time to everyone from the vendors to the supercomputer centers.

Talita Perciano

Sep 27, 2019, 3:36:23 PM
to John Mellor-Crummey, Brian Van Straalen, Anthony Castaldo, ptools-perfapi, Heike Jagode
Hi Tony and others,

Thank you for the reply, and sorry for the delay; I only just had the chance to come back to this. So... I reran the code so that I could see the details of the error, and it turns out that it's not the same CUPTI_ERROR_INSUFFICIENT_PRIVILEGES error. Here is what I get:

PAPI_VERSION     :    5      7       1
Name cuda:::event:elapsed_cycles_sm:device=0 --- Code: 0x4000002f
Line 1078 CUPTI_CALL macro '(*cuptiSetEventCollectionModePtr) (eventCuCtx,CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS)' failed with error #0000001B='CUPTI_ERROR_NOT_SUPPORTED'.
Line 1153 CUPTI_CALL macro '(*cuptiEventGroupSetEnablePtr) (groupset)' failed with error #00000009='CUPTI_ERROR_HARDWARE'.

PAPI_start failed -14: Unknown error code
END: Hello World!
PAPI_stop failed
           0 --> cuda:::event:elapsed_cycles_sm:device=0
PASSED


Any thoughts?

Thanks,
Talita


Anthony Castaldo

Sep 27, 2019, 4:16:30 PM
to Talita Perciano, John Mellor-Crummey, Brian Van Straalen, ptools-perfapi, Heike Jagode
Hi Talita,

Not initially; I will investigate further on Monday.

It seems contradictory: the event is given to us by the CUDA driver, since we ask it for all of its events at initialization time.

It seems strange that it would give us an event it doesn't actually support!

Did you try running it with sudo (or root privileges)? If you can't do that yourself, perhaps a sysadmin can navigate to your directory and try it for you...

I will look into it Monday.

-Tony

Talita Perciano

Sep 27, 2019, 4:30:44 PM
to Anthony Castaldo, John Mellor-Crummey, Brian Van Straalen, ptools-perfapi, Heike Jagode
OK, running with sudo I get this (which apparently does produce the counter results):

PAPI_VERSION     :    5      7       1
Name cuda:::event:elapsed_cycles_sm:device=0 --- Code: 0x4000002f
Line 1078 CUPTI_CALL macro '(*cuptiSetEventCollectionModePtr) (eventCuCtx,CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS)' failed with error #0000001B='CUPTI_ERROR_NOT_SUPPORTED'.
END: Hello World!
      144285 --> cuda:::event:elapsed_cycles_sm:device=0
PASSED

But as others pointed out, having to run this with sudo is really odd. I need this to get counters on a supercomputer...

Thanks,
Talita

Anthony Castaldo

Sep 27, 2019, 4:41:29 PM
to Talita Perciano, John Mellor-Crummey, Brian Van Straalen, ptools-perfapi, Heike Jagode
Talita: Understood, but thanks for the sudo run; the results at least give us a major clue.

We may not be able to fix it without some change to the system. I will ask our sysadmin if he knows what would have to be changed.

I'll let you know...

-Tony



Anthony Castaldo

Oct 15, 2019, 1:55:40 PM
to Talita Perciano, John Mellor-Crummey, Brian Van Straalen, ptools-perfapi, Heike Jagode, Anthony Danalis, Damien Genet
Hi Talita: I have the best solution we can use for now.

Unfortunately, this change in the NVIDIA drivers is the result of a security problem. NVIDIA tells us:

"Please see this security bulletin: https://nvidia.custhelp.com/app/answers/detail/a_id/4738"

and "This is not a temporary situation -- all drivers from 418.43+ now have this requirement."

The solution link is below, but NVIDIA says:

"The onus is on the administrator to make this setting knowing full well that the GPU is now susceptible to side-channel attacks."


Here is the solution link:
https://developer.nvidia.com/nvidia-development-tools-solutions-ERR_NVGPUCTRPERM-permission-issue-performance-counters#SolnAdminTag

SUMMARY: The sysadmin can override this new default behavior by creating a file. Here is the quote from the link above:

"Alternatively, a file containing 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"' may be saved to /etc/modprobe.d"

Our sysadmin tried that on our system with Titans, and I tested it: it works, and we have access to the counters.
We named our file "nvidia.conf".

I wish there was something better!

-Tony
