It does seem to be an illegal memory access, like you said, but I don't think it is inside my GPU kernel.
For context, inside my op kernel I have something like this (with a lot of unnecessary stuff removed):
void Compute(OpKernelContext* context) override {
  // ...
  Tensor* output = NULL;
  OP_REQUIRES_OK(context, context->allocate_output(0, TensorShape({b, n, h, w}), &output));
  OP_REQUIRES(context, output->NumElements() <= tensorflow::kint32max,
              errors::InvalidArgument("Too many elements in output"));
  printf("Allocating[y]: %i \n", b * n * h * w);

  functor::MyFunctor<Device, T>()(
      // ...
      output->flat<T>().data());
}
Everything works fine past this point (the GPU kernel itself runs with no errors reported), but I get the error once the output tensor is actually used. Specifically, if I call my custom op and just try to print the output tensor, I get the illegal memory access (due to the copy from GPU to CPU):
y = my_module.my_op(
    x=x,
    w=w,
    # ...
)
print(y)  # Error is here
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed.
I am really not sure how to progress with this, as I'm pretty sure there is no mistake in my GPU kernel. Do I need to do something with output inside the Compute function? (The TensorFlow custom op guide doesn't do anything further with it.)
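For what it's worth, one thing I'm considering to narrow it down: adding an explicit stream sync right after the kernel launch inside the GPU functor, since launches are asynchronous and an out-of-bounds write would otherwise only be reported at the next sync point (here, the GPU-to-CPU copy triggered by print(y)). This is just a sketch of what I mean; MyCudaKernel, the launch configuration, and the argument list are placeholders, not my real code:

// In the .cu.cc file, where GPUDevice is Eigen::GpuDevice.
// MyCudaKernel and the launch config below are placeholders.
template <typename T>
struct MyFunctor<GPUDevice, T> {
  void operator()(const GPUDevice& d, int size, const T* in, T* out) {
    int block_count = (size + 255) / 256;
    MyCudaKernel<T><<<block_count, 256, 0, d.stream()>>>(size, in, out);

    // Debug only: force any pending kernel error to surface here instead of
    // at the later device-to-host copy.
    cudaError_t err = cudaStreamSynchronize(d.stream());
    if (err != cudaSuccess) {
      printf("MyFunctor kernel failed: %s\n", cudaGetErrorString(err));
    }
  }
};

I could also run the whole script under cuda-memcheck (or compute-sanitizer on newer CUDA toolkits), which should tell me whether the out-of-bounds access really happens inside the kernel or somewhere else.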