Unexpected Event status when calling custom GPU cuDNN op.


Roy Miles

Aug 13, 2020, 6:44:49 AM8/13/20
to Discuss
Hi,

I have created a custom op that calls some CUDA kernels on the current GPU stream, which works fine in all cases. The op also makes some calls to cuDNN functions, for which I have to create my own cuDNN handle, allocate resources, and de-allocate them afterwards.
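Roughly, the handle lifetime I have looks like the sketch below. (Stand-in type so it runs anywhere; in the real op the commented lines are the actual cudnnCreate / cudnnSetStream / cudnnDestroy calls.)

```cpp
#include <cassert>

// Stand-in for cudnnHandle_t so this sketch runs without a GPU; the real op
// makes the cudnnCreate / cudnnSetStream / cudnnDestroy calls noted below.
struct FakeHandle {
  bool created = false;
  void* stream = nullptr;
};

// RAII guard: create the handle, bind it to the op's compute stream, and
// destroy it when the guard goes out of scope, so the handle can never
// outlive the resources it refers to.
class HandleGuard {
 public:
  explicit HandleGuard(void* stream) {
    handle_.created = true;   // cudnnCreate(&handle_)
    handle_.stream = stream;  // cudnnSetStream(handle_, stream)
  }
  ~HandleGuard() { handle_.created = false; }  // cudnnDestroy(handle_)
  const FakeHandle& get() const { return handle_; }

 private:
  FakeHandle handle_;
};
```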

This all works fine when I test the op by just calling it multiple times in a for loop inside a @tf.function. However, as soon as I integrate the op into a standard neural network (a Keras model class), I receive the following error:
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1

Naturally, this is a super unhelpful error message, and it is triggered after the first call to my custom op. I have tried using separate streams, using the same device stream for all the cuDNN calls, etc.

If I comment out my actual call to cudnnGetConvolutionForwardAlgorithm inside my custom op, the code runs fine.

If anyone has any hints or tips as to what I can do, that would be very helpful.

Roy

Roy Miles

Aug 13, 2020, 6:45:35 AM8/13/20
to Discuss, Roy Miles
Oops, I mean the call to cudnnConvolutionForward*.
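For context, the code around that call follows the usual query-then-allocate workspace pattern. A minimal CPU sketch of the pattern (illustrative stand-in names, not the real cuDNN signatures, since the real cudnnGetConvolutionForwardWorkspaceSize / cudnnConvolutionForward calls need a GPU):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in sketch of the cuDNN workspace pattern: query the required size,
// allocate at least that much device memory, then hand pointer + size to the
// convolution call. An undersized buffer (or a host pointer where device
// memory is expected) is a classic source of CUDA_ERROR_ILLEGAL_ADDRESS.
std::size_t QueryWorkspaceBytes() {
  return 1 << 20;  // cudnnGetConvolutionForwardWorkspaceSize fills this in
}

bool ConvForward(const void* workspace, std::size_t workspace_bytes,
                 std::size_t required_bytes) {
  // cudnnConvolutionForward would touch up to required_bytes of workspace.
  return workspace != nullptr && workspace_bytes >= required_bytes;
}
```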

Roy Miles

Aug 13, 2020, 7:41:57 AM8/13/20
to Discuss, Roy Miles
I receive the following errors in a simpler scenario:

2020-08-13 12:40:21.844752: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:725] failed to record completion event; therefore, failed to create inter-stream dependency
2020-08-13 12:40:21.844771: I tensorflow/stream_executor/stream.cc:4959] [stream=0x559a6f9dfe10,impl=0x559a6f9e52a0] did not memcpy host-to-device; source: 0x559a839218c0
2020-08-13 12:40:21.844777: E tensorflow/stream_executor/stream.cc:334] Error recording event in stream: Error recording CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2020-08-13 12:40:21.844784: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

How do I signal the completion of a stream event, given that my op is not run through any of these StreamExecutor wrappers?

t kevin

Aug 13, 2020, 1:04:17 PM8/13/20
to Roy Miles, Discuss
Hi Roy,

The errors you saw about CUDA events and inter-stream dependencies are not
the cause but a consequence of the "CUDA_ERROR_ILLEGAL_ADDRESS".
It indicates that your custom CUDA kernel somehow referenced
illegal memory and didn't finish.
You should double-check the kernel part.
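One way to narrow it down is to wrap every CUDA/cuDNN call in a status check, so the first failing call is pinpointed instead of the error surfacing asynchronously on a later, unrelated call. A sketch of the macro pattern, with a stand-in status type in place of cudaError_t / cudnnStatus_t (in real code the message would come from cudaGetErrorString / cudnnGetErrorString):

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Stand-in status type; the real op would use cudaError_t / cudnnStatus_t.
enum Status { kOk = 0, kIllegalAddress = 700 };

// Wrap every call: report the exact expression, file, and line of the first
// failure instead of letting the error show up later on an unrelated call.
#define CHECK_STATUS(expr)                                                 \
  do {                                                                     \
    Status status_ = (expr);                                               \
    if (status_ != kOk) {                                                  \
      std::fprintf(stderr, "%s failed with status %d at %s:%d\n", #expr,   \
                   static_cast<int>(status_), __FILE__, __LINE__);         \
      std::abort();                                                        \
    }                                                                      \
  } while (0)
```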
Hope it helps.

Kevin


Roy Miles

Aug 16, 2020, 10:25:59 AM8/16/20
to Discuss, kevi...@gmail.com, Discuss, Roy Miles
Hi Kevin,

Thanks for your response!
It does seem to be an illegal memory access, like you said, but I don't think it is inside my GPU kernel.
Just for context, inside my op kernel I have something like this (a lot of unnecessary stuff removed):

  void Compute(OpKernelContext* context) override {
    // ...
    Tensor* output = NULL;
    OP_REQUIRES_OK(context, context->allocate_output(0, TensorShape({b, n, h, w}), &output));
    OP_REQUIRES(context, output->NumElements() <= tensorflow::kint32max,
                errors::InvalidArgument("Too many elements in output"));
    printf("Allocating[y]: %i \n", b * n * h * w);

    functor::MyFunctor<Device, T>()(
        ...
        output->flat<T>().data()
    );
    // ...
  }

It works completely fine past this point (the GPU kernel runs with no errors), but I get the error with the output tensor.
Namely, if I call my custom op and just try to print the output tensor, I get the illegal memory access (due to copying from GPU to CPU).

        y = my_module.my_op(
            x=x,
            w=w,
            # ...
        )

        print(y)   # error is raised here

tensorflow.python.framework.errors_impl.InternalError: GPU sync failed.
I am really not sure how to progress with this, as I'm pretty sure there is no mistake in my GPU kernel. Do I need to do something with output inside the Compute function? (The TensorFlow custom-op guide doesn't do anything with it.)
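One thing I am double-checking: the launch configuration rounds the thread count up to a multiple of the block size, so the kernel body needs an explicit bounds guard; without it, the excess threads write past the b * n * h * w elements that allocate_output provided, and the illegal access only surfaces later when the output is copied to the host. A CPU stand-in of a guarded kernel loop (each iteration plays the role of one CUDA thread):

```cpp
#include <cassert>
#include <vector>

// CPU stand-in for a bounds-guarded CUDA kernel. Launch configs round the
// thread count up, so each "thread" must check its index against the real
// element count before writing; without the guard, the extra threads write
// past the end of the output buffer, and the illegal access is only reported
// later (e.g. when TensorFlow copies the output back to the host).
void FillOutput(float* out, long total_elements, long launched_threads,
                float value) {
  for (long tid = 0; tid < launched_threads; ++tid) {  // tid ~ one CUDA thread
    if (tid < total_elements) {  // the bounds guard
      out[tid] = value;
    }
  }
}
```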

Roy Miles

Aug 16, 2020, 3:38:35 PM8/16/20
to Discuss, Roy Miles, kevi...@gmail.com, Discuss
So y is the output from my custom op, and you can see "Unable to get repr", which I assume means that it can't read the memory of y to find the values. The size is also small, so it is not a memory problem, which is what the answers online suggest.
[attachment: b.PNG]

Roy Miles

Aug 21, 2020, 12:07:41 PM8/21/20
to Discuss, Roy Miles, kevi...@gmail.com, Discuss
This has since been resolved, and it was an error on my end! Sorry :-)