Hi,
I am trying to write my own attention mechanism by subclassing _BaseAttentionMechanism, but I am running into errors when running in graph mode.
More precisely, the build method of my layer/mechanism needs to create a variable whose shape depends on the batch size.
Since the mechanism.setup_memory method is called beforehand, the superclass already exposes a self.batch_size member, but it seems that I cannot use it.
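Roughly, the failing part looks like this (simplified; the class and variable names are mine):

    import tensorflow as tf
    from tensorflow_addons.seq2seq.attention_wrapper import _BaseAttentionMechanism

    class MyAttention(_BaseAttentionMechanism):
        def build(self, input_shape):
            # setup_memory() has already run, so self.batch_size exists,
            # but in graph mode it is a graph tensor, not a Python int.
            self.accumulator = tf.Variable(
                tf.zeros([self.batch_size]),  # <- this line triggers the TypeError
                trainable=False)
            self.built = True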
The error message looks like this:
TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: model/decoder/MyAttention/strided_slice:0
In the build method I also have access to self.keys, but trying to compute the batch size with tf.shape(self.keys)[0] results in a similar error, the graph tensor name this time being ../memory_layer/Tensordot:0.
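For completeness, that variant of build() looks like this (again simplified):

    # self.keys is set by setup_memory(); this fails the same way in graph mode
    batch_size = tf.shape(self.keys)[0]
    self.accumulator = tf.Variable(tf.zeros([batch_size]), trainable=False)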
Could you please help me find a solution?
I find it (very) challenging to debug tf.function-related errors, particularly the one above.
Thank you