Does TransformerEncoder layer accept built-in mask?


Amin Shn

unread,
Feb 23, 2023, 4:45:57 AM2/23/23
to Keras-users
Hi All, I want to use the Keras TransformerEncoder layer (https://keras.io/api/keras_nlp/layers/transformer_encoder/), but I am not sure whether it accepts a built-in mask (e.g. one generated by the Masking() layer) or whether the "padding_mask" argument is the only way to feed in the masking information. My code looks like this:

masked_embedding = Masking(mask_value=0.)(pre_masked_embedding) 
cont_emb = TransformerEncoder(num_heads=4, intermediate_dim=32)(masked_embedding)

I don't know whether the above code is enough. I also tried the approach below:

masked_embedding = Masking(mask_value=0.)(pre_masked_embedding)
cont_emb = TransformerEncoder(num_heads=4, intermediate_dim=32)(masked_embedding, padding_mask=masked_embedding._keras_mask)

But this throws a warning and an error:

WARNING:absl:You are explicitly setting `padding_mask` while the `inputs` have built-in mask, so the built-in mask is ignored.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/amin/Desktop/PhD/3rd Project/Python codes/STraTS-main/Bert_TS.ipynb Cell 11 in <cell line: 1>()
----> 1 history = fore_model_interp.fit(
      2     train_input,
      3     train_output,
      4     epochs=1000,
      5     batch_size=70,
      6     validation_split=0.1,
      7     callbacks=[
      8         EarlyStopping(monitor="val_loss", patience=5, mode="min")
      9     ],
     10 )

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     65 except Exception as e:  # pylint: disable=broad-except
     66     filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67     raise e.with_traceback(filtered_tb) from None
     68 finally:
     69     del filtered_tb

File /var/folders/3m/_t8llt6n10z5xzvm7vxh37nw0000gp/T/__autograph_generated_file86q6c1l1.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
     13 try:
     14     do_return = True
---> 15     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
...
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/keras/engine/keras_tensor.py", line 254, in __array__
    raise TypeError(
TypeError: You are passing KerasTensor(type_spec=TensorSpec(shape=(), dtype=tf.float32, name=None), name='Placeholder:0', description="created by layer 'tf.cast_15'"), an intermediate Keras symbolic input/output, to a TF API that does not allow registering custom dispatchers, such as `tf.cond`, `tf.function`, gradient tapes, or `tf.map_fn`. Keras Functional model construction only supports TF API calls that *do* support dispatching, such as `tf.math.add` or `tf.reshape`. Other APIs cannot be called directly on symbolic Keras inputs/outputs. You can work around this limitation by putting the operation in a custom Keras layer `call` and calling that layer on this symbolic input/output.

Can someone let me know whether my first approach is OK (I have no idea how to check that the masking is actually applied), and if not, how I should do it?

Matthew Watson

unread,
Feb 23, 2023, 7:11:19 PM2/23/23
to Keras-users
Thanks for the question! The transformer layers in KerasNLP can accept an implicit mask. The simplest way to set this up is the mask_zero=True option on keras.layers.Embedding and keras_nlp.layers.TokenAndPositionEmbedding. Here's a full transformer example:

import keras
import keras_nlp

inputs = keras.Input(shape=(None,), dtype="int32")
outputs = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=1_000,
    sequence_length=10,
    embedding_dim=16,
    mask_zero=True,
)(inputs)
outputs = keras_nlp.layers.TransformerEncoder(
    num_heads=4,
    intermediate_dim=32,
)(outputs)
outputs = keras.layers.GlobalAveragePooling1D()(outputs)
outputs = keras.layers.Dense(1, activation="sigmoid")(outputs)
model = keras.Model(inputs, outputs)

The main thing to double check is that all the layers leading up to the TransformerEncoder block also support masking. The easiest way to check is to print the _keras_mask attribute of the tensor you feed into the encoder, e.g. print(transformer_block_input._keras_mask). If the _keras_mask property is there, you are all set. If you are doing something like summing two embeddings with the + operator, you might need to propagate the implicit mask yourself.
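As a minimal sketch of that check (the layer setup and names below are illustrative, not from the original post; any mask-producing layer such as Masking or an Embedding with mask_zero=True behaves the same way):

import keras
import keras_nlp

# Illustrative inputs: integer token IDs where 0 means padding.
inputs = keras.Input(shape=(None,), dtype="int32")
embedded = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=1_000,
    sequence_length=10,
    embedding_dim=16,
    mask_zero=True,
)(inputs)

# If a mask is being propagated, Keras attaches it to the symbolic
# tensor as `_keras_mask`; expect a boolean (batch, sequence) tensor.
# getattr avoids an AttributeError when no mask is present.
print(getattr(embedded, "_keras_mask", None))

If that prints None, the implicit mask has been dropped somewhere upstream and the encoder will attend over the padded positions.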

Alternately, if you would rather have an explicit mask that is totally fine too...

inputs = keras.Input(shape=(None,), dtype="int32")
outputs = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=1_000,
    sequence_length=10,
    embedding_dim=16,
)(inputs)
outputs = keras_nlp.layers.TransformerEncoder(
    num_heads=4,
    intermediate_dim=32,
)(outputs, padding_mask=(inputs != 0))

We don't recommend mixing explicit and implicit masks though, hence the warning in your second example.
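For the setup in the original question, where the encoder input is a pre-computed embedding tensor rather than integer token IDs, a rough sketch of the explicit-mask route might look like the following. The feature size and the assumption that padded timesteps are all-zero vectors are mine, not from the post, and pre_masked_embedding just reuses the name from the question:

import tensorflow as tf
import keras
import keras_nlp

# Assumed shape: (batch, timesteps, features); padded steps are all zeros.
pre_masked_embedding = keras.Input(shape=(None, 64))

# True wherever at least one feature in the timestep is non-zero.
padding_mask = tf.reduce_any(tf.not_equal(pre_masked_embedding, 0.0), axis=-1)

cont_emb = keras_nlp.layers.TransformerEncoder(
    num_heads=4,
    intermediate_dim=32,
)(pre_masked_embedding, padding_mask=padding_mask)

Building the mask from element-wise TF ops like this keeps everything inside the Functional API, which sidesteps the KerasTensor dispatch error shown in the traceback above.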

Hope that helps!
