Inplace modification error during back propagation when training a stateful LSTM using PyTorch as backend

Driaan Jansen

Feb 6, 2025, 6:03:05 AM
to Keras-users
I'm trying to create a stateful LSTM model with Keras using PyTorch as the backend. However, I always get the same error, whether I run it in my own local environment or on Colab.

The problem seems to be specific to how PyTorch handles backpropagation during training. Full details are below.

When I change the Keras backend to TensorFlow, I do not get this in-place modification error on the tensors. The code also runs without any errors with JAX as the backend. The problem occurs only when using torch as the backend.
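For reference, the only change between those runs was the backend environment variable, set before importing keras, e.g.:

import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # or "jax", or "torch"
import keras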

Sample code:
import numpy as np
import os
os.environ["KERAS_BACKEND"] = "torch"
import keras

# Sample dataset generator for demonstration
def generate_time_series_data(batch_size, time_steps, num_features):
    while True:
        x = np.random.rand(batch_size, time_steps, num_features)
        y = np.sum(x, axis=2)  # Just an example: target is the sum of features at each time step
        yield x, y


# Parameters
batch_size = 32    # Number of sequences per batch
time_steps = 10    # Length of each sequence
num_features = 3   # Number of features per time step
epochs = 10        # Number of epochs

# Build the LSTM model
model = keras.Sequential()
model.add(keras.Input(shape=(time_steps, num_features), batch_size=batch_size))
lstm_layer = keras.layers.LSTM(
    50,
    stateful=True,
    return_sequences=False,  # return_sequences can be True if another LSTM is added
)
model.add(lstm_layer)
model.add(keras.layers.Dense(1, activation='linear'))  # For scalar output

# Compile the model with optimizer and loss function
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss='mse', metrics=['mae'])

# Print model summary
model.summary()

# Generate dummy training data
train_generator = generate_time_series_data(batch_size, time_steps, num_features)
steps_per_epoch = 100  # Number of batches per epoch

# Train the model with stateful data
for epoch in range(epochs):
    print(f"Epoch {epoch + 1}/{epochs}")
    model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=1, verbose=1, shuffle=False)
    # Reset states after each epoch
    lstm_layer.reset_states()

Error when torch is used as backend:
Epoch 1/10
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-5784a5208977> in <cell line: 0>()
      3 for epoch in range(epochs):
      4     print(f"Epoch {epoch + 1}/{epochs}")
----> 5     model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=1, verbose=1, shuffle=False)
      6     # Reset states after each epoch
      7     lstm_layer.reset_states()

3 frames
/usr/local/lib/python3.11/dist-packages/torch/autograd/graph.py in _engine_run_backward(t_outputs, *args, **kwargs)
    827     )  # Calls into the C++ engine to run the backward pass
    828 finally:
--> 829     if attach_logging_hooks:
    830         unregister_hooks()  # type: ignore[possibly-undefined]

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [32, 50]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Using torch anomaly detection:
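Anomaly detection was enabled before training, per the hint in the error (for reference):

import torch
torch.autograd.set_detect_anomaly(True)  # makes autograd report the forward op that broke the graph

The resulting trace: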
File "/usr/local/lib/python3.11/dist-packages/keras/src/layers/rnn/rnn.py", line 402, in call
    last_output, outputs, states = self.inner_loop(
File "/usr/local/lib/python3.11/dist-packages/keras/src/layers/rnn/lstm.py", line 579, in inner_loop
    return super().inner_loop(
File "/usr/local/lib/python3.11/dist-packages/keras/src/layers/rnn/rnn.py", line 342, in inner_loop
    return backend.rnn(
File "/usr/local/lib/python3.11/dist-packages/keras/src/backend/torch/rnn.py", line 347, in rnn
    final_outputs = _step(time, output_ta_t, *new_states)
File "/usr/local/lib/python3.11/dist-packages/keras/src/backend/torch/rnn.py", line 328, in _step
    output, new_states = step_function(
File "/usr/local/lib/python3.11/dist-packages/keras/src/layers/rnn/rnn.py", line 334, in step
    output, new_states = self.cell(inputs, states, **cell_kwargs)
File "/usr/local/lib/python3.11/dist-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler
    return fn(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/keras/src/layers/layer.py", line 908, in __call__
    outputs = super().__call__(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/keras/src/backend/torch/layer.py", line 44, in forward
    return Operation.__call__(self, *args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler
    return fn(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/keras/src/ops/operation.py", line 46, in __call__
    return call_fn(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/keras/src/utils/traceback_utils.py", line 156, in error_handler
    return fn(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/keras/src/layers/rnn/lstm.py", line 285, in call
    c, o = self._compute_carry_and_output_fused(z, c_tm1)
File "/usr/local/lib/python3.11/dist-packages/keras/src/layers/rnn/lstm.py", line 227, in _compute_carry_and_output_fused
    c = f * c_tm1 + i * self.activation(z2)
 (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:110.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-6e68af8f7817> in <cell line: 0>()
      3 for epoch in range(epochs):
      4     print(f"Epoch {epoch + 1}/{epochs}")
----> 5     model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=1, verbose=1, shuffle=False)
      6     # Reset states after each epoch
      7     lstm_layer.reset_states()

3 frames
/usr/local/lib/python3.11/dist-packages/torch/autograd/graph.py in _engine_run_backward(t_outputs, *args, **kwargs)
    827     )  # Calls into the C++ engine to run the backward pass
    828 finally:
--> 829     if attach_logging_hooks:
    830         unregister_hooks()  # type: ignore[possibly-undefined]

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [32, 50]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

The output above is from Colab's environment.
Keras version = 3.8.0
torch version = 2.5.1+cu124
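
For context, this class of error is not specific to Keras: PyTorch raises it whenever a tensor that autograd saved for the backward pass is later mutated in place, which bumps the tensor's version counter. A minimal torch-only snippet (my own illustration, not from the Keras code) that trips the same check:

import torch

a = torch.randn(3, requires_grad=True)
b = torch.exp(a)    # exp() saves its output for use in the backward pass
b.add_(1)           # in-place op bumps b's version counter (0 -> 1)
b.sum().backward()  # RuntimeError: ... modified by an inplace operation

In the anomaly trace above, the saved tensor has shape [32, 50], which matches (batch_size, units), so the offending in-place update presumably happens on one of the LSTM's recurrent state tensors.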

I've also asked my AI code assistant to rewrite the snippet to use PyTorch directly, without Keras, and that version also ran without any errors (a sketch of that approach is below). So the problem looks to me like something in the integration between Keras and torch.
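
For comparison, here is a rough sketch of the PyTorch-only approach (my reconstruction, not the assistant's exact code; the class name and target are illustrative). The key point is that the carried states are detached between batches, so autograd never sees a saved state mutated by a later batch:

import torch
import torch.nn as nn

class StatefulLSTM(nn.Module):  # illustrative name
    def __init__(self, num_features, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(num_features, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)
        self.state = None  # (h, c) carried across batches

    def forward(self, x):
        out, (h, c) = self.lstm(x, self.state)
        # Detach the carried states so gradients do not flow into
        # previous batches; this is what avoids the version-counter error.
        self.state = (h.detach(), c.detach())
        return self.fc(out[:, -1, :])

    def reset_states(self):
        self.state = None

model = StatefulLSTM(num_features=3, hidden_size=50)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

for epoch in range(10):
    for _ in range(100):  # steps per epoch
        x = torch.rand(32, 10, 3)
        y = x.sum(dim=(1, 2)).unsqueeze(1)  # dummy scalar target
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    model.reset_states()  # reset carried states after each epoch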

Is anyone willing to have a look at this problem? Or has it already been logged somewhere?

Many thanks
Driaan



