Hi Chong,
I had a similar problem, where some of the LSTM-network outputs were negative. It happened especially when training with data that contained a lot of zeros.
But I found a way to prevent this behavior without affecting the predictions that were already good before.
What I did was use the "relu" activation function in the LSTM layers and additionally constrain the weights of the dense output layer to be non-negative.
So maybe you could try it like this:
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed
from keras.constraints import nonneg
.....
model = Sequential()
# relu keeps the LSTM outputs non-negative
model.add(LSTM(input_shape=(timestep_num, feat_num), output_dim=256, activation='relu', return_sequences=True))
model.add(LSTM(output_dim=64, activation='relu', return_sequences=True))
# nonneg() constrains the dense layer's weights to stay >= 0
model.add(TimeDistributed(Dense(output_dim=1, activation='linear', W_constraint=nonneg())))
model.compile(loss='mean_squared_error', optimizer=optimizer, metrics=['mean_squared_error'])
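The intuition behind this combination: relu activations are never negative, and a linear layer whose weights are all non-negative produces a sum of non-negative terms, so the output stays non-negative (apart from the bias, which nonneg() does not constrain). A small numpy sketch with hypothetical values, just to illustrate the reasoning:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical LSTM outputs after relu: all values >= 0
activations = np.maximum(rng.normal(size=(5, 64)), 0.0)

# Dense-layer weights forced non-negative, as nonneg() enforces during training
weights = np.abs(rng.normal(size=(64, 1)))

# Linear output: every term in activations @ weights is >= 0,
# so the result cannot be negative (ignoring the unconstrained bias)
outputs = activations @ weights
print(bool((outputs >= 0).all()))
```

Note that the bias of the Dense layer is not constrained by W_constraint, so a large negative bias could still push outputs below zero; in practice that did not happen for me.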
I hope this might help you.
Cheers.