Time series prediction with multiple sequences input - LSTM


duan...@gmail.com

Jul 15, 2015, 10:47:46 AM
to keras...@googlegroups.com, Yue Duan
Hi all,

I'm new to NNs and recently discovered Keras, and I'm trying to implement an LSTM that takes in multiple time series for future value prediction. For example, I have historical data of 1) the daily price of a stock and 2) the daily crude oil price, and I'd like to use these two time series to predict the stock price for the next day. Based on my understanding, I tried a minimal model:

# Set dims
n_samples = 171
n_timesteps = 8
n_feat = 2
n_classes = 128

# Model
model = Sequential()
model.add(LSTM(n_feat, n_classes, activation='linear', inner_activation='linear', return_sequences=True))
model.add(Dropout(0.5))
model.add(TimeDistributedDense(n_classes, 1))
model.add(Activation('linear'))

model.compile(loss='mean_squared_error', optimizer='rmsprop')

# Train
model.fit(input_matrix, target, nb_epoch=10)
score = model.evaluate(X_test, Y_test, batch_size=16)

The input_matrix here is a 3D array with shape (171, 8, 2) - 171 samples, 8 timesteps, 2 features; I broke the long time series into shorter subseries of 8 time steps. The target is a 3D array with shape (171, 1, 1).
I ran it and it returned:

Epoch 0 171/171 [==============================] - 0s - loss: nan
 
I'm not sure whether this model is the right one for my problem, or whether I built it correctly; I'd welcome any feedback and advice.

Also, I'm confused about how to process the input. My inputs are two daily price series covering the past 20 years. Should I use them as one long sequence (then there's only one sample, I think...), or break them into smaller pieces (# of pieces = # of samples)? Should I normalize the prices, and maybe do some other pre-processing?

Thanks in advance for your help :)

Best,
Yue

Yue Duan

Jul 21, 2015, 10:14:09 AM
to keras...@googlegroups.com, YD...@slb.com
I figured out a way to solve this problem, so I think it might be helpful to post the solution here.

It turned out the activation and inner_activation functions I used for the LSTM layer were wrong, so the loss could not be calculated properly. I replaced them with sigmoid, tanh, and relu; all of them worked and gave losses that decreased with each epoch. I also replaced the TimeDistributedDense layer with a simple Dense layer, with return_sequences=False for the LSTM layer.

As to the input, I wrote a function that transforms the long input and target time series into small pieces of 3D array (nb_samples, time_steps, nb_features) to feed into the model. In my example, I used rolling windows of the same length, with their corresponding targets, to train the model. I tried training the model with both normalized and non-normalized data; the normalized data generally gave better results.

I've attached part of my code at the end; please let me know if you have any comments or suggestions. Thanks!


# imports (Keras 0.x module paths)
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.recurrent import LSTM

# merge the two price data frames (df1, df2 are pandas DataFrames)
merged = df1.merge(df2, left_index=True, right_index=True, how='inner').dropna()

# data prep
# use 100 days of historical data to predict 10 days in the future
data = merged.values
examples = 100
y_examples = 10
nb_samples = len(data) - examples - y_examples

# input - 2 features
input_list = [np.expand_dims(np.atleast_2d(data[i:examples+i,:]), axis=0) for i in xrange(nb_samples)]
input_mat = np.concatenate(input_list, axis=0)

# target - the first column in merged dataframe
target_list = [np.atleast_2d(data[i+examples:examples+i+y_examples,0]) for i in xrange(nb_samples)]
target_mat = np.concatenate(target_list, axis=0)

# set up model
trials = input_mat.shape[0]
features = input_mat.shape[2]
hidden = 64
model = Sequential()
model.add(LSTM(features, hidden))
model.add(Dropout(.2))
model.add(Dense(hidden, y_examples))
model.add(Activation('linear'))
model.compile(loss='mse', optimizer='rmsprop')

# Train
model.fit(input_mat, target_mat, nb_epoch=50)

francesc...@gmail.com

Nov 20, 2015, 11:26:09 AM
to Keras-users, YD...@slb.com
This worked for me, although I had to specify 

model.add(LSTM(input_dim=features, output_dim=hidden))
model.add(Dropout(.2))
model.add(Dense(input_dim=hidden, output_dim=y_examples))

Do you know why the TimeDistributedDense (plus returning the sequences in the LSTM layer) is not appropriate?

Cheers,
Francesco

Tarnac

Nov 24, 2015, 4:38:32 PM
to Keras-users, YD...@slb.com, francesc...@gmail.com
I'm confused about this example. If one is predicting 10 timesteps into the future, I thought one would use TimeDistributedDense, since multiple timesteps will be predicted. The way the OP programmed it, the model outputs 10 neurons. My limited understanding is that this case would be covered by a two-neuron output (one for each signal/stock/feature), with each of these outputs having 10 timesteps...

Yerbury

Nov 25, 2015, 7:34:57 AM
to Keras-users, YD...@slb.com, francesc...@gmail.com
Do you have any real-world results to share?

Dmitry Lukovkin

Nov 30, 2015, 2:45:56 PM
to Keras-users, YD...@slb.com
Hello,

Tried to play with this example, and there are some points:
  • I wasn't able to compile the latest version of the code, so I made some corrections. Working (Keras 0.2.0) code is here - https://gist.github.com/lukovkin/1aefa4509e066690b892; the changes are in the # set up model section.
  • Of course a validation split should be added (better yet with a separate test set for out-of-sample testing). Shuffling should also be tried (see the sketch after this list).
  • Ran it against ES1 (S&P 500 E-mini) and CL1 (light sweet crude oil) continuous prices from Quandl; it works, but the loss seems to be very high.
  • It may be worth using log prices instead of raw prices - price time series can be on different scales, which affects network performance. At the least, the loss chart seems more robust (but keep in mind that we downscaled the inputs, so the loss function would inevitably go down). It may also be worth playing with standardization of the inputs.
  • A training sample length of 100 - it is doubtful that price points as distant as 80-100 timesteps in the past affect future points (y(t+1)...y(t+10)) in a way that can be effectively captured. It might work better with shorter samples. From previous experience, the effective sample length for daily prices in NN training was about 20-30. I also tried to determine the effective sample length using the embedding dimension from dynamical chaos theory, and it also gave lengths of about 20-25 timesteps (but this needs additional testing).
  • From previously done work - open, high and low prices could also matter (or at least some function of OHLC that reflects them all).
I am working on this subject too and have some additional findings to discuss if there is interest.
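A minimal sketch of the validation point above (argument names as in the Keras 0.x fit(); the split fractions are arbitrary choices):

# keep the last 20% of the windows, chronologically, as an out-of-sample test set
split = int(len(input_mat) * 0.8)
X_train, X_test = input_mat[:split], input_mat[split:]
y_train, y_test = target_mat[:split], target_mat[split:]

# validation_split holds out the tail of the training windows for monitoring
model.fit(X_train, y_train, nb_epoch=50, validation_split=0.1, shuffle=True)
score = model.evaluate(X_test, y_test)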


francesc...@gmail.com

Dec 1, 2015, 4:54:45 AM
to Keras-users, YD...@slb.com, francesc...@gmail.com
Hi, I am not really sure what TimeDistributedDense does; I just used a normal Dense layer with linear activation. The OP output 10 neurons to predict the next 10 steps of the *first* stock. I assume he thinks the second is related but does not want to predict it.

I made a toy example with a sine wave. To prove the point above, I added a related second sequence, a cosine, but predict only the sine. I hope this helps you understand :) ... I predict 500 data points into the future...

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, TimeDistributedDense
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.preprocessing.sequence import pad_sequences
from keras.utils.layer_utils import print_layer_shapes

# sine and cosine waves
import numpy as np

X = np.linspace(0, 1000, 10000)
Y = np.asarray([np.sin(X), np.cos(X)]).T

# data prep
# use 500 data points of historical data to predict 500 data points in the future
data = Y
examples = 500
y_examples = 500
nb_samples = len(data) - examples - y_examples

# input - 2 features (sine and cosine)
input_list = [np.expand_dims(np.atleast_2d(data[i:examples+i,:]), axis=0) for i in xrange(nb_samples)]
input_mat = np.concatenate(input_list, axis=0)

# target - the first column (the sine)
target_list = [np.atleast_2d(data[i+examples:examples+i+y_examples,0]) for i in xrange(nb_samples)]
target_mat = np.concatenate(target_list, axis=0)

# set up model
trials = input_mat.shape[0]
features = input_mat.shape[2]
print trials
print features
hidden = 64
model = Sequential()
model.add(LSTM(input_dim=features, output_dim=hidden))
model.add(Dropout(.2))
model.add(Dense(input_dim=hidden, output_dim=y_examples))
model.add(Activation('linear'))
model.compile(loss='mse', optimizer='rmsprop')

# Train
model.fit(input_mat, target_mat, nb_epoch=2)
print_layer_shapes(model, input_shapes=(input_mat.shape))

Regards,

Francesco

Tarnac

Dec 1, 2015, 2:03:36 PM
to Keras-users, YD...@slb.com, francesc...@gmail.com
My understanding is that without the TimeDistributedDense layer the network is NOT trained on the temporal relations in the data. With just a Dense layer, it treats the data point by point; if there are temporal relations in the data, they'll be lost. I think this is how it works, but I'm also not 100% sure.

Tarnac

Dec 1, 2015, 3:55:03 PM
to Keras-users, YD...@slb.com, francesc...@gmail.com

For the LSTM:

Output shape:

  • if return_sequences: 3D tensor with shape (nb_samples, timesteps, output_dim).
  • else: 2D tensor with shape (nb_samples, output_dim).
If return_sequences=True, it will return an output for every timestep, and a TimeDistributedDense layer will then be needed. Otherwise it returns only the final output, without the per-timestep structure. This is my understanding (see the sketch below).
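To make the two output shapes concrete, here is a minimal sketch of both configurations (using the input_dim/output_dim keyword style that worked for Francesco above; the sizes are arbitrary):

from keras.models import Sequential
from keras.layers.core import Dense, TimeDistributedDense
from keras.layers.recurrent import LSTM

# one output per timestep: (nb_samples, timesteps, 2) -> (nb_samples, timesteps, 1)
seq = Sequential()
seq.add(LSTM(input_dim=2, output_dim=64, return_sequences=True))
seq.add(TimeDistributedDense(input_dim=64, output_dim=1))

# one output vector per sample: (nb_samples, timesteps, 2) -> (nb_samples, 10)
vec = Sequential()
vec.add(LSTM(input_dim=2, output_dim=64, return_sequences=False))
vec.add(Dense(input_dim=64, output_dim=10))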

Dmitry Lukovkin

Dec 1, 2015, 4:43:05 PM
to Keras-users, YD...@slb.com, francesc...@gmail.com
There is also a discussion of TimeDistributedDense on the Keras GitHub - https://github.com/fchollet/keras/issues/1029#issuecomment-160845826 - with useful explanations by EdwardRaff.

My results from using LSTM and TimeDistributedDense for time series prediction are mixed. The intuition I had was the following:
  • The input time series x(t-n), x(t-n+1), ... x(t) produces outputs h(t-n), h(t-n+1), ... h(t).
  • The time series is autoregressive of a sort, so its value at the n-th time step depends on the (n-1)-th time step, etc.
  • So we can suppose, for the model, that the outputs of the current step should be the inputs of the next step, and that the last output in the resulting sequence will be the prediction x_hat(t+1): h(t-n) = x(t-n+1), h(t-n+1) = x(t-n+2), ..., h(t-1) = x(t), h(t) = x_hat(t+1).
I ran some models based on this intuition, like the following (this model uses an LSTM-like network from the Jozefowicz et al. paper):
modelR14J = Sequential()
modelR14J.add(JZS1(64, input_dim=4, input_length=32, return_sequences=True))
modelR14J.add(JZS1(4, return_sequences=True))
modelR14J.add(TimeDistributedDense(4))

After data preparation I had an array of inputs, each 32 time steps long, and an array of outputs of the same length, shifted by 1 time step into the future.
So the idea was to predict 1 time step into the future (see the sketch below).
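For reference, a minimal sketch of this kind of data preparation (series is a hypothetical 1D numpy array; 32-step windows as above):

import numpy as np

steps = 32
# inputs x(t-31)...x(t); targets are the same windows shifted one step: x(t-30)...x(t+1)
X = np.array([series[i : i + steps] for i in range(len(series) - steps)])
Y = np.array([series[i + 1 : i + steps + 1] for i in range(len(series) - steps)])
X = X[:, :, np.newaxis]  # -> (nb_samples, timesteps, 1)
Y = Y[:, :, np.newaxis]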
I obtained rather good loss results, but when I ran a prediction on the validation data, I found that the model was actually just 'imitating' the input data in its outputs, so h(t) became equal to x(t) - not x(t+1), as planned.

Black is the target, and blue is the output. If we shift the output 1 step back, we see an almost perfect fit.

So it seems the intuition was wrong, and LSTM + TimeDistributedDense need to be combined in another manner in order to produce reasonable results.
[Attached: plot of the target (black) vs. the model output (blue)]

p.nec...@gmail.com

Dec 4, 2015, 6:21:39 AM
to Keras-users, YD...@slb.com, francesc...@gmail.com
Your problem here is that financial data is just a nightmare to try to predict.
Your network basically assumes that no real change takes place - that's why you see the copying behavior.

develope...@gmail.com

Jan 24, 2016, 11:36:31 PM
to Keras-users, YD...@slb.com, francesc...@gmail.com
Dmitry, I'm playing around with your example (https://gist.github.com/lukovkin/1aefa4509e066690b892) with Keras 0.3.1 and Theano 0.7.0, with the .theanorc file configured correctly and quandl 2.8.9 installed, but I get the error below.


/Users/macpro/anaconda3/bin/python /Users/macpro/allcode/keras-master/examples/mem-ts-RNN.py
Using Theano backend.
(8097, 2)
Traceback (most recent call last):
  File "/Users/macpro/allcode/keras-master/examples/mem-ts-RNN.py", line 44, in <module>
    (X, Y) = _load_data(_prices.values)
  File "/Users/macpro/allcode/keras-master/examples/mem-ts-RNN.py", line 19, in _load_data
    for i in range(0, data.shape[0]/steps-1):
TypeError: 'float' object cannot be interpreted as an integer

Process finished with exit code 1

it is hitting this line:

def _load_data(data, steps = 40):
    docX, docY = [], []
    for i in range(0, data.shape[0]/steps-1):
        docX.append(data[i*steps:(i+1)*steps,:])
        docY.append(data[(i*steps+1):((i+1)*steps+1),:])
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY

Any idea? I am still new to Keras and learning.

Dmitry Lukovkin

Jan 25, 2016, 3:14:55 AM
to Keras-users
Sorry, I can't reproduce it. I think it's because, in your specific case, data.shape[0]/steps (steps=40 by default) returns a float rather than an integer (under Python 3, / is true division). Try converting the division result to an integer explicitly (or use floor division, //):
def _load_data(data, steps = 40):  
    docX, docY = [], []
    for i in range(0, int(data.shape[0]/steps)-1):
        docX.append(data[i*steps:(i+1)*steps,:])
        docY.append(data[(i*steps+1):((i+1)*steps+1),:])
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY

Also, 
from keras.utils.layer_utils import print_layer_shapes
won't work with Keras 0.3.1, since it was removed from Keras.

Best regards,
Dmitry Lukovkin

develope...@gmail.com

Jan 26, 2016, 8:04:18 PM
to Keras-users
Thanks Dmitry. Have you tried implementing time series modeling with undecimated fully convolutional neural networks (UFCNN), as described in this recent paper? Would it be possible for you to write up a Keras implementation using the pseudocode in the paper and Quandl data?

Dmitry Lukovkin

Jan 27, 2016, 2:30:17 AM
to Keras-users, develope...@gmail.com
Very interesting - I haven't seen this paper yet, thanks.
I will have to take a closer look. The pseudocode in the paper relates to the strategy, not to the UFCNN implementation, so it remains to be worked out how exactly to implement it.
At the current stage I can say that, strangely, my experiments show better performance from CNNs (1D) than LSTMs on financial time series, so any architectural improvement of CNNs applied to time series could have a big effect. But there are still a lot of options to explore with both LSTMs and CNNs.

Best regards,
Dmitry Lukovkin

develope...@gmail.com

Jan 29, 2016, 10:29:24 PM
to Keras-users, develope...@gmail.com
Thanks,
I have been playing around with CNNs (1D) and DBMs on financial time series. Do you have any idea how to implement continuous CNN (1D) training on real-time market data such as forex, so that it receives real-time data, preprocesses, trains, fits & evaluates, predicts and trades according to some logic - but continuously, 24 hours a day, probably with window shifting? If you have a sample, that would be fantastic. Any pointers?

best regards
Developer

Dmitry Lukovkin

Jan 30, 2016, 2:49:43 AM
to Keras-users, develope...@gmail.com
Everything depends on the timeframe you are going to trade on.
We have implemented a full pipeline for stocks and daily data in our project - http://stocksneural.net.
It is not Keras-based, though, and we are now working on improving the models and migrating to Keras.
We planned an intraday implementation too, but not at tick frequency - 5 minutes for a start.

Best regards,
Dmitry Lukovkin

sdobro...@gmail.com

Feb 20, 2016, 1:58:46 AM
to Keras-users, YD...@slb.com
Hi, I tried to feed the model two identical sets (identical just for the sake of testing) of S&P 500 changes from the previous session (normalized to the -1..1 range) to predict the changes for the following n sessions. I found that the training loss goes down as expected, but the validation loss stays the same (i.e., if I am not mistaken, the network is not learning anything). Any idea why?

Alexunder

Feb 20, 2016, 12:59:57 PM
to Keras-users
I'm facing the same thing. I hope I'm wrong, but that could mean your validation and training sets have nothing in common to generalize on, at least with the existing features and architecture. I'm very surprised and impressed that a single-layer LSTM can achieve >80% accuracy on the IMDB sentiment dataset in a couple of minutes and yet be absolutely helpless with time series.

DSA

Feb 22, 2016, 6:43:37 PM
to Keras-users
Alexunder - your point about the validation and training sets having nothing in common to generalize on was spot on. To see whether that was the case, I replaced the S&P 500 series with the US unemployment series (https://www.quandl.com/data/FRED/USAURHARMQDSMEI-Harmonized-Unemployment-Rate-All-Persons-for-the-United-States), which is much smoother. The same model worked just fine - both training and validation losses went down, and the predicted data made sense. The time series I used only had data through 2011, and the projections from 4 runs with slightly different hyperparameters all looked like a continuation of the trend, by the current time (i.e. 4 years ahead) arriving from 8+% at about 5.5% unemployment, which is pretty close to the actual unemployment rate of 4.9% now.

That, to my mind, validates that the model discussed in this thread works for time series forecasting, but for data with high noise ratios (such as S&P 500 daily data) it perhaps needs to be tweaked further (or the series itself smoothed).

Stefan Steuerwald

Feb 23, 2016, 5:07:40 AM
to Keras-users
I'd recommend reading the book "The Signal and the Noise" by Nate Silver. It's about the predictability of things. There are case studies on baseball, the stock market, earthquakes, weather, etc. Weather and sports seem much more predictable than economic and market data.

I am day-trading index futures (the German FDAX) on a seconds-to-minutes timeframe. From this experience, I'd suggest going to intra-day data (tick by tick), as there are repeated small-scale patterns to be seen there. There may be more opportunity for a NN to generalize, and you'll have millions of data points to train on (this data is not readily available, but it can be bought). The reason there are patterns is that the market participants act in a certain predictable way, because their actions are rehearsed and trained, and they are professionals selected for performance. A large commission order is handled in a specific way because there is a specific way to execute it optimally (i.e. get the best average price for the customer). Arbitrage works under defined rules and necessities, and so do short-term traders, because there is a specific and optimal way to turn around a sizable position of a few hundred contracts or so (which cannot be flipped around instantly, because that would ruin the prices, so it has to be done bit by bit, which leaves traces in the price tape for everyone to see).

On a longer timeframe, more and more macro-scale events influence prices that are not visible on a seconds timeframe. The longer the timeframe, the more uncertain things become. I believe markets are reflexive in nature, i.e. the market participants react to changes they see and in turn influence the system's future. That's not the case for the spectator of a baseball match, or for the weather (on a short timescale). That's why I very much doubt that markets can be predicted satisfactorily by a simple NN without any background knowledge about the various market participants, their intentions and behaviour, their means of execution and the limits they have to observe.

On the other hand, I'd guess the machine learning departments of Goldman & Co are quite advanced in this, without talking much about it. A guy wielding keras and LSTMs is probably not going to beat them. However, there is actually no need to beat them, as they're operating in their own realm - they need to make many millions to pay off the effort, a one-person day trader only needs to make a few hundred a day. Maybe a NN can learn the short-term patterns of the big boys and jump on the train of a 100 contracts moving in one direction with 1 contract of its own. So, no harm in trying :-)

Ernst

Feb 23, 2016, 4:27:41 PM
to Keras-users
Stefan, this is very interesting.

I am trying to predict EUR/USD 5-second bars with Keras myself, and have not been very successful so far.

I could predict the absolute rates reasonably well, but not with enough accuracy. As shown above, in my case the system took the last rate and assumed the next would be similar.

However, when it comes to predicting the direction of the coming rate change, or even its size, the net runs into problems: gradients diverge, explode or vanish, the backprop does not converge, etc...



Stefan Steuerwald

Feb 23, 2016, 8:06:52 PM
to Keras-users
Ernst,

I don't know the characteristics of the EUR/USD market well enough. In any case, if I have no idea what will happen, predicting the current value as the next one is usually the best bet I can make. I assume your NN did just that. From what I've read, it makes a big difference how the data is preprocessed, i.e. what you feed into the NN: absolute prices, relative price changes (returns), log returns, up/downticks only, etc. I am mostly clueless about NNs, so I have no idea why gradients diverge.

Thinking about it, if I were to try a NN on the markets, I would consider this (sorry for elaborating at length, but it's fun pondering):

1. Pick one market, and one market only (EUR/USD is fine; I only know the FDAX = DAX futures). I'm willing to bet money that a NN trained successfully on one market will utterly fail on another.

2. Go intra-day, tick by tick if I can. 1-minute bars already aggregate away and lose too much information. I look at how a 1-minute candle develops during that minute to support a trading decision. 5 seconds may be fine.

3. Ask an expert which other markets influence my chosen market. In the case of the FDAX, you can't ignore the DAX itself, EUR/USD, WTI these days, Bund futures, maybe USD/JPY, and the US indices starting at 15:30 each day. For example, most of the time the FDAX has a negative correlation to EUR/USD. Why? There's a macroeconomic explanation. When the euro goes down, German exports become cheaper for the rest of the world -> more exports. The dominant export-sensitive stocks in the DAX are the autos, so autos go up, so the DAX goes up. The DAX is coupled to the FDAX by arbitrage, thus influencing the market I'm trying to predict. This correlation holds most of the time, on a minute timescale, but not always, as it is sometimes buried beneath "more interesting" economic factors. Today the correlation might be strong, tomorrow weak, next year it might be gone. If it's there, you trade it; if not, you look for something else. In any case, this additional data needs to be presented to the NN.

4. Pay attention to daily market phases and pick one. EUR/USD (which is open 24 hours a day) is behaving differently during the European session than during the Asian session. FDAX is different 09:00 to 11:00 / mid-day / before 9 a.m. / after US open.

5. Pay attention to other regular events, e.g. options expiry dates. The FDAX, for example, is heavily influenced by DAX option trading. Writers of DAX call/put options hedge their positions in the FDAX. Whenever option expiry day is near, you can see the effects of the writers adjusting their hedges as the DAX moves, which means buying/selling hundreds or thousands of contracts, which will either amplify or dampen the market moves, depending on the position the writers hold. If I am unwilling to feed option data into the NN (i.e. the open interest table, only available daily), I should not use data around option expiry date. EUR/USD is totally different in this respect, serves just as an example that I need to know how my market works.

6. Discard news-related price movements from the data. Every day economic data is published, and there might be a corresponding price movement (small or huge). There is no sense in confusing my NN with that. News trading is an art in itself and highly automated.

7. Going back to my first point, I wonder if I can find a market with a very small number of participants. The fewer participants, the fewer influencing factors, the simpler. EUR/USD is the exact opposite; there are few things more complex than forex (I believe). Maybe I'd look at the Bund future (FGBL) or its US equivalents; again, I don't know those markets, but arbitrage and options are not a factor there, so there are fewer market participants to worry about and understand, and the task for the NN should be easier. Obviously, the market chosen must have sufficient liquidity and a low cost of trading. EUR/USD is best in that respect; the FGBL should be ok.

Finally, I am unsure about a sequence-analyzing NN like an LSTM for financial time series. Maybe a convnet is a better match for the task. I have toyed with reproducing Shakespeare using the character-level LSTM Keras has as an example. Market ticks or bars taken as "characters" will be very noisy; at any given time there will be multiple market participants at work, their actions showing up interleaved at random on the tape. Maybe this is like training an LSTM not on pure Shakespeare, but on a text randomly generated from a mixup of Shakespeare, Schiller, Wikipedia, the tax code and the National Enquirer, all mixed together character by character. Garbled noise in, garbled noise out - but then, that's what markets are. Shakespeare is still in there somewhere.

As I said, I am no NN expert (an interested layman at best).

Ernst

Feb 24, 2016, 6:31:31 AM
to Keras-users
Stefan,
thank you very much for your mail; it contains a lot of very interesting thoughts!

1) Gradients - I am a layman on NNs too, so be warned :-). I think it has to do with the fact that the differences to learn are so subtle that small numbers become important, and the derivatives of these small numbers get blown up. NN experts: please correct me if I am wrong. I have read that, in general, RNNs are hard to train because of this gradient problem, and that LSTMs were built to improve on that. It looks like the convergence of my NN depends on the activation, optimizer, objective function, ... so I am doing a grid search to test their influence.

2) I used the net that does not predict currency rate changes well to predict rate changes in sine curves (sampled at the same frequency as my FX rates), and that works very nicely. FX rates are way more complex, I guess.

3) Roni Mittelman (http://arxiv.org/abs/1508.00317) describes how he wanted to train an LSTM to predict stock quotes and failed because of exploding gradients. He describes his own UFCNN (undecimated fully convolutional neural network), which works well according to the paper. I would be very interested in testing Roni Mittelman's UFCNN as published. However, the information given is not sufficient for a newbie to NNs like me to reproduce it in Keras. I asked Roni for hints, but it seems he now works for BlackRock and may not be allowed to talk about his research any more. He used Caffe.

4) Using a CNN instead of an LSTM is a very good idea; I will try that and report back. Roni reported that normal CNNs do not perform well for our task, but I will research this in detail anyway.

5) I will try training on tick data, but I will research other markets first where there are better chances, as you suggest. Until now, I shied away from futures because one needs to change the instruments regularly before expiry, and it seems to be an additional task for the net to learn that.

6) The input for the net is a big problem, IMHO, as you state. In Roni's paper he predicted stock quotes from an online challenge (http://www.circulumvite.com/home/trading-competition). It looks like they use real quotes, but they also provide technical indicators as input to train the net.

7) The input needs to be scaled to fit the chosen activation function, and it should span as big a range as possible. You wrote that you read that the kind of input is important - do you happen to remember where you read about this? I would like to learn more.

Thanks a lot,
Ernst

Dmitry Lukovkin

Feb 24, 2016, 8:07:45 AM
to Keras-users
Pretty interesting discussion, guys!

The best results I have obtained with LSTMs (actually, GRUs) so far use stateful network training. Right now it is a one-step-ahead configuration, but it breaks down very quickly if I try multistep prediction with it.
I agree that performance depends a lot on the choice of hyperparameters, so some kind of hyperparameter optimization is required. And there is still a tendency to predict the last observed value.
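For anyone following along, a minimal sketch of such a stateful setup (argument names follow the stateful_lstm.py example linked further down this thread; they changed across early Keras versions, and X, y and all sizes here are hypothetical):

from keras.models import Sequential
from keras.layers.core import Dense
from keras.layers.recurrent import GRU

batch_size = 32
model = Sequential()
# stateful RNNs need a fixed batch size; the hidden state carries over between batches
model.add(GRU(64, batch_input_shape=(batch_size, 1, 4), stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')

for epoch in range(25):
    # shuffle=False so consecutive windows stay consecutive across batches
    model.fit(X, y, batch_size=batch_size, nb_epoch=1, shuffle=False)
    model.reset_states()  # clear the carried state between passes over the series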

3 and 4) I've read this article too; it is interesting. It's rather obvious how to implement the FCNN part in Keras, but the U (undecimated) part is another matter ;) Unfortunately that part of the article is not very specific. I think I understand what needs to be done, but it seems it will require building "undecimated" versions of the convolutional layers on top of Keras. Roni Mittelman also performs classification, not regression, but that is not a big issue.
I've tried CNNs on their own; they seem to work like some kind of filter (see the picture of Y vs Y_hat below).

I think it could be promising to combine a CNN (for feature detection) with an LSTM (for capturing temporal relationships), maybe in a Graph structure (not Sequential).

Best regards,
Dmitry Lukovkin
[Attached: plot of Y vs Y_hat]

Stefan Steuerwald

Feb 24, 2016, 9:12:17 AM
to Keras-users
Hi Ernst,


7) The input needs to be scaled to fit the chosen activation function, and it should span as big a range as possible. You wrote that you read that the kind of input is important - do you happen to remember where you read about this? I would like to learn more.

 
thank you - just a quick reply: I remember a paper where linear returns, log returns and something else were compared for performance, with log returns winning. Sadly, I don't remember which one; it might have been in the context of support vector machines.
Here's an article detailing some mathematical reasons for log returns, not all of which appeal to me: https://quantivity.wordpress.com/2011/02/21/why-log-returns/
Note also the two other texts cited in there.
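(For concreteness, a minimal sketch of the two preprocessing choices, with prices as a hypothetical 1D numpy array of closes:)

import numpy as np

linear_returns = prices[1:] / prices[:-1] - 1.0  # p_t / p_{t-1} - 1
log_returns = np.diff(np.log(prices))            # log(p_t / p_{t-1})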

More later.
Stefan.

Dmitry Lukovkin

Feb 24, 2016, 10:02:13 AM
to Keras-users
I agree that log returns should probably win (and most econometric models use them), but in my case that hasn't happened, at least definitely not in the LSTM/GRU case. I had a strong expectation that log returns would win in the CNN case; right now they don't, but I will try some more.

Ernst

Feb 24, 2016, 11:48:24 AM
to Keras-users
very interesting, indeed :-)

1) Stefan, thanks for the reference; I will try log returns - the papers are very interesting!

2) Dmitry, thanks for the hint to try stateful LSTM/GRU. I'll give it a try. Do you use batch_size = 1, or do you reshuffle your arrays so that the entries with the same minibatch ids are contiguous?

3) The y / y_hat plot looks very interesting - like a moving average or so.

4) I also had the idea of combining LSTM and CNN somehow, but have not tried it yet. I thought of a 1D convolution; there is a nice example in Keras I'd like to use.

5) I agree, I guess we could build an FCNN from the available papers, but the U seems to be the problem. The description is a bit thin; I have not figured out how to do it. I found a paper: http://scholar.google.at/scholar_url?url=http://scholarship.rice.edu/bitstream/handle/1911/20049/Lan1995Non5NoiseRedu.PS%3Fsequence%3D1&hl=en&sa=X&scisig=AAGBfm1Ncd14dfL5W4iwS-VF6lRobMD1Lg&nossl=1&oi=scholarr&ved=0ahUKEwio24XG5JDLAhXkJXIKHbgAD6oQgAMIGigAMAA (Noise Reduction by an Undecimated Discrete Wavelet Transform); it looks like a way to filter the noise for the FCNN.

See you,
Ernst

Ernst

Feb 24, 2016, 12:00:59 PM
to Keras-users
Re UFCNN: this paper is better: The Undecimated Wavelet Decomposition and its Reconstruction - http://jstarck.free.fr/IEEE_Undec07.pdf

Dmitry Lukovkin

Feb 24, 2016, 12:15:55 PM
to Keras-users
Ernst,

2) I use different batch sizes (performing a parameter search over them), but timesteps=1, as in the example - https://github.com/fchollet/keras/blob/master/examples/stateful_lstm.py. I do not shuffle the input data, in order to preserve statefulness. Some time ago I tried with timesteps bigger than 1; it worked too, but it was just a trial run, and I will play with this configuration later. One burden: in the stateful configuration, the length of the input data (nb_samples) should be divisible by batch_size (or maybe batch_size * timesteps) without remainder; otherwise it fails (see the sketch after this message). Also, behavior differs between Theano and TensorFlow if you try to use TimeDistributedDense as the output - it works on TF (I don't know yet whether it works correctly) but fails on Theano.
5) I've started browsing through the code of some SWT (UWT) implementations, but haven't made much progress yet. In general it is comprehensible, but the details are not clear, especially regarding the left side of the picture in the article - what is to be concatenated, and how.
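A small sketch of the divisibility point from 2) above (illustrative only; X, y and batch_size are hypothetical):

# stateful models require nb_samples to be a multiple of batch_size,
# so trim the window arrays before fitting
nb_samples = (X.shape[0] // batch_size) * batch_size
X, y = X[:nb_samples], y[:nb_samples]
model.fit(X, y, batch_size=batch_size, nb_epoch=1, shuffle=False)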

Ernst

Feb 25, 2016, 2:36:48 AM
to Keras-users
Dmitry,

2) thanks, the example for the stateful LSTM is very interesting!

5) yes, the UFCNN is a bit mysterious. Unfortunately, there are a lot of things on my to-do list, but if I am not successful with LSTM, GRU or LSTM/GRU + CNN, I will try to get the UFCNN working. I'll keep you posted...

Have you ever tried feeding technical indicators (moving average, RSI, ...) into the LSTM as additional input? The input files for the competition Roni Mittelman describes in his paper contain several technical indicators, but they do not reveal what kind.
And if a CNN acts as some kind of indicator / filter, as shown in your figure, feeding precalculated indicators into an LSTM might be like combining CNN and LSTM in a cheap way - and more flexibly, since the precalculated indicators could be more complex/diverse than plain convolutions.




Stefan Steuerwald

Feb 25, 2016, 3:15:40 AM
to Keras-users
Hi Ernst and Dmitry,

sorry for being slow answering, I'll just try to keep up... Have read this entire thread by now and followed a few links, fascinating things you have here.



2) I used the net that does not predict currency rate changes well to predict rate changes in sine curves (sampled at the same frequency as my FX rates), and that works very nicely. FX rates are way more complex, I guess.

Definitely. I suspect FX rates or other prices are actually not predictable at all, at least not on timeframes that exceed the typical time a professional short-term trader takes to turn around a position or react to a changing market (i.e. a few minutes at most, in the case of the FDAX). Very short term, I, as a human, can see where things are going because I have an idea what the other market participants are doing, so from experience and knowledge I have a hunch about what will probably happen next.

Thinking about how I trade, I actually don't need to _predict a price 1 minute from now. I can do with less precision, so to say. I need a direction and an idea how far (approximately) a move can go. I need to _classify the price pattern I'm seeing to be able to decide to enter a trade or to stay out of the market. Same thing when exiting a trade. I need to decide whether I would still classify the pattern as the same class that persuaded me to enter a trade. As soon as that is no longer the case, I exit (or when my stoploss or takeprofit limit is reached).

I mean, is prediction the task to be solved? Or is it classification? But then, where do we get the class labels from?
 

3) Roni Mittelman (http://arxiv.org/abs/1508.00317) describes how he wanted to train an LSTM to predict stock quotes and failed because of exploding gradients. He describes his own UFCNN (undecimated fully convolutional neural network), which works well according to the paper. I would be very interested in testing Roni Mittelman's UFCNN as published. However, the information given is not sufficient for a newbie to NNs like me to reproduce it in Keras. I asked Roni for hints, but it seems he now works for BlackRock and may not be allowed to talk about his research any more. He used Caffe.

Publishing such papers is the best way to get a job ;-)
 

5) I will try training on tick data, but I will research other markets first where there are better chances, as you suggest. Until now, I shied away from futures because one needs to change the instruments regularly before expiry, and it seems to be an additional task for the net to learn that.

That's right, futures expire and some positions are rolled over, some not. That doesn't matter for intra-day trades. On expiry day a future behaves especially "weird", so I'd exclude those days. When incorporating the underlying's prices, there's the additional complication that the fair value of the future differs across expiry dates; e.g. for the FDAX, the June expiry spans dividend season, so its prices are way higher than for the other expiries. The DAX is calculated as if dividends were reinvested, and arbitrage needs to adjust its position in the stocks every time a dividend is paid out. That topic is not an easy one and also involves the individual tax situation of the arbitrageur, so yes, futures are more complex in that respect.

Also, futures that are settled by physical delivery behave differently from those that are cash settled.

I'm having great fun following your discussion. I'll try to contribute some more aspects, not on the details of NNs, but from a trader's perspective.

Stefan.

Alex Potapov

Feb 25, 2016, 5:59:13 AM
to Stefan Steuerwald, Keras-users
Hi, Stefan, could you please describe your trading system a little bit more? Or maybe give us links to resources. Is it scalping? What are your trading decisions based on? Is it some sort of indicators, patterns like Elliott waves, or pure experience and intuition?

Or is it classification? But then, where do we get the class labels from?

One can use trend indicators, for example ZigZags, for labeling (see the sketch below). Over the last two days I have been experimenting with a ZZ trend prediction system based on a simple 4-layer deep net and M15 bars. Details are here: https://www.mql5.com/en/articles/1103. I can post some stats and graphs, but the results are disappointing. The system shows good performance on random validation data intertwined with the training set, but becomes much worse on bars right after the training set. I will try recurrent and convolutional versions, though.
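A minimal sketch of ZigZag-style direction labeling (illustrative; pct is a hypothetical reversal threshold in percent, and note that a real ZigZag indicator repaints - this simplified causal version only marks the current leg):

import numpy as np

def zigzag_labels(prices, pct=0.5):
    # +1 while in an up-leg, -1 in a down-leg; the leg reverses once price
    # retraces more than pct percent from the last extreme
    labels = np.zeros(len(prices), dtype=int)
    last_ext, direction = prices[0], 1
    for i, p in enumerate(prices):
        if direction == 1:
            if p > last_ext:
                last_ext = p
            elif (last_ext - p) / last_ext * 100.0 > pct:
                direction, last_ext = -1, p
        else:
            if p < last_ext:
                last_ext = p
            elif (p - last_ext) / last_ext * 100.0 > pct:
                direction, last_ext = 1, p
        labels[i] = direction
    return labels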

Ernst

Feb 25, 2016, 8:24:18 AM
to Keras-users
Hi Stefan,


sorry for being slow answering, I'll just try to keep up... Have read this entire thread by now and followed a few links, fascinating things you have here.

thanks for your answer!
 

2) I used the net that does not predict currency rate changes well to predict rate changes in sine curves (sampled at the same frequency as my FX rates), and that works very nicely. FX rates are way more complex, I guess.

Definitely. I suspect FX rates or other prices are actually not predictable at all, at least not on timeframes that exceed the typical time a professional short-term trader takes to turn around a position or react to a changing market (i.e. a few minutes at most, in the case of the FDAX). Very short term, I, as a human, can see where things are going because I have an idea what the other market participants are doing, so from experience and knowledge I have a hunch about what will probably happen next.
 
How long do you hold your trades - does your gut feeling tell you "I'll hold this for 20 mins", or so?
 
Thinking about how I trade, I actually don't need to _predict a price 1 minute from now. I can do with less precision, so to say. I need a direction and an idea how far (approximately) a move can go. I need to _classify the price pattern I'm seeing to be able to decide to enter a trade or to stay out of the market. Same thing when exiting a trade. I need to decide whether I would still classify the pattern as the same class that persuaded me to enter a trade. As soon as that is no longer the case, I exit (or when my stoploss or takeprofit limit is reached).


 
I mean, is prediction the task to be solved? Or is it classification? But then, where do we get the class labels from?

I try to predict the rate at T + 10 min. I assume this is short enough that it can work and long enough to make a profit.

I also tried a classification approach, but it failed. I used the classes BUY / Do Nothing / SELL. The program cannot predict with 100% accuracy, and it counts the mistakes. However, there are several kinds of mistakes: a false "Do Nothing" is not a problem and will happen to any trader; it could almost be ignored when counting errors. Mistaking a SELL for a BUY, however, will cost money and should not happen too often.

The loss functions available for NNs, however, do not take these different kinds of error into consideration.

If the data contains 55 % BUY trades and 45 % sell trades, the system could suggest 100 % BUY trades and would be right 55 % of the time. I would not trade this, however.


Publishing such papers is the best way to get a job ;-)

:-)
 

That's right, futures expire and some positions are rolled over, some not. That doesn't matter for intra-day trades. On expiry day a future behaves especially "weird", so I'd exclude those days. When incorporating the underlying's prices, there's the additional complication that the fair value of the future differs across expiry dates; e.g. for the FDAX, the June expiry spans dividend season, so its prices are way higher than for the other expiries. The DAX is calculated as if dividends were reinvested, and arbitrage needs to adjust its position in the stocks every time a dividend is paid out. That topic is not an easy one and also involves the individual tax situation of the arbitrageur, so yes, futures are more complex in that respect.
 
Also, futures that are settled by physical delivery behave differently from those that are cash settled.


The biggest problem I see is when and how to move to the next contract. As you mention, rolling over positions is not a problem when short-term trading, but I assume the NN needs new training when moving, for instance, from the January contract of the FDAX to the April contract.

 
I'm having great fun following your discussion. I'll try to contribute some more aspects, not on the details of NNs, but from a trader's perspective.

This is very valuable to me.

I think one needs more input than just quotes for a NN to predict / classify. What indicators would you feed to the net additionally? I thought of 1-3 moving averages, an RSI and possibly a volatility measure. What would you as a trader recommend?

Thanks
Ernst
 
Stefan.

Stefan Steuerwald

Feb 25, 2016, 8:55:32 AM
to Keras-users, sals...@gmail.com
Hi Alexunder,


On Thursday, February 25, 2016 at 11:59:13 AM UTC+1, Alexunder wrote:
Hi, Stefan, could you please describe your trading system a little bit more? Or maybe give us links to resources. Is it scalping? What are your trading decisions based on? Is it some sort of indicators, patterns like Elliott waves, or pure experience and intuition?

It's a scalping-type approach; the average hold time is less than 1 minute. The main idea is not to trade the chart or use indicators or any other mechanical system, but to know who is doing what, and why. I am not moving the market - others are. My opinion is of no value. I need to get into the heads of the other market participants and find out their opinion. Then I can jump on a move and be carried along with it for a few points.

Discussing charts and systems is certainly way off-topic here on Keras-users - let me do it anyway, as I hope to clarify which data a NN would need to process, from my perspective. Here's a video snippet (9 mins long; watching in fast-forward is ok): https://www.dropbox.com/sh/qcywd2p0rtd7ri4/AADFe6IAJhoipwItt_d_0ga8a?dl=0

What you see is the FDAX two days ago around 11:30, in 1-minute candles. This is a pretty random example. It is nearing mid-day; no big moves. Let's just watch the chart - eyes on, brain off: there is a blue horizontal line at 9474.5 from a previous low. The price tests this low 3, 4, 5 times, for minutes, even going a few points below it, without being able to actually break below and continue downwards. Then the price disconnects from the blue line and starts upwards, until it nearly touches a previous local high that was broken.

Now let me explain what I am thinking while I watch this, and what triggers me to enter a position:
1. The DAX is dormant, so is WTI oil, and EUR/USD is moving very little (I get that information from a sideways glance at some other charts).
2. With no movement and no commission order in the market, the dominant market participants at this time are short-term traders (STTs). A very important conclusion!
3. I see the price being pushed below the blue line. The downward pushes seem jerky to me. I think: STTs are shorting to push below the level, to see if they can trigger a downward move. That would let them cover their shorts lower for a profit.
4. I see this happen a few times. The thinking is always: STTs are shorting the market, i.e. they are taking shorts onto their books, i.e. they are taking risk. Taking risk only makes sense if they can reasonably expect to make a profit, so that is what they actually expect. However, no one jumps on, and the blue level is not decisively broken downwards.
5. After a few tries, they give up - and what now? They have shorts on the book, altogether a few dozen to a hundred contracts maybe, and they need to cover. I know that. They can't keep shorting the market forever without something happening, so eventually they'll give up (otherwise they'd deserve a kick; no sane trader repeats the same non-working thing forever). This is where I jump on and go long. I did that trade from 79 (market long) to 85 (limit exit, at the previous high), +6 points.

The exact point where I enter the trade comes from experience and from watching exactly how the candle develops. Does it go up fast, down slow? Jerky down, smooth up? The other way round?

Summary: I am not trading the chart or the pattern I see. Instead, I try trading the books of the other market participants. I use the chart and the patterns I see to build and test a hypothesis of the positions of the others. If I know their positions, I can anticipate what they'll do next, with a reasonable probability of success.

What does that mean for a NN approach?
1. I believe if I can see patterns, so can a NN.
2. I see patterns in the individual ticks; a dead 1-minute candle on paper or the screen loses information. I could not trade the same style with only finished candles presented to me.
3. Patterns are not enough. I build a market hypothesis, which I believe is essential for trading success. How could a NN do that? If at all, the model would be implicit somewhere inside the ticking neurons. Is a separate market model needed, outside any NN, with NNs doing what they are best at and some other technology worrying about the market model?
4. The relevant time span for my trading style is 5-15 minutes worth of history, plus older price levels. A NN would need to see that much data.

Further:
Price levels (my horizontal lines) are important whenever STTs are the dominant market participants. STTs very much trade these levels. In the absence of a bigger market-moving factor and other incentives, that is all they can do. Fibonacci retracements seem to play a similar role. Price levels become outright unimportant whenever a commission order is being executed. The reason is that commission traders simply don't care about chart levels; they only care about executing the order to the customer's specs. Hence, it is vital to know whether there is an active order or not.
Conclusion for NN input data: determine suitable price levels somehow and feed them into the NN (e.g. whether the current price is near/over/under/crossing a price level). This should at least help in STT-dominated market phases (of which there are several every day).

Hope I could clarify my thinking a bit. I strongly believe that just feeding ticks into a NN and expecting profit is not enough. At the very least, additional data beyond the mere price of the predicted market will be required.

One more example to make clear why I think patterns alone are not enough: suppose trader A is 100 contracts long and sells 50, leaving 50 long. Trader B is flat and sells 50 contracts, so he is now 50 short. In both cases the chart will show 50 contracts sold, the price goes down, and we see a down candle. But the market opinions of traders A and B are substantially different: A is still long, expecting a rising market; B is short, expecting a falling market. While their expectations differ, their visible action was the same: sell 50. But their books look different!!! If a NN only sees price action, it will have no clue what the biggies think, and it will probably do no better than a coin flip.

I hope I am mistaken :-) A NN making trading profits would be way more cool!

Stefan.

Dmitry Lukovkin

Feb 25, 2016, 9:07:04 AM
to Keras-users
Hi, Ernst

Some comments.


On Thursday, February 25, 2016 at 4:24:18 PM UTC+3, Ernst wrote:


I think one needs more input than just quotes for a NN to predict / classify. What indicators would you feed to the net additionally? I thought of 1-3 moving averages, an RSI and possibly a volatility measure. What would you as a trader recommend?

Regarding moving averages - if we consider them as smoothing/filtering functions, they could be useful, but then there's another question: how to choose the filtering function and its parameters. I'm not quite sure that using standard TA SMAs/EMAs with standard windows is a solid approach. I think choosing a smoothing function and selecting its parameters is a parameter-optimization or machine-learning problem of its own.
Right now we are using a Kalman filter (and how to obtain optimal parameters for it is a separate question).
One of the reasons I started to consider CNNs is to make smoothing/feature recognition part of the learning process.
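For readers unfamiliar with it, a minimal sketch of a scalar random-walk Kalman filter used as a smoother (z is a 1D float array of prices; q and r are hypothetical process/measurement noise variances - tuning them is exactly the open question above):

import numpy as np

def kalman_smooth(z, q=1e-5, r=1e-2):
    xhat = np.zeros_like(z)  # filtered estimates
    p = 1.0                  # estimate variance
    xhat[0] = z[0]
    for k in range(1, len(z)):
        # predict (random-walk model: state carries over, variance grows by q)
        x_prior, p_prior = xhat[k-1], p + q
        # update: blend prediction and observation by the Kalman gain
        gain = p_prior / (p_prior + r)
        xhat[k] = x_prior + gain * (z[k] - x_prior)
        p = (1.0 - gain) * p_prior
    return xhat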

Dmitry Lukovkin

Feb 25, 2016, 9:21:57 AM
to Keras-users, sals...@gmail.com
Pretty impressive insight, Stefan, thank you!

Stefan Steuerwald

Feb 25, 2016, 2:04:39 PM
to Keras-users
Ernst,

here's a few more comments on your last message:

 
How long do you hold your trades - does your gut feeling tell you "I'll hold this for 20 mins", or so?

Not my style; the FDAX moves quite fast, and most other liquid futures are slower and warrant holding times of many minutes. My average holding time is approx. 1 minute. Some trades need just seconds to hit their target 5-10 points away; some stay in hope-and-pray mode much longer (which is mostly a mistake). When a trade is not moving in my direction right away, I should try to get out immediately. Very difficult - fighting hope all the time.
 

I try to predict the rate at T + 10 min. I assume this is short enough that it can work and long enough to make a profit.

Sounds good. Every market and every trading style will have its own prediction window. I mean, there are people trading on a timescale of months, or weeks, or days; all is fine as long as there's a positive return.
 

I also tried a classification approach, but it failed. I used the classes BUY / Do Nothing / SELL. The program cannot predict with 100% accuracy, and it counts the mistakes. However, there are several kinds of mistakes: a false "Do Nothing" is not a problem and will happen to any trader; it could almost be ignored when counting errors. Mistaking a SELL for a BUY, however, will cost money and should not happen too often.

The loss functions available for NNs, however, do not take these different kinds of error into consideration.

 
It should be possible to use a custom loss function in Keras (see the sketch below).
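A minimal sketch of such an asymmetric loss, assuming a Keras version that exposes the keras.backend module; the 5x penalty for getting the sign of the change wrong is an arbitrary illustrative choice:

import keras.backend as K

def directional_mse(y_true, y_pred):
    # weight errors 5x when the predicted and true changes have opposite signs
    wrong_sign = K.cast(y_true * y_pred < 0, 'float32')
    return K.mean((1.0 + 4.0 * wrong_sign) * K.square(y_pred - y_true), axis=-1)

model.compile(loss=directional_mse, optimizer='rmsprop')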
 
If the data contains 55 % BUY trades and 45 % sell trades, the system could suggest 100 % BUY trades and would be right 55 % of the time. I would not trade this, however.

 
You're right. Another thought: my market moves differently on the buy side than on the sell side. Selling is faster, more vicious, more forceful - down fast, up slowly, even on a very short timescale. I believe this is true for many markets except forex, because a gain in one currency is a loss in the other. Because of this asymmetry, I'd like to think that specializing a NN for only the buy side or only the sell side might be beneficial. I can support this with the fact that there are trading patterns a professional learns to execute first (and more often) on the short side than on the long side, e.g. re-shorting at a downwards break level after the price moves back up following the break.


The biggest problem I see is when and how to move to the next contract. As you mention, rolling over positions is not a problem when short-term trading, but I assume the NN needs new training when moving, for instance, from the January contract of the FDAX to the April contract.

From my experience, the market is not very different when moving from one contract to the next. The price level will change, but little else. All the volume goes to the new front contract after the last one has expired. The transition takes place during the day before expiry and on the expiry day itself, so those days are special.

 
I'm having great fun following your discussion. I'll try to contribute some more aspects, not on the details of NNs, but from a trader's perspective.

 
This is very valuable to me.

I think one needs more input than just quotes for a NN to predict / classify. What indicators would you feed to the net additionally? I thought of 1-3 moving averages, an RSI and possibly a volatility measure. What would you as a trader recommend?


Glad I can contribute something.

Concerning indicators, I don't use any. Any indicator that aggregates a few bars, e.g. RSI with its standard 14-bar lookback, is nothing but history and of no use for what I'm doing. I am not saying indicators can't be useful for other trading styles or for a NN, but what does a 14-bar average do for me if I'm trading for just a minute? OK, it provides me with an exact number instead of a guess, but I am happy just watching the price.

There are a million trading systems out there that heap indicator upon indicator in the hope of finding the holy grail. Often such systems perform quite well in one market phase and totally fail in another. A simple example is MA-crossover systems: they work great in a trend and accumulate losses in a sideways market. So when a system doesn't work, I often see the thinking that one just needs to add another indicator, another trading rule - and if that doesn't work, it means more research needs to go into finding the ultimate indicator. I believe a purely mechanical system is doomed to fail.

Instead of classic indicators, maybe it's worth thinking about other numerical means to get a grip on what I call candle development, i.e. price action:
1. Are market orders hitting the bid or lifting the offer, i.e. is there active selling or active buying? I see this information flashing by in my trading software, and I guess I am using it, maybe unconsciously. To get this information, time&sales is not sufficient, one needs the order book (at least best bid / best offer) to determine which side of the book was hit by a traded volume.
2. The order book itself is not a big help in the FDAX. The order book only shows limit orders, and most short term volume goes through market. In the order book, you see traders showing their positions to arbitrage, which cannot work without a well populated book, but that's about all.
3. What's the volume traded right now? What's the time between price ticks? I'd like to distinguish fast-moving markets with lots of volume from slow-movers with little volume.
4. Maybe one can put a numeric value to how price develops as I see it - price coming up slowly, going down sharply - maybe doing a wavelet transform on an N-tick window to find out if there are small or large moves in there, and which direction is small and which direction is large. Sounds esoteric (sorry, I was a physicist in a previous life) and I absolutely have no reason to suggest wavelets other than I like them. ;-) (A rough numerical sketch of items 3 and 4 follows below.)
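A speculative sketch of how items 3 and 4 might be turned into numbers - my own illustration using the PyWavelets package, not a tested method; the window length and wavelet choice are arbitrary:

import numpy as np
import pywt  # PyWavelets

def tick_features(prices, timestamps, volumes):
    """Activity and price-action features over the last N ticks (e.g. N = 64)."""
    feats = {
        # item 3: how much volume, how fast the ticks arrive
        'volume_rate': volumes.sum() / max(timestamps[-1] - timestamps[0], 1e-9),
        'mean_tick_gap': float(np.diff(timestamps).mean()),
    }
    # item 4: wavelet decomposition of the price path; the energy per detail
    # level separates small wiggles from large moves
    coeffs = pywt.wavedec(prices, 'db2', level=2)
    for lvl, detail in enumerate(coeffs[1:], start=1):
        feats['detail_energy_%d' % lvl] = float(np.sum(detail ** 2))
    return feats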

This is all just ideas - I have no clue how far away a working NN system is. At the current rate of progress we should be there by next Thursday. It is certainly a lot of work and requires a lot of wits and countless hours, maybe years. Maybe it'd be faster to learn how to trade manually, if one is not purely academically interested in getting a NN to profit.

Wishing a good night,
Stefan.

Stefan Steuerwald

unread,
Feb 25, 2016, 2:18:06 PM2/25/16
to Keras-users, sals...@gmail.com
Dmitry,

glad you like it! Kalman filter, never used one. I have a faint idea what it is, I'll read up on it.

I have one more thing for today, please see attachment. I'd like to support my comment on discarding news-related moves from the input data.

This is the FDAX with yesterday's Markit PMI index coming out at 15:45. The PMI came out worse than expectations. In this case, the market decided on a swift, medium-size down reaction, seemingly out of nowhere. This price movement is not deducible at all from previous prices or from any other data available. The movement is algos selling on seeing the number (which is released in machine-readable form), with human short-term traders jumping on, if they're any good (I guess you know that). After the move is through, the market happily continues, just 50 points lower. One should restrict one's system from trading around news releases.

On the other hand, there's often a secondary and tertiary move after the initial news move - I can imagine a specialized news-trading NN that is exclusively trained on news-related data. I know there's human traders specializing in that, so why not a NN.

Stefan.

FDAX Markit PMI bad.jpg

Ernst

unread,
Feb 25, 2016, 4:10:06 PM2/25/16
to Keras-users
Stefan,

thank you very much for your detailed information, this is very helpful. The description of your trading style and your movie are giving me ideas on how to feed the net.

To conclude: it makes no sense to feed the net indicators, because the net can create this information itself when presented with the data. But it does need data it cannot remember / derive itself, like the last local highs / lows, the order book, etc.


The loss functions available for NNs, however, do not take these different kinds of error into consideration.
 
 
It should be possible to use a custom loss function in keras.

Yes, but I think I will try to move into reinforcement learning to solve that. I guess that could be a way to train the net in the trading style you describe.
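For what it's worth, a custom loss in Keras is just a function of (y_true, y_pred) built from backend ops. A minimal sketch of an asymmetric squared error - the 2x factor and the direction convention are arbitrary illustrations, not a tested weighting:

from keras import backend as K

def asymmetric_mse(y_true, y_pred):
    err = y_true - y_pred
    # penalize errors in one direction twice as heavily (factor is arbitrary)
    weight = 1.0 + K.cast(err < 0, 'float32')
    return K.mean(weight * K.square(err), axis=-1)

# model.compile(loss=asymmetric_mse, optimizer='rmsprop')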

Have you seen this page:

http://ai.marketcheck.co.uk/Forex (click on "Start" and observe).

It is built from Andrej Karpathy's page:

http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html
 
 
You're right. Another thought: my market moves differently on the buy side than on the sell side. Selling is faster, more vicious, more forceful: down fast, up slowly, also on a very short timescale. I believe this is true for many markets, except forex, because the gain in one currency is the loss of the other. Because of this asymmetry, I'd like to think specializing a NN on only the buy side / sell side might be beneficial. I can support this by the fact that there are trading patterns a professional learns to execute first (and more) on the short side than on the long side, e.g. re-shorting at a downwards break level after the price moves back up after the break.

Interesting

The biggest problem I see is when and how to move to the next contract. As you mention, it is not a problem of rolling over physical positions when short term trading, but I assume the NN needs new training when moving from the January contract of FDAX to the April contract for instance.

From my experience, the market is not very different when moving from one contract to the next. The price level will change, but little else. All the volume goes to the new front contract after the last one has expired. Transition takes place during the day before expiry and the expiry day itself, so these days are special.
 
Sounds good, worth a try..

Glad I can contribute something.

Concerning indicators, I don't use any. Any indicator that aggregates a few bars, e.g. RSI with its standard 14 bars back, is nothing but history and of no use for what I'm doing. I am not saying indicators can't be useful for other trading styles or for a NN, but what does a 14 bar average do for me if I'm trading just a minute? Ok, it does provide me with an exact number instead of a guess, but I am happy just watching the price.

 
Instead of classic indicators, maybe it's worth thinking about other numerical means to get a grip on what I call candle development, i.e. price action:

1. Are market orders hitting the bid or lifting the offer, i.e. is there active selling or active buying? I see this information flashing by in my trading software, and I guess I am using it, maybe unconsciously. To get this information, time&sales is not sufficient, one needs the order book (at least best bid / best offer) to determine which side of the book was hit by a traded volume.
2. The order book itself is not a big help in the FDAX. The order book only shows limit orders, and most short term volume goes through market. In the order book, you see traders showing their positions to arbitrage, which cannot work without a well populated book, but that's about all.
3. What's the volume traded right now? What's the time between price ticks? I'd like to distinguish fast-moving markets with lots of volume from slow-movers with little volume.
4. Maybe one can put a numeric value to how price develops as I see it - price coming up slowly, going down sharply - maybe doing a wavelet transform on an N-tick window to find out if there are small or large moves in there, and which direction is small and which direction is large. Sounds esoteric (sorry, I was a physicist in a previous life) and I absolutely have no reason to suggest wavelets other than I like them. ;-)


Roni likes them too and was successful, so there might be something in them :-)

This is all just ideas - I have no clue how far away a working NN system is. At the current rate of progress we should be there by next Thursday. It is certainly a lot of work and requires a lot of wits and countless hours, maybe years. Maybe it'd be faster to learn how to trade manually, if one is not purely academically interested in getting a NN to profit.

That's a good point :-), but NNs are so much more fun for the former Theoretical Chemist in me...

Unfortunately, I will need some time to work on this (I have a day job), but I will keep you updated...

Thanks again & kind regards
Ernst

DSA

unread,
Feb 25, 2016, 6:32:30 PM2/25/16
to Keras-users
Btw, this guy is doing interesting things with AI for long term (one month ahead) predictions.


develope...@gmail.com

unread,
Feb 25, 2016, 10:23:48 PM2/25/16
to Keras-users
Hi Stefan, Dmitry, Ernst,

It would be a great idea if any of you could share some complete sample code on UFCNN and FX trading, for newbies to play around with and comment on their findings. Collective learning helps the entire community; not sharing helps only the author.

A complete sample might:

1. Ingest real-time data from e.g. IB, Oanda, MT4 and so on. We can convert to log data before feeding it to the UFCNN.
2. Do continuous UFCNN training, classification & prediction within a certain sliding-window timeframe.
3. Classify the trend as up or down with a probability, and use a Kalman filter.
4. Provide a buy or sell signal.

Such a sample will give the community a starting point using Keras with Theano or TensorFlow, and a lot of people will share or contribute to the starter code with what they have found out - what works and what does not. Talking about how something is implemented without sharing the sample code does little to help the community.

We are all here to learn from one another and to make each other better human beings.

Stefan Steuerwald

unread,
Feb 26, 2016, 2:04:07 AM2/26/16
to Keras-users
Hi Ernst,
 

Have you seen this page:
http://ai.marketcheck.co.uk/Forex (click on "Start" and observe).
It is built from Andrej Karpathy's page:
http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html

No, didn't see this page. Didn't know that reinforcement learning existed. I am so clueless, I need to go and do my homework. So reinforcement learning makes up suitable target values on the spot, based on whether a certain action gives a reward or a penalty, and then you can train a NN step by step, action by action. So it sort of generates class labels or regression targets by itself. Interesting!

So the AIForex Deep Q Demo learns to "play" the data in the input field perfectly, do I understand correctly? Why does "percentage correct" seem to stay near 50%? Did you build this demo?


That's a good point :-), but NNs are so much more fun for the former Theoretical Chemist in me...

True :-) Manual trading can be very very boring.

Thanks,
Stefan.

Ernst

unread,
Feb 26, 2016, 2:20:25 AM2/26/16
to Keras-users
Hi Stefan,


Have you seen this page:
http://ai.marketcheck.co.uk/Forex (click on "Start" and observe).
It is built from Andrej Karpathy's page:
http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html

No, didn't see this page. Didn't know that reinforcement learning existed. I am so clueless, I need to go and do my homework. So reinforcement learning makes up suitable target values on the spot, based on whether a certain action gives a reward or a penalty, and then you can train a NN step by step, action by action. So it sort of generates class labels or regression targets by itself. Interesting!

Exactly - you might want to have a look at

https://www.youtube.com/watch?v=V1eYniJ0Rnk

That is built using the same technique.

If you need more info, google "deepmind  reinforcement learning  atari", there are several papers out there, if you do not find them, drop me a line.

 
So the AIForex Deep Q Demo learns to "play" the data in the input field perfectly, do I understand correctly? Why does "percentage correct" seem to stay near 50%? Did you build this demo?


It is built by Peter Henry & Dan Bickle, see https://groups.google.com/forum/#!msg/convnetjs/QMbTOdhsnRg/wP9sEnpOBQAJ and https://github.com/AIForex/AIForex_NN.
It is not finished; I think it runs forex data since 1993, and the box on the page is not used yet. I tried to contact Peter, but got no reply.
  
That's a good point :-), but NNs are so much more fun for the former Theoretical Chemist in me...

 
True :-) Manual trading can be very very boring.


But if it pays the rent :-)

Ernst

unread,
Feb 26, 2016, 2:31:11 AM2/26/16
to Keras-users
Hi Developer,


On Friday, February 26, 2016 at 4:23:48 AM UTC+1, Developer wrote:
Hi Stefan, Dmitry, Ernst,

It would be a great idea if any of you could share some complete sample code on UFCNN and FX trading, for newbies to play around with and comment on their findings. Collective learning helps the entire community; not sharing helps only the author.

I agree that sharing is valuable, and if you look, there is a wealth of info in this thread. However, we do not know how to build a UFCNN in detail, so we cannot share it :-). The only person I know of who has one is the author himself.
 
 
A complete sample might:

1. Ingest real-time data from e.g. IB, Oanda, MT4 and so on. We can convert to log data before feeding it to the UFCNN.
2. Do continuous UFCNN training, classification & prediction within a certain sliding-window timeframe.
3. Classify the trend as up or down with a probability, and use a Kalman filter.
4. Provide a buy or sell signal.


At least myself - I do not have a functioning NN that can predict anything in the FX/stock market and is better than flipping a coin. I am struggling to build something and will keep you updated, but I can not share anything working right now because I do not have it. As soon as I have something I can share, I will do so.

Such a sample will give the community a starting point using Keras with Theano or TensorFlow, and a lot of people will share or contribute to the starter code with what they have found out - what works and what does not. Talking about how something is implemented without sharing the sample code does little to help the community.

We are all here to learn from one another and to make each other better human beings.

I agree wholeheartedly. I learned a lot from the examples for Keras that Francois and the community have built !

There are some links in my last messages of this thread  where I learned a lot.

Cheers,
Ernst

Dmitry Lukovkin

unread,
Feb 26, 2016, 4:42:12 AM2/26/16
to Keras-users
I think that this discussion is useful, because here we can discuss different approaches and common pitfalls, and elaborate on prospective directions for further experiments.
As I see it, the current bottom line is that all off-the-shelf approaches to predicting/classifying financial instrument time series using Keras haven't provided satisfactory results so far. That's why this discussion lacks sample code.
Then we discuss what to try next.
UFCNN is a one of prospective methods, but it requires significant efforts.
If we are talking about group work on this implementation, we should find out who will participate and in what manner, what subset and source of data we will use (I doubt that we will obtain tick data for free), and what we will do with the results, apart from publishing sample code here.

Best regards,
Dmitry Lukovkin

Ernst

unread,
Feb 26, 2016, 4:53:38 AM2/26/16
to Keras-users
Dmitry,

thanks for your mail, that is a very good suggestion!

I would love to collaborate on the UFCNN implementation!

The first thing I could provide is tick data and 5 sec. bars for American instruments for the last 6 months. The tick data I have access to right now is throttled, however, so that it does not have more than 2-4 ticks per second.

This might be good for testing, and if you want, I could also get better data if we need it. The condition is, however, that the data will not be distributed over the internet, and is only available for a closed group of participants (sorry, this is in the data license I have). When we want to publish the result, I am sure we can find a solution so that we can include data also.

Now, we would have to agree which symbols to use (one or a few, I can not provide data for all instruments in US).

I am working on EUR/USD right now, SPY would be a suggestion, and Stefan suggested US Bond Futures (any recommendation?)

Thanks,
Ernst

Ernst

unread,
Feb 26, 2016, 5:08:19 AM2/26/16
to Keras-users
We could also use the dataset that Roni Mittelman used for testing. It is tick data and is from a quant competition:

http://www.circulumvite.com/home/trading-competition

It contains all we need (trading simulator, description, ...) so we could be set up quickly, and we could compare our results to Roni's paper 1:1.

Further, Roni used some mysterious Monte Carlo technique to calculate the maximum profit possible for the data set, and used this to train the net. I assume this is part of his success, so I guess we need it.

Dmitry Lukovkin

unread,
Feb 26, 2016, 5:41:47 AM2/26/16
to Keras-users
Ernst,

Yes, I think that to start we could use this competition dataset and augment it with our own data once we have a more or less working model.
Regarding the Viterbi algorithm - when I started to play around with Roni's paper, I thought that the optimal decisions could be obtained using a simpler approach (especially given the lack of implementation details in the paper) and drafted it (but for another set of input data, so it would need to be adapted). Maybe Viterbi is more efficient or something.

Stefan Steuerwald

unread,
Feb 26, 2016, 9:11:19 AM2/26/16
to Keras-users
Hi Ernst,



Now, we would have to agree which symbols to use (one or a few, I can not provide data for all instruments in US).
I am working on EUR/USD right now, SPY would be a suggestion, and Stefan suggested US Bond Futures (any recommendation?)


Sounds like available data determines which symbols to look at.

I asked a Bund futures trader about the US treasuries, he said ZN (10 Year T-Notes) are the most liquid. A ZN trader would primarily also watch ZF and ZT (5 year and 2 year). I guess major exchange rates (EUR, USD, JPY) and the S&P500 would also be of interest as ZN-influencing markets. These days everyone is watching WTI, too.

So maybe this list:
ZN, ZF, ZT
S&P500
EUR/USD, USD/JPY (EUR/JPY to make the triangle complete?)
WTI
FGBL (?)
would form a kind of ecosystem, along with the corresponding economic data that's being published regularly.

Have a great weekend,
Stefan.

mina.n...@gmail.com

unread,
Feb 26, 2016, 11:34:42 AM2/26/16
to Keras-users
Hi Dmity,

I read your example on github (https://github.com/fchollet/keras/blob/master/examples/stateful_lstm.py) and it helped me a lot. Thank you. I have some questions related to your example! I have a sequential input, and I used your code exactly to train the network with my data. The point is that I want to predict multiple steps ahead, but not by increasing the lahead parameter in your example. Indeed, I want to do it in the prediction step only (I want to change the predicting section), not the training step. I want to feed one input (x1) and predict the output value (y1), then feed y1 as the next input (x2) to predict y2, and repeat this loop for some length (depending on how many steps ahead of x1 I want to predict). I think this task is like sequence generation. I'd like to know your thoughts on this approach. If this approach is OK, would you please guide me on how to modify the predicting section?

Best,
Mina

Dmitry Lukovkin

unread,
Feb 26, 2016, 2:09:15 PM2/26/16
to Keras-users, mina.n...@gmail.com
Hi Mina,

Actually, the example is not mine; I've just used it for my experiments.
Also, pay attention to the issue regarding this example - https://github.com/fchollet/keras/issues/1820. It is correct re: the shuffle=True setting, but the propositions about re-shuffling the input sequence seem strange.

Regarding your question, it is exactly the multi-step approach I use (one of them).
If you want to use it in the prediction stage only (not in the training stage), I can propose the following - https://gist.github.com/lukovkin/0563d42224a6529fea38.
Please let me know if you have comments or find errors.
I've tried it on a trained model for stock time series; the results are not very good - not complete trash, but not ready to use.
Also, I think that if one takes this approach to multi-step prediction, some kind of error correction should be taken into account; see 'Improving Multi-step Prediction of Learned Time Series Models' and 'Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks'. Maybe I will implement them later.
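For reference, a minimal sketch of that feed-back loop - not the exact gist above; it assumes model is a trained one-step predictor taking input of shape (1, window, 1):

import numpy as np

def predict_ahead(model, history, window, steps):
    # feed each one-step prediction back in as the newest input value
    buf = list(history[-window:])
    preds = []
    for _ in range(steps):
        x = np.array(buf[-window:]).reshape(1, window, 1)
        y = float(model.predict(x)[0, 0])
        preds.append(y)
        buf.append(y)
    return np.array(preds)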

Best regards,
Dmitry Lukovkin

DSA

unread,
Mar 3, 2016, 2:20:27 AM3/3/16
to Keras-users, mina.n...@gmail.com
Hi Dmitry, all,

While you are focusing on UFCNN, I am further experimenting with this LSTM example and have few questions:
1) Can the predicted sequence be one that is not among the input sequences? For example, using sequences A and B to predict the next value in sequence C, even though sequence C is not an input (aside from being data used for training). I.e. in Dmitry's example, input_list would be populated with sequences A and B, and the target list with the corresponding sequence C values. A real-world example would be predicting, say, GDP based on a set of leading economic indicator sequences.

2) Does the sequence data need to be normalized to some range (e.g. -1..1), or does it not matter?

3) Lastly, why does the following example from fchollet add two separate LSTM layers (one with return_sequences=True and the other with return_sequences=False), while your example has only one LSTM layer, with neither the return_sequences nor the stateful parameter specified?

model.add(LSTM(50,
               batch_input_shape=(batch_size, tsteps, 1),
               return_sequences=True,
               stateful=True))
model.add(LSTM(50,
               batch_input_shape=(batch_size, tsteps, 1),
               return_sequences=False,
               stateful=True))

As I mentioned few posts back I've got some meaningful results with a single sequence and now trying with multiple sequences. I'll update the thread with any results.

mina.n...@gmail.com

unread,
Mar 3, 2016, 10:41:30 AM3/3/16
to Keras-users, mina.n...@gmail.com

Hi  Dmitry and everybody


I have a conceptual question about LSTMs and sequences. Suppose we have only one input sequence (a sequence of numbers with some patterns, not random) and we want to predict the future from the past. The first architecture that comes to my mind is as follows (borrowed from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/):


For training we can have two approaches:

First approach:

n = length of the input sequence

X_train = input sequence (length = n x 1)

Y_train = the input sequence shifted by one step (length = n x 1)

So, for each value in X_train we have one corresponding value in Y_train (the next value in the sequence). Then for training I feed the values in X_train one by one (batch_size = 1) and update the weights using an optimization method. Here I treat the whole input as one continuous sequence, and the memory of the LSTM should be able to extract the long- and short-term dependencies in the input sequence. After training the network we have an RNN that can predict one step ahead for each input. For predicting multiple steps ahead (for example 3 steps ahead) we do as follows: feed x1 to get y1, then feed y1 to get y2, then feed y2 to get y3.


Second approach:

The proposed approach in:

https://github.com/fchollet/keras/blob/master/examples/stateful_lstm.py

Here the approach is to make subsequences. We want to predict the future of the sequence based on its history. I call the length of the history length_history, and we want to predict the future over a length length_future. I transformed the data into the following format: X_train is an n x length_history numpy array and y_train is an n x length_future numpy array (for details check the above link). Then we can train the model to predict multiple steps ahead into the future.

 

I implemented both of these approaches in Keras (LSTM with stateful mode)

 

My questions:

 

1. Which approach is correct? I myself think the first approach is better and more logical, since the whole input is one sequence and making subsequences might be wrong.

2. Is the second approach sequence-to-sequence training?

3. How should I manage the model's internal states? I mean, in the prediction step, when should I reset the state for each approach?

4. I fed my input to a NN without any hidden layer (just a dense layer), and for both approaches this network worked better than an RNN with LSTM hidden layers! How is that possible?


Regards,

Mina



DSA

unread,
Mar 4, 2016, 2:58:29 PM3/4/16
to Keras-users, mina.n...@gmail.com
I've done a little more experimentation with multiple time series forecasting (predicting n steps into the future based on past data, not step-by-step prediction). I've used a cosine as the base time series, plus a second time series that scales the cosine by a factor that changes randomly after a certain number of steps. This ensures that the NN doesn't simply memorize the training data and extend it into the future. Furthermore, I've set it up so that after the scale signal time series changes, the cosine time series adjusts its scale factor with a 25-step delay. This way we can validate whether the LSTM can learn that the scale signal time series is a leading indicator for the cosine scale and correctly forecast the cosine scale into the future. At the time of forecasting we know the new cosine scale, but it isn't reflected in the cosine time series yet.

Turns out the LSTM does get it right most of the time, as illustrated by a couple of examples below. It seems to perform considerably better than SimpleRNN with the same parameters, and also appears to perform better than GRU with the same parameters. This is with the disclosure that I tuned the LSTM training and hyperparameters a bit, but stayed with the same parameters for SimpleRNN and GRU. The code is here
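For anyone who wants to reproduce the setup, a rough sketch of the data generation as described - my reconstruction, not the linked code; the segment length and scale range are made up:

import numpy as np

n_steps, seg_len, delay = 10000, 500, 25
x = np.arange(n_steps)
# piecewise-constant scale signal that changes randomly every seg_len steps
scale = np.repeat(np.random.uniform(0.5, 2.0, n_steps // seg_len), seg_len)
# the cosine follows the scale signal with a 25-step delay
delayed = np.concatenate([np.full(delay, scale[0]), scale[:-delay]])
cos = delayed * np.cos(2 * np.pi * x / 50)
data = np.stack([cos, scale], axis=1)  # (n_steps, 2 features)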




p.nec...@gmail.com

unread,
Mar 7, 2016, 9:00:10 AM3/7/16
to Keras-users, mina.n...@gmail.com
Same parameters or same number of parameters?
For the same number of hidden units, an LSTM has 4x the parameters of a SimpleRNN (four weight sets: three gates plus the cell candidate, versus one).

DSA

unread,
Mar 7, 2016, 2:11:02 PM3/7/16
to Keras-users, mina.n...@gmail.com
Same code. I've just replaced the following line
model.add(LSTM(hidden, input_shape=(examples, features)))
with
model.add(SimpleRNN(hidden, input_shape=(examples, features)))
and also with
model.add(GRU(hidden, input_shape=(examples, features)))

Btw... reading "Artificial Intelligence for Humans, Volume 3: Deep Learning and Neural Networks", where it talks about encoding time series for NNs, I begin to doubt whether the encoding of the input in this example is done optimally/properly (I am less concerned about optimally, but want to make sure it doesn't compromise the ability to predict). The book says that the sliding-window encoding, which is obviously used in the example discussed in this thread, should be used for non-recurrent feed-forward networks (basically it is a way to make feed-forward networks able to handle time series), and that for recurrent networks such encoding is not necessary (i.e. just providing the entire time series as an input should be sufficient, as opposed to providing an array of slices with an expected output for each slice).

Any opinion on that? I'll try to modify the example correspondingly and see if I can make it work. 

Ernst

unread,
Mar 8, 2016, 3:59:56 AM3/8/16
to Keras-users, mina.n...@gmail.com
DSA,

in the LSTM you can set stateful=True; then you do not need the sliding window, otherwise you do (the default is stateful=False).

Ernst

DSA

unread,
Mar 8, 2016, 9:08:36 PM3/8/16
to Keras-users, mina.n...@gmail.com
Thank you! Is my assumption correct that recurrent networks are supposed to be stateful by definition (i.e. that's the whole point of them)? When might one want to use stateful=False?

If I want to predict the next 10 values in a numerical time series based on past history, should I use stateful True or False?

If we use the sliding-window approach (and stateful=False), do we still get the benefits of recurrent networks compared to Dense?

If we use stateful=True and want to predict the next 10 values, how do we do that? Don't we get just a single next value in that case?

Thanks again!
Sergei

garyga...@gmail.com

unread,
Mar 9, 2016, 3:38:49 PM3/9/16
to Keras-users

Hi guys, interesting thread here.

I'm currently working on building a Reinforcement learning agent to trade using the Oanda FX Rest API.

Specifically, there are two general approaches to it. One, as you have seen with Andrej's deep Q-learning, acts in discrete action spaces, meaning your choice of action has to be discrete. For example, you may have a deep Q-learner with 5 action outputs: one for buying one share, one for buying 2 shares, one for holding, one for selling 1 share, and lastly one for selling 2 shares.

Another general method works over a continuous action space and is called actor-critic. In this case you would specify your output action decision with a tanh activation; this gives you a range of possible continuous outputs from negative to positive.
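A rough sketch of the two output heads just described (Keras 1.x-era API; the layer sizes and the hypothetical state_dim market-state vector are placeholders, not an actual agent):

from keras.models import Sequential
from keras.layers.core import Dense

state_dim = 40  # hypothetical flattened market-state vector

# 1) discrete action space: one linear Q-value per action
#    (buy 2, buy 1, hold, sell 1, sell 2)
q_net = Sequential()
q_net.add(Dense(64, input_dim=state_dim, activation='relu'))
q_net.add(Dense(5, activation='linear'))
q_net.compile(loss='mse', optimizer='rmsprop')

# 2) continuous action space (the actor in actor-critic): tanh squashes the
#    output into (-1, 1), read as a signed position size; training it needs
#    the critic machinery not shown here
actor = Sequential()
actor.add(Dense(64, input_dim=state_dim, activation='relu'))
actor.add(Dense(1, activation='tanh'))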

Google DeepMind uses these algorithms extensively for their Atari game playing, as well as for continuous control problems such as robotic joint actuation and a 3D racing game (continuous action).

The deep component usually involves a convolutional NN, but I've seen papers implementing LSTMs as well; it should be very straightforward to swap out. There are some additional tricks involved, like experience replay and a target network, but these are not difficult to implement in Keras.

I am in class at the moment but I will post more materials later, as well as potential source code.

If you want to learn about reinforcement learning, you should look up David Silver's UCL lectures on YouTube, he is a superstar at Deepmind.

Ernst

unread,
Mar 10, 2016, 11:45:04 AM3/10/16
to Keras-users, garyga...@gmail.com
Gary,
thanks for your interesting message!  The David Silver videos are great!!


On Wednesday, March 9, 2016 at 9:38:49 PM UTC+1, garyga...@gmail.com wrote:
I'm currently working on building a Reinforcement learning agent to trade using the Oanda FX Rest API.

Specifically, there are two general approaches to it. One, as you have seen with Andrej's deep Q-learning, acts in discrete action spaces, meaning your choice of action has to be discrete. For example, you may have a deep Q-learner with 5 action outputs: one for buying one share, one for buying 2 shares, one for holding, one for selling 1 share, and lastly one for selling 2 shares.

Another general method works over a continuous action space and is called actor-critic. In this case you would specify your output action decision with a tanh activation; this gives you a range of possible continuous outputs from negative to positive.

Google DeepMind uses these algorithms extensively for their Atari game playing, as well as for continuous control problems such as robotic joint actuation and a 3D racing game (continuous action).


Which type will you implement? Type 1 should be simpler. Will you need to modify the loss function?
 

The deep component usually involves a convolutional NN, but I've seen papers implementing LSTMs as well; it should be very straightforward to swap out. There are some additional tricks involved, like experience replay and a target network, but these are not difficult to implement in Keras.


I think you just need a net that can predict, so this is the important part.

Cheers,
Ernst

mcal...@tcd.ie

unread,
Mar 22, 2016, 8:24:35 AM3/22/16
to Keras-users, YD...@slb.com
Hi Yue,

I'd like to query your choice of an LSTM architecture. You said you were using the time series of two stock prices "to predict stock price for the next day". Why is a simple/deep NN not applicable for this problem? From my study of RNNs and LSTMs, their application is really only useful when one is trying to predict multiple time steps into the future, or as Wikipedia puts it: "Unlike traditional RNNs, an LSTM network is well-suited to learn from experience to classify, process and predict time series when there are very long time lags of unknown size between important events."

I'm in the process of conducting similar research that involves multiple time series but I only want to predict one time-step into the future. In this case I don't know why LSTM is more suited.

John


刘渐江

unread,
Mar 23, 2016, 3:01:52 AM3/23/16
to Keras-users, YD...@slb.com
Hello, can you tell me how to load multi-column stock data into Keras? I have the data, but I can't figure out how to load data with many columns. Thanks!

On Tuesday, July 21, 2015 at 10:14:09 PM UTC+8, Yue Duan wrote:
I figured out a way to solve this problem so I think it might be helpful to post the solution here.

It turned out the activation and inner_activation functions I used for LSTM layer were wrong, thus the loss could not be calculated properly. I replaced them with sigmoid, tanh, and relu, all of them worked and gave losses that decreased with each epoch. Also, I replaced TimeDistributedDense layer with a simple Dense layer, so return_sequences=False for the LSTM layer.

As to the input, I wrote a function that transforms input and target long time series into small pieces of 3D array (nb_samples, time_steps, nb_features) to feed into the model. In my example, I used rolling windows of same length and corresponding targets to train the model. I tried training the model with both normalized and non-normalized data, the normalized data generally gave better results. 

I attached part of my code at the end, please let me know if you have any comments or suggestions. Thanks!


# merge data frames
merged = df1.merge(df2, left_index=True, right_index=True, how='inner').dropna()

# data prep
# use 100 days of historical data to predict 10 days in the future
data = merged.values
examples = 100
y_examples = 10
nb_samples = len(data) - examples - y_examples

# input - 2 features
input_list = [np.expand_dims(np.atleast_2d(data[i:examples+i,:]), axis=0) for i in xrange(nb_samples)]
input_mat = np.concatenate(input_list, axis=0)

# target - the first column in merged dataframe
target_list = [np.atleast_2d(data[i+examples:examples+i+y_examples,0]) for i in xrange(nb_samples)]
target_mat = np.concatenate(target_list, axis=0)

# set up model
trials = input_mat.shape[0]
features = input_mat.shape[2]
hidden = 64
model = Sequential()
model.add(LSTM(features, hidden))
model.add(Dropout(.2))
model.add(Dense(hidden, y_examples))
model.add(Activation('linear'))
model.compile(loss='mse', optimizer='rmsprop')

# Train
model.fit(input_mat, target_mat, nb_epoch=50)

lzh...@hbisolutions.com

unread,
May 4, 2016, 4:47:45 AM5/4/16
to Keras-users, YD...@slb.com
Hi Yue, 
     What type is input_mat? Is it a numpy.ndarray? Is its dimension still (171, 8, 2)? Thanks.

Minghao Gai

unread,
Jun 7, 2016, 6:03:04 PM6/7/16
to Keras-users, YD...@slb.com
Hi all,

I have a similar situation, but a different problem.

For my data, I have 6 time series, say, 6 stocks. I want to use the previous 20 steps to predict the next value for all 6 stocks. This means my input shape is [4700, 20, 6], and my target shape is [4700, 1, 6].

I wrote my code as :
# set up model
model = Sequential()  
layers = [6, 300, 300, 6]
model.add(LSTM(
            input_dim=layers[0],
            output_dim=layers[1],
            return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(
            layers[2],
            return_sequences=True))
model.add(TimeDistributedDense(1))
model.add(Activation("linear"))

## compile model
start = time.time()
model.compile(loss="mape", optimizer="rmsprop")
print "Compilation Time : ", time.time() - start


I got the error message as 
ValueError: Input dimension mis-match. (input[0].shape[1] = 1, input[1].shape[1] = 20)
Apply node that caused the error: Elemwise{sub,no_inplace}(activation_14_target, Reshape{3}.0)
Toposort index: 491
Inputs types: [TensorType(float32, 3D), TensorType(float32, (False, False, True))]
Inputs shapes: [(300L, 1L, 6L), (300L, 20L, 1L)]
Inputs strides: [(24L, 24L, 4L), (80L, 4L, 4L)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Elemwise{Composite{Abs((i0 / i1))}}(Elemwise{sub,no_inplace}.0, Elemwise{Composite{clip(Abs(i0), i1, i2)}}.0), Elemwise{Composite{((i0 * i1 * i2 * Abs(i3) * sgn(i4)) / (i5 * i6 * i7 * i8 * i3 * i3))}}(TensorConstant{(1L, 1L, 1.. of -100.0}, InplaceDimShuffle{x,x,x}.0, InplaceDimShuffle{0,x,x}.0, Elemwise{Composite{clip(Abs(i0), i1, i2)}}.0, Elemwise{sub,no_inplace}.0, InplaceDimShuffle{x,x,x}.0, InplaceDimShuffle{x,x,x}.0, InplaceDimShuffle{x,x,x}.0, InplaceDimShuffle{x,x,x}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "C:\Users\mingh_000\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py", line 2723, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "C:\Users\mingh_000\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py", line 2825, in run_ast_nodes
    if self.run_code(code, result):
  File "C:\Users\mingh_000\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-50-576c7070b65b>", line 17, in <module>
    model.compile(loss="mape", optimizer="rmsprop")
  File "C:\Users\mingh_000\Anaconda2\lib\site-packages\keras\models.py", line 339, in compile
    **kwargs)
  File "C:\Users\mingh_000\Anaconda2\lib\site-packages\keras\engine\training.py", line 588, in compile
    sample_weight, mask)
  File "C:\Users\mingh_000\Anaconda2\lib\site-packages\keras\engine\training.py", line 311, in weighted
    score_array = fn(y_true, y_pred)
  File "C:\Users\mingh_000\Anaconda2\lib\site-packages\keras\objectives.py", line 15, in mean_absolute_percentage_error
    diff = K.abs((y_true - y_pred) / K.clip(K.abs(y_true), K.epsilon(), np.inf))

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.


Can anyone help me with this problem? Thanks very much!


Best,
Maggie

DSA

unread,
Jun 7, 2016, 7:47:57 PM6/7/16
to Keras-users, YD...@slb.com
If you are predicting 6 features, I think you should have
model.add(TimeDistributedDense(6))
instead of
model.add(TimeDistributedDense(1))
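Reading the error, the mismatched shapes (300, 1, 6) vs (300, 20, 1) suggest two problems: the output width (1 vs 6) and the timestep count (20 vs 1). A hedged sketch of one consistent setup, assuming the goal is one step ahead for all 6 series (the target then needs reshaping from (4700, 1, 6) to (4700, 6)):

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, LSTM

model = Sequential()
model.add(LSTM(300, input_dim=6, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(300, return_sequences=False))  # collapse the 20 timesteps
model.add(Dense(6))                           # one output value per series
model.add(Activation('linear'))
model.compile(loss='mape', optimizer='rmsprop')
# target = target.reshape(-1, 6)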

mani...@googlemail.com

unread,
Jun 9, 2016, 3:28:36 PM6/9/16
to Keras-users, mina.n...@gmail.com

Hey Mina,

I have the exact same question as you in your last post.
Did you get any answers to your question, or did you gain experience on your own in finding the right neural network structure for your task?

regards

Manuel

alvin...@gmail.com

unread,
Sep 8, 2016, 10:00:04 PM9/8/16
to Keras-users, mina.n...@gmail.com
Hi Dmitry,

I tried the code you suggested for multi-step prediction. It fails at the append step, saying that the sizes don't match. May I know what needs to be done to correct this error?

Thanks,
Alvin

alvin...@gmail.com

unread,
Sep 8, 2016, 11:58:56 PM9/8/16
to Keras-users, mina.n...@gmail.com
Hi Mina and Dmitry,

Below is my code where I try to predict multiple steps ahead. Is this implementation correct?

from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Activation, Dense, LSTM, Dropout, TimeDistributedDense, RepeatVector, TimeDistributed

# since we are using a stateful rnn, tsteps can be set to 1
tsteps = 1
batch_size = 25
epochs = 50
# number of elements ahead that are used to make the prediction
lahead = 1

# def gen_cosine_amp(amp=100, period=1000, x0=0, xn=50000, step=1, k=0.0001):
def gen_cosine_amp(amp=100, period=1000, x0=0, xn=5000, step=1, k=0.0001):
    """Generates an absolute cosine time series with an amplitude exponentially decreasing"""
    cos = np.zeros(((xn - x0) * step, 1, 1))
    for i in range(len(cos)):
        idx = x0 + i * step
        cos[i, 0, 0] = amp * np.cos(2 * np.pi * idx / period)
        cos[i, 0, 0] = cos[i, 0, 0] * np.exp(-k * idx)
    return cos

print("Generating Data")
cos = gen_cosine_amp()
print("Input shape: ", cos.shape)

expected_output = np.zeros((len(cos), 1))
for i in range(len(cos) - lahead):
    expected_output[i, 0] = np.mean(cos[i + 1:i + lahead + 1])

print("Output shape")
print(expected_output.shape)

print("Creating Model")
model = Sequential()
model.add(LSTM(50, batch_input_shape=(batch_size, tsteps, 1), return_sequences=True, stateful=True))
model.add(Dropout(0.2))
model.add(LSTM(50, batch_input_shape=(batch_size, tsteps, 1), return_sequences=True, stateful=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='rmsprop')

expected_output_t = np.reshape(expected_output, (expected_output.shape[0], 1, 1))

print('Training')

train_size = int(len(cos) * 0.67)

x_train = cos[0:train_size, ]
y_train = expected_output_t[0:train_size, ]

numIteration = len(cos) / batch_size

for i in range(epochs):
    print('Epoch', i, '/', epochs)
    model.fit(x_train, y_train, batch_size=batch_size, verbose=1, nb_epoch=1, shuffle=False)
    model.reset_states()

print('Predicting')
x_test = cos[train_size:, ]

print("x_test.shape: ", x_test.shape)

predict_train_output_t = model.predict(x_train, batch_size=batch_size)
predict_test_output_t = model.predict(x_test, batch_size=batch_size)

# predict next 10 steps
for i in range(10):
    print("i : ", i)
    prediction_output = model.predict(predict_test_output_t, batch_size=batch_size)
    print("prediction_output.shape: ", prediction_output.shape)
    predict_test_output_t = prediction_output


Thanks in advance



pvm...@gmail.com

unread,
Sep 9, 2016, 6:47:46 AM9/9/16
to Keras-users, mina.n...@gmail.com
Hello Mina,

I am actually working on a similar problem to yours (i.e. multiple-length sequence input, predicting multiple steps ahead).

I am quite new to Keras, but this is the way I am trying to solve it:
  • Use the stateful mode of the LSTM.
  • Consider batch_size = 1 and time_sequence = 1. This means that:
    • for each input sequence of length n,
    • you feed the feature vectors to the network manually, one by one;
    • after n loops, you reset the network and move on to train the next input sequence;
    • once you finish with all input sequences, you continue with the next epoch.
By training this way, it becomes easy to predict multiple steps ahead (for example, 3 steps ahead), as you mentioned above.
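A minimal sketch of that loop, assuming a Keras 1.x-era API and hypothetical arrays sequences (shape (n_sequences, seq_len, n_features)) with matching one-step-ahead targets:

from keras.models import Sequential
from keras.layers import Dense, LSTM

n_features = 2  # placeholder feature count
model = Sequential()
# batch_size=1 and one timestep per call; state is carried between calls
model.add(LSTM(32, batch_input_shape=(1, 1, n_features), stateful=True))
model.add(Dense(n_features))
model.compile(loss='mse', optimizer='rmsprop')

for epoch in range(10):
    for seq, tgt in zip(sequences, targets):
        for t in range(len(seq)):
            x = seq[t].reshape(1, 1, n_features)
            y = tgt[t].reshape(1, n_features)
            model.train_on_batch(x, y)
        # reset after each full sequence, as described above
        model.reset_states()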

This blog post really helped me understand stateful LSTMs:
http://philipperemy.github.io/keras-stateful-lstm/

Hope this helps.

Milk

Vinayakumar R

unread,
Sep 9, 2016, 10:22:05 AM9/9/16
to Keras-users, YD...@slb.com
Hi, I have 100 features with 1 class label and my data size is 5000; 4000 samples belong to one class, and the other thousand belong to class 2, class 3 and class 4. I have tried an LSTM, but it only achieves 18% accuracy and predicts all class labels as 0. My code is given below. Could you please suggest an approach - or does anybody have code - for doing this type of anomaly detection?

# reshape input to be [samples, time steps, features]
X_train = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))  # shape will be (5000, 1, 100)
X_test = np.reshape(testT, (testT.shape[0], 1, testT.shape[1]))  # shape will be (1000, 1, 100)

batch_size=32

model = Sequential()
model.add(LSTM(256, input_dim=100, return_sequences=True))
#model.add(Dropout(0.8))
model.add(LSTM(256, input_dim=244, return_sequences=False))
#model.add(Dropout(0.8))
model.add(Dense(4))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['accuracy'])
model.fit(X_train, y_train, batch_size, nb_epoch=10, validation_data=(X_test, y_test))

loss, accuracy = model.evaluate(X_test, y_test)
print("\nLoss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))

DSA

unread,
Sep 9, 2016, 1:35:53 PM9/9/16
to Keras-users, YD...@slb.com
I am not an expert, but have spent some time with LSTMs. A few thoughts/ideas:
* 100 features seems like a relatively large number for a 5000-item data set. Try to get your classification working first with just one or a couple of features, to ensure that conceptually everything in your model is correct and it is able to produce some meaningful predictions. Then add additional features to further refine the model.
* Also try to get it working with just one LSTM layer first (btw, why is input_dim different between the two LSTM layers?).
* Your number of epochs is 10, which seems too low; try more like hundreds (adjusting the learning rate if necessary).
* From the history object, produce acc and val_acc plots to ensure that training looks healthy and does not indicate overfitting (this article describes what a good training plot should look like: http://cs231n.github.io/neural-networks-3/#accuracy).

TheShabbyblue

unread,
Sep 23, 2016, 4:05:24 PM9/23/16
to Keras-users, YD...@slb.com
I am also trying to predict the stock market and I am wondering how much data to use to train my network. I am trading on the 1-minute timeframe and for now I use historical data for the last day (1440 minutes). What would you recommend?

wada...@gmail.com

unread,
Dec 31, 2016, 11:10:06 PM12/31/16
to Keras-users, garyga...@gmail.com
Hi Gary,

I just joined this thread. Did you end up making progress on this trading agent?

Thanks,
Wayne

sudars...@gmail.com

unread,
Jan 25, 2017, 9:43:21 AM1/25/17
to Keras-users, YD...@slb.com
Hi,
I have three columns in my data set; the columns are "month", "price" and "volume". Data is available from Jan 2011 to Dec 2016. Now I need to predict the prices from Jan 2017 to Dec 2020. I have predicted the volumes for 2017 to 2020 using a stacked LSTM. I want to predict the prices using the historical prices and volumes. I understand that this is possible with your method. I would be really grateful if you could tell me how I should edit your following lines of code to set up my input and target variables.

# input - 2 features
input_list = [np.expand_dims(np.atleast_2d(data[i:examples+i,:]), axis=0) for i in xrange(nb_samples)]
input_mat = np.concatenate(input_list, axis=0)

# target - the first column in merged dataframe
target_list = [np.atleast_2d(data[i+examples:examples+i+y_examples,0]) for i in xrange(nb_samples)]
target_mat = np.concatenate(target_list, axis=0)
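A hedged sketch of the adaptation, assuming a DataFrame df indexed by month with columns 'price' and 'volume' - the names and window lengths are my assumptions, and with only 72 monthly rows nb_samples will be small:

import numpy as np

data = df[['price', 'volume']].values  # price first, so column 0 is the target
examples = 48     # four years of monthly history per sample
y_examples = 12   # predict twelve months ahead
nb_samples = len(data) - examples - y_examples

input_list = [np.expand_dims(np.atleast_2d(data[i:examples+i, :]), axis=0)
              for i in range(nb_samples)]
input_mat = np.concatenate(input_list, axis=0)

target_list = [np.atleast_2d(data[i+examples:examples+i+y_examples, 0])
               for i in range(nb_samples)]
target_mat = np.concatenate(target_list, axis=0)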

Thanks in advance :-)

vinayak...@gmail.com

unread,
Mar 4, 2017, 6:58:16 AM3/4/17
to Keras-users, YD...@slb.com
I have only one column, which is a list of per-minute prices. I want to predict the next 10 minutes of values using the past hour of values. Does anybody have code for this? If so, could you please explain how to do it in Keras?

vinayak...@gmail.com

unread,
Mar 4, 2017, 7:27:38 AM3/4/17
to Keras-users, YD...@slb.com
Sample code for 1-minute price prediction using an LSTM, using only minute-wise price values.

dujing...@gmail.com

unread,
May 22, 2017, 11:37:42 AM5/22/17
to Keras-users, YD...@slb.com
Hi,

Thanks for the solution! 
However, I have a question about the endogeneity of the data: if we don't consider other relevant variables, like interest rates, will the prediction be inaccurate? That's really a statistics problem; I'm just wondering if it may influence the result of the LSTM model, too.
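For what it's worth, one practical mitigation is to include such variables as additional input features. A minimal sketch, assuming a hypothetical interest-rate DataFrame df3 sharing the same index as df1 and df2 from the solution above:

# merge a third series in; the windowing code works unchanged,
# only with 3 features per timestep instead of 2 (df3 is hypothetical)
merged = (df1.merge(df2, left_index=True, right_index=True, how='inner')
             .merge(df3, left_index=True, right_index=True, how='inner')
             .dropna())
data = merged.values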

Thanks!



ukesh....@gmail.com

unread,
Jun 16, 2017, 9:30:13 AM6/16/17
to Keras-users, YD...@slb.com

I am new to deep learning and LSTMs. I have a very simple question. I have taken a sample of demand values for 50 time steps, and I am trying to forecast the demand for the next 10 time steps (up to step 60), using those same 50 samples to train the model.

But unfortunately, the closest I came is splitting the sample demands into 67% training and 33% testing, and my forecast only covers the test portion (time steps 35-50); it never goes beyond step 50, as shown in the picture below. Can anybody help me with this issue?



I have attached my code below.


Thank you in advance.



import pandas
import matplotlib.pyplot as plt

dataset = pandas.read_csv('Dmd2ahr.csv')
plt.plot(dataset)
plt.show()

# LSTM for international airline passengers problem with window regression framing
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dropout
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# fix random seed for reproducibility
numpy.random.seed(7)
# load the dataset
dataframe = read_csv('Dmd2ahr.csv')
dataset = dataframe.values
dataset = dataset.astype('float32')
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
# split into train and test sets
train_size = int(len(dataset) * 0.7)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]
print(len(train), len(test))
# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
#trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
#testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = numpy.reshape(testX, (testX.shape[0], testX.shape[1], 1))
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(32, input_shape=(look_back, 1)))
model.add(Dropout(0.3))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, epochs=300, batch_size=1, verbose=2)
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict) + look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict) + (look_back * 2) + 1:len(dataset) - 1, :] = testPredict
# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
print(trainPredict)
print(testPredict)
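For what it's worth, the split above only evaluates on held-out history; to go beyond time step 50 you have to feed predictions back in as inputs. A hedged sketch continuing the code above (model, scaler, dataset and look_back as defined there):

# roll the window forward 10 steps past the last observation
window = dataset[-look_back:, 0].tolist()  # last look_back normalized values
future = []
for _ in range(10):
    x = numpy.reshape(numpy.array(window[-look_back:]), (1, look_back, 1))
    yhat = float(model.predict(x)[0, 0])
    future.append(yhat)
    window.append(yhat)
future = scaler.inverse_transform(numpy.array(future).reshape(-1, 1))
print(future)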




Denis Sheremetov

unread,
Sep 17, 2017, 5:04:54 AM9/17/17
to Keras-users
Hi, very interesting discussion. Maybe I missed something here, but I would be happy if someone could explain how to join different inputs, like price / volume / different pairs, at once in a training model. Thanks in advance.

boufelj...@gmail.com

unread,
Feb 11, 2018, 12:26:55 PM2/11/18
to Keras-users
Dear Francesco,
Could you please indicate what I should do in order to make predictions for all the time series that are fed to the network?

Regards.

On Tuesday, December 1, 2015 at 10:54:45 AM UTC+1, francesc...@gmail.com wrote:
Hi, I am not really sure what TimeDistributedDense does, I just used a normal Dense layer with linear activation. The OP outputted 10 neurons to predict the next 10 steps of the *first* stock. I assume he thinks the second is related but does not want to predict that one.

I made a toy example with a sine wave. To prove the point above, I put a related second sequence, which is cosine, but I predict only sine. Hope it makes you understand :) ... I predict 500 data points in the future...

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, TimeDistributedDense
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.preprocessing.sequence import pad_sequences
from keras.utils.layer_utils import print_layer_shapes

# sine and cos wave
import numpy as np

X = np.linspace(0, 1000, 10000)
Y = np.asarray([np.sin(X), np.cos(X)]).T

# data prep
# use 500 data points of historical data to predict 500 data points in the future
data = Y
examples = 500
y_examples = 500
nb_samples = len(data) - examples - y_examples

# input - 2 features
input_list = [np.expand_dims(np.atleast_2d(data[i:examples+i, :]), axis=0) for i in xrange(nb_samples)]
input_mat = np.concatenate(input_list, axis=0)

# target - the first column in merged dataframe
target_list = [np.atleast_2d(data[i+examples:examples+i+y_examples, 0]) for i in xrange(nb_samples)]
target_mat = np.concatenate(target_list, axis=0)

# set up model
trials = input_mat.shape[0]
features = input_mat.shape[2]
print trials
print features
hidden = 64
model = Sequential()
model.add(LSTM(input_dim=features, output_dim=hidden))
model.add(Dropout(.2))
model.add(Dense(input_dim=hidden, output_dim=y_examples))
model.add(Activation('linear'))
model.compile(loss='mse', optimizer='rmsprop')

# Train
model.fit(input_mat, target_mat, nb_epoch=2)
print_layer_shapes(model, input_shapes=(input_mat.shape))
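(As an aside, regarding the question above about predicting all the fed series: one hedged variation - a sketch, not a tested recipe - is to flatten y_examples values per series into one output vector and reshape the predictions afterwards:)

# predict both series: n_series * y_examples outputs, series-major order
n_series = 2
target_list = [np.atleast_2d(data[i+examples:examples+i+y_examples, :].T.flatten()) for i in xrange(nb_samples)]
target_mat = np.concatenate(target_list, axis=0)  # (nb_samples, n_series * y_examples)

model = Sequential()
model.add(LSTM(input_dim=features, output_dim=hidden))
model.add(Dropout(.2))
model.add(Dense(input_dim=hidden, output_dim=n_series * y_examples))
model.add(Activation('linear'))
model.compile(loss='mse', optimizer='rmsprop')
model.fit(input_mat, target_mat, nb_epoch=2)
# preds = model.predict(input_mat).reshape(-1, n_series, y_examples)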

Regards,

Francesco


On Tuesday, 24 November 2015 22:38:32 UTC+1, Tarnac wrote:
I'm confused about this example. If one is predicting 10 timesteps into the future, I thought one would be using TimeDistributedDense, since multiple timesteps will be predicted. The way the OP programmed it, the model outputs 10 neurons. My limited understanding is that this case would be covered by a two-neuron output (one for each signal/stock/feature), with each of these outputs having 10 timesteps...

On Friday, November 20, 2015 at 8:26:09 AM UTC-8, francesc...@gmail.com wrote:
This worked for me, although I had to specify 

model.add(LSTM(input_dim=features, output_dim=hidden))
model.add(Dropout(.2))
model.add(Dense(input_dim=hidden, output_dim=y_examples))

Do you know why the TimeDistributedDense (plus returning the sequences in the LSTM layer) is not appropriate?

Cheers,
Francesco


18...@columbus.edu.co

unread,
Nov 29, 2018, 9:34:02 AM11/29/18
to Keras-users
Hello. Did you ever come up with a solution?