# Set dims
n_samples = 171
n_timesteps = 8
n_feat = 2
n_classes = 128
# Model
model = Sequential()
model.add(LSTM(n_feat, n_classes, activation='linear', inner_activation='linear', return_sequences=True))
model.add(Dropout(0.5))
model.add(TimeDistributedDense(n_classes, 1), activation='linear')
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
# Train
score = model.evaluate(X_test, Y_test, batch_size=16)
model.fit(input_matrix, target, nb_epoch=10)
Epoch 0 171/171 [==============================] - 0s - loss: nan
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, TimeDistributedDense
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.preprocessing.sequence import pad_sequences
from keras.utils.layer_utils import print_layer_shapes
#sine and cos wave
import numpy as np
X = np.linspace(0,1000,10000)
Y = np.asarray([np.sin(X),np.cos(X)]).T
# data prep
# use 500 data points of historical data to predict 500 data points in the future
data = Y
examples = 500
y_examples = 500
nb_samples = len(data) - examples - y_examples
# input - 2 features
input_list = [np.expand_dims(np.atleast_2d(data[i:examples+i,:]), axis=0) for i in xrange(nb_samples)]
input_mat = np.concatenate(input_list, axis=0)
# target - the first column in merged dataframe
target_list = [np.atleast_2d(data[i+examples:examples+i+y_examples,0]) for i in xrange(nb_samples)]
target_mat = np.concatenate(target_list, axis=0)
# set up model
trials = input_mat.shape[0]
features = input_mat.shape[2]
print trials
print features
hidden = 64
model = Sequential()
model.add(LSTM(input_dim=features, output_dim=hidden))
model.add(Dropout(.2))
model.add(Dense(input_dim=hidden, output_dim=y_examples))
model.add(Activation('linear'))
model.compile(loss='mse', optimizer='rmsprop')
# Train
model.fit(input_mat, target_mat, nb_epoch=2)
print_layer_shapes(model, input_shapes=(input_mat.shape))

For the LSTM:
Output shape:
return_sequences=True: 3D tensor with shape (nb_samples, timesteps, output_dim).
return_sequences=False: 2D tensor with shape (nb_samples, output_dim).

modelR14J = Sequential()
modelR14J.add(JZS1(64, input_dim=4, input_length=32, return_sequences=True))
modelR14J.add(JZS1(4, return_sequences=True))
modelR14J.add(TimeDistributedDense(4))

def _load_data(data, steps=40):
    docX, docY = [], []
    for i in range(0, data.shape[0] / steps - 1):
        docX.append(data[i * steps:(i + 1) * steps, :])
        docY.append(data[(i * steps + 1):((i + 1) * steps + 1), :])
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY
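Just to make the shapes concrete, here is a small usage sketch of _load_data (my own, not from the thread), assuming it is applied to the (10000, 2) sine/cosine array Y built above:

# hypothetical usage of _load_data on the sine/cosine array Y
trainX, trainY = _load_data(Y, steps=40)
print trainX.shape   # (249, 40, 2): 249 windows, 40 timesteps, 2 features
print trainY.shape   # (249, 40, 2): the same windows shifted forward by one timestep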
7) The input needs to be scaled to fit the chosen activation function, and it should span as big a range as possible. You wrote that you read that the kind of input is important - do you happen to remember where you read about this? I would like to learn more.
2) I used the net that does not predict currency rate changes well to predict rate changes in sinus curves (taken with the same sampling frequency as my FX rates) and that works very nicely. FX rates are way more complex, I guess.
3) Roni Mittelman (http://arxiv.org/abs/1508.00317) describes how he wanted to train an LSTM to predict stock quotes and failed because of exploding gradients. He describes his own UFCNN (Undecimated Fully Convolutional Neural Network) that works well according to the paper. I would be very interested in testing Roni Mittelman's UFCNN as published. However, the information given is not sufficient for a newbie in NNs like me to reproduce this in Keras. I asked Roni for hints, but it seems he is now working for BlackRock and may not be allowed to talk about his research any more. He used Caffe.
5) I will try training on tick data, but I will research other markets first where there are better chances, as you suggest. Until now, I shied away from futures because one needs to change instruments regularly before expiry, and it seems like an additional task for the net to learn that.
Or is it classification? But then, where do we get the class labels from?
Sorry for being slow to answer, I'll just try to keep up... I have read this entire thread by now and followed a few links; fascinating things you have here.
2) I used the net that does not predict currency rate changes well to predict rate changes in sinus curves (taken with the same sampling frequency as my FX rates) and that works very nicely. FX rates are way more complex, I guess.
Definitely. I suspect FX rates or other prices are actually not predictable at all, at least not for timeframes that exceed the typical time a professional short-term trader takes to turn around a position or react to a changing market (i.e. a few minutes at most in the case of the FDAX). Very short term, I, as a human, can see where things are going because I have an idea what the other market participants are doing, so from experience and knowledge I have a hunch of what'll probably happen next.
Thinking about how I trade, I actually don't need to predict a price 1 minute from now. I can do with less precision, so to say. I need a direction and an idea of how far (approximately) a move can go. I need to classify the price pattern I'm seeing to be able to decide to enter a trade or to stay out of the market. Same thing when exiting a trade: I need to decide whether I would still classify the pattern as the same class that persuaded me to enter the trade. As soon as that is no longer the case, I exit (or when my stop-loss or take-profit limit is reached).
I mean, is prediction the task to be solved? Or is it classification? But then, where do we get the class labels from?
That's right, futures expire and some positions are rolled over, some not. That doesn't matter for intra-day trades. On expiry day, a future will behave especially "weird", so I'd exclude these days. When incorporating the underlying's prices, there's the additional complication that the fair value of the future differs for different expiry dates; e.g. for the FDAX, the June expiry spans dividend season, so FDAX prices are way higher than for the other expiries. The DAX is calculated such that dividends are reinvested, and arbitrageurs need to adjust their position in the stock every time a dividend is paid out. That topic is not an easy one and also involves the individual tax situation of the arbitrageur, so yes, futures are more complex in that respect.
Also, futures that are settled by physical delivery behave differently from those that are cash settled.
I'm having great fun following your discussion. I'll try to contribute some more aspects, not on the details of NNs, but from a trader's perspective.
Stefan.
Hi, Stefan, could you please describe your trading system a little bit more? Or maybe give us links to resources. Is it scalping? What are your trading decisions based on? Is it some sort of indicators, patterns like Elliott waves, or pure experience and intuition?
I think one needs more input than just quotes to a NN to predict / classify. What indicators would you additionally feed to the net? I thought of 1-3 moving averages, an RSI, and possibly a volatility measure. What would you as a trader recommend?
How long do you hold your trades - is your gut feeling telling you "I'll hold this for 20 minutes or so"?
I try to predict the rate for T + 10 min. I assume this is short enough that it can work and long enough to make a profit.
I also tried a classification approach, but it failed. I used the classes BUY / Do Nothing / SELL. But the program cannot predict with 100% accuracy, so one counts the mistakes. However, there are several kinds of mistakes: a false "Do Nothing" is not a problem and will happen to any trader; it could almost be ignored when counting errors. Mistaking a SELL for a BUY, however, will cost money and should not happen too often.
The loss functions available for NNs, however, do not take these different kinds of error into consideration.
If the data contains 55% BUY trades and 45% SELL trades, the system could suggest 100% BUY trades and would be right 55% of the time. I would not trade this, however.
The biggest problem I see is when and how to move to the next contract. As you mention, it is not a problem of rolling over physical positions when short term trading, but I assume the NN needs new training when moving from the January contract of FDAX to the April contract for instance.
I'm having great fun following your discussion. I'll try to contribute some more aspects, not on the details of NNs, but from a trader's perspective.
This is very valuable to me.
I think one needs more input than just quotes to a NN to predict / classify. What indicators would you additionally feed to the net? I thought of 1-3 moving averages, an RSI, and possibly a volatility measure. What would you as a trader recommend?
The loss functions available for NNs, however, do not take these different kinds of error into consideration.
It should be possible to use a custom loss function in keras.
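As a rough sketch of what such a custom loss could look like (my own illustration, not from this thread - the penalty matrix and the class ordering are made up), one could charge confusions asymmetrically, e.g. punish mistaking a SELL for a BUY much harder than a false "Do Nothing":

from keras import backend as K
import numpy as np

# hypothetical penalty matrix; rows = true class, columns = predicted class
# assumed class order: SELL, DO_NOTHING, BUY - confusing SELL with BUY costs the most
penalties = K.variable(np.array([[0., 1., 4.],
                                 [1., 0., 1.],
                                 [4., 1., 0.]]))

def asymmetric_loss(y_true, y_pred):
    # expected penalty of the predicted class probabilities, averaged over the batch
    return K.mean(K.sum(K.dot(y_true, penalties) * y_pred, axis=-1))

model.compile(loss=asymmetric_loss, optimizer='rmsprop')  # model assumed built elsewhere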
You're right. Another thought: my market moves differently on the buy side than on the sell side. Selling is faster, more vicious, more forceful. Down fast, up slowly, also on a very short timescale. I believe this is true for many markets, except forex, because the gain in one currency is the loss of the other. Because of this asymmetry, I'd like to think specializing a NN only for the buy side / sell side might be beneficial. I can support this by the fact that there are trading patterns a professional learns to execute first (and more often) on the short side than on the long side, e.g. re-shorting at a downwards break level after the price moves back up after the break.
The biggest problem I see is when and how to move to the next contract. As you mention, it is not a problem of rolling over physical positions when short term trading, but I assume the NN needs new training when moving from the January contract of FDAX to the April contract for instance.
From my experience, the market is not very different when moving from one contract to the next. The price level will change, but little else. All the volume goes to the new front contract after the last one has expired. Transition takes place during the day before expiry and the expiry day itself, so these days are special.
Glad I can contribute something.
Concerning indicators, I don't use any. Any indicator that aggregates a few bars, e.g. RSI with its standard 14 bars back, is nothing but history and of no use for what I'm doing. I am not saying indicators can't be useful for other trading styles or for a NN, but what does a 14 bar average do for me if I'm trading just a minute? Ok, it does provide me with an exact number instead of a guess, but I am happy just watching the price.
Instead of classic indicators, maybe it's worth thinking about other numerical means to get a grip on what I call candle development, i.e. price action:
1. Are market orders hitting the bid or lifting the offer, i.e. is there active selling or active buying? I see this information flashing by in my trading software, and I guess I am using it, maybe unconsciously. To get this information, time&sales is not sufficient, one needs the order book (at least best bid / best offer) to determine which side of the book was hit by a traded volume.
2. The order book itself is not a big help in the FDAX. The order book only shows limit orders, and most short term volume goes through market. In the order book, you see traders showing their positions to arbitrage, which cannot work without a well populated book, but that's about all.
3. What's the volume traded right now? What's the time between price ticks? I'd like to distinguish fast-moving markets with lots of volume from slow-movers with little volume.
4. Maybe one can put a numeric value on how price develops as I see it - price coming up slowly, going down sharply - maybe doing a wavelet transform on an N-tick window to find out if there are small or large moves in there, and which direction is small and which direction is large. Sounds esoteric (sorry, I was a physicist in a previous life) and I absolutely have no reason to suggest wavelets other than I like them. ;-)
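Two tiny sketches of points 1 and 4 above (my own code, not Stefan's). First, labelling each trade as buyer- or seller-initiated, assuming per-tick trade prices alongside the prevailing best bid/ask:

def classify_aggressor(trade_price, best_bid, best_ask):
    # trades at or above the ask lifted the offer (active buying),
    # trades at or below the bid hit the bid (active selling)
    if trade_price >= best_ask:
        return 1
    if trade_price <= best_bid:
        return -1
    return 0  # inside the spread / ambiguous

And second, a wavelet-based feature for an N-tick window, assuming the PyWavelets package is available:

import numpy as np
import pywt

def wavelet_energies(ticks, wavelet='db2', level=3):
    # decompose the tick window into approximation + detail coefficients
    coeffs = pywt.wavedec(np.asarray(ticks, dtype=float), wavelet, level=level)
    # energy per scale as a crude measure of where the movement lives (slow drift vs. sharp moves)
    return [float(np.sum(c ** 2)) for c in coeffs]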
This is all just ideas - I have no clue how far a working NN system is away. At current progress rates we should be there by next Thursday. It is certainly a lot of work and requires a lot of wits and countless hours, maybe years. Maybe it'd be faster to learn how to trade manually, if one is not interested purely academically in getting a NN to profit.
Have you seen this page:
http://ai.marketcheck.co.uk/Forex (click on "Start" and observe).
It is built from Andrej Karpathy's page:
http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html
That's a good point :-), but NNs are so much more fun for the former Theoretical Chemist in me...
Have you seen this page:
http://ai.marketcheck.co.uk/Forex (click on "Start" and observe).
It is built from Andrej Karpathy's page:
http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html
No, didn't see this page. Didn't know that reinforcement learning existed. I am so clueless, I need to go and do my homework. So reinforcement learning makes up suitable target values on the spot, based on whether a certain action gives a reward or a penalty, and then you can train a NN step by step, action by action. So it sort of generates class labels or regression targets by itself. Interesting!
So the AIForex Deep Q Demo learns to "play" the data in the input field perfectly, do I understand correctly? Why does "percentage correct" seem to stay near 50%? Did you build this demo?
That's a good point :-), but NNs are so much more fun for the former Theoretical Chemist in me...
True :-) Manual trading can be very very boring.
Hi Stefan, Dmitry, Ernst, it would be a great idea if any of you could share some complete sample code on UFCNN and FX trading for newbies to play around with and comment on their findings. Collective learning helps the entire community; not sharing usually helps only the author, not the community.
A complete shared sample might:
1. Ingest real-time data from e.g. IB, Oanda, MT4 and so on. We can convert to log data before feeding it to the UFCNN.
2. Do continuous UFCNN training, classification and prediction within a certain sliding-window timeframe.
3. Classify the trend as up or down with a probability, and use a Kalman filter.
4. Provide a buy or sell signal.
Such a sample would give the community a starting point using Keras with Theano or TensorFlow, and a lot of people would share or contribute to the starter code with what they have found out - what works and what does not. Talking about how something is implemented without sharing the sample code does nothing to help the community. We are all here to learn from one another and to make each other better.
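As a small illustration of the "convert to log data" step in point 1 above (my own sketch, assuming a plain NumPy array of raw prices):

import numpy as np

def to_log_returns(prices):
    # log returns are roughly scale-free, so different instruments and contracts become comparable
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))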
Now, we would have to agree which symbols to use (one or a few, I can not provide data for all instruments in US).
I am working on EUR/USD right now, SPY would be a suggestion, and Stefan suggested US Bond Futures (any recommendation?)
model.add(LSTM(50,
               batch_input_shape=(batch_size, tsteps, 1),
               return_sequences=True,
               stateful=True))
model.add(LSTM(50,
               batch_input_shape=(batch_size, tsteps, 1),
               return_sequences=False,
               stateful=True))
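With stateful layers like these, as far as I understand (this note is mine, not part of the posted snippet), the batch size is fixed by batch_input_shape and the carried-over internal state has to be cleared explicitly between independent passes over the data, roughly:

# sketch only; assumes the model above is compiled and X_train, y_train, nb_epochs exist
for i in range(nb_epochs):
    model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=1, shuffle=False)
    model.reset_states()  # drop the carried-over LSTM state before the next pass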
Hi Dmitry and everybody
I have a conceptual question about LSTM and sequences. Consider we have only one input sequence (a sequence of numbers with some patterns (not random) ) and we want to predict the future according to past. The first architecture that comes into my mind is as follows: (borrowed from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
For training we can have two approaches:
· First approach:
n = length of the input sequence
X_train = input sequence (length = n*1)
Y_train = shifted input sequence (only one) (length = n*1)
So, for each value in X_train we have one corresponding value in Y_train (the next value in the sequence). Then for training I feed the values in X_train one by one (batch_size = 1) and update the weights using an optimization method. Here I treat the whole input as one continuous sequence, and the memory of the LSTM should be able to extract long- and short-term dependencies in the input sequence. After training the network we have an RNN that can predict one step ahead for each input. For predicting multiple steps ahead (for example 3 steps ahead) we should do as follows: feed x1 to get y1, then feed y1 to get y2, then feed y2 to get y3.
· Second approach:
The proposed approach in:
https://github.com/fchollet/keras/blob/master/examples/stateful_lstm.py
Here the approach is making subsequences. We want to predict the future of the sequence based on its history. I considered the length of the history to be length_history and we want to predict the future for a length of length_future. I transformed the data into the following format: X_train is an n x length_history numpy array and y_train is an n x length_future numpy array (for details check the above link). Then we can train the model to predict multiple steps ahead into the future.
I implemented both of these approaches in Keras (LSTM with stateful mode)
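For the first approach, the feed-x1-get-y1 loop described above could look roughly like this (my own sketch, assuming a stateful model trained on one-step-ahead pairs with batch_size = 1 and inputs shaped (1, 1, 1)):

import numpy as np

x = np.array(x1).reshape(1, 1, 1)       # x1 = last known value of the sequence
preds = []
for step in range(3):                   # predict y1, y2, y3
    y_hat = model.predict(x, batch_size=1)
    preds.append(float(y_hat))
    x = y_hat.reshape(1, 1, 1)          # feed the prediction back in as the next input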
My questions:
1. Which approach is correct? I myself think that the first approach is better and more logical. The reason is that the whole input is one sequence, and making subsequences might be wrong!
2. Is the second approach sequence-to-sequence training?
3. How should I manage the model's internal states? I mean, in the prediction step, when should I reset the state for each approach?
4. I fed my input to a NN without any hidden layer (just a dense layer), and for both approaches this neural network worked better than an RNN with LSTM hidden layers! How is this possible?
Regards,
Mina
Hi guys, interesting thread here.
I'm currently working on building a Reinforcement learning agent to trade using the Oanda FX Rest API.
Specifically, there are two general approaches to it. One, as you have seen with Andrej's Deep Q-learning, acts in discrete action spaces, meaning your choice of action has to be discrete. For example, you may have a deep Q-learner with 5 action outputs: one for buying one share, one for buying two shares, one for holding, one for selling one share, and lastly one for selling two shares.
Another general method works over a continuous action space and is called actor-critic. In this case you would specify your output action decision with a tanh activation; this gives you a range of possible continuous outputs from negative to positive.
Google DeepMind uses these algorithms extensively for Atari game playing, as well as for continuous control problems such as robotic joint actuation and a 3D racing game (continuous action).
The deep component usually involves a convolutional NN, but I've seen papers implementing LSTMs as well; it should be very straightforward to swap out. There are some additional tricks involved, like experience replay and a target network, but implementing these in Keras is not difficult.
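To make the discrete-action variant concrete, here is a minimal Keras sketch (mine, not the poster's code): a small Q-network with five outputs, one Q-value per action (sell 2, sell 1, hold, buy 1, buy 2), assuming the state is a flat vector of recent returns:

from keras.models import Sequential
from keras.layers.core import Dense

n_features = 64   # assumption: e.g. the last 64 log returns as the state
n_actions = 5     # sell 2, sell 1, hold, buy 1, buy 2

q_net = Sequential()
q_net.add(Dense(128, input_dim=n_features, activation='relu'))
q_net.add(Dense(n_actions))                     # one linear Q-value per action
q_net.compile(loss='mse', optimizer='rmsprop')

# acting greedily: q_values = q_net.predict(state.reshape(1, n_features)); action = q_values.argmax()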
I am in class at the moment but I will post more materials later, as well as potential source code.
If you want to learn about reinforcement learning, you should look up David Silver's UCL lectures on YouTube, he is a superstar at Deepmind.
Hi all, I'm new to NNs and recently discovered Keras, and I'm trying to implement an LSTM that takes in multiple time series for future value prediction. For example, I have historical data of 1) the daily price of a stock and 2) the daily crude oil price, and I'd like to use these two time series to predict the stock price for the next day. Based on my understanding I tried a minimal model:

# Set dims
n_samples = 171
n_timesteps = 8
n_feat = 2
n_classes = 128
# Model
model = Sequential()
model.add(LSTM(n_feat, n_classes, activation='linear', inner_activation='linear', return_sequences=True))
model.add(Dropout(0.5))
model.add(TimeDistributedDense(n_classes, 1), activation='linear')
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
# Train
score = model.evaluate(X_test, Y_test, batch_size=16)
model.fit(input_matrix, target, nb_epoch=10)

The input_matrix here is a 3D matrix with dim (171, 8, 2) - 171 samples, 8 timesteps, 2 features; I broke the long time series into shorter subseries of 8 time steps. And the target is a 3D matrix with dim (171, 1, 1).
I ran it and it returned:
Epoch 0 171/171 [==============================] - 0s - loss: nan
I'm not sure if this model is the correct one to use for my problem, or whether I built it correctly; I'd welcome any feedback and advice from you. Also, I'm confused about how to process the input. My inputs are two daily prices over the past 20 years - should I use them as one long sequence (then there's only one sample, I think...), or break them into smaller pieces (# of pieces = # of samples)? Should I normalize the prices and maybe do some other pre-processing?
Thanks in advance for your help :)
Best,
Yue
I figured out a way to solve this problem, so I think it might be helpful to post the solution here. It turned out the activation and inner_activation functions I used for the LSTM layer were wrong, thus the loss could not be calculated properly. I replaced them with sigmoid, tanh, and relu; all of them worked and gave losses that decreased with each epoch. Also, I replaced the TimeDistributedDense layer with a simple Dense layer, so return_sequences=False for the LSTM layer. As to the input, I wrote a function that transforms the long input and target time series into small pieces of a 3D array (nb_samples, time_steps, nb_features) to feed into the model. In my example, I used rolling windows of the same length and corresponding targets to train the model. I tried training the model with both normalized and non-normalized data; the normalized data generally gave better results. I attached part of my code at the end, please let me know if you have any comments or suggestions. Thanks!

# merge data frames
merged = df1.merge(df2, left_index=True, right_index=True, how='inner').dropna()
# data prep
# use 100 days of historical data to predict 10 days in the future
data = merged.values
examples = 100
y_examples = 10
nb_samples = len(data) - examples - y_examples
# input - 2 features
input_list = [np.expand_dims(np.atleast_2d(data[i:examples+i,:]), axis=0) for i in xrange(nb_samples)]
input_mat = np.concatenate(input_list, axis=0)
# target - the first column in merged dataframe
target_list = [np.atleast_2d(data[i+examples:examples+i+y_examples,0]) for i in xrange(nb_samples)]
target_mat = np.concatenate(target_list, axis=0)
# set up model
trials = input_mat.shape[0]
features = input_mat.shape[2]
hidden = 64
model = Sequential()
model.add(LSTM(features, hidden))
model.add(Dropout(.2))
model.add(Dense(hidden, y_examples))
model.add(Activation('linear'))
model.compile(loss='mse', optimizer='rmsprop')
# Train
model.fit(input_mat, target_mat, nb_epoch=50)
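The post says normalized data worked better but does not show the scaling; a minimal sketch of what that could look like (my addition, min-max scaling each column of the merged array before building the windows above):

# hypothetical min-max normalization per feature, applied to data before the windowing
data_min = data.min(axis=0)
data_max = data.max(axis=0)
data = (data - data_min) / (data_max - data_min)
# keep data_min / data_max around to map predictions back to price units afterwards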
ValueError: Input dimension mis-match. (input[0].shape[1] = 1, input[1].shape[1] = 20)
Apply node that caused the error: Elemwise{sub,no_inplace}(activation_14_target, Reshape{3}.0)
Toposort index: 491
Inputs types: [TensorType(float32, 3D), TensorType(float32, (False, False, True))]
Inputs shapes: [(300L, 1L, 6L), (300L, 20L, 1L)]
Inputs strides: [(24L, 24L, 4L), (80L, 4L, 4L)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Elemwise{Composite{Abs((i0 / i1))}}(Elemwise{sub,no_inplace}.0, Elemwise{Composite{clip(Abs(i0), i1, i2)}}.0), Elemwise{Composite{((i0 * i1 * i2 * Abs(i3) * sgn(i4)) / (i5 * i6 * i7 * i8 * i3 * i3))}}(TensorConstant{(1L, 1L, 1.. of -100.0}, InplaceDimShuffle{x,x,x}.0, InplaceDimShuffle{0,x,x}.0, Elemwise{Composite{clip(Abs(i0), i1, i2)}}.0, Elemwise{sub,no_inplace}.0, InplaceDimShuffle{x,x,x}.0, InplaceDimShuffle{x,x,x}.0, InplaceDimShuffle{x,x,x}.0, InplaceDimShuffle{x,x,x}.0)]]
Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "C:\Users\mingh_000\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py", line 2723, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "C:\Users\mingh_000\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py", line 2825, in run_ast_nodes
if self.run_code(code, result):
File "C:\Users\mingh_000\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py", line 2885, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-50-576c7070b65b>", line 17, in <module>
model.compile(loss="mape", optimizer="rmsprop")
File "C:\Users\mingh_000\Anaconda2\lib\site-packages\keras\models.py", line 339, in compile
**kwargs)
File "C:\Users\mingh_000\Anaconda2\lib\site-packages\keras\engine\training.py", line 588, in compile
sample_weight, mask)
File "C:\Users\mingh_000\Anaconda2\lib\site-packages\keras\engine\training.py", line 311, in weighted
score_array = fn(y_true, y_pred)
File "C:\Users\mingh_000\Anaconda2\lib\site-packages\keras\objectives.py", line 15, in mean_absolute_percentage_error
diff = K.abs((y_true - y_pred) / K.clip(K.abs(y_true), K.epsilon(), np.inf))
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

I am new to deep learning and LSTM. I have a very simple question. I have taken a sample of demands for 50 time steps and I am trying to forecast the demand values for the next 10 time steps (up to 60 time steps), using the same 50 samples to train the model.
But unfortunately, the closest I came is splitting the sample demands into 67% training and 33% testing, and my forecast only covers the 33% test portion (time steps 35-50); it never goes beyond 50 time steps, as shown in the picture below. Can anybody help me with this issue?
I have attached my code below.
Thank you in advance.
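Not part of the original post, but one common way to forecast beyond the last observed time step is to do it recursively: predict step 51 from the last observed window, append the prediction, predict step 52 from the updated window, and so on. A rough sketch, under the assumption of a model trained on windows of window_size past demands to predict the next single value (demand is assumed to be the 1-D array of the 50 observed values):

import numpy as np

history = list(demand)                       # the 50 observed demands
for step in range(10):                       # forecast steps 51..60
    x = np.array(history[-window_size:]).reshape(1, window_size, 1)
    y_hat = float(model.predict(x))
    history.append(y_hat)                    # feed the forecast back in for the next step
forecast = history[len(demand):]             # the 10 forecast values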
Hi, I am not really sure what TimeDistributedDense does; I just used a normal Dense layer with linear activation. The OP output 10 neurons to predict the next 10 steps of the *first* stock. I assume he thinks the second is related but does not want to predict that one.
I made a toy example with a sine wave. To prove the point above, I put in a related second sequence, which is cosine, but I predict only sine. Hope this makes it clearer :) ... I predict 500 data points into the future...

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, TimeDistributedDense
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.preprocessing.sequence import pad_sequences
from keras.utils.layer_utils import print_layer_shapes
#sine and cos wave
import numpy as np
X = np.linspace(0,1000,10000)
Y = np.asarray([np.sin(X),np.cos(X)]).T
# data prep
# use 500 data points of historical data to predict 500 data points in the future
data = Y
examples = 500
y_examples = 500
nb_samples = len(data) - examples - y_examples
# input - 2 features
input_list = [np.expand_dims(np.atleast_2d(data[i:examples+i,:]), axis=0) for i in xrange(nb_samples)]
input_mat = np.concatenate(input_list, axis=0)
# target - the first column in merged dataframe
target_list = [np.atleast_2d(data[i+examples:examples+i+y_examples,0]) for i in xrange(nb_samples)]
target_mat = np.concatenate(target_list, axis=0)
# set up model
trials = input_mat.shape[0]
features = input_mat.shape[2]
print trials
print features
hidden = 64
model = Sequential()
model.add(LSTM(input_dim=features, output_dim=hidden))
model.add(Dropout(.2))
model.add(Dense(input_dim=hidden, output_dim=y_examples))
model.add(Activation('linear'))
model.compile(loss='mse', optimizer='rmsprop')
# Train
model.fit(input_mat, target_mat, nb_epoch=2)
print_layer_shapes(model, input_shapes=(input_mat.shape))

Regards,
Francesco
On Tuesday, 24 November 2015 22:38:32 UTC+1, Tarnac wrote:
I'm confused about this example. If one is predicting 10 timesteps into the future, I thought one would be using TimeDistributedDense, since multiple timesteps will be predicted. The way the OP programmed it, the model outputs 10 neurons. My limited understanding is that this case would be covered by a two-neuron output (one for each signal/stock/feature), and each of these outputs would have 10 timesteps...
On Friday, November 20, 2015 at 8:26:09 AM UTC-8, francesc...@gmail.com wrote:
This worked for me, although I had to specify
model.add(LSTM(input_dim=features, output_dim=hidden))
model.add(Dropout(.2))
model.add(Dense(input_dim=hidden, output_dim=y_examples))
Do you know why the TimeDistributedDense (plus returning the sequences in the LSTM layer) is not appropriate?
Cheers
Francesco