I think an intermediate step before looking at more complex models
like neural networks might be to try some form of regression like the
ones covered in ml-class (linear, polynomial, logistic, etc.).
For our first submission, I simply used the sample file included on
kaggle's website without writing any code.
I've been quite busy with deadlines in four classes but I was thinking
we can probably re-use the Octave/Matlab code from http://ml-class.org
and http://openclassroom.stanford.edu.
I'm planning to spend more time on kaggle projects as soon as I finish
an SVM project for a machine learning class I'm taking at a local
university.
As Dan mentioned, volumes would most likely be used if we were to
build a real model.
However, I think we can still learn from "toy problems" since we're
fairly new to AI competitions. If you prefer, we can participate in a
different competition. I think we need to get some experience with
simple competitions before we jump to large competitions like
http://heritagehealthprize.com.
What do you think?
Thanks for your interest. ;) Maxime
On Sat, Nov 19, 2011 at 2:48 PM, Paul Tan <paul...@gmail.com> wrote:
> Could you post the code you used to generate the data?
>
> I'm trying to get started, but wanted to get a framework to input the
> data and generate the output data that the competition requested.
>
> I am thinking of just putting the data into a neural network with a
> large number of hidden units and see what happens. I will post my
> Octave code here for others to see if they have a better idea.
>
> Thanks.
>
> Paul Tan.
I'll take a look at the download sample from the site and see if I can
load it into Octave.
Paul Tan.
Here is how to set up a database for the algo training data:
http://www.kaggle.com/c/AlgorithmicTradingChallenge/forums/t/1032/importing-the-data-into-a-database
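(If you just want something quick and local instead, one option is SQLite via
pandas. This is only my own sketch, not the approach from the forum post, and
the file/table names are placeholders.)

# Load training.csv into a local SQLite database so it can be queried
# without re-parsing the CSV every time. File and table names are placeholders.
import sqlite3
import pandas as pd

conn = sqlite3.connect("atrade.db")
# Read in chunks so the whole file does not need to fit in memory at once.
for chunk in pd.read_csv("training.csv", chunksize=100000):
    chunk.to_sql("training", conn, if_exists="append", index=False)
conn.close()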
The script:
https://github.com/fidlej/atrade/blob/master/displaycurves.py
Example usage:
./displaycurves.py path/to/training.csv
Anyone can fork the git repo and enhance the code.
I will describe the method I used in a later email.
For now, enjoy the hot news.
To minimize the error, I used means.
The mean of a set of values minimizes the squared error against those values.
I compute means of price changes:
price_change = event[t].price - event[50].price
Each mean is computed over rows with the same (security_id, initiator). I also
keep separate means at event51, ..., event100. This captures how the mean
change evolves as more events arrive.
When predicting a value, I add the mean change to the price of the
50th event.
See means.py and produce_response.py in the git repo for details.
https://github.com/fidlej/atrade
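For reference, here is a minimal pandas sketch of the same idea. It is not the
actual means.py code, and the column names (security_id, initiator,
bid50..ask100) are assumptions about the CSV layout:

# A minimal sketch of the grouped-means idea (not the actual means.py code).
# Column names are assumptions about the CSV layout.
import pandas as pd

train = pd.read_csv("training.csv")
response_cols = ["%s%d" % (side, t) for side in ("bid", "ask") for t in range(51, 101)]

# price change of each response column relative to the 50th event
changes = pd.DataFrame(index=train.index)
for col in response_cols:
    base = "bid50" if col.startswith("bid") else "ask50"
    changes[col] = train[col] - train[base]
changes["security_id"] = train["security_id"]
changes["initiator"] = train["initiator"]

# one mean change per (security_id, initiator) group and per response column
mean_changes = changes.groupby(["security_id", "initiator"]).mean()

def predict(row):
    # predicted prices for one test row: 50th price plus the group's mean change
    # (ignores the case of a key never seen in training)
    group = mean_changes.loc[(row["security_id"], row["initiator"])]
    return {col: row["bid50" if col.startswith("bid") else "ask50"] + group[col]
            for col in response_cols}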
Feel free to experiment.
You can try computing means over different groups.
You can also do "Error Analysis" as suggested by Ng: see which
validation examples are problematic.
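For instance (my own sketch, not something from the repo; file and column
names are placeholders for a held-out split and its predictions), you could
rank security_ids by their validation error:

# Per-security error analysis sketch. File and column names are placeholders.
import pandas as pd

truth = pd.read_csv("validation_truth.csv")
preds = pd.read_csv("validation_predictions.csv")
response_cols = ["%s%d" % (side, t) for side in ("bid", "ask") for t in range(51, 101)]

# mean squared error per row, then averaged per security_id
row_mse = ((preds[response_cols] - truth[response_cols]) ** 2).mean(axis=1)
per_security = row_mse.groupby(truth["security_id"]).mean().sort_values(ascending=False)
print(per_security.head(20))  # the security_ids that hurt the most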
Wow that's great progress. Thanks a lot for this contribution, I'm
looking forward to helping you out and improving your solution in a
few days. Doing "Error Analysis" will most likely help us focus on
areas to improve.
Good job ;) Maxime
I'm finding it difficult to make sense of the data...
As I understand it, each row is a day, and in a day there are only 50
movements (?).
The movements are:
Q, if the prices are adjusted (but in the data I see times where the
event is Q, yet there is no change in the ask or bid);
T, if there has been a trade (but we have no information about it?).
Can some of you help me or give me some documentation so I can
understand the challenge?
I looked at Ivo's code, but I'm not too familiar with Python... and I'm
finding it difficult to follow.
thnx!
Well, I will try to summarize Ivo's code for those who do not know
Python.
Let me know if I have missed something important or got it wrong.

Summary of Ivo's work:
We train on the training data and update the testing data to produce the
output.
- For each Bid/Ask value in the training set, from the 51st through the
  100th, we calculate the mean/average difference between it and the
  50th Bid/Ask.
- We keep a running total of these differences in the file Changes.pickle,
  summing over rows with the same key combination (security_id/initiator).
- Once trained, we open the testing file and create a new output file by
  replacing the 51st through 100th Bid/Ask prices with the 50th Bid/Ask
  price plus the mean/average value for the corresponding column from the
  Changes.pickle file.

File Changes.pickle:
- It is produced/updated by 'means.py' and is used by 'produce_response.py'
  to generate the output file for submission to Kaggle.
- The raw Changes.pickle file is a serialized document; see
  http://docs.python.org/library/pickle.html for more info.
- It is easier to understand the de-serialized structure of the file:
  - Each row (viewed as a set of columns) in this file is specific to a
    particular key (security_id/initiator combination).
  - The first element of the row (index 0) tells us how many rows in the
    training set with the same key have been processed so far. This value
    is used to calculate the mean (average) later on.
  - The remaining columns, in pairs, keep the sum of differences between
    the base ask/bid price (the 50th ask/bid price in the input row) and
    the bid/ask prices in the input training row, from the 51st through
    the 100th. The relevant code is in 'means.py', lines 60-62.
    - For example, if an input row in the training set has a (base) 50th
      bid/ask price of $23/$24 and an 87th bid/ask price of $18/$19.5, the
      difference of -$5/-$4.5 is added to the existing values at columns
      75/76.
    - The row in the Changes.pickle file to which this difference is added
      depends on the key value (security_id/initiator combination).
- When all training examples have been used to update the corresponding
  rows in Changes.pickle, the output data is produced by lines 41-45 in
  produce_response.py:
  - As discussed in the summary, each output value is the 50th Bid/Ask
    value (in the testing file) plus the mean of the Bid/Ask differences
    calculated from the training set.
  - We use the element at index zero of each row in Changes.pickle to
    calculate the mean Bid/Ask price difference for that row.

Hmm, my explanation didn't really turn out as clear as I wanted it to
be. But it is a start...
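To make the flow concrete, here is a compressed sketch of how a
Changes.pickle-style file ties the two phases together. This is not Ivo's
actual code; the key format and the toy numbers are only illustrative.

# Compressed sketch of the train/predict split described above (not Ivo's code).
import pickle

# --- training phase (the means.py role): accumulate running totals per key ---
changes = {}  # (security_id, initiator) -> [count, sum_diff_1, sum_diff_2, ...]

def accumulate(key, diffs):
    row = changes.setdefault(key, [0] + [0.0] * len(diffs))
    row[0] += 1                 # index 0: training rows seen so far for this key
    for i, d in enumerate(diffs):
        row[1 + i] += d         # remaining columns: summed differences vs. the 50th price

# toy example: two training rows for the same key, two response columns each
accumulate(("sec1", "B"), [0.5, 1.0])
accumulate(("sec1", "B"), [1.5, 2.0])

with open("changes.pickle", "wb") as f:
    pickle.dump(changes, f)

# --- prediction phase (the produce_response.py role) ---
with open("changes.pickle", "rb") as f:
    changes = pickle.load(f)

def predicted_price(key, price_at_event_50, col_index):
    count = changes[key][0]
    mean_change = changes[key][1 + col_index] / count
    return price_at_event_50 + mean_change

print(predicted_price(("sec1", "B"), 23.0, 0))  # 23.0 + (0.5 + 1.5) / 2 = 24.0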
Yes, this seems a bit "strange". I have no idea.
Regards,
Herbert
--
=================================================================
Herbert Muehlburger Software Development and Business Management
Graz University of Technology
www.muehlburger.at www.twitter.com/hmuehlburger
=================================================================
Yes, I saw that too. I guess it's not too hard to find our group with a
few Google searches.
Do you think we should switch to a code repository with passwords?
Thanks. Maxime
Was the group public?
It might be someone in the group, too... It is a difficult problem.
Can you see who downloads the source code if you switch to one with
passwords? That would be the only way to deal with it.
On 23.11.2011 00:13, Maxime Leclerc wrote:
> Yes I saw that too. I guess it's not too hard to find our group with a
> few google searches.
>
> Do you think we should switch to a code repository with passwords?
If you want to switch to a "private" repo, maybe [1] is a good choice?
I think it will always be a bit of a problem as long as you can win
money on Kaggle. Sharing your code is great, but if someone takes
advantage of it, it's not so great for the whole group any more.
Cheers,
Herbert
The improvement was made by analyzing the errors on a validation set.
One security_id is then handled specially.
More details can be seen on our Kaggle "Submissions" page.
It is getting harder to get an improvement in the score.
It will be interesting to read the winning entry description after the
end of the competition.
On Nov 23, 3:43 pm, Ivo Danihelka <i...@danihelka.net> wrote:
> I corrected the order on the Kaggle Leaderboard.
> We are 3rd again. http://www.kaggle.com/c/AlgorithmicTradingChallenge/Leaderboard
What exactly are we trying to predict? What is special about the 50th
bid?
Regards
Eby
Some other resources:
1) The data schema description on the Data page:
http://www.kaggle.com/c/AlgorithmicTradingChallenge/Data
We predict the "Responses" part of the rows.
2) Download and read the data files.
Spend time looking at them.
Visualize them (a rough plotting sketch is included after this list).
Some nice visualization tips from past competitions:
http://blog.kaggle.com/2011/03/23/getting-in-shape-for-the-sport-of-data-sciencetalk-by-jeremy-howard/
3) Read the existing threads on the Kaggle forum. The organizers
provided some info about the dataset there.
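Here is the rough plotting sketch mentioned above, for eyeballing a single
training row. It assumes matplotlib and pandas, and it guesses at
bid1..bid100 / ask1..ask100 column names; adjust to the real schema on the
Data page.

# Rough sketch for eyeballing one training row (column names are guesses).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("training.csv", nrows=10)
row = df.iloc[0]
events = range(1, 101)
plt.plot(events, [row["bid%d" % t] for t in events], label="bid")
plt.plot(events, [row["ask%d" % t] for t in events], label="ask")
plt.axvline(50, linestyle="--", color="gray", label="event 50")
plt.xlabel("event")
plt.ylabel("price")
plt.legend()
plt.show()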
--
Ivo Danihelka