clojure success story ... hopefully :-)

bradford cross

Aug 14, 2009, 3:10:26 PM
to clo...@googlegroups.com
We have just released flightcaster.com, which uses statistical inference and machine learning to predict flight delays ahead of the airlines (initial results suggest it does so with 85-90% accuracy).

The webserver and webapp are all Rails running on the Heroku platform, which also serves our BlackBerry and iPhone apps.

The research and data heavy lifting is all in Clojure.

Distributed data mining is done via a custom layer on top of Cascading (which is itself a layer on top of Hadoop). Everything runs on EC2 and S3 using the very nice Cloudera AMIs and deployment scripts.

In addition to the machine learning, the layer atop Cascading performs all the complex data filtering and transformation operations, including distributed joins from heterogeneous data sources and transformations into a time series view that is fed to the machine learning computations, which are rolled into mappers and reducers. Remember, this is data from airlines and the FAA; it is not pretty. Web data is messy too, but we have lots of good frameworks, libs and sanitizers for web data.

We wrapped Cascading in a thin layer that wraps Clojure functions in Cascading function objects and injects those into individual steps in the workflows. This gets us very close to normal function composition for the client code. Ultimately, we want to compose Cascading workflows the same way we would do vanilla function composition for small test runs on our local machines. This is an execution-agnostic programming model; client code doesn't bear the signs of distributed execution.

As a beneficial side effect, we found that this model forces us to have more fine-grained abstractions, because each operation must ultimately be injectable into a map-reduce phase; otherwise your paralleizm will be unnecessarily coarse-grained. This steers us clear of monolithic uber-expressions.
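
To make the idea concrete, here is a minimal sketch of the local, "vanilla composition" side of that model. It is an illustration only, not the actual wrapper layer: the step names, record keys and the 15-minute threshold are all hypothetical.

```clojure
;; Hypothetical pipeline steps. Each one is a small, pure function,
;; so each could in principle be injected into its own map or reduce phase.
(defn parse-delay [record]
  (assoc record :delay-min (Integer/parseInt (:delay record))))

(defn late? [record]
  (> (:delay-min record) 15))

;; A "workflow" expressed as ordinary function composition over a seq of
;; records. comp applies right to left: parse first, then filter.
(def local-workflow
  (comp (partial filter late?)
        (partial map parse-delay)))

(local-workflow [{:flight "A1" :delay "5"}
                 {:flight "B2" :delay "40"}])
;; => ({:flight "B2", :delay "40", :delay-min 40})
```

Because nothing in `local-workflow` mentions Hadoop or Cascading, the same steps could be handed to a distributed runner without changing the client code, which is the point of keeping each operation small.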

Another aspect of the design that allows us to do this is that the data transformations write out Clojure data structure literals, so we are entirely insulated from the normal Hadoop input/output formats. The wrapper layer just uses the normal Clojure reader to read in the strings from Hadoop and applies the vanilla Clojure functions to the data structures. But we are not limited to Clojure data structure literals. We also inject other readers that can read other strings into Clojure data structures; for example, we use Dan Larkin's wonderful json lib for the initial reads of the raw json data we store.
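
A rough sketch of that round trip, assuming one literal per line of text. The function names (`write-line`, `clj-reader`, `run-step`) are made up for illustration and are not the names used in the real layer.

```clojure
;; Writer side: emit one Clojure literal per line of output.
(defn write-line [record]
  (pr-str record))

;; Reader side: the default reader is the plain Clojure reader.
(defn clj-reader [line]
  (binding [*read-eval* false]   ; do not evaluate #= forms in the input
    (read-string line)))

;; Hypothetical step runner: the reader is injected, so another format
;; (JSON, CSV, ...) only needs a different reader function.
(defn run-step [reader f lines]
  (map (comp f reader) lines))

(def lines (map write-line [{:carrier "UA" :delay 12}
                            {:carrier "AA" :delay 47}]))

(run-step clj-reader :delay lines)
;; => (12 47)
```

Swapping `clj-reader` for a function that parses JSON strings into maps would leave `run-step` and the applied function untouched.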

All the analytical code is custom, so we don't use many 3rd party libs outside of Cascading, Hadoop, and the invaluable jets3t for working with S3. Oh, and of course, since we do so much temporal analysis, joda-time is the only way to work with dates in a sane way on the JVM. :-)

If you travel a lot, check us out: flightcaster.com ... we have iPhone and BlackBerry apps. Unfortunately this is domestic US air travel only at the moment, due to the difficulty of obtaining data for international carriers and aviation agencies.

bradford cross

Aug 14, 2009, 3:16:59 PM
to clo...@googlegroups.com
whoa...missed the google spellcheckers' warning on: paralleizm ... although that may be the proper lolkidde spelling :-)

Chad Harrington

Aug 14, 2009, 4:39:05 PM
to clo...@googlegroups.com
Bradford,
I just bought the iPhone app.  Looks very cool.

I saw a presentation at the JavaOne after-meeting with Rich Hickey about flightcaster.  Were you the presenter?  The machine learning notation seemed to work very well in Clojure.  Are there any portions of this cool stuff that you can share with the community?

Chad Harrington
chad.ha...@gmail.com


bradford cross

Aug 14, 2009, 7:18:24 PM
to clo...@googlegroups.com

Hi Chad, yep, that was me.  We do hope to open source some stuff soon. 

First will probably be our wrappers for cascading/hadoop and s3. 

Next might be some core language extensions which might be good in contrib or some other lib. 

If we release any basic stats or machine learning stuff we may try to merge it into incanter if it seems like a fit, but I haven't had time to check out incanter as much as I'd like.

For now this is all on the back burner since building stuff has to be the priority for us and we're people constrained. :)

John Harrop

Aug 15, 2009, 9:08:38 PM
to clo...@googlegroups.com
On Fri, Aug 14, 2009 at 7:18 PM, bradford cross <bradford...@gmail.com> wrote:

> Hi Chad, yep, that was me.  We do hope to open source some stuff soon.
>
> First will probably be our wrappers for cascading/hadoop and s3.
>
> Next might be some core language extensions which might be good in contrib or some other lib.
>
> If we release any basic stats or machine learning stuff we may try to merge into incanter if it seems like a fit but haven't had time to check out incanter as I'd like.


Very interesting.

Are you using

(binding [*read-eval* false]
  ...)

when reading Clojure data structures out of strings obtained over your distributed node network? If you're not, it's possible you have a security hole that could be exploited by a hostile node masquerading as a legitimate one. (Though an attacker would likely have to penetrate your firewall and get loose in your LAN, gaining privileges on at least one of your machines, to exploit it.)

Specifically, a #=() form in the stream would otherwise allow a sort of injection attack. If you use the Clojure reader on other untrusted data, such as fragments of web pages (to parse numbers, say), the same applies: without that binding for those reads, you may be vulnerable in a similar manner. If the data comes from web forms, the exposure is very similar to SQL injection.
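
As a small sketch of the difference the binding makes: the hostile string and the helper names below are invented for illustration, and `clojure.edn` is an alternative that only exists in Clojure 1.5 and later, so treat that part as an assumption about the version in use.

```clojure
(require '[clojure.edn :as edn])

;; A string a hostile node might send: #= asks the reader to evaluate code.
(def hostile "#=(java.lang.System/getProperty \"user.name\")")

;; Option 1: the binding discussed above.
(defn guarded-read [s]
  (binding [*read-eval* false]
    (read-string s)))

;; Option 2: clojure.edn (Clojure 1.5+), which has no #= at all.
(defn edn-read [s]
  (edn/read-string s))

;; Helper (hypothetical name) that reports a failed read instead of throwing.
(defn read-or-reject [reader s]
  (try
    (reader s)
    (catch Exception _ :rejected)))

(read-or-reject guarded-read "{:delay 12}") ;; => {:delay 12}
(read-or-reject guarded-read hostile)       ;; => :rejected
(read-or-reject edn-read hostile)           ;; => :rejected
```

Ordinary data still reads normally in both cases; only the evaluating form is refused.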

Security becomes especially important if you figure to do big parallel reductions on office PC spare cycles instead of dedicated hardware. Those PCs might vary in how sensitive the information on them is, and in how trustworthy their users are. You don't want a newly-hired clerk in sales sending crafted network packets that give him privileges on the desktop computer of the CFO or among the R&D department's boxes. The latter lets him sell industrial espionage data to the highest bidder, likely a competitor; the former, possibly do some insider trading or suchlike (and when the SEC shows up to investigate some suspicious trades, they'll be looking at your CFO, as he was the one nominally privy to the inside info). So a breach could cause anything from embarrassment (porn popups during board meeting Powerpoint presentations; intentional pranks) to competitive or legal trouble.

Chas Emerick

Aug 16, 2009, 10:41:52 AM
to clo...@googlegroups.com

On Aug 14, 2009, at 3:10 PM, bradford cross wrote:

> We have just released flightcaster.com which uses statistical inference and machine learning to predict flight delays in advance of airlines (initial results appear to do so with 85 - 90 % accuracy.)
>
> The webserver and webapp are all rails running on the Heroku platform; which also serves our blackberry and iphone apps.
>
> The research and data heavy lifting is all in Clojure

Congratulations to you and your team!  I'm glad to see more clojure getting out into production environments.

Cheers,

- Chas

Rich Hickey

Aug 19, 2009, 12:04:37 PM
to clo...@googlegroups.com

Very cool - congrats!

Rich

Jan Rychter

Aug 21, 2009, 4:04:32 AM
to clo...@googlegroups.com
bradford cross <bradford...@gmail.com> writes:
> Hi Chad, yep, that was me. We do hope to open source some stuff soon.
>
> First will probably be our wrappers for cascading/hadoop and s3.

Those would be of great interest to many of us. Please do.

--J.

Sigrid

Aug 21, 2009, 2:02:20 PM
to Clojure
Hi,

I read the related story on InfoQ and found it an extremely
interesting and motivating read, Clojure being applied in such an
interesting field as machine learning!

There is something in the article I'd like to understand better, so
I'm just asking here on the group:

"The way that Rich elected to de-couple destructuring bind from
pattern matching was brilliant."

Could someone point me to what the difference is? I know pattern
matching e.g. from the PLT scheme implementation, and there the
pattern matching also provides the binding and destructuring I
think...?

Excuse me if it's a stupid question, it just made me curious to
know :-;

Sigrid

Meikel Brandmeyer

Aug 21, 2009, 3:16:55 PM
to clo...@googlegroups.com
Hi,

Am 21.08.2009 um 20:02 schrieb Sigrid:

> Could someone point me to what the difference is? I know pattern
> matching e.g. from the PLT scheme implementation, and there the
> pattern matching also provides the binding and destructuring I
> think...?

The difference is that in pattern matching you can also specify
values on the left side. For example in OCaml:

type foo = [ Foo of int ];

value frobnicate x =
       match x with
       [ Foo 5 -> do_something ()
       | Foo 7 -> do_something_else ()
       | Foo x -> do_more x ];

(Please bear with me if I don't remember all the details of the syntax.)

While this is not possible in Clojure:

(let [[x 5 y] [1 2 3]]
...)

The five on the left hand side is not allowed.
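
For comparison, a minimal sketch of what does work in Clojure: destructuring handles the binding, and a separate test handles the values. The `frobnicate` here only loosely mirrors the OCaml one, with keywords standing in for the hypothetical `do_something` functions and a `[:foo n]` vector standing in for `Foo n`.

```clojure
;; Destructuring only binds names to positions; it never tests a value.
;; Use _ to skip a position instead of writing a literal there.
(let [[x _ y] [1 2 3]]
  [x y])
;; => [1 3]

;; To branch on a value, destructure first and then test explicitly.
(defn frobnicate [[_tag n]]
  (case n
    5 :do-something
    7 :do-something-else
    [:do-more n]))

(frobnicate [:foo 5]) ;; => :do-something
(frobnicate [:foo 9]) ;; => [:do-more 9]
```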

Hope this helps.

Sincerely
Meikel

Michel Salim

Aug 21, 2009, 3:41:50 PM
to clo...@googlegroups.com
On Fri, 2009-08-21 at 11:02 -0700, Sigrid wrote:
> Hi,
>
> I read the related story on InfoQ and found it an extremely
> interesting and motivating read, Clojure being applied in such an
> interesting field as machine learning!
>
> There is something in the article I'd like to understand better, so
> i'm just asking here on the group:
>
> "The way that Rich elected to de-couple destructuring bind from
> pattern matching was brilliant."
>
> Could someone point me to what the difference is? I know pattern
> matching e.g. from the PLT scheme implementation, and there the
> pattern matching also provides the binding and destructuring I
> think...?
>
Clojure allows destructuring of vectors, which happens to be what its
functions' argument lists are, so you get most of the benefits of
pattern matching. It's not full-blown, though, so (correct me if I'm
wrong) the equivalent of this is not possible:

length [] = 0
length (_:xs) = 1 + (length xs)

Regards,

--
Michel

Kevin Downey

Aug 21, 2009, 3:50:15 PM
to clo...@googlegroups.com
user=> (defmulti length empty?)
#'user/length

user=> (defmethod length true [x] 0)
#<MultiFn clojure.lang.MultiFn@1807ca8>

user=> (defmethod length false [x] (+ 1 (length (rest x))))
#<MultiFn clojure.lang.MultiFn@1807ca8>

user=> (length [1 2 3 4])
4
--
And what is good, Phaedrus,
And what is not good—
Need we ask anyone to tell us these things?

Michel Salim

Aug 21, 2009, 5:55:31 PM
to clo...@googlegroups.com
On Fri, 2009-08-21 at 12:50 -0700, Kevin Downey wrote:
> user=> (defmulti length empty?)
> #'user/length
>
> user=> (defmethod length true [x] 0)
> #<MultiFn clojure.lang.MultiFn@1807ca8>
>
> user=> (defmethod length false [x] (+ 1 (length (rest x))))
> #<MultiFn clojure.lang.MultiFn@1807ca8>
>
> user=> (length [1 2 3 4])
> 4
>
Très cool! This could be applied to Meikel's post as well -- you *can*
write your own predicate function that in effect tests for values. It
will just be -- ugly.

Is there a performance hit with this style (due to using multimethods)
or will this be optimized away in practice?

--
Michel

Stuart Sierra

Aug 21, 2009, 8:28:43 PM
to Clojure
On Aug 21, 5:55 pm, Michel Salim <michael.silva...@gmail.com> wrote:
> Is there a performance hit with this style (due to using multimethods)
> or will this be optimized away in practice?

There is a slight performance penalty over a normal function call. I
think the dispatching takes one function call, a hash lookup, and an
equality test.

-SS

Richard Newman

Aug 22, 2009, 4:24:50 AM
to clo...@googlegroups.com
> There is a slight performance penalty over a normal function call. I
> think the dispatching takes one function call, a hash lookup, and an
> equality test.

Strictly speaking, an isa? test. That's where the ad hoc hierarchy
functionality ties in.
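
A small sketch of what that means in practice, using made-up keywords and a made-up `describe` multimethod: because dispatch goes through `isa?`, a method registered for a parent in the hierarchy also catches derived values.

```clojure
;; Build a tiny ad hoc hierarchy: ::jet is a kind of ::aircraft.
(derive ::jet ::aircraft)

;; Dispatch on the :kind key of the argument.
(defmulti describe :kind)

;; Only the parent has a method.
(defmethod describe ::aircraft [_] "some aircraft")

(isa? ::jet ::aircraft)   ;; => true
(describe {:kind ::jet})  ;; => "some aircraft"

;; Plain values still work, since (isa? x x) falls back to equality,
;; which is why dispatching on true/false with empty? behaves as expected.
(isa? true true)          ;; => true
```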

Sigrid

Aug 22, 2009, 5:08:54 AM
to Clojure
Hi Meikel, hi all,

thanks for the explanation, I think I got it now. I suppose something
in the sentence I quoted led me to think that pattern matching was
"less" in a way than destructuring, whereas in fact it seems to be the
opposite - pattern matching seems to presuppose destructuring if I'm
correct now.

Still then (regarding "The way that Rich elected to de-couple
destructuring bind from pattern matching was brilliant.") , it is
unclear to me why it was such a good idea not to include pattern
matching, or, to somehow keep them separate...

Ciao,
Sigrid

James Sofra

Aug 22, 2009, 1:26:20 AM
to Clojure
This seems like a pretty nice pattern matching implementation for
Clojure.
http://www.brool.com/index.php/pattern-matching-in-clojure

Cheers,
James

Michel Salim

Aug 22, 2009, 12:58:21 PM
to clo...@googlegroups.com
On Fri, 2009-08-21 at 22:26 -0700, James Sofra wrote:
> This seems like a pretty nice pattern matching implementation for
> Clojure.
> http://www.brool.com/index.php/pattern-matching-in-clojure
>
Beautiful!

Cheers,

--
Michel

bradford cross

Aug 23, 2009, 2:00:20 AM
to clo...@googlegroups.com
On Sat, Aug 22, 2009 at 2:08 AM, Sigrid <key...@gmx.de> wrote:

> Hi Meikel, hi all,
>
> thanks for the explanation, I think I got it now. I suppose something
> in the sentence I quoted led me to think that pattern matching was
> "less" in a way than destructuring, whereas in fact it seems to be the
> opposite - pattern matching seems to presuppose destructuring if I'm
> correct now.

Correct, pattern matching is built using destructuring bind.
 


> Still then (regarding "The way that Rich elected to de-couple
> destructuring bind from pattern matching was brilliant.") , it is
> unclear to me why it was such a good idea not to include pattern
> matching, or, to somehow keep them separate...

Destructuring is useful all over the place, not just for pattern matching.  For example, it is really useful in function parameter vectors.
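
A quick sketch of destructuring in a parameter vector, with no matching involved. The function name and the route data are invented for the example.

```clojure
;; The first argument is a map, destructured by key;
;; the second is a sequence, destructured into head and rest.
(defn route-summary [{:keys [origin dest]} [first-leg & more-legs]]
  (str origin "->" dest " via " first-leg
       " (+" (count more-legs) " more)"))

(route-summary {:origin "SFO" :dest "JFK"} ["ORD" "PIT"])
;; => "SFO->JFK via ORD (+1 more)"
```

Nothing here chooses between alternatives; the parameter vector only names the parts of its arguments.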
 

Michel Salim

Aug 23, 2009, 2:24:18 AM
to clo...@googlegroups.com
On Sat, 2009-08-22 at 23:00 -0700, bradford cross wrote:

>
> Destructuring is useful all over the place, not just for pattern
> matching. For example, it is really useful in function parameter
> vectors.

I consider that to be an example of pattern matching, though.

--
Michel


bradford cross

Aug 23, 2009, 2:58:58 AM
to clo...@googlegroups.com

As far as I understand it, pattern matching is built from destructuring bind, but destructuring bind is not pattern matching. Pattern matching follows a match-when-return, or match-with-return, logical flow written as: [match_val] -> return_val

Using Meikel's example:


type foo = [ Foo of int ];

value frobnicate x =
       match x with
       [ Foo 5 -> do_something ()
       | Foo 7 -> do_something_else ()
       | Foo x -> do_more x ];
 
Pattern matching is a composition of destructuring bind, predicates, and guard clauses.  Destructuring bind can be used elsewhere, without predicates or guards,  in which case I don't call it pattern matching.  Although maybe I am wrong on some technical terminology, this is how I think of it.
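
One way to picture that composition in Clojure, as a sketch only: the `length` function from earlier in the thread, written with an explicit predicate for the "which case is it" part and a destructuring bind for the "name the pieces" part.

```clojure
(defn length [coll]
  (if (empty? coll)              ; predicate: plays the role of the [] pattern
    0
    (let [[_ & xs] coll]         ; destructuring: plays the role of (_:xs)
      (+ 1 (length xs)))))

(length [])        ;; => 0
(length [1 2 3 4]) ;; => 4
```

The two halves are separate constructs here, so the `let` could be reused anywhere without the `if`, which is the decoupling being described.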




Michel Salim

Aug 23, 2009, 4:42:56 PM
to clo...@googlegroups.com
On Sat, 2009-08-22 at 23:58 -0700, bradford cross wrote:
> On Sat, Aug 22, 2009 at 11:24 PM, Michel Salim
> <michael....@gmail.com> wrote:
>
> > On Sat, 2009-08-22 at 23:00 -0700, bradford cross wrote:
> >
> > > Destructuring is useful all over the place, not just for pattern
> > > matching. For example, it is really useful in function parameter
> > > vectors.
> >
> > I consider that to be an example of pattern matching, though.
>
> As far as I understand it. Pattern matching is built from
> destructuring bind, but destructuring bind is not pattern matching.
> Pattern matching follows a match-when-return, or match-with-return
> logical flow written as: [match_val] -> return_val
>
>
Fair enough; I guess it's just that, prior to Clojure, you tended to get
one with the other (thus pattern-matching functional languages let you
both bind and match, both within a function and, for Haskell at least,
in the function declaration as well).

Regards,

--
Michel

