[ANN] Cascalog 1.8.5 Released!


Sam Ritchie

Jan 4, 2012, 1:19:13 PM
to cascal...@googlegroups.com
Hey all,

I've just released 1.8.5 to Clojars! This is a big release for Cascalog, with a number of welcome improvements.

* API Documentation

API docs can now be found at http://nathanmarz.github.com/cascalog/. I'm using James Reeves's Codox library (https://github.com/weavejester/codox) to produce them, and I hope you'll find them as useful as I have. (I'm cheating a bit by generating docs for 1.9.0-wip, but the API is exactly the same.)

* Kryo Serialization

Thanks to Kryo and Alex Miller's Carbonite, Cascalog can now use Clojure data structures as first-class objects within tuples. (I have a full list of all supported data structures in the Changelog: https://github.com/nathanmarz/cascalog/blob/master/CHANGELOG.md.) Kryo does a pretty great job of serializing objects it's never seen before, so please don't feel limited by this list. In the next few weeks I'll be putting together a guide on how to extend Cascalog with your own custom Kryo serializers.

* Defmain Improvements

Functions defined with defmain now act identically to those defined by defn, with the added benefit that they generate a named class that you can call with Hadoop. For example:

(ns foo.queries
    (:use cascalog.api))

(defmain run-query [input-path output-path]
    "docstring!"
    (?- (hfs-seqfile output-path)
         (my-query input-path)))

This can be called from Hadoop like so (note that the dash in run-query becomes an underscore in the class name):

hadoop jar <jarname> foo.queries.run_query "/input/path" "/output/path"

And from inside Clojure, as expected:

(run-query input-path output-path)

To get this behavior, add your namespace to the :aot key in project.clj.
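A minimal project.clj along these lines should do it. This is a sketch for Leiningen of this era: the project name my-queries, its version string, and the Clojure version are placeholders, and a real Hadoop job will typically also need a Hadoop dependency, which isn't shown here.

```clojure
;; Placeholder project name/version; the Clojure version is an assumption.
(defproject my-queries "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.3.0"]
                 [cascalog "1.8.5"]]
  ;; AOT-compile the namespace that holds your defmain functions so the
  ;; generated classes (e.g. foo.queries.run_query) end up in the jar.
  :aot [foo.queries])
```

Building with `lein uberjar` then produces a jar you can hand to `hadoop jar`.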

* Proper Set Behavior in Tuples

Previously, a set was treated as a tuple, causing this predicate to fail:

(hash-set ?a ?b ?c :> ?set) 
;; fail, as Cascalog expected three output vars!

Now sets are treated as 1-tuples, allowing constructions such as:

;; requires clojure.set to be loaded, e.g. (:require clojure.set) in the ns form
(defparallelagg mk-set-parallel
    :init-var #'hash-set
    :combine-var #'clojure.set/union)
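To see why those two vars fit together, here is a plain-Clojure sketch (no Cascalog or Hadoop involved) of the split a parallel aggregator like mk-set-parallel relies on: the init function wraps each value in a one-element set, and the combine function merges partial results in any grouping. In Cascalog this only works because each returned set now stays a single field instead of being unpacked. The helper simulate-parallel-agg and the chunk size are hypothetical, for illustration only.

```clojure
(require '[clojure.set :as cset])

;; Hypothetical helper: mimic a parallel aggregation in memory.
;; init-fn is applied to every input value; combine-fn merges results,
;; first within each chunk (like a map-side combiner), then across chunks.
(defn simulate-parallel-agg [init-fn combine-fn chunk-size values]
  (->> values
       (map init-fn)
       (partition-all chunk-size)
       (map #(reduce combine-fn %))
       (reduce combine-fn)))

;; hash-set turns each value into a 1-element set; union merges them,
;; so duplicates collapse regardless of how the input was chunked.
(def result (simulate-parallel-agg hash-set cset/union 2 [1 2 2 3 1]))

(println result)
```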

* Miscellaneous

* In cascalog.playground, bootstrap and bootstrap-emacs are now functions.
* def*ops and defmain include :no-doc true and :skip-wiki true keys in the metadata of hidden generated functions, allowing Cascalog to be used with Codox or Autodoc.
* memory-source-tap now uses the settings from job-conf.clj (bugfix)

Please let me know if you have any questions about all of this! Thanks to everyone who pointed out bugs in the SNAPSHOT version of this release, you guys rock.

Cheers,
--
Sam Ritchie, Twitter Inc
@sritchie09

(Too brief? Here's why! http://emailcharter.org)

Sam Ritchie

Jan 4, 2012, 1:24:59 PM
to cascalog-user
I forgot to add the obligatory how-to note.

To use 1.8.5, add the following to the :dependencies vector in project.clj:

[cascalog "1.8.5"]

Enjoy!
Sam


Andrew Xue

Jan 4, 2012, 11:43:12 PM
to cascalog-user
sweet!

Andrew Xue

Jan 6, 2012, 4:23:20 PM
to cascalog-user
I noticed in tests that queries now default numbers to Long instead of Integer. For example:

(def test-data [[13694803 "A"]
                [13694806 "B"]
                [13694807 "C"]
                [13694809 "D"]
                [13694817 "E"]
                [13694818 "F"]])

(def test-query
  (<- [!user_id !val]
      (test-data !user_id !val)
      (:distinct false)))

!user_id emits Long instead of Integer

which is probably more correct, so that's good.

Sam Ritchie

Jan 6, 2012, 4:37:32 PM
to cascal...@googlegroups.com
That's a Clojure 1.3 thing, actually: Clojure 1.3 reads integer literals as Longs by default. On Cascalog 1.9.0-wip, Longs and Integers compare properly, so you shouldn't run into numeric type mismatches.
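A quick REPL check in plain Clojure (1.3 or later, no Cascalog) shows the underlying behavior: integer literals are boxed as java.lang.Long, Clojure's = treats an Integer and a Long with the same value as equal, but Java's .equals does not. That Java-level mismatch is the kind of thing that can bite when values of different boxed types meet. The value 13694803 is borrowed from the test data above; the var names are just for illustration.

```clojure
;; Integer literals are read as Longs in Clojure 1.3+.
(def literal-class (class 13694803))

(def as-integer (Integer/valueOf (int 13694803)))
(def as-long    (Long/valueOf 13694803))

;; Clojure's = compares numbers by category and value...
(def clojure-equal? (= as-integer as-long))
;; ...but Java's equals also requires the same boxed type.
(def java-equals? (.equals as-integer as-long))

(println literal-class clojure-equal? java-equals?)
```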

Cheers,
Sam