Cheers,
Sam
On Dec 7, 11:25 am, Sam Ritchie <sritchi...@gmail.com> wrote:
> Hey all,
>
> I've pushed Cascalog 1.8.5-SNAPSHOT to Clojars <http://clojars.org/cascalog>.
> Cascalog 1.8.5 adds support for Kryo <http://code.google.com/p/kryo/>
> serialization;
> practically, this allows you to use clojure primitives and collections (in
> addition to some common java primitives and collections) as fields in your
> tuples. See the ChangeLog (located
> here <https://github.com/nathanmarz/cascalog/blob/develop/CHANGELOG.md>)
However, Kryo serialization speed varies, and it can be slower for many
kinds of Clojure data (in the 1-2x range).
As with anything, if you really want to know, run some tests on *your*
data. :)
François
PS : anyone using cascalog and reading this group out there in Paris,
France?
On Dec 7, 11:08 pm, Sam Ritchie <sritchi...@gmail.com> wrote:
> Here are a few examples of how this opens up the door to using a far greater
> number of clojure.core functions in your Cascalog queries: https://gist.github.com/1444898
>
> On Wed, Dec 7, 2011 at 11:25 AM, Sam Ritchie <sritchi...@gmail.com> wrote:
> > Hey all,
>
> > I've pushed Cascalog 1.8.5-SNAPSHOT to Clojars <http://clojars.org/cascalog>.
> > Cascalog 1.8.5 adds support for Kryo <http://code.google.com/p/kryo/> serialization;
> > practically, this allows you to use clojure primitives and collections (in
> > addition to some common java primitives and collections) as fields in your
> > tuples. See the ChangeLog (located here <https://github.com/nathanmarz/cascalog/blob/develop/CHANGELOG.md>)
> > (Too brief? Here's why! http://emailcharter.org)
Is Cascalog now meant to require Clojure 1.3? Carbonite imports BigInt, which causes problems in 1.2. Here is the patch I used so I could test Kryo serialization: https://gist.github.com/931f7ffaccdffbf97d8d
I'm getting an EOFException; here's an example: https://gist.github.com/614e469ce5161fdf40c0 . It looks just like the error I got when my Comparator implementation was broken, but I note cascading.kryo doesn't have one... must one be careful not to have Cascalog sort by a Kryo-serialized field?
Another issue: after upgrading to the snapshot, code that worked
before is now breaking with a somewhat similar error:
https://gist.github.com/9fecf1d84ebf5143c3af . This despite the fact
that the query is wrapped in (with-job-conf {"io.serializations"
"org.apache.hadoop.io.serializer.WritableSerialization,my,custom,serializations"}).
Should this disable Kryo serialization entirely?
I get the same error here when sinking to an hfs-seqfile as with
stdout, and there are no memory source taps involved.
I haven't been able to reproduce yet with an analogous simple query on
primitive types, which makes me think there is something wrong with my
custom serialization, but the code works fine in 1.8.4. Any ideas?
I'll keep trying to extract a simple reproduction or maybe try to
bisect the Cascalog history...
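For reference, the override described above looks roughly like this. This is only a sketch: with-job-conf comes from cascalog.api, some-tap is a stand-in for a real source, and "my.custom.Serialization" is a placeholder for the actual class names (which I've elided above as "my,custom,serializations"):

```clojure
;; Sketch only: wrapping a query so that Hadoop's io.serializations list
;; is overridden for this job. "my.custom.Serialization" is a placeholder.
(use 'cascalog.api)

(with-job-conf
  {"io.serializations"
   (str "org.apache.hadoop.io.serializer.WritableSerialization,"
        "my.custom.Serialization")}
  (?<- (stdout) [?x] (some-tap ?x)))
```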
I thought promotion from Long to BigInt was automatic, but it doesn't
seem to be the case; do you guys think this is expected?
(* 9000 9000 9004 9001 1405)
=> 9223326680220000000
(class (* 9000 9000 9004 9001 1405))
=> java.lang.Long
(* 9000 9000 9004 9002 1405)
=> ArithmeticException integer overflow
   clojure.lang.Numbers.throwIntOverflow (Numbers.java:1374)
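If the goal is the old auto-promoting behavior under 1.3, the quote-suffixed arithmetic ops do that (a quick REPL sketch, assuming Clojure 1.3):

```clojure
;; In Clojure 1.3, * throws on long overflow, while *' auto-promotes
;; the result to clojure.lang.BigInt.
(* 9000 9000 9004 9002 1405)   ; ArithmeticException integer overflow
(*' 9000 9000 9004 9002 1405)  ; => 9224351380440000000N
```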
Also, I am getting an error when I mix data types in one field, which
raises the Comparator issue mentioned previously:
git://gist.github.com/1459791.git
Cheers,
François
On 11 déc, 04:47, tomoj <thomasj...@gmail.com> wrote:
> Yeah, I patched your fork. Here is the problem with Clojure 1.2.1: https://gist.github.com/7505aa3e77cd7f28d782
>
> Another issue: after upgrading to the snapshot, code that worked
> before is now breaking with a somewhat similar error: https://gist.github.com/9fecf1d84ebf5143c3af. This despite the fact
> that the query is wrapped in (with-job-conf {"io.serializations"
> "org.apache.hadoop.io.serializer.WritableSerialization,my,custom,serializations"}).
in 1.2 :
REPL started; server listening on localhost port 12871
user=> (class (* 9000 9000 9000 9000 9000 9000 9000 9000))
java.math.BigInteger
user=> (* 9000 9000 9000 9000 9000 9000 9000 9000)
43046721000000000000000000000000
user=> 43046721000000000000000000000000
43046721000000000000000000000000
user=> (class 43046721000000000000000000000000)
java.math.BigInteger
in 1.3 :
REPL started; server listening on localhost port 7690
user=> (class (* 9000 9000 9000 9000 9000 9000 9000 9000))
ArithmeticException integer overflow
clojure.lang.Numbers.throwIntOverflow (Numbers.java:1374)
user=> (* 9000 9000 9000 9000 9000 9000 9000 9000)
ArithmeticException integer overflow
clojure.lang.Numbers.throwIntOverflow (Numbers.java:1374)
user=> 43046721000000000000000000000000
43046721000000000000000000000000N
user=> (class 43046721000000000000000000000000)
clojure.lang.BigInt
François
here is the test case:

(def test-2-data [["a" 1]
                  ["b" 2]
                  ["c" 3]])

(?<- (stdout) [?map] (test-2-data ?alpha ?num)
     (zipmap [:string :numeral] [?alpha ?num] :> ?map))
stacktrace:

cascading.CascadingException: unable to compare Tuples, likely a
CoGroup is being attempted on fields of different types or custom
comparators are incorrectly set on Fields
    at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:81)
    at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:33)
    at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:74)
    ...
Caused by: java.lang.ClassCastException: clojure.lang.PersistentArrayMap cannot be cast to java.lang.Comparable
    at clojure.lang.Util.compare(Util.java:104)
    at cascalog.hadoop.ClojureKryoSerialization$1.compare(ClojureKryoSerialization.java:36)
    at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:77)
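One possible workaround (my own sketch, not confirmed against this snapshot): since clojure.lang.PersistentArrayMap doesn't implement java.lang.Comparable, keep the field that Cascading has to compare as a plain string, e.g. by pr-str-ing the map inside the query:

```clojure
;; Sketch: emit the map's printed form so the sunk/compared field is a
;; plain (Comparable) String rather than a Clojure map.
(?<- (stdout) [?map-str]
     (test-2-data ?alpha ?num)
     (zipmap [:string :numeral] [?alpha ?num] :> ?map)
     (pr-str ?map :> ?map-str))
```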
On Dec 11, 7:21 am, François Le Lay <mfw...@gmail.com> wrote:
> Sorry for the noise, I figured things out. This is expected behavior
> in 1.3, based on BigInt contagion: http://dev.clojure.org/display/doc/Documentation+for+1.3+Numerics