Maybe you can do:
(reduce (fn [[sales upgrades demo :as v] data]
(cond
(is-sales? data) [(conj sales data) upgrades demo]
(is-upgrade? data) [sales (conj upgrades data) demo]
(is-demo? data) [sales upgrades (conj demo data)]
:else v))
[[] [] []] (get-idata))
This will hold the resulting collections completely in memory, but not
the idata part. This is maybe not what you want, because it's too big.
Or it is what you want, because you want to close the file at some
point in time. I think defining anything lazy will hold onto idata.
(or at least part of the end results)
Sincerely
Meikel
Another variation:
(use 'clojure.contrib.seq-utils)
(defn record-type [data]
(cond
(is-sale? data) :sales
(is-upgrade? data) :upgrades
(is-demo? data) :demos))
(group-by record-type (filter record-type (get-idata)))
If the three output lists themselves are too large, I'd just explicitly
sum your units with reduce:
(reduce
(fn [counts data]
(let [type (record-type data)]
(assoc counts type (+ (units data)
(get counts type 0)))))
{} (get-idata))
=> {:sales 1233, :upgrades 17, :demos 42, nil 30}
> If the three output lists themselves are too large, I'd just explicitly
> sum your units with reduce:
>
> (reduce
> (fn [counts data]
> (let [type (record-type data)]
> (assoc counts type (+ (units data)
> (get counts type 0)))))
> {} (get-idata))
>
> => {:sales 1233, :upgrades 17, :demos 42, nil 30}
Actually come to think of it, this sort of thing is common enough that
you could pull out a 'reduce-by' function like this:
(defn reduce-by [grouper f val coll]
(reduce
(fn [m x]
(let [group (grouper x)]
(assoc m group (f (get m group val) x))))
(sorted-map) coll))
Then group-by could be easily defined in terms of it:
(defn group-by [f coll]
(reduce-by f conj [] coll))
And it makes your unit summing example:
(reduce-by record-type
(fn [count data] (+ count (units data)))
0 (get-idata))
Hmmm. the first way that occurred to me is just make your tagging
function return a set:
(defn record-types [x]
(disj
#{(when (is-sale? x) :sales)
(when (is-upgrade? x) :upgrades)
(when (is-demo? x) :demos)}
nil))
So you'd then get groups like:
{#{:sales :upgrades} [...]
#{:sales} [...]
#{:upgrades} [...]}
But unfortunately it seems that because the group-by in
clojure.contrib.seq-utils uses a sorted-map it can't have keys that are
sets (as there's no ordering defined for sets). You could either use
vectors instead of sets, or better yet lets just define reduce-by and
group-by to use a hash-map instead:
(defn reduce-by [grouper f val coll]
(reduce (fn [m x]
(let [group (grouper x)]
(assoc m group (f (get m group val) x))))
{} coll))
(defn group-by [grouper coll]
(reduce-by grouper conj [] coll))
This has the advantage that you can do things like intersections
("number of sales sales that were also upgrades"). If you don't care
about intersections and just want the three :sales, :upgrades, and
:demos lists, then I guess we start with 'for' to generate [tag record]
pairs:
(for [record (get-idata)
type (record-types record)]
[type record])
So if "data1" is both sales and upgrades then you'd get:
([:sales data1] [:upgrades data1] [:sales data2] [:demos data3] ...)
We can then reduce-by grouped on the first value in the pair (the tag)
and then transform the resulting maps to select just the second item in
the pair (the record):
(reduce-by
first #(conj %1 (second %2)) []
(for [record (get-idata)
type (record-types record)]
[type record]))
=> {:sales [data1 data2 ... ],
:upgrades [data1 data3 ...],
:demos [data4 ...]}
Am 19.10.2009 um 01:34 schrieb Dmitry Kakurin:
> This is in line with what I was thinking for my own custom filter
> function.
> Now how would you modify it if the same record can be both "sales" and
> "upgrade" at the same time?
> I.e. if filters are not mutually exclusive.
I'd define a helper. (Code untested)
(defmacro ->if
"Acts like ->, but pipes the expression only through the next step
if the predicate clause returns true. Otherwise it skips the step
going on with the next steps, if any."
([x] x)
([x pred form & forms]
(let [x-gs (gensym "x")]
`(let [~x-gs ~x]
(if (-> ~x-gs ~pred)
(-> ~x-gs form (->if ~@forms))
(->if ~x-gs ~@forms))))))
(letfn [(add-sale
[[sales upgrades demos] x]
[(conj sales x) upgrades demos])
(add-upgrade
[[sales upgrades demos] x]
[sales (conj upgrades x) demos])
(add-demo
[[sales upgrades demos] x]
[sales upgrades (conj demos x)])]
(reduce (fn [sales-upgrades-demos data]
(->if sales-ugprades-demos
(do (is-sales? data)) (add-sale data)
(do (is-upgrade? data)) (add-upgrade data)
(do (is-demo? data)) (add-demo data)))
[[] [] []] (get-idata)))
This is maybe trying to be a little too clever, but well... I found
something like ->if useful at times. Note that, the predicate there
normally depends on the thing piped through the chain with ->. However
in our case it is not. That's what's the do is for: to ignore the
argument of the ->.
Looking at this, I would think, that the other approaches using the
group are much more elegant.
Hmm.. Another approach might be juxt:
(defn conj-if
"Conjoins x to coll if x fulfills pred."
[pred coll x]
(if (pred x)
(conj coll x)
coll))
(reduce (juxt #(conj-if is-sales? (nth %1 0) %2)
#(conj-if is-upgrade? (nth %1 1) %2)
#(conj-if is-demo? (nth %1 2) %2))
[[] [] []] (get-idata))
This looks much nicer, although there is still the ugly nth
"destructuring" for vector. And juxt is maybe only semi-official API
at the moment... If you don't want to depend on contrib, this might be
an alternative, though...
Sincerely
Meikel
Hehe. I think I got a bit carried away generalising mine, but I found
it interesting. :-) I think your way or Meikal's juxt (which is really
neat, I didn't know about juxt) is much better for this specific
problem. I guess mine is more suited for when the 'groups' are derived
from the data (like the 'SQLalike' examples Timothy came up with) rather
than predefined by the code.
> But there is still ex-C++ programmer sitting inside my head, screaming
> "Aaaaah, so many temporaries and lookups, this is so inefficient" :-).
> I'll try to make "him" shut up by comparing perf with my multi-filter
> approach.
>
On my PC your multi-filter is 3-4 times faster than my last reduce-by
version, which I guess is not too bad considering my version has all
those lookups and temporary objects (I tested by filtering a large range
of integers into groups divisible by 2, 3 and 6). The JVM seems to be
actually pretty fast at creating objects and Clojure does lots of little
tricks to speed up lookups, like caching hash codes and using vectors
instead of hash tables for small maps.
To quiet my inner premature optimisation fanatic I usually rationalise
with "the disk is going to be the bottleneck anyway" and "lets try the
first way that occurs to me and only if it turns out to be too slow in
practice only then tweak it".
P.S. Why does it take almost 2 DAYS for my messages to appear in thisgroup? I've sent two last messages on Sunday afternoon.
Most people writing in a lisp use an editor that does paren
highlighting. So for example in Emacs, when I have something like this:
(foo (bar (baz (quox '(a b c)))))
It's hard to show this in an email but if I move the cursors (_) to the
right of one of closing parens, then emacs will highlight the one it
matches with, by changing the background colour:
(foo (bar *(*baz (quox '(a b c)))_))
Some people also use something called "rainbow parens" which is where
each pair of matching parens are colour coded. Others use s-expression
highlighting, which colour codes the whole expression. You can see
various versions of this here:
http://lemonodor.com/archives/001207.html
It's worth trying out an editor with Clojure support (Emacs with SLIME,
Vim with VimClojure, Enclojure, the IDEA plugin etc), they have useful
shortcuts like hotkeys for moving between s-expressions and you can
quickly evaluate things from in your editor without having to switch to
the REPL. For example if I write (apply + [1 2 3 4]) in a file in Emacs
and hit Ctrl-X Ctrl-E then "10" appears in the status line.
Am 21.10.2009 um 01:40 schrieb Dmitry Kakurin:
> Where do I use a sequence? reduce? or this line?
No. reduce itself is eager.
>>> (repeat (count filters) [])
> Should I wrap it in (vec ...) or is there a better way?
repeat, map, filter, cycle, take, iterate, etc. are all sequence
functions. They will turn their argument into sequence (via seq) and
return again a sequence. As I said: I don't know. It's probably not
much of a difference. If it works for you, use it! Worry about speed
when it's a problem.
> I've read it many times, and I've tried to do it.
> But how do you solve this practical problem:
> When you need to insert something in the middle, how do you find the
> right closing paren
> where to split it? I use TextMate if it matters.
As others said: try to get some support into your editor. I use Vim
with VimClojure. It colors the paren ("rainbow parens") so it's
relatively easy to match the paren visually. Even without that aid Vim
would highlight the correct paren when putting the cursor on its
partner paren. % let's me move between matching parens etc... With
some training the "paren problem" (if you want to call it like that)
will mostly vanish.
> P.S. Why does it take almost 2 DAYS for my messages to appear in this
> group? I've sent two last messages on Sunday afternoon.
Because Google Groups is one of the worst interfaces ever? I don't
know what it's problem is, but mailing lists are not a terribly new
idea. After 20 years of working mailing lists Google Groups is really
disappointing. I noticed this delay also from the web interface. Maybe
you should try to send emails directly from you mailing client.
Sincerely
Meikel
>>> (repeat (count filters) [])repeat, map, filter, cycle, take, iterate, etc. are all sequence
> Should I wrap it in (vec ...) or is there a better way?
functions. They will turn their argument into sequence (via seq) and
return again a sequence.
Am 22.10.2009 um 03:48 schrieb Dmitry Kakurin:
> Has someone developed a better support for "paren problem" for
> TextMate editor?
> Right now all it does is flashing matching open paren when I type the
> closing one. And that's it (apart from cmd-shift-B which is too much
> work).
>
> Also TextMate Clojure bundle seems to be replicated in several places,
> which one is considered the latest and greatest?
I'm sorry. I don't use TextMate, so I can't tell. Maybe someone on the
list might help here.
Sincerely
Meikel