Unexpected third parameter to an aggregator.


Matt Stump

Feb 3, 2012, 6:59:36 PM
to cascal...@googlegroups.com
Howdy,

I'm learning cascalog and I've run into some behavior I don't understand, and I was hoping that you guys could shed some light. I'm processing a file which contains a bunch of hashes serialized as s-expressions, one hash per line. I'm attempting to just run a big reduce over the set, to group the records under some common attributes in a giant hash. My aggregator is getting an unexpected third parameter which is always the integer 1, and I don't understand why. I've created a minimal gist, with some sample data at the top. Any help on this topic would be greatly appreciated.

gist.clj

Sam Ritchie

Feb 3, 2012, 8:27:54 PM
to cascal...@googlegroups.com
That's strange, I'm not sure why, though I think it might have something to do with the way you're writing your query. There are a few issues here:

1) a defmapop (like s-expression-parse) should be written just like a defn form, unless it's taking parameters. Since you don't need the ^String type hint, you can actually just use read-string in your query:

(defn textline-parsed
  "Parse the input file: one hash serialized as an s-expression per line."
  [dir]
  (let [source (hfs-textline dir)]
    (<- [?data]
        (source ?line)
        (read-string ?line :> ?data)
        (:distinct false))))
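
To see what read-string does to a single input line outside of any Cascalog query, here is a minimal plain-Clojure sketch. The sample line and its keys (:group-id, :name) are made up for illustration; the actual records in the gist are not shown in this thread.

```clojure
;; Hypothetical sample line; the real data format in the gist may differ.
(def sample-line "{:group-id 7 :name \"artifact-a\"}")

;; read-string parses the text of one s-expression into a Clojure value,
;; here a map, which is what ?data would be bound to in the query.
(def parsed (read-string sample-line))

(println (:group-id parsed))
```

Note that read-string can evaluate reader macros, so for input you don't control, clojure.edn/read-string (available from Clojure 1.5) is the safer choice.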

With the aggregator, I'm not really sure what you're trying to do. Aggregators need to take input variables and produce output variables, like this:

(agg ?input :> ?output)

Your aggregator is effectively taking no input variables and producing 1 output variable, which is going to cause unexpected behavior.
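
As a rough illustration of "input in, output out", here is a plain-Clojure sketch of an aggregator with separate init, aggregate, and complete steps, which is my understanding of the shape defaggregateop bodies take. It does not use Cascalog at all: the function body, the :group-id key, and the run-aggregator helper are assumptions for illustration, since the original gist is not reproduced here.

```clojure
;; Hypothetical stand-in for the gist's aggregator, written as a plain
;; multi-arity function so it can be run without Cascalog.
(defn group-by-group-id
  ([] {})                              ; init: start with an empty map
  ([state record]                      ; aggregate: fold in one input value
   (update-in state [(:group-id record)] (fnil conj []) record))
  ([state] [state]))                   ; complete: emit one output value

;; Hypothetical driver that mimics how the three steps are called in order.
(defn run-aggregator [agg records]
  (agg (reduce agg (agg) records)))

(def records
  [{:group-id 1 :name "a"}
   {:group-id 2 :name "b"}
   {:group-id 1 :name "c"}])

(def result (run-aggregator group-by-group-id records))
(println result)
```

The point is only that the aggregate step needs an explicit input value (record here, ?a in the query) alongside the running state; without one, the arguments the aggregator receives won't line up with what it expects.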

If you're trying to pass in the ?a dynamic variable for val, you'll need to write your query like this:

(defn query [output-tap input-path]
  (let [artifacts (textline-parsed input-path)]
    (?<- output-tap
         [?group]
         (artifacts ?a)
         (group-by-group-id ?a :> ?group))))

Let me know if that helps at all, and we can keep on hacking w/ an updated gist.

Cheers,
Sam

-- 
Sam Ritchie, Twitter Inc
703.662.1337
@sritchie09

(Too brief? Here's why! http://emailcharter.org)

Matt Stump

Feb 3, 2012, 11:38:31 PM
to cascal...@googlegroups.com
Sam,

Your suggestions fixed the issue. Thank you very much.  That was my first cascalog script, which was mostly an agglomeration of examples I found. I only had the roughest idea of what I was doing. Your explanation helped clarify a couple things for me.  Thanks again.

--Matt

Sam Ritchie

Feb 4, 2012, 12:20:43 AM
to cascal...@googlegroups.com
Matt, not a problem! Feel free to keep asking questions as issues come up. I'm also quite interested in feedback on what you're finding most helpful (or confusing). I fear that the learning curve is still steeper than it needs to be.

Cheers,
Sam

Matt Stump

Feb 4, 2012, 11:34:30 PM
to cascal...@googlegroups.com
I was thinking about what feedback, if any, I could give about documentation and the learning process, and I think the most important thing that could be contributed is API docs. Right now most of the functions have no documentation, and I think documentation of that sort is best supplied by the core maintainers of the project. I can document what I discover, but I'll miss things, and I won't know the intent of the designer. Think of API docs as a multiplier: by having them you enable more people to explore and write blog posts and example code, which further educates people.

Sam Ritchie

Feb 6, 2012, 12:09:10 PM
to cascal...@googlegroups.com
Matt, this is definitely good feedback. I actually do have API documentation up here: 


but it's far from complete. (Notable missing items are the def*op macros.)

I'll put some work into this this week and give the README an update.

