suggestions for aggregating the output of multiple szl runs

hblanks

Nov 28, 2010, 10:27:20 AM

to szl-users

Good morning,

I've had a pretty easy time getting szl to parse individual log files,
but I'm having a harder time figuring out what the right way is to
aggregate the output of multiple szl runs. So far, my best (and
working) guess looks like:

szl parser.szl log_file1 > log_file1.szl
szl parser.szl log_file2 > log_file2.szl
szl parser.szl log_file3 > log_file3.szl
cat parser-defs.szl log_file1.szl log_file2.szl log_file3.szl | szl /dev/stdin -output_tables "*"

Here parser.szl is the logic for parsing the log files, and
parser-defs.szl declares the output tables generated by parser.szl
(cf. szl -print_tables parser.szl).
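
For illustration, the same steps can be sketched as a single pipeline that skips the intermediate log_file*.szl files. This is only a sketch under assumptions: aggregate_logs is a hypothetical helper name, parser.szl and parser-defs.szl are assumed to sit in the current directory, and the szl invocations and flags are exactly the ones shown above, nothing more.

```shell
#!/bin/sh
# Sketch only: streams the per-file szl output straight into the
# aggregating szl run instead of saving log_file*.szl files first.
# aggregate_logs is a hypothetical name; the szl commands and flags
# are copied from the commands in the post.
aggregate_logs() {
    {
        # The table declarations must come before the emitted output.
        cat parser-defs.szl
        # Run the parser once per log file passed as an argument.
        for log in "$@"; do
            szl parser.szl "$log"
        done
    } | szl /dev/stdin -output_tables "*"
}
```

It would be called as: aggregate_logs log_file1 log_file2 log_file3. The per-file runs still happen one after another and the final szl run still reads everything they print, so this saves only the disk space of the intermediate files, not the time spent in the aggregation step. A failed per-file run is also not detected here.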

That seems kinda kludgy though, at least in the sense that the
log_file*.szl files are not much smaller than the actual log files,
and in the sense that -output_tables "*" only generates plain text and
not protocol buffers (or some format() based output) as well. The
aggregation step also takes a particularly long time in the case that
the log_file*.szl files are large.

Might anyone be able to point me in the right direction here? I've
looked at the header files in src/public/, and the mapreduce demo C++
code in src/app/, and I imagine I could cobble a solution out in C++
if I really needed to. Still, it seemed like there should already
be a clean way to do this, which I just don't know about yet.

Thank you,

Hunter Blanks

Manisha Jain

Nov 29, 2010, 12:30:37 PM

to szl-...@googlegroups.com

Is there a reason you are using "szl" instead of "sawmill saw"? With sawmill, you can use either saw --resaw or simply millmerge to get merged aggregates.

Examples:

http://www.corp.google.com/eng/howto/sawmill/sawmill-manual.html#reprocessing

http://www.corp.google.com/eng/howto/sawmill/sawmill-zeitgeist-example.html

-Manisha

hblanks

Nov 29, 2010, 12:55:36 PM

to szl-users

Manisha, et al,

Thank you very much for writing. That sounds like precisely the tool
I'd hope to use, and I very much appreciate hearing about it.
It does not appear to have been open sourced, though, so I fear
there's no way for me (or other extra-Google szl users) to make use of
it.

Are there any plans to release such a tool? If so, might anyone be
willing to estimate a vague, possible timeline? (Q1 2011, Q2, etc.)

-HJB

hblanks

Dec 8, 2010, 10:21:51 AM

to szl-users

All,

I'm sorry to be a bit irksome, but to follow up on this thread: am I
right to assume that no tool will be released to assist in the
aggregation of individual sawzall runs?

-HJB

Manisha Jain

Dec 8, 2010, 4:19:44 PM

to szl-...@googlegroups.com

From your initial email, I didn't realize that you were asking about open source Sawzall. To answer your question: no additional tools are currently planned. Looking to Hadoop was suggested in the past. It would be great to see tools for aggregation coming out of the open source community.

-Manisha
