Nov 28, 2010, 10:27:20 AM
I've had a pretty easy time getting szl to parse individual log files,
but I'm having a harder time figuring out the right way to aggregate
the output of multiple szl runs. So far, my best (and working) guess
looks like:
szl parser.szl log_file1 > log_file1.szl
szl parser.szl log_file2 > log_file2.szl
szl parser.szl log_file3 > log_file3.szl
cat parser-defs.szl log_file1.szl log_file2.szl log_file3.szl | szl /dev/stdin -output_tables "*"
where parser.szl contains the logic for parsing the log files, and
parser-defs.szl declares the output tables generated by parser.szl (cf. szl -…
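In case it helps to see the shape of the workaround more generically: the per-file runs above can be wrapped in a shell loop. This is just a dry-run sketch that prints the commands rather than executing them (the file names and flags are copied from above; swap `echo` for real execution once it looks right):

```shell
# Dry-run sketch: print one szl invocation per log file, then the
# final aggregation command that pipes the intermediate results
# (plus the table declarations) back through szl.
outputs=""
for log in log_file1 log_file2 log_file3; do
  echo "szl parser.szl $log > $log.szl"
  outputs="$outputs $log.szl"
done
echo "cat parser-defs.szl$outputs | szl /dev/stdin -output_tables '*'"
```

The loop only buys convenience over the three explicit lines; the underlying kludge (large intermediate files, plain-text-only output) is the same.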
That seems kinda kludgy though: the log_file*.szl files are not much
smaller than the actual log files, and -output_tables "*" only
generates plain text, not protocol buffers (or some format() based
output) as well. The aggregation step also takes particularly long
when the log_file*.szl files are large.
Might anyone be able to point me in the right direction here? I've
looked at the header files in src/public/ and the mapreduce demo C++
code in src/app/, and I imagine I could cobble together a solution in
C++ if I really needed to. Still, it seems like there should already
be a clean way to do this that I just don't know about yet.