Nov 28, 2010, 10:27:20 AM
I've had a pretty easy time getting szl to parse individual log files,
but I'm having a harder time figuring out the right way to aggregate
the output of multiple szl runs. So far, my best (and working) guess
looks like:
szl parser.szl log_file1 > log_file1.szl
szl parser.szl log_file2 > log_file2.szl
szl parser.szl log_file3 > log_file3.szl
cat parser-defs.szl log_file1.szl log_file2.szl log_file3.szl | szl /dev/stdin -output_tables "*"
where parser.szl contains the logic for parsing the log files, and
parser-defs.szl declares the output tables generated by parser.szl (cf. szl -…
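In case it helps to see the shape of the workaround more generically: the per-file runs above can be wrapped in a shell loop. This is just a dry-run sketch that prints the commands rather than executing them (the file names and flags are copied from above; swap `echo` for real execution once it looks right):

```shell
# Dry-run sketch: print one szl invocation per log file, then the
# final aggregation command that pipes the intermediate results
# (plus the table declarations) back through szl.
outputs=""
for log in log_file1 log_file2 log_file3; do
  echo "szl parser.szl $log > $log.szl"
  outputs="$outputs $log.szl"
done
echo "cat parser-defs.szl$outputs | szl /dev/stdin -output_tables '*'"
```

The loop only buys convenience over the three explicit lines; the underlying kludge (large intermediate files, plain-text-only output) is the same.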
That seems kinda kludgy though: the log_file*.szl files are not much
smaller than the actual log files, and -output_tables "*" only
generates plain text, not protocol buffers (or some format() based
output) as well. The aggregation step also takes particularly long
when the log_file*.szl files are large.
Might anyone be able to point me in the right direction here? I've
looked at the header files in src/public/ and the mapreduce demo C++
code in src/app/, and I imagine I could cobble together a solution in
C++ if I really needed to. Still, it seems like there should already
be a clean way to do this that I just don't know about yet.