Good News: Google open-sources Sawzall (parallel analysis language) & Y! plans to open-source S4 (real-time MapReduce framework)

yarapavan

Nov 3, 2010, 4:22:20 PM
to Cloud Computing
Folks,

The announcements from Yahoo! and Google are significant steps in
advancing the state of the art in cloud computing:

(1) S4: Yahoo! has built a general-purpose, real-time, distributed,
fault-tolerant, scalable, event-driven, expandable platform called S4
that lets programmers easily implement applications for processing
continuous, unbounded streams of data. S4 clusters are built on
low-cost commodity hardware and leverage many technologies from
Yahoo!’s Hadoop project.

S4 is written in Java and uses the Spring Framework to build a
software component architecture. Over a dozen pluggable modules have
been created so far.

Read more about it at - http://wiki.s4.io/Manual/S4Overview

(2) Sawzall Language:

Sawzall is a procedural language developed for parallel analysis of
very large data sets (such as logs). It provides protocol buffer
handling, regular expression support, string and array manipulation,
associative arrays (maps), structured data (tuples), data
fingerprinting (64-bit hash values), time values, various utility
operations and the usual library functions operating on floating-point
and string values. For years Sawzall has been Google's logs processing
language of choice and is used for various other data analysis tasks
across the company.

Instead of specifying how to process the entire data set, a Sawzall
program describes the processing steps for a single data record
independent of others. It also provides statements for emitting
extracted intermediate results to predefined containers that aggregate
data over all records. The separation between per-record processing
and aggregation enables parallelization. Multiple records can be
processed in parallel by different runs of the same program, possibly
distributed across many machines. The language does not specify a
particular implementation for aggregation, but a number of aggregators
are supplied. Aggregation within a single execution is automatic.
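
To make the per-record / aggregation split concrete, here is a rough
sketch of the idea in Python. This is not Sawzall syntax and not how
szl is implemented; the class and function names (SumTable,
SumByKeyTable, process_record, run_shard, merge_shards) and the
"host size" record format are all made up for illustration.

```python
from collections import defaultdict


class SumTable:
    """Hypothetical aggregator: adds up every emitted value."""

    def __init__(self):
        self.total = 0

    def emit(self, value):
        self.total += value

    def merge(self, other):
        self.total += other.total


class SumByKeyTable:
    """Hypothetical aggregator: keeps one running sum per key."""

    def __init__(self):
        self.totals = defaultdict(int)

    def emit(self, key, value):
        self.totals[key] += value

    def merge(self, other):
        for key, value in other.totals.items():
            self.totals[key] += value


def new_tables():
    # The "predefined containers" that per-record code emits into.
    return {"count": SumTable(), "bytes_by_host": SumByKeyTable()}


def process_record(record, tables):
    """Per-record logic: sees exactly one record, never the whole data set."""
    host, size = record.split()  # assumed record format: "<host> <bytes>"
    tables["count"].emit(1)
    tables["bytes_by_host"].emit(host, int(size))


def run_shard(records):
    """One run of the program over one slice of the input."""
    tables = new_tables()
    for record in records:
        process_record(record, tables)
    return tables


def merge_shards(shard_results):
    """Combine the tables produced by independent runs."""
    final = new_tables()
    for tables in shard_results:
        for name, table in tables.items():
            final[name].merge(table)
    return final
```

Because process_record never looks at any other record, each call to
run_shard could happen on a different machine; only merge_shards needs
to see the results together.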

Project Page: https://code.google.com/p/szl

For the technically inclined, here is the link to the original paper by
Rob Pike et al:
* Title: Interpreting the Data: Parallel Analysis with Sawzall
* Link: http://labs.google.com/papers/sawzall-sciprog.pdf


Regards, Pavan
@yarapavan