JAQL Performance

Rob Stewart

Nov 7, 2009, 12:32:34 PM
to jaql-...@googlegroups.com
Hi there. 

I'm writing a paper comparing several query interfaces for Hadoop, and I'd like to include JAQL in it.

I have a few questions:

1. At the moment I'm having a tough time finding existing comparative studies that include JAQL, e.g. Hive vs Pig vs JAQL. Do any exist?

2. Do you have benchmarking tools that you run before each new JAQL release to measure the performance of each JAQL function, for instance on uniform and skewed data?

3. If I design a set of queries to implement in JAQL and the other systems, are there any for which I should expect particularly good or bad performance from JAQL? For comparison, the Pig developers use PigMix to measure the performance of joins, group, order by, etc.

4. Are there any published papers that measure JAQL performance in terms of execution time, fault tolerance, etc.?


thanks.

vuk.ercegovac

Nov 9, 2009, 2:06:57 PM
to Jaql Developers
Hi Rob,

Thanks for your interest! My replies are inline below:

[cut]
>
> 1. I am, at the moment, having a tough time finding existing comparative
> studies with JAQL e.g. Hive vs Pig vs JAQL. Do any exist?
>
Hive's JIRA includes a comparison with Pig's performance. I believe
they use the benchmark described in the paper: A Comparison of
Approaches to Large-Scale Data Analysis (SIGMOD 2009).

> 2. Do you currently have any benchmarking tools you use prior to each new
> JAQL release to benchmark performance of each of the JAQL functions on
> uniform, and skewed data for instance?
We use a mix of real and synthetic application data sets and
workloads. None of this is released yet, however.

>
> 3. If I am to design some queries to implement on JAQL and others, are there
> any that I could expect good and bad performance from JAQL? For instance,
> PigMix is what is used by Pig developers to see the results of joins, group,
> order by etc....
>
Over the summer, an intern implemented PigMix with Jaql. We plan to
release this implementation and preliminary results soon. All of this
work is based on trunk, not on any of the versions that can be
downloaded. The largest gap at the moment, which we are currently
addressing, is a more efficient implementation of tee (similar to
"split"). I expect we'll want to consider other aspects as well, for
example scale-up and exercising functionality that digs deeper into
nested data.

> 4. Are there any published papers out there with JAQL performance measured
> by: execution time, fault tolerance etc... ?
>
At this time, no... but we're working on it.

> thanks.