As promised, today I have made available my findings and experiment results from my research project, which examines the high-level languages Pig, Hive and JAQL.
The project extends existing studies by evaluating the scale-up, scale-out, and runtime for three benchmarking applications. It also examines the ease of programming and the computational power of each language.
I've created two documents:
- Publication - A slide-by-slide presentation. 16 slides - *Suitable for most readers*
- Dissertation results chapter (18 pages of text)
Excuse the .HTML link - It is useful for me to record the number of hits the publication receives.
I welcome any feedback, either on this mailing list, or to my University email address for direct correspondence. Any questions regarding the benchmarks should be sent to my University email address.
A couple of things have changed, possibly since you wrote the draft (all projects are a moving target):
1. Options for the number of reducers (along with any option you want to put into the conf) are now exposed to the language.
2. We're surprised by the join results, which are quite poor for both uniform and skewed data. Do you have the data generators and queries available for us to have a look at?
Also, it may be useful to separate extensibility, support for embedding, and language expressibility. For example, all of the languages surveyed can be extended via UDFs/UDAs. Jaql has some extras for defining (higher-order) functions (in Jaql itself) and modules. Yes, we have recursion as a result, but what we're primarily after is reuse and modularity for scripts, so that we can use the right level of abstraction to help us manage complex tasks.
Thanks!
Vuk
On Mar 23, 7:11 am, Rob Stewart <robstewar...@googlemail.com> wrote:
> As promised, today I have made available my findings and experiment results
> from my research project, examining the high level languages: Pig, Hive and
> JAQL.
>
> The project extends from existing studies, by evaluating the scale up, scale
> out, and runtime for 3 benchmarking applications. It also examines the ease
> of programming, and the computational power of each language.
>
> I've created two documents:
> - Publication - A slide-by-slide presentation. 16 slides - *Suitable for
> most readers*
> - dissertation results chapter (18 pages of text)
>
> Excuse the .HTML link - It is useful for me to record the number of hits the
> publication receives.
>
> I welcome any feedback, either on this mailing list, or to my University
> email address for direct correspondence. Any questions regarding the
> benchmarks should be sent to my University email address.
What options did you use to generate the data? Can you send me the exact command-line arguments you used for the generator? Can you also please share the Pig scripts? Your results differ from similar experiments that we ran, so we are trying to understand the differences.
You can find the complete code in the link below. As you may have
realized, the tool I used (developed by the Pig developers) does not
easily and neatly let you generate two files to "join", i.e. two
inputs, with some common values in both. So I created a "made to fit"
generating script.
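
For readers who just want to see the shape of the problem, here is a minimal sketch of generating two join inputs that share a key space, with either uniform or skewed keys. It is not the script from the link: the function names, the tab-separated layout, and the use of weights proportional to 1/rank^exponent as a stand-in for a Zipf-style skew are all assumptions made for illustration.

```python
import random


def zipf_weights(n_keys, exponent):
    """Weights proportional to 1 / rank**exponent; exponent 0.0 gives uniform keys."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_keys + 1)]


def make_rows(n_rows, n_keys, exponent, rng, prefix):
    """Draw n_rows (key, payload) pairs with keys taken from 0..n_keys-1."""
    keys = list(range(n_keys))
    weights = zipf_weights(n_keys, exponent)
    chosen = rng.choices(keys, weights=weights, k=n_rows)
    return [(key, f"{prefix}{i}") for i, key in enumerate(chosen)]


def make_join_inputs(n_left, n_right, n_keys, exponent=0.0, seed=0):
    """Build two row lists over the same key space so that a join finds matches."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    left = make_rows(n_left, n_keys, exponent, rng, "L")
    right = make_rows(n_right, n_keys, exponent, rng, "R")
    return left, right


def write_tsv(rows, path):
    """Write one 'key<TAB>payload' line per row (hypothetical file layout)."""
    with open(path, "w") as out:
        for key, payload in rows:
            out.write(f"{key}\t{payload}\n")
```

Calling `make_join_inputs(1000, 1000, 100, exponent=1.0)` and passing each list to `write_tsv` would give two skewed files to upload; `exponent=0.0` gives the uniform case.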
Usage:
1. download all 5 files
2. Generate the test data: run makeTestData and upload all files to
the HDFS
3. Once complete, you'll want to benchmark the join applications
3.a) Run javaJoin
3.b) Run jaqlJoin
4. Step 3 should produce a set of readable files with the runtime for each
operation.
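
As a rough illustration of step 4, the sketch below times an arbitrary command and appends the wall-clock runtime to a log file. It is an assumption about how such timing could be done, not a description of what javaJoin or jaqlJoin do internally; the helper names and the log layout are hypothetical.

```python
import subprocess
import time


def time_command(command):
    """Run a command (given as a list of arguments) and return (seconds, exit code)."""
    start = time.perf_counter()
    completed = subprocess.run(command, check=False)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


def record_runtime(label, command, log_path):
    """Append 'label<TAB>seconds<TAB>exit code' to log_path and return the seconds."""
    elapsed, returncode = time_command(command)
    with open(log_path, "a") as log:
        log.write(f"{label}\t{elapsed:.3f}\t{returncode}\n")
    return elapsed
```

The `command` list would be whatever launches a given benchmark in your environment; recording the exit code alongside the runtime makes failed runs easy to spot in the log.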
Hopefully, with a bit of intuition, what I was trying to do will make
sense. Give the scripts a try.
There is a fair chance that these scripts will not run on your first
try, because I've tidied them up somewhat, and my runtime environment
is almost certainly different from yours, e.g. the naming convention of
input files in the HDFS directory structure.
Let me know if you encounter any problems, or have any further
questions, I will do my very best to help you out.
NOTE: You will *definitely* need to edit the classpaths in these
files, e.g. for the Pig jar, the zipfjar jar, etc. These files will
not execute otherwise.
Rob Stewart