Performance & Programming Comparison of JAQL, Hive, Pig and Java
From: Rob Stewart <robstewar...@googlemail.com>
Date: Tue, 23 Mar 2010 14:11:55 +0000
Subject: Performance & Programming Comparison of JAQL, Hive, Pig and Java

Hi folks,

As promised, today I have made available the findings and experiment results
from my research project examining the high-level languages Pig, Hive and
JAQL.

The project extends existing studies by evaluating scale-up, scale-out
and runtime for three benchmark applications. It also examines the ease
of programming and the computational power of each language.

I've created two documents:
- Publication: a slide-by-slide presentation (16 slides). *Suitable for
most readers*
- Dissertation results chapter (18 pages of text)

You can find these documents at:
http://www.macs.hw.ac.uk/~rs46/publications.html

Excuse the .html link; it lets me record the number of hits the
publication receives.

I welcome any feedback, either on this mailing list or by direct
correspondence to my University email address. Any questions regarding
the benchmarks should be sent to my University email address.

Thanks for taking an interest,

Rob Stewart


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
vuk.ercegovac  
View profile  
 More options Mar 23 2010, 7:27 pm
From: "vuk.ercegovac" <vuk.ercego...@gmail.com>
Date: Tue, 23 Mar 2010 16:27:26 -0700 (PDT)
Subject: Re: Performance & Programming Comparison of JAQL, Hive, Pig and Java
Thanks for putting this together.

A couple of things have changed, possibly since you wrote the draft
(all projects are a moving target):

1. Options for the number of reducers (along with any other option you
want to put into the conf) are now exposed in the language.

2. We're surprised by the join results: quite poor for both uniform
and skewed data. Do you have the data generators and queries available
for us to look at?
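Some context on why skew matters for a join: in an inner equi-join, each key k produces count_left(k) × count_right(k) output records, and in a reduce-side join every record sharing a key is routed to the same reducer, so one heavy key can dominate both output size and a single reducer's load. The minimal Python sketch below illustrates both effects on toy inputs. The helper names, the toy data, and the use of `zlib.crc32` as a stand-in for Hadoop's default hash partitioning are assumptions for illustration, not the benchmark code.

```python
from collections import Counter
import zlib

def join_output_size(left, right):
    """Records emitted by an inner equi-join of two single-key-column inputs."""
    right_counts = Counter(right)  # missing keys count as 0
    # Each key k contributes count_left(k) * count_right(k) output records.
    return sum(n * right_counts[k] for k, n in Counter(left).items())

def reducer_loads(left, right, num_reducers):
    """Input records each reducer receives when keys are hash-partitioned."""
    loads = [0] * num_reducers
    for key in list(left) + list(right):
        # Assumption: crc32 stands in for Hadoop's hash partitioner;
        # either way, every record with the same key lands on one reducer.
        loads[zlib.crc32(key.encode("utf-8")) % num_reducers] += 1
    return loads

# Hypothetical toy inputs: same size (8 records each), different key skew.
uniform = ["a", "b", "c", "d"] * 2
skewed = ["a"] * 5 + ["b", "c", "d"]

print(join_output_size(uniform, uniform))  # 16
print(join_output_size(skewed, skewed))    # 28
print(max(reducer_loads(skewed, skewed, 4)))  # at least 10: all "a" records share a reducer
```

Raising the reducer count spreads distinct keys more thinly, but it cannot split a single hot key across reducers in this simple scheme.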

Also, it may be useful to separate extensibility, support for
embedding, and language expressiveness. For example, all the languages
surveyed can be extended via UDFs/UDAs. Jaql has some extras to define
(higher-order) functions (in Jaql itself) and modules. Yes, we have
recursion as a result, but what we're primarily after is reuse and
modularity for scripts, so that we can use the right level of
abstraction to help us manage complex tasks.

Thanks!

Vuk

From: Rob Stewart <robstewar...@googlemail.com>
Date: Tue, 23 Mar 2010 16:58:51 -0700 (PDT)
Subject: Re: Performance & Programming Comparison of JAQL, Hive, Pig and Java
Hi Vuk,

Re: Controlling reducers
----------------------------------
I was going on the information you posted on 19 January
here:
http://groups.google.com/group/jaql-users/browse_thread/thread/7fbc0c...

I realize that it is a moving target, and I will revise the document
(1.1) to say that this functionality now exists.

Re: Join
--------------------
Sure, no problem. I've been using the DataGenerator package put
together by the Pig developers: http://wiki.apache.org/pig/DataGeneratorHadoop

This creates two files in a one-column format. That column is used to
join the datasets together. Here is the JAQL script:

$dir1 = read(del("Inputs/join/file1.dat", { fields: ["name"] } ));
$dir2 = read(del("Inputs/join/file2.dat", { fields: ["name"] } ));
join $dir1, $dir2 where $dir1.name == $dir2.name
into {$dir1.name}
-> write(hdfs('Outputs/join/join_output.jaql'));
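For readers less familiar with JAQL, `join $dir1, $dir2 where $dir1.name == $dir2.name into {$dir1.name}` is an inner equi-join on the single `name` column: every pair of matching records yields one output record holding the name. The minimal in-memory Python sketch below shows those semantics as a build/probe hash join. It is not how JAQL executes the join on Hadoop, and the function name `hash_join_names` and the sample names are hypothetical.

```python
from collections import Counter

def hash_join_names(left_names, right_names):
    """Inner equi-join of two one-column inputs on 'name'.

    Emits one {"name": ...} record per matching pair, mirroring the
    `into {$dir1.name}` projection in the JAQL script.
    """
    # Build phase: count occurrences of each name in the right input.
    right_counts = Counter(right_names)
    # Probe phase: each left record pairs with every equal right record.
    out = []
    for name in left_names:
        out.extend({"name": name} for _ in range(right_counts[name]))
    return out

# Hypothetical contents of file1.dat and file2.dat, one name per line.
file1 = ["alice", "bob", "bob", "carol"]
file2 = ["bob", "dave", "bob", "alice"]
print(hash_join_names(file1, file2))
```

Duplicate keys multiply: the two "bob" records on each side produce four output records, while "carol" and "dave" have no partner and are dropped.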


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Kevin Beyer  
View profile  
 More options Apr 5 2010, 11:28 pm
From: Kevin Beyer <kevin.beyer.j...@gmail.com>
Date: Mon, 5 Apr 2010 20:28:52 -0700 (PDT)
Subject: Re: Performance & Programming Comparison of JAQL, Hive, Pig and Java
Hi, Rob --

What options did you use to generate the data? Can you send me the
exact command-line arguments you used for the generator? Can you also
please share the Pig scripts? Your results differ from similar
experiments that we ran, so we are trying to understand the
differences.

Thanks a bunch.

-Kevin


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rob Stewart  
View profile  
 More options Apr 30 2010, 7:16 pm
From: Rob Stewart <robstewar...@googlemail.com>
Date: Fri, 30 Apr 2010 16:16:30 -0700 (PDT)
Subject: Re: Performance & Programming Comparison of JAQL, Hive, Pig and Java
Hi Kevin, and others.

You can find the complete code at the link below. As you may have
realized, the tool I used (developed by the Pig developers) does not
easily or neatly let you generate two files to "join", i.e. two
inputs with some values common to both. So I created a "made to fit"
generating script.
Usage:
1. Download all 5 files.
2. Generate the test data: run makeTestData and upload all the files to
HDFS.
3. Once that is complete, benchmark the join applications:
3.a) Run javaJoin
3.b) Run jaqlJoin
4. Step 3 should produce a set of readable files with the runtime of each
operation.

Hopefully, with a bit of intuition, what I was trying to do will make
sense. Give the scripts a try.

There is a fair chance that these scripts will not run on your first
try, because I've tidied them up somewhat, and my runtime environment
is almost certainly different from yours, e.g. the naming convention of
input files in the HDFS directory structure.

Let me know if you encounter any problems or have any further
questions; I will do my very best to help you out.

http://www.macs.hw.ac.uk/~rs46/files/publications/MapReduce-Languages...

NOTE: You will *definitely* need to edit the classpaths in these
files, e.g. for the pig jar, the zipfjar jar, etc. These files will
not execute otherwise.

Rob Stewart

--
You received this message because you are subscribed to the Google Groups "Jaql Users" group.
To post to this group, send email to jaql-users@googlegroups.com.
To unsubscribe from this group, send email to jaql-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/jaql-users?hl=en.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »