A couple of things have changed, possibly since you wrote the draft
(all projects are a moving target):
1. Options for the number of reducers (along with any other option you
want to put into the conf) are now exposed in the language.
2. We're surprised by the join results, which are quite poor for both
uniform and skewed data. Do you have the data generators and queries
available for us to look at?
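As background for both points, here is a minimal Python sketch of how a hash partitioner spreads join keys over reducers. It assumes keys are routed the way Hadoop's default HashPartitioner does, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, with Java's `String.hashCode()` reimplemented in Python. The key names, record counts and the 50% "hot" key are illustrative assumptions, not the benchmark's data, and this is not a claim about how Jaql plans the join.

```python
# Minimal sketch: how a hash partitioner spreads join keys over reducers.
# Assumption: keys are routed like Hadoop's default HashPartitioner,
#   (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks,
# with Java's String.hashCode() reimplemented below. The key names and
# the 50% "hot" key are illustrative, not the benchmark's data.
import random
from collections import Counter


def java_string_hash(s: str) -> int:
    """Java String.hashCode() for BMP-only strings, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def partition(key: str, num_reducers: int) -> int:
    """Reducer index for a key: clear the sign bit, then take the modulus."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def reducer_loads(keys, num_reducers):
    """Number of records each reducer receives."""
    counts = Counter(partition(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


if __name__ == "__main__":
    rng = random.Random(42)
    n, distinct, reducers = 100_000, 1_000, 10
    uniform = [f"name{rng.randrange(distinct)}" for _ in range(n)]
    # Hypothetical skew: roughly half of all records share one hot key.
    skewed = ["hot" if rng.random() < 0.5 else f"name{rng.randrange(distinct)}"
              for _ in range(n)]
    for label, keys in (("uniform", uniform), ("skewed", skewed)):
        loads = reducer_loads(keys, reducers)
        print(label, "max/mean reducer load:", round(max(loads) / (n / reducers), 2))
```

Raising the reducer count spreads distinct keys more thinly, but under plain hash partitioning every record carrying the hot key still lands on a single reducer, so the skewed case stays bound by that one task whatever the reducer setting.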
Also, it may be useful to separate extensibility, support for
embedding, and language expressiveness. For example, all of the
languages surveyed can be extended via UDFs/UDAs. Jaql additionally
lets you define (higher-order) functions in Jaql itself, as well as
modules. Yes, we get recursion as a result, but what we're primarily
after is reuse and modularity for scripts, so that we can work at the
right level of abstraction when managing complex tasks.
Thanks!
Vuk
Re: Controlling reducers
----------------------------------
I was going by the information you posted on 19 January
here:
http://groups.google.com/group/jaql-users/browse_thread/thread/7fbc0c3fffe9dbc6/a71006554befd114
I realize it is a moving target, and I will revise the document
(1.1) to note that this functionality now exists.
Re: Join
--------------------
Sure, no problem. I've been using the DataGenerator package put
together by the devs over at Pig: http://wiki.apache.org/pig/DataGeneratorHadoop
This creates two files, each with a single column, and that column is
used as the key to join the datasets. Here is the Jaql script:
$dir1 = read(del("Inputs/join/file1.dat", { fields: ["name"] } ));
$dir2 = read(del("Inputs/join/file2.dat", { fields: ["name"] } ));
join $dir1, $dir2 where $dir1.name == $dir2.name
into {$dir1.name}
-> write(hdfs('Outputs/join/join_output.jaql'));
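For anyone trying to reproduce this, here is a minimal Python sketch of what the script above computes: an equi-join of two single-column inputs on `name`, emitting one `{name}` record per matching pair. In-memory lists stand in for file1.dat and file2.dat (an assumption; no Hadoop involved), and the hash join here is just one way to evaluate it, not a description of how Jaql runs the join. It also shows why skew matters: the output size is the sum over keys of count_left * count_right, so a hot key multiplies both the output and the work on whichever reducer owns it.

```python
# Minimal sketch of the join semantics in the Jaql script above.
# Assumption: in-memory lists of {"name": ...} records stand in for the
# files read with del(); this is a plain hash join, not Jaql's plan.
from collections import Counter


def equi_join(left, right):
    """Build an index on the right input, probe it with the left input."""
    index = {}
    for row in right:
        index.setdefault(row["name"], []).append(row)
    out = []
    for row in left:
        for _match in index.get(row["name"], []):
            # One output record per matching pair, like "into {$dir1.name}".
            out.append({"name": row["name"]})
    return out


def join_size(left_keys, right_keys):
    """Output cardinality: sum over keys of count_left * count_right."""
    cl, cr = Counter(left_keys), Counter(right_keys)
    return sum(n * cr[k] for k, n in cl.items())


if __name__ == "__main__":
    file1 = [{"name": n} for n in ["a", "b", "b", "c"]]
    file2 = [{"name": n} for n in ["b", "b", "c", "d"]]
    print(len(equi_join(file1, file2)))  # 5: "b" gives 2*2, "c" gives 1*1
    # A key appearing 1,000 times in each input yields 1,000,000 rows.
    print(join_size(["hot"] * 1000, ["hot"] * 1000))
```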
What options did you use to generate the data? Can you send me the
exact command-line arguments you used for the generator? Could you
also share the Pig scripts? Your results differ from similar
experiments that we ran, so we are trying to understand the
differences.
Thanks a bunch.
-Kevin