Specifying Number of Reducers


Rob Stewart

Jan 18, 2010, 5:39:18 AM
to jaql-...@googlegroups.com
Hi there,

I am testing JAQL (svn) against Pig, Hive, etc., and have come across an issue I'm not sure about.

I've been given the guidelines for the number of reducers I should use for my experiments:
Number of DataNodes * Max number of reducers on each node * 0.9

Which gives me:
31 * 2 * 0.9 = 55.8, which rounds to 56
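
As a quick check of that arithmetic, here is a minimal Python sketch. The function name and the choice to round to the nearest whole number are assumptions for illustration; the guideline itself only gives the product.

```python
def recommended_reducers(data_nodes, max_reducers_per_node, load_factor=0.9):
    """Guideline from the post: DataNodes * max reducers per node * 0.9.

    Rounding to the nearest integer is an assumption; the guideline
    does not say how to handle a fractional result.
    """
    return round(data_nodes * max_reducers_per_node * load_factor)


# 31 DataNodes with at most 2 reducers each, as in the post
print(recommended_reducers(31, 2))
```

For the cluster described here, this gives the 56 reducers quoted above.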

In Hive, Pig and Java, I am able to explicitly specify this number as a "number of reducers" argument. Is this possible to do when running JAQL scripts at present?

thanks,

Rob Stewart

vuk.ercegovac

Jan 18, 2010, 10:14:23 PM
to Jaql Users
We have a work item to add this to the syntax. I think it has not been
checked in yet, however. For now, please make the adjustment via Hadoop
cluster parameters (e.g., mapred-site.xml).
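
For illustration, a fragment along these lines in mapred-site.xml would set the default number of reduce tasks per job. This is a sketch under two assumptions: that the cluster runs a Hadoop release that still uses the classic property name mapred.reduce.tasks, and that the job Jaql submits does not override it. The value 56 is the figure from the original question.

```xml
<!-- mapred-site.xml (sketch): default number of reduce tasks per job -->
<configuration>
  <property>
    <name>mapred.reduce.tasks</name>
    <!-- 31 DataNodes * 2 reducers per node * 0.9, rounded -->
    <value>56</value>
  </property>
</configuration>
```

The job configuration is read when a job is submitted, so the file needs to be in place before the Jaql script runs.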

Just posting the following from the pig user list for further context:

"Anyway, let me point you to the test results:
http://www.macs.hw.ac.uk/~rs46/WordCount_Scale_Up_Execution.pdf"

and the Jaql query:

"And For your information, the JAQL script looks like:

$input = read(lines("Inputs/WordCount/wordsx1_skewed.dat"));
$input -> group by $word = $
into { $word, num: count($) }
-> write(hdfs('Outputs/WordCount/wordCountOutputx1_skewed.jaql'));

(Note, this is from JAQL svn snapshot).

Rob"

Rob Stewart

Jan 19, 2010, 5:24:22 AM
to Jaql Users
Hi Vuk,

In fact, that posting on the Pig user list is from me! (Rob = Rob
Stewart).

Thanks for the guidance on the mapred-site.xml.

I have another question for you. I want to output my results from the
JAQL word count as a plain text file. I have tried:
------------
$input = read(lines("Inputs/WordCount/smallWords.dat"));

$input -> group by $word = $
    into { $word, num: count($) }
-> write({type: 'hdfs',
          location: 'Outputs/WordCount/wordCountOutputx1_uniform_TEXT.jaql',
          outoptions: {format: 'org.apache.hadoop.mapred.TextOutputFormat',
                       converter: 'com.foobar.store.ToJSONTxtConverter',
                       configurator: 'com.foobar.store.TextFileOutputConfigurator'}});
-------------

But I get:
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text
cannot be cast to com.ibm.jaql.io.hadoop.JsonHolder

The smallWords.dat is a plain text file, simply with one word per
line.

I assume I am getting the converter/format/configurator wrong. Can you
tell me how to correct this?

Thanks,

Rob Stewart

Rainer Gemulla

Jan 19, 2010, 2:33:04 PM
to jaql-...@googlegroups.com
Hi Rob,

Yes, this procedure has been changed; you can treat the code in
com.foobar.* as deprecated. Writing into delimited files now works as
follows:

data = [ { word: "hello", num: 10 }, { word: "world", num: 2 } ];
data -> write( del( "test", { fields: [ "word", "num" ] } ) );

This gives:
"hello",10
"world",2

del() is a function that produces the necessary file descriptor for both
reading and writing. It takes a filename and an option record as
arguments. The "fields" option used here is currently necessary for
specifying in which order the fields of a record should be written (or
read). Other useful options include "delimiter" (for custom delimiters)
and "convert" (for automatic type conversion when reading). More
examples can be found in
"$JAQL_HOME/src/test/com/ibm/jaql/storageTextQueries.txt".
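
To make the role of the "fields" option concrete, here is a rough Python analogue of the write shown above. It is not Jaql and not how del() is implemented; write_delimited is a hypothetical helper that only mimics the observable behaviour: fields are emitted in the order listed, strings are quoted, and numbers are not.

```python
import csv
import io


def write_delimited(records, fields, delimiter=","):
    """Render records as delimited text, one line per record.

    `fields` fixes the column order, much like the "fields" option of
    del(); `delimiter` stands in for its "delimiter" option.
    """
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_NONNUMERIC,  # quote strings, leave numbers bare
        lineterminator="\n",
    )
    for record in records:
        writer.writerow([record[name] for name in fields])
    return buffer.getvalue()


data = [{"word": "hello", "num": 10}, {"word": "world", "num": 2}]
print(write_delimited(data, ["word", "num"]), end="")
```

Listing the fields as ["num", "word"] instead would swap the two columns, which is why the order has to be given explicitly.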

Best,
Rainer.

Rob Stewart

Jan 19, 2010, 3:06:36 PM
to Jaql Users
Rainer, thanks very much for the pointer. This is far easier than the
(deprecated) com.foobar.* class methods.

Kudos to the person(s) who made the decision to introduce the del()
method!


Regards,

Rob
