Just posting the following from the Pig user list for further context:
"Anyway, let me point you to the test results:
http://www.macs.hw.ac.uk/~rs46/WordCount_Scale_Up_Execution.pdf"
and the Jaql query:
"And for your information, the JAQL script looks like:
$input = read(lines("Inputs/WordCount/wordsx1_skewed.dat"));
$input -> group by $word = $
into { $word, num: count($) }
-> write(hdfs('Outputs/WordCount/wordCountOutputx1_skewed.jaql'));
(Note, this is from JAQL svn snapshot).
Rob"
In fact, that posting on the Pig user list is from me! (Rob = Rob
Stewart.)
Thanks for the guidance on the mapred-site.xml.
I have another question for you. I want to output my results from the
JAQL word count as a plain text file. I have tried:
------------
$input = read(lines("Inputs/WordCount/smallWords.dat"));
$input -> group by $word = $
into { $word, num: count($) }
-> write({type: 'hdfs',
          location: 'Outputs/WordCount/wordCountOutputx1_uniform_TEXT.jaql',
          outoptions: {format: 'org.apache.hadoop.mapred.TextOutputFormat',
                       converter: 'com.foobar.store.ToJSONTxtConverter',
                       configurator: 'com.foobar.store.TextFileOutputConfigurator'}});
-------------
But I get:
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text
cannot be cast to com.ibm.jaql.io.hadoop.JsonHolder
smallWords.dat is a plain text file with one word per line.
I assume I am getting the converter/format/configurator wrong. Can you
tell me how to correct this?
Thanks,
Rob Stewart
Yes, this procedure has been changed; you can treat the code in
com.foobar.* as deprecated. Writing into delimited files now works as
follows:
data = [ { word: "hello", num: 10 }, { word: "world", num: 2 } ];
data -> write( del( "test", { fields: [ "word", "num" ] } ) );
This gives:
"hello",10
"world",2
del() is a function that produces the necessary file descriptor for both
reading and writing. It takes a filename and an option record as
arguments. The "fields" option used here is currently necessary for
specifying in which order the fields of a record should be written (or
read). Other useful options include "delimiter" (for custom delimiters)
and "convert" (for automatic type conversion when reading). More
examples can be found in
"$JAQL_HOME/src/test/com/ibm/jaql/storageTextQueries.txt".
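Applied to your word count, a minimal and untested sketch might look as
follows. It only combines your query with the del() call shown above.
The output file name is a placeholder, and I am assuming the grouped
records carry the fields "word" and "num", as your into clause suggests.

```jaql
// Read one word per line, as in the original script.
$input = read(lines("Inputs/WordCount/smallWords.dat"));

// Count occurrences per word and write them as delimited text.
// "Outputs/WordCount/wordCount.del" is a hypothetical output path.
$input -> group by $word = $
into { $word, num: count($) }
-> write(del("Outputs/WordCount/wordCount.del",
             { fields: [ "word", "num" ] }));
```

If this behaves like the small example above, each output line should hold
a quoted word followed by its count.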
Best,
Rainer.
Kudos to the person(s) who made the decision to introduce the del()
function!
Regards,
Rob