While experimenting with the BSONWritable object in my Hadoop MapReduce implementation, I found that it produces the JSON output shown below for a Java Date. Here is a summary of the mapper and its use of BSONWritable:
...
public static class TokenizerMapper extends
        Mapper<Object, BSONObject, Text, BSONWritable> {
    public void map(Object key, BSONObject value, Context context)
            throws IOException, InterruptedException {
        ...
        Text mapOutKey = new Text();
        BSONWritable mapOutValue = new BSONWritable();
        // in this example - pubDate represents "2012-02-10T01.42.45Z"
        Long ts = new Long(pubDate.getTime());
        mapOutValue.put("ts-String", ts.toString()); // String object
        mapOutValue.put("ts-long", ts.longValue());  // long primitive
        mapOutValue.put("ts-int", ts.intValue());    // int primitive
        context.write(mapOutKey, mapOutValue);
...
After the reducer manipulates mapOutValue to update a count value, the result is as follows:
{ ..., "ts-String" : "1328873945000",
  "ts-long" : NumberLong("1328838165000"),
  "ts-int" : 1729050536, ... }

Using int has undesirable side effects: the timestamp does not fit in 32 bits, so the value is truncated. However, our current specification calls for timestamps to be output as numbers. I would prefer to produce a plain JSON number instead of NumberLong, because I understand that any floating-point handling of a JSON number of this size will not affect the value. To illustrate, if you validate the following JSON string at http://jsonlint.com/, the interpreter does not modify it:

{ "long" : 1328873945000 }

To make the interpreter lose precision to floating point, you would have to increase the number by a factor of about 100,000 (five orders of magnitude). In that case, it rounds as follows:

{ "really long" : 132887394500012345 } => { "really long": 132887394500012350 }

So this is not a significant loss for timestamps, even over several millennia.
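
The precision argument above can be checked with a minimal Java sketch. It assumes the JSON consumer stores numbers as IEEE 754 doubles, as a JavaScript interpreter does, so every integer up to 2^53 is represented exactly. The class name `TimestampPrecision` and the helper `survivesDouble` are hypothetical names used only for illustration; nothing here touches BSONWritable itself.

```java
class TimestampPrecision {
    // Every integer with magnitude up to 2^53 is exactly representable as a double.
    static final long MAX_EXACT = 1L << 53; // 9007199254740992

    // True if the long keeps its exact value after a round trip through double.
    static boolean survivesDouble(long v) {
        return (long) (double) v == v;
    }

    public static void main(String[] args) {
        long ts = 1328873945000L; // millisecond timestamp from the example output

        // Narrowing to int keeps only the low 32 bits (same result as Long.intValue()).
        System.out.println((int) ts);                   // 1729050536

        // The timestamp is far below 2^53, so a double holds it exactly.
        System.out.println(ts < MAX_EXACT);             // true
        System.out.println(survivesDouble(ts));         // true

        // Scaled up by about 100,000 the value exceeds 2^53 and is rounded.
        long reallyLong = 132887394500012345L;
        System.out.println(survivesDouble(reallyLong)); // false
        System.out.println((long) (double) reallyLong); // 132887394500012352
    }
}
```

The last line prints 132887394500012352, the nearest double, while a JavaScript-based validator displays the shortest decimal that maps to the same double, which is the 132887394500012350 seen above. The first line reproduces the "ts-int" value from the reducer output, which shows that the int field is a truncation of 1328873945000 and not a usable timestamp.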
Is it possible to produce a JSON number from a long with BSONWritable? If not, what other classes, utilities, or MR output formats would you recommend to achieve the desired result?