So their version of the code looked like:
    logAnalysisPipe = new Each(logAnalysisPipe,
        new Fields("country"),
        new RegexReplace(new Fields("country"), "null", "UNKNOWN"),
        Fields.REPLACE);
But it didn't work - no entries were matched by the regex.
The issue is that RegexFilter subclasses RegexMatcher, which calls tuple.toString() to create the string used by the regex Matcher, so a null field is turned into the string "null".
But RegexReplace uses tuple.getString(0), and then remaps a null result to "". So the correct pattern in that case is "^$".
Whichever approach is chosen, it seems like null handling should be consistent between the two.
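To make the difference concrete, here is a minimal sketch in plain Java with no Cascading dependency. The class and helper names (NullFieldRegexDemo, viaToString, viaGetString) are made up for illustration; they only imitate the two conversions described above and are not the actual Cascading internals.

```java
import java.util.regex.Pattern;

public class NullFieldRegexDemo {
    // Hypothetical stand-in for the toString()-style conversion:
    // a null value becomes the four-character string "null".
    static String viaToString(Object value) {
        return String.valueOf(value);
    }

    // Hypothetical stand-in for the getString()-style conversion,
    // with a null result remapped to the empty string.
    static String viaGetString(Object value) {
        return value == null ? "" : value.toString();
    }

    // Replace every match of regex in input with replacement.
    static String replaceAll(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        Object country = null;

        // toString()-style input: the pattern "null" matches.
        System.out.println(replaceAll(viaToString(country), "null", "UNKNOWN"));

        // getString()-style input: "null" never matches the empty string.
        System.out.println(replaceAll(viaGetString(country), "null", "UNKNOWN"));

        // getString()-style input: "^$" matches the empty string.
        System.out.println(replaceAll(viaGetString(country), "^$", "UNKNOWN"));
    }
}
```

If RegexReplace behaves as described, swapping "null" for "^$" in the pipe assembly above should be enough to catch the null country values.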
Thanks,
-- Ken
--------------------------
Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr