Inconsistency when handling null values in RegexReplace vs. RegexFilter


Ken Krugler

Apr 24, 2012, 6:44:16 PM
to cascadi...@googlegroups.com
Hi Chris,

During an Intro to Cascading class today, one of the students was trying to use RegexReplace to replace null field values with "UNKNOWN".

They copied some existing code I'd provided that used RegexFilter to just get rid of such entries, which was…

        logAnalysisPipe = new Each(logAnalysisPipe, new Fields("country"), new RegexFilter("null", true));

So their version of the code looked like:

        logAnalysisPipe = new Each(logAnalysisPipe, 
                                    new Fields("country"), 
                                    new RegexReplace(new Fields("country"), "null", "UNKNOWN"),
                                    Fields.REPLACE);

But it didn't work - no entries were matched by the regex.

The issue is that RegexFilter subclasses RegexMatcher, which calls tuple.toString() to create the string used by the regex Matcher. So this turns a null field into "null".

But RegexReplace uses tuple.getString(0), and then remaps a null result to "". So the correct pattern in that case is "^$".
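To make the mismatch concrete outside of Cascading, here is a minimal sketch in plain Java. It only imitates the two null conversions described above using java.util.regex; the class and method names (NullHandlingSketch, viaToString, viaGetString) are made up for illustration and are not Cascading APIs.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it does not use or reproduce Cascading code.
class NullHandlingSketch {

    // Imitates the toString() path described for RegexFilter/RegexMatcher:
    // a null value becomes the four-character string "null".
    static String viaToString(Object value) {
        return String.valueOf(value);
    }

    // Imitates the getString(0) path described for RegexReplace:
    // a null value is remapped to the empty string.
    static String viaGetString(Object value) {
        return value == null ? "" : value.toString();
    }

    // True if the regex is found anywhere in the input.
    static boolean found(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Replaces every match of the regex in the input.
    static String replaceAll(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        Object country = null;

        // Filter-style conversion: the pattern "null" matches.
        System.out.println(found(viaToString(country), "null"));

        // Replace-style conversion: the pattern "null" finds nothing ...
        System.out.println(found(viaGetString(country), "null"));

        // ... but "^$" matches the empty string and gets replaced.
        System.out.println(replaceAll(viaGetString(country), "^$", "UNKNOWN"));

        // A non-null value is left alone by "^$".
        System.out.println(replaceAll(viaGetString("US"), "^$", "UNKNOWN"));
    }
}
```

With the replace-style conversion, the student's pattern "null" has nothing to match, while "^$" does the job and leaves real country values untouched.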

Whichever approach is chosen, it seems like it should be consistent here.

Thanks,

-- Ken

--------------------------
Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr









Chris K Wensel

Apr 25, 2012, 11:45:16 AM
to cascadi...@googlegroups.com
good catch..

actually it would have worked if matchEachElement was set to true, I believe.

nonetheless, the wrong #toString on tuple was being called (in a couple of places, it looks like)

this is fixed in the next 2.0 wip.

ckw
