Null values not allowed in lookups?


Robin

Sep 22, 2016, 5:47:30 AM
to Druid User
Hi,

I'm trying to add some lookups based on a customJson file. The key is "userName", and I'm adding one lookup for each of the remaining value columns.

We don't have all values for all users, so at first we tried leaving out the missing values, which caused the exception below. We then tried including all known columns but setting the missing values to null instead, and that doesn't seem to work either.

From my perspective, it should be allowed to leave out columns: Druid would then omit that key from the affected lookup and let the query handle the unknown values.

java.lang.NullPointerException: Value column [country] missing data in line [{"userName":"username","country":null,"email":"us...@example.com","gender":"U","cellphone":"+123456789","age":"1983"}]
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:253) ~[guava-16.0.1.jar:?]
        at io.druid.query.lookup.namespace.URIExtractionNamespace$DelegateParser.parse(URIExtractionNamespace.java:220) ~[?:?]
        at io.druid.data.input.MapPopulator$1.processLine(MapPopulator.java:67) ~[?:?]
        at com.google.common.io.CharStreams.readLines(CharStreams.java:317) ~[guava-16.0.1.jar:?]
        at com.google.common.io.CharSource.readLines(CharSource.java:239) ~[guava-16.0.1.jar:?]
        at io.druid.data.input.MapPopulator.populate(MapPopulator.java:59) ~[?:?]
        at io.druid.server.lookup.namespace.URIExtractionNamespaceCacheFactory$1$1.call(URIExtractionNamespaceCacheFactory.java:177) ~[?:?]
        at io.druid.server.lookup.namespace.URIExtractionNamespaceCacheFactory$1$1.call(URIExtractionNamespaceCacheFactory.java:130) ~[?:?]
        at com.metamx.common.RetryUtils.retry(RetryUtils.java:60) [java-util-0.27.9.jar:?]
        at com.metamx.common.RetryUtils.retry(RetryUtils.java:78) [java-util-0.27.9.jar:?]
        at io.druid.server.lookup.namespace.URIExtractionNamespaceCacheFactory$1.call(URIExtractionNamespaceCacheFactory.java:128) [druid-lookups-cached-global-0.9.1.1.jar:0.9.1.1]
        at io.druid.server.lookup.namespace.URIExtractionNamespaceCacheFactory$1.call(URIExtractionNamespaceCacheFactory.java:73) [druid-lookups-cached-global-0.9.1.1.jar:0.9.1.1]
        at io.druid.server.lookup.namespace.cache.NamespaceExtractionCacheManager$4.run(NamespaceExtractionCacheManager.java:361) [druid-lookups-cached-global-0.9.1.1.jar:0.9.1.1]
        at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator$NeverSuccessfulListenableFutureTask.run(MoreExecutors.java:582) [guava-16.0.1.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_91]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_91]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_91]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]


regards,
Robin

Gian Merlino

Sep 26, 2016, 12:04:19 PM
to druid...@googlegroups.com
Hey Robin,

Null values in lookups seem useful for your case (many logical lookups derived from a single JSON file). I raised https://github.com/druid-io/druid/pull/3512 to allow that to work.

With the current Druid version, some other workarounds are:

1) Split the single physical file into one file per logical lookup, and omit rows with null values.
2) Use a placeholder value like "Unknown country" or "NULL" instead of an actual null. You can pair this with "replaceMissingValueWith" : "your-placeholder-here" at query time to make sure that true nulls and your placeholders are folded together.
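
To make the two workarounds concrete, here is a minimal preprocessing sketch, not anything shipped with Druid. It assumes the source file holds one JSON object per line, as in the error message above; the function names split_lookups and to_json_lines are made up for illustration. With no placeholder, null or absent values are dropped (workaround 1); with a placeholder, they are replaced by it (workaround 2).

```python
import json
from collections import defaultdict


def split_lookups(lines, key_field="userName", placeholder=None):
    """Build one {key: value} map per value column from JSON lines.

    Hypothetical helper: null or absent values are skipped when
    placeholder is None, otherwise they are replaced by the placeholder.
    """
    rows = [json.loads(line) for line in lines if line.strip()]
    # Every field other than the key becomes its own logical lookup.
    columns = sorted({col for row in rows for col in row if col != key_field})
    lookups = defaultdict(dict)
    for row in rows:
        key = row[key_field]
        for col in columns:
            value = row.get(col)  # None for both JSON null and a missing field
            if value is None:
                if placeholder is None:
                    continue  # workaround 1: omit the row from this lookup
                value = placeholder  # workaround 2: substitute a placeholder
            lookups[col][key] = value
    return dict(lookups)


def to_json_lines(lookup, column, key_field="userName"):
    """Serialize one logical lookup back to one JSON object per line."""
    return [json.dumps({key_field: k, column: v}) for k, v in lookup.items()]


if __name__ == "__main__":
    sample = [
        '{"userName":"alice","country":null,"gender":"F"}',
        '{"userName":"bob","country":"NO"}',
    ]
    for column, lookup in split_lookups(sample).items():
        print(column, to_json_lines(lookup, column))
```

Each list returned by to_json_lines can be written to its own file and referenced by a separate lookup definition. If you go the placeholder route, use the same string for "replaceMissingValueWith" at query time so that missing keys and placeholder rows end up in the same bucket.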

Gian


charles.allen

Sep 26, 2016, 12:14:14 PM
to Druid User
Have you tried mapping things you want reported as missing to the empty string?