count null values


ben.t...@booking.com

Jan 14, 2019, 10:08:11 AM
to Pinot Users
For data quality purposes it is useful to be able to count (NOT) NULL values in columns.
What is the recommended way of doing that? I tried != NULL / = NULL / IS NULL, but Pinot doesn't seem to support the NULL keyword.

Mayank Shrivastava

Jan 14, 2019, 10:37:01 AM
to Pinot Users, ben.t...@booking.com
Pinot currently does not support NULL values natively. Instead, NULL values are stored as a special pre-determined value per data type (e.g. INT_MIN for int, 'null' for String). These default values can be overridden using segment generation configs.

You can filter them out as you would filter on any other value, e.g.:
    where stringColumn <> "null"
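Since a missing value is just a sentinel, counting (NOT) NULL values becomes an ordinary COUNT with an equality or inequality filter on that sentinel. Below is a minimal Python sketch that builds both queries. It assumes single-quoted string literals (the reply above writes the literal in double quotes), hypothetical table/column names `myTable`, `stringColumn` and `intColumn`, and hypothetical helper names. It only produces query text and does not call any Pinot client API:

```python
from typing import Tuple, Union

# Hypothetical helpers (not part of any Pinot client library). Pinot stores
# "missing" values as a per-type sentinel rather than a true NULL, so counting
# NULL / NOT NULL rows reduces to comparing the column against that sentinel.


def sentinel_literal(sentinel: Union[int, str]) -> str:
    """Render the sentinel as a query literal: numbers bare, strings quoted."""
    if isinstance(sentinel, int):
        return str(sentinel)
    if "'" in sentinel:
        # Assumption: keep the sketch simple and refuse values needing escaping.
        raise ValueError("sentinel containing a quote is not supported here")
    return f"'{sentinel}'"


def null_count_queries(table: str, column: str,
                       sentinel: Union[int, str]) -> Tuple[str, str]:
    """Return (count_null_query, count_not_null_query) for one column."""
    literal = sentinel_literal(sentinel)
    count_null = f"SELECT COUNT(*) FROM {table} WHERE {column} = {literal}"
    count_not_null = f"SELECT COUNT(*) FROM {table} WHERE {column} <> {literal}"
    return count_null, count_not_null


if __name__ == "__main__":
    INT_MIN = -2**31  # the int sentinel mentioned in the reply
    for q in null_count_queries("myTable", "stringColumn", "null"):
        print(q)
    for q in null_count_queries("myTable", "intColumn", INT_MIN):
        print(q)
```

Running both queries and adding the results should give the table's total row count. A genuine value equal to the sentinel (for example the literal string "null") is counted as missing, since the two cannot be told apart once stored.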

Thanks,
Mayank


From: ben.teeuwen via Pinot Users <pinot...@googlegroups.com>
Sent: Monday, January 14, 2019 7:08 AM
To: Pinot Users
Subject: count null values
 

--
You received this message because you are subscribed to the Google Groups "Pinot Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pinot_users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pinot_users/881a3a97-7686-4919-93f1-9191a2e656bb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ben.t...@booking.com

Jan 14, 2019, 12:36:23 PM
to Pinot Users
Thanks for the quick response.

Is there an example of how to override these default settings using a 'segment generation config'?
So far I've defined a table schema, a table definition and a job.properties file. Do I need a fourth configuration file for this?

Mayank Shrivastava

Jan 14, 2019, 1:20:31 PM
to Pinot Users, ben.t...@booking.com
You can specify the default null value of your choice within the schema file as follows. Note that you don't need to define these; if you don't, the system picks one for you.

    {
      "name": "myColumn",
      "dataType": "STRING",
      "defaultNullValue": "INVALID_VALUE"
    },

Default values picked when not specified are defined per data type in the Apache Pinot (Incubating) source code (apache/incubator-pinot on GitHub).
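Combining the two rules (an explicit `defaultNullValue` in the field spec wins, otherwise a per-type default applies), here is a minimal Python sketch that resolves the sentinel for one schema field spec. Assumptions: field specs are parsed from the schema JSON as plain dicts; `null_sentinel` and `FALLBACK_DEFAULTS` are hypothetical names; only the INT and STRING defaults mentioned in this thread are included, and any other type raises instead of guessing. The actual per-type defaults, which may also differ by column kind, live in the Pinot source.

```python
import json
from typing import Any, Dict, Union

# Hypothetical fallback table: only the two defaults named in this thread
# (INT_MIN for INT, "null" for STRING). Other types are left out on purpose.
FALLBACK_DEFAULTS: Dict[str, Union[int, str]] = {
    "INT": -2**31,
    "STRING": "null",
}


def null_sentinel(field_spec: Dict[str, Any]) -> Union[int, str]:
    """Return the value stored in place of NULL for this column."""
    if "defaultNullValue" in field_spec:
        # An explicit override in the schema takes precedence.
        return field_spec["defaultNullValue"]
    data_type = field_spec["dataType"]
    if data_type not in FALLBACK_DEFAULTS:
        raise KeyError(f"no fallback default recorded for {data_type}")
    return FALLBACK_DEFAULTS[data_type]


if __name__ == "__main__":
    # The field spec from the reply above, plus one without an override.
    specs = json.loads("""
    [
      {"name": "myColumn", "dataType": "STRING", "defaultNullValue": "INVALID_VALUE"},
      {"name": "otherColumn", "dataType": "STRING"}
    ]
    """)
    for spec in specs:
        print(spec["name"], "->", null_sentinel(spec))
```

With the override from the reply, a NULL count on `myColumn` has to compare against "INVALID_VALUE", not "null".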



Thanks,
mayank

From: ben.teeuwen via Pinot Users <pinot...@googlegroups.com>
Sent: Monday, January 14, 2019 9:36 AM
To: Pinot Users
Subject: Re: count null values
 