How to know the count of ingested rows in Druid


Parveen Jain

Jan 13, 2017, 6:19:20 AM
to Druid User

I am ingesting data into Druid using a realtime node. The total number of records ingested is 40 million, but when I query I see only 13 million records. The logs show no rejected events, yet the count is still lower than the amount of data I ingested. The Druid docs say:


“To count the number of ingested rows of data, include a count aggregator at ingestion time, and a longSum aggregator at query time.”

What does this line mean?

This is what I specify in the ingestion spec:

"metricsSpec" : [{
        "type" : "count",
        "name" : "COUNT"
      },

And I query with the following:

{
    "queryType": "groupBy",
    "dataSource": "abcd",
    "granularity": "all",
    "dimensions": [],
    "aggregations": [
        {"type": "count", "name": "count"},
        {"type": "count", "name": "UNIQUE_CUSTOMERS", "fieldName": "CUSTOMER_ID"}
    ],
    "intervals": [""]
}

 

It gives me a count of 13M (I am sure it is returning the Druid row count). But if I change the aggregator, as the quoted line suggests, from {"type": "count", "name": "count"} to {"type": "longSum", "name": "count"}, I get a syntax error.


I even tried a segmentMetadata query, but this also gives me the same 13M count:

{
  "queryType": "segmentMetadata",
  "dataSource": "abcd",
  "intervals": [""]
}



Can anyone suggest a way to find out how many original rows were ingested by Druid?


Gian Merlino

Jan 16, 2017, 6:10:14 AM
to druid...@googlegroups.com
At query time, instead of {"type": "count", "name": "count"} you want {"type": "longSum", "name": "count", "fieldName": "count"}. The idea is that at indexing time you're doing a count, but at query time you're summing an already-computed count.
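
A minimal sketch of the corrected query, reusing the groupBy query from the original post. It assumes the ingestion-time metric is named "COUNT", as in the metricsSpec posted above, so "fieldName" is set to "COUNT" here (the value has to match the ingested metric name exactly, and column names are case-sensitive). The output names "druid_rows" and "ingested_rows" and the interval are hypothetical placeholders, since the original post leaves the interval blank.

```json
{
    "queryType": "groupBy",
    "dataSource": "abcd",
    "granularity": "all",
    "dimensions": [],
    "aggregations": [
        {"type": "count", "name": "druid_rows"},
        {"type": "longSum", "name": "ingested_rows", "fieldName": "COUNT"}
    ],
    "intervals": ["2017-01-01/2017-02-01"]
}
```

With rollup enabled, "druid_rows" should report the number of stored (rolled-up) rows, while "ingested_rows" should report the number of original events.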

Gian

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+unsubscribe@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/90bd5c7b-625b-4620-8ccd-8d1532a42973%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Akul Narang

Feb 21, 2018, 2:40:38 AM
to Druid User
Hey everyone,

I'm also facing the same issue. The total number of rows ingested is 21731674, but the longSum count is 16570932. Can I recover the lost/collapsed data, as this will affect the score I want to calculate? Is there any workaround to prevent rows from getting collapsed?



Jonathan Wei

Feb 21, 2018, 4:51:14 PM
to druid...@googlegroups.com
Hi Parveen,

Separate from the total row count questions, I noticed your query has this aggregator:

{"type": "count", "name": "UNIQUE_CUSTOMERS", "fieldName": "CUSTOMER_ID"}

The "count" aggregator doesn't provide the cardinality of a dimension and doesn't take a "fieldName" parameter; it only counts the number of rows that are returned in a query.

To get a cardinality estimate for a column, you'd want to use a cardinality, hyperunique, or datasketch aggregator, e.g. http://druid.io/docs/latest/querying/aggregations.html#cardinality-aggregator
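
As a rough sketch, the UNIQUE_CUSTOMERS aggregator from the original query could be rewritten with the cardinality aggregator as follows. It assumes CUSTOMER_ID was ingested as a dimension. The "fields" property is the name used in recent Druid documentation; older releases may expect "fieldNames" instead, so check the docs for your version.

```json
{
    "type": "cardinality",
    "name": "UNIQUE_CUSTOMERS",
    "fields": ["CUSTOMER_ID"],
    "byRow": false
}
```

The result is an estimate of the number of distinct CUSTOMER_ID values, not an exact count.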



Gian Merlino

Mar 1, 2018, 1:12:31 AM
to druid...@googlegroups.com
Hi Akul,

I am not sure what you mean by "recover the data loss/collapsed" but if you are asking how to disable Druid's rollup summarization feature, you can do that by setting rollup: false in your granularitySpec.
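
A minimal sketch of where that setting lives in an ingestion spec. Only "rollup": false comes from the advice above; the "uniform" type and the two granularity values are illustrative assumptions and should be replaced with whatever your existing spec uses.

```json
{
    "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "rollup": false
    }
}
```

With rollup disabled, Druid stores one row per ingested event, so the query-time count and longSum results should agree. Rows that were already rolled up stay merged; getting the individual events back would mean re-ingesting the source data.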

Gian
