IngestSegment with filter on metric?


Hagen Rother

Oct 18, 2016, 5:06:54 PM
to druid-de...@googlegroups.com
Hi,

Historically, we merged all our Kafka topics into a single data source. We have grown beyond the point where this is suitable, and Kafka ingestion is a good reason to fix this now.

I'd like to re-ingest the old segments and split them again. For that, I need to filter on a metric (> 0); without the filter, I still get rows for all the source events I don't want in the new data source.

Any idea how to do this?

Thanks,
Hagen
--
Hagen Rother
Lead Architect | LiquidM

LiquidM Technology GmbH
Rosenthaler Str. 36 | 10178 Berlin | Germany
Phone:+49 176 15 00 38 77
Internet:www.liquidm.com | LinkedIn

Managing Directors | André Bräuer, Philipp Simon, Thomas Hille
Jurisdiction | Local Court Berlin-Charlottenburg HRB 152426 B

Gian Merlino

Oct 18, 2016, 7:37:36 PM
to druid-de...@googlegroups.com
You should be able to do this with a JavaScript filter. IIRC those can filter on any column type.

And if it's a long-typed metric, this should work with any filter in 0.9.2, where we've started supporting all filters on long columns. Doubles to come in a future release.

Gian

--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CAMh1h%3DLt%2BpmrCrrCZ%2Bui5yqawdGFxkJ0881qXA1_Ppmp%3DnsY%3DQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Hagen Rother

Oct 19, 2016, 4:24:08 AM
to druid-de...@googlegroups.com
Thanks Gian,

So either I deploy 0.9.2-RC2 on the indexer or try the JavaScript filter?

If I go with 0.9.2, how do I express != 0? A not filter wrapping a selector for the string "0"?

Cheers,
Hagen



Hagen Rother

Oct 19, 2016, 8:07:34 AM
to druid-de...@googlegroups.com
Ok, so I tried

                   "filter": {
                     "type": "javascript",
                     "dimension": "metric-name",
                     "function": "function(x) { return x > 0; }"
                   }

and got:
SegmentDescriptorInfo is not found usually when indexing process did not produce any segments meaning either there was no input data to process or all the input events were discarded due to some error

However, I am quite sure there is data.

Any idea what I am doing wrong?

Gian Merlino

Oct 19, 2016, 9:45:09 AM
to druid-de...@googlegroups.com
It's possible that I lied and JS filter doesn't actually work... let me double check that.

In 0.9.2, yeah, it'd be a "not" filter wrapping a selector for "0". Or a bound filter with lower bound "0", lowerStrict set, no upper bound, and ordering "numeric". Which one depends on whether you want != 0 or > 0.
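
As a rough sketch of those two options, assuming the metric column is called "metric-name" as in the earlier snippet (a placeholder, not a real column name) and that it is a long-typed column on 0.9.2, the != 0 case might look like this:

```json
{
  "type": "not",
  "field": {
    "type": "selector",
    "dimension": "metric-name",
    "value": "0"
  }
}
```

and the > 0 case like this:

```json
{
  "type": "bound",
  "dimension": "metric-name",
  "lower": "0",
  "lowerStrict": true,
  "ordering": "numeric"
}
```

Leaving out "upper" leaves the range unbounded above, and "ordering": "numeric" should make the comparison numeric rather than lexicographic. Neither spec has been tested against this data.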

Gian

Hagen Rother

Oct 21, 2016, 4:35:24 AM
to druid-de...@googlegroups.com
So I upgraded the overlords and indexer to the 0.9.2 branch (just removing -SNAPSHOT from the POMs), but now all Hadoop tasks fail with:

2016-10-21T08:31:06,571 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1475753964134_35867_m_000499_0, Status : FAILED
Error: com.google.inject.util.Types.collectionOf(Ljava/lang/reflect/Type;)Ljava/lang/reflect/ParameterizedType;
Container killed by the ApplicationMaster.
Any idea what’s going on?
