Groupby not performing very well


Sandeep N L

Jan 15, 2014, 6:23:25 PM
to druid-de...@googlegroups.com
Hi,


I have batch-ingested 610,000 rows of data and am doing a POC. Our use case needs groupBy queries to run efficiently.

Here is the query I am running on the ingested data:

{
    "queryType": "groupBy",
    "dataSource": "aggs_generated",
    "granularity": "all",
    "dimensions": ["CId", "DateKey","HourNum","AtId","OrId","OId","AId","DisChanId","BTypeId","RId","MatchTypeId","DeviceTypeId",
            "TargetTypeId","PricingModelId","MediumId","PagePosId"],
    "aggregations": [{"type": "longSum", "name": "Impressions", "fieldName": "Impr"},
        {"type": "doubleSum", "name": "Clicks", "fieldName": "Clicks"},
   ],
    "intervals": ["2013-11-01/2013-11-30"]
}

The query runs for hours and eventually times out.

Please help me get around this.

Thanks,
Sandeep

Eric Tschetter

Jan 15, 2014, 7:04:18 PM
to druid-de...@googlegroups.com
Sandeep,

To debug this, let's start with a simpler query and work our way up. Can you reduce the time interval to just a day or so, drop the dimensions entirely, and see if that improves things? If it does, step the time interval back up and see when you run into the problem. If it does not, step the number of dimensions back up slowly as well and see if you run into any problems there.

Please note at what point it starts breaking for you. Once it does break, try issuing the same query not against the broker but against each historical node individually, and see whether that helps.
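
For example, assuming your broker listens on localhost:8080 (adjust the host and port for each node you want to hit), you can POST the query JSON directly:

curl -X POST 'http://localhost:8080/druid/v2/?pretty' \
  -H 'Content-Type: application/json' \
  -d @query.json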

Lastly, you seem to be including a lot (maybe all?) of your dimensions in your query.  Would you be willing to explain your use case a bit?

--Eric



Sandeep N L

Jan 15, 2014, 7:36:10 PM
to druid-de...@googlegroups.com
Thanks Eric.
Here is my use case. I have hierarchical dimensions; say I have dim1, with dim2, dim3, etc. under it.

So I have two questions for you here:
1. If I group on dim2, how can I see the value of dim1 for each record returned?
2. Assuming we have separate data sources for each of the major dimensions (which would eliminate the need for the bigger groupBy), how would you group on the smaller dimensions? Say there is a deviceTypeID and I would like to go to dim2 and then see all the aggregations for each particular device type.

I hope I am clear here.

Thanks,
Sandeep

Eric Tschetter

Jan 15, 2014, 7:41:13 PM
to druid-de...@googlegroups.com
> Here is my use case. I have hierarchical dimensions; say I have dim1, with dim2, dim3, etc. under it.
>
> So I have two questions for you here:
> 1. If I group on dim2, how can I see the value of dim1 for each record returned?
> 2. Assuming we have separate data sources for each of the major dimensions (which would eliminate the need for the bigger groupBy), how would you group on the smaller dimensions? Say there is a deviceTypeID and I would like to go to dim2 and then see all the aggregations for each particular device type.
>
> I hope I am clear here.

I'm not sure I understand.  Maybe let's start with, "if this were a SQL database and I had all this data loaded in a table, I would do this query" and write that SQL?

--Eric


Sandeep N L

Jan 15, 2014, 8:00:56 PM
to druid-de...@googlegroups.com

For a SQL query, this would be:

select dimension1, dimension2, dimension3, sum(measure1), avg(measure2)
from FactTable
group by dimension1, dimension2, dimension3

The hierarchy is dimension1 1:n dimension2 1:n dimension3.


-Sandeep

Eric Tschetter

Jan 16, 2014, 11:18:34 AM
to druid-de...@googlegroups.com
For that query, you would do

dimensions: ["dimension1", "dimension2", "dimension3"],
aggregations: [
  {"type": "count", "name": "rows"},
  {"type": "longSum", "name": "sum(measure1)", "fieldName": "measure1"},
  {"type": "longSum", "name": "sum(measure2)", "fieldName": "measure2"}
],
postAggregations: [
  {
    "type": "arithmetic", "name": "avg(measure2)", "fn": "/",
    "fields": [
      {"type": "fieldAccess", "name": "1234", "fieldName": "sum(measure2)"},
      {"type": "fieldAccess", "name": "1234", "fieldName": "rows"}
    ]
  }
]

I noticed you aren't using a WHERE clause. Do you not care about filtering down to specific items?
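
If you do need it, a WHERE dimension1 = 'foo' clause becomes a filter field on the query; a minimal sketch ('foo' is just a placeholder):

"filter": {"type": "selector", "dimension": "dimension1", "value": "foo"}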

--Eric


Sandeep N L

Jan 16, 2014, 2:35:10 PM
to druid-de...@googlegroups.com
Yes, that is exactly what I am doing, and I assume you are talking about queryType: groupBy here.
This query runs for hours without returning results on just 610,000 rows of data, which does not seem natural.

And yes, right now the P0 use case for us is not filtering, but it will be needed.

On IRC, fj suggested using a timeseries query instead, since groupBy is not performant and you don't use it in production. But I would need to group on multiple dimensions, so I am not sure I can use timeseries queries.
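
For reference, a timeseries version of my query would look something like this sketch (note there is no dimensions field at all):

{
    "queryType": "timeseries",
    "dataSource": "aggs_generated",
    "granularity": "day",
    "aggregations": [{"type": "longSum", "name": "Impressions", "fieldName": "Impr"}],
    "intervals": ["2013-11-01/2013-11-30"]
}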

Please advise.

Thanks,
Sandeep

Eric Tschetter

Jan 16, 2014, 2:37:03 PM
to druid-de...@googlegroups.com
The query you pasted into this email thread includes a *lot* of dimensions, not just 3. Would you be willing to paste in the actual query that is taking a while?

Also, did you do the other things I suggested about the number of dimensions and decreasing the time bounds, etc.?

--Eric


Sandeep N L

Jan 20, 2014, 5:39:22 PM
to druid-de...@googlegroups.com
Yes, I increased the range incrementally and actually found the threshold point. I saw a GC out-of-memory exception on the broker. Let me know which parameter to update to give it more memory.

2014-01-20 22:45:12,613 INFO [qtp1372824277-28] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-01-20T22:45:12.613Z","service":"broker","host":"localhost:8080","metric":"query/wait","value":0,"user10":"failed","user2":"aggs_generated","user3":"1 dims","user4":"groupBy","user5":"2013-11-01T00:00:00.000Z/2013-11-29T00:00:00.000Z","user6":"false","user7":"9 aggs","user9":"PT40320M"}]
Jan 20, 2014 10:45:12 PM com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.ConcurrentSkipListMap.doPut(ConcurrentSkipListMap.java:879)
    at java.util.concurrent.ConcurrentSkipListMap.put(ConcurrentSkipListMap.java:1645)
    at io.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:270)
    at io.druid.query.groupby.GroupByQueryQueryToolChest$5.accumulate(GroupByQueryQueryToolChest.java:138)
    at io.druid.query.groupby.GroupByQueryQueryToolChest$5.accumulate(GroupByQueryQueryToolChest.java:134)
    at com.metamx.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:39)
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:103)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:80)
    at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:66)
    at com.metamx.common.guava.MappedSequence.accumulate(MappedSequence.java:40)
    at com.metamx.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:46)
    at com.metamx.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:42)
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:103)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:80)
    at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:66)
    at com.metamx.common.guava.ConcatSequence.accumulate(ConcatSequence.java:40)
    at com.metamx.common.guava.LazySequence.accumulate(LazySequence.java:37)
    at com.metamx.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:46)
    at com.metamx.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:42)
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:103)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:80)
    at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:66)
    at com.metamx.common.guava.ConcatSequence.accumulate(ConcatSequence.java:40)
    at io.druid.query.MetricsEmittingQueryRunner$1.accumulate(MetricsEmittingQueryRunner.java:82)
    at io.druid.query.groupby.GroupByQueryQueryToolChest.mergeGroupByResults(GroupByQueryQueryToolChest.java:125)
    at io.druid.query.groupby.GroupByQueryQueryToolChest.access$100(GroupByQueryQueryToolChest.java:57)
    at io.druid.query.groupby.GroupByQueryQueryToolChest$2.run(GroupByQueryQueryToolChest.java:84)
    at io.druid.query.FinalizeResultsQueryRunner.run(FinalizeResultsQueryRunner.java:102)
    at io.druid.query.BaseQuery.run(BaseQuery.java:78)

2014-01-20 22:45:12,615 WARN [qtp1372824277-28] org.eclipse.jetty.servlet.ServletHandler -
javax.servlet.ServletException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:420)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
    at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:278)
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:268)
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:180)
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:93)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120)
    at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:132)
    at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:129)
    at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:206)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:129)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)
    at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:256)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:960)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1021)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:668)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.ConcurrentSkipListMap.doPut(ConcurrentSkipListMap.java:879)
    at java.util.concurrent.ConcurrentSkipListMap.put(ConcurrentSkipListMap.java:1645)
    at io.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:270)
    at io.druid.query.groupby.GroupByQueryQueryToolChest$5.accumulate(GroupByQueryQueryToolChest.java:138)
    at io.druid.query.groupby.GroupByQueryQueryToolChest$5.accumulate(GroupByQueryQueryToolChest.java:134)
    at com.metamx.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:39)
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:103)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:80)
    at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:66)
    at com.metamx.common.guava.MappedSequence.accumulate(MappedSequence.java:40)
    at com.metamx.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:46)
    at com.metamx.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:42)
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:103)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:80)
    at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:66)
    at com.metamx.common.guava.ConcatSequence.accumulate(ConcatSequence.java:40)
    at com.metamx.common.guava.LazySequence.accumulate(LazySequence.java:37)
    at com.metamx.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:46)
    at com.metamx.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:42)
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:103)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:80)
    at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:66)
    at com.metamx.common.guava.ConcatSequence.accumulate(ConcatSequence.java:40)
    at io.druid.query.MetricsEmittingQueryRunner$1.accumulate(MetricsEmittingQueryRunner.java:82)
    at io.druid.query.groupby.GroupByQueryQueryToolChest.mergeGroupByResults(GroupByQueryQueryToolChest.java:125)
    at io.druid.query.groupby.GroupByQueryQueryToolChest.access$100(GroupByQueryQueryToolChest.java:57)
    at io.druid.query.groupby.GroupByQueryQueryToolChest$2.run(GroupByQueryQueryToolChest.java:84)
    at io.druid.query.FinalizeResultsQueryRunner.run(FinalizeResultsQueryRunner.java:102)
    at io.druid.query.BaseQuery.run(BaseQuery.java:78)
2014-01-20 22:45:12,616 WARN [qtp1372824277-28] org.eclipse.jetty.servlet.ServletHandler - /druid/v2/
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.ConcurrentSkipListMap.doPut(ConcurrentSkipListMap.java:879)
    at java.util.concurrent.ConcurrentSkipListMap.put(ConcurrentSkipListMap.java:1645)
    at io.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:270)
    at io.druid.query.groupby.GroupByQueryQueryToolChest$5.accumulate(GroupByQueryQueryToolChest.java:138)
    at io.druid.query.groupby.GroupByQueryQueryToolChest$5.accumulate(GroupByQueryQueryToolChest.java:134)
    at com.metamx.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:39)
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:103)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:80)
    at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:66)
    at com.metamx.common.guava.MappedSequence.accumulate(MappedSequence.java:40)
    at com.metamx.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:46)
    at com.metamx.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:42)
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:103)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:80)
    at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:66)
    at com.metamx.common.guava.ConcatSequence.accumulate(ConcatSequence.java:40)
    at com.metamx.common.guava.LazySequence.accumulate(LazySequence.java:37)
    at com.metamx.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:46)
    at com.metamx.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:42)
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32)
    at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:103)
    at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:80)
    at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:66)
    at com.metamx.common.guava.ConcatSequence.accumulate(ConcatSequence.java:40)
    at io.druid.query.MetricsEmittingQueryRunner$1.accumulate(MetricsEmittingQueryRunner.java:82)
    at io.druid.query.groupby.GroupByQueryQueryToolChest.mergeGroupByResults(GroupByQueryQueryToolChest.java:125)
    at io.druid.query.groupby.GroupByQueryQueryToolChest.access$100(GroupByQueryQueryToolChest.java:57)
    at io.druid.query.groupby.GroupByQueryQueryToolChest$2.run(GroupByQueryQueryToolChest.java:84)
    at io.druid.query.FinalizeResultsQueryRunner.run(FinalizeResultsQueryRunner.java:102)
    at io.druid.query.BaseQuery.run(BaseQuery.java:78)

Fangjin Yang

Jan 21, 2014, 1:29:07 PM
to druid-de...@googlegroups.com
Hi Sandeep,

You can allocate more memory for the JVM heap on the broker. What is it set to right now? Are you still running a groupBy query across many dimensions, or is the set smaller now?
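
For example, the broker is started with ordinary JVM flags, so you could raise the heap along these lines (a sketch; size it to what your machine can actually spare):

java -Xms3g -Xmx3g <other flags> io.druid.cli.Main server broker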

-- FJ

Sandeep N L

Jan 23, 2014, 4:39:43 PM
to druid-de...@googlegroups.com
Hi,

Thanks. Yes, I am using groupBy, but I reduced the set to the 6 major dimensions we need.
I would like to share the performance numbers I got from querying. Please let me know if this is expected.

Single node instance, local overlord batch ingestion of 610,000 rows

  • Day grain, 1-month range, 6-dimension groupBy, 9 metric aggregations
    • 9.7 s
    • 290,000 rows returned
    • Peak CPU usage: 50-60%
  • Day grain, 1-month range, 5-dimension groupBy, 9 metric aggregations
    • 10 s
    • 275,000 rows
  • Day grain, 1-month range, 6-dimension groupBy, 1 metric aggregation
    • 5.545 s
    • 290,000 rows
    • CPU usage: 40% peak
  • Day grain, 1-month range, 1-dimension groupBy, 1 metric aggregation
    • 2.155 s
    • 183,038 rows
    • CPU: 20% peak
  • Day grain, 1-month range, 6-dimension groupBy, 9 metric aggregations, 1 dimension filter
    • 0.8 s

Please comment on these numbers and let me know if I can do something to improve them.

Thanks,
Sandeep

Fangjin Yang

Jan 26, 2014, 3:15:40 AM
to druid-de...@googlegroups.com
Hi Sandeep,

A lot of Druid performance tuning is about understanding the hardware you are working with, your cluster setup (the number of nodes you have for each node type), and the correct configuration for such a setup. If you could share some of that information with us, we can share some pointers about best practices.

Thx,
FJ

Sandeep N L

Jan 27, 2014, 4:37:49 PM
to druid-de...@googlegroups.com
Hi FJ,

Sorry I missed mentioning them. 
I am using a single machine: 8 cores, hyperthreaded, with 5GB of memory.
All the nodes (historical, coordinator and broker) run on this one machine. The JVM options for them are all set to -Xmx2g.

Let me know if I missed anything. Looking forward to your feedback.

-Sandeep

Fangjin Yang

Jan 28, 2014, 5:35:45 PM
to druid-de...@googlegroups.com
Hi Sandeep,

How many segments do you have and how large are the segments?

Druid's segment scan parallelization model requires 1 core per segment at a time. To get better parallelization and generally faster query rates, you can increase the number of processing threads on your historical node. Typically, we set this to be the number of hyperthreads - 1. 
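
For example, on a box with 16 hyperthreads, the historical's runtime.properties would get something like this (adjust to your actual hyperthread count):

druid.processing.numThreads=15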

Do you see frequent GC messages in your logs?

Multi-threaded groupBy queries can be somewhat memory intensive. If full GCs are occurring frequently, they may impact your query times.

Do you have enough off-heap memory (~2G in your setup) to memory map all your segments?

In general, the more memory and CPUs you can allocate for a historical node, the better. Druid memory maps segments by default, so having more memory on a node should reduce the number of times that segments have to be paged in and out of memory for a query. This paging time can introduce significant latency to your queries.

Those are some getting started points. Let me know if they help or if you have more concerns.

-- FJ

Mo

Oct 13, 2016, 8:44:51 AM
to Druid Development
I'm getting something similar when I send a bunch of consecutive topN queries. The query has a hyperUnique aggregation and granularity set to "all".
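
For reference, the queries are shaped roughly like this (all names here are placeholders, not my real schema):

{
    "queryType": "topN",
    "dataSource": "my_datasource",
    "dimension": "some_dimension",
    "metric": "uniques",
    "threshold": 100,
    "granularity": "all",
    "aggregations": [{"type": "hyperUnique", "name": "uniques", "fieldName": "user_unique"}],
    "intervals": ["2016-07-14/2016-10-12"]
}

I get the following in my historical log: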

2016-10-12T02:46:25,674 ERROR [qtp1970856042-48] com.sun.jersey.spi.container.ContainerResponse - The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.PriorityQueue.<init>(PriorityQueue.java:168) ~[?:1.8.0_101]
at io.druid.query.topn.TopNNumericResultBuilder.<init>(TopNNumericResultBuilder.java:110) ~[druid-processing-0.9.1.1.jar:0.9.1.1]
at io.druid.query.topn.NumericTopNMetricSpec.getResultBuilder(NumericTopNMetricSpec.java:128) ~[druid-processing-0.9.1.1.jar:0.9.1.1]
at io.druid.query.topn.TopNBinaryFn.apply(TopNBinaryFn.java:126) ~[druid-processing-0.9.1.1.jar:0.9.1.1]
at io.druid.query.topn.TopNBinaryFn.apply(TopNBinaryFn.java:39) ~[druid-processing-0.9.1.1.jar:0.9.1.1]
at io.druid.common.guava.CombiningSequence$CombiningYieldingAccumulator.accumulate(CombiningSequence.java:212) ~[druid-common-0.9.1.1.jar:0.9.1.1]
at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:104) ~[java-util-0.27.9.jar:?]
at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:81) ~[java-util-0.27.9.jar:?]
at io.druid.common.guava.CombiningSequence.toYielder(CombiningSequence.java:78) ~[druid-common-0.9.1.1.jar:0.9.1.1]
at com.metamx.common.guava.MappedSequence.toYielder(MappedSequence.java:46) ~[java-util-0.27.9.jar:?]
at io.druid.query.CPUTimeMetricQueryRunner$1.toYielder(CPUTimeMetricQueryRunner.java:93) ~[druid-processing-0.9.1.1.jar:0.9.1.1]
at com.metamx.common.guava.Sequences$1.toYielder(Sequences.java:98) ~[java-util-0.27.9.jar:?]
at io.druid.server.QueryResource.doPost(QueryResource.java:224) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_101]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) ~[jersey-servlet-1.19.jar:1.19]
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) ~[jersey-servlet-1.19.jar:1.19]
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) ~[jersey-servlet-1.19.jar:1.19]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:278) ~[guice-servlet-4.0-beta.jar:?]

I have daily segments and am querying 90 days of data. Each segment is about 300 MB. I am sending queries directly to the historical, since I don't have realtime nodes and have only a single historical. The config for the historical is the following:

-Xms2048m
-Xmx2048m
-XX:MaxDirectMemorySize=5120m

druid.processing.buffer.sizeBytes=314572800
druid.processing.numThreads=4


I have an 8GB machine with 8GB of swap (SSD).

I am using Imply and have the MiddleManager on the same machine.


Thanks,
Mohit

Gian Merlino

Oct 13, 2016, 12:47:45 PM
to druid-de...@googlegroups.com
Hey Mohit,

"GC overhead limit exceeded" indicates running low on heap space, which will slow down queries due to spending a lot of time in the garbage collector. You could try increasing your heap size, or decreasing the number of concurrent queries to reduce memory use (by lowering druid.server.http.numThreads).

Gian

