Historical node not loading any segments

497 views
Skip to first unread message

mlub...@viralgains.com

unread,
May 14, 2021, 2:13:47 PM5/14/21
to Druid User
Hi everyone,

On druid 0.16, our production cluster is working fine, but on our test cluster I recently noticed an issue with batch index tasks changed data not showing up in queries, and the coordinator loadStatus was never 100% and the Druid UI showed 20 segments unavailable and 6 segments to drop.

I've tried restarting both the coordinator (only 1 coordinator in this cluster) and the historical node (only 1 historical node in cluster), which did not help.  Then I blew away the historical node and brought up a new one, and now it is not loading any segments.  

The UI is showing that the datasource has 1066 segments to load:
Screen Shot 2021-05-14 at 2.08.47 PM.png
But the loadQueue is empty and the historical server isn't loading any segments:
Screen Shot 2021-05-14 at 2.10.14 PM.png

These numbers are all staying constant.

Here are coordinator logs:
2021-05-14T18:10:03,418 INFO [org.apache.druid.metadata.SQLMetadataSegmentManager-Exec--0] org.apache.druid.metadata.SQLMetadataSegmentManager - Polled and found 1,079 segments in the database
2021-05-14T18:10:06,909 INFO [KafkaSupervisor-campaignsummary] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [campaignsummary] supervisor is running.
2021-05-14T18:10:21,936 INFO [KafkaSupervisor-campaignsummary-Reporting-0] org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=kafka-supervisor-gbcmcgcn] Resetting offset for partition campaignsummary-0 to offset 1839.
2021-05-14T18:10:36,910 INFO [KafkaSupervisor-campaignsummary] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [campaignsummary] supervisor is running.
2021-05-14T18:10:51,913 INFO [KafkaSupervisor-campaignsummary-Reporting-0] org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=kafka-supervisor-gbcmcgcn] Resetting offset for partition campaignsummary-0 to offset 1839.
2021-05-14T18:11:01,278 INFO [DatabaseRuleManager-Exec--0] org.apache.druid.metadata.SQLMetadataRuleManager - Polled and found 2 rule(s) for 2 datasource(s)
2021-05-14T18:11:01,351 INFO [TaskQueue-StorageSync] org.apache.druid.indexing.overlord.TaskQueue - Synced 1 tasks from storage (0 tasks added, 0 tasks removed).
2021-05-14T18:11:03,239 INFO [LookupCoordinatorManager--0] org.apache.druid.server.lookup.cache.LookupCoordinatorManager - Not updating lookups because no data exists
2021-05-14T18:11:03,480 INFO [org.apache.druid.metadata.SQLMetadataSegmentManager-Exec--0] org.apache.druid.metadata.SQLMetadataSegmentManager - Polled and found 1,079 segments in the database

And on the historical server there are no logs at all after startup.

I also verified that both the historical and the coordinator have network access to zookeeper, using telnet to each of the 3 zookeeper nodes.

Can anyone help me figure out what is going on in this cluster?  

Thank you,
Michael

Ben Krug

unread,
May 14, 2021, 3:52:46 PM5/14/21
to druid...@googlegroups.com
1066 segments to load - there must be some connection to the Norman Invasion and the Battle of Hastings.
The historical must be King Harold...

No logs on historical server is odd.  Does the druid user (that druid is running as) have permissions to write to the log directories,
and other necessary directories?

If so, then what is maxSegmentsToMove set to, and have you verified that the druid user can access the segments on deep storage?
But no logs makes me think there's a permission issue on the historical.

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/c0087061-5697-4afe-994b-2b3dcd260e98n%40googlegroups.com.

mlub...@viralgains.com

unread,
May 14, 2021, 10:56:04 PM5/14/21
to Druid User
Sorry, I should clarify, the Historical server is writing logs, for example on startup, and after the first couple queries when it creates buffers, but then nothing after that, so there are no issues writing logs:

2021-05-14T17:00:34,090 INFO [main] org.eclipse.jetty.util.log - Logging initialized @5565ms to org.eclipse.jetty.util.log.Slf4jLog
2021-05-14T17:00:34,103 INFO [main] org.apache.druid.server.initialization.jetty.JettyServerModule - Creating http connector with port [9081]
2021-05-14T17:00:34,220 INFO [main] org.apache.druid.guice.JsonConfigurator - Loaded class[class org.apache.druid.query.lookup.LookupListeningAnnouncerConfig] from props[druid.lookup.] as [ListeningAnnouncerConfig{listenersPath='/druid_odyssey/listeners'}]
2021-05-14T17:00:34,225 INFO [main] org.apache.druid.guice.JsonConfigurator - Loaded class[class org.apache.druid.query.lookup.LookupConfig] from props[druid.lookup.] as [LookupConfig{snapshotWorkingDir='', enableLookupSyncOnStartup=true, numLookupLoadingThreads=1, coordinatorFetchRetries=3, lookupStartRetries=3, coordinatorRetryDelay=60000}]
2021-05-14T17:00:34,228 INFO [main] org.apache.druid.guice.JsonConfigurator - Loaded class[class org.apache.druid.guice.http.DruidHttpClientConfig] from props[druid.global.http.] as [org.apache.druid.guice.http.DruidHttpClientConfig@7015ebef]
2021-05-14T17:00:34,302 INFO [main] org.apache.druid.guice.JsonConfigurator - Loaded class[class org.apache.druid.client.coordinator.CoordinatorSelectorConfig] from props[druid.selectors.coordinator.] as [org.apache.druid.client.coordinator.CoordinatorSelectorConfig@73608eb0]
2021-05-14T17:00:34,322 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [INIT]
2021-05-14T17:00:34,322 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.start()] on object[org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@19ae2ee5].
2021-05-14T17:00:34,323 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [NORMAL]
2021-05-14T17:00:34,323 INFO [main] org.apache.druid.curator.CuratorModule - Starting Curator
2021-05-14T17:00:34,323 INFO [main] org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
2021-05-14T17:00:34,331 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
2021-05-14T17:00:34,331 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:host.name=ip-10-0-5-119.ec2.internal
2021-05-14T17:00:34,331 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.version=1.8.0_282
2021-05-14T17:00:34,331 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Red Hat, Inc.
2021-05-14T17:00:34,331 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.amzn2.0.1.x86_64/jre
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.class.path=/usr/share/druid/conf/druid/cluster/_common:/usr/share/druid/conf/druid/cluster/data/historical:/usr/share/druid/lib/druid-services-0.16.0-incubating.jar:/usr/share/druid/lib/druid-core-0.16.0-incubating.jar:/usr/share/druid/lib/commons-io-2.6.jar:/usr/share/druid/lib/commons-lang-2.6.jar:/usr/share/druid/lib/commons-compress-1.18.jar:/usr/share/druid/lib/config-magic-0.9.jar:/usr/share/druid/lib/hibernate-validator-5.1.3.Final.jar:/usr/share/druid/lib/validation-api-1.1.0.Final.jar:/usr/share/druid/lib/jboss-logging-3.1.3.GA.jar:/usr/share/druid/lib/classmate-1.0.0.jar:/usr/share/druid/lib/javax.el-api-3.0.0.jar:/usr/share/druid/lib/javax.el-3.0.0.jar:/usr/share/druid/lib/jaxb-api-2.3.1.jar:/usr/share/druid/lib/javax.activation-api-1.2.0.jar:/usr/share/druid/lib/guava-16.0.1.jar:/usr/share/druid/lib/jackson-core-2.6.7.jar:/usr/share/druid/lib/jackson-annotations-2.6.7.jar:/usr/share/druid/lib/jackson-databind-2.6.7.jar:/usr/share/druid/lib/jackson-datatype-guava-2.6.7.jar:/usr/share/druid/lib/jackson-datatype-joda-2.6.7.jar:/usr/share/druid/lib/joda-time-2.10.2.jar:/usr/share/druid/lib/jackson-dataformat-smile-2.6.7.jar:/usr/share/druid/lib/guice-4.1.0.jar:/usr/share/druid/lib/javax.inject-1.jar:/usr/share/druid/lib/aopalliance-1.0.jar:/usr/share/druid/lib/guice-multibindings-4.1.0.jar:/usr/share/druid/lib/jdbi-2.63.1.jar:/usr/share/druid/lib/log4j-api-2.5.jar:/usr/share/druid/lib/log4j-core-2.5.jar:/usr/share/druid/lib/log4j-slf4j-impl-2.5.jar:/usr/share/druid/lib/slf4j-api-1.7.25.jar:/usr/share/druid/lib/log4j-jul-2.5.jar:/usr/share/druid/lib/log4j-1.2-api-2.5.jar:/usr/share/druid/lib/jcl-over-slf4j-1.7.12.jar:/usr/share/druid/lib/airline-0.7.jar:/usr/share/druid/lib/jackson-jq-0.0.10.jar:/usr/share/druid/lib/joni-2.1.27.jar:/usr/share/druid/lib/jcodings-1.0.43.jar:/usr/share/druid/lib/fastutil-8.2.3.jar:/usr/share/druid/lib/netty-buffer-4.1.30.Final.jar:/usr/share/druid/lib/netty-common-4.1.30.Final.jar:/usr/share/druid/lib/netty-codec-http-4.1.30.Final.jar:/usr/share/druid/lib/netty-codec-4.1.30.Final.jar:/usr/share/druid/lib/netty-transport-4.1.30.Final.jar:/usr/share/druid/lib/netty-resolver-4.1.30.Final.jar:/usr/share/druid/lib/opencsv-4.6.jar:/usr/share/druid/lib/commons-lang3-3.8.1.jar:/usr/share/druid/lib/commons-text-1.3.jar:/usr/share/druid/lib/commons-beanutils-1.9.3.jar:/usr/share/druid/lib/commons-logging-1.1.1.jar:/usr/share/druid/lib/commons-collections-3.2.2.jar:/usr/share/druid/lib/commons-collections4-4.2.jar:/usr/share/druid/lib/rhino-1.7.11.jar:/usr/share/druid/lib/xz-1.8.jar:/usr/share/druid/lib/zstd-jni-1.3.3-1.jar:/usr/share/druid/lib/json-path-2.3.0.jar:/usr/share/druid/lib/json-smart-2.3.jar:/usr/share/druid/lib/accessors-smart-1.2.jar:/usr/share/druid/lib/asm-7.1.jar:/usr/share/druid/lib/antlr4-runtime-4.5.1.jar:/usr/share/druid/lib/disruptor-3.3.6.jar:/usr/share/druid/lib/jsr305-2.0.1.jar:/usr/share/druid/lib/jna-4.5.1.jar:/usr/share/druid/lib/async-http-client-2.5.3.jar:/usr/share/druid/lib/async-http-client-netty-utils-2.5.3.jar:/usr/share/druid/lib/javax.activation-1.2.0.jar:/usr/share/druid/lib/netty-handler-4.1.30.Final.jar:/usr/share/druid/lib/netty-codec-socks-4.1.30.Final.jar:/usr/share/druid/lib/netty-handler-proxy-4.1.30.Final.jar:/usr/share/druid/lib/netty-transport-native-epoll-4.1.30.Final-linux-x86_64.jar:/usr/share/druid/lib/netty-transport-native-unix-common-4.1.30.Final.jar:/usr/share/druid/lib/netty-resolver-dns-4.1.30.Final.jar:/usr/share/druid/lib/netty-codec-dns-4.1.30.Final.jar:/usr/share/druid/lib/reactive-streams-1.0.2.jar:/usr/share/druid/lib/netty-reactive-streams-2.0.0.jar:/usr/share/druid/lib/sigar-1.6.5.132.jar:/usr/share/druid/lib/jvm-attach-api-1.5.jar:/usr/share/druid/lib/netty-3.10.6.Final.jar:/usr/share/druid/lib/error_prone_annotations-2.3.2.jar:/usr/share/druid/lib/druid-processing-0.16.0-incubating.jar:/usr/share/druid/lib/druid-hll-0.16.0-incubating.jar:/usr/share/druid/lib/extendedset-0.16.0-incubating.jar:/usr/share/druid/lib/RoaringBitmap-0.8.6.jar:/usr/share/druid/lib/shims-0.8.6.jar:/usr/share/druid/lib/compress-lzf-1.0.4.jar:/usr/share/druid/lib/commons-net-3.6.jar:/usr/share/druid/lib/icu4j-54.1.1.jar:/usr/share/druid/lib/lz4-java-1.6.0.jar:/usr/share/druid/lib/asm-commons-7.1.jar:/usr/share/druid/lib/asm-tree-7.1.jar:/usr/share/druid/lib/asm-analysis-7.1.jar:/usr/share/druid/lib/checker-qual-2.5.7.jar:/usr/share/druid/lib/maven-artifact-3.6.0.jar:/usr/share/druid/lib/plexus-utils-3.0.15.jar:/usr/share/druid/lib/druid-server-0.16.0-incubating.jar:/usr/share/druid/lib/druid-aws-common-0.16.0-incubating.jar:/usr/share/druid/lib/aws-java-sdk-ec2-1.11.199.jar:/usr/share/druid/lib/aws-java-sdk-core-1.11.199.jar:/usr/share/druid/lib/httpclient-4.5.3.jar:/usr/share/druid/lib/httpcore-4.4.11.jar:/usr/share/druid/lib/commons-codec-1.12.jar:/usr/share/druid/lib/ion-java-1.0.2.jar:/usr/share/druid/lib/jackson-dataformat-cbor-2.6.7.jar:/usr/share/druid/lib/jmespath-java-1.11.199.jar:/usr/share/druid/lib/aws-java-sdk-s3-1.11.199.jar:/usr/share/druid/lib/aws-java-sdk-kms-1.11.199.jar:/usr/share/druid/lib/druid-gcp-common-0.16.0-incubating.jar:/usr/share/druid/lib/google-api-client-1.22.0.jar:/usr/share/druid/lib/google-oauth-client-1.22.0.jar:/usr/share/druid/lib/google-http-client-1.22.0.jar:/usr/share/druid/lib/google-http-client-jackson2-1.22.0.jar:/usr/share/druid/lib/jackson-module-guice-2.6.7.jar:/usr/share/druid/lib/druid-console-0.16.0-incubating.jar:/usr/share/druid/lib/zookeeper-3.4.14.jar:/usr/share/druid/lib/jline-0.9.94.jar:/usr/share/druid/lib/audience-annotations-0.5.0.jar:/usr/share/druid/lib/curator-framework-4.1.0.jar:/usr/share/druid/lib/curator-client-4.1.0.jar:/usr/share/druid/lib/curator-x-discovery-4.1.0.jar:/usr/share/druid/lib/curator-recipes-4.1.0.jar:/usr/share/druid/lib/jackson-mapper-asl-1.9.13.jar:/usr/share/druid/lib/jackson-core-asl-1.9.13.jar:/usr/share/druid/lib/jackson-jaxrs-json-provider-2.6.7.jar:/usr/share/druid/lib/jackson-jaxrs-base-2.6.7.jar:/usr/share/druid/lib/jackson-module-jaxb-annotations-2.6.7.jar:/usr/share/druid/lib/jackson-jaxrs-smile-provider-2.6.7.jar:/usr/share/druid/lib/jersey-server-1.19.3.jar:/usr/share/druid/lib/jersey-core-1.19.3.jar:/usr/share/druid/lib/jsr311-api-1.1.1.jar:/usr/share/druid/lib/guice-servlet-4.1.0.jar:/usr/share/druid/lib/jersey-guice-1.19.3.jar:/usr/share/druid/lib/jersey-servlet-1.19.3.jar:/usr/share/druid/lib/jetty-server-9.4.10.v20180503.jar:/usr/share/druid/lib/javax.servlet-api-3.1.0.jar:/usr/share/druid/lib/jetty-http-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-util-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-io-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-proxy-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-client-9.4.10.v20180503.jar:/usr/share/druid/lib/tesla-aether-0.0.5.jar:/usr/share/druid/lib/aether-api-0.9.0.M2.jar:/usr/share/druid/lib/aether-spi-0.9.0.M2.jar:/usr/share/druid/lib/aether-util-0.9.0.M2.jar:/usr/share/druid/lib/aether-impl-0.9.0.M2.jar:/usr/share/druid/lib/aether-connector-file-0.9.0.M2.jar:/usr/share/druid/lib/aether-connector-okhttp-0.0.9.jar:/usr/share/druid/lib/okhttp-1.0.2.jar:/usr/share/druid/lib/wagon-provider-api-2.4.jar:/usr/share/druid/lib/maven-aether-provider-3.1.1.jar:/usr/share/druid/lib/maven-model-3.1.1.jar:/usr/share/druid/lib/maven-model-builder-3.1.1.jar:/usr/share/druid/lib/plexus-interpolation-1.19.jar:/usr/share/druid/lib/maven-repository-metadata-3.1.1.jar:/usr/share/druid/lib/maven-settings-builder-3.1.1.jar:/usr/share/druid/lib/maven-settings-3.1.1.jar:/usr/share/druid/lib/spymemcached-2.12.3.jar:/usr/share/druid/lib/jetty-servlet-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-security-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-servlets-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-continuation-9.4.10.v20180503.jar:/usr/share/druid/lib/derby-10.14.2.0.jar:/usr/share/druid/lib/derbynet-10.14.2.0.jar:/usr/share/druid/lib/derbyclient-10.14.2.0.jar:/usr/share/druid/lib/commons-math3-3.6.1.jar:/usr/share/druid/lib/caffeine-2.5.5.jar:/usr/share/druid/lib/commons-dbcp2-2.0.1.jar:/usr/share/druid/lib/commons-pool2-2.2.jar:/usr/share/druid/lib/druid-indexing-hadoop-0.16.0-incubating.jar:/usr/share/druid/lib/druid-indexing-service-0.16.0-incubating.jar:/usr/share/druid/lib/metrics-core-4.0.0.jar:/usr/share/druid/lib/druid-sql-0.16.0-incubating.jar:/usr/share/druid/lib/calcite-core-1.17.0.jar:/usr/share/druid/lib/avatica-core-1.12.0.jar:/usr/share/druid/lib/avatica-metrics-1.12.0.jar:/usr/share/druid/lib/protobuf-java-3.1.0.jar:/usr/share/druid/lib/calcite-linq4j-1.17.0.jar:/usr/share/druid/lib/esri-geometry-api-2.0.0.jar:/usr/share/druid/lib/aggdesigner-algorithm-6.0.jar:/usr/share/druid/lib/janino-2.7.6.jar:/usr/share/druid/lib/commons-compiler-2.7.6.jar:/usr/share/druid/lib/avatica-server-1.12.0.jar:/usr/share/druid/lib/jaxb-runtime-2.3.1.jar:/usr/share/druid/lib/txw2-2.3.1.jar:/usr/share/druid/lib/istack-commons-runtime-3.0.7.jar:/usr/share/druid/lib/stax-ex-1.8.jar:/usr/share/druid/lib/FastInfoset-1.2.15.jar
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:os.version=4.14.231-173.361.amzn2.x86_64
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:user.name=root
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:user.home=/root
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/
2021-05-14T17:00:34,333 INFO [main] org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=10.0.1.85:2181,10.0.2.224:2181,10.0.5.75:2181 sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@d5e575c
2021-05-14T17:00:34,350 INFO [main] org.apache.curator.framework.imps.CuratorFrameworkImpl - Default schema
2021-05-14T17:00:34,350 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.java.util.emitter.core.HttpPostEmitter.start()] on object[HttpPostEmitter{config=HttpEmitterConfig{flushMillis=60000, flushCount=1000, flushTimeOut=9223372036854775807, basicAuthentication='null', batchingStrategy=ARRAY, maxBatchSize=10383360, contentEncoding=null, batchQueueSizeLimit=100, httpTimeoutAllowanceFactor=8.0, minHttpTimeoutMillis=150, recipientBaseUrl='http://localhost:8000/'}}].
2021-05-14T17:00:34,352 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.java.util.emitter.service.ServiceEmitter.start()] on object[ServiceEmitter{serviceDimensions={service=druid/historical, host=ip-10-0-5-119.ec2.internal:9081, version=0.16.0-incubating}, emitter=HttpPostEmitter{config=HttpEmitterConfig{flushMillis=60000, flushCount=1000, flushTimeOut=9223372036854775807, basicAuthentication='null', batchingStrategy=ARRAY, maxBatchSize=10383360, contentEncoding=null, batchQueueSizeLimit=100, httpTimeoutAllowanceFactor=8.0, minHttpTimeoutMillis=150, recipientBaseUrl='http://localhost:8000/'}}}].
2021-05-14T17:00:34,353 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.server.coordination.SegmentLoadDropHandler.start() throws java.io.IOException] on object[org.apache.druid.server.coordination.SegmentLoadDropHandler@12fcb2c3].
2021-05-14T17:00:34,353 INFO [main] org.apache.druid.server.coordination.SegmentLoadDropHandler - Starting...
2021-05-14T17:00:34,361 INFO [main] org.apache.druid.server.coordination.SegmentLoadDropHandler$BackgroundSegmentAnnouncer - Completed background segment announcing
2021-05-14T17:00:34,362 INFO [main] org.apache.druid.server.coordination.SegmentLoadDropHandler - Cache load took 9 ms
2021-05-14T17:00:34,362 INFO [main] org.apache.druid.server.coordination.CuratorDataSegmentServerAnnouncer - Announcing self[DruidServerMetadata{name='ip-10-0-5-119.ec2.internal:9081', hostAndPort='ip-10-0-5-119.ec2.internal:9081', hostAndTlsPort='null', maxSize=60000000000, tier='_default_tier', type=historical, priority=0}] at [/druid_odyssey/announcements/ip-10-0-5-119.ec2.internal:9081]
2021-05-14T17:00:34,372 INFO [main-SendThread(ip-10-0-1-85.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server ip-10-0-1-85.ec2.internal/10.0.1.85:2181. Will not attempt to authenticate using SASL (unknown error)
2021-05-14T17:00:34,379 INFO [main-SendThread(ip-10-0-1-85.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to ip-10-0-1-85.ec2.internal/10.0.1.85:2181, initiating session
2021-05-14T17:00:34,383 INFO [main] org.apache.druid.server.coordination.SegmentLoadDropHandler - Started.
2021-05-14T17:00:34,383 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.java.util.metrics.MonitorScheduler.start()] on object[org.apache.druid.java.util.metrics.MonitorScheduler@5f395ce1].
2021-05-14T17:00:34,385 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.server.coordination.ZkCoordinator.start() throws java.io.IOException] on object[org.apache.druid.server.coordination.ZkCoordinator@78c262ba].
2021-05-14T17:00:34,385 INFO [main] org.apache.druid.server.coordination.ZkCoordinator - Starting zkCoordinator for server[ip-10-0-5-119.ec2.internal:9081]
2021-05-14T17:00:34,386 INFO [main-SendThread(ip-10-0-1-85.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server ip-10-0-1-85.ec2.internal/10.0.1.85:2181, sessionid = 0x1001943531b033e, negotiated timeout = 30000
2021-05-14T17:00:34,396 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
2021-05-14T17:00:34,422 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.java.util.http.client.NettyHttpClient.start()] on object[org.apache.druid.java.util.http.client.NettyHttpClient@8840c98].
2021-05-14T17:00:34,423 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider.start()] on object[org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider@273fa9e].
2021-05-14T17:00:34,423 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - starting
2021-05-14T17:00:34,423 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - started
2021-05-14T17:00:34,423 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.curator.discovery.ServerDiscoverySelector.start() throws java.lang.Exception] on object[org.apache.druid.curator.discovery.ServerDiscoverySelector@7ab2a07e].
2021-05-14T17:00:34,506 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.discovery.DruidLeaderClient.start()] on object[org.apache.druid.discovery.DruidLeaderClient@30f28b5].
2021-05-14T17:00:34,507 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - Creating NodeTypeWatcher for nodeType [COORDINATOR].
2021-05-14T17:00:34,511 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - Created NodeTypeWatcher for nodeType [COORDINATOR].
2021-05-14T17:00:34,511 INFO [main] org.apache.druid.discovery.DruidLeaderClient - Started.
2021-05-14T17:00:34,511 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.query.lookup.LookupReferencesManager.start()] on object[org.apache.druid.query.lookup.LookupReferencesManager@2db86a7c].
2021-05-14T17:00:34,511 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - LookupExtractorFactoryContainerProvider is starting.
2021-05-14T17:00:34,579 INFO [NodeTypeWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Node[ip-10-0-2-142.ec2.internal:9082:DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/coordinator', host='ip-10-0-2-142.ec2.internal', bindOnHost=false, port=-1, plaintextPort=9082, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='COORDINATOR', services={}}] appeared.
2021-05-14T17:00:34,579 INFO [NodeTypeWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Received INITIALIZED in node watcher.
2021-05-14T17:00:34,611 WARN [main] org.apache.druid.query.lookup.LookupReferencesManager - No lookups found for tier [__default], response [org.apache.druid.java.util.http.client.response.StringFullResponseHolder@6075b369]
2021-05-14T17:00:34,612 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - Coordinator is unavailable. Loading saved snapshot instead
2021-05-14T17:00:34,612 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - No lookups to be loaded at this point
2021-05-14T17:00:34,612 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - LookupExtractorFactoryContainerProvider is started.
2021-05-14T17:00:34,612 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer.start()] on object[org.apache.druid.query.lookup.LookupResourceListenerAnnouncer@6b70d1fb].
2021-05-14T17:00:34,615 INFO [main] org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer - Announcing start time on [/druid_odyssey/listeners/lookups/__default/http:ip-10-0-5-119.ec2.internal:9081]
2021-05-14T17:00:34,615 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [SERVER]
2021-05-14T17:00:34,615 INFO [main] org.apache.druid.server.initialization.jetty.JettyServerModule - Starting Jetty Server...
2021-05-14T17:00:34,618 INFO [main] org.eclipse.jetty.server.Server - jetty-9.4.10.v20180503; built: 2018-05-03T15:56:21.710Z; git: daa59876e6f384329b122929e70a80934569428c; jvm 1.8.0_282-b08
2021-05-14T17:00:34,651 INFO [main] org.eclipse.jetty.server.session - DefaultSessionIdManager workerName=node0
2021-05-14T17:00:34,651 INFO [main] org.eclipse.jetty.server.session - No SessionScavenger set, using defaults
2021-05-14T17:00:34,654 INFO [main] org.eclipse.jetty.server.session - node0 Scavenging every 600000ms
2021-05-14T17:00:34,742 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider as a provider class
2021-05-14T17:00:34,743 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering com.fasterxml.jackson.jaxrs.smile.JacksonSmileProvider as a provider class
2021-05-14T17:00:34,743 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering org.apache.druid.server.initialization.jetty.CustomExceptionMapper as a provider class
2021-05-14T17:00:34,743 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering org.apache.druid.server.initialization.jetty.ForbiddenExceptionMapper as a provider class
2021-05-14T17:00:34,743 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering org.apache.druid.server.initialization.jetty.BadRequestExceptionMapper as a provider class
2021-05-14T17:00:34,744 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering org.apache.druid.server.StatusResource as a root resource class
2021-05-14T17:00:34,787 INFO [main] com.sun.jersey.server.impl.application.WebApplicationImpl - Initiating Jersey application, version 'Jersey: 1.19.3 10/24/2016 03:43 PM'
2021-05-14T17:00:34,868 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.initialization.jetty.CustomExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton"
2021-05-14T17:00:34,869 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.initialization.jetty.ForbiddenExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton"
2021-05-14T17:00:34,870 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.initialization.jetty.BadRequestExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton"
2021-05-14T17:00:34,870 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider to GuiceManagedComponentProvider with the scope "Singleton"
2021-05-14T17:00:34,877 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.smile.JacksonSmileProvider to GuiceManagedComponentProvider with the scope "Singleton"
2021-05-14T17:00:35,238 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.QueryResource to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,244 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.http.security.StateResourceFilter to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,248 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.http.HistoricalResource to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,253 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.http.SegmentListerResource to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,255 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.http.security.ConfigResourceFilter to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,259 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.query.lookup.LookupListeningResource to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,261 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.query.lookup.LookupIntrospectionResource to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,263 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.StatusResource to GuiceManagedComponentProvider with the scope "Undefined"
2021-05-14T17:00:35,290 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@30d5e37c{/,null,AVAILABLE}
2021-05-14T17:00:35,300 INFO [main] org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@26994c6a{HTTP/1.1,[http/1.1]}{0.0.0.0:9081}
2021-05-14T17:00:35,300 INFO [main] org.eclipse.jetty.server.Server - Started @6776ms
2021-05-14T17:00:35,300 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [ANNOUNCEMENTS]
2021-05-14T17:00:35,300 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.curator.announcement.Announcer.start()] on object[org.apache.druid.curator.announcement.Announcer@3cd46491].
2021-05-14T17:00:35,300 INFO [main] org.apache.druid.curator.announcement.Announcer - Starting announcer
2021-05-14T17:00:35,336 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Announcing [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/historical', host='ip-10-0-5-119.ec2.internal', bindOnHost=false, port=-1, plaintextPort=9081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='HISTORICAL', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=60000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}].
2021-05-14T17:00:35,354 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Announced [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/historical', host='ip-10-0-5-119.ec2.internal', bindOnHost=false, port=-1, plaintextPort=9081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='HISTORICAL', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=60000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}].
2021-05-14T17:00:35,355 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Successfully started lifecycle [module]
2021-05-14T17:56:55,862 INFO [NodeTypeWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Node[ip-10-0-2-142.ec2.internal:9082:DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/coordinator', host='ip-10-0-2-142.ec2.internal', bindOnHost=false, port=-1, plaintextPort=9082, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='COORDINATOR', services={}}] disappeared.
2021-05-14T17:57:02,546 INFO [NodeTypeWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Node[ip-10-0-2-142.ec2.internal:9082:DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/coordinator', host='ip-10-0-2-142.ec2.internal', bindOnHost=false, port=-1, plaintextPort=9082, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='COORDINATOR', services={}}] appeared.
[mlubavin@ip-10-0-5-119 ~]$

the coordinator appearing and disappearing at the end was from me restarting coordinator.  

So... back to my original question - is it coordinator that's not doing its job telling historical node to load segments, or historical not doing its job listening, or something else?

Ben Krug

unread,
May 15, 2021, 12:15:58 AM5/15/21
to druid...@googlegroups.com
I'm not sure - but you might check that it can access the deep storage (although I'd expect an error if not), and maxSegmentsToMove, and other dynamic coordinator properties.

Mohan Raj

unread,
May 15, 2021, 12:41:39 AM5/15/21
to druid...@googlegroups.com
Hey, no it's not the coordinator that loads or tells the historicals to load.

Please check your zookeeper nodes.

ZK is the mediator

Nishant Bangarwa

unread,
May 18, 2021, 6:01:13 PM5/18/21
to Druid User
Hi Michael, 
Can you share a longer window (30 mins) of logs for both coordinator and historical? 
The coordinator is responsible for assigning the segments to historical and in version 0.16 that you are using, zookeeper is used for mediation. 


Cheers, 
Nishant



mlub...@viralgains.com

unread,
May 19, 2021, 10:30:10 AM5/19/21
to Druid User
OK, I think I have some more info and am getting closer.  I found this error during coordinator startup:
2021-05-19T14:04:04,164 ERROR [LeaderSelector[/druid_odyssey/coordinator/_COORDINATOR]] org.apache.curator.framework.listen.ListenerContainer - Listener (org.apache.druid.curator.discovery.CuratorDruidLeaderSelector$1@c14e5a5) threw an exception
java.lang.ClassFormatError: Illegal field name "org.apache.druid.server.coordinator.DruidCoordinator$this" in class org/apache/druid/server/coordinator/DruidCoordinator$CoordinatorHistoricalManagerRunnable
at java.lang.ClassLoader.defineClass1(Native Method) ~[?:1.8.0_282]
at java.lang.ClassLoader.defineClass(ClassLoader.java:756) ~[?:1.8.0_282]
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) ~[?:1.8.0_282]
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) ~[?:1.8.0_282]
at java.net.URLClassLoader.access$100(URLClassLoader.java:74) ~[?:1.8.0_282]
at java.net.URLClassLoader$1.run(URLClassLoader.java:369) ~[?:1.8.0_282]
at java.net.URLClassLoader$1.run(URLClassLoader.java:363) ~[?:1.8.0_282]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_282]
at java.net.URLClassLoader.findClass(URLClassLoader.java:362) ~[?:1.8.0_282]

And I do have java version 1.8.0_282
Is there no patch for this, only that we have to downgrade Java?

I checked and my working cluster is on java 1.8.0_272

mlub...@viralgains.com

unread,
May 19, 2021, 10:49:44 AM5/19/21
to Druid User
Confirming that downgrading java to 1.8.0_272 on the same coordinator machine that wasn't working, is now working

Nishant Bangarwa

unread,
May 21, 2021, 3:01:28 AM5/21/21
to druid...@googlegroups.com
Glad you found the issue,
 I have hit the exact same issue in past as well, Either way, downgrading java or Upgrading Druid to a more recent version would fix this. 
Another less preferred option would be to refactor the class and backport the fix. 


Michael Lubavin

unread,
May 21, 2021, 10:12:30 AM5/21/21
to druid...@googlegroups.com
Hi Nishant,

Thanks for your reply.  Are you saying this is avoided in a different version of Druid?

Thanks,
Michael

Regards,
Michael Lubavin

Director Engineering

W: www.viralgains.com





You received this message because you are subscribed to a topic in the Google Groups "Druid User" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/druid-user/1_t4c42BcFw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to druid-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/CAOEd8t-9nTGbUwk-s1%3DXmprxLiyK9RX4q-1-u-43cafcZ0%3DxZg%40mail.gmail.com.

Nishant Bangarwa

unread,
May 21, 2021, 1:13:14 PM5/21/21
to druid...@googlegroups.com
Hi Michael, 

Yes, this has been fixed as part of refactorings done in https://github.com/apache/druid/pull/7306
If you are looking for a small patch to backport you can try backporting - https://github.com/apache/druid/pull/8813

--

Nishant Bangarwa

Head of Engineering

 



Reply all
Reply to author
Forward
0 new messages