What is index_realtime doing?

143 views

Pet Nik

Jan 21, 2016, 9:21:07 AM
to Druid User
I've run the coordinator, overlord, historical, and broker. I couldn't run a realtime node without druid.realtime.specFile:

2016-01-21T13:53:41,033 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.guice.RealtimeManagerConfig] from props[druid.realtime.] as [io.druid.guice.RealtimeManagerConfig@7c447c76]
Exception in thread "main" com.google.inject.CreationException: Guice creation errors:

1) Error injecting constructor, java.lang.NullPointerException
  at io.druid.guice.FireDepartmentsProvider.<init>(FireDepartmentsProvider.java:41)
  while locating io.druid.guice.FireDepartmentsProvider
  at io.druid.guice.RealtimeModule.configure(RealtimeModule.java:79)
  while locating java.util.List<io.druid.segment.realtime.FireDepartment>
    for parameter 0 at io.druid.segment.realtime.RealtimeMetricsMonitor.<init>(RealtimeMetricsMonitor.java:42)
  while locating io.druid.segment.realtime.RealtimeMetricsMonitor
  at io.druid.server.metrics.MetricsModule.getMonitorScheduler(MetricsModule.java:78)
  at io.druid.server.metrics.MetricsModule.getMonitorScheduler(MetricsModule.java:78)
  while locating com.metamx.metrics.MonitorScheduler
  at io.druid.server.metrics.MetricsModule.configure(MetricsModule.java:63)
  while locating com.metamx.metrics.MonitorScheduler annotated with @com.google.inject.name.Named(value=ForTheEagerness)

Why?
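
For reference, the standalone Realtime node only starts once druid.realtime.specFile points at a spec file on disk; a minimal sketch of the relevant runtime.properties line (the path is a placeholder, not a real one from this thread):

```properties
# runtime.properties for the Realtime node
# /path/to/realtime.spec is an assumed path; point it at your own spec file,
# which holds the same dataSchema/ioConfig/tuningConfig triple as a task spec
druid.realtime.specFile=/path/to/realtime.spec
```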

And I've run an index_realtime task:

{
  "type": "index_realtime",
  "resource": {
    "availabilityGroup": "someGroup",
    "requiredCapacity": 1
  },
  "spec": {
    "dataSchema": {
      "dataSource": "test_source",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "tsv",
          "timestampSpec": {
            "column": "timestamp",
            "format": "posix"
          },
          "columns": [...],
          "dimensionsSpec": {...}
        }
      },
      "metricsSpec": [...],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "HOUR",
        "intervals": ["2016-01-21T00:00:00/2016-01-22T00:00:00"]
      }
    },
    "ioConfig": {
      "type": "realtime",
      "firehose": {
        "type": "local",
        "baseDir": "/dir",
        "filter": "2016-01-21.tsv"
      }
    },
    "tuningConfig": {
      "type": "realtime",
      "maxRowsInMemory": 500000,
      "intermediatePersistPeriod": "PT10m",
      "windowPeriod": "PT10m",
      "rejectionPolicy": {
        "type": "serverTime"
      }
    }
  }
}
The task completed with "SUCCESS" (without a realtime node?), but I can't find the results.
Part of the logs:

2016-01-21T13:49:09,134 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Creating plumber using rejectionPolicy[serverTime-PT10M]
2016-01-21T13:49:09,138 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Expect to run at [2016-01-22T00:10:00.000Z]
2016-01-21T13:49:09,140 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.
2016-01-21T13:49:09,141 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] segments. Attempting to hand off segments that start before [2016-01-21T00:00:00.000Z].
2016-01-21T13:49:09,141 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] sinks to persist and merge
2016-01-21T13:49:09,170 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Searching for all [2016-01-21.tsv] in and beneath [/dir]
2016-01-21T13:49:09,186 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Found files: [/dir/2016-01-21.tsv]
2016-01-21T13:49:09,837 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Submitting persist runnable for dataSource[test_source]
2016-01-21T13:49:09,838 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Shutting down...
2016-01-21T13:49:09,840 INFO [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Removing task directory: /tmp/persistent/task/index_realtime_test_source_0_2016-01-21T13:49:01.897Z_nldhhglf/work
2016-01-21T13:49:09,847 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_realtime_test_source_0_2016-01-21T13:49:01.897Z_nldhhglf",
  "status" : "SUCCESS",
  "duration" : 904
}

Thx


Fangjin Yang

Jan 21, 2016, 8:37:52 PM
to Druid User
Pet Nik, do you have the logs for that task?

FWIW, I think if you are just getting started with Druid, you might have an easier time with this quickstart: http://imply.io/docs/latest/quickstart

We're trying to migrate that quickstart over to Druid right now.

Pet Nik

Jan 24, 2016, 10:27:36 AM
to Druid User
I ran the task and it went well. But the data from the first task was overwritten after running the second task. What is the difference between the ordinary index task and the index_realtime task? I've attached the task logs.

On Friday, January 22, 2016 at 4:37:52 AM UTC+3, Fangjin Yang wrote:
log2.txt
log1.txt

Fangjin Yang

Jan 27, 2016, 11:36:25 PM
to Druid User
Hi Pet,

You shouldn't ever use the index_realtime task on its own without the Tranquility library; it is going to be a hassle to manage. The index task is designed to read from files, and realtime indexing is designed to read from streams. Druid segments are versioned and immutable, and Druid uses a replace-by-interval strategy when new segments are created for an interval. So if you reindex data for an interval, it replaces the existing data for that interval.
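
By way of contrast, a batch (re)index over the same files would use the plain index task type. A rough sketch, reusing the dataSource and paths from the spec earlier in the thread; the exact field names here follow the 0.8/0.9-era batch ingestion docs, so treat them as an assumption for your version:

```json
{
  "type": "index",
  "spec": {
    "dataSchema": { "...": "same dataSchema as in the index_realtime spec" },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "baseDir": "/dir",
        "filter": "2016-01-21.tsv"
      }
    },
    "tuningConfig": {
      "type": "index"
    }
  }
}
```

Because segments are replaced by interval, running this twice over the same interval simply produces a new version of the same segments, which matches the overwrite behavior you saw.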