Hi,
When I use the indexing service to ingest data with the Hadoop index task, the job completes successfully, but the input record count does not equal the output record count — some records appear to have been discarded. See the log below:
2015-07-28T12:15:20,039 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 96%
2015-07-28T12:15:24,059 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 97%
2015-07-28T12:16:59,417 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 98%
2015-07-28T12:21:00,535 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 99%
2015-07-28T12:24:24,513 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 100%
2015-07-28T13:11:01,180 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Job job_1431571906396_22852 completed successfully
2015-07-28T13:11:01,383 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Counters: 53
File System Counters
FILE: Number of bytes read=27922640633
FILE: Number of bytes written=55943546674
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=25971081731
HDFS: Number of bytes written=2618327862
HDFS: Number of read operations=1191
HDFS: Number of large read operations=0
HDFS: Number of write operations=236
Job Counters
Failed map tasks=10
Killed reduce tasks=9
Launched map tasks=298
Launched reduce tasks=109
Other local map tasks=10
Data-local map tasks=278
Rack-local map tasks=10
Total time spent by all maps in occupied slots (ms)=64678934
Total time spent by all reduces in occupied slots (ms)=280668248
Total time spent by all map tasks (ms)=32339467
Total time spent by all reduce tasks (ms)=70167062
Total vcore-seconds taken by all map tasks=32339467
Total vcore-seconds taken by all reduce tasks=70167062
Total megabyte-seconds taken by all map tasks=132462456832
Total megabyte-seconds taken by all reduce tasks=574808571904
Map-Reduce Framework
Map input records=36490889
Map output records=36490840
Map output bytes=27776676541
Map output materialized bytes=27922812701
Input split bytes=38016
Combine input records=0
Combine output records=0
Reduce input groups=9
Reduce shuffle bytes=27922812701
Reduce input records=36490840
Reduce output records=0
Spilled Records=72981680
Shuffled Maps =28800
Failed Shuffles=0
Merged Map outputs=28800
GC time elapsed (ms)=6669226
CPU time spent (ms)=115836130
Physical memory (bytes) snapshot=492926922752
Virtual memory (bytes) snapshot=1436702986240
Total committed heap usage (bytes)=817816928256
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=25971043715
File Output Format Counters
Bytes Written=0
2015-07-28T13:11:01,525 INFO [task-runner-0] io.druid.indexer.IndexGeneratorJob - Adding segment logdata_format_2015-01-11T00:00:00.000+08:00_2015-01-12T00:00:00.000+08:00_2015-07-28T12:02:24.665+08:00 to the list of published segments
2015-07-28T13:11:01,531 INFO [task-runner-0] io.druid.indexer.IndexGeneratorJob - Adding segment logdata_format_2015-01-12T00:00:00.000+08:00_2015-01-13T00:00:00.000+08:00_2015-07-28T12:02:24.665+08:00 to the list of published segments
2015-07-28T13:11:01,535 INFO [task-runner-0] io.druid.indexer.IndexGeneratorJob - Adding segment logdata_format_2015-01-13T00:00:00.000+08:00_2015-01-14T00:00:00.000+08:00_2015-07-28T12:02:24.665+08:00 to the list of published segments
2015-07-28T13:11:01,540 INFO [task-runner-0] io.druid.indexer.IndexGeneratorJob - Adding segment logdata_format_2015-01-14T00:00:00.000+08:00_2015-01-15T00:00:00.000+08:00_2015-07-28T12:02:24.665+08:00 to the list of published segments.
As the counters above show, "Map input records" does not equal "Map output records" — why? What would you advise here? Thanks.
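For reference, the size of the gap can be read directly off the two counters in the log above — a quick sketch of the arithmetic:

```python
# Counter values copied verbatim from the job log above.
map_input_records = 36_490_889   # "Map input records"
map_output_records = 36_490_840  # "Map output records"

# Difference = records that entered the mappers but were not emitted,
# e.g. rows that failed to parse or fell outside the ingestion interval.
dropped = map_input_records - map_output_records
print(dropped)  # → 49
```

So 49 records were dropped somewhere in the map phase; the question is what caused them to be filtered out.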