I'm trying to ingest data from HDFS, but I keep getting errors like the one below. It looks like the reduce task receives more data than it can handle. The problem is that my timestamps are dates only, so I can't break the data down into finer granularities. Any suggestions on how to solve this? Thanks!
Container [pid=70947,containerID=container_e22_1432969244945_6026_01_000075] is running beyond physical memory limits. Current usage: 4.0 GB of 4 GB physical memory used; 5.8 GB of 8.4 GB virtual memory used. Killing container.
Here is more of the log:
2015-07-30T00:36:19,528 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 0%
2015-07-30T00:36:29,554 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 15%
2015-07-30T00:36:37,577 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 17%
2015-07-30T00:36:40,586 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 23%
2015-07-30T00:36:46,601 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 26%
2015-07-30T00:36:49,609 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 32%
2015-07-30T00:36:57,628 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 45%
2015-07-30T00:37:00,635 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 66%
2015-07-30T00:37:03,643 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 67%
2015-07-30T00:37:49,754 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 68%
2015-07-30T00:38:43,888 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 69%
2015-07-30T00:39:35,006 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 70%
2015-07-30T00:40:30,140 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 71%
2015-07-30T00:41:21,270 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 72%
2015-07-30T00:42:16,396 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 73%
2015-07-30T00:45:19,858 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1432969244945_6026_r_000000_2, Status : FAILED
Container [pid=77928,containerID=container_e22_1432969244945_6026_01_000077] is running beyond physical memory limits. Current usage: 4.0 GB of 4 GB physical memory used; 5.8 GB of 8.4 GB virtual memory used. Killing container.
Dump of the process-tree for container_e22_1432969244945_6026_01_000077 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 77932 77928 77928 77928 (java) 76785 9822 6220464128 1049133 /usr/lib/jvm/j2sdk1.8-oracle/jre/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx3460300800 -Djava.io.tmpdir=/mnt/hdfs_12o/yarn/nm/usercache/qi_wang/appcache/application_1432969244945_6026/container_e22_1432969244945_6026_01_000077/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1432969244945_6026/container_e22_1432969244945_6026_01_000077 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.123.204.75 60033 attempt_1432969244945_6026_r_000000_2 77
|- 77928 77926 77928 77928 (bash) 0 0 9822208 289 /bin/bash -c /usr/lib/jvm/j2sdk1.8-oracle/jre/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx3460300800 -Djava.io.tmpdir=/mnt/hdfs_12o/yarn/nm/usercache/qi_wang/appcache/application_1432969244945_6026/container_e22_1432969244945_6026_01_000077/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1432969244945_6026/container_e22_1432969244945_6026_01_000077 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.123.204.75 60033 attempt_1432969244945_6026_r_000000_2 77 1>/var/log/hadoop-yarn/container/application_1432969244945_6026/container_e22_1432969244945_6026_01_000077/stdout 2>/var/log/hadoop-yarn/container/application_1432969244945_6026/container_e22_1432969244945_6026_01_000077/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
2015-07-30T00:45:20,862 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 0%
2015-07-30T00:45:41,910 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 11%
2015-07-30T00:45:44,917 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 13%
2015-07-30T00:45:53,938 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 18%
2015-07-30T00:45:56,945 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 22%
2015-07-30T00:46:03,965 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 25%
2015-07-30T00:46:06,972 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 32%
2015-07-30T00:46:14,992 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 52%
2015-07-30T00:46:17,999 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 67%
2015-07-30T00:47:12,128 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 68%
2015-07-30T00:48:06,250 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 69%
2015-07-30T00:49:01,361 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 70%
2015-07-30T00:49:52,465 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 71%
2015-07-30T00:50:43,574 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 72%
2015-07-30T00:51:35,685 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 73%
2015-07-30T00:54:30,044 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 100%
2015-07-30T00:54:31,052 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Job job_1432969244945_6026 failed with state FAILED due to: Task failed task_1432969244945_6026_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1
2015-07-30T00:54:31,139 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=810738421
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2326661134
HDFS: Number of bytes written=0
HDFS: Number of read operations=219
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed reduce tasks=4
Launched map tasks=73
Launched reduce tasks=4
Data-local map tasks=61
Rack-local map tasks=12
Total time spent by all maps in occupied slots (ms)=1428866
Total time spent by all reduces in occupied slots (ms)=2184837
Total time spent by all map tasks (ms)=1428866
Total time spent by all reduce tasks (ms)=2184837
Total vcore-seconds taken by all map tasks=1428866
Total vcore-seconds taken by all reduce tasks=2184837
Total megabyte-seconds taken by all map tasks=5852635136
Total megabyte-seconds taken by all reduce tasks=8949092352
Map-Reduce Framework
Map input records=30597989
Map output records=30597989
Map output bytes=3795354607
Map output materialized bytes=802113853
Input split bytes=10001
Combine input records=0
Spilled Records=30597989
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=277433
CPU time spent (ms)=5533170
Physical memory (bytes) snapshot=123575734272
Virtual memory (bytes) snapshot=395487993856
Total committed heap usage (bytes)=224425672704
File Input Format Counters
Bytes Read=2326651133
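
For reference, the figures in the process-tree dump above can be checked with a bit of arithmetic. This is only a rough sketch in Python: the numbers are copied from the log, but the 4 KiB page size and the reading of "4 GB" as 4 GiB are my assumptions, since the log states neither.

```python
# Values copied from the process-tree dump and the java command line in the log.
PAGE_SIZE = 4096                 # assumption: 4 KiB pages (not stated in the log)
CONTAINER_LIMIT = 4 * 1024**3    # assumption: "4 GB" in the kill message means 4 GiB
JAVA_RSS_PAGES = 1049133         # RSSMEM_USAGE(PAGES) of the java process
BASH_RSS_PAGES = 289             # RSSMEM_USAGE(PAGES) of the bash wrapper
XMX_BYTES = 3460300800           # value of -Xmx on the java command line

MIB = 1024**2


def summarize():
    """Convert the logged page counts to bytes and compare them with the limits."""
    rss = (JAVA_RSS_PAGES + BASH_RSS_PAGES) * PAGE_SIZE
    return {
        "rss_bytes": rss,
        "over_limit_bytes": rss - CONTAINER_LIMIT,
        "heap_cap_mib": XMX_BYTES / MIB,
        "non_heap_headroom_mib": (CONTAINER_LIMIT - XMX_BYTES) / MIB,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(name, value)
```

Under those assumptions the resident memory of the container comes out slightly above the 4 GiB limit, even though the heap cap (-Xmx) on its own is below it. That would be consistent with the kill message, but it does not show what is actually using the memory.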