if the failure is in the mapper, it's likely your custom functions are accumulating state (and memory), something that tends to not scale well.
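for illustration, a hypothetical sketch of the pattern to watch out for (class and field names made up, stock Cascading 1.x Function API assumed) - a function that buffers every tuple it sees in an instance field, so the task heap grows with the input instead of staying flat:

import java.util.ArrayList;
import java.util.List;

import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Function;
import cascading.operation.FunctionCall;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;

public class LeakyFunction extends BaseOperation implements Function {
  // grows with every tuple the task sees and is never released until the jvm exits
  private final List<Tuple> seen = new ArrayList<Tuple>();

  public LeakyFunction() {
    super(1, new Fields("value")); // hypothetical output field
  }

  public void operate(FlowProcess flowProcess, FunctionCall functionCall) {
    Tuple copy = new Tuple(functionCall.getArguments().getTuple());
    seen.add(copy); // heap usage now scales with input size, not with a single record
    functionCall.getOutputCollector().add(copy);
  }
}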
also you might find upgrading to Cascading 1.1.x a good idea.
cheers,
chris
--
Chris K Wensel
ch...@concurrentinc.com
http://www.concurrentinc.com
but it looks to me as if your child jvms (map tasks) are using up so much memory that they are preventing new child jvms from spawning.
you might reduce the amount of memory your child jvms can use so the gc will kick in, instead of the heap just growing and preventing other tasks from spawning.
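one hedged sketch of where that knob usually lives - the stock hadoop property mapred.child.java.opts, which gets passed through to the JobConf when you hand it to the FlowConnector (the 512m below is just a placeholder, not a recommendation):

import java.util.Properties;

import cascading.flow.FlowConnector;

public class HeapCapExample {
  public static void main(String[] args) {
    Properties properties = new Properties();
    // cap the task (child) jvm heap so the gc runs instead of the heap growing unbounded;
    // 512m is only an example value, tune it to your cluster
    properties.setProperty("mapred.child.java.opts", "-Xmx512m");

    FlowConnector flowConnector = new FlowConnector(properties);
    // ... build taps/pipes and connect/complete the flow as usual
  }
}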
or it could be something entirely different. sorry.
ckw
When Hadoop has to execute a shell command (via bash), this triggers a
fork of the current process.
The current process is the Java JVM running the task. The fork
initially uses the same size heap as the parent process. If you're
running with a big Java heap size, this effectively means double that
amount of memory.
This typically works even when you don't have that much memory, because
Linux supports over-commit of memory: if enough swap space is
available, the kernel pretends the new process can be launched.
But if you've exhausted swap space, the process fork will fail.
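If it helps, a stripped-down sketch (nothing Hadoop-specific beyond the same ProcessBuilder call) that can reproduce this failure mode - whether it actually fails depends on your heap size, RAM, and swap:

import java.io.IOException;

public class ForkDemo {
  public static void main(String[] args) throws IOException, InterruptedException {
    // touch a lot of heap so the parent jvm's address space is large
    // (run with e.g. -Xmx6g -Xms6g on a machine with little free swap)
    byte[][] filler = new byte[64][];
    for (int i = 0; i < filler.length; i++) {
      filler[i] = new byte[32 * 1024 * 1024]; // roughly 2 GB total
    }

    // the same kind of fork Hadoop's Shell/DF utilities do; it can fail with
    // "error=12, Cannot allocate memory" if the memory can't be over-committed
    Process p = new ProcessBuilder("bash", "-c", "echo ok").start();
    p.waitFor();
    System.out.println("fork succeeded; filler blocks = " + filler.length);
  }
}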
-- Ken
--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g
You could instead reuse a tuple, to avoid lots of object allocation/deallocation (followed by GCs).
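Something like this hypothetical sketch (the "upper" field and the uppercase transform are made up; the reuse pattern is the point) - allocate the output Tuple once in prepare() and overwrite its slot in operate(), instead of constructing a new Tuple per call:

import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Function;
import cascading.operation.FunctionCall;
import cascading.operation.OperationCall;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;

public class ReusingFunction extends BaseOperation implements Function {
  public ReusingFunction() {
    super(1, new Fields("upper")); // hypothetical single output field
  }

  public void prepare(FlowProcess flowProcess, OperationCall operationCall) {
    // allocate the output tuple once per task and stash it in the operation context
    operationCall.setContext(new Tuple("")); // placeholder slot, overwritten on every call
  }

  public void operate(FlowProcess flowProcess, FunctionCall functionCall) {
    Tuple result = (Tuple) functionCall.getContext(); // reused, never reallocated
    result.set(0, functionCall.getArguments().getTuple().getString(0).toUpperCase());
    functionCall.getOutputCollector().add(result);
  }
}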
-- Ken
On Jun 13, 2010, at 4:34pm, Aceeca wrote:
> Hello Chris,
>
> Thank you for your reply. You are correct, the issue is related to
> memory. Below is the exception I get:
>
> 2010-06-13 18:54:37,287 INFO org.apache.hadoop.mapred.TaskInProgress (IPC Server handler 0 on 9001): Error from attempt_201006131636_0001_m_000017_0: java.io.IOException: Cannot run program "bash": java.io.IOException: error=12, Cannot allocate memory
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
>     at org.apache.hadoop.util.Shell.run(Shell.java:134)
>     at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
>     at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:
Is that still true? I thought copy-on-write fork was standard these days.
Normally the kernel will allow a certain amount of memory overcommit (in the default heuristic mode - mode 0 if you `cat /proc/sys/vm/overcommit_memory`). It takes into consideration the available memory in the system and also gives a little more leeway to root. If it thinks you're making a request it cannot possibly fulfill, however, the fork will fail. Adding more swap space makes the kernel think that your request to fork isn't so outlandish, and it will give it a green light.
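If you want to check which mode a node is in from code, a small sketch that just reads the standard /proc files:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class OvercommitCheck {
  // 0 = heuristic overcommit (the default), 1 = always allow, 2 = strict accounting
  public static void main(String[] args) throws IOException {
    System.out.println("vm.overcommit_memory = " + readFirstLine("/proc/sys/vm/overcommit_memory"));
    System.out.println("vm.overcommit_ratio  = " + readFirstLine("/proc/sys/vm/overcommit_ratio"));
  }

  private static String readFirstLine(String path) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader(path));
    try {
      return reader.readLine().trim();
    } finally {
      reader.close();
    }
  }
}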
On Sun, Jun 13, 2010 at 7:29 PM, Ken Krugler <kkrugle...@transpac.com> wrote:
> The current process is the Java JVM running the task. The fork initially uses the same size heap as the parent process. If you're running with a big Java heap size, this effectively means double that amount of memory.