Re-trying eager op execution on failure


Ryan Nett

Jan 31, 2021, 00:32:56
To: SIG JVM
I'm experimenting with automatic cleanup for eager tensors, like JavaCPP does for tensors. I have it working on CPU using JavaCPP's detection (it allocates past the memory limit during execution, then pulls back). However, on GPU I'm forced to catch OOM exceptions from execute, then run cleanup and retry. The catching works fine, but when I execute the same op again I get "expected 2 inputs, got 0". This happens even though TFE_OpGetInputLength(opHandle, "value", TF_Status.newStatus()) returns 1 for both arguments.


To reproduce this, run something that forces an OOM on a GPU.
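The GPU recovery path described in the first message (catch the OOM from execute, run cleanup, try again) has roughly the following shape. This is a minimal Java sketch, not the TensorFlow Java API: `RetryOnOom`, `DeviceOutOfMemoryException` and `executeWithRetry` are hypothetical names, and the `execute` supplier stands in for whatever actually runs the eager op. Because re-executing the same native op handle failed with "expected 2 inputs, got 0" in this thread, the supplier would presumably need to build a fresh op on every attempt.

```java
import java.util.function.Supplier;

/** Hypothetical helper: "catch OOM, free memory, retry". Not part of TensorFlow Java. */
final class RetryOnOom {

    /** Hypothetical stand-in for however a GPU allocation failure surfaces in Java. */
    static final class DeviceOutOfMemoryException extends RuntimeException {
        DeviceOutOfMemoryException(String message) {
            super(message);
        }
    }

    private RetryOnOom() {}

    /**
     * Runs {@code execute}; on an out-of-memory failure, runs {@code cleanup}
     * (e.g. releasing unreachable eager tensors) and tries again, allowing at
     * most {@code maxRetries} extra attempts before rethrowing.
     */
    static <T> T executeWithRetry(Supplier<T> execute, Runnable cleanup, int maxRetries) {
        int retries = 0;
        while (true) {
            try {
                return execute.get();
            } catch (DeviceOutOfMemoryException e) {
                if (retries >= maxRetries) {
                    throw e; // cleanup did not free enough memory; give up
                }
                retries++;
                cleanup.run(); // free memory before the next attempt
            }
        }
    }
}
```

Bounding the retries matters: if cleanup cannot free enough device memory, an unbounded loop would spin forever instead of reporting the OOM.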

Any thoughts on how to get around this? The easiest option I can see is to store the inputs/attributes that have been set in the builder and rebuild the native op on each call, which is less than ideal.
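The workaround Ryan mentions (store the inputs/attributes in the builder and rebuild the native op on each call) could look something like this. It is a sketch under assumptions: `NativeOp` is a hypothetical minimal view of a native eager op handle, not a real TensorFlow Java type, and the factory is assumed to wrap whatever creates a new handle (for example a call to `TFE_NewOp`). Recording each call as a replayable step keeps the original order of `addInput` and `setAttr`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical minimal view of a native eager op; not the TensorFlow Java API. */
interface NativeOp {
    void addInput(Object handle);
    void setAttr(String name, Object value);
    Object execute();
}

/** Records builder calls in Java so a fresh native op can be built for every execution. */
final class ReplayableOpBuilder {
    private final Supplier<NativeOp> factory; // creates a brand-new native op handle (assumption)
    private final List<Consumer<NativeOp>> recorded = new ArrayList<>();

    ReplayableOpBuilder(Supplier<NativeOp> factory) {
        this.factory = factory;
    }

    ReplayableOpBuilder addInput(Object handle) {
        recorded.add(op -> op.addInput(handle)); // remember the call instead of only applying it once
        return this;
    }

    ReplayableOpBuilder setAttr(String name, Object value) {
        recorded.add(op -> op.setAttr(name, value));
        return this;
    }

    /** Builds a new native op, replays every recorded call in order, then executes it. */
    Object execute() {
        NativeOp op = factory.get();
        for (Consumer<NativeOp> step : recorded) {
            step.accept(op);
        }
        return op.execute();
    }
}
```

The cost is the one the thread points out: every execution pays for creating the native op and re-applying its inputs and attributes, even when no OOM occurred.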

Samuel Audet

Jan 31, 2021, 22:18:52
To: Ryan Nett, SIG JVM
That's precisely the kind of thing I meant by "abysmal performance".
Java's GC was never meant to manage any resource at all but heap memory.
I like to think of it as simply an extension of "the stack", where other
(functional) programming languages such as Haskell actually use GC on
the stack itself instead of coming up with a special-purpose heap (but
doing it both cleanly and efficiently like that is harder than it looks,
and that still doesn't help us at all with resource management :)

Samuel