Container failure without relaunch


Ganelin, Ilya

May 30, 2017, 1:13:11 PM
to DataTorrent Users Group, us...@apex.apache.org

Hi all, several times now I’ve noticed odd behavior with our app. When it has been running for several days or more, I observe that after an operator failure the container does not relaunch. I’m not sure what accounts for this: I don’t see any further errors in the log after the initial “stop” + “operator remove”; it’s as if recovery is not working. Any thoughts on what could be causing this?

 

- Ilya Ganelin


The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Pramod Immaneni

May 30, 2017, 1:18:21 PM
to us...@apex.apache.org, DataTorrent Users Group
Hi Ilya,

What is the state of the physical containers in the physical tab? Are the containers dying and continuously restarting?

Thanks

Ganelin, Ilya

May 30, 2017, 1:36:11 PM
to us...@apex.apache.org, DataTorrent Users Group

I think I checked this and I don’t see any activity whatsoever. No re-launch, just empty tabs. I’ll try to provide a screenshot next time it happens.

 

- Ilya Ganelin


Pramod Immaneni

May 30, 2017, 1:39:54 PM
to DataTorrent Users Group
Sounds good. Please also click "retrieve killed" in the physical tab to list the killed containers.

Sandesh Hegde

May 30, 2017, 1:41:40 PM
to us...@apex.apache.org, DataTorrent Users Group
When that issue happens, please check the free resources (CPU and memory) available to YARN.
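
One way to run that check from a script, as a minimal sketch: the YARN ResourceManager REST API exposes cluster-wide metrics at `/ws/v1/cluster/metrics`, including `availableMB` and `availableVirtualCores`. The ResourceManager address, the helper names, and the container size below are assumptions for illustration, not values from this thread.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """Fetch cluster metrics from the YARN ResourceManager REST API.

    rm_url is an assumed base address such as "http://resourcemanager:8088".
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def can_fit_container(metrics, needed_mb, needed_vcores):
    """Return True if cluster-wide free memory and vcores cover the request.

    These are cluster totals, so a single node may still be too small
    even when this returns True.
    """
    return (metrics["availableMB"] >= needed_mb
            and metrics["availableVirtualCores"] >= needed_vcores)


# Example with a hypothetical 2 GB / 1 vcore container:
# metrics = fetch_cluster_metrics("http://resourcemanager:8088")
# print(can_fit_container(metrics, needed_mb=2048, needed_vcores=1))
```

If this returns False at the time the container fails to come back, a lack of free cluster resources would be one possible explanation worth ruling out.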