Hi all,
We've run into a problem with a JupyterHub-on-Kubernetes setup (based on Z2JH) where user pods sometimes end up in an OutOfmemory state on startup. It seems to happen only when we run multiple hubs together with user placeholders. Has anyone else run into this issue?
Our setup is a single Kubernetes cluster. Nodes are sized to hold a single user pod at a time, and we rely on autoscaling to set the cluster size appropriately. We typically run several hubs on this cluster, and we've recently been experimenting with user placeholders to speed up server start times.
As best I can tell, the issue occurs as follows. Suppose we're running two hubs, alpha and beta. Alpha has two placeholders running, ph-1 and ph-2. Since each takes up a full node, they occupy node-a and node-b. Now a user on the beta hub, user-beta-1, starts their server. There is no spare capacity, so ph-1 gets evicted and user-beta-1 is assigned to node-a. This all works fine, even though ph-1 and user-beta-1 are in different namespaces. The cluster autoscaler notices the now-unschedulable pod (ph-1) and starts scaling up.

Before the new node is ready, another user from the beta hub, user-beta-2, starts their server. Again there is no spare capacity, so ph-2 is evicted. This time, however, both user-beta-2 and ph-1 (which has been waiting for space to open up) get assigned to node-b, which ph-2 just vacated. ph-1 starts up more quickly and reserves the node's resources, so when user-beta-2 starts, it finds insufficient resources. (In our setup, memory is the binding limit.) user-beta-2 enters an OutOfmemory state, where it sulks until I come around and delete the pod. (Even if we remove other pods from the node, it never recovers.)
One worry is that there's some mismatch in priorities between the different namespaces. But I don't think that's the (entire) issue -- placeholders from one namespace clearly do get evicted to make room for pods from another. I think it has more to do with how waiting pods are assigned to newly empty nodes. As I understand it, that assignment is done by the userScheduler, and there's one per hub namespace. Perhaps the two schedulers are making inconsistent decisions?
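To check whether the priorities and schedulers actually line up, I've been dumping the relevant pods with a small script using the Python kubernetes client. The label selector and component names (app=jupyterhub, component=singleuser-server / user-placeholder) are just what I believe Z2JH applies, so treat this as a sketch rather than anything definitive:

```python
from kubernetes import client, config

# Running from my laptop; inside the cluster this would be load_incluster_config().
config.load_kube_config()
v1 = client.CoreV1Api()

# Assumed Z2JH labels -- adjust if your chart release labels pods differently.
pods = v1.list_pod_for_all_namespaces(label_selector="app=jupyterhub")
for pod in pods.items:
    component = (pod.metadata.labels or {}).get("component", "")
    if component not in ("singleuser-server", "user-placeholder"):
        continue
    print(
        f"{pod.metadata.namespace}/{pod.metadata.name} ({component}): "
        f"node={pod.spec.node_name} "
        f"scheduler={pod.spec.scheduler_name} "
        f"priorityClass={pod.spec.priority_class_name} "
        f"priority={pod.spec.priority} "
        f"phase={pod.status.phase} reason={pod.status.reason}"
    )
```

If the priority numbers for placeholders and user pods differ between namespaces, that would support the priority-mismatch theory; if they match and only the scheduler names differ, the inconsistent-scheduler theory looks more likely.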
One solution would be to give each hub its own node pool, so that pods could only evict placeholders belonging to the same hub. I'd like to avoid that if possible: our hubs have different usage patterns, so it's nice to have one large pool of placeholders that can serve whichever hub is seeing the most use at any given time.
I wonder if it's possible to run a single userScheduler for all the hubs. Perhaps this would force it to consider user pods from all namespaces when making decisions. But I don't know how to go about doing this offhand.
Another solution would be to find a way to restart user pods that get into the OutOfmemory state. If I delete the pod by hand and then restart the server from the web interface (which itself requires a restart to notice that the user pod has gone away), it comes up just fine, even kicking out the placeholder that beat it before. Running a cron job along the lines of the sketch below every minute would be a fine stop-gap solution. But again, I'm a bit out of my depth here.
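For concreteness, here's roughly what I imagine that cron job doing, again with the Python kubernetes client. The component=singleuser-server label and the OutOfmemory status reason are my assumptions about how these pods are labelled and reported, so this is only a sketch:

```python
from kubernetes import client, config

# Would run as a Kubernetes CronJob, hence in-cluster config.
config.load_incluster_config()
v1 = client.CoreV1Api()

# Assumed Z2JH label for single-user server pods.
pods = v1.list_pod_for_all_namespaces(label_selector="component=singleuser-server")
for pod in pods.items:
    # Pods the kubelet rejected for lack of memory show up as Failed / OutOfmemory.
    if pod.status.phase == "Failed" and pod.status.reason == "OutOfmemory":
        print(f"deleting {pod.metadata.namespace}/{pod.metadata.name}")
        v1.delete_namespaced_pod(pod.metadata.name, pod.metadata.namespace)
```

The part I haven't figured out is getting the hub to notice and respawn the server automatically afterwards, which is why I'd still prefer a fix on the scheduling side.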
Any ideas or suggestions? I'll readily admit there's a 50/50 chance I've misdiagnosed at least part of the problem, so I'm happy to run any additional diagnostics that might clear things up.
Thanks,
Robert