Autoscale volume and pods simultaneously


Montassar Dridi

Jan 5, 2017, 6:07:42 PM
to Kubernetes user discussion and Q&A
Hello!!

I'm using a Kubernetes Deployment with a persistent volume to run my application, but when I try to add more replicas or autoscale, all the new pods try to connect to the same volume.
How can I automatically create a new volume for each new pod, the way StatefulSets (PetSets) can?

Vishnu Kannan

Jan 5, 2017, 6:19:43 PM
to Kubernetes user discussion and Q&A
Check out dynamic volume provisioning here
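
For anyone finding this thread later: dynamic provisioning pairs a StorageClass with a PersistentVolumeClaim so that a PV (and the underlying disk) is created on demand when the claim is submitted. A minimal sketch, assuming a GCE PD environment; the class name, provisioner, and size are illustrative, and the API version varies by cluster release:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd   # e.g. GCE persistent disks; other clouds use other provisioners
parameters:
  type: pd-ssd
---
# A claim referencing the class; the provisioner creates a matching PV on demand.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: web-data
spec:
  storageClassName: fast
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi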


Montassar Dridi

Jan 5, 2017, 7:12:50 PM
to Kubernetes user discussion and Q&A
Thanks for responding.
I tried it, but it still doesn't automatically generate new volumes for the new pods. PetSets/StatefulSets use "volumeClaimTemplates" for that. Is there something like that for Deployments?
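
For reference, the volumeClaimTemplates mechanism mentioned above looks roughly like this; a sketch only, with illustrative names, image, and sizes (on clusters contemporary with this thread the StatefulSet API group was apps/v1beta1 rather than apps/v1). Each replica (web-0, web-1, ...) gets its own PVC, dynamically provisioned if a StorageClass is configured:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: my-java-app:latest        # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/app/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi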



Tim Hockin

Jan 5, 2017, 7:41:22 PM
to kubernet...@googlegroups.com
Can you explain what you're trying to achieve?

Fundamentally, persistent volumes and replication are at odds with
each other. Replication implies fungibility and "all replicas are
identical". Persistent volumes implies "the data matters and is
potentially different".

Now, I can think of a couple cases where this isn't quite so
black-and-white, and we've discussed if/how to implement for those
cases. But I am not going to tell you what they are until you explain
to me what you're trying to do, lest I muddy the water :)

Tim



Montassar Dridi

Jan 5, 2017, 8:24:02 PM
to Kubernetes user discussion and Q&A
Hi Tim,

I'm trying to do something like this example: https://github.com/kubernetes/kubernetes/tree/master/examples/mysql-wordpress-pd
I have a Java web application and a MySQL database running within Kubernetes, connected to each other, using Kubernetes Deployments as in the example above.
When I try to increase the number of web replicas, they all try to connect to the persistent disk that was created at the beginning, and they get stuck, unable to create the new web pods.
So what I want is: when I ask for new pods, a unique new persistent disk/volume should be created and attached to each one of them, like StatefulSets/PetSets do.

I appreciate your help and thanks for responding.

Tim Hockin

Jan 5, 2017, 11:17:26 PM
to kubernet...@googlegroups.com
On Thu, Jan 5, 2017 at 5:24 PM, Montassar Dridi
<montass...@gmail.com> wrote:
> Hi Tim,
>
> I'm trying to do something like this example:
> https://github.com/kubernetes/kubernetes/tree/master/examples/mysql-wordpress-pd
> I have a Java web application and a MySQL database running within Kubernetes,
> connected to each other, using Kubernetes Deployments as in the example above.

"connected" via a Service or in the same Pod?

> When I try to increase the number of web replicas, they all try to
> connect to the persistent disk that was created at the beginning, and
> they get stuck, unable to create the new web pods.

PDs are only able to be mounted read-write by one pod at a time.
That's just a limitation of the block device+filesystem interface.

> So what I want is: when I ask for new pods, a unique new persistent
> disk/volume should be created and attached to each one of them, like
> StatefulSets/PetSets do.

Why do you want a new PD for each replica? If it is new, then the
"persistent" nature of it is not valuable, and you can just use plain
inline volumes instead of a claim. But if it is going to be released
with the pod, why use a PD at all? why not just use emptyDir?
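
For completeness, the inline emptyDir alternative mentioned here looks roughly like this; a sketch with illustrative names and image (the Deployment API group was extensions/v1beta1 on clusters contemporary with this thread). Every replica gets its own scratch directory, created with the pod and deleted with it:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: my-java-app:latest        # placeholder image
        volumeMounts:
        - name: scratch
          mountPath: /var/lib/app/data
      volumes:
      - name: scratch
        emptyDir: {}                     # per-pod, non-persistent scratch space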

Montassar Dridi

Jan 6, 2017, 12:52:35 AM
to Kubernetes user discussion and Q&A


On Thursday, January 5, 2017 at 11:17:26 PM UTC-5, Tim Hockin wrote:
On Thu, Jan 5, 2017 at 5:24 PM, Montassar Dridi
<montass...@gmail.com> wrote:
> Hi Tim,
>
> I'm trying to do something like this example:
> https://github.com/kubernetes/kubernetes/tree/master/examples/mysql-wordpress-pd
> I have a Java web application and a MySQL database running within Kubernetes,
> connected to each other, using Kubernetes Deployments as in the example above.

"connected" via a Service or in the same Pod?

-> Yes, via a Service, and each one has its own Deployment, so separate pods.
 

> When I try to increase the number of web replicas, they all try to
> connect to the persistent disk that was created at the beginning, and
> they get stuck, unable to create the new web pods.

PDs are only able to be mounted read-write by one pod at a time.
That's just a limitation of the block device+filesystem interface.

-> Exactly, that's why I need to automatically generate a unique new PD for every newly created pod.
 

> So what I want is: when I ask for new pods, a unique new persistent
> disk/volume should be created and attached to each one of them, like
> StatefulSets/PetSets do.

Why do you want a new PD for each replica?  If it is new, then the
"persistent" nature of it is not valuable, and you can just use plain
inline volumes instead of a claim.  But if it is going to be released
with the pod, why use a PD at all?  why not just use emptyDir?

-> The reason I'm creating new pods is to anticipate what happens if I turn Horizontal Pod Autoscaling on. Also, I want all the pods to be consistent with each other: if I modify one, I want them all to get the same changes.

Tim Hockin

Jan 6, 2017, 1:09:56 AM
to kubernet...@googlegroups.com
But why do you need PERSISTENT volumes, if you say you want "unique new" disks?

Montassar Dridi

Jan 6, 2017, 2:33:30 AM
to Kubernetes user discussion and Q&A
Each new pod gets its own persistent volume, a copy/clone of the first persistent volume.

Montassar Dridi

Jan 6, 2017, 9:42:07 AM
to Kubernetes user discussion and Q&A
Thank you for your help, I appreciate it. I'm going to use StatefulSets.

Tim Hockin

Jan 6, 2017, 11:35:24 AM
to kubernet...@googlegroups.com
Ahh, you want to start with a clone of the data, not an empty volume.
Why not use something like git-sync to pull the data down from some
canonical source?
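
A sketch of this suggestion: an init container runs git-sync once to clone the canonical data into an emptyDir before the app container starts, so every replica starts from the same seed without a persistent disk. The image tag, flags, repository URL, and app image here are assumptions; git-sync flags differ between releases:

apiVersion: v1
kind: Pod
metadata:
  name: web-with-seeded-data
spec:
  initContainers:
  - name: seed-data
    image: registry.k8s.io/git-sync/git-sync:v3.6.5   # assumed image tag
    args:
    - --repo=https://example.com/canonical-data.git   # hypothetical repo
    - --root=/data
    - --one-time                                      # clone once, then exit
    volumeMounts:
    - name: data
      mountPath: /data
  containers:
  - name: web
    image: my-java-app:latest                         # placeholder image
    volumeMounts:
    - name: data
      mountPath: /var/lib/app/data
  volumes:
  - name: data
    emptyDir: {}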


Filip Grzadkowski

Jan 10, 2017, 4:55:52 PM
to kubernet...@googlegroups.com, Marcin Wielgus


--
Filip


Naseem Ullah

Sep 6, 2018, 4:03:28 PM
to Kubernetes user discussion and Q&A
Hello,

I have a similar use case to Montassar's.

Although I could use emptyDirs, each newly spun-up pod takes 2-3 minutes to download the required data (the pod does something similar to git-sync). If volumes could be prepopulated, a new pod would simply sync the diff, which would drastically reduce startup/readiness time.

Any suggestions? Right now I have a tradeoff: either create a static number of replicas with the same number of PVCs, or use the HPA with an emptyDir volume, which increases pod startup time.

Thanks,
Naseem

Tim Hockin

Sep 6, 2018, 4:07:10 PM
to Kubernetes user discussion and Q&A, Jing Xu
Deployments and PersistentVolumes are generally not a good
combination. This is what StatefulSets are for.

There's work happening to allow creation of a volume from a snapshot,
but it's only Alpha in the next release.
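
For context, the snapshot API that came out of that work lets you snapshot an existing PVC; shown here with the schema that eventually went GA (snapshot.storage.k8s.io/v1) and illustrative names, since the v1.12-era alpha used different field names and required the external-snapshotter CRDs plus a CSI driver:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snap
spec:
  volumeSnapshotClassName: csi-snapclass   # assumed snapshot class
  source:
    persistentVolumeClaimName: data-claim  # the PVC to snapshot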

Naseem Ullah

Sep 6, 2018, 4:08:12 PM
to kubernet...@googlegroups.com, Jing Xu
I see, I see... what about autoscaling StatefulSets with an HPA?
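
For what it's worth, an HPA can target a StatefulSet (StatefulSets expose the scale subresource), and each new replica then gets its own PVC from volumeClaimTemplates. A minimal sketch with illustrative names and thresholds; note that scaling back down does not delete the PVCs, which is exactly the lifecycle question raised later in the thread:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: web                  # the StatefulSet to scale
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70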

Tim Hockin

Sep 6, 2018, 4:33:18 PM
to Kubernetes user discussion and Q&A, Jing Xu
You have to understand what you are asking for. You're saying "this
data is important and needs to be preserved beyond any one pod (a
persistent volume)" but you're also saying "the pods have no identity
because they can scale horizontally". These are mutually incompatible
statements.

You really want a shared storage API, not volumes...

Naseem Ullah

Sep 6, 2018, 4:52:11 PM
to Kubernetes user discussion and Q&A
I do not think you have to understand what you are asking for, I've learned a lot by asking questions I only half understood :) With that said autoscaling sts was a question and not a feature request :)

I do not see how "data is important, and needs to be preserved" and "pods (compute) have no identity" are mutually incompatible statements but if you say so. :)

In any case, the data is only more or less important, since it is fetchable; it's just that if it's already there when a new pod is spun up, or when a deployment is updated and a new pod is created, it speeds up startup drastically (virtually immediate vs. 2-3 minutes to sync).

Shared storage would be ideal (a Deployment with an HPA and a mounted NFS volume), but I get OOM errors when using an RWX persistent volume with more than one pod syncing the same data to that volume at the same time; I do not know why these OOM errors occur. Maybe it has something to do with the code that syncs the data. RWX seems to be a recurring challenge.

Jing Xu

Sep 6, 2018, 6:10:45 PM
to Kubernetes user discussion and Q&A
Naseem, for your volume data prepopulation request: as Tim mentioned, we now have volume snapshots, which will be available in v1.12 as an alpha feature. This allows you to create volume snapshots from a volume (PVC). With a snapshot available, you can create a new volume (PVC) using the snapshot as the data source, so the volume will have its data prepopulated. We also plan to work on data clone and population features, which will allow you to clone data from one PVC to another or prepopulate data from some data source. Please let us know if you have any questions about it or any requirements for the feature. Thanks!
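
A sketch of the restore side described here: a new PVC that names the snapshot as its dataSource, so the volume comes up prepopulated (GA schema shown; class, names, and size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim-restored
spec:
  storageClassName: fast              # assumed class
  dataSource:
    name: data-snap                   # the VolumeSnapshot to restore from
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi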

Naseem Ullah

Sep 6, 2018, 6:20:53 PM
to kubernet...@googlegroups.com
Thank you Jing and thank you Tim. 
If this feature allows HPA-enabled, Deployment-managed pods to spawn each with a prepopulated volume, that would be nice. If not, using emptyDir with a 2-minute startup delay while the data is synced for each new pod is what it is.
PS: it would be nice if GKE had an RWX StorageClass out of the box.
Cheers

David Rosenstrauch

Sep 6, 2018, 6:33:39 PM
to kubernet...@googlegroups.com
FWIW, I recently ran into a similar issue, and the way I handled it was
to have each of the pods mount an NFS shared file system as a PV (AWS
EFS, in my case) and have each pod write its output into a directory on
the NFS share. The only issue then is just to make sure that each pod
writes its output to a file that has a unique name (e.g., has the pod
name or ID in the file name) so that the pods don't overwrite each
other's data.

HTH,

DR
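
A sketch of this pattern: one RWX, NFS-backed PV/PVC shared by every pod, with each pod's own name exposed through the downward API so it can write to a uniquely named file. The NFS server address, paths, sizes, and images are all illustrative:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-output
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: fs-12345678.efs.us-east-1.amazonaws.com   # hypothetical EFS endpoint
    path: /
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-output
spec:
  storageClassName: ""          # bind to the statically created PV above
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: writer
spec:
  containers:
  - name: worker
    image: busybox                                    # placeholder image
    command: ["sh", "-c", "echo output > /output/$(POD_NAME).log && sleep 3600"]
    env:
    - name: POD_NAME            # unique per pod, used in the output file name
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    volumeMounts:
    - name: shared
      mountPath: /output
  volumes:
  - name: shared
    persistentVolumeClaim:
      claimName: shared-output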


Tim Hockin

Sep 6, 2018, 11:34:00 PM
to Kubernetes user discussion and Q&A


On Thu, Sep 6, 2018, 1:52 PM Naseem Ullah <nas...@transit.app> wrote:
I do not think you have to understand what you are asking for, I've learned a lot by asking questions I only half understood :) With that said autoscaling sts was a question and not a feature request :)

LOL, fair enough


I do not see how "data is important, and needs to be preserved" and "pods (compute) have no identity" are mutually incompatible statements but if you say so. :)

Think of it this way - if the deployment scales up, clearly we should add more volumes.  What do I do if it scales down?  Delete the volumes?  Hold on to them for some later scale-up?  For how long?  How many volumes?   

Fundamentally, the persistent volume abstraction is wrong for what you want here.  We have talked about something like volume pools which would be the storage equivalent of deployments, but we have found very few use cases where that seems to be the best abstraction.

E.g. In this case, the data seems to be some sort of cache of recreatable data.  Maybe you really want a cache?

In any case, the data is only more or less important, since it is fetchable; it's just that if it's already there when a new pod is spun up, or when a deployment is updated and a new pod is created, it speeds up startup drastically (virtually immediate vs. 2-3 minutes to sync).

Do you need all the data right away or can it be copied in on-demand?  

Shared storage would be ideal (a Deployment with an HPA and a mounted NFS volume), but I get OOM errors when using an RWX persistent volume with more than one pod syncing the same data to that volume at the same time; I do not know why these OOM errors occur. Maybe it has something to do with the code that syncs the data. RWX seems to be a recurring challenge.

RWX is challenging because block devices generally don't support it at all, and the only mainstream FS that does is NFS, and well... NFS...

Tim Hockin

Sep 6, 2018, 11:37:43 PM
to Kubernetes user discussion and Q&A


On Thu, Sep 6, 2018, 3:20 PM Naseem Ullah <nas...@transit.app> wrote:
Thank you Jing and thank you Tim. 
If this feature allows HPA-enabled, Deployment-managed pods to spawn each with a prepopulated volume, that would be

This feature enables start-from-snapshot volumes but does not fundamentally alter the model.

nice. If not, using emptyDir with a 2-minute startup delay while the data is synced for each new pod is what it is.
PS: it would be nice if GKE had an RWX StorageClass out of the box.

We have Cloud Filestore (https://cloud.google.com/filestore/) if that works for you, but this is not a Google product mailing list, so I won't advertise any more. :)

Tim Hockin

Sep 6, 2018, 11:38:26 PM
to Kubernetes user discussion and Q&A


On Thu, Sep 6, 2018, 3:33 PM David Rosenstrauch <dar...@darose.net> wrote:
FWIW, I recently ran into a similar issue, and the way I handled it was
to have each of the pods mount an NFS shared file system as a PV (AWS
EFS, in my case) and have each pod write its output into a directory on
the NFS share.  The only issue then is just to make sure that each pod
writes its output to a file that has a unique name (e.g., has the pod
name or ID in the file name) so that the pods don't overwrite each
other's data.

As pods come and go - don't you eventually waste the disk space?

Naseem Ullah

Sep 7, 2018, 12:58:47 AM
to Kubernetes user discussion and Q&A

On Thursday, September 6, 2018 at 11:37:43 PM UTC-4, Tim Hockin wrote:


On Thu, Sep 6, 2018, 3:20 PM Naseem Ullah <nas...@transit.app> wrote:
Thank you Jing and thank you Tim. 
If this feature allows HPA-enabled, Deployment-managed pods to spawn each with a prepopulated volume, that would be

This feature enables start-from-snapshot volumes but does not fundamentally alter the model.

Ah I see  I see. 

nice. If not, using emptyDir with a 2-minute startup delay while the data is synced for each new pod is what it is.
PS: it would be nice if GKE had an RWX StorageClass out of the box.

We have Cloud Filestore (https://cloud.google.com/filestore/) if that works for you, but this is not a Google product mailing list, so I won't advertise any more. :)

Thanks!! Filestore seemed very promising until I saw the pricing hehe... living dangerously for now with the nfs-provisioner chart running in prod :) 

Naseem Ullah

Sep 7, 2018, 1:01:49 AM
to Kubernetes user discussion and Q&A
Interesting approach indeed! Also curious about "As pods come and go - don't you eventually waste the disk space?"

Naseem Ullah

Sep 7, 2018, 1:05:32 AM
to Kubernetes user discussion and Q&A


On Thursday, September 6, 2018 at 11:34:00 PM UTC-4, Tim Hockin wrote:


On Thu, Sep 6, 2018, 1:52 PM Naseem Ullah <nas...@transit.app> wrote:
I do not think you have to understand what you are asking for, I've learned a lot by asking questions I only half understood :) With that said autoscaling sts was a question and not a feature request :)

LOL, fair enough


I do not see how "data is important, and needs to be preserved" and "pods (compute) have no identity" are mutually incompatible statements but if you say so. :)

Think of it this way - if the deployment scales up, clearly we should add more volumes.  What do I do if it scales down?  Delete the volumes?  Hold on to them for some later scale-up?  For how long?  How many volumes?   

Fundamentally, the persistent volume abstraction is wrong for what you want here.  We have talked about something like volume pools which would be the storage equivalent of deployments, but we have found very few use cases where that seems to be the best abstraction.

E.g. In this case, the data seems to be some sort of cache of recreatable data.  Maybe you really want a cache?


Now that I understood!
Maybe I do want a cache of sorts?
Basically, a bunch of objects is downloaded based on a query. The pods then continue to listen to a queue of sorts, and if something changes, a different object may be downloaded, another that is no longer needed may be deleted, and so on; this is what I call the "diff". It's that initial download that takes 2-3 minutes.
 
In any case, the data is only more or less important, since it is fetchable; it's just that if it's already there when a new pod is spun up, or when a deployment is updated and a new pod is created, it speeds up startup drastically (virtually immediate vs. 2-3 minutes to sync).

Do you need all the data right away or can it be copied in on-demand? 

I need all the data (objects) to be there at all times; it has to match what was queried / continues to be listened to.
 

Shared storage would be ideal (a Deployment with an HPA and a mounted NFS volume), but I get OOM errors when using an RWX persistent volume with more than one pod syncing the same data to that volume at the same time; I do not know why these OOM errors occur. Maybe it has something to do with the code that syncs the data. RWX seems to be a recurring challenge.

RWX is challenging because block devices generally don't support it at all, and the only mainstream FS that does is NFS, and well... NFS...

I see... and yes amen to that! 

David Rosenstrauch

Sep 7, 2018, 9:15:03 AM
to kubernet...@googlegroups.com


On 9/6/18 11:38 PM, 'Tim Hockin' via Kubernetes user discussion and Q&A
wrote:
> On Thu, Sep 6, 2018, 3:33 PM David Rosenstrauch <dar...@darose.net> wrote:
>
>> FWIW, I recently ran into a similar issue, and the way I handled it was
>> to have each of the pods mount an NFS shared file system as a PV (AWS
>> EFS, in my case) and have each pod write its output into a directory on
>> the NFS share. The only issue then is just to make sure that each pod
>> writes its output to a file that has a unique name (e.g., has the pod
>> name or ID in the file name) so that the pods don't overwrite each
>> other's data.
>>
>
> As pods come and go - don't you eventually waste the disk space?

We have other processes that then slurp the logs up into Elasticsearch,
and (eventually) purge old ones.

DR

Naseem Ullah

Sep 7, 2018, 9:31:54 AM
to kubernet...@googlegroups.com
PS



Think of it this way - if the deployment scales up, clearly we should add more volumes.  What do I do if it scales down?  Delete the volumes?  Hold on to them for some later scale-up?  For how long?  How many volumes?   

Seeing that scaling happens one pod at a time, having n+1 volumes, where n is the current number of pod replicas, would make the most sense. Especially if the time to provision a volume is less than the time it takes to decide another pod is needed and scale up by one pod.