Over the past 6 months, we have discussed (at length) what the best direction for Ganeti is, given that the current code-base, while working well for us, has accumulated a lot of technical debt in terms of internal architecture deficiencies, programming language issues, testability, etc.
The main pain points we have identified are:
- the master daemon is a big, monolithic entity that contains a lot of components; changes in any component (OpCodes, Logical Units, objects, etc.) have cascading and sometimes unpredictable effects on the other components
- the dynamic nature of Python makes any static analysis very hard (or impossible), which makes the previous point even more painful
- the multi-threaded nature of the master daemon creates scalability problems in Python (which cannot take advantage of SMP/multi-core environments)
- currently, the internal architecture relies on mutable objects, which creates additional problems regarding the consistency of internal data structures
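To illustrate the last point, here is a minimal sketch in Python of what working with immutable objects looks like. It is not taken from the Ganeti code: the `Instance` class and its fields are made up for the example, and frozen dataclasses are just one possible way to get this behaviour.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Instance:
    """Hypothetical, simplified stand-in for a cluster configuration object."""
    name: str
    primary_node: str


old = Instance(name="instance1", primary_node="node1")

# "Changing" the object builds a new value; anyone still holding `old`
# keeps seeing a consistent snapshot.
new = replace(old, primary_node="node2")

try:
    old.primary_node = "node3"  # in-place mutation is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False

print(old.primary_node, new.primary_node, mutated)
```

The point of the sketch is only that an update produces a new object instead of silently altering one that other code may be reading at the same time.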
As you know, Ganeti is written mainly in Python, with a sub-component in Haskell and small bits in other glue languages (autoconf, automake, make, shell, etc.). Over the (shorter) history of the htools component, we have seen significant advantages from the functional aspects, especially related to the above pain points.
So we have decided to make an experiment and see whether using Haskell in more parts of the code base would help with some of the issues we have. We have in mind a staged use of Haskell for other parts of Ganeti (besides htools), which goes roughly as follows:
- 2.6 release (April/May):
  - alternative "confd" daemon implementation in Haskell; this is fully optional and people can select at build time whether to use the Python or Haskell version
- 2.7 release (summer/autumn):
  - move query functionality from (Python) master daemon to the "confd" daemon (Haskell); again, this will be optional and selected at build time
  - provide one new, optional CLI tool (``gnt-*``) in Haskell; this will have some bits of new functionality that will not be exposed in a Python version
After the 2.7 release, we should be able to make a reasonable assessment of the feasibility of using Haskell more pervasively in the Ganeti code base, and of the downsides of doing so. Assuming it all goes well, it means that starting with the 2.8 release, Haskell will become required for building Ganeti. Of course, if we stumble and this plan doesn't work well, we can still revert the changes, since most of the new components are optional/extra.
Any interface that Ganeti provides right now (CLI: the ``gnt-*`` programs, LUXI: internal interface, cluster RPC: internal interface, RAPI: external interface, IAllocator: external interface) will remain the same, so any program that talks to Ganeti will still be able to do so in the new world. We will keep providing the current Python RAPI client (and we might look into providing a Haskell RAPI client).
From the point of view of building Ganeti from source and using it, it will mean that:
- the Haskell compiler/platform will be needed at ./configure and build time, together with all libraries
- on the machines that Ganeti will actually run on, there will be no extra dependencies; on the contrary, we might have some reduction in libraries needed at run-time
From the point of view of using Ganeti from a distribution:
- we will continue to provide Ganeti packages in Debian, and if possible, in backports
- any distribution that has the Haskell platform plus the list of libraries we use (not too long) should have no issues in building and providing packages
- if you are a distribution user, installing Ganeti should bring in at worst the same dependency list, at best fewer dependencies
This is a long-term plan (2.7 will not come before autumn), so it would be a good idea to discuss this now. We're especially looking for feedback from users and contributors/developers on:
- as an end-user, would this change impact you at all? do you have any concerns regarding it?
- would it impact you if Ganeti requires Haskell to build/install? do you see deployment issues or such?
- would it impact you if the Ganeti code-base is written mostly in Haskell, with regards to contributing small patches or adding new features?
Of course, any other feedback on this topic is welcome.
I've thought about this a bit. It seems that your proposal makes sense. As an end user, it seems that the only downside is the increased probability of bugs during the transition. This is something that I'm willing to live with in general when progress is at stake, and in particular Ganeti delivers such advantages that some extra inconvenience is more than offset. And if I have to learn a bit of Haskell to fix something at some point in the future - well, that's what they pay me for at $DAYJOB.
So basically, go for it!
Hopefully the Haskell build environment for RHEL/CentOS/SL is sane and Jun doesn't get a headache making us RPMS.
First, I am somewhat of two minds as to whether to respond to this thread or not. I am not a ganeti developer, although I did push a small fix last year.
My real concern is that ganeti is too monolithic. Perhaps my understanding is wrong here, but I think there needs to be more of a separation between configuration and operations. By my way of thinking, it would really help maintainability if the process model for ganeti operations were more like qmail.
Again, I am probably speaking from lack of knowledge here, but I would find it much easier to navigate ganeti code if each operation were a separate command-line driven module written in any language. As a simple example, I think that booting an instance should be a simple script that is shelled on the target host with all parameters on the command line. This way, interactions between the operations and the management functions are well defined. This also allows you to develop/test these operational scripts without dragging in the world.
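As a rough illustration of that idea (a sketch only, not how Ganeti does it today): the operation lives in its own process and gets every parameter on the command line. The script body, the instance name and the memory value are all made up for the example; the inline script merely stands in for a real standalone tool, which could be written in any language.

```python
import subprocess
import sys

# Hypothetical stand-in for an operation script such as "start an instance".
# Here it only echoes its arguments back.
OPERATION_SCRIPT = r"""
import sys
name, memory_mb = sys.argv[1], int(sys.argv[2])
print(f"starting {name} with {memory_mb} MiB")
"""


def run_operation(args):
    """Run one operation as its own process, passing every parameter on argv."""
    cmd = [sys.executable, "-c", OPERATION_SCRIPT] + list(args)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


output = run_operation(["instance1.example.com", "512"])
print(output)
```

Because the only interface is the argument list and the exit status, the script can be developed and tested on its own, and it shows up in 'ps' while it runs.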
This type of separation would probably hurt performance but in the end, I think overall performance would improve because individual operations could be tuned without impacting the center. For example, migrating a virtual from server to server would be a black box and replacing dd over ssh with some other, faster, transport would become a safe optimization that did not involve the central code base. Similarly, if operations are shelled black boxed, I can envision a lot of new features that are all doable in the context of KVM/XEN but are difficult to plumb into the current structure.
None of these have much to do with the programming language involved. I suspect that many of the operation "scripts" would end up in bash.
One other aspect of designing stuff this way is that debugging a live server is a lot easier. 'ps' becomes a very useful tool that shows you what is actually happening.
Again, my two cents, which are worth even less.
My "feature" list that I would like to implement involves exploiting how KVM/XEN can move virtuals around and how this should allow neat stuff like "live convert from plain to DRBD" or migration of a plain instance to another instance without shutting it down by converting the host to DRBD and back on the fly. A smaller function would be writing a socket helper to replace dd over ssh for copying volumes.
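To make the socket helper idea a bit more concrete, here is a minimal sketch. It is an assumption about what such a helper could look like, not existing Ganeti code: the function names and the chunk size are invented, a local socket pair stands in for a connection between two hosts, and an in-memory buffer stands in for the volume. Unlike ssh, a plain socket gives no encryption or authentication, which a real helper would have to handle.

```python
import io
import socket
import threading

CHUNK = 64 * 1024  # bytes per read; an arbitrary choice for this sketch


def send_stream(src, sock):
    """Copy everything from file-like `src` into the socket, then signal EOF."""
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        sock.sendall(chunk)
    sock.shutdown(socket.SHUT_WR)


def receive_stream(sock, dst):
    """Write everything arriving on the socket into file-like `dst`."""
    total = 0
    while True:
        chunk = sock.recv(CHUNK)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Demo with an in-memory "volume" and a local socket pair instead of two hosts.
volume = io.BytesIO(bytes(range(256)) * 4096)  # 1 MiB of sample data
copy = io.BytesIO()
sender_sock, receiver_sock = socket.socketpair()

sender = threading.Thread(target=send_stream, args=(volume, sender_sock))
sender.start()
received = receive_stream(receiver_sock, copy)
sender.join()
sender_sock.close()
receiver_sock.close()

print(received, copy.getvalue() == volume.getvalue())
```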
I know how to do this in the current code, but having a separation of configuration and operations would make this a lot easier.
Finally, I don't want anyone taking any of this as a complaint. I will work with, and hopefully work to help any code base that I can find time for.
On Fri, Apr 20, 2012 at 11:27:38AM -0700, Martín B. wrote:
> Hi Iustin & Ganeti Devs,
> I've thought about this a bit. It seems that your proposal makes sense. As an end user, it seems that the only downside is the increased probability of bugs during the transition. This is something that I'm willing to live with in general when progress is at stake, and in particular Ganeti delivers such advantages that some extra inconvenience is more than offset.
Yes, during the transition period we might have more bugs, but one of the goals of the dynamic to static typing transition is to have fewer bugs, _long term_.
> And if I have to learn a bit of Haskell to fix something at some point in the future - well, that's what they pay me for at $DAYJOB.
> So basically, go for it!
thanks!
> Hopefully the Haskell build environment for RHEL/CentOS/SL is sane and Jun doesn't get a headache making us RPMS.
Heh. For Debian at least, it is sane, so related distros (e.g. Ubuntu) shouldn't have big problems.
On Fri, Apr 20, 2012 at 05:17:49PM -0700, Doug Dumitru wrote:
> Hello all,
> First, I am somewhat of two minds as to whether to respond to this thread or not. I am not a ganeti developer, although I did push a small fix last year.
> My real concern is that ganeti is too monolithic. Perhaps my understanding is wrong here, but I think there needs to be more of a separation between configuration and operations. By my way of thinking, it would really help maintainability if the process model for ganeti operations were more like qmail.
This is also a concern of ours. In hindsight, the design of Ganeti 2.0 with a monolithic, multi-threaded daemon was bad, but as they say, hindsight is 20/20 :)
> Again, I am probably speaking from lack of knowledge here, but I would find it much easier to navigate ganeti code if each operation were a separate command-line driven module written in any language. As a simple example, I think that booting an instance should be a simple script that is shelled on the target host with all parameters on the command line. This way, interactions between the operations and the management functions are well defined. This also allows you to develop/test these operational scripts without dragging in the world.
> This type of separation would probably hurt performance but in the end, I think overall performance would improve because individual operations could be tuned without impacting the center.
Performance is indeed a concern. Benchmarks on the current codebase show that starting up Python with all our imports is extremely expensive (around 100ms), so you'd be limited to at most 10 jobs per second (without optimisations).
That means shutting down a 1000-instance cluster would take around two minutes just in _starting_ the jobs (in practice, it is much much slower due to other issues). My goal is to have a job engine that can process null opcodes in the realm of 1000 jobs per second.
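A quick back-of-the-envelope check of those numbers, as a sketch. It assumes one job per instance and that the interpreter startups happen strictly one after another; only the 100 ms, 1000-instance and 1000 jobs per second figures come from the text above.

```python
STARTUP_SECONDS = 0.100  # ~100 ms to start Python with all imports (from the text)
INSTANCES = 1000         # assumed: one shutdown job per instance
TARGET_RATE = 1000.0     # stated goal, in jobs per second


def max_jobs_per_second(startup_seconds):
    """Upper bound on the job start rate if startups run one after another."""
    return 1.0 / startup_seconds


def total_startup_seconds(jobs, startup_seconds):
    """Time spent purely on starting `jobs` interpreters sequentially."""
    return jobs * startup_seconds


rate = max_jobs_per_second(STARTUP_SECONDS)
serial = total_startup_seconds(INSTANCES, STARTUP_SECONDS)
at_target = INSTANCES / TARGET_RATE

print(f"max start rate: {rate:.0f} jobs/s")
print(f"{INSTANCES} jobs, serial startup: {serial:.0f} s")
print(f"{INSTANCES} jobs at the target rate: {at_target:.0f} s")
```

Under these assumptions the startup cost alone is about 100 seconds, i.e. a bit under two minutes, against roughly one second at the target rate.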
> For example, migrating a virtual from server to server would be a black box and replacing dd over ssh with some other, faster, transport would become a safe optimization that did not involve the central code base. Similarly, if operations are shelled black boxed, I can envision a lot of new features that are all doable in the context of KVM/XEN but are difficult to plumb into the current structure.
Ack.
> None of these have much to do with the programming language involved. I suspect that many of the operation "scripts" would end up in bash.
> One other aspect of designing stuff this way is that debugging a live server is a lot easier. 'ps' becomes a very useful tool that shows you what is actually happening.
Totally agreed.
> Again, my two cents, which are worth even less.
> My "feature" list that I would like to implement involves exploiting how KVM/XEN can move virtuals around and how this should allow neat stuff like "live convert from plain to DRBD" or migration of a plain instance to another instance without shutting it down by converting the host to DRBD and back on the fly. A smaller function would be writing a socket helper to replace dd over ssh for copying volumes.
I see. We never thought about modularity at this level, interesting.
> I know how to do this in the current code, but having a separation of configuration and operations would make this a lot easier.
> Finally, I don't want anyone taking any of this as a complaint. I will work with, and hopefully work to help any code base that I can find time for.
Thanks for the feedback, very well written.
We discussed splitting the masterd (the logical units) into separate processes; however, there are a few downsides to doing that at the current moment.
The proposed change for 2.7 is to split the query paths outside masterd, so that what masterd does becomes more focused - a pure job execution engine.
Later, we have some rough plans of splitting the configuration and locking entirely out of masterd, so that all the "core" logic is separated out of the individual job execution part. At that stage, the practical difference between running multi-threaded or running multi-process is low, so we could change to whatever model is best.
But we can't make this transition today, or very quickly. Hence the staged approach, because in the meantime we still have to release and deliver features incrementally.
I'm somewhat against the idea of adding another language to it (or making it more prominent). Just on a gut feel level it seems like mixing the two languages will cause more work in the long run. A complete conversion I can see advantages to. Although it would mean learning a new language for me.
Making a bunch of architectural changes at the same time as language changes seems particularly fraught. I'd be inclined to see how far breaking down the master daemon and the other architectural changes go towards reducing the desire for adding languages.
I wonder if you're seeing the grass as greener with the new language: this will solve all our problems, etc.
Things I'd like to see moving forwards in terms of features are primarily better support for generic instances. I am using ganeti to run a client's office; they have a nice pair of Dell R210 IIs and 8 VMs, all of different flavours (windows, linux, bsd, etc.), so I have always used the instance+image, but I've never really felt like it's the way it's meant to be used. Integrating instance+image into the base distro would probably help with this.
A simple way to snapshot instances would be nice (even if it was disk only, and even offline only would be ok) so I could snapshot the mail server, do the upgrade and roll back if it all goes horribly wrong.
Oh the ability to change the drbd cache mode. Disk writes are pretty crappy and I'll take the performance gain over the risk of failure, particularly if it can be easily changed, IE if I'm installing windows or something it doesn't really matter if it gets hosed or copying in a bunch of files from the old file server, I can just re-do it on the .01% chance of a failure, but in the meantime the performance gain would be really nice lol.
That said. It's really working rather well, providing HA without needing a dedicated storage back end (which is the reason I picked it). They had a power failure the other week that took out one server; everything was up and running on the remaining node inside 10 minutes, which isn't bad for the first time I've needed to do it.
On Sat, Apr 21, 2012 at 08:36:53PM +1000, Jake Anderson wrote:
> I'm somewhat against the idea of adding another language to it (or making it more prominent). Just on a gut feel level it seems like mixing the two languages will cause more work in the long run. A complete conversion I can see advantages to. Although it would mean learning a new language for me.
Just to be clear: the current plan is for evaluating more use of Haskell. What I posted is not the "end" of the road, just the next few steps.
If everything goes well, and we see the improvements we expect, then yes, we'll have more work to do (conversion). But we don't know yet, so we can't say "the end goal is a full conversion".
> Making a bunch of architectural changes at the same time as language changes seems particularly fraught.
That's why we make the changes optional - you will be able to choose either the Python or the Haskell version of confd, and similarly for the query infrastructure.
Rest assured that continued stability is in our own direct interest!
> I'd be inclined to see how far breaking down the master daemon and the other architectural changes go towards reducing the desire for adding languages.
> I wonder if you're seeing the grass as greener with the new language: this will solve all our problems, etc.
No, it will definitely not solve all our problems, that's for sure.
But since we have to refactor _heavily_ anyway, we want to see whether refactoring in a language which we know will improve _some_ aspects is worth trying.
> Things I'd like to see moving forwards in terms of features are primarily better support for generic instances. I am using ganeti to run a client's office; they have a nice pair of Dell R210 IIs and 8 VMs, all of different flavours (windows, linux, bsd, etc.), so I have always used the instance+image, but I've never really felt like it's the way it's meant to be used. Integrating instance+image into the base distro would probably help with this.
Hmm… interesting point. We should discuss this more.
> A simple way to snapshot instances would be nice (even if it was disk only, and even offline only would be ok) so I could snapshot the mail server, do the upgrade and roll back if it all goes horribly wrong.
Noted.
> Oh the ability to change the drbd cache mode. Disk writes are pretty crappy and I'll take the performance gain over the risk of failure, particularly if it can be easily changed, IE if I'm installing windows or something it doesn't really matter if it gets hosed or copying in a bunch of files from the old file server, I can just re-do it on the .01% chance of a failure, but in the meantime the performance gain would be really nice lol.
We already disable the barriers, and we'll have disk parameters already in 2.6. Note this is not A/B/C protocol tuning, but we're moving towards that.
> That said.
:)
> It's really working rather well, providing HA without needing a dedicated storage back end (which is the reason I picked it). They had a power failure the other week that took out one server; everything was up and running on the remaining node inside 10 minutes, which isn't bad for the first time I've needed to do it.
Glad to hear.
Again, thanks a lot for the feedback, especially the missing features!
On Thu, Apr 19, 2012 at 07:18:20pm +0200, Iustin Pop wrote:
> Over the past 6 months, we have discussed (at length) on what is the
> best direction for Ganeti, given that the current code-base, while
> working well for us, has accumulated a lot of technical debt in terms
> of internal architecture deficiencies, programming language issues,
> testability, etc.
> The main pain points we have identified are:
> - the master daemon is a big, monolithic entity that contains a lot of
> components; changes in any component (OpCodes, Logical Units,
> objects, etc.) have cascading and sometimes unpredictable effects on
> the other components
> - the dynamic nature of Python makes any static analysis very hard (or
> impossible), which makes the previous point even more painful
> - the multi-threaded nature of the master daemon creates scalability
> problems in Python (which cannot take advantage of SMP/multi-core
> environments)
With regard to this, have you identified problems with Python's not supporting
kernel-level multithreading? Our impression is multi-threading is used in the
master to provide multiple execution contexts, rather than to take advantage
of SMPs/multicores.
To put it differently, it's more of a matter of non-blocking operation of the
daemon: the daemon issues requests and blocks for responses. To make the
analogy, wouldn't the model of a select()-based event loop inside a *single*
process work for the master? Is there something that makes it computationally
intensive, so scalability with python-based threading suffers? It seems
the master's scalability is more impacted by locking, more on this below.
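For reference, the single-process model described here could be sketched roughly as below. This is an illustration under stated assumptions, not the master daemon's actual code: the `run_event_loop` function and the node names are invented, and local socket pairs stand in for RPC connections to the nodes. It uses Python's standard `selectors` module, which wraps select() and its platform-specific relatives.

```python
import selectors
import socket


def run_event_loop(connections):
    """Wait on all connections at once; return {name: reply} as replies arrive."""
    sel = selectors.DefaultSelector()
    for name, sock in connections.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)

    replies = {}
    while len(replies) < len(connections):
        # Blocks until at least one connection has data; no threads involved.
        for key, _events in sel.select():
            replies[key.data] = key.fileobj.recv(4096).decode()
            sel.unregister(key.fileobj)
    sel.close()
    return replies


# Demo: socket pairs stand in for connections to two hypothetical nodes.
master_side, node_side = {}, {}
for name in ("node1", "node2"):
    master_side[name], node_side[name] = socket.socketpair()

# The "nodes" answer in a different order than they were contacted.
node_side["node2"].sendall(b"node2: ok")
node_side["node1"].sendall(b"node1: ok")

replies = run_event_loop(master_side)
for sock in list(master_side.values()) + list(node_side.values()):
    sock.close()
print(sorted(replies.items()))
```

The loop handles whichever reply is ready first, so a slow node does not hold up the others, but any CPU-heavy work done inside the loop would still stall everything else.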
> - currently, the internal architecture relies on mutable objects,
> which creates additional problems regarding the consistency of
> internal data structures
> From the point of view of building Ganeti from source and using it, it
> will mean that:
> - the Haskell compiler/platform will be needed at ./configure and
> build time, together will all libraries
> - on the machines that Ganeti will actually run on, there will be no
> extra dependencies, on the contrary, we might have some reduction in
> libraries needed at run-time
> From the point of view of using Ganeti from a distribution:
> - we will continue to provide Ganeti packages in Debian, and if
> possible, in backports
> - any distribution that has the Haskell platform plus the list of
> libraries we use (not too long) should have no issues in building
> and providing packages
> - if you are a distribution user, installing Ganeti should bring it at
> worse the same dependency list, at best fewer dependencies
This is great! As long as you continue to provide Debian packages, and
your dependencies are part of Debian, or are backported, we'll be able
to run Ganeti and experiment with our own additions/contributions.
> This is a long-term plan (2.7 will not come before autumn), so it
> would be a good idea to discuss this now. We're especially looking at
> feedback from users and contributors/developers on:
> - as an end-user, would this change impact you at all? do you have any
> concerns regarding it?
> - would it impact you if Ganeti requires Haskell to build/install? do
> you see deployment issues or such?
> - would it impact you if the Ganeti code-base is written mostly in
> Haskell, with regards to contributing small patches or adding new
> features?
Haskell does create a steeper learning curve and raises the entry
barrier significantly. Python is ubiquitous, every sysadmin knows a bit,
every sysadmin can dabble in the code and fix minor issues. On the
other hand, if you believe this is the way to go, and ensure a proper
fallback plan along the way, we trust your judgement completely. You
seem to have a solid plan to introduce Haskell gradually.
> Of course, any other feedback on this topic is welcome.
Regarding the focus of 2.7, I think you have also mentioned it, we've begun
looking into scalability of the locking mechanism in the master daemon. How do
you believe the transition to Haskell is going to impact this? Currently,
there are places where the master daemon locks the whole cluster, e.g.,
while the iallocator runs.
How do you see handling locking evolve in the future, with and without
Haskell's increased presence in the code? You mentioned separating the
locking mechanism from the rest of the master daemon. How would this impact
lock handling, and do you have plans to explore less strict locking in favor of
higher-throughput processing?
On Sat, Apr 21, 2012 at 04:40:43pm +0200, Iustin Pop wrote:
> > Things I'd like to see moving forwards in terms of features are
> > primarily better support for generic instances.
> > I am using Ganeti to run a client's office; they have a nice pair of
> > Dell R210 IIs and 8 VMs, all of different flavours (Windows,
> > Linux, BSD, etc.), so I have always used instance+image, but I've
> > never really felt like it's the way it's meant to be used; integrating
> > instance+image into the base distro would probably help with this.
Hello Jake,
Regarding support for generic instances, please take a look at
snf-image:
It's a Ganeti OS definition we have developed, which tries to simplify
the deployment of Ganeti instances from predefined images. We've been
running it for some time now in production (Debian/Ubuntu/Fedora/Windows
instances), and it has proven quite helpful.
It has a number of features [untrusted image deployment in volatile VM,
setting hostname/passwords, file injection]; it'd be great if you could
give it a try and share any feedback you may have.
> On Thu, Apr 19, 2012 at 07:18:20pm +0200, Iustin Pop wrote:
> > Over the past 6 months, we have discussed (at length) on what is the
> > best direction for Ganeti, given that the current code-base, while
> > working well for us, has accumulated a lot of technical debt in terms
> > of internal architecture deficiencies, programming language issues,
> > testability, etc.
> > The main pain points we have identified are:
> > - the master daemon is a big, monolithic entity that contains a lot of
> > components; changes in any component (OpCodes, Logical Units,
> > objects, etc.) have cascading and sometimes unpredictable effects on
> > the other components
> > - the dynamic nature of Python makes any static analysis very hard (or
> > impossible), which makes the previous point even more painful
> > - the multi-threaded nature of the master daemon creates scalability
> > problems in Python (which cannot take advantage of SMP/multi-core
> > environments)
> With regard to this, have you identified problems with Python's not supporting
> kernel-level multithreading? Our impression is multi-threading is used in the
> master to provide multiple execution contexts, rather than to take advantage
> of SMPs/multicores.
That is because CPython doesn't support SMP (not from pure Python code,
at least). Definitely _some_ of the operations are independent and could
use multiple cores.
> To put it differently, it's more of a matter of non-blocking operation of the
> daemon: the daemon issues requests and blocks for responses. To make the
> analogy, wouldn't the model of a select()-based event loop inside a *single*
> process work for the master? Is there something that makes it computationally
> intensive, so scalability with python-based threading suffers? It seems
> the master's scalability is more impacted by locking, more on this below.
In certain workloads, yes, we have seen cases where masterd is limited
CPU-wise. Indeed, most of the work is request/response, but after a
while, the single CPU model is too slow. Note that this is one example,
and not the most important one.
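
For readers unfamiliar with the model raised in the question above, here is a minimal sketch of a select()-based loop inside a single process, using Python's standard selectors module. The names run_event_loop and demo, the node names, and the use of socketpair() to stand in for node connections are assumptions made for illustration; none of this is Ganeti code. It shows how one thread can wait on many outstanding requests at once, and also why all of the handling still runs on a single CPU.

```python
import selectors
import socket


def run_event_loop(pending):
    """Collect replies from every socket in `pending` using one thread.

    `pending` maps a (hypothetical) node name to a connected socket.
    Returns a dict of node name -> all bytes received before the peer
    closed its end.
    """
    sel = selectors.DefaultSelector()
    replies = {name: b"" for name in pending}
    for name, sock in pending.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)

    open_count = len(pending)
    while open_count:
        # Block until at least one socket is readable (or the timeout hits).
        for key, _events in sel.select(timeout=1.0):
            try:
                chunk = key.fileobj.recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup; try again on the next pass
            if chunk:
                replies[key.data] += chunk
            else:
                # An empty read means the peer closed the connection.
                sel.unregister(key.fileobj)
                key.fileobj.close()
                open_count -= 1
    sel.close()
    return replies


def demo():
    """Simulate three nodes that answer out of order, then hang up."""
    pending, peers = {}, []
    for name in ("node1", "node2", "node3"):
        ours, theirs = socket.socketpair()
        pending[name] = ours
        peers.append((name, theirs))
    for name, theirs in reversed(peers):
        theirs.sendall(("ok from " + name).encode())
        theirs.close()
    return run_event_loop(pending)
```

The loop never blocks on any one peer, which addresses the "multiple execution contexts" use of threads. It does nothing for CPU-bound work, since every reply is still processed by the same thread.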
> > - currently, the internal architecture relies on mutable objects,
> > which creates additional problems regarding the consistency of
> > internal data structures
> > From the point of view of building Ganeti from source and using it, it
> > will mean that:
> > - the Haskell compiler/platform will be needed at ./configure and
> > build time, together with all libraries
> > - on the machines that Ganeti will actually run on, there will be no
> > extra dependencies, on the contrary, we might have some reduction in
> > libraries needed at run-time
> > From the point of view of using Ganeti from a distribution:
> > - we will continue to provide Ganeti packages in Debian, and if
> > possible, in backports
> > - any distribution that has the Haskell platform plus the list of
> > libraries we use (not too long) should have no issues in building
> > and providing packages
> > - if you are a distribution user, installing Ganeti should bring at
> > worst the same dependency list, at best fewer dependencies
> This is great! As long as you continue to provide Debian packages, and
> your dependencies are part of Debian, or are backported, we'll be able
> to run Ganeti and experiment with our own additions/contributions.
Indeed, this is very important to us, and we'll keep this level of
support.
> > This is a long-term plan (2.7 will not come before autumn), so it
> > would be a good idea to discuss this now. We're especially looking at
> > feedback from users and contributors/developers on:
> > - as an end-user, would this change impact you at all? do you have any
> > concerns regarding it?
> > - would it impact you if Ganeti requires Haskell to build/install? do
> > you see deployment issues or such?
> > - would it impact you if the Ganeti code-base is written mostly in
> > Haskell, with regards to contributing small patches or adding new
> > features?
> Haskell does create a steeper learning curve and raises the entry
> barrier significantly. Python is ubiquitous, every sysadmin knows a bit,
> every sysadmin can dabble in the code and fix minor issues. On the
> other hand, if you believe this is the way to go, and ensure a proper
> fallback plan along the way, we trust your judgement completely. You
> seem to have a solid plan to introduce Haskell gradually.
Thank you for the kind words.
Indeed, we will take a hit from the point of view of easy contributions.
We have had many discussions trying to understand how serious this
impact will be, but it is still unclear to us.
What we hope to gain on this topic, however, is easier integration of
code patches (once they are written). For one thing, getting the code to
compile already gives a lot of confidence that it is not broken, and that it
doesn't break any assumptions in the existing code base (e.g. modifying
global state incorrectly, etc.).
> > Of course, any other feedback on this topic is welcome.
> Regarding the focus of 2.7, I think you have also mentioned it, we've begun
> looking into scalability of the locking mechanism in the master daemon. How do
> you believe the transition to Haskell is going to impact this? Currently,
> there are places where the master daemon locks the whole cluster, e.g.,
> while the iallocator runs.
Oh, thank you for bringing this up. Indeed, the iallocator runs are
known to be bad in terms of locking.
To solve this issue, we had the node resource design model, which was
supposed to be implemented in 2.6. This design was intended to remove
all long-duration locks related to the iallocator, and would have
allowed "many" iallocator runs to proceed in parallel, and during other
operations. The design was modeled on how htools does the 'predictive'
allocation, so the design itself is, we believe, sound.
Michael tried to implement this design in the current Python
code base, and after fighting with it, came to the conclusion that due
to the current model (resource/locks) inside Python and due to the fact
that we can't "protect" data structures in Python, we cannot *reliably*
implement it.
So, long story short, htools can predict how the cluster will look after
100 balancing steps, whereas the current Python code cannot. If we end
up with the locking/config daemon in Haskell, I already know how the
design would look to remove long-lived locks during iallocator runs.
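
To make the "predictive" idea concrete, here is a minimal sketch of allocation over immutable cluster snapshots. It is written in Python only for accessibility. The names Node, Cluster, allocate and predict are hypothetical, and the placement policy (pick the node with the most free memory) is an assumption chosen for brevity, not the algorithm htools uses. The point it illustrates is that each step returns a new cluster state instead of modifying the old one, so any number of what-if runs can start from the same snapshot without holding a lock on it.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    free_mem: int  # MiB, illustrative unit


@dataclass(frozen=True)
class Cluster:
    nodes: Tuple[Node, ...]


def allocate(cluster: Cluster, mem: int) -> Optional[Tuple[Cluster, str]]:
    """Place one instance needing `mem` MiB.

    Returns (new_cluster, chosen_node_name), or None if no node has room.
    The input cluster is never modified.
    """
    candidates = [n for n in cluster.nodes if n.free_mem >= mem]
    if not candidates:
        return None
    # Assumed policy: the node with the most free memory wins.
    best = max(candidates, key=lambda n: n.free_mem)
    new_nodes = tuple(
        replace(n, free_mem=n.free_mem - mem) if n.name == best.name else n
        for n in cluster.nodes
    )
    return Cluster(new_nodes), best.name


def predict(cluster: Cluster, requests: List[int]) -> Tuple[Cluster, List[str]]:
    """Simulate a sequence of allocations and return the predicted state."""
    placements = []
    for mem in requests:
        result = allocate(cluster, mem)
        if result is None:
            break  # stop at the first request that does not fit
        cluster, chosen = result
        placements.append(chosen)
    return cluster, placements
```

Note that frozen dataclasses only guard against accidental assignment; Python offers no way to enforce this across a whole code base, which is the limitation described above.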
> How do you see handling locking evolve in the future, with and without
> Haskell's increased presence in the code? You mentioned separating the
> locking mechanism from the rest of the master daemon. How would this impact
> lock handling, and do you have plans to explore less strict locking in favor of
> higher-throughput processing?
I hope the above explains the situation. We already wanted to make this
switch, but the current architecture prevents us. A new architecture
will definitely include, at the design stage, an optimised allocation model.
On Thu, Apr 19, 2012 at 07:18:15PM +0200, Iustin Pop wrote:
> Hi all,
> Over the past 6 months, we have discussed (at length) on what is the
> best direction for Ganeti, given that the current code-base, while
> working well for us, has accumulated a lot of technical debt in terms
> of internal architecture deficiencies, programming language issues,
> testability, etc.
> So we have decided to make an experiment and see whether using Haskell
> in more parts of the code base would help some of the issues we
> have. We have in mind a staged use of Haskell for other parts of
> Ganeti (besides htools), which goes roughly as follows:
> - 2.6 release (April/May):
> - alternative "confd" daemon implementation in Haskell; this is
> fully optional and people can select at build time whether to use
> the Python or Haskell version
> - 2.7 release (summer/autumn):
> - move query functionality from (Python) master daemon to the
> "confd" daemon (Haskell); again, this will be optional and
> selected at build time
> - provide one new, optional CLI tool (``gnt-*``) in Haskell; this
> will have some bits of new functionality that will not be exposed
> in a Python version
> After the 2.7 release, we should be able to make a reasonable
> assessment on the feasibility of using Haskell more pervasively in the
> Ganeti code base, and on the downsides of doing so. Assuming it all
> goes well, it means that starting with the 2.8 release, Haskell will
> become required for building Ganeti. Of course, if we stumble and this
> plan doesn't work well, we can still revert the changes, since most of
> the new components are optional/extra.
Hi all,
Some updates on this plan. 2.7 was delayed as 2.6 itself was a couple of
months late. As such, we've decided to slightly tweak this plan.
2.6 was released with the Haskell-based confd, and we run it in
production with no issues so far, including some new functionality not
provided in the Python version. Furthermore, we have some maintenance
overhead with the fact that htools is still optional, so we can't depend
on some feature that it provides.
So the new plan is as follows: the base Haskell environment (GHC
compiler and the base htools dependencies: the json, network, bytestring
and parallel libraries) will become required for Ganeti 2.7. The
additional packages used for extra functionality, curl, hslogger, crypto
and similar, will remain optional.
This means that in order to build Ganeti, you will need a Haskell
compiler plus some basic libraries that should be available in existing
stable distributions. For "full" enabling of the Haskell components, you
will need to have the extra libraries (either from cabal or by running a
newer distribution).
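
As a rough illustration of the split described above, the following sketch classifies a set of installed Haskell packages into "can build" and "fully enabled". The function name check_haskell_deps and the report keys are hypothetical. The package names are taken as written in this mail and may differ from the names a distribution or Hackage uses, and the optional list is not exhaustive ("and similar"). This is not how Ganeti's configure script performs the check.

```python
# Base htools dependencies that become required for 2.7 (per this mail).
REQUIRED = frozenset({"json", "network", "bytestring", "parallel"})
# Extra-functionality packages that stay optional (non-exhaustive).
OPTIONAL = frozenset({"curl", "hslogger", "crypto"})


def check_haskell_deps(have_ghc, installed):
    """Summarise whether Ganeti could be built with the given packages.

    `have_ghc` says whether a GHC compiler was found; `installed` is an
    iterable of installed Haskell package names.
    """
    installed = set(installed)
    missing_required = sorted(REQUIRED - installed)
    missing_optional = sorted(OPTIONAL - installed)
    can_build = bool(have_ghc) and not missing_required
    return {
        "can_build": can_build,
        "fully_enabled": can_build and not missing_optional,
        "missing_required": missing_required,
        "missing_optional": missing_optional,
    }
```

For example, a system with GHC and only the four base libraries would report can_build as true and fully_enabled as false, with curl, hslogger and crypto listed as missing optional packages.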
This will have the following advantages:
- allow base Ganeti to depend on htools, and hence promote integration
- simplify the build configurations and requirements (hopefully they
will be simpler, due to the reduction in the possible combinations)
- allow exploration of more Haskell use in the project, by allowing
critical components to be implemented in this language; this is not
possible, nowadays, due to the "optionality" of htools
The disadvantage, of course, is the requirement for GHC and said
libraries. To compensate, we aim to ensure that our dependency list is
sane for existing stable/long term distributions.
We'll have more precise details on the new requirements as we near the
release (not soon, by the way).
On Wed, Sep 26, 2012 at 7:37 AM, Iustin Pop <ius...@google.com> wrote:
> Some updates on this plan. 2.7 was delayed as 2.6 itself was a couple of
> months late. As such, we've decided to slightly tweak this plan.
Any guess on a timeframe when 2.7 might be released?
> So the new plan is as follows: the base Haskell environment (GHC
> compiler and the base htools dependencies: the json, network, bytestring
> and parallel libraries) will become required for Ganeti 2.7. The
> additional packages used for extra functionality, curl, hslogger, crypto
> and similar, will remain optional.
> This means that in order to build Ganeti, you will need a Haskell
> compiler plus some basic libraries that should be available in existing
> stable distributions. For "full" enabling of the Haskell components, you
> will need to have the extra libraries (either from cabal or by running a
> newer distribution).
<snip>
> The disadvantage, of course, is the requirement for GHC and said
> libraries. To compensate, we aim to ensure that our dependency list is
> sane for existing stable/long term distributions.
This will be an issue on Red Hat-related distributions, as they lack many of
the GHC dependencies in the core OS. For CentOS 6 it seems all but one
(ghc-curl) is available via EPEL so it will obviously make it more
difficult to include into the distro as an official package. Hopefully I
will help this situation in the long term but at least for the short term
using the repo I created should suffice for now.
Thanks for the update!
-- Lance Albertson
Associate Director of Operations
Oregon State University | Open Source Lab
> > So the new plan is as follows: the base Haskell environment (GHC
> > compiler and the base htools dependencies: the json, network, bytestring
> > and parallel libraries) will become required for Ganeti 2.7. The
> > additional packages used for extra functionality, curl, hslogger, crypto
> > and similar, will remain optional.
> > This means that in order to build Ganeti, you will need a Haskell
> > compiler plus some basic libraries that should be available in existing
> > stable distributions. For "full" enabling of the Haskell components, you
> > will need to have the extra libraries (either from cabal or by running a
> > newer distribution).
> <snip>
> > The disadvantage, of course, is the requirement for GHC and said
> > libraries. To compensate, we aim to ensure that our dependency list is
> > sane for existing stable/long term distributions.
> This will be an issue on Red Hat-related distributions, as they lack many of
> the GHC dependencies in the core OS. For CentOS 6 it seems all but one
> (ghc-curl) is available via EPEL so it will obviously make it more
> difficult to include into the distro as an official package. Hopefully I
> will help this situation in the long term but at least for the short term
> using the repo I created should suffice for now.
Sorry, I'm not sure I parse this paragraph correctly - are all available
except curl, or is only curl available?
If the former, all is good - curl won't be a 'base' dependency, but only
an extended one. If the latter, hmm…
Just to recap: needed will be: ghc, json, network, bytestring (comes
with ghc), parallel.
On Wed, Sep 26, 2012 at 04:37:49pm +0200, Iustin Pop wrote:
> This will have the following advantages:
> - allow base Ganeti to depend on htools, and hence promote integration
> - simplify the build configurations and requirements (hopefully they
> will be simpler, due to the reduction in the possible combinations)
> - allow exploration of more Haskell use in the project, by allowing
> critical components to be implemented in this language; this is not
> possible, nowadays, due to the "optionality" of htools
> The disadvantage, of course, is the requirement for GHC and said
> libraries. To compensate, we aim to ensure that our dependency list is
> sane for existing stable/long term distributions.
> We'll have more precise details on the new requirements as we near the
> release (not soon, by the way).
> Feedback welcome, as always.
Dear Iustin,
The plan sounds well thought-out.
We've been trying to become more capable in Haskell, in an effort to
contribute to the codebase in the future.
We're fine with the Haskell dependencies, as long as they are available
for current Debian releases. It seems there are a few problems trying to
get everything right when running Squeeze. Can you comment on whether
things will be better in Wheezy wrt the Haskell dependencies of Ganeti?
Finally, it would be very helpful if there was a single point detailing
all current dependencies, Haskell-related or otherwise. Do you plan on
keeping an updated list somewhere, as new features get added to master
and perhaps new dependencies are introduced?
On Thu, Sep 27, 2012 at 12:57:10PM +0300, Vangelis Koukis wrote:
> On Wed, Sep 26, 2012 at 04:37:49pm +0200, Iustin Pop wrote:
> > This will have the following advantages:
> > - allow base Ganeti to depend on htools, and hence promote integration
> > - simplify the build configurations and requirements (hopefully they
> > will be simpler, due to the reduction in the possible combinations)
> > - allow exploration of more Haskell use in the project, by allowing
> > critical components to be implemented in this language; this is not
> > possible, nowadays, due to the "optionality" of htools
> > The disadvantage, of course, is the requirement for GHC and said
> > libraries. To compensate, we aim to ensure that our dependency list is
> > sane for existing stable/long term distributions.
> > We'll have more precise details on the new requirements as we near the
> > release (not soon, by the way).
> > Feedback welcome, as always.
> Dear Iustin,
> The plan sounds well thought-out.
> We've been trying to become more capable in Haskell, in an effort to
> contribute to the codebase in the future.
Glad to hear! If you have any questions, feel free to ask, either
on list or directly to me.
> We're fine with the Haskell dependencies, as long as they are available
> for current Debian releases. It seems there are a few problems trying to
> get everything right when running Squeeze. Can you comment on whether
> things will be better in Wheezy wrt the Haskell dependencies of Ganeti?
Yes, we will definitely not add any dependencies which will not be
present in Wheezy. And we will ensure that all base dependencies which
we will make required will be present in Squeeze.
> Finally, it would be very helpful if there was a single point detailing
> all current dependencies, Haskell-related or otherwise. Do you plan on
> keeping an updated list somewhere, as new features get added to master
> and perhaps new dependencies are introduced?
Hmm, INSTALL in the master branch should be more or less up-to-date; and
we will definitely keep it up to date once this move proceeds.