Over the past 6 months, we have discussed (at length) what the best
direction for Ganeti is, given that the current code-base, while
working well for us, has accumulated a lot of technical debt in terms
of internal architecture deficiencies, programming language issues,
testability, etc.
The main pain points we have identified are:
- the master daemon is a big, monolithic entity that contains a lot of
components; changes in any component (OpCodes, Logical Units,
objects, etc.) have cascading and sometimes unpredictable effects on
the other components
- the dynamic nature of Python makes any static analysis very hard (or
impossible), which makes the previous point even more painful
- the multi-threaded nature of the master daemon creates scalability
problems in Python (which cannot take advantage of SMP/multi-core
environments)
- currently, the internal architecture relies on mutable objects,
which creates additional problems regarding the consistency of
internal data structures
As you know, Ganeti is written mainly in Python, with a sub-component
in Haskell and small bits in other glue languages (autoconf, automake,
make, shell, etc.). Over the (shorter) history of the htools
component, we have seen significant advantages from the functional
aspects, especially related to the above pain points.
So we have decided to run an experiment and see whether using Haskell
in more parts of the code base would help with some of the issues we
have. We have in mind a staged use of Haskell for other parts of
Ganeti (besides htools), which goes roughly as follows:
- 2.6 release (April/May):
  - alternative "confd" daemon implementation in Haskell; this is
    fully optional and people can select at build time whether to use
    the Python or Haskell version
- 2.7 release (summer/autumn):
  - move query functionality from (Python) master daemon to the
    "confd" daemon (Haskell); again, this will be optional and
    selected at build time
  - provide one new, optional CLI tool (``gnt-*``) in Haskell; this
    will have some bits of new functionality that will not be exposed
    in a Python version
After the 2.7 release, we should be able to make a reasonable
assessment of the feasibility of using Haskell more pervasively in the
Ganeti code base, and on the downsides of doing so. Assuming it all
goes well, it means that starting with the 2.8 release, Haskell will
become required for building Ganeti. Of course, if we stumble and this
plan doesn't work well, we can still revert the changes, since most of
the new components are optional/extra.
Any interface that Ganeti provides right now (CLI: the ``gnt-*``
programs, LUXI: internal interface, cluster RPC: internal interface,
RAPI: external interface, IAllocator: external interface) will remain
the same, so any program that talks to Ganeti will still be able to do
so in the new world. We will keep providing the current Python RAPI
client (and we might look into providing a Haskell RAPI client).
From the point of view of building Ganeti from source and using it, it
will mean that:
- the Haskell compiler/platform will be needed at ./configure and
build time, together with all libraries
- on the machines that Ganeti will actually run on, there will be no
extra dependencies; on the contrary, we might have some reduction in
libraries needed at run-time
From the point of view of using Ganeti from a distribution:
- we will continue to provide Ganeti packages in Debian, and if
possible, in backports
- any distribution that has the Haskell platform plus the list of
libraries we use (not too long) should have no issues in building
and providing packages
- if you are a distribution user, installing Ganeti should bring in, at
worst, the same dependency list and, at best, fewer dependencies
This is a long-term plan (2.7 will not come before autumn), so it
would be a good idea to discuss this now. We're especially looking for
feedback from users and contributors/developers on:
- as an end-user, would this change impact you at all? do you have any
concerns regarding it?
- would it impact you if Ganeti requires Haskell to build/install? do
you see deployment issues or such?
- would it impact you if the Ganeti code-base is written mostly in
Haskell, with regards to contributing small patches or adding new
features?
Of course, any other feedback on this topic is welcome.
thanks,
iustin
First, I am somewhat of two minds as to whether to respond to this
thread or not. I am not a ganeti developer, although I did push a
small fix last year.
My real concern is that ganeti is too monolithic. Perhaps my
understanding is wrong here, but I think there needs to be more of a
separation between configuration and operations. By my way of
thinking, it would really help maintainability if the process model
for ganeti operations were more like qmail.
Again, I am probably speaking from lack of knowledge here, but I would
find it much easier to navigate ganeti code if each operation were a
separate command-line driven module written in any language. As a
simple example, I think that booting an instance should be a simple
script that is shelled on the target host with all parameters on the
command line. This way, interactions between the operations and the
management functions are well defined. This also allows you to
develop/test these operational scripts without dragging in the world.
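A rough sketch of what "one operation, one process, all parameters on the
command line" could look like. Everything here is hypothetical: the
operation name "boot-instance", the parameter names, and the helper
functions are illustrations only, not existing Ganeti interfaces. A tiny
inline script stands in for the real operation so the example can run
anywhere.

```python
import subprocess
import sys

# Hypothetical stand-in for an operation script such as "boot-instance":
# it only echoes the parameters it received on the command line.
OPERATION_SCRIPT = "import sys; print(' '.join(sys.argv[1:]))"


def build_argv(operation, params):
    """Flatten an operation name and its parameters into an argv list."""
    argv = [sys.executable, "-c", OPERATION_SCRIPT, operation]
    for key in sorted(params):
        argv.append("--%s=%s" % (key, params[key]))
    return argv


def run_operation(operation, params):
    """Run the operation as a separate process and return its output."""
    result = subprocess.run(build_argv(operation, params),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_operation("boot-instance", {"name": "web1", "memory": "512"}))
```

Because the whole interface is the argument list, the operation can be
tested by hand from a shell, and its full invocation shows up in 'ps'.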
This type of separation would probably hurt performance but in the
end, I think overall performance would improve because individual
operations could be tuned without impacting the center. For example,
migrating a virtual from server to server would be a black box and
replacing dd over ssh with some other, faster, transport would become
a safe optimization that did not involve the central code base.
Similarly, if operations are shelled-out black boxes, I can envision a lot
of new features that are all doable in the context of KVM/XEN but are
difficult to plumb into the current structure.
None of these have much to do with the programming language involved.
I suspect that many of the operation "scripts" would end up in bash.
One other aspect of designing stuff this way is that debugging a live
server is a lot easier. 'ps' becomes a very useful tool that shows
you what is actually happening.
Again, my two cents, which are worth even less.
My "feature" list that I would like to implement involves exploiting
how KVM/XEN can move virtuals around and how this should allow neat
stuff like "live convert from plain to DRBD" or migration of a plain
instance to another instance without shutting it down by converting
the host to DRBD and back on the fly. A smaller function would be
writing a socket helper to replace dd over ssh for copying volumes.
I know how to do this in the current code, but having a separation of
configuration and operations would make this a lot easier.
Finally, I don't want anyone taking any of this as a complaint. I
will work with, and hopefully work to help any code base that I can
find time for.
--
Doug Dumitru
EasyCo LLC
Yes, during the transition period we might have more bugs, but one of
the goals of the dynamic to static typing transition is to have fewer
bugs, _long term_.
> And if I have to learn a bit of Haskell to fix something
> at some point in the future - well, that's what they pay me for at $DAYJOB.
>
> So basically, go for it!
thanks!
> Hopefully the Haskell build environment for RHEL/CentOS/SL is sane and Jun
> doesn't get a headache making us RPMS.
Heh. For Debian at least, it is sane, so related distros (e.g. Ubuntu)
shouldn't have big problems.
Thanks for the feedback!
iustin
This is also a concern of ours. In hindsight, the design of Ganeti 2.0
with a monolithic, multi-threaded daemon was bad, but as they say,
hindsight is 20/20 :)
> Again, I am probably speaking from lack of knowledge here, but I would
> find it much easier to navigate ganeti code if each operation were a
> separate command-line driven module written in any language. As a
> simple example, I think that booting an instance should be a simple
> script that is shelled on the target host with all parameters on the
> command line. This way, interactions between the operations and the
> management functions are well defined. This also allows you to
> develop/test these operational scripts without dragging in the world.
>
> This type of separation would probably hurt performance but in the
> end, I think overall performance would improve because individual
> operations could be tuned without impacting the center.
Performance is indeed a concern. Benchmarks on the current codebase
show that starting up Python with all our imports is extremely
expensive (around 100ms), so you'd be limited to at most 10 jobs per
second (without optimisations).
That means shutting down a 1000-instance cluster would take around two
minutes just in _starting_ the jobs (in practice, it is much, much slower
due to other issues). My goal is to have a job engine that can process
null opcodes in the realm of 1000 jobs per second.
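The arithmetic behind these figures, as a small sketch. It assumes one
job per instance and strictly serial process start-up; the constant and
function names are made up for illustration.

```python
STARTUP_SECONDS = 0.100  # ~100 ms to start Python with all imports
INSTANCES = 1000         # assumption: one shutdown job per instance
TARGET_JOBS_PER_SECOND = 1000


def max_jobs_per_second(startup_seconds):
    """Upper bound on job rate if every job pays the start-up cost."""
    return 1.0 / startup_seconds


def serial_startup_time(jobs, startup_seconds):
    """Seconds spent only on starting the given number of jobs."""
    return jobs * startup_seconds


def per_job_budget(jobs_per_second):
    """Time available per job, in seconds, at a given target rate."""
    return 1.0 / jobs_per_second


if __name__ == "__main__":
    print(max_jobs_per_second(STARTUP_SECONDS))             # jobs/second
    print(serial_startup_time(INSTANCES, STARTUP_SECONDS))  # seconds
    print(per_job_budget(TARGET_JOBS_PER_SECOND))           # seconds
```

So about 100 seconds go into start-up alone, roughly the two minutes
mentioned above, while the 1000 jobs per second goal leaves about one
millisecond per job.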
> For example,
> migrating a virtual from server to server would be a black box and
> replacing dd over ssh with some other, faster, transport would become
> a safe optimization that did not involve the central code base.
> Similarly, if operations are shelled-out black boxes, I can envision a lot
> of new features that are all doable in the context of KVM/XEN but are
> difficult to plumb into the current structure.
Ack.
> None of these have much to do with the programming language involved.
> I suspect that many of the operation "scripts" would end up in bash.
>
> One other aspect of designing stuff this way is that debugging a live
> server is a lot easier. 'ps' becomes a very useful tool that shows
> you what is actually happening.
Totally agreed.
> Again, my two cents, which are worth even less.
>
> My "feature" list that I would like to implement involves exploiting
> how KVM/XEN can move virtuals around and how this should allow neat
> stuff like "live convert from plain to DRBD" or migration of a plain
> instance to another instance without shutting it down by converting
> the host to DRBD and back on the fly. A smaller function would be
> writing a socket helper to replace dd over ssh for copying volumes.
I see. We never thought about modularity at this level, interesting.
> I know how to do this in the current code, but having a separation of
> configuration and operations would make this a lot easier.
>
> Finally, I don't want anyone taking any of this as a complaint. I
> will work with, and hopefully work to help any code base that I can
> find time for.
Thanks for the feedback, very well written.
We discussed splitting the masterd (the logical units) into
separate processes; however, there are a few downsides to doing that at
the current moment.
The proposed change for 2.7 is to split the query paths outside masterd,
so that what masterd does becomes more focused - a pure job execution
engine.
Later, we have some rough plans of splitting the configuration and
locking entirely out of masterd, so that all the "core" logic is
separated out of the individual job execution part. At that stage, the
practical difference between running multi-threaded or running
multi-process is low, so we could change to whatever model is best.
But we can't make this transition today, or very quickly. Hence the
staged approach, because in the meantime we still have to release and
deliver features incrementally.
Many thanks for the feedback!
iustin
Making a bunch of architectural changes at the same time as language
changes seems particularly fraught.
I'd be inclined to see how breaking down the master daemon goes, and
how far the other architectural changes go towards reducing the desire
for adding languages.
I wonder if you're seeing the grass as greener with the new language:
"this will solve all our problems", etc.
Things I'd like to see moving forwards in terms of features are
primarily better support for generic instances.
I am using Ganeti to run a client's office; they have a nice pair of Dell
R210 IIs and 8 VMs, all of different flavours (Windows, Linux, BSD, etc.),
so I have always used the instance+image, but I've never really felt like
it's the way it's meant to be used; integrating instance+image into the
base distro would probably help with this.
A simple way to snapshot instances would be nice (even if it was disk
only, and even offline only would be ok) so I could snapshot the mail
server, do the upgrade and roll back if it all goes horribly wrong.
Oh, and the ability to change the DRBD cache mode. Disk writes are pretty
crappy and I'll take the performance gain over the risk of failure,
particularly if it can be easily changed, i.e. if I'm installing Windows
or something it doesn't really matter if it gets hosed, or if I'm copying
in a bunch of files from the old file server, I can just redo it on the
0.01% chance of a failure, but in the meantime the performance gain would
be really nice lol.
That said.
It's really working rather well providing HA without needing a dedicated
storage back end (which is the reason I picked it), they had a power
failure the other week that took out one server, everything was up and
running on the remaining node inside 10 minutes which isn't bad for the
first time I've needed to do it.
Thanks for the feedback. Comments inline.
On Sat, Apr 21, 2012 at 08:36:53PM +1000, Jake Anderson wrote:
> I'm somewhat against the idea of adding another language to it (or
> making it more prominent).
> Just on a gut feel level it seems like mixing the two languages will
> cause more work in the long run.
> A complete conversion I can see advantages to. Although it would
> mean learning a new language for me.
Just to be clear: the current plan is for evaluating more use of
Haskell. What I posted is not the "end" of the road, just the next few
steps.
If everything goes well, and we see the improvements we expect, then
yes, we'll have more work to do (conversion). But we don't know yet, so
we can't say "the end goal is a full conversion".
> Making a bunch of architectural changes at the same time as language
> changes seems particularly fraught.
That's why we make the changes optional - you will be able to choose
either the Python or the Haskell version of confd, and similarly for the
query infrastructure.
Rest assured that continued stability is in our own direct interest!
> I'd be inclined to see how breaking down the master daemon goes,
> and how far the other architectural changes go towards reducing the
> desire for adding languages.
>
> I wonder if you're seeing the grass as greener with the new language:
> "this will solve all our problems", etc.
No, it will definitely not solve all our problems, that's for sure.
But since we have to refactor _heavily_ anyway, we want to see whether
refactoring in a language which we know will improve on _some_ aspects
is worth it.
> Things I'd like to see moving forwards in terms of features are
> primarily better support for generic instances.
> I am using Ganeti to run a client's office; they have a nice pair of
> Dell R210 IIs and 8 VMs, all of different flavours (Windows, Linux,
> BSD, etc.), so I have always used the instance+image, but I've
> never really felt like it's the way it's meant to be used; integrating
> instance+image into the base distro would probably help with this.
Hmm… interesting point. We should discuss this more.
> A simple way to snapshot instances would be nice (even if it was
> disk only, and even offline only would be ok) so I could snapshot
> the mail server, do the upgrade and roll back if it all goes
> horribly wrong.
Noted.
> Oh, and the ability to change the DRBD cache mode. Disk writes are pretty
> crappy and I'll take the performance gain over the risk of failure,
> particularly if it can be easily changed, i.e. if I'm installing
> Windows or something it doesn't really matter if it gets hosed, or if
> I'm copying in a bunch of files from the old file server, I can just
> redo it on the 0.01% chance of a failure, but in the meantime the
> performance gain would be really nice lol.
We already disable the barriers, and we'll have disk parameters already
in 2.6. Note this is not A/B/C protocol tuning, but we're moving towards
that.
> That said.
:)
> It's really working rather well providing HA without needing a
> dedicated storage back end (which is the reason I picked it), they
> had a power failure the other week that took out one server,
> everything was up and running on the remaining node inside 10
> minutes which isn't bad for the first time I've needed to do it.
Glad to hear.
Again, thanks a lot for the feedback, especially the missing features!
iustin
Some updates on this plan. 2.7 was delayed as 2.6 itself was a couple of
months late. As such, we've decided to slightly tweak this plan.
So the new plan is as follows: the base Haskell environment (the GHC
compiler and the base htools dependencies: the json, network, bytestring
and parallel libraries) will become required for Ganeti 2.7. The
additional packages used for extra functionality (curl, hslogger, crypto
and similar) will remain optional.
This means that in order to build Ganeti, you will need a Haskell
compiler plus some basic libraries that should be available in existing
stable distributions. For "full" enabling of the Haskell components, you
will need to have the extra libraries (either from cabal or by running a
newer distribution).
The disadvantage, of course, is the requirement for GHC and said
libraries. To compensate, we aim to ensure that our dependency list is
sane for existing stable/long term distributions.
Hi Nate,
You're aware that you are replying to a post that is one and a half years old?
Best,
Karsten
Hi Nate,
Sorry to hear you don't like our Haskell choice. As mentioned, this is
a very old message and at this point we're not going to change our
architecture over exactly one complaint.
Feedback is always welcome, but again there's not much that can be
done now, and yours seems to be more of a flame than actual
feedback. :)
On Tue, Dec 3, 2013 at 3:31 AM, Nate <nate....@gmail.com> wrote:
> You said feedback is welcome:-)
>
> I believe this is and has been the wrong decision.
>
> Haskell has held me back from using ganeti.
>
> One of those technical debts/architectural deficiencies/programming language
> issues/testability issues you mentioned... The code base is littered, but
> this was one i couldn't stand for.
>
> I think the problem is not python, but some new language itches.
>
> Haskell's libraries are sparse; its academic users are even sparser.
> Programs produced in the language produce an unreadable mess.... Whatever
> useless programs there are written in the language.
>
> I don't see superior functional/haskell programmers coming out of school in
> great numbers.... And they probably won't.... Which leads me to believe
> it has no place in companies or production environments, or in software put
> into either.
>
> I wish you would explain exactly what difficulty you were having (files/code
> blocks/line numbers), because the below "explanation" doesn't cut it for me.
>
Have you actually *tried* Ganeti? Is there anything in the Python or
Haskell implementation that prevents you from using it and that you
can't fix because of our Haskell choice, or is it simply that you won't
touch it because there is some Haskell code there?
Note that the Python/Haskell split is done in a way that allows most
contributors to continue their work in Python, and indeed we haven't
had problems with that.
> On Thursday, April 19, 2012 12:18:20 PM UTC-5, Iustin Pop wrote:
>>
>> Hi all,
>>
>> Over the past 6 months, we have discussed (at length) on what is the
>> best direction for Ganeti, given that the current code-base, while
>> working well for us, has accumulated a lot of technical debt in terms
>> of internal architecture deficiencies, programming language issues,
>> testability, etc.
>>
>> The main pain points we have identified are:
>>
>> - the master daemon is a big, monolithic entity that contains a lot of
>> components; changes in any component (OpCodes, Logical Units,
>> objects, etc.) have cascading and sometimes unpredictable effects on
>> the other components
>
> Obviously, this is software architecture, and not a language issue.
>
Sure, but we found that making changes to the software architecture
was harder in Python.
>>
>> - the dynamic nature of Python makes any static analysis very hard (or
>> impossible), which makes the previous point even more painful
>
> Please try PyCharm.
This is not something that we want to handle at the IDE level, but
something that we want to have assurances of at the unittest/check
level, and that Haskell compilation will give us, while Python won't.
>
>>
>> - the multi-threaded nature of the master daemon creates scalability
>> problems in Python (which cannot take advantage of SMP/multi-core
>> environments)
>
> This is false. Programs can be written in python that take advantage of all
> cores, and allow deferred i/o.
>
As far as I remember, the "standard" version of CPython still has a GIL,
which means multi-threaded code won't be able to scale to all cores.
>>
>> - currently, the internal architecture relies on mutable objects,
>> which creates additional problems regarding the consistency of
>> internal data structures
>
> This makes absolutely no sense... Python is all object... and they are
> mutable.. and they stay consistent... what are you trying to say?
>
He's saying that we should modify Ganeti to make copies of objects
rather than passing the original ones, as we don't want modifications
to be shared between different subsystems without explicit action.
This is being worked on in the master split design.
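A minimal illustration of the difference, not taken from the Ganeti code:
the Instance class and rename_disks_in_place function are hypothetical
stand-ins for a configuration object and a subsystem that modifies
whatever it is handed.

```python
import copy


class Instance(object):
    """Hypothetical stand-in for a configuration object."""

    def __init__(self, name, disks):
        self.name = name
        self.disks = disks  # a mutable list


def rename_disks_in_place(instance):
    """A 'subsystem' that mutates the object it receives."""
    instance.disks[:] = [disk.upper() for disk in instance.disks]
    return instance


# Passing the original: the caller's object changes as a side effect.
config = Instance("web1", ["disk0", "disk1"])
shared = rename_disks_in_place(config)

# Passing a deep copy: the original is left untouched, and propagating
# the change back would need an explicit step.
config2 = Instance("web2", ["disk0"])
private = rename_disks_in_place(copy.deepcopy(config2))
```

In the first case config.disks is silently rewritten; in the second,
config2.disks keeps its original contents and only the private copy
changes.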
>>
>> - would it impact you if the Ganeti code-base is written mostly in
>> Haskell, with regards to contributing small patches or adding new
>> features?
>
> I only hope this doesn't go this way. I wouldn't patch, nor would the only
> other two haskell programmers on the planet. :o)
>
Do you have an installation of Ganeti? Did you plan to contribute any
changes in particular? Are you facing an actual problem writing your
changes, or are we speaking just of hypotheticals?
Thanks,
Guido
I've been using Ganeti for over a year now. You don't need to know either Haskell or Python to use it. In fact, with the API you can use just about any language you choose.
I'd also like to point out that Haskell is considered a more secure language, because the functional style encourages careful parsing and the language lends itself to formal verification. This might not be the norm right now with most recent CS grads having learned Java or Python, but functional is making huge inroads thanks to JavaScript and JVM languages like Scala and Clojure. Remember, there was a time before they taught OO programming in college.