There are enough people who are willing to contribute to the development
of Django that it might not be a bad idea to consider moving to the
distributed model.
I'm starting this thread to encourage healthy discussion. I forbid holy
wars and flaming.
Has making a move to a distributed model already been discussed for
Django? The closest I've seen is Jacob announcing his git repo, and
that's about it. I haven't seen any serious discussion among core devs
about the idea.
Thanks!
Jeff Anderson
You don't even begin to approach why this might be a good idea for
Django. So, what does it gain?
Right now, you can already use your distributed VCS of choice with
Django and subversion. Some of us have been doing that for literally
years. The only time I ever use "svn" is on the very rare times I want
to alter subversion metadata properties. However, subversion is a very
good lowest common denominator for everybody to use as the central
repository and it makes a lot of sense to continue to have a central
repo.
Basically, I'm not really sure what you mean by "moving to a distributed
model", since that development model is already possible. Develop
whatever you like, publish it via a repository. If you're working closely
with other people and they're using the same, or a compatible, system,
you can exchange updates with them. Eventually it goes via a committer
and into subversion.
So, welcome to the future. Your pony is already here.
Regards,
Malcolm
*snort*
Let me know how that goes.
> Has making a move to a distributed model already been discussed for
> Django? The closest I've seen is Jacob announcing his git repo, and
> that's about it. I haven't seen any serious discussion among core devs
> about the idea.
Three thoughts:
1. The beauty of distributed VCS is that you don't need anybody's
permission to use it. I, for example, have been happily
using git for the past handful of months; I know Gary uses Bazaar, and
I wouldn't be surprised to find that one of the other core devs uses
something really exotic like Darcs.
2. SVN, however, is a sort of lowest common denominator of open source
development; we can expect our users and developers to know and
understand it. We can't expect the same about newer and more
complicated tools -- I love the hell out of git, for example, but it's
about as simple to use as a 747. I'd rather spend my time helping
contributors figure out Django than helping them figure out DVCS.
3. Django's not switching any time in the foreseeable future. The core
devs are in violent agreement on this point. That said, we're going to
be encouraging those who want to start distributed branches to do so
-- I've been quite happy with the handful of developers whose git
branches I'm tracking.
Jacob
Uhh, don't forget the mercurial users out there! I think there's even
somebody with SVK around.
> 2. SVN, however, is a sort of lowest common denominator of open source
> development; we can expect our users and developers to know and
> understand it. [...] but it's [git] about as simple to use as a 747. I'd rather
> spend my time helping contributors figure out Django than helping them
> figure out DVCS.
Agreed, the most you have to explain right now is how to run "svn
diff"; saying "Run git diff from the branch you created to the
branch you have the upstream repo in" raises the barrier to
contributing. But I wouldn't compare it to a 747; maybe to crossing
the ocean without a compass or GPS ;)
> -- I've been quite happy with the handful of developers whose git
> branches I'm tracking.
"quite" means you don't like commits like "Ups forgot that", "Argh,
missed that one" in remote branches? They're meaningful! :)
--
http://www.marcfargas.com - will be finished someday.
If I want to start a branch in my own repo, I can do that. The problem
happens when merge conflicts start happening. I'm forced to do things
"the subversion way" when I'm stuck with a subversion backend. I
**must** rebase my work rather than merge. This isn't really a good
thing in the distributed environment. It also breaks my ability to
directly check in things from my branches and repos-- when people are
constantly rebasing their work, I lose any ability to really track their
branches, and almost all advantages of using a distributed RCS.
Continuing to have a single, central repo isn't exactly moving to the
distributed model of development. I didn't realize that I needed to
explain the differences between the central repository model and the
distributed model, but I'll try. They are very different philosophies.
I'm suggesting that Django consider this philosophy of developing
things in a distributed fashion. I'm not suggesting that Django continue
using a centralized repo model, and simply switch from svn to another
tool. I'm sorry if that's what it sounded like.
A distributed model would mean abandoning this notion of committers and
non-committers, and thus also the concept of a central repository. There
are plenty of blog posts and documents about this approach to software
development, their benefits, and weaknesses. I highly suggest doing
research on this approach if you aren't terribly familiar with it.
One way that it *might* work for Django is that each component would have
someone that "watches over" it. Someone would be over the translations,
someone would be over forms, brosner would probably be over the admin
app, etc. Translations I believe is a good example. A translator for a
particular language or locale would update their working copy and
commit. Their changes would get merged into the translation manager's
repo. Generally, a release maintainer would be the one that merges in
stable/completed features into their git repo, so they'd merge in
anything when the translation maintainer says he has more stuff ready.
This is very different from the way that things currently work. There
wouldn't really need to be any formal decisions about "who is in" and
"who is out" for commit access. If the release maintainer feels person x
is capable of taking care of a certain task, whoever person x might be
has just become a committer. They can't commit to a central repo, but
they can commit all they want to their own.
I'm not simply suggesting "let's use tool x instead of tool y". I'm
suggesting a philosophy change in the way that Django development is
handled. I don't really have anything against subversion, it does what
it is designed to do very well. I am in favor of considering new ways of
doing things, and re-examining philosophies that are followed. That is
how I became a linux user, and a Python programmer. That is why I
started using Django. I was investigating a new way of doing things, and
found that Django had something good to offer. Here I'm just presenting
an alternate philosophy, and hoping for discussion to take place. I
believe that embracing the distributed RCS philosophy will help nurture
the development of Django in the long run.
Basically, what I'm really trying to say is that there is no pony yet.
There is only a really big pig with a fake pony tail tied onto it,
wearing a saddle. Very ugly. Very Scary. Close your eyes. Little girls
are screaming.
Thanks!
Jeff Anderson
When you consider attracting new members to the development community,
ease of use of the development infrastructure is important.
I've heard good things about Mercurial, Bazaar and git, but I get the
impression that git is sometimes difficult to drive. Overall the idea of
a distributed version control system is great. But will I be able to
drive it without getting myself in a mess?
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
*shrug* That doesn't mean it's less features than we need.
> In this case, it
> means I'm stuck with subversion's linear development. non-linear
> development is a requirement for the distributed model of software
> development.
At some level. It still linearises eventually, since changelogs are an
ordered file of changes and only one thing at a time lands in the final
release block of code.
> If I want to start a branch in my own repo, I can do that. The problem
> happens when merge conflicts start happening. I'm forced to do things
> "the subversion way" when I'm stuck with a subversion backend. I
> **must** rebase my work rather than merge.
You can work out the merge conflicts and fix them up that way.
> This isn't really a good
> thing in the distributed environment. It also breaks my ability to
> directly check in things from my branches and repos-- when people are
> constantly rebasing their work, I lose any ability to really track their
> branches, and almost all advantages of using a distributed RCS.
Keep in mind that the work tracking from the central repository is only
one component of development work. As will come up again below, far more
work actually goes into preparing a final feature than the code change
that eventually lands. Having to linearise on your side for one branch,
rather than having it automatically done by the tool is a concession.
But it's a useful concession since it enables a much larger audience to
also participate. Using distributed tools and understanding how to use
them well is hard. You've apparently done a bit of research and use
here. I know I have, too. And, yet, we have some different opinions
about workflow and capabilities. And we're two people. Now multiply that
by 10,000. Factor in those who haven't used any version control system
before. Subversion itself is tricky enough. Low barrier to entry and
contribution is a requirement. Those of us wanting to use a more
distributed model can do so (and are doing so), but some accommodation
of the others is necessary.
Short version: there are some trade-offs. They're all possible to work
around if you choose to. The advantages usually outweigh the trade-offs
for those of us wanting to use that model, and for those that are more
comfortable doing things other ways, we aren't forcing them away.
> Continuing to have a single, central repo isn't exactly moving to the
> distributed model of development. I didn't realize that I needed to
> explain the differences between the central repository model and the
> distributed model, but I'll try.
Yeah, thanks. I was wondering where I'd left those instructions about
how to suck eggs. :-)
Yes, I'm joking. Maybe you thought I was clueless, so you tried to fill
in the blanks. Fair enough. That's being helpful.
Seriously, I've been using distributed version control systems for a few
years now. I track a number of projects that use them. I use them
personally for both open source and client work a lot. Some are truly
decentralised, others are merely distributed with a more obvious central
node a la Django. All are distributed.
You're still talking about how this affects your workflow, not about why
it's better for Django (you listed a bunch of possibilities, but not how
they're advantageous to anything beyond the fact that it will mostly
eliminate the periodic merge conflict; and they won't be that common).
You *can* still work on branches and exchange with other people using
distributed systems. You'll have to have a branch that tracks Django and
periodically merges from that to your particular published development
branches. That's fine. Commit ids are stable in, for example, git-svn,
so merges will be the same for everybody who merges from a
subversion-tracking branch to their development branch (in the sense
that everybody pulling from subversion will get the same commit id for
the same upstream commit; it just won't necessarily match the one they
were using on their development branch if it wasn't pulled from
subversion. It's the standard rebase issue). I would like to think that
other DVCS do things similarly stably. Yes, there are a few little
oddities with merging things you already had that were then passed
upstream and come back as a merge with a different commit, but that's
relatively minor in the grand scheme of things. Most development doesn't
actually result in a commit to djangoproject.com upstream, when you sit
down and think about it (there's more back-and-forth in the development
phase than in the final patch). Distributed systems allow creating
branches very easily, so after a big block of work that is accepted
upstream, it's not particularly hard to, for example, stop using the
branch you were developing on for that and use a different one for the
next feature you're working on. That's not abnormal practice even in
highly distributed projects like the linux kernel, since it keeps new
features isolated from each other as much as possible.
At that point, you can publish your branches and happily work back and
forth with anybody else using the same workflow.
What will still happen, though, is that the central version of Django,
the thing that is called "Django" and is released, is based off a
particular branch, which is the one synced from our central subversion
repository. This actually happens even in distributed development. When
something is released it is released from *somewhere*. There is a
particular commit on a particular branch in the entire universe of
versions of the code that is called the release. We choose for the
location of that to be in the subversion repository. This isn't contrary
to distributed development at all. It's saying ahead of time that there
is a "master" version that things feed up to (there's nothing about
distributed development that says a hierarchy of checked-out versions
isn't possible; it's just not a requirement or a non-requirement).
Built on that basis, the rest still comes down to workflow. At some
point, necessary changes have to filter back to the main place from
which the release will be done.
> They are very different philosophies.
> I'm suggesting that Django considers this philosophy of developing
> things in a distributed fashion. I'm not suggesting that Django continue
> using a centralized repo model, and simply switch from svn to another
> tool. I'm sorry if that's what it sounded like.
>
> A distributed model would mean abandoning this notion of committers and
> non-committers, and thus also the concept of a central repository.
That's not necessarily something that follows from the definition. It's
one way it can work, but it's only the *only* way if you choose a
restricted definition of a phrase that is new enough not to have a
canonically obvious meaning yet.
> There
> are plenty of blog posts and documents about this approach to software
> development, their benefits, and weaknesses. I highly suggest doing
> research on this approach if you aren't terribly familiar with it.
>
> One way that it *might* work for the Django is each component would have
> someone that "watches over" it.
That won't really work for us, since we rely heavily on many eyes making
things work. Commits to the "final resting place" for things that will
ultimately make it into the release give us one checkpoint through which
everything passes. Anybody and everybody can watch that and review the
code. Many bugs are caught that way. Given the size of our developer and
contributor base, the abilities of both and the relatively small size of
our code, this is a pretty good model.
> Someone would be over the translations,
> someone would be over forms, brosner would probably be over the admin
> app, etc. Translations I believe is a good example. A translator for a
> particular language or locale would update their working copy and
> commit. Their changes would get merged into the translation manager's
> repo. Generally, a release maintainer would be the one that merges in
> stable/completed features into their git repo, so they'd merge in
> anything when the translation maintainer says he has more stuff ready.
You've just described a hierarchical system of merges that is the same
as what we have now. Everything filters up to the subversion repository.
You can still use whatever system you like down below and trade back and
forth between people using similar systems. The only concession to
having something that needs to be rebased (the svn -> git conversion, say)
is that you don't do your development on the branch that gets updated
from upstream, but, rather, merge that into your branches.
Remember we're a relatively small project. There doesn't need to be more
than a single layer of "formal" hierarchy for merges going into the
thing targeted for release. And as each new layer is added, it really
does get harder and harder to track what's going on in places you're
interested in.
> This is very different from the way that things currently work. There
> wouldn't really need to be any formal decisions about "who is in" and
> "who is out" for commit access.
There is nothing at all stopping you from publishing your own repository
of Django changes. And pulling changes from whoever you want. So
everybody's already a committer on some level. Again, it's a difference
in workflow, not capabilities. Right now, the "commit bit" for the
central subversion repo just controls who can do the final push to what
we use as the basis for a release. It doesn't have any influence over
who can develop work, how they do so and who can ultimately propose them
for inclusion.
I'm personally far from convinced that the features you've outlined add
significant extra advantages or remove any of the larger problems we
have in our workflow to justify the retraining, community upset and
*much* higher barrier to entry that it would require. Don't think of
this as "either/or": you can still use DVCS for development of new stuff
and the only interaction with subversion is at the interface to the
final Django version.
Regards,
Malcolm
On Wed, Sep 10, Malcolm Tredinnick wrote:
> Commit ids are stable in, for example, git-svn,
> so merges will be the same for everybody who merges from a
> subversion-tracking branch to their development branch (in the sense
> that everybody pulling from subversion will get the same commit id for
> the same upstream commit; it just won't necessarily match the one they
> were using on their development branch if it wasn't pulled from
> subversion.
I've also used git for some time now to track Django. I happen to use
git-svnimport, since I don't commit. It's not very hard to sync with
subversion, but one thing is annoying: Your commits depend on whether you
use git-svn or (deprecated) git-svnimport, and on the options you choose.
This creates a few superficial annoyances when you want to share a git tree.
As a proposal, can we "officially" recommend JKM's git tree as the
semi-official one and link it prominently?
Michael
It does not matter which import we officially recommend, because the SHA-1 ids
are the same for every git-svn import as long as you use the same SVN base URL.
You should not use git-svnimport anymore; git-svn is more feature complete and
is more actively developed.
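
To make the point about identical ids concrete, here is a minimal Python sketch of how git derives an object id: it is the SHA-1 of a short header plus the object's content. This is not git-svn's actual code. The author line, timestamp, repository UUID, revision number and both URLs below are hypothetical placeholders, and the commit layout is simplified. The only things taken from git itself are the hashing rule and the fact that git-svn records a `git-svn-id:` line containing the SVN URL in each commit message.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    # git hashes "<type> <size in bytes>", a NUL byte, then the raw content.
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def svn_import_commit(tree: str, parent: str, rev: int, log: str, svn_url: str) -> str:
    """Build a simplified commit for one svn revision and return its id.

    The author line and UUID are made-up placeholders; a real import
    takes them from the svn revision and repository.
    """
    author = "someone <someone@example.invalid> 1221000000 +0000"
    uuid = "00000000-0000-0000-0000-000000000000"
    message = f"{log}\n\ngit-svn-id: {svn_url}@{rev} {uuid}\n"
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {author}\n"
        f"committer {author}\n"
        f"\n{message}"
    )
    return git_object_id("commit", body.encode("utf-8"))


EMPTY_TREE = git_object_id("tree", b"")  # id of a tree with no entries
PARENT = "0" * 40                        # placeholder parent id

# Two people importing the same revision from the same (hypothetical) URL.
URL = "http://svn.example.invalid/django/trunk"
alice = svn_import_commit(EMPTY_TREE, PARENT, 9000, "Fixed a typo.", URL)
bob = svn_import_commit(EMPTY_TREE, PARENT, 9000, "Fixed a typo.", URL)

# A third person importing through a different (hypothetical) base URL.
MIRROR = "https://mirror.example.invalid/django/trunk"
carol = svn_import_commit(EMPTY_TREE, PARENT, 9000, "Fixed a typo.", MIRROR)

print(alice == bob)    # True: identical inputs give an identical id
print(alice == carol)  # False: the URL is part of the hashed message
```

Because the parent id is hashed too, a difference in one commit propagates to every descendant, which is why the base URL has to match from the first imported revision onwards.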
Several git and hg repositories are listed here:
http://code.djangoproject.com/wiki/DjangoBranches
Mine is updated hourly and since I rely on the service myself it will
surely stay for some time.
Matthias
I just checked, and the two repositories have diverged. Jacob's repository
contains a merge commit from 12 days ago[1] which is not recorded as a merge
in the subversion history and does not appear as a merge in all other git-svn
imports.
I'm not sure at all if it matters though, since Django development happens with
patches that are posted to trac.
[1]: <http://code.djangoproject.com/git/?p=django;a=commit;h=c894f853a2c5b59d86ba854a2beae34129047ef5>,
"Merge branch 'url-tag-asvar'"
I wasn't aware that git-svn always creates the same trees. Goooood!
On Thu, Sep 11, Matthias Kestenholz wrote:
> I just checked, and the two repositories have diverged. Jacob's
> repository contains
> a merge commit from 12 days ago[1] which is not recorded as a merge in the
> subversion history and does not appear as a merge in all other git-svn imports.
The branch url-tag-asvar is not in svn. This branch has been created
directly in the git repository. So it's not an exact mirror of svn and
rather shouldn't be used as "the official mirror" :-)
Well, I retract my proposal. If git-svn creates identical repositories for
everybody, just use this and we're fine.
> I'm not sure at all if it matters though, since Django development happens with
> patches that are posted to trac.
No, it does not matter for Django development. But it matters as soon as
different developers try to work together, and both of them use git. Having
trees with exactly the same commit ids would make things a bit simpler.
It's been pointed out to me that the above comes across as dismissive
and condescending, and some took offense at it. I agree -- it's a
stupid way to respond to an honest question. I was out of line, and I
apologize. Jeff, I hope you weren't offended!
By way of explanation, I'm just a bit tired of holy wars in web
development; SCM discussions are starting to rival emacs/vi in their
heat and vitriol, and I sorta expected to see the same here. Apparently
we're a slightly more mature community, though -- other than from me,
the tone has been gratifying to see!
So, again, sorry about being a dick!
Jacob
You damn kids and your fancy "DVCS" tools. Emacs backups are the only
revision control I've ever needed!
--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."
>
> On Fri, Sep 12, 2008 at 10:00 AM, Jeff Anderson
> <jeff...@programmerq.net> wrote:
>> I actually was quite amused by your reaction.
>>
>> You are right though, this community does seem to come through with a
>> tone more mature than your average mailing list. :)
>
> You damn kids and your fancy "DVCS" tools. Emacs backups are the only
> revision control I've ever needed!
You mean *VIM* backups, don't you?!!
---
David Zhou
da...@nodnod.net
Thanks,
Eugene
Honestly, not that my opinion matters in any way, but:
Don't fix it if it ain't broken.
Good day to you
Ludvig "lericson" Ericson
ludvig....@gmail.com