hg subrepositories always on tip?

Tobias Werth

Mar 2, 2012, 4:34:07 AM

to merc...@selenic.com

Hi,

with Mercurial a subrepository sub can be added to a repository super
like this:

echo sub = sub > .hgsub
hg add .hgsub
hg clone https://example.com/hg/sub
hg commit -m "added subrepo"

Mercurial then creates a .hgsubstate to offer a consistent snapshot of
the repository. There are some (rare) cases where this is not useful or
convenient, e.g. collaboration on a bunch of text files with different
access rights.

If Alice only has access to the subrepository sub but not to the
containing repository super, then Bob (who has access to super and sub)
always has to update the state of the subrepository/ies when Alice
changes sub.

Is there a more convenient way for Bob to tell HG that he always wants
to use the tip of the subrepository as state?

Tobi

_______________________________________________
Mercurial mailing list
Merc...@selenic.com
http://selenic.com/mailman/listinfo/mercurial

Todd Greer

Mar 2, 2012, 11:58:10 AM

to Tobias Werth, merc...@selenic.com

Tobias Werth said:

> Is there a more convenient way for Bob to tell HG that he always wants to use the the tip of the sub repository as state?

I'm in the middle of setting up our continuous integration systems, and I'm taking the approach of having the build scripts pull and update select subrepos, and then committing and pushing the superrepo. I'm not done, so I can't say whether there are any major issues yet. Alternatively, you might install an update hook on the super-repo that pulls and updates select subrepos.

It would be convenient for there to be a built-in way to do this; something like "hg sync subrepo1 subrepo2 subrepo3", which would pull and update listed subrepos, then commit and push the superrepo.
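The "hg sync" idea above could be approximated by a small script. What follows is only a sketch: the function name `sync_commands`, the default commit message, and the directory layout are assumptions for illustration, and the code only builds the command lines rather than running them.

```python
from pathlib import PurePosixPath


def sync_commands(super_path, subrepos, message="Sync subrepos to tip"):
    """Build the hg command lines for a hypothetical 'hg sync'.

    Nothing is executed here; pass each list to subprocess.run()
    to try it on a real checkout.
    """
    commands = []
    for name in subrepos:
        sub_path = str(PurePosixPath(super_path) / name)
        # Fetch new changesets, then move the subrepo working copy to tip.
        commands.append(["hg", "-R", sub_path, "pull"])
        commands.append(["hg", "-R", sub_path, "update", "tip"])
    # Committing in the superrepo records the new subrepo revisions
    # in .hgsubstate; pushing publishes that snapshot.
    commands.append(["hg", "-R", super_path, "commit", "-m", message])
    commands.append(["hg", "-R", super_path, "push"])
    return commands


for cmd in sync_commands("super", ["subrepo1", "subrepo2"]):
    print(" ".join(cmd))
```

Building the commands as lists keeps the sketch testable without a repository and avoids shell quoting problems if a subrepo path contains spaces.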

--
Todd

Scott Palmer

Mar 2, 2012, 1:23:43 PM

to Todd Greer, merc...@selenic.com

On 2012-03-02, at 11:58 AM, Todd Greer wrote:

> Tobias Werth said:
>
>> Is there a more convenient way for Bob to tell HG that he always wants to use the the tip of the sub repository as state?
>
> I'm in the middle of setting up our continuous integration systems, and I'm taking the approach of having the build scripts pull and update select subrepos, and then committing and pushing the superrepo. I'm not done, so I can't say whether there are any major issues yet. Alternatively, you might install an update hook on the super-repo that pulls and updates select subrepos.
>
> It would be convenient for there to be a built-in way to do this; something like "hg sync subrepo1 subrepo2 subrepo3", which would pull and update listed subrepos, then commit and push the superrepo.
>
> --
> Todd

I was thinking there should be a way to force subrepos to the tip revision in some cases as well. Particularly when you are working on the tip of your main development branch and you want to make sure you are compiling against the latest version of some shared code. (I.e. you are using an API that is also currently in development by a colleague.)

For example, if the parent repo is at the tip, then perhaps something like "hg update -S" in the parent could have an option to bring the subrepos to their respective tip revisions. Perhaps an extra option would be needed. Maybe even an extra section in the .hgsub file could be used to indicate which subrepos you would like that rule to apply to, since there could be some subrepos that you do want to keep pinned unless they were explicitly updated from within the subrepo.
E.g., in .hgsub:

subrepo = <url_to_repo>
othersubrepo = <url_to_other_subrepo>

[subpaths]
http://(.*)/ParentProject/subrepo = http://\1/SharedLibraries/subrepo

[subTip]
subrepo = pinned
othersubrepo = latest

Here the [subTip] section tells Mercurial which subrepos should be updated to their latest revision when the parent is updated to "tip".
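To make the proposal concrete, here is a rough sketch of how a tool might read such a file. The [subTip] section is only the suggestion from this message and is not something Mercurial understands; the function name `parse_hgsub` and the example.com URLs are placeholders.

```python
def parse_hgsub(text):
    """Split .hgsub-style text into {section: {key: value}}.

    Entries that appear before any [section] header land under "".
    """
    sections = {"": {}}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, {})
            continue
        key, sep, value = line.partition("=")
        if sep:
            sections[current][key.strip()] = value.strip()
    return sections


SAMPLE = """\
subrepo = https://example.com/hg/subrepo
othersubrepo = https://example.com/hg/othersubrepo

[subTip]
subrepo = pinned
othersubrepo = latest
"""

parsed = parse_hgsub(SAMPLE)
# Subrepos that asked to follow their tip; everything else stays pinned.
unpinned = [name for name, mode in parsed.get("subTip", {}).items()
            if mode == "latest"]
print(unpinned)
```

A subrepo that is missing from [subTip] is treated as pinned here, which keeps today's behaviour as the default.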

Scott

paul_...@selinc.com

Mar 2, 2012, 3:49:00 PM

to Scott Palmer, merc...@selenic.com

This concept is used by IBM ClearCase, and is called a Config Specification aka config spec. I highly encourage anyone interested in enterprise-class subrepo implementation to go study config specs. While ClearCase has some special problems, config specs solve source control configuration pretty nicely (and confuse the uninitiated badly... ).

The need to point components at branches, versions, tags, tip, etc., and then to keep hg in a state where those pointers are 'valid', is very evident when dealing with large-scale codebases (multiple components, multiple projects, multiple products).

We've had to implement a hacky version of it ourselves; we use a top level repo and have a script move every subrepo to the head of the current named branch that the top-level repo points to.

I anticipate that eventually subrepos will have this functionality (either native or scripted by most large-scale implementors of hg).

- - -
Regards,
Paul Nathan

Ben Fritz

Mar 5, 2012, 1:03:39 PM

to merc...@selenic.com

On Mar 2, 2:49 pm, paul_nat...@selinc.com wrote:
> This concept is used by IBM ClearCase, and is called a Config
> Specification aka config spec. I highly encourage anyone interested in
> enterprise-class subrepo implementation to go study config specs. While
> ClearCase has some special problems, config specs solve source control
> configuration pretty nicely (and confuse the uninitiated badly... ).
>

I have to say, I never expected to see ClearCase recommended on ANY
forum, much less this one.

I especially never expected to see the "config spec" listed as a
positive feature of ClearCase.

Config spec problems have probably been the #1 cause of bad builds
where I work. While perhaps this would not be as much of a problem
with a dedicated "build view" (a ClearCase view is comparable to a
working copy in most centralized VCSes), it is far too easy to have
the "wrong" config spec and have almost no indication of it. Compound
this with the fact that ClearCase has no concept of atomic changesets
(at least not that I know of), and there is no way to indicate which
version of each file goes with which version of any of the others, and
it is extremely difficult to piece together what went wrong when
somebody does screw up a build.

My workplace is slowly migrating from ClearCase (which is around $4000
per user) to SVN, and I don't think I'll miss anything when I finally
leave it behind.

In my opinion, one of the BEST features of Mercurial subrepositories,
or even SVN externals, is that when you grab a version of the code,
you know you'll get a version of the subrepo or externals which will
actually work with your code. You don't need to wade through a listing
of labels, branches, and the like and find the right place to put it
in your config file to pull in the correct version every time you want
to build something which is not at /main/LATEST for every file
involved in the build.

paul_...@selinc.com

Mar 5, 2012, 1:41:13 PM

to Ben Fritz, merc...@selenic.com

Ben,

Indubitably, config specs are very confusing and cause significant issues. Unfortunately, we've gotten to a point where we are like, "well, #%#$, this config spec idea has some merit after all". Allowing development to continually roll forward on the head of other people's branches turns out to be very useful for continuous development. We have over 100 subrepos; asking our people to manually update all of those subrepos is not approximating feasible. I am not recommending ClearCase, but it does bear study and understanding of what it tries to provide and why.

I've personally written the build code to allow reproducible builds with the continual-update workflow. It's a bit tricky, but it does work.

Believe me when I say we aren't thrilled about the idea of "config spec workflow" either.

Benjamin Fritz

Mar 5, 2012, 2:22:57 PM

to paul_...@selinc.com, merc...@selenic.com

On Mon, Mar 5, 2012 at 12:41 PM, <paul_...@selinc.com> wrote:
>
> Indubitably, config specs are very confusing and cause significant issues.
> Unfortunately, we've gotten to a point where we are like, "well, #%#$, this
> config spec idea has some merit after all". Allowing development to
> continually roll forward on the head of other people's branches turns out to
> be very useful for continuous development. We have over 100 subrepos; asking
> our people to manually update all of those subrepos is not approximating
> feasible. I am not recommending ClearCase, but it does bear study and
> understanding of what it tries to provide and why.
>

Yes, I can see why 100 subrepos would be a pain to keep updated, and
why you might want something like "give me the latest on this
branch/bookmark" in a subrepository definition.

I haven't put much thought into how one would update subrepos in a
continuous integration sort of way, but I wonder, shouldn't it be the
responsibility of the person making the change to a subrepo, to also
push that change out to the repositories including it? Presumably,
either the work is self-contained within the subrepo and transparent to
those containing it, or it fixes a problem found in the repository
containing it, so it should get propagated upward when fixed, or the
fix is not truly "done". Granted, my current team (working in SVN) has
only a few libraries we ourselves maintain and link into the main app
via svn:externals, but our process is that the person making the
change to the library also updates references to that library when
needed.

Scott Palmer

Mar 5, 2012, 2:30:22 PM

to Benjamin Fritz, merc...@selenic.com

On 2012-03-05, at 2:22 PM, Benjamin Fritz wrote:

> On Mon, Mar 5, 2012 at 12:41 PM, <paul_...@selinc.com> wrote:
>>
>> Indubitably, config specs are very confusing and cause significant issues.
>> Unfortunately, we've gotten to a point where we are like, "well, #%#$, this
>> config spec idea has some merit after all". Allowing development to
>> continually roll forward on the head of other people's branches turns out to
>> be very useful for continuous development. We have over 100 subrepos; asking
>> our people to manually update all of those subrepos is not approximating
>> feasible. I am not recommending ClearCase, but it does bear study and
>> understanding of what it tries to provide and why.
>>
>
> Yes, I can see why 100 subrepos would be a pain to keep updated, and
> why you might want something like "give me the latest on this
> branch/bookmark" in a subrepository definition.
>
> I haven't put much thought into how one would update subrepos in a
> continuous integration sort of way, but I wonder, shouldn't it be the
> responsibility of the person making the change to a subrepo, to also
> push that change out to the repositories including it?

No, that is unmanageable in the opposite situation, where many parent repos share something via a subrepo. It would be up to the developers of those parent projects to decide when to update the shared component. If the parent project has elected to live on the bleeding edge, it should be able to keep the subrepo updated to the tip easily, but the developer of the shared code may not even know that any particular user of the shared code exists.

> Presumably,
> either the work is self-contained to the subrepo transparent to those
> containing it, or it fixes a problem found in the repository
> containing it, so it should get populated upward when fixed, or the
> fix is not truly "done". Granted, my current team (working in SVN) has
> only a few libraries we ourselves maintain and link into the main app
> via svn:externals, but our process is that the person making the
> change to the library also updates references to that library when
> needed.

Regards

Scott

Todd Greer

Mar 5, 2012, 11:17:45 PM

to Ben Fritz, merc...@selenic.com

Ben Fritz said:

> In my opinion, one of the BEST features of Mercurial subrepositories, or even SVN externals, is that when you grab a version of the code,
> you know you'll get a version of the subrepo or externals which will actually work with your code. You don't need to wade through a listing
> of labels, branches, and the like and find the right place to put it in your config file to pull in the correct version every time you want to build
> something which is not at /main/LATEST for every file involved in the build.

I don't think anyone is disagreeing with the value of pinning subrepos. It should undoubtedly be the default behavior. However, while a project is
under active development, it often makes the most sense for certain libraries to be automatically kept up-to-date. I think the main questions
are as follows:

1. How should one specify which subrepos should be kept up-to-date, or "unpinned"?
2. How is it determined when a project is "in active development" and should thus keep its selected subrepos up-to-date?
3. Is anyone sufficiently motivated to add this feature?

(This list is a combination of what I've read here and my own thoughts--I don't presume that this is a summary of the thread.)

One wrinkle is that the set of unpinned subrepos may change over time. Given that, one possible answer would be something like:
hg sync subrepo1 subrepo2 ...

This might be doable as an alias, possibly on top of the 'onsub' extension. I may go learn more about aliases tomorrow.

Another option is marking the repos in .hgsub or a new file; that does a better job when the set of unpinned subrepos doesn't change.

Humbly contributed,
Todd

Martin Geisler

Mar 6, 2012, 3:58:03 AM

to Todd Greer, Ben Fritz, merc...@selenic.com

Todd Greer <TGr...@affinegy.com> writes:

> Ben Fritz said:
>
>> In my opinion, one of the BEST features of Mercurial subrepositories,
>> or even SVN externals, is that when you grab a version of the code,
>> you know you'll get a version of the subrepo or externals which will
>> actually work with your code. You don't need to wade through a
>> listing of labels, branches, and the like and find the right place to
>> put it in your config file to pull in the correct version every time
>> you want to build something which is not at /main/LATEST for every
>> file involved in the build.
>
> I don't think anyone is disagreeing with the value of pinning
> subrepos. It should undoubtedly be the default behavior. However,
> while a project is under active development, it often makes the most
> sense for certain libraries to be automatically kept up-to-date. I
> think the main questions are as follows:
>
> 1. How should one specify which subrepos should be kept up-to-date, or
> "unpinned"?

I've played with the idea of simply saying that

foo = default

in .hgsubstate would mean "unpinned". More generally, I've wanted to let
.hgsubstate contain a branch name or maybe even any revset. With a
revset you could write

foo = max(branch(stable) and tagged())

to mean that you want subrepo foo to always be at the latest tagged
release on the stable branch. Sounds quite nice, I think!
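As an illustration of what that revset would select, here is a toy model in Python. It does not call Mercurial at all; the `Changeset` class, the `latest_tagged_on` helper, and the sample history are invented for the example and only mimic the meaning of max(), branch() and tagged().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    rev: int           # local revision number
    branch: str        # named branch the changeset is on
    tags: tuple = ()   # tags pointing at this changeset


def latest_tagged_on(branch, changesets):
    """Mimic max(branch(<branch>) and tagged()) on toy data."""
    candidates = [c for c in changesets if c.branch == branch and c.tags]
    # max() in a revset picks the highest revision number in the set.
    return max(candidates, key=lambda c: c.rev, default=None)


HISTORY = [
    Changeset(0, "default"),
    Changeset(1, "stable", ("1.0",)),
    Changeset(2, "default", ("2.0-rc",)),
    Changeset(3, "stable", ("1.0.1",)),
    Changeset(4, "stable"),  # newest on stable, but not tagged
]

target = latest_tagged_on("stable", HISTORY)
print(target.rev, target.tags)
```

Note that the newest changeset on stable is skipped because it carries no tag, which is exactly the "latest tagged release" behaviour described above.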

For that to work, it would be necessary to pull new changesets into the
subrepos regularly -- probably with 'hg pull --subrepos'. That flag was
never implemented, even though I think Matt said he was okay with it.

> 2. How is it determined when a project is "in active development" and
> should thus keep its selected subrepos up-to-date?

I think that would be a matter of writing the .hgsubstate file
correctly.

> 3. Is anyone sufficiently motivated to add this feature?

That has historically been a problem with subrepos -- it's a feature for
large organizations and so we don't use it in core Mercurial.

--
Martin Geisler

aragost Trifork
Professional Mercurial support
http://www.aragost.com/mercurial/

Didly

Mar 6, 2012, 4:26:11 AM

to Martin Geisler, Ben Fritz, merc...@selenic.com

Martin, I think this is something that could be quite useful in some
scenarios (I've had people ask me about it in the past).

That's why I've also thought about this, and I came up with a similar
idea to yours, except that I kept the path to the repo that is being
pulled in the .hgsub file. The idea would be to do the following:

foo = foo @ max(branch(stable) and tagged())

That is, the "extended" .hgsub syntax would be:

pathtosubrepo = syncpath @ targetrevset

Using "@" could perhaps be problematic, given that "@" can be used in
the "syncpath" to specify an http username. Adding spaces could fix
that or perhaps a different separator character could be used (e.g.
"<" or something else).

If no "targetrevset" was set the behavior would be the current "fixed
revision" behavior.
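A parser for this extended syntax could look roughly like the sketch below. It assumes the "spaces around @" variant mentioned above, so that an "@" inside an http username is left alone; the function name `parse_hgsub_entry` and the example URL are made up for illustration, and none of this is syntax Mercurial accepts today.

```python
def parse_hgsub_entry(line):
    """Parse 'path = source @ revset' (the '@ revset' part is optional).

    The separator must be ' @ ' with a space on both sides, so an '@'
    inside a URL such as https://user@host/repo is not split.
    """
    path, sep, rest = line.partition("=")
    if not sep:
        raise ValueError("not a 'path = source' entry: %r" % line)
    source, at, revset = rest.strip().partition(" @ ")
    # No separator means no target revset: keep the fixed-revision behaviour.
    return path.strip(), source.strip(), (revset.strip() if at else None)


print(parse_hgsub_entry("foo = foo @ max(branch(stable) and tagged())"))
print(parse_hgsub_entry("lib = https://user@example.com/hg/lib"))
```

Returning None for a missing revset keeps the current pinned behaviour as the default, as suggested above.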

That being said, I wonder if Matt will ever accept something like
this. He has said a few times in the past that he thinks that updating
to a given revision should always replicate the same state, and this
would break that assumption.

Since we are talking about subrepo improvements, there is another
question that I've gotten plenty of times since I started pushing for
Mercurial to replace ClearCase at my current job: "How do I make a
subrepo read only?"

The current answer to that question (AFAIK) is basically that you can,
but that to do so you must play with your mercurial server push
permissions, which is not a perfect solution because if someone
modifies a "read-only" subrepo he only gets warned about it at push
time. The alternative is to create some sort of custom commit hook or
perhaps an extension, but I'd rather have a built-in, generic solution
to that problem.

Cheers,

Angel

John Gee

Mar 6, 2012, 7:57:36 AM

to Didly, Ben Fritz, merc...@selenic.com

On 6/03/2012, at 22:26, Didly wrote:

> That's why I've also thought about this, and I came up with a similar
> idea to yours, except that I kept the path to the repo that is being
> pulled in the .hgsub file. The idea would be to do the following:
>

> [...]

>
> That being said, I wonder if Matt will ever accept something like
> this. He has said a few times in the past that he thinks that updating
> to a given revision should always replicate the same state, and this
> would break that assumption.

I would like the new functionality, but suggest a variation to preserve the current update behaviour and opt-in to the new behaviour.

I think it is important to have a normal update to a given revision replicate the same state. However, I would use a command which could update subrepos in a controlled way using optional information, say in .hgsub.

Perhaps:

    hg update --subrepos

The working directory gets the revision of the master repo as usual, and the subrepos are then updated as per the revset in .hgsub. The default behaviour with no revset in .hgsub is the same as calling "hg update" on that subrepo.

So with no extra information in .hgsub the command does an update in each subrepo, which seems an acceptable outcome from the look of the command.

--
John Gee
Programmers live in interesting times...

paul_...@selinc.com

Mar 6, 2012, 11:53:47 AM

to Martin Geisler, Ben Fritz, merc...@selenic.com

Martin,

What's your gut feeling estimate on the difficulty of putting together a patch to enable subrepos to handle arbitrary revsets?

Some sort of hg update --subrepos & hg pull --subrepos would of course be required... do you think it would be a hard & buggy feature?

Martin Geisler

Mar 6, 2012, 6:00:47 PM

to paul_...@selinc.com, Ben Fritz, merc...@selenic.com

paul_...@selinc.com writes:

>> I've played with the idea of simply saying that
>>
>> foo = default
>>
>> in .hgsubstate would mean "unpinned". More generally, I've wanted to
>> let .hgsubstate contain a branch name or maybe even any revset. With
>> a revset you could write
>>
>> foo = max(branch(stable) and tagged())
>>
>> to mean that you want subrepo foo to always be at the latest tagged
>> release on the stable branch. Sounds quite nice, I think!
>>
>> For that to work, it would be necessary to pull new changesets into
>> the subrepos regularly -- probably with 'hg pull --subrepos'. That
>> flag was never implemented, even though I think Matt said he was okay
>> with it.
>

> What's your gut feeling estimate on the difficulty of putting together
> a patch to enable subrepos to handle arbitrary revsets?
>
> Some sort of hg update --subrepos & hg pull --subrepos would of course
> be required... do you think it would be a hard & buggy feature?

I have already spent an afternoon looking at it. That was a while ago
and I mostly remember that I couldn't make it work smoothly in that
amount of time :)

The subrepo code is pretty well layered with regard to the rest of the
code, so I don't think it should be too hard to hook in the right place
for update. The idea is basically to read a branch/revset from the
.hgsubstate file and then use that to look up the revision. We then keep
the revset around so that we can write it into the .hgsubstate file
again on commit -- otherwise the revset will be translated back to a
node ID immediately.

As far as I recall that could be made to work. But there are probably
some more tricky issues: if .hgsubstate can contain a revset, then we
can have a subrepo that is dirty even though there is nothing to commit
in the repo with .hgsubstate. A .hgsubstate with

foo = stable

would mean that foo becomes dirty as soon as new changesets on the
stable branch appear in foo. But a top-level commit won't have anything
to commit -- the same revset will be correct after foo is updated. That
might upset some parts of the code, but I never got that far in my
little experiment.
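The situation can be modelled in a few lines of Python. This is a toy model only: the node IDs are made up, `resolve` and `subrepo_is_dirty` are hypothetical helpers, and real Mercurial does not accept a branch name in .hgsubstate.

```python
def resolve(recorded, branch_heads):
    """Resolve a recorded state to a node: a branch name maps to its
    current head, anything else is taken as a literal node ID."""
    return branch_heads.get(recorded, recorded)


def subrepo_is_dirty(recorded, checked_out, branch_heads):
    """A subrepo counts as dirty when its working copy differs from
    what the recorded state currently resolves to."""
    return resolve(recorded, branch_heads) != checked_out


recorded = "stable"      # hypothetical .hgsubstate entry: foo = stable
checked_out = "aaa111"   # node the subrepo working copy is on
heads = {"stable": "aaa111"}
print(subrepo_is_dirty(recorded, checked_out, heads))

# A new changeset lands on the stable branch inside foo.
heads["stable"] = "bbb222"
print(subrepo_is_dirty(recorded, checked_out, heads))

# Updating foo cleans it again, yet the recorded text never changed,
# so the top-level repo has no .hgsubstate difference to commit.
checked_out = "bbb222"
print(subrepo_is_dirty(recorded, checked_out, heads), recorded)
```

With a plain node ID recorded instead of a branch name, `resolve` returns the node unchanged and the dirty check reduces to the comparison Mercurial makes today.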

So my gut feeling is that we're talking at least "some days". Maybe one
day for a crude version where 'hg update' understands and preserves a
revset in .hgsubstate. Another day for adding 'hg pull --subrepos' and
'hg update --subrepos', a day for writing tests, a day for unforeseen
problems, and a day or two for discussing the whole thing here...

--
Martin Geisler

Mercurial links: http://mercurial.ch/

Eric Siegerman

Mar 6, 2012, 9:41:13 PM

to Martin Geisler, Ben Fritz, merc...@selenic.com

This is all sounding very exciting; it's a problem I've been
dealing with as well.

On Wed, 2012-03-07 at 00:00 +0100, Martin Geisler wrote:
> The idea is basically to read a branch/revset from the
> .hgsubstate file and then use that to lookup the revision. We then keep
> the revset around so that we can write it into the .hgsubstate file
> again on commit -- otherwise the revset will be translated back to a
> node ID immediatedly.

ISTM that the branch/revset should be specified in .hgsub rather
than in .hgsubstate. There are a number of reasons:

- It would eliminate the situation you describe:

> if .hgsubstate can contain a revset, then we
> can have a subrepo that is dirty even though there is nothing to commit
> in the repo with .hgsubstate.

That's because the test for "is the parent repo dirty?" would
be exactly as it is now: it's dirty if changes have been made
within the repo's own scope OR if .hgsubstate is out of date.

- In general, files that are both user-edited *and*
automatically maintained tend to cause grief (the above being
merely one flavour of grief). IMO it's generally better to
put human- and machine-generated stuff in separate files.

- It would keep .hgsubstate available to do what it does now --
  snapshotting the specific revision of each subrepo. That
  in turn would give us the best of both worlds:

  - "hg update --subrepos" would do what people have been
    describing -- update each subrepo based on the revset
    specified in .hgsub; but at the same time,

  - "hg commit" would still snapshot the exact state of the
    tree, recursively down through the subrepos, as is now
    the case

  This combination would in turn let one do the following,
  when attempting to put together a known-good version of the
  entire project, e.g. in one's continuous-integration system:

      hg update --subrepos
      hg commit '-mTrial build'
      # Build and test.
      # If happy:
      hg tag -f latest-clean-build

  Cutting a release would look pretty similar, but presumably
  with more testing :-), and with the final line being:

      hg tag release-1.2.3
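For a continuous-integration job, the same steps might be generated by a small helper like the sketch below. It is an illustration only: "hg update --subrepos" is the flag proposed in this thread and not an existing Mercurial option, and the function names are invented.

```python
def pre_build_commands():
    """Commands to run before building."""
    return [
        # Proposed in this thread; hypothetical, not a real hg flag.
        ["hg", "update", "--subrepos"],
        # Snapshots the exact subrepo revisions into .hgsubstate.
        ["hg", "commit", "-m", "Trial build"],
    ]


def post_build_commands(build_passed, tag="latest-clean-build"):
    """Commands to run after the build and tests have finished."""
    if not build_passed:
        return []
    # -f moves the tag if it already exists on an older changeset.
    return [["hg", "tag", "-f", tag]]


for cmd in pre_build_commands() + post_build_commands(build_passed=True):
    print(" ".join(cmd))
```

For a release, one would presumably call `post_build_commands(True, tag="release-1.2.3")` and drop the -f, so that an existing release tag is never moved silently.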

- Eric

Martin Geisler

Mar 7, 2012, 2:55:39 AM

to Eric Siegerman, Ben Fritz, merc...@selenic.com

Eric Siegerman <pub0...@davor.org> writes:

> This is all sounding very exciting; it's a problem I've been dealing
> with as well.
>
> On Wed, 2012-03-07 at 00:00 +0100, Martin Geisler wrote:
>> The idea is basically to read a branch/revset from the .hgsubstate
>> file and then use that to lookup the revision. We then keep the
>> revset around so that we can write it into the .hgsubstate file again
>> on commit -- otherwise the revset will be translated back to a node
>> ID immediatedly.
>
> ISTM that the branch/revset should be specified in .hgsub rather than
> in .hgsubstate. There are a number of reasons:
>
> - It would eliminate the situation you describe:
>
> > if .hgsubstate can contain a revset, then we can have a subrepo
> > that is dirty even though there is nothing to commit in the repo
> > with .hgsubstate.
>
> That's because the test for "is the parent repo dirty?" would be
> exactly as it is now: it's dirty if changes have been made within
> the repo's own scope OR if .hgsubstate is out of date.
>
> - In general, files that are both user-edited *and*
> automatically maintained tend to cause grief (the above being
> merely one flavour of grief). IMO it's generally better to
> put human- and machine-generated stuff in separate files.

I agree with that -- putting the revset in the .hgsubstate file would be
bad. I guess I talked about it as being there since that makes it clear
that it's a generalization of what we have now: the changeset hash
that's listed in .hgsubstate *is* a very simple revset.

> - It would keep .hgsubstate available to do what it does now --
> snapshotting the specific revision of each subrepo. That
> in turn would give us the best of both worlds:
>
> - "hg update --subrepos" would do what people have been
> describing -- update each subrepo based on the revset
> specified in .hgsub; but at the same time,
>
> - "hg commit" would still snapshot the exact state of the tree,
> recursively down through the subrepos, as is now the case

I'm not sure that's so nice: I think one of the points of being able to
pin a subrepo to a branch tip must be that you don't want a commit in
the top-level repo every time the subrepo changes?

If 'hg update' will use the revset to update the subrepo and thus make
the .hgsubstate out of date, then a brand new clone could end up with a
dirty subrepo (if new changesets have been added to the subrepo since
the last top-level commit of .hgsubstate). That feels a bit weird.

--
Martin Geisler

aragost Trifork
Professional Mercurial support
http://www.aragost.com/mercurial/

Scott Palmer

Mar 7, 2012, 9:00:46 AM

to Martin Geisler, merc...@selenic.com, Ben Fritz

That makes sense, but I will also agree that this should be handled entirely in .hgsub. I suggest a new section similar to how [subpaths] were added there. Keep things simple and with a single purpose. The main part will do the mapping of folder to URL only as it does today. If some subrepos have special needs - like wanting to track the tip or some bookmark then that behaviour should be specified in a new section that can work without hacking at the URLs of the main part.

>> - It would keep .hgsubstate available to do what it does now --
>> snapshotting the specific revision of each subrepo. That
>> in turn would give us the best of both worlds:
>>
>> - "hg update --subrepos" would do what people have been
>> describing -- update each subrepo based on the revset
>> specified in .hgsub; but at the same time,
>>
>> - "hg commit" would still snapshot the exact state of the tree,
>> recursively down through the subrepos, as is now the case
>
> I'm not sure that's so nice: I think one of the points of being able to
> pin a subrepo to a branch tip must be that you don't want a commit in
> the top-level repo every time the subrepo changs?

I think that's okay. You are doing the commit in the parent because you want a snapshot of the state as it is at that moment. I don't see another way that preserves that behaviour. This still should need the option as described above, "hg update --subrepos" in order for the pinned revisions to change. Reproducing the state as it was at the time of the commit is still important even when we have enabled this feature. Therefore I feel even with the added options to .hgsub to enable this behaviour (on a per-subrepo basis) it should still only be triggered via a new option to the update command.

>
> If 'hg update' will use the revset to update the subrepo and thus make
> the .hgsubstate out of date, then a brand new clone could end up with a
> dirty subrepo (if new changesets have been added to the subrepo since
> the last top-level commit of .hgsubstate). That feels a bit weird.

Yes that would be weird. I think clone should work as it does today with a regular "update" that uses the revisions as pinned in the .hgsubstate file. Only with the option to update recursively should it deviate from that. E.g hg clone --subrepos

Regards,

Scott

Martin Geisler

Mar 7, 2012, 10:01:02 AM

to Scott Palmer, Henrik Stuart, Sune Foldager, merc...@selenic.com, Ben Fritz

Scott Palmer <swpa...@gmail.com> writes:

> On 2012-03-07, at 2:55 AM, Martin Geisler wrote:
>
>> Eric Siegerman <pub0...@davor.org> writes:
>>
>>> This is all sounding very exciting; it's a problem I've been dealing
>>> with as well.
>>>
>

>>> - It would keep .hgsubstate available to do what it does now --
>>> snapshotting the specific revision of each subrepo. That
>>> in turn would give us the best of both worlds:
>>>
>>> - "hg update --subrepos" would do what people have been
>>> describing -- update each subrepo based on the revset
>>> specified in .hgsub; but at the same time,
>>>
>>> - "hg commit" would still snapshot the exact state of the tree,
>>> recursively down through the subrepos, as is now the case
>>
>> I'm not sure that's so nice: I think one of the points of being able
>> to pin a subrepo to a branch tip must be that you don't want a commit
>> in the top-level repo every time the subrepo changs?
>
> I think that's okay. You are doing the commit in the parent because
> you want a snapshot of the state as it is at that moment. I don't see
> another way that preserves that behaviour. This still should need the
> option as described above, "hg update --subrepos" in order for the
> pinned revisions to change.

If that's all you want, then the onsub extension almost handles it.
Running

hg onsub 'hg pull; hg update "max(branch(default) and tagged())"'

will do what you want, except that the target revset is not read from
the .hgsub file. I think that would be an easy thing to add, though.

> Reproducing the state as it was at the time of the commit is still
> important even when we have enabled this feature. Therefore I feel
> even with the added options to .hgsub to enable this behaviour (on a
> per-subrepo basis) it should still only be triggered via a new option
> to the update command.

My feeling is that the attractive aspect of this feature is that you get
a loose coupling between the repos -- you don't record the exact
revision that was used for everything, you only record that "tip of
default" was used.

For a release you'll want to freeze everything, but until then it might
be enough to just use a snapshot of the default branch. This is similar
to what I've seen in Maven: you can declare your package version to
be 0.1.SNAPSHOT -- that's like "0.1 - epsilon". Such a package can then
depend on other components in their SNAPSHOT versions. When you make a
release you remove the SNAPSHOT parts of the version numbers and write
the exact version numbers.

It also sounds similar to the in-house repoman system that Sune and
Henrik have built. There each component tracks a given branch and you get
the tip of that branch in a new checkout -- Sune/Henrik, please correct
me if I'm wrong :)

>> If 'hg update' will use the revset to update the subrepo and thus
>> make the .hgsubstate out of date, then a brand new clone could end up
>> with a dirty subrepo (if new changesets have been added to the
>> subrepo since the last top-level commit of .hgsubstate). That feels a
>> bit weird.
>
> Yes that would be weird. I think clone should work as it does today
> with a regular "update" that uses the revisions as pinned in the
> .hgsubstate file. Only with the option to update recursively should it
> deviate from that. E.g. hg clone --subrepos

Today, clone already clones the full subrepo: it does not add a -r flag
on the clone command to only get what the .hgsubstate file references.
That turns out to be important for local clones where you can get a
hard-linked subrepo clone.

--
Martin Geisler

aragost Trifork
Professional Mercurial support
http://www.aragost.com/mercurial/

Scott Palmer

Mar 7, 2012, 10:11:14 AM

to Martin Geisler, Henrik Stuart, Sune Foldager, merc...@selenic.com, Ben Fritz

Yes, almost is the key. It treats all of the subrepos the same. Some of the subrepos I may want to have pinned, while others should track the latest and greatest. Some should perhaps track a particular branch or bookmark, while others should be on the tip of the default branch.

>> Reproducing the state as it was at the time of the commit is still
>> important even when we have enabled this feature. Therefore I feel
>> even with the added options to .hgsub to enable this behaviour (on a
>> per-subrepo basis) it should still only be triggered via a new option
>> to the update command.
>
> My feeling is that the attractive aspect of this feature is that you get
> a loose coupling between the repos -- you don't record the exact
> revision that was used for everything, you only record that "tip of
> default" was used.

I think as long as you are on the tip of the parent that is fine… but I want to be able to reproduce the history accurately when I update to any revision of my parent repo that is not on the tip of a branch. That is after all what version control is all about. I think it is critical. Once I'm "back in time" I can always bring the subrepos forward in time if I need to by issuing some other command (e.g. hg update --subrepos, or in some trivial cases using "onsub")

> For a release you'll want to freeze everything, but until then it might
> be enough to just use a snapshot of the default branch. This is similar
> to what I've seen in Maven: you can declare your package version to
> be 0.1.SNAPSHOT -- that's like "0.1 - epsilon". Such a package can then
> depend on other components in their SNAPSHOT versions. When you make a
> release you remove the SNAPSHOT parts of the version numbers and write
> the exact version numbers.

Yes. This is very similar. Just as in Maven though, we have to be able to choose which things are using the "snapshots" (i.e. latest in the 0.1.x branch) and which need to be something more specific.

> It also sounds similar to the in-house repoman system that Sune and
> Henrik have built. There each component tracks a given branch and you get
> the tip of that branch in a new checkout -- Sune/Henrik, please correct
> me if I'm wrong :)
>
>>> If 'hg update' will use the revset to update the subrepo and thus
>>> make the .hgsubstate out of date, then a brand new clone could end up
>>> with a dirty subrepo (if new changesets have been added to the
>>> subrepo since the last top-level commit of .hgsubstate). That feels a
>>> bit weird.
>>
>> Yes that would be weird. I think clone should work as it does today
>> with a regular "update" that uses the revisions as pinned in the
>> .hgsubstate file. Only with the option to update recursively should it
>> deviate from that. E.g. hg clone --subrepos
>
> Today, clone already clones the full subrepo: it does not add a -r flag
> on the clone command to only get what the .hgsubstate file references.
> That turns out to be important for local clones where you can get a
> hard-linked subrepo clone.

Yes, I know the clone is full, but the update that usually happens with a clone would be based on the .hgsubstate file.

Regards,

Scott

Greg Ward

Mar 7, 2012, 11:36:40 AM

to Martin Geisler, Henrik Stuart, Sune Foldager, merc...@selenic.com, Ben Fritz

On 07 March 2012, Martin Geisler said:
> > Reproducing the state as it was at the time of the commit is still
> > important even when we have enabled this feature. Therefore I feel
> > even with the added options to .hgsub to enable this behaviour (on a
> > per-subrepo basis) it should still only be triggered via a new option
> > to the update command.
>
> My feeling is that the attractive aspect of this feature is that you get
> a loose coupling between the repos -- you don't record the exact
> revision that was used for everything, you only record that "tip of
> default" was used.
>
> For a release you'll want to freeze everything, but until then it might
> be enough to just use a snapshot of the default branch. This is similar
> to what I've seen in Maven: you can declare your package version to
> be 0.1.SNAPSHOT -- that's like "0.1 - epsilon". Such a package can then
> depend on other components in their SNAPSHOT versions. When you make a
> release you remove the SNAPSHOT parts of the version numbers and write
> the exact version numbers.

Disclaimer: I know almost nothing about Maven and have minimal
experience with subrepos. However, I did watch a training video about
"agile best practices" the other day that strongly recommended against
using this Maven feature. I believe the presenter (Neal Ford) called
it a misfeature.

I've certainly noticed a strong tendency in the Java community to
specify *precisely* which versions of which third-party libraries
(jars) a particular project uses. This seems to be the case whether
they use Maven to specify the dependencies in a config file, or
whether they just commit all their dependencies to source control. I
can see why they do it: it minimizes the number of moving parts you
have to worry about. I think that is exactly what subrepos try to
achieve, albeit at a source level rather than a binary level.

I've also noticed that C, Python, and Perl programmers seem to be much
more relaxed about dependencies. Yeah, certain versions are required,
and maybe even some versions are too new to work -- but apart from
that, whatever. Hmmm. Wonder if there's any deeper significance to
that. Probably not. ;-)

Greg
--
Greg Ward http://www.gerg.ca/
A day for firm decisions!!!!! Or is it?

Todd Greer

Mar 7, 2012, 4:28:59 PM

to Scott Palmer, Martin Geisler, Henrik Stuart, Sune Foldager, merc...@selenic.com, Ben Fritz

[I'm top-posting intentionally, because I can't manage to trim the dialog sufficiently while still maintaining adequate context. Sorry.]

I really like the proposal to support letting the selected subrepos be unpinned, but still recording their states on commit so you can reproduce a build.
I'd like to propose a slightly more general syntax for a new .hgsub section by way of example:

subrepo1 = subrepo1
subrepo2 = subrepo2
subrepo3 = subrepo3

[unpins]
#I am a novice with revspecs, so please forgive any errors.
active.subrepo1 = tip
active.subrepo2 = tip

beta.subrepo1 = .hgsubstate
beta.subrepo2 = max(branch(stable) and tagged())

----- end of .hgsub -----

The idea is that a default clone or update would always update according to .hgsubstate. If, OTOH, you say "hg update --with-subrepo-config active", then the "active" rules are consulted, which use the tip and the head of "main-dev-branch". You can have as many of these subrepo "liveness" configurations as you want.
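
To make the lookup concrete, here is a minimal sketch of how the proposed section could be read. It is only an illustration of the proposal, not existing Mercurial behaviour: the `[unpins]` section, the `config.subrepo = revset` key layout, and the function names `parse_unpins` and `target_for` are all assumptions taken from the example above or invented for it. It also assumes configuration names contain no dots.

```python
from typing import Dict, Optional


def parse_unpins(hgsub_text: str) -> Dict[str, Dict[str, str]]:
    """Parse the proposed [unpins] section into {config: {subrepo: revset}}."""
    configs: Dict[str, Dict[str, str]] = {}
    in_unpins = False
    for raw in hgsub_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            in_unpins = line == "[unpins]"
            continue
        if not in_unpins or "=" not in line:
            continue  # ordinary "path = source" lines are not our concern
        key, revset = (part.strip() for part in line.split("=", 1))
        config, _, subrepo = key.partition(".")
        if subrepo:
            configs.setdefault(config, {})[subrepo] = revset
    return configs


def target_for(
    configs: Dict[str, Dict[str, str]], config_name: str, subrepo: str
) -> Optional[str]:
    """Return the revset to update to, or None to follow .hgsubstate."""
    revset = configs.get(config_name, {}).get(subrepo)
    if revset is None or revset == ".hgsubstate":
        return None  # unlisted or explicitly pinned
    return revset
```

With the example file above, `target_for(configs, "beta", "subrepo1")` and `target_for(configs, "active", "subrepo3")` both come back as None, meaning those subrepos stay at the revision recorded in .hgsubstate.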

If you clone a subrepo with "--with-subrepo-config active" (or beta, or whatever), you will immediately have dirty subrepos, and a commit will commit changes to .hgsubstate. This is a good thing, as it lets you run a sequence of update, build, commit, tag, and now have a reproducible build. Remember that, at the time I produce a build, I don't know whether it is a release or not, since it needs final testing. Thus, the steps to make a given repo state releasable need to be minimal.
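
A small sketch of the "dirty subrepo" check that this workflow relies on, assuming only the existing .hgsubstate layout of one "<hash> <path>" pair per line. The function names and the `checked_out` mapping are hypothetical; in practice the current hashes might be collected with something like `hg -R <path> log -r . --template '{node}'`.

```python
from typing import Dict, List


def parse_hgsubstate(text: str) -> Dict[str, str]:
    """Map subrepo path -> pinned changeset hash from .hgsubstate content."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        node, path = line.split(" ", 1)  # "<hash> <path>"
        pins[path] = node
    return pins


def dirty_subrepos(pins: Dict[str, str], checked_out: Dict[str, str]) -> List[str]:
    """List subrepos whose checked-out revision differs from the pin."""
    return sorted(
        path for path, node in checked_out.items() if pins.get(path) != node
    )
```

Any path this returns is one whose entry in .hgsubstate would change on the next commit in the parent, which is the step that makes the build reproducible.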

That said, I think the critical ideas are the ones that have already been proposed around specifying this stuff in .hgsub, and still keeping .hgsubstate up-to-date. If having multiple configurations significantly increases the work, it may not be worth it. Also, I'm assuming that current Hg clients would ignore the entire section, and always get the subrepos according to .hgsubstate. Assuming this, I can put the new hg version (or extension) on the build machine, and not worry about getting all my developers to upgrade at the same time.

I don't have the ability to contribute right now towards implementing whatever approach is chosen, but I'll be happy to help test.

Thanks,
Todd

-----Original Message-----
From: mercuria...@selenic.com [mailto:mercuria...@selenic.com] On Behalf Of Scott Palmer
Sent: Wednesday, March 07, 2012 9:11 AM
To: Martin Geisler
Cc: Henrik Stuart; Sune Foldager; merc...@selenic.com; Ben Fritz
Subject: Re: hg subrepositories always on tip?

I think as long as you are on the tip of the parent that is fine... but I want to be able to reproduce the history accurately when I update to any revision of my parent repo that is not on the tip of a branch. That is after all what version control is all about. I think it is critical. Once I'm "back in time" I can always bring the subrepos forward in time if I need to by issuing some other command (e.g. hg update --subrepos, or in some trivial cases using "onsub")
