[pkg-discuss] How to handle @current

Brock Pytlik

Mar 5, 2012, 3:00:37 PM

to pkg discuss

As part of Danek's design for the publication changes, we need a way to
defer dependency resolution until it's known whether a new version of a
package is going to be published or not. I've been struggling with how
to do it for a little while, but I think I've got things nailed down
now. What follows is a rough idea of how I envision the publication
process happening. I've written this as if the processing was being done
on the client side of things, but from step 3 on, it could be happening
on the server side.

1) Do pkgdep generate on all manifests to be published. This step is
unchanged from the present.

2) Do pkgdep resolve -P. -P is a new option which means "postpone
dependency versions." It also eliminates the dependency collapsing and
coalescing which currently happens during the resolve phase.
Dependencies inferred on system packages during this phase will have
versions attached to them, but those inferred on other packages also
being resolved will have versions of "@current."

3) Use pkgsend (in some possibly new form) to fill in file hashes and
possibly pkgmog to apply other transformations so that all the
actions except for dependency actions, signature actions, and attribute
actions with the name pkg.fmri are what they would be when the package
is finally published. This step may or may not move file data across to
the repo.

4) Grab the repo lock. I'm being deliberately vague here. Since we're
comparing the manifest we're considering publishing with one in the
repository, if another package got published after we compared, but
before we published, we might make incorrect decisions about whether to
publish a package. This could be as simple as a convention of not
allowing multiple people to publish to the same repository at the same
time. It could be accomplished by comparing the catalog at comparison
time with the catalog when the packages are being published to ensure
the state of the world is what's expected. It could be an actual lock on
publishing to the repository.

5) For each manifest to be published, compare all of its actions, except
attributes whose name is pkg.fmri, signature actions (though if the
signing cert is different, might want to mark it as being different
anyway), and dependency actions, against the version of the package
immediately less than the version to be published. Put all manifests
with at least one different action in the "different manifests" set. Put
the rest in the "identical manifests" set.

6) For all depend actions in manifests to be published which use
<pkg-name>@current in the target or predicate, if <pkg-name> is in the
set of "different manifests", replace @current with the version of the
package in the package to be published, otherwise replace @current with
the version of the previously published manifest

7) Now that all dependencies have fixed versions, collapse and coalesce
the dependencies for all the packages to be published.

8) For each manifest to be published, compare all actions, except
attributes whose name is pkg.fmri and signature actions, against the
version of the package immediately less than the version to be published.
Put all manifests which are different in the "actually to be published" set.

9) Publish each manifest in the "actually to be published" set.

10) Release the repo lock.
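A minimal Python sketch of steps 5, 6 and 8, under heavily simplified, hypothetical assumptions: a manifest is a list of action strings, each package name maps to a (version, actions) pair, and `old` already holds the "immediately less" version of each package from the repo. Step 7 (collapsing and coalescing) is left out, and none of these names are pkg(5) APIs.

```python
import re

# Matches "<name>@current" inside a depend action, e.g. "fmri=bar@current".
CURRENT = re.compile(r"([\w.+/-]+)@current")


def comparable(actions, ignore_depends):
    """Return the actions that take part in a comparison (steps 5 and 8)."""
    kept = set()
    for action in actions:
        if action.startswith(("signature ", "set name=pkg.fmri ")):
            continue
        if ignore_depends and action.startswith("depend "):
            continue
        kept.add(action)
    return kept


def plan(new, old):
    """Work out what to publish (hypothetical helper, not a pkg(5) API).

    new: {name: (version, [action, ...])} manifests to be published
    old: {name: (version, [action, ...])} "immediately less" versions
         already in the repo
    Returns (manifests with @current pinned, names to actually publish).
    """
    # Step 5: compare while ignoring pkg.fmri, signature and depend actions.
    different = {
        name for name, (_, acts) in new.items()
        if name not in old
        or comparable(acts, True) != comparable(old[name][1], True)
    }

    # Step 6: a changed package is pinned to its new version, an unchanged
    # one to the version already published.  Assumes every @current target
    # is itself in `new`, as step 2 implies.
    def pin(match):
        name = match.group(1)
        source = new if name in different else old
        return "%s@%s" % (name, source[name][0])

    resolved = {
        name: (version, [CURRENT.sub(pin, a) for a in acts])
        for name, (version, acts) in new.items()
    }

    # Step 7 (collapsing and coalescing dependencies) is omitted.

    # Step 8: compare again, this time including the pinned dependencies.
    to_publish = {
        name for name, (_, acts) in resolved.items()
        if name not in old
        or comparable(acts, False) != comparable(old[name][1], False)
    }
    return resolved, to_publish
```

With foo depending on bar@current and neither package changed, foo's dependency is pinned to the already published bar version and nothing is republished; change a file hash in bar and both packages end up in the publish set.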

I believe this approach will work. Please let me know if you see any
issues with the proposal.

Thanks,
Brock
_______________________________________________
pkg-discuss mailing list
pkg-d...@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Shawn Walker

Mar 8, 2012, 12:25:39 PM

to Brock Pytlik, pkg discuss

Any physical repository lock implies that this isn't suitable for http
publication. If http publication needs to be supported, there needs to
be a 'pkgrepo unlock' or the like.

> 5) For each manifest to be published compare all actions except
> attributes whose name is pkg.fmri, signature actions (though if the
> signing cert is different, might want to mark it as being different
> anyway), and dependency actions with the version of the package
> immediately less than the version to be published. Put all manifests
> with at least one different action in the "different manifests" set. Put
> the rest in the "identical manifests" set.
>
> 6) For all depend actions in manifests to be published which use
> <pkg-name>@current in the target or predicate, if <pkg-name> is in the
> set of "different manifests", replace @current with the version of the
> package in the package to be published, otherwise replace @current with
> the version of the previously published manifest

How will you deal with timestamps which are server side, or is your
assumption that @current doesn't include the timestamp?

> 7) Now that all dependencies have fixed versions, collapse and coalesce
> the dependencies for all the packages to be published.
>
> 8) For each manifest to be published, compare all actions except
> attributes whose name is pkg.fmri and signature actions with the version
> of the package immediately less than the version to be published. Put
> all manifests which are different in the "actually to be published" set.
>
> 9) Publish each manifest in the "actually to be published" set.
>
> 10) Release the repo lock.
>
> I believe this approach will work. Please let me know if you see any
> issues with the proposal.

For me, the prototype will likely be easier to follow. This isn't a
failing of yours, it's just hard to visualize the entire process in my
head from text.

-Shawn

Brock Pytlik

Mar 8, 2012, 5:10:21 PM

to Shawn Walker, pkg discuss

On 03/08/12 09:25, Shawn Walker wrote:
> On 03/05/12 12:00, Brock Pytlik wrote:

>> [snip]

>>
>> 4) Grab the repo lock. I'm being deliberately vague here. Since we're
>> comparing the manifest we're considering publishing with one in the
>> repository, if another package got published after we compared, but
>> before we published, we might make incorrect decisions about whether to
>> publish a package. This could be as simple as a convention of not
>> allowing multiple people to publish to the same repository at the same
>> time. It could be accomplished by comparing the catalog at comparison
>> time with the catalog when the packages are being published to ensure
>> the state of the world is what's expected. It could be an actual lock on
>> publishing to the repository.
>
> Any physical repository lock implies that this isn't suitable for http
> publication. If http publication needs to be supported, there needs
> to be a 'pkgrepo unlock' or the like.
>

Yep, see the last step. As a side note, this isn't really part of
@current. As far as I can tell, it's something that falls out of the new
publication model that Danek sent out. That's why I was intentionally vague.

>> [snip]

>>
>> 6) For all depend actions in manifests to be published which use
>> <pkg-name>@current in the target or predicate, if <pkg-name> is in the
>> set of "different manifests", replace @current with the version of the
>> package in the package to be published, otherwise replace @current with
>> the version of the previously published manifest
>
> How will you deal with timestamps which are server side, or is your
> assumption that @current doesn't include the timestamp?
>

@current is literally the magical string "@current", there is no timestamp.

>> 7) Now that all dependencies have fixed versions, collapse and coalesce
>> the dependencies for all the packages to be published.
>>
>> 8) For each manifest to be published, compare all actions except
>> attributes whose name is pkg.fmri and signature actions with the version
>> of the package immediately less than the version to be published. Put
>> all manifests which are different in the "actually to be published" set.
>>
>> 9) Publish each manifest in the "actually to be published" set.
>>
>> 10) Release the repo lock.
>>
>> I believe this approach will work. Please let me know if you see any
>> issues with the proposal.
>
> For me, the prototype will likely be easier to follow. This isn't a
> failing of yours, it's just hard to visualize the entire process in my
> head from text.
>

Ok, that's why I sent out a draft implementation that handles step 2,
and I'll probably send out another one later which handles steps 5
through 7 or 8.

Brock

Bart Smaalders

Mar 13, 2012, 12:48:05 PM

to pkg-d...@opensolaris.org

On 03/08/12 14:10, Brock Pytlik wrote:
> On 03/08/12 09:25, Shawn Walker wrote:
>> On 03/05/12 12:00, Brock Pytlik wrote:
>>> [snip]

Right now we rely on being able to publish multiple packages
at once in ON to get satisfactory performance. Can we do
the checking single threaded, but actually publish in parallel?

- Bart

--
Bart Smaalders Solaris Kernel Performance
bart.sm...@oracle.com http://blogs.oracle.com/barts
"You will contribute more with Mercurial than with Thunderbird."
"Civilization advances by extending the number of important
operations which we can perform without thinking about them."

Shawn Walker

Mar 13, 2012, 1:49:35 PM

to Brock Pytlik, pkg discuss

On 03/08/12 14:10, Brock Pytlik wrote:

You misunderstand me. I was referring to "replace @current with the
version of the package in the package to be published", etc. When the
replacement is done, does that replacement include the timestamp?

Brock Pytlik

Mar 13, 2012, 4:03:36 PM

to pkg-d...@opensolaris.org

On 03/13/12 09:48, Bart Smaalders wrote:
> On 03/08/12 14:10, Brock Pytlik wrote:
>> On 03/08/12 09:25, Shawn Walker wrote:
>>> On 03/05/12 12:00, Brock Pytlik wrote:
>>>> [snip]
>
> Right now we rely on being able to publish multiple packages
> at once in ON to get satisfactory performance. Can we do
> the checking single threaded, but actually publish in parallel?
>

Here's the real constraint: once a package 'pkg:/foo' has been compared
to the package 'pkg:/foo' to be published, no other process should start
comparing or processing a package with that name (in this case, foo)
until the new version has been published.

So, in the RE/ON case, it should be fine as long as two different people
don't attempt to publish to a repo at the same time. Looking back at the
steps,
1) pkgdep generate can be done in parallel
2) pkgdep resolve -P has to be done single threaded
3) pkgsend fill can be done in parallel
5) comparing the manifests can be done in parallel (whether it's done in
parallel or not, the repo's catalog must be unchanging at this point)
6) if the output of comparing the manifests is stored appropriately,
then replacing @current with the appropriate versions could be done in
parallel once everything in step 5 had finished. I'm not sure parallel
would be a huge win here because there would be a substantial bit of
duplicated work, but it's worth looking into at least.
7) collapsing and coalescing the dependencies can be done in parallel
8) the second round of comparisons can also be done in parallel (Again
though, the repo's catalog must still be unchanging)
9) actually publishing the manifests can be done in parallel
(though only for this set of packages, not for some other set of
packages someone else was working on, and the catalog at this point
needs to be the same as what it was at step 5)
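A rough sketch of how those phases might be driven from one process with a thread pool, assuming step 2 has already run single threaded and the catalog is held stable for the whole run. The four callables are hypothetical hooks, and step 7 is omitted. Each list(pool.map(...)) call acts as a barrier, which is what step 6 needs: it cannot start until every step 5 comparison has finished.

```python
from concurrent.futures import ThreadPoolExecutor


def publish_in_phases(names, is_different, pin_current, is_changed, publish,
                      workers=4):
    """Drive steps 5, 6, 8 and 9 on a thread pool (hypothetical hooks).

    is_different(name) -> bool            step 5 comparison
    pin_current(name, different) -> m     step 6 @current replacement
    is_changed(m) -> bool                 step 8 comparison
    publish(m)                            step 9
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 5: all comparisons run concurrently.  list() drains the
        # iterator, so nothing below starts until every one has finished.
        flags = list(pool.map(is_different, names))
        different = frozenset(n for n, f in zip(names, flags) if f)

        # Step 6: with the complete "different" set known, each manifest
        # can be pinned independently.
        manifests = list(pool.map(lambda n: pin_current(n, different), names))

        # Step 8: second round of comparisons, again concurrently.
        flags = list(pool.map(is_changed, manifests))
        changed = [m for m, f in zip(manifests, flags) if f]

        # Step 9: publish only this run's packages, concurrently.
        list(pool.map(publish, changed))
    return changed
```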

For simplicity and my own sanity, I've assumed a single, repo-level
"lock." In actuality, as long as the sets of packages that person A and
person B want to publish are completely disjoint, they could both go
through the above steps at the same time. Personally, I'd see that as
more trouble than it's worth, but it could be implemented.
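If disjoint publications were ever allowed, one way to enforce the "no overlapping package names" rule is an all-or-nothing reservation on the set of names. The class below is a hypothetical, in-process sketch only; a real repository would need the equivalent on the server side, shared by every publisher.

```python
import threading


class NameReservations:
    """Hypothetical per-name reservation table (not part of pkg(5)).

    A publisher reserves every package name it intends to compare and
    publish.  The request is all-or-nothing, so two publishers can never
    end up each holding part of what the other needs.
    """

    def __init__(self):
        self._mutex = threading.Lock()
        self._held = set()

    def try_acquire(self, names):
        """Reserve all of `names`, or none of them if any is already held."""
        names = set(names)
        with self._mutex:
            if self._held & names:
                return False
            self._held |= names
            return True

    def release(self, names):
        """Give the names back once publication has finished."""
        with self._mutex:
            self._held -= set(names)
```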

Hth,
Brock

> - Bart

Brock Pytlik

Mar 13, 2012, 4:15:47 PM

to Shawn Walker, pkg discuss

On 03/13/12 10:49, Shawn Walker wrote:
> [snip]

>>
>>>> [snip]
>>>>
>>>> 6) For all depend actions in manifests to be published which use
>>>> <pkg-name>@current in the target or predicate, if <pkg-name> is in the
>>>> set of "different manifests", replace @current with the version of the
>>>> package in the package to be published, otherwise replace @current
>>>> with
>>>> the version of the previously published manifest
>>>
>>> How will you deal with timestamps which are server side, or is your
>>> assumption that @current doesn't include the timestamp?
>>>
>> @current is literally the magical string "@current", there is no
>> timestamp.
>
>
> You misunderstand me. I was referring to "replace @current with the
> version of the package in the package to be published", etc. When the
> replacement is done, does that replacement include the timestamp?
>

Since automatically generated dependencies don't include timestamps now,
I can't see why we'd start adding them to the versions that replace
@current. That does mean that there's a shortcut we can take where if
the currently published package and the package to be published have the
same version, then we know what version to replace @current with before
we even do the manifest diffs.
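A sketch of that shortcut, assuming the usual FMRI version layout where the timestamp follows the ':' (for example 0.5.11-0.175.1:20120313T160336Z). The helper names are made up for illustration.

```python
def strip_timestamp(version):
    """Drop the ':<timestamp>' suffix, if any, from a version string."""
    return version.split(":", 1)[0]


def early_pin(new_version, published_version):
    """Return the version that will replace @current if no diff is needed.

    If the package being published has the same version (ignoring the
    timestamp) as the one already in the repo, @current resolves to that
    version whether or not the manifests differ, because the generated
    dependency carries no timestamp.  Otherwise return None and fall back
    to the manifest comparison.
    """
    if published_version is None:
        return None
    new_base = strip_timestamp(new_version)
    if new_base == strip_timestamp(published_version):
        return new_base
    return None
```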

Brock

Bart Smaalders

Mar 13, 2012, 7:06:00 PM

to pkg-d...@opensolaris.org

There's only ever one user publishing to a repo at a time - but he
is using parallel make :).

- Bart

Danek Duvall

Mar 15, 2012, 11:14:53 AM

to Brock Pytlik, pkg discuss

Your step 5 compares the intended package version with the one "immediately
less" already in the repo. I think this is likely to lead to confusion.
I'd originally envisioned -- and would much rather see -- using a
client-defined set of package versions as the basis for this comparison.
Perhaps we pass a version of entire or some other incorporation on the
commandline and have the client figure out what the right set is from there
(some heuristics can probably make that reasonably fast in the most common
cases), but fundamentally, the client end of the operation needs to define
this comparison.

That should get rid of the need to lock the repo. Between knowing in
advance what versions to compare against and having a transaction group
within which to resolve @current and yea or nay the entire op, I think the
repo itself can continue on its merry way during publication (possibly with
the exception of removing packages).

Danek

Brock Pytlik

Mar 19, 2012, 5:19:09 PM

to Danek Duvall, pkg discuss

On 03/15/12 08:14, Danek Duvall wrote:
> Your step 5 compares the intended package version with the one "immediately
> less" already in the repo. I think this is likely to lead to confusion.
> I'd originally envisioned -- and would much rather see -- using a
> client-defined set of package versions as the basis for this comparison.
> Perhaps we pass a version of entire or some other incorporation on the
> commandline and have the client figure out what the right set is from there
> (some heuristics can probably make that reasonably fast in the most common
> cases), but fundamentally, the client end of the operation needs to define
> this comparison.

Assuming the client is specifying the exact packages to compare against,
down to timestamps, then I could see this working. If the packages
aren't specified to timestamps, then we haven't really made any
meaningful guarantees about correctness.

I'm not a fan of this approach because I think it imposes more work on
the user, which means it's more prone to have something unexpected
happen. Perhaps I'm missing something, but if a@1 and a@2 are in the
repo currently, when would a user ever want to compare against a@1
instead of a@2 when publishing a@3?

Brock

Danek Duvall

Mar 20, 2012, 1:27:26 PM

to Brock Pytlik, pkg discuss

Brock Pytlik wrote:

> Assuming the client is specifying the exact packages to compare against,
> down to timestamps, then I could see this working. If the packages aren't
> specified to timestamps, then we haven't really made any meaningful
> guarantees about correctness.

True.

> I'm not a fan of this approach because I think it imposes more work on the
> user, which means it's more prone to have something unexpected happen.
> Perhaps I'm missing something, but if a@1 and a@2 are in the repo currently,
> when would a user ever want to compare against a@1 instead of a@2 when
> publishing a@3?

It's probably not strictly necessary, but to be fair, allowing it was part
of your proposal, too. The scenarios I see this being useful for are
backpublishing and the branching of the versioning space (maintaining both
the SRU and the development packages in a single repo, for instance).

You could also get rid of the lock by grabbing a snapshot at the beginning
of the operation -- simply retrieving the catalog should be sufficient --
in order to get a surface that won't change during the course of the
publication. It'd be nice to be able to freeze that surface over the
course of multiple publications, though. Say I'm repeatedly building and
publishing a component as I'm tweaking its build and packaging. I'm going
to want a fixed reference that doesn't change as I pile my test
publications into the reference repo (or if I'm using the official repo,
find that a new build parachuted in while I was doing my work).

I'm a bit uncertain as to what "immediately less" means in a branching
version space. If you can define that rigorously, great; otherwise, we can
probably simplify to "compare against tip" and simply require that the user
start off with a repo whose package surface defined by the newest revisions
of each package is the surface they want.

I still think that using an incorporation as a reference to a surface makes
sense (at least when considering entire) -- just take the newest packages
that satisfy that incorporation, and stash that list of package versions
until you decide to change them.
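A small sketch of that idea: pick, for each package, the newest version an incorporation allows, and keep the result as the fixed comparison surface. Versions are simplified to dotted integers, an incorporation is modelled (loosely, as an assumption) as a name-to-version-prefix mapping, and packages the incorporation doesn't mention fall back to their newest version. None of this is an existing pkg(5) interface.

```python
import json


def parse(version):
    """'1.2' -> (1, 2); tuple comparison then orders versions."""
    return tuple(int(part) for part in version.split("."))


def matches(version, prefix):
    """True if version agrees with prefix at prefix's precision."""
    want = parse(prefix)
    return parse(version)[:len(want)] == want


def surface(catalog, incorporation):
    """Newest version of each package that the incorporation allows.

    catalog: {name: [version, ...]}  (what the repo holds)
    incorporation: {name: version prefix}
    """
    chosen = {}
    for name, versions in catalog.items():
        prefix = incorporation.get(name)
        allowed = [v for v in versions if prefix is None or matches(v, prefix)]
        if allowed:
            chosen[name] = max(allowed, key=parse)
    return chosen


def stash(chosen, path):
    """Save the chosen surface until the reference is deliberately changed."""
    with open(path, "w") as f:
        json.dump(chosen, f, sort_keys=True, indent=2)
```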

Brock Pytlik

Mar 20, 2012, 6:17:29 PM

to Danek Duvall, pkg discuss

On 03/20/12 10:27, Danek Duvall wrote:
> Brock Pytlik wrote:
[snip]

>> I'm not a fan of this approach because I think it imposes more work on the
>> user, which means it's more prone to have something unexpected happen.
>> Perhaps I'm missing something, but if a@1 and a@2 are in the repo currently,
>> when would a user ever want to compare against a@1 instead of a@2 when
>> publishing a@3?
> It's probably not strictly necessary, but to be fair, allowing it was part
> of your proposal, too. The scenarios I see this being useful for are
> backpublishing and the branching of the versioning space (maintaining both
> the SRU and the development packages in a single repo, for instance).

I'm not sure how it was part of my proposal. I think my proposal
would've compared a@1.5 with a@1, but I'm not sure how a@3 could ever be
compared with a@1 when a@2 exists (barring race conditions).

>
> You could also get rid of the lock by grabbing a snapshot at the beginning
> of the operation -- simply retrieving the catalog should be sufficient --
> in order to get a surface that won't change during the course of the
> publication. It'd be nice to be able to freeze that surface over the
> course of multiple publications, though. Say I'm repeatedly building and
> publishing a component as I'm tweaking its build and packaging. I'm going
> to want a fixed reference that doesn't change as I pile my test
> publications into the reference repo (or if I'm using the official repo,
> find that a new build parachuted in while I was doing my work).

I don't see how grabbing a snapshot would remove the need for a lock.
Sure, I can keep comparing a@3 against a@1 like it was when I took my
snapshot, but if a@2 is in the repo when I go to publish, then I'm still
comparing against the wrong thing. Most importantly, I could decide not
to publish a@3 since it was identical to a@1 and end up with a
bad/corrupt/wrong repo because a@2 was actually present.

I also don't understand the example above. If you're repeatedly building
and publishing a component you're working on, presumably you're not
doing this into the repo that RE's going to be publishing to. I'd assume
that you're doing it into your own private repo. The reference is always
fixed, it's the repo you're publishing to. There's no way to use "the
official repo", whatever that means, at least in my scheme. If you're
publishing to repo X, you're only comparing against packages in repo X.

Now, if what you're suggesting is that you're publishing into your
private repo, but you only want to publish packages that have changed
compared to a reference repo (say the ipkg repo), that's doable, but it's
orthogonal to what I've been discussing so far. In that situation, I'm
still not certain what the reference surface buys you but I'll have to
think some more since the reference repo concept wasn't in my design at all.

>
> I'm a bit uncertain as to what "immediately less" means in a branching
> version space. If you can define that rigorously, great; otherwise, we can
> probably simplify to "compare against tip" and simply require that the user
> start off with a repo whose package surface defined by the newest revisions
> of each package is the surface they want.

Here was how I implemented "immediately less" in my prototype:
1) Use the name of the package as a pattern for catalog.gen_packages
2) catalog.gen_packages always returns results in descending version order
3) iterate through the results from catalog.gen_packages; the first
result that the package to be published is a successor to is the
version that's "immediately less"
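The same rule in a few lines of Python, as a sketch: a plain sorted list stands in for catalog.gen_packages, versions are simplified to dotted integers, and "is a successor to" is reduced to "compares greater than", which ignores the branch and timestamp parts of real pkg(5) versions.

```python
def parse(version):
    """'1.5' -> (1, 5) so that tuple comparison orders versions."""
    return tuple(int(part) for part in version.split("."))


def immediately_less(new_version, published_versions):
    """Return the newest published version older than new_version, or None."""
    new = parse(new_version)
    # Walk the published versions newest first, as the prototype does.
    for candidate in sorted(published_versions, key=parse, reverse=True):
        if new > parse(candidate):
            return candidate
    return None
```

With a@1 and a@2 published, immediately_less("3", ["1", "2"]) picks "2", while backpublishing a@1.5 picks "1".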

It's unclear to me why branching versions matter here. A concrete
example of the situation you're concerned about would probably help me
see the problem better.

>
> I still think that using an incorporation as a reference to a surface makes
> sense (at least when considering entire) -- just take the newest packages
> that satisfy that incorporation, and stash that list of package versions
> until you decide to change them.

Sure, we can do that, but I'm not really seeing a reason to; in
particular, it doesn't remove the need for a lock.

As I said when I started this, I'm fine with saying that we don't need a
lock because we're just telling people:
"Hey, don't be idiots. Don't have two or more people publishing
overlapping sets of package names, especially in an adversarial fashion."
I think that's a fine convention to establish.

However, if we don't want to depend upon that external convention, and
we care that the repo is consistent/correct immediately after your
publication finishes, then we need a lock (or a check/abort step).
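A sketch of what the check/abort alternative could look like, assuming catalogs are plain name-to-versions mappings. Only the package names being published are compared, since only same-named packages matter for the comparison. The names are hypothetical, and this is only safe if the final check and the publication happen atomically on the repository side; otherwise it just narrows the race window.

```python
class CatalogChanged(Exception):
    """The repo changed, for the names we care about, since comparison."""


def changed_names(at_compare, now, names):
    """Names whose set of published versions differs between snapshots."""
    return sorted(
        n for n in names
        if set(at_compare.get(n, ())) != set(now.get(n, ()))
    )


def check_then_publish(at_compare, now, names, publish):
    """Abort instead of publishing if any relevant package changed."""
    stale = changed_names(at_compare, now, names)
    if stale:
        raise CatalogChanged(stale)
    for name in names:
        publish(name)
```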

As far as I can see, stashing the list of package versions aside only
increases the likelihood of ending up with an inconsistent repo, and
doesn't actually produce any benefit. (I should say, it doesn't produce
any correctness benefit I can see. It could potentially give speed
benefits.) What problem is that bit trying to solve?

Brock
