=== Transition ===
The compiler will support two compilation modes: maximal sharding and simulated sharding. Maximal sharding is used when all linkers support it and the Precompile/CompilePerms/Link entry points are used. Simulated sharding is used when either some linker can't shard or when the Compiler entry point is used.
Linkers individually indicate whether they implement the sharding or non-sharding API. This allows linkers to be updated one by one and to leave the non-sharding API behind once they do. It does not cause trouble with other linkers, because in practice linkers are highly independent. I've looked at as many linkers as I could find to verify this. Occasionally one linker depends on another; in such a case they'll have to be updated in tandem, but the need for that should be rare.
By default, a linker is assumed to want the legacy non-sharding API. For such a linker, it isn't safe to assume that its generators or its associated artifacts can be serialized and then deserialized on a different computer.
The non-sharding API will be deprecated. After the sharding API has been out for one GWT release cycle, support for non-shardable linkers will be dropped.
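The mode selection rule described in this section can be sketched as follows. This is only an illustration, not compiler code: `CompilationMode`, `EntryPoint`, and `ModeChooser` are hypothetical names, and each linker is reduced to a flag saying whether it implements the sharding API.

```java
import java.util.List;

// Hypothetical names throughout; this only illustrates the selection rule.
enum CompilationMode { MAXIMAL_SHARDING, SIMULATED_SHARDING }

enum EntryPoint { COMPILER, PRECOMPILE_COMPILEPERMS_LINK }

final class ModeChooser {
  /**
   * @param entryPoint which entry point started the build
   * @param linkerIsShardable one flag per linker; true if that linker
   *        implements the sharding API
   */
  static CompilationMode choose(EntryPoint entryPoint, List<Boolean> linkerIsShardable) {
    // The monolithic Compiler entry point always uses simulated sharding.
    if (entryPoint == EntryPoint.COMPILER) {
      return CompilationMode.SIMULATED_SHARDING;
    }
    // A single legacy linker forces simulated sharding for the whole build.
    for (boolean shardable : linkerIsShardable) {
      if (!shardable) {
        return CompilationMode.SIMULATED_SHARDING;
      }
    }
    return CompilationMode.MAXIMAL_SHARDING;
  }
}
```

Under this reading, a build only gets maximal sharding once every linker in the module has been updated, which is why updating linkers one by one is safe but not immediately faster.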
=== Maximal sharding ===
Currently, Precompile parses Java into ASTs and runs generators. CompilePerms then runs one copy for each permutation, in parallel. Each instance optimizes the AST for one permutation and then converts it into JavaScript plus some additional artifacts. Finally, Link takes the JavaScript and all the produced artifacts, runs the individual linkers, and produces the final output. In summary, the three stages are:
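current Precompile:
- parse Java and run generators
- output: number of permutations, AST, generated artifacts
current CompilePerms:
- input: permutation id, AST
- compile one permutation to JavaScript
- output: JavaScript, generated artifacts
current Link:
- input: JavaScript from all permutations, generated artifacts
- run linkers on all artifacts
- emit EmittedArtifacts into the final output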
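With maximal sharding, Precompile does no work except to count the number of permutations. Each CompilePerms instance parses Java into ASTs, runs generators, and optimizes for a specific permutation. Each CompilePerms instance also runs the shardable part of linkers on the results for that permutation. It then "thins" the artifacts (see below) and emits them. Finally, Link takes these results from the CompilePerms instances, runs the final, non-shardable part of each linker, and emits all the artifacts designated as emitted artifacts. In summary, the maximal-sharding staging looks like this:
new Precompile:
- output: number of permutations
new CompilePerms:
- input: permutation id
- compile one permutation to JavaScript, including running generators
- run the on-shard part of linkers
- thin down the resulting artifacts, as defined below
- output: JavaScript and the thinned down set of artifacts
new Link:
- input: JavaScript and transferable artifacts from each permutation
- run the final part of linkers, which can add more files to the final output
- output: resulting emitted artifacts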
=== Simulated Sharding ===
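Simulated sharding uses the in-trunk compiler staging, but runs the linkers as much as possible as if they were using the maximal sharding staging. The sequence is the same whether the Compiler entry point is used or the Precompile/CompilePerms/Link trio of entry points is used. Under simulated sharding, the Precompile and CompilePerms steps run exactly as in trunk. The Link stage, however, runs the linkers in a careful order so as to use the sharded API for those linkers that have been updated:
- For each compiled permutation, run the on-shard part of all shardable linkers. For each permutation, start with a fresh set of artifacts so that the linkers don't see each other's output.
- Combine all of the resulting artifacts.
- Run the non-shardable linkers on those artifacts.
- Thin the artifacts, as defined below.
- Run the final part of all shardable linkers.
- Emit the "output" and "extra" files.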
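The Link-stage sequence under simulated sharding can be sketched as a plan of steps. This is a minimal model under stated assumptions: `SimLinker` and `SimulatedShardingLink` are hypothetical names, artifacts are not modelled at all, and only the order of calls is shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a linker is a name plus a flag saying whether it
// implements the sharding API. Only the call order is modelled.
final class SimLinker {
  final String name;
  final boolean shardable;

  SimLinker(String name, boolean shardable) {
    this.name = name;
    this.shardable = shardable;
  }
}

final class SimulatedShardingLink {
  /** Returns the sequence of steps the Link stage would perform. */
  static List<String> plan(List<SimLinker> linkers, int permutationCount) {
    List<String> steps = new ArrayList<String>();
    // 1. On-shard part of every shardable linker, once per permutation,
    //    each permutation starting from a fresh artifact set.
    for (int perm = 0; perm < permutationCount; perm++) {
      steps.add("fresh artifact set for permutation " + perm);
      for (SimLinker linker : linkers) {
        if (linker.shardable) {
          steps.add(linker.name + ": on-shard link, permutation " + perm);
        }
      }
    }
    // 2. Merge the per-permutation results.
    steps.add("combine artifacts");
    // 3. Legacy linkers see the combined set, as they do today.
    for (SimLinker linker : linkers) {
      if (!linker.shardable) {
        steps.add(linker.name + ": legacy link");
      }
    }
    // 4. Thin before the final part runs.
    steps.add("thin artifacts");
    // 5. Final part of every shardable linker.
    for (SimLinker linker : linkers) {
      if (linker.shardable) {
        steps.add(linker.name + ": final link");
      }
    }
    // 6. Write the results.
    steps.add("emit output and extra files");
    return steps;
  }
}
```

Note that in this plan the on-shard part of every shardable linker runs before any legacy linker, regardless of each linker's PRE, PRIMARY, or POST position.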
=== Development mode ===
Development mode does not generate any compiled permutations. Thus, it does not run the per-permutation part of linkers. It does, however, need to run the final-link part of linkers. It should do this just after the places it calls link() or relink().
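=== Detailed API changes ===
- Linkers that are updated to be shardable are annotated with a new annotation, @Shardable.
- The Linker.link() method has a new boolean parameter, indicating whether it is running on a shard or on the final node.
- BinaryEmittedArtifact is added as a final subclass of EmittedArtifact, indicating an artifact with no internal structure. The compiler can bulk copy such artifacts rather than using Java serialization.
- There is a new annotation @Transferable that can be added to artifacts. Artifacts without this annotation are subject to thinning, described below.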
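A rough sketch of how a linker might use the proposed @Shardable annotation and the new boolean parameter on link(). Everything here is a simplified stand-in: the annotation retention, the `SketchLinker` base class, the string-based artifact set, and the `onShard` parameter name are assumptions made for illustration, not the actual signatures.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashSet;
import java.util.Set;

// Stand-in for the proposed annotation. Runtime retention is an assumption,
// chosen so the compiler could inspect linker classes reflectively.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Shardable {}

// Simplified stand-in for the linker base class with the new boolean parameter.
abstract class SketchLinker {
  abstract Set<String> link(Set<String> artifacts, boolean onShard);
}

@Shardable
class ExampleLinker extends SketchLinker {
  @Override
  Set<String> link(Set<String> artifacts, boolean onShard) {
    Set<String> result = new HashSet<String>(artifacts);
    // Per-permutation work happens on the shard; work that needs every
    // permutation happens on the final node.
    result.add(onShard ? "per-permutation-output" : "final-output");
    return result;
  }
}

final class ShardableCheck {
  /** True if the linker class opted in to the sharding API. */
  static boolean isShardable(SketchLinker linker) {
    return linker.getClass().isAnnotationPresent(Shardable.class);
  }
}
```

A linker without the annotation would be treated as a legacy linker and called only through the non-sharding path.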
=== Thinning of an artifact set ===
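After the sharded part of a linker runs, the resulting artifact set is thinned down, so as to minimize the amount sent back to the Link node and to minimize the amount of deserialization that Link has to do. Thinning an artifact set does two things:
- All EmittedArtifacts are replaced by a BinaryEmittedArtifact, thus discarding any fields that the EmittedArtifact might have had.
- All other artifacts are discarded, except ones annotated with @Transferable.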
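The two thinning rules (every EmittedArtifact becomes a BinaryEmittedArtifact; every other artifact is dropped unless it is marked as transferable) can be sketched as follows. All class names are hypothetical stand-ins prefixed with `Thin`, and `TransferableSketch` stands in for the proposed @Transferable annotation; reading the exception in the second rule as "@Transferable" is an assumption based on the API changes section.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the proposed @Transferable annotation (assumed semantics).
@Retention(RetentionPolicy.RUNTIME)
@interface TransferableSketch {}

class ThinArtifact {}

class ThinEmittedArtifact extends ThinArtifact {
  final String path;
  final byte[] contents;

  ThinEmittedArtifact(String path, byte[] contents) {
    this.path = path;
    this.contents = contents;
  }
}

// Only a path and bytes, so it can be bulk copied instead of serialized.
final class ThinBinaryEmittedArtifact extends ThinEmittedArtifact {
  ThinBinaryEmittedArtifact(String path, byte[] contents) {
    super(path, contents);
  }
}

// An emitted artifact with an extra field that does not survive thinning.
class SymbolReport extends ThinEmittedArtifact {
  final int symbolCount;

  SymbolReport(String path, byte[] contents, int symbolCount) {
    super(path, contents);
    this.symbolCount = symbolCount;
  }
}

@TransferableSketch
class PermutationSummary extends ThinArtifact {}

class ScratchData extends ThinArtifact {}

final class Thinner {
  static List<ThinArtifact> thin(List<ThinArtifact> artifacts) {
    List<ThinArtifact> result = new ArrayList<ThinArtifact>();
    for (ThinArtifact artifact : artifacts) {
      if (artifact instanceof ThinEmittedArtifact) {
        // Rule 1: keep only path and bytes.
        ThinEmittedArtifact emitted = (ThinEmittedArtifact) artifact;
        result.add(new ThinBinaryEmittedArtifact(emitted.path, emitted.contents));
      } else if (artifact.getClass().isAnnotationPresent(TransferableSketch.class)) {
        // Rule 2: non-emitted artifacts survive only when marked transferable.
        result.add(artifact);
      }
      // Anything else is discarded and never sent to the Link node.
    }
    return result;
  }
}
```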
=== Order of linkers ===
Whenever the compiler runs a number of linkers, it runs them in the order implied by the PRE, PRIMARY, and POST annotations. This holds both on the shards and on the final link node, and for both the shardable and non-shardable link() methods.
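A minimal sketch of ordering linkers by PRE, PRIMARY, and POST. `RunOrder`, `OrderedLinker`, and `LinkerSorter` are hypothetical names; the sketch assumes the order is available as an enum value per linker and that linkers sharing an order keep their declared relative order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Declared in execution order, so the enum's natural ordering is the run order.
enum RunOrder { PRE, PRIMARY, POST }

final class OrderedLinker {
  final String name;
  final RunOrder order;

  OrderedLinker(String name, RunOrder order) {
    this.name = name;
    this.order = order;
  }
}

final class LinkerSorter {
  /** Returns a copy of the linkers sorted PRE, then PRIMARY, then POST. */
  static List<OrderedLinker> inRunOrder(List<OrderedLinker> linkers) {
    List<OrderedLinker> sorted = new ArrayList<OrderedLinker>(linkers);
    // Collections.sort is stable, so linkers with the same order keep
    // their original relative position.
    Collections.sort(sorted, new Comparator<OrderedLinker>() {
      @Override
      public int compare(OrderedLinker a, OrderedLinker b) {
        return a.order.compareTo(b.order);
      }
    });
    return sorted;
  }
}
```

The same sorted list would be used for the on-shard pass and for the final pass.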
I don't maintain any linkers but I have experimented with multi-
machine builds. The current Precompile, CompilePerms, and Link
implementation has the nice feature that the CompilePerms step does
not require access to the source code being compiled. This makes it
very, very much easier to deploy additional CompilePerms workers as
they don't need to check out source code etc. I like the plan for
being able to perform some linking in parallel but I wouldn't like to
lose the ability to deploy a useful CompilePerms worker that does not
need source code access. If performing Java parsing, creating AST and
generating artifacts is something that may need to be parallelized for
some builds then I'd like it if that was done in an additional step so
that people could choose whether or not to run that on multiple
machines while still being able to run the CompilePerms steps on
multiple machines.
One very nice feature of the current system is that the CompilePerms
step does not need access to the source code being compiled. This is a
significant benefit as it makes it very easy to set up a new machine to
perform CompilePerms work. Without this each CompilePerms machine
would have to check out the source to compile, a significant amount of
work and potentially difficult to configure. My experiments showed
most of the time being spent in the current Precompile step, but that
is because I was not generating a large number of permutations. I
imagine the use case for multi machine builds is that you're doing a
build for QA or release that needs to include all languages etc,
certainly 10s of permutations. One big machine with access to the
source to run Precompile in parallel on multiple pages and then being
able to simply make available lots of dumb CompilePerms workers that
need just GWT installed would be a big advantage here. Or, Precompile
different pages on different machines (using some out of band
distribution system) and then use a farm of dumb workers to
CompilePerms. Individual developers probably use dev mode or just
build a single language.
Making it possible to run portions of the linkers as part of
CompilePerms would certainly be a benefit and I'm all for the "reduced
serialization" plan.
On Feb 9, 4:31 pm, Lex Spoon <sp...@google.com> wrote:
> - parse Java and run generators
> - output: number of permutations, AST, generated artifacts
>
> current CompilePerms:
>
> - input: permutation id, AST
> - compile one permutation to JavaScript
> - output: JavaScript, generated artifacts
>
> current Link:
>
> - input: JavaScript from all permutations, generated artifacts
> - run linkers on all artifacts
> - emit EmittedArtifacts into the final output
>
> With maximal sharding, Precompile does no work except to count the number of
> permutations. Each CompilePerms instance parses Java ASTs, run generators,
> and optimizes for a specific permutation. Additionally,
> each CompilePerms instance also runs the shardable part of linkers on the
> results for that permutation. It then "thins" the artifacts (see below) and
> emits them. Finally, Link takes these results from the CompilePerms
> instances, runs the final, non-shardable part of each linker, and emits all
> the artifacts designated as emitted artifacts. In summary, the
> maximal-sharding staging looks like this:
>
> new Precompile:
>
> - output: number of permutations
>
> new CompilePerms:
>
> - input: permutation id
> - compile one permutation to JavaScript, including running generators
> - run the on-shard part of linkers
> - thin down the resulting artifacts, as defined below
> - output: JavaScript and the thinned down set of artifacts
>
> new Link:
>
> - input: JavaScript and transferable artifacts from each permutation
> - run the final part of linkers, which can add more files to the final
> output
> - output: resulting emitted artifacts
>
> === Simulated Sharding ===
>
> Simulated sharding uses the in-trunk compiler staging, but runs the linkers
> as much as possible as if they were using the maximal sharding staging. The
> sequence is the same whether the Compiler entry point is used or the
> Precompile/CompilePerms/Link trio of entry points is used. Under
> simulated sharding, the Precompile and CompilePerms steps run exactly as in
> trunk. The Link stage, however, runs the linkers in a careful order so as to
> use the sharded API for those linkers that have been updated:
>
> - For each compiled permutation, run the on-shard part of
> all shardable linkers. For each permutation, start with a fresh set of
> artifacts so that the linkers don't see each other's output.
> - Combine all of the resulting artifacts.
> - Run the non-shardable linkers on those artifacts.
> - Thin the artifacts, as defined below
> - Run the final part of all shardable linkers.
> - Emit the "output" and "extra" files.
>
> === Development mode ===
>
> Development mode does not generate any compiled permutations. Thus, it does
> not run the per-permutation part of linkers. It does, however, need to run
> the final-link part of linkers. It should do this just after the places it
> calls link() or relink().
>
> === Detailed API changes ===
>
> - Linkers that are updated to be shardable are annotated with a new
> annotation @Shardable
> - The Linker.link() method has a new boolean parameter, indicating
> whether it is running on a shard or on the final node.
> - BinaryEmittedArtifact is added as a final subclass of EmittedArtifact,
> indicating an artifact with no internal structure. The compiler can bulk
> copy such artifacts rather than using Java serialization.
> - There is a new annotation @Transferable that can be added to artifacts.
> Artifacts without this annotation are subject to thinning, described below.
>
> === Thinning of an artifact set ===
>
> After the sharded part of a linker runs, the resulting artifact set is
> thinned down, so as to minimize the amount sent back to the Link node and to
> minimize the amount of deserialization that Link has to do. Thinning an
> artifact set does two things:
>
> - All EmittedArtifacts are replaced by a BinaryEmittedArtifact, thus
> discarding any fields that the EmittedArtifact might have had.
> - All other artifacts are discarded, except ones annotated with
There's a fairly large repository-based elephant in the room named Maven.
On Wed, Feb 10, 2010 at 10:45 AM, Lex Spoon <sp...@google.com> wrote:
> Is copying source code so inconvenient that it would be worth having a slower build? I would have thought any of the following would work to move source code from one machine to another:
> 1. rsync
> 2. jar + scp
> 3. "svn up" on the slave machines
> Do any of those seem practical for your situation, Alex?
> Overall, it's easy to provide an extra build staging as an option, but we support a number of build stagings already....
What does make it difficult is that you can't have a pool of worker machines that can build any project asked of them without copying the sources to the worker for each request. For a large project, this can get problematic, especially when you have to send the transitive dependencies.
Besides, what is gained by having the user have to arrange this copying themselves rather than the current method of sending it as part of the compile process? For example, distributed C/C++ compilers send the preprocessed source to the worker nodes, so they don't have to have the source or the same include files, we currently send the AST which is a representation of the source, etc.
> Besides, what is gained by having the user have to arrange this copying themselves rather than the current method of sending it as part of the compile process? For example, distributed C/C++ compilers send the preprocessed source to the worker nodes, so they don't have to have the source or the same include files, we currently send the AST which is a representation of the source, etc.
Compared to the status quo, we gain much faster builds.
Compared to automatically copying, we have a fully specced out proposal. :) If we try to automatically copy dependencies, how would we know exactly what to copy?
The use cases being described as points of deliberation (defining dependencies, repository access, and bundling automation) are well-solved items in the Maven stable. How hard can it be to define a multi-project descriptor, assign "channels" of build-stage progression, and have a top-level project build, coordinated by one Maven instance, publish artifacts to successive build channels served elsewhere by daemons that trigger Maven sub-builds?
If this is indeed the direction to go in (and I'm a big fan of the
goals as well), it's probably also worth making a more formal
definition for "won't step on each other's toes". As a use case, I'm
working on a PRE linker that (currently) removes CompilationResults,
alters them based on information collected from across all
permutations, and then emits new ones. Obviously this isn't ideal--it's
expensive and CompilationResults were written to be (mostly)
immutable--but it's also perfectly acceptable within the current
design of the artifactSet/linker chain. The primary linker only cares
about the set of compilation results it receives, and if an earlier
linker altered them, it need never know.
It seems (and I could definitely be misinterpreting here) that in both
the simulated sharding procedure and Scott's alternate proposal, there
will be sections of primary and post linkers running before a non-
shardable pre linker. If that's true, then neither will be able to
fully honor the ordering of linkers when shardable and non-shardable
linkers are mixed. But, then again, when I started on this one I think
I could find only one other PRE linker in existence, so now would be
the time to change.
Continuing to think out loud, it seems that the way to alter my linker
is probably either to statically derive what all permutations will
need in every shard (as opposed to just having each triggered
generator emit an artifact and collecting them at the end), or keeping
that the same and creating a custom primary linker, which I was hoping
not to do as it would tend to limit adoption. If that's the largest
price to pay, though, the trade off would seem worth it.
current Precompile:
- parse Java and run generators
- output: number of permutations, AST, generated artifacts
current CompilePerms:
- input: permutation id, AST
- compile one permutation to JavaScript
- output: JavaScript, generated artifacts
current Link:
- input: JavaScript from all permutations, generated artifacts
- run linkers on all artifacts
- emit EmittedArtifacts into the final output
If this isn't what the current flow is, then what is the current
flow and how does sharded linking fit into that?
On Feb 11, 6:43 pm, Scott Blum <sco...@google.com> wrote:
In general, I'd agree, but the number of linkers in the wild appears
to be small; this may be a case of trying to preserve an API that only
5 or 10 people in the world are using.
> Maybe I'm missing some use cases, but I don't see what problems result from
> having some linkers run early and others run late. As Lex noted, all the
> linkers are largely independent of each other and mostly won't step on each
> other's toes.
In theory, you could have a non-sharded pre-linker whose job it is to
pre-filter the results before all other linkers are supposed to see
them. This could be, for example, substituting text into compiled
artifacts that a later linker might depend on, although admittedly,
this would only cause you a problem if you had written a
sharded linker that cooperates with something a non-sharded pre-linker
is supposed to do. I can't really think of any practical cases.
> - It seems unnecessary to have to annotate Artifacts to say which ones are
> transferable, because I thought we already mandated that all Artifacts have
> to be transferable.
Should all artifacts have to be transferable? The linker could be
generating temporary artifacts that run within a shard that don't need
to be sent back for the final link right?
> 2) Instead of trying to do automatic thinning, we just let the linkers
> themselves do the thinning. For example, one of the most
> serialization-expensive things we do is serialize/deserialze symbolMaps. To
> avoid this, we update SymbolMapsLinker to do most of its work during
> sharding, and update IFrameLinker (et al) to remove the CompilationResult
> during the sharded link so it never gets sent across to the final link.
It sounds to me like almost every linker will want to do thinning,
so if thinning is going to be used 100% of the time, won't requiring
everyone to reimplement thinning themselves result in potential bugs?
I thought Lex's design was essentially to make things network
efficient by doing the right thing in the common case (automatic
thinning, white-list things you want transferred). I'm not saying the
manual/opt-out approach wouldn't result in similar savings, but it
seems like Lex's design would make it harder for people to write
linkers that blow up on sharded compiles, especially when most third
parties/external contributors aren't using the shard feature yet, so
don't have much of a way to detect they've done something bad.
-Ray
> On Thu, Feb 11, 2010 at 4:43 PM, Scott Blum <sco...@google.com> wrote:
>> - I dislike the whole transition period followed by having to forcibly
>> update all linkers, unless there's a really compelling reason to do so.
>
> In general, I'd agree, but the number of linkers in the wild appears
> to be small, this may be a case of trying to preserve an API that only
> 5 or 10 people in the world are using.
+1. I've written a handful of custom linkers (including one in the public gwt-firefox-extension project), but I'm used to updating them between GWT releases to work around subtle changes in the linker contract (i.e., the evolution of hosted mode, various global variable changes, etc.).
I'd rather have a clean linker system that changes from version to version than an awkward one with a lot of legacy interfaces.
Matt.
Where can I read a description of -XshardPrecompile, or see the
code for it? It sounds very useful to me personally.
It's not in 2.0.0 as far as I can see. My concerns about the sharded
linking proposal came from what I understood the original flow to be
from my looking at it and from the original sharded linking proposal.
I have a few comments, but first I wanted to raise the point that I'm not sure why we're having this argument about maximally sharded Precompiles at all. For one thing, it's already implemented, and optional, via "-XshardPrecompile". I can't think of any reason to muck with this, or why it would have any relevance to sharded linking. Can we just table that part for now, or is there something I'm missing?
- I'm not sure why development mode wouldn't run a sharded link first. Wouldn't it make sense if development mode works just like production compile, it just runs a single "development mode" permutation shard link before running the final link?
2) Instead of trying to do automatic thinning, we just let the linkers themselves do the thinning. For example, one of the most serialization-expensive things we do is serialize/deserialize symbolMaps. To avoid this, we update SymbolMapsLinker to do most of its work during sharding, and update IFrameLinker (et al) to remove the CompilationResult during the sharded link so it never gets sent across to the final link.
That is exactly my point -- the C++ example sends the preprocessed source to the worker nodes, so they don't have to have the dependencies or the right include path or whatever. The analogy here would be for GWT to send all of the collected source, either in its native form or as is currently done in a parsed AST form, to the worker nodes.
On Fri, Feb 12, 2010 at 9:50 AM, Alex Moffat <alex....@gmail.com> wrote:
> Where can I read a description of what -XshardPrecompile, or see the
> code for it, it sounds very useful to me personally?
-XshardPrecompile is an experiment that everyone wants to change, so it seems unlikely to be released in its current form. We can talk about it if it helps, but I would propose that we focus more on what we want to do for real.
> - I'm not sure why development mode wouldn't run a sharded link first. Wouldn't it make sense if development mode works just like production compile, it just runs a single "development mode" permutation shard link before running the final link?
Sure, we can do that. Note, though, that they will be running against an empty ArtifactSet, because there aren't any compiles for them to look at. Thus, they won't typically do anything.
> 2) Instead of trying to do automatic thinning, we just let the linkers themselves do the thinning. For example, one of the most serialization-expensive things we do is serialize/deserialize symbolMaps. To avoid this, we update SymbolMapsLinker to do most of its work during sharding, and update IFrameLinker (et al) to remove the CompilationResult during the sharded link so it never gets sent across to the final link.
In addition to the other issues pointed out, note that this adds ordering constraints among the linkers. Any linker that deletes something must run after every linker that wants to look at it. Your example wouldn't work as is, because it would mean no POST linker can look at CompilationResults. It also wouldn't work to put the deletion in a POST linker, for the same reason. We'd have to work out a way for the deletions to happen last, after all the normal linkage activity.
Suppose, continuing that idea, we add a POSTPOST order that is used only for deletion. If it's really only for deletion, then the usual link() API is overly general, because it lets linkers both add and remove artifacts during POSTPOST, which is not desired. So, we want a POSTPOST API that is only for deletion. Linkers somehow or another mark artifacts for deletion, but not anything else. At this point, though, isn't it pretty much the same as the automated thinning in the initial proposal?
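The ordering constraint can be illustrated with a toy model. This is a sketch under stated assumptions, not linker code: artifacts are plain strings, `ShardStep` and `ManualThinningDemo` are hypothetical names, and the two steps stand for a primary linker that removes the CompilationResult on the shard and a POST linker that wants to read it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of manual thinning: each linker is a step that may
// read or remove artifacts from a shared set.
interface ShardStep {
  void run(Set<String> artifacts, List<String> log);
}

final class ManualThinningDemo {
  static final String COMPILATION_RESULT = "CompilationResult";

  // A primary linker that deletes the CompilationResult on the shard.
  static final ShardStep DELETING_PRIMARY = new ShardStep() {
    public void run(Set<String> artifacts, List<String> log) {
      artifacts.remove(COMPILATION_RESULT);
    }
  };

  // A POST linker that wants to inspect the CompilationResult.
  static final ShardStep INSPECTING_POST = new ShardStep() {
    public void run(Set<String> artifacts, List<String> log) {
      log.add(artifacts.contains(COMPILATION_RESULT) ? "POST saw result" : "POST saw nothing");
    }
  };

  /** Runs the steps in the given order over one shard's artifact set. */
  static List<String> runInOrder(List<ShardStep> steps) {
    Set<String> artifacts = new LinkedHashSet<String>();
    artifacts.add(COMPILATION_RESULT);
    List<String> log = new ArrayList<String>();
    for (ShardStep step : steps) {
      step.run(artifacts, log);
    }
    return log;
  }
}
```

With the normal PRIMARY-then-POST order the POST step finds nothing to inspect, which is the problem described above; the deletion only becomes harmless when it is moved after every reader.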
On Fri, Feb 12, 2010 at 7:00 PM, Lex Spoon <sp...@google.com> wrote:
> On Fri, Feb 12, 2010 at 9:50 AM, Alex Moffat <alex....@gmail.com> wrote:
>> Where can I read a description of what -XshardPrecompile, or see the
>> code for it, it sounds very useful to me personally?
> -XshardPrecompile is an experiment that everyone wants to change, so it seems unlikely to be released in its current form. We can talk about it if it helps, but I would propose that we focus more on what we want to do for real.
It seemed relevant because it sounded like you propose to essentially make -XshardPrecompile the default (only?) behavior for Precompile? Or did I misread?
The reason that makes me cautious has to do with a desire for a future change to the Generator API to support things like minimal rebuild. I imagine a world where the work each Generator does could be sharded out in a way that's independent of the number of permutations.
>> - I'm not sure why development mode wouldn't run a sharded link first. Wouldn't it make sense if development mode works just like production compile, it just runs a single "development mode" permutation shard link before running the final link?
> Sure, we can do that. Note, though, that they will be running against an empty ArtifactSet, because there aren't any compiles for them to look at. Thus, they won't typically do anything.
Do public resources and generated resources show up during the sharded phase?
On Tue, Feb 16, 2010 at 3:32 PM, Scott Blum <sco...@google.com> wrote:
> On Fri, Feb 12, 2010 at 7:00 PM, Lex Spoon <sp...@google.com> wrote:
>> On Fri, Feb 12, 2010 at 9:50 AM, Alex Moffat <alex....@gmail.com> wrote:
>>> Where can I read a description of what -XshardPrecompile, or see the
>>> code for it, it sounds very useful to me personally?
>> -XshardPrecompile is an experiment that everyone wants to change, so it seems unlikely to be released in its current form. We can talk about it if it helps, but I would propose that we focus more on what we want to do for real.
> It seemed relevant because it sounded like you propose to essentially make -XshardPrecompile the default (only?) behavior for Precompile? Or did I misread?
No, that's the idea.
> The reason that makes me cautious has to do with a desire for a future change to the Generator API to support things like minimal rebuild. I imagine a world where the work each Generator does could be sharded out in a way that's independent of the number of permutations.
Are you saying that you want to not have to shard, with future developments? I don't think that should be a problem with this patch. As a case in point, the Compiler entry point *could* shard out generating and linking, but it chooses not to. We have the flexibility to play around with these choices over time.
Everyone is happy, I think, with having dev mode run a single on-shard linking step. So, these are just details. FWIW, here is how it is in the patch:
1. Resources are available via ResourceOracle.
2. Public artifacts are there. They are identical on all permutations, so they aren't added to the artifact set until the final link step.
3. Generated artifacts are there for compilation, but not for development mode. With development mode, all linking is done before the generators run, and generators run on demand.
----------- you write (gmail just messed up my reply quotes): ----
Now that I am thinking along those lines, it almost begs the question. If we are willing to break the world, is this the best possible way to model new link process? In other words, it seems worth re-examining the design without regard to the existing API and asking ourselves if it's the thing we'd have designed from scratch. Maybe you guys all already did that and I'm the only one late to the party.
For example, if we're going from scratch, then we could avoid the transition entirely and just mandate what the new rules are. We wouldn't need a @Shardable annotation since all linkers would need to be sharding aware. We might rather have two separate methods for sharded vs. non-sharded link than a boolean parameter. We might revisit the whole PRE, PRIMARY, POST thing with regards to sharding and decide the right answer is SHARD, PRE, PRIMARY, POST. Or something. I don't know what the right answers are. All I'm saying is, breaking things is awesome when you're doing something revolutionary and the end result is awesome. I just want to be sure, if we're going to break things, that we believe we'll end up somewhere revolutionary and awesome as opposed to evolutionary and incremental, but less than awesome.
--------------------------------------------------------------------------------
I initially proposed simply breaking the world. However, at your encouragement, this patch has developed to be backwards compatible. As things stand, this patch both gets a large improvement and is evolutionary.
On those specific changes:
1. @Shardable can certainly be dropped after a deprecation period. Is there any urgency to drop it immediately?
2. Two separate methods versus one with a boolean looks fine to me. It's changed back and forth as the patch developed.
3. PRE/PRIMARY/POST still appear to be useful. All linkers care whether they are primary or not, because there is one primary linker and it must deal with generating a selection script. Additionally, a few linkers care whether they go before or after the primary linker.
4. SHARD as a separate linker order is very tempting but turns out to have some problems. First, many linkers have both an on-shard and on-final part, and if SHARD was a separate order then those linkers would have to be subdivided into two linkers. Instead of IframeLinker, we'd have to have IframeShardLinker and IframeFinalLinker. Second, the SHARD part also has PRE/PRIMARY/POST, so you really have six linker orders, not four. It's tidier to represent the six as two times three.
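The "two times three" representation can be made concrete with two small enums. The names `LinkPhase`, `LinkOrder`, and `LinkSlots` are hypothetical; the sketch only enumerates the six (phase, order) slots and assumes the on-shard phase runs before the final phase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; the thread only says there are two phases times three orders.
enum LinkPhase { ON_SHARD, FINAL }

enum LinkOrder { PRE, PRIMARY, POST }

final class LinkSlots {
  /** Enumerates every (phase, order) slot in the order it would run. */
  static List<String> all() {
    List<String> slots = new ArrayList<String>();
    for (LinkPhase phase : LinkPhase.values()) {
      for (LinkOrder order : LinkOrder.values()) {
        slots.add(phase + "/" + order);
      }
    }
    return slots;
  }
}
```

With this shape a single linker class keeps one order and takes part in both phases, rather than being split into a shard-only class and a final-only class.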