| Mercurial 4.6 Sprint Report | Gregory Szorc | 14/03/18 15:03 | The semiannual Mercurial developer meetup (or "sprint") was March 2-4 in
Boston. Mozilla was represented by me and Connor Sheehan. As usual, the sprint was 3 very long days of discussion, planning, and hacking.

One of the larger topics of the sprint was support for partial clones. A partial clone is a client-side clone that has a subset of the files and/or a subset of history. Fully distributed version control systems like Mercurial and Git transfer all the data all the time, which obviously doesn't scale. Partial clone is the solution to that scaling problem. We tend to use the terms "narrow clone" for a subset of files and "shallow clone" for a subset of history.

Google has upstreamed their "narrow" extension, which allows Mercurial to support narrow clones. Facebook has a "remotefilelog" extension that implements support for shallow clone. Non-experimental support for shallow clones will require significant work to refactor client-side storage, so that is a few releases out.

The immediate focus to support partial clones is to get the server and wire protocol pieces in place and to provide experimental-level support for partial clone so it can be used by limited-use clients (like automated systems). The thinking here is that if the server pieces get deployed and are reasonably backwards compatible, then this enables multiple versions of clients in the near future to support partial clones. For example (and this is the plan of record), Mozilla could roll out partial clone support on hg.mozilla.org and enable the experimental client bits in Firefox CI, which has a tightly controlled client environment. This would allow us to start using partial clone for critical CI efficiency wins a few releases before the client-side bits are stabilized and non-experimental.

Supporting partial clones will require an overhaul of the wire protocol, which is long overdue for the project. The new wire protocol is being designed with modern practices in mind. For example, CPU-bound activities will be able to scale out to multiple CPU cores.
The new HTTP protocol will also be designed such that repository hosting can easily leverage a CDN for content distribution (it can be difficult to scale version control servers, and the use of redirects to a CDN or scalable blob store services will make running services at scale much easier).

After a long discussion, it was concluded that the minimum viable product for partial clones will be narrow clones. The bulk of the complexity for partial clone is related to shallow clones and this problem will be deferred to later releases.

Google has authored an `hg fix` command which runs code formatters (such as clang-format). Basically, you can check in a file that defines which code formatters run for which files and `hg fix` automatically runs the formatters. I believe they will be upstreaming it into core Mercurial. More details at https://www.mercurial-scm.org/wiki/AutomaticFormattingPlan.

Speaking of Google, their project to adopt Mercurial for their massive, internal monorepo is picking up steam. A percentage of people at Google are now using the `hg` client for interacting with their monorepo. (I can't share more detailed numbers, sorry.)

Facebook continues to do crazy and interesting things. They rewrote "dirstate" in Rust and this yielded significant performance improvements. This isn't yet upstreamed though. They are still working on Mononoke - a Mercurial server implemented in Rust. It is still under heavy development. They are pretty obsessed with performance everywhere. They want to make heavy use of progress bars on all commands and operations that could take unbounded time so they have a better grasp on what operations need perf attention. During the sprint, a Facebook engineer imported Git's "xdiff" diffing library into Mercurial. Since the sprint, he has cleaned up the code dramatically and made changes to increase performance by up to 10x. He is attempting to upstream some of this work back to Git.

There was a long discussion about obsolescence, hiddenness, and a path forward in core. This is a technically complicated topic. The short version is there appears to be a plan for enabling hiding changesets in core by default. This will make history rewriting operations significantly faster. The sentiment is Mercurial should focus on shipping that, then we can worry about adding more changeset evolution / evolve features in core.

The oxidation (Rust in Mercurial) effort is underway. There is a Rust version of `hg` in the core repository and it passes all but ~5 tests. This is basically a Rust program that starts a Python interpreter. That still needs to get shipped, which will require significant packaging work. The project wants to start implementing performance-critical and low-level components in Rust (as opposed to C). We generally know what we want here. We're still waiting for the first domino to fall.

We agreed that we want proper support for subcommands, e.g. `hg command subcommand <args>`. The first consumer may be `hg show`, which we agreed should move from an extension to a core command.

There was a discussion on "named commits" - allowing people to give human-readable names to individual commits. This seems to be something that MQ users really love. There is already a mechanism in core to associate extra names with changesets. So this was mostly a discussion about the UI for defining names and where to store the name. I think we agreed that `hg commit --name X` would store a name in the changeset extras field and we'd cache names to make lookup faster.

The Python 3 port is proceeding at a healthy pace. Just recently, the effort reached a milestone with over 50% of tests now passing on Python 3! We're optimistic that we'll be able to ship a beta-quality release of Mercurial that supports Python 3 by the end of 2018.

There was talk of random bigger projects/features that are "good ideas" and need someone to work on them:

* `hg shellprompt` - a command that spits out shell script that you can eval in your shell init so you can get better shell integration
* --dry flags for every command that modifies things
* curses interface for resolving merge conflicts
* curses interface for running revsets
* side-by-side diff support
* curses interface for annotate

When you get a bunch of VCS people together in a room, we tend to end up talking about things related to VCS, like code review. There was an interesting discussion on commit authoring and review workflows. It was interesting to see people from Google, Facebook, and other companies talk about many of the same "review workflow" issues that have come up at Mozilla, e.g. should commits be squashed before review, should you use fix-up commits or amend commits, etc. People from large companies tended to agree that fix-up commits and having reviewers see the intermediate, throw-away commits did not scale.

There was talk of https://www.mercurial-scm.org/wiki/GenericTemplatingPlan and steps needed to complete that work.

There's interest in adding a "bug report" extension/command that can be used to report Mercurial bugs in a more turnkey manner. This devolved into a data sensitivity and privacy discussion.

There was talk of establishing a formal "stack" primitive, both as an internal and user-facing concept for expressing "the current commits I'm working on." Various parts of the internal code already expose a "stack" and `hg show stack` exposes a stack to the user. The end goal would be for various commands to take the current stack into account when deciding how to behave.

There was a discussion on `hg push --force` and how the UX is bad because it removes all of the safety stops.
Conclusion: add `hg push --allow <thing>` to provide granular overrides for each thing you want to override, e.g. `--allow new-head`, `--allow create-branch`, etc.

There were several more discussions and side-conversations. The full notes are available at https://public.etherpad-mozilla.org/p/sprint-hg4.6 |
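
The granular `--allow` idea from the `hg push` discussion can be sketched in a few lines of Python. This is only an illustration of the concept: `KNOWN_CHECKS`, `blocked_checks`, and `can_push` are hypothetical names, the check names are taken from the examples in the report, and none of this reflects how Mercurial would actually implement the flag.

```python
# Hypothetical model of the proposed `hg push --allow <thing>` behavior.
# The check names mirror the examples in the report; this is not Mercurial's API.
KNOWN_CHECKS = frozenset({"new-head", "create-branch"})


def blocked_checks(triggered, allowed=()):
    """Return the triggered safety checks that were not explicitly allowed."""
    unknown = set(allowed) - KNOWN_CHECKS
    if unknown:
        # Reject typos instead of silently ignoring them.
        raise ValueError("unknown --allow value(s): " + ", ".join(sorted(unknown)))
    return sorted(set(triggered) - set(allowed))


def can_push(triggered, allowed=()):
    """A push proceeds only if every triggered check has been allowed."""
    return not blocked_checks(triggered, allowed)
```

Unlike `--force`, allowing one check in this model leaves every other safety stop in place: a push that would create both a new head and a new branch is still blocked if only `new-head` is allowed.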