Searchfox Update: Changes to annotate / "blame" behavior around revision skipping

59 views
Skip to first unread message

Andrew Sutherland

unread,
Mar 16, 2022, 3:58:41 PM3/16/22
to dev-pl...@mozilla.org

tl;dr: The Searchfox blame UI will no longer try to skip over revisions in `.git-blame-ignore-revs`.  The downside is that this may result in you needing to click through reformatting patches and do additional manual ctrl-f-ing to find where the line you were interested in went before repeating your manual blame algorithm steps.  The upside is that there is no longer any risk of the heuristics getting tricked and referencing an inaccurate revision and thereby tricking you.  This change is equivalent to always showing you what was in the "ignored changesets" collapsed details section of the blame popup, but faster!  Want to see a cool thing to tempt you to read more?  Look at https://clicky.visophyte.org/files/microannotate/nsWebBrowserPersist.cpp.html and then read more!

Details

Revision history and the "annotate" / "blame" UIs for revision control are tricky because they're built on a sequential, line-centric data-model where moving a function above another function in a file results in a destructive representational decision to treat one function as continuing through history and the other function as removed and then re-added as new code.  Reformatting that maintains the overall sequence of tokens but changes how they are distributed across multiple lines also looks like removal of all of the old code and the addition of new code.  Tools frequently perform heuristic-based post-passes to help identify intra-line changes which are reflected in diff UIs, as well as (entire) lines of code that are copied/moved in a revision (ex: Phabricator does this).

In Dec, 2018 searchfox gained "hyperblame" functionality[1] that attempted to skip over the revisions listed in https://searchfox.org/mozilla-central/source/.git-blame-ignore-revs that lists a number of code re-formattings / refactorings that don't intentionally have semantic content.  As noted above, this is a tricky problem space and, especially when built on a line-centric model, additional heuristics and/or an alternate approach to the data-model are necessary.  https://bugzilla.mozilla.org/show_bug.cgi?id=1517978 has tracked potential enhancements to this problem space, but the reality is that searchfox is not directly staffed[2] and although significant progress has been made on improving the blame infrastruture and UX[3], no one has had the time to attempt to further iterate on this significant undertaking.

Unfortunately, we have seen that the functionality as exposed by default has resulted in a number of bugs being filed that are evidence of people being (understandably!) misled by the current behavior; these are tracked as blocked by the aforementioned https://bugzilla.mozilla.org/show_bug.cgi?id=1517978 enhancement bug.  It's always been possible to bypass the result of the heuristic by expanding the "N ignored changeset(s)" `<details>` in the blame strip popup, but this has been far from obvious or intuitive.  Also, having an option that might be magically correct but also might be wrong and to figure it out you have to perform the extra steps you would have performed if the feature didn't exist may be counter-productive.  The heuristics also were currently applied at runtime and so had an ongoing dynamic cost that could impact the latency of the dynamic serving of files.  (All HEAD revisions are statically pre-generated, but any other revision results in dynamic HTML generation for the heuristics, although searchfox otherwise precomputes blame[5])  There are no known active plans to further enhance blame-related functionality at this time, but there are possibilities[5] which are cool[5].

As such, blame-skipping functionality has been removed in https://github.com/mozsearch/mozsearch/pull/491 and this change has already taken effect for mozilla-central and will roll out to other trees over the next day.  If you are interested in helping move searchfox's blame behavior forward or are an avid user of searchfox's blame behavior and would like to see enhancements, please talk to your manager[2] about allocating some time to contribute to searchfox or to help surface the demand.

Andrew (:asuth)


1: https://bugzilla.mozilla.org/show_bug.cgi?id=1507923 via https://github.com/mozsearch/mozsearch/pull/170

2: Quoting my other searchfox message from earlier this week: "Searchfox is a contributor driven project.  If there are features you would like to see or rough edges that slow down your work, please talk to your manager and discuss the possibility of finding time to potentially contribute to the project yourself if you are interested, and/or help your manager surface the potential efficiency improvements for you/the organization so that these can be quantified and potentially folded into OKRs for teams with the relevant expertise and interest."  We've had a short discussion on this in #searchfox today on chat.mozilla.org, but again, I'd emphasize the importance of making sure your manager knows what things that could make your development experience more pleasant and more productive as the most important discussion to have!

3: Noting that (nearly all if not) all blame improvements were made by :kats[4], we have had: https://bugzilla.mozilla.org/show_bug.cgi?id=1627532, https://bugzilla.mozilla.org/show_bug.cgi?id=1634770, https://bugzilla.mozilla.org/show_bug.cgi?id=1653245, https://bugzilla.mozilla.org/show_bug.cgi?id=1674601, https://bugzilla.mozilla.org/show_bug.cgi?id=1654946, https://bugzilla.mozilla.org/show_bug.cgi?id=1683563, https://bugzilla.mozilla.org/show_bug.cgi?id=1716914, and more!

4: Thank you, :kats!  (And While :kats is no longer at MoCo, he has continued to make significant enhancements to searchfox and blame![4])

5: https://github.com/mozsearch/mozsearch/blob/master/docs/blame.md are the high level searchfox docs on the blame pre-computation/caching mechanism.  Note that this mechanism is extensible and could be extended to support applying additional levels of heuristic inference on line movement, as proposed at https://bugzilla.mozilla.org/show_bug.cgi?id=1517978#c1 (although the proposal does call for storing the more expensive heuristic computations in a more long-lived cache that would be reused even when otherwise regenerating the m-c blame cache, such as when we tuned copy/move detection heuristic values).  Note, however, that the most interesting option at this time might be :marco's https://github.com/mozilla/microannotate/ approach which uses a transform to make the diff algorithm token-centric instead of line-centric.  An example run of this (on old data from the transformed https://github.com/marco-c/gecko-dev-wordified/tree/master) is https://clicky.visophyte.org/files/microannotate/nsWebBrowserPersist.cpp.html where you can see it used on Gecko code.  A slightly more fleshed out version of this mechanism (hover popups!) is https://github.com/cregit/cregit being applied to the linux kernel at the example of https://cregit.linuxsources.org/code/4.19/net/ethernet/eth.c.html.

Reply all
Reply to author
Forward
0 new messages