Editing vendored crates

Henri Sivonen

unread,

Feb 26, 2017, 12:51:58 PM2/26/17

to dev-platform

I tried to add some panics to a vendored to create (rust-encoding) to
see if the code in question runs. However, I didn't get to the running
part, because the edited code failed to build.

It turns out that each vendored crate has a .cargo-checksum.json file
that contains hashes of all the files in the crate, and Cargo refuses
to build the crate if the hashes don't match or the
.cargo-checksum.json file is removed.

This seems hostile not only to experimental local edits as in my case
but also to use cases such as uplifting fixes to branches and shipping
modified crates if the upstream is slow to accept patches or disagrees
with the patches.

As far as I can tell, there doesn't seem to be a way to ask cargo to
regenerate the .cargo-checksum.json after edits. Also, adding a
[replace] section to Cargo.toml of libgkrust pointing to the edited
crate under third-party/rust or adding paths = [ "third-party/rust" ]
to .cargo/config.in don't make Cargo happy.

What's the right way to build with edited crates under
third-party/rust? (I think we should have an easy way to do so.)

--
Henri Sivonen
hsiv...@hsivonen.fi
https://hsivonen.fi/

Manish Goregaokar

unread,

Feb 26, 2017, 5:03:06 PM2/26/17

to Henri Sivonen, dev-platform

> [replace] section to Cargo.toml of libgkrust pointing to the edited
> crate under third-party/rust or adding paths = [ "third-party/rust" ]
> to .cargo/config.in don't make Cargo happy.

[replace] is the way to do it, but it has to point to a crate outside of
the vendoring directory.

The vendor directory is like ~/.cargo; it's not supposed to be edited by
anything other than the tool.

-Manish Goregaokar

> _______________________________________________
> dev-platform mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
>

Bobby Holley

unread,

Feb 26, 2017, 5:06:42 PM2/26/17

to Manish Goregaokar, Henri Sivonen, dev-platform

On Sun, Feb 26, 2017 at 2:02 PM, Manish Goregaokar <manis...@gmail.com>
wrote:

> > [replace] section to Cargo.toml of libgkrust pointing to the edited
> > crate under third-party/rust or adding paths = [ "third-party/rust" ]
> > to .cargo/config.in don't make Cargo happy.
>
> [replace] is the way to do it, but it has to point to a crate outside of
> the vendoring directory.
>

That is not true - you can use [replace] to edit the vendored code:
https://bugzilla.mozilla.org/show_bug.cgi?id=1323557#c3

This is basically the right way to do it, rather than editing the
checksums. [replace] tells the Cargo machinery that the vendored code is
its own snowflake, rather than just a cache of what's on crates.io. Doing
otherwise breaks cargo invariants.

bholley

Bobby Holley

unread,

Feb 26, 2017, 5:11:05 PM2/26/17

to Henri Sivonen, dev-platform

On Sun, Feb 26, 2017 at 9:51 AM, Henri Sivonen <hsiv...@hsivonen.fi> wrote:

> I tried to add some panics to a vendored to create (rust-encoding) to
> see if the code in question runs. However, I didn't get to the running
> part, because the edited code failed to build.
>
> It turns out that each vendored crate has a .cargo-checksum.json file
> that contains hashes of all the files in the crate, and Cargo refuses
> to build the crate if the hashes don't match or the
> .cargo-checksum.json file is removed.
>
> This seems hostile not only to experimental local edits as in my case
> but also to use cases such as uplifting fixes to branches and shipping
> modified crates if the upstream is slow to accept patches or disagrees
> with the patches.
>
> As far as I can tell, there doesn't seem to be a way to ask cargo to
> regenerate the .cargo-checksum.json after edits. Also, adding a
> [replace] section to Cargo.toml of libgkrust pointing to the edited
> crate under third-party/rust or adding paths = [ "third-party/rust" ]
> to .cargo/config.in don't make Cargo happy.
>

Can you elaborate on what goes wrong here? This worked for ted's experiment
mentioned upthread, and for me on at least one occasion in
https://hg.mozilla.org/try/rev/18dc070e0308 (permalink:
https://pastebin.mozilla.org/8980438 )

You'll need to |cargo update| after adding the replace in Cargo.toml to
update Cargo.lock.

Xidorn Quan

unread,

Feb 26, 2017, 6:24:23 PM2/26/17

to dev-pl...@lists.mozilla.org

This workflow should really be automated via a mach command. Filed bug
1342815 [1] for that.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1342815

- Xidorn

Henri Sivonen

unread,

Feb 27, 2017, 7:03:36 AM2/27/17

to Bobby Holley, dev-platform

On Mon, Feb 27, 2017 at 12:10 AM, Bobby Holley <bobby...@gmail.com> wrote:
> Can you elaborate on what goes wrong here? This worked for ted's experiment
> mentioned upthread, and for me on at least one occasion in
> https://hg.mozilla.org/try/rev/18dc070e0308 (permalink:
> https://pastebin.mozilla.org/8980438 )
>
> You'll need to |cargo update| after adding the replace in Cargo.toml to
> update Cargo.lock.

Thanks. I failed to do that yesterday.

When I do that, Cargo complains about not finding the
encoding-index-foo crates in subdirectories of encoding. Replacement
in gkrust's Cargo.toml doesn't work. So then I go and edit encoding's
Cargo.toml to point it to the right place. Then Cargo complains about
those crates not finding encoding_index_tests. So then I edit their
Cargo.tomls to point to the test crate.

Then cargo update passes.

But then I do ./mach build and Cargo complains about the checksums not
matching because I edited the Cargo.tomls under the crates that I
thought I was changing from "locally-stored crates.io crate" status to
"local replacement" status.

The I remove the checksum file.

The cargo complains about not finding the checksum file.

I find this level of difficulty (self-inflicted quasi-Tivoization
practically) an unreasonable impediment to practicing trivial Software
Freedom with respect to the vendored crates.

> This is basically the right way to do it, rather than editing the checksums. [replace] tells the Cargo machinery that the vendored code is its own snowflake, rather than just a cache of what's on crates.io. Doing otherwise breaks cargo invariants.

What are the invariants? Why do we need the invariants, when can do
without impediments like these for e.g. the vendored copy of ICU?

What badness would arise from patching Cargo to ignore the
.cargo-checksum.json files?

On Mon, Feb 27, 2017 at 1:23 AM, Xidorn Quan <m...@upsuper.org> wrote:
> This workflow should really be automated via a mach command. Filed bug
> 1342815 [1] for that.
>
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1342815

Thank you. I think it should be possible to stick a diagnostic
println!() or panic!() in the vendored code with zero or minimal
ceremony.

Ralph Giles

unread,

Feb 27, 2017, 12:04:35 PM2/27/17

to Henri Sivonen, dev-platform, Bobby Holley

On Mon, Feb 27, 2017 at 4:03 AM, Henri Sivonen <hsiv...@hsivonen.fi> wrote:

> I find this level of difficulty (self-inflicted quasi-Tivoization
> practically) an unreasonable impediment to practicing trivial Software
> Freedom with respect to the vendored crates.

I agree we need to fix the ergonomics here, but I don't think you
should be so hard on cargo. The hash checking is designed to make
builds more reproducible, so that unless we make an explicit diversion
we know we're building with the same source as every other use of that
same package version. This has benefits for security, debugging, and
change control.

-r

Henri Sivonen

unread,

Feb 27, 2017, 12:32:34 PM2/27/17

to Ralph Giles, dev-platform, Bobby Holley

On Mon, Feb 27, 2017 at 7:04 PM, Ralph Giles <gi...@mozilla.com> wrote:
> On Mon, Feb 27, 2017 at 4:03 AM, Henri Sivonen <hsiv...@hsivonen.fi> wrote:
>
>> I find this level of difficulty (self-inflicted quasi-Tivoization
>> practically) an unreasonable impediment to practicing trivial Software
>> Freedom with respect to the vendored crates.
>
> I agree we need to fix the ergonomics here, but I don't think you
> should be so hard on cargo.

Sorry about the tone. I'm rather frustrated at how hard it is to do
something that should be absolutely trivial (adding a local diagnostic
panic!()/println!()).

> The hash checking is designed to make
> builds more reproducible, so that unless we make an explicit diversion
> we know we're building with the same source as every other use of that
> same package version. This has benefits for security, debugging, and
> change control.

We don't seem to need such change control beyond hg logs for e.g. the
in-tree ICU or Skia, though.

Ted Mielczarek

unread,

Feb 27, 2017, 12:47:23 PM2/27/17

to dev-pl...@lists.mozilla.org

As someone who has maintained a vendored upstream C++ project (Breakpad)
for a decade, I can say that this causes us headaches *all the time*. We
are constantly landing local changes to vendored projects and not
keeping track of them and then either losing patches or dealing with
conflicts or breakage when we update from upstream.

I'm sorry this is causing you pain, and we should figure out a way to
make it less painful, but note that the intention is that things in
`third_party/rust` should be actual third-party code, not things under
active development by Mozilla. We don't currently have a great middle
ground between "mozilla-central is the repository of record for this
crate" and "this crate is vendored from crates.io". We're finding our
way there with Servo, so we might have a better story for things like
encoding-rs when we get that working well. I understand that there are
lots of benefits to developing a crate in a standalone GitHub
repository, and you're certainly not the only one who wants to do that,
but it does make the integration story harder. It's very hard to support
code that has its repository of record somewhere other than
mozilla-central, but also have a simple workflow for making changes to
that code in mozilla-central along with other code.

-Ted

Henri Sivonen

unread,

Feb 27, 2017, 12:56:59 PM2/27/17

to Ted Mielczarek, dev-platform

On Mon, Feb 27, 2017 at 7:47 PM, Ted Mielczarek <t...@mielczarek.org> wrote:
> On Mon, Feb 27, 2017, at 12:32 PM, Henri Sivonen wrote:
>> We don't seem to need such change control beyond hg logs for e.g. the
>> in-tree ICU or Skia, though.
>
> As someone who has maintained a vendored upstream C++ project (Breakpad)
> for a decade, I can say that this causes us headaches *all the time*.

OK.

> I'm sorry this is causing you pain, and we should figure out a way to
> make it less painful, but note that the intention is that things in
> `third_party/rust` should be actual third-party code, not things under
> active development by Mozilla. We don't currently have a great middle
> ground between "mozilla-central is the repository of record for this
> crate" and "this crate is vendored from crates.io". We're finding our
> way there with Servo, so we might have a better story for things like
> encoding-rs when we get that working well.

Note that my problem at hand isn't with encoding_rs but with the
currently-vendored rust-encoding. That is, I indeed would like to add
a diagnostic panic!()/println!() to genuinely third-party code--not to
code I've written.

That is, I'd like to experimentally understand what, if anything,
rust-encoding is currently used for. (My best current hypothesis from
reading things on SearchFox is that it's used in order to be able to
compile one Option<EncodingRef> that's always None in Gecko.)

Bobby Holley

unread,

Feb 27, 2017, 1:00:48 PM2/27/17

to Henri Sivonen, dev-platform, Ted Mielczarek

On Mon, Feb 27, 2017 at 9:56 AM, Henri Sivonen <hsiv...@hsivonen.fi> wrote:

> On Mon, Feb 27, 2017 at 7:47 PM, Ted Mielczarek <t...@mielczarek.org>
> wrote:
> > On Mon, Feb 27, 2017, at 12:32 PM, Henri Sivonen wrote:
> >> We don't seem to need such change control beyond hg logs for e.g. the
> >> in-tree ICU or Skia, though.
> >
> > As someone who has maintained a vendored upstream C++ project (Breakpad)
> > for a decade, I can say that this causes us headaches *all the time*.
>
> OK.
>
> > I'm sorry this is causing you pain, and we should figure out a way to
> > make it less painful, but note that the intention is that things in
> > `third_party/rust` should be actual third-party code, not things under
> > active development by Mozilla. We don't currently have a great middle
> > ground between "mozilla-central is the repository of record for this
> > crate" and "this crate is vendored from crates.io". We're finding our
> > way there with Servo, so we might have a better story for things like
> > encoding-rs when we get that working well.
>
> Note that my problem at hand isn't with encoding_rs but with the
> currently-vendored rust-encoding. That is, I indeed would like to add
> a diagnostic panic!()/println!() to genuinely third-party code--not to
> code I've written.
>

To be clear, I totally agree that local debugging-oriented edits to
vendored crates should be painless. A 1-line mach command to add a cargo
[replace] and update Cargo.lock (with a helpful message if somebody forgets
to do that) seems like the right way to do that.

>
> That is, I'd like to experimentally understand what, if anything,
> rust-encoding is currently used for. (My best current hypothesis from
> reading things on SearchFox is that it's used in order to be able to
> compile one Option<EncodingRef> that's always None in Gecko.)
>

FWIW, |cargo tree| is a very helpful tool to figure out who's pulling in a
crate.

>
> --
> Henri Sivonen
> hsiv...@hsivonen.fi
> https://hsivonen.fi/

Henri Sivonen

unread,

Feb 27, 2017, 1:31:04 PM2/27/17

to Bobby Holley, dev-platform

On Mon, Feb 27, 2017 at 8:00 PM, Bobby Holley <bobby...@gmail.com> wrote:
> FWIW, |cargo tree| is a very helpful tool to figure out who's pulling in a
> crate.

Thanks, but what I'm trying to figure out isn't whose pulling it in
(the style crate is) but whether it is actually used beyond an
always-None Option<EncodingRef> in a way that would result in the data
tables actually getting included in the build as oppose to (hopefully)
getting discarded by LTO.

(Motivation: If we are taking on the data tables, I want to argue that
we should include encoding_rs instead even before the "replace uconv
with encoding_rs" step is done.)

Simon Sapin

unread,

Feb 27, 2017, 1:48:12 PM2/27/17

to dev-pl...@lists.mozilla.org

As an aside, I have a plan to remove rust-encoding entirely from Servo
(including Stylo) and use encoding_rs instead. But doing this the way I
want to has a number of prerequisites, and I’d prefer to switch
everything at once to avoid having both in the build. At the moment I’m
prioritizing other Stylo-related work, but I’m confident it’ll happen
before we start shipping Stylo.

--
Simon Sapin

Henri Sivonen

unread,

Feb 27, 2017, 1:54:53 PM2/27/17

to Simon Sapin, dev-platform

On Mon, Feb 27, 2017 at 8:47 PM, Simon Sapin <simon...@exyr.org> wrote:
> As an aside, I have a plan to remove rust-encoding entirely from Servo
> (including Stylo) and use encoding_rs instead. But doing this the way I want
> to has a number of prerequisites, and I’d prefer to switch everything at
> once to avoid having both in the build. At the moment I’m prioritizing other
> Stylo-related work, but I’m confident it’ll happen before we start shipping
> Stylo.

Nice! Works for me. :-)

Simon Sapin

unread,

Mar 16, 2017, 3:35:23 PM3/16/17

to dev-pl...@lists.mozilla.org

On 27/02/17 19:47, Simon Sapin wrote:
> On 27/02/17 19:30, Henri Sivonen wrote:
>> On Mon, Feb 27, 2017 at 8:00 PM, Bobby Holley <bobby...@gmail.com> wrote:
>>> FWIW, |cargo tree| is a very helpful tool to figure out who's pulling in a
>>> crate.
>>
>> Thanks, but what I'm trying to figure out isn't whose pulling it in
>> (the style crate is) but whether it is actually used beyond an
>> always-None Option<EncodingRef> in a way that would result in the data
>> tables actually getting included in the build as oppose to (hopefully)
>> getting discarded by LTO.
>>
>> (Motivation: If we are taking on the data tables, I want to argue that
>> we should include encoding_rs instead even before the "replace uconv
>> with encoding_rs" step is done.)
>

> As an aside, I have a plan to remove rust-encoding entirely from Servo
> (including Stylo) and use encoding_rs instead. But doing this the way I
> want to has a number of prerequisites, and I’d prefer to switch
> everything at once to avoid having both in the build. At the moment I’m
> prioritizing other Stylo-related work, but I’m confident it’ll happen
> before we start shipping Stylo.

Henri, you’re correct that the parts of the style crate using
rust-encoding are never used in Stylo. Gecko always passes UTF-8 input
(after decoding on the C++ side I assume) to Stylo for parsing
stylesheets, which we unsafely view as &str with
std::str::from_utf8_unchecked.

I’ve submitted https://github.com/servo/servo/pull/15993 to remove the
rust-encoding dependency from stylo. This doesn’t need to wait on
switching the rest of Servo.

--
Simon Sapin

Bobby Holley

unread,

Apr 29, 2017, 11:37:50 PM4/29/17

to Henri Sivonen, dev-platform

On Sun, Feb 26, 2017 at 2:10 PM, Bobby Holley <bobby...@gmail.com> wrote:

> On Sun, Feb 26, 2017 at 9:51 AM, Henri Sivonen <hsiv...@hsivonen.fi>
> wrote:
>
>> I tried to add some panics to a vendored to create (rust-encoding) to
>> see if the code in question runs. However, I didn't get to the running
>> part, because the edited code failed to build.
>>
>> It turns out that each vendored crate has a .cargo-checksum.json file
>> that contains hashes of all the files in the crate, and Cargo refuses
>> to build the crate if the hashes don't match or the
>> .cargo-checksum.json file is removed.
>>
>> This seems hostile not only to experimental local edits as in my case
>> but also to use cases such as uplifting fixes to branches and shipping
>> modified crates if the upstream is slow to accept patches or disagrees
>> with the patches.
>>
>> As far as I can tell, there doesn't seem to be a way to ask cargo to
>> regenerate the .cargo-checksum.json after edits. Also, adding a
>> [replace] section to Cargo.toml of libgkrust pointing to the edited
>> crate under third-party/rust or adding paths = [ "third-party/rust" ]
>> to .cargo/config.in don't make Cargo happy.
>>
>

> Can you elaborate on what goes wrong here? This worked for ted's
> experiment mentioned upthread, and for me on at least one occasion in
> https://hg.mozilla.org/try/rev/18dc070e0308 (permalink:
> https://pastebin.mozilla.org/8980438 )
>
>

And because pastebin seems to lie about stuff being permanent:
https://gist.github.com/bholley/85dc2277aaa732bce04cc9e672915595

> You'll need to |cargo update| after adding the replace in Cargo.toml to
> update Cargo.lock.
>
>
>>

>> What's the right way to build with edited crates under
>> third-party/rust? (I think we should have an easy way to do so.)
>>

>> --
>> Henri Sivonen
>> hsiv...@hsivonen.fi
>> https://hsivonen.fi/