[Proposal] Add http mirror to accelerate repos downloading

yancl

Oct 9, 2020, 11:10:31 AM
to bazel-dev
Hi,
I've drafted a design proposal for accelerating downloads of HTTP archives by using a mirror server.

Please see the design doc, and feel free to comment :).
Thanks!

Tony Aiuto

Oct 9, 2020, 3:36:50 PM
to yancl, bazel-externaldeps, bazel-dev

--
You received this message because you are subscribed to the Google Groups "bazel-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-dev/34be8fb9-210e-44f9-a7c1-3f5bf44f8faan%40googlegroups.com.

James Sharpe

Oct 9, 2020, 3:49:36 PM
to Tony Aiuto, yancl, bazel-externaldeps, bazel-dev
A few thoughts:

* The URL mapping should explicitly drop any authentication, i.e. a username / password in the URL, as this could be sensitive information that shouldn't be known by the mirror.
* How do you propose handling query parameters in the URL, e.g. https://example.com/download?fileid=1234? This doesn't map onto your currently proposed URL mapping for a simple HTTP-based cache server.
* I don't know enough about the topic, but if the hostname is an internationalized domain name (IDN), then I suspect that you need to convert it to its Punycode representation in the path part of the URL.
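
To illustrate the IDN point, here is a minimal Python sketch, assuming the mirror path should carry the ASCII (Punycode) form of the hostname. The helper name `to_ascii_host` is hypothetical and is not part of Bazel or of the proposal. Python's built-in `idna` codec follows the older IDNA 2003 rules, so a real implementation might choose a different library.

```python
def to_ascii_host(hostname: str) -> str:
    """Return the ASCII (Punycode) form of a possibly internationalized hostname.

    Hypothetical helper for illustration only; not a Bazel API.
    """
    # The standard-library "idna" codec converts each non-ASCII label to its
    # "xn--" form and leaves plain ASCII labels as they are.
    return hostname.encode("idna").decode("ascii")


# "b\u00fccher.example" is "bücher.example" written with an escape sequence.
print(to_ascii_host("b\u00fccher.example"))
print(to_ascii_host("example.com"))
```

Under this assumption the mirror would only ever see ASCII hostnames in the path, whichever form appears in the original URL.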

Austin Schuh

Oct 9, 2020, 3:55:28 PM
to James Sharpe, Tony Aiuto, yancl, bazel-externaldeps, bazel-dev
Here's a related discussion. The goal of that one is to rehost
without fallback, for reproducibility reasons.

https://github.com/bazelbuild/bazel/issues/6342

Austin

yancl

Oct 11, 2020, 12:42:18 AM
to bazel-dev
Thanks, Austin :)

yancl

Oct 11, 2020, 12:42:38 AM
to bazel-dev
Thanks for pointing that out :)

yancl

Oct 11, 2020, 12:42:38 AM
to bazel-dev
On Saturday, October 10, 2020 at 3:49:36 AM UTC+8, James Sharpe wrote:
A few thoughts:

* The URL mapping should explicitly drop any authentication, i.e. a username / password in the URL, as this could be sensitive information that shouldn't be known by the mirror.
 Yes, the mapping should drop sensitive information before the request is sent to the mirror server.
 
* How do you propose handling query parameters in the URL, e.g. https://example.com/download?fileid=1234? This doesn't map onto your currently proposed URL mapping for a simple HTTP-based cache server.
 So maybe it would be more complete and clearer to change the mirror rule from "http(s)://host[:port]/path --> {http_mirror}/host[:port]/path" to
 "http(s)://[userinfo@]host[:port]/path[?query][#fragment] --> {http_mirror}/host[:port]/path[?query][#fragment]", which drops the userinfo and keeps the query and fragment.
 
* I don't know enough about the topic, but if the hostname is an internationalized domain name (IDN), then I suspect that you need to convert it to its Punycode representation in the path part of the URL.

 I am not sure whether Punycode is needed; maybe URL-encoding the original URL is enough?
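
The revised mirror rule discussed in this reply (drop the userinfo, keep host, port, path, query and fragment) could be sketched roughly as follows. This is only an illustration in Python, not the design doc's implementation: the function name `to_mirror_url` and the mirror address `http://mirror.local` are assumptions made up for the example, and IDN handling is left out.

```python
from urllib.parse import urlsplit


def to_mirror_url(url: str, http_mirror: str) -> str:
    """Map http(s)://[userinfo@]host[:port]/path[?query][#fragment]
    to {http_mirror}/host[:port]/path[?query][#fragment].

    Hypothetical helper for illustration only; not a Bazel API.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"URL has no host: {url!r}")

    # Rebuild host[:port] from hostname and port instead of reusing netloc,
    # so that any "user:password@" prefix never reaches the mirror.
    host = parts.hostname
    if ":" in host:  # IPv6 literal: restore the brackets urlsplit removed
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    mirrored = f"{http_mirror.rstrip('/')}/{host}{parts.path}"
    if parts.query:
        mirrored += f"?{parts.query}"
    if parts.fragment:
        mirrored += f"#{parts.fragment}"
    return mirrored


print(to_mirror_url(
    "https://user:secret@example.com:8443/download?fileid=1234#top",
    "http://mirror.local",
))
```

Note that the scheme of the original URL is not preserved by this rule, so `http://example.com/x` and `https://example.com/x` map to the same mirror path.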

Simon Mavi Stewart

Oct 12, 2020, 10:32:11 AM
to bazel-dev
Hi,

This looks interesting. I think it may need to be a little more sophisticated. For example, how would one block specific sites at the Bazel level, rather than through some corporate network policy (useful for the case where a developer's workstation has Net access, but CI machines don't)? I suspect that a simple additional mirror is a partial solution to the problem, and that as more use cases come up, more work will be needed.

For reference, I recently put forward a PR that performs a similar function: we want to be able to consume third-party dependencies, but rewrite URLs as needed.

Cheers,

Simon

Tony Aiuto

Oct 12, 2020, 10:35:26 AM
to Simon Mavi Stewart, external-deps, bazel-dev

yancl

Oct 13, 2020, 8:55:40 AM
to bazel-dev
Cool, your solution is much more powerful, and it can be extended by adding more keywords. It would meet my needs if your PR is accepted.

Thanks, Simon :)
