A look at central repos


Matt Farina

Jul 29, 2016, 12:22:41 PM
to Go Package Management
Central repositories, like those in other programming language communities, have come up repeatedly. Without giving an opinion on what the Go community should do, I wrote up a doc that highlights what they do and how other communities find them useful: https://docs.google.com/document/d/12FscANpkcznMTMRXqtgbK9JJo0dkT0xbe9IL-nlRbHI/edit#

Because these docs are being viewed on Hacker News right now, I've only enabled suggested changes. When things calm down, I'm happy to open them up for direct edits. In the interim, I'll give out edit permissions to folks if needed.

Note: I'm not suggesting what we should do. I just want folks to understand central repositories, since many other languages use them right now. Consider this more of the R in R&D.

Eric Johnson

Jul 29, 2016, 12:58:17 PM
to Matt Farina, Go Package Management
Not exactly sure where to incorporate this thought into your document,
so contributing here, rather than suggesting there.

You've missed additional reasons for "central" repositories.
Specifically, the scenarios inside an organization.

* Legal review - companies might wish to limit the use of "third
party" packages to only those in their internal "central"
repository. A central repository enforces a workflow wherein
packages are reviewed by Legal before they can be incorporated into
an "official" release.
* Mirroring - Saying that code is available on the internet may not
help if you work someplace where connectivity to the outside world
is unpredictable, or over a slow connection.
* Development sandbox - for less-stable forks, or for libraries not yet
fully tested for production. I'm thinking of OS-level package managers
that let one add repositories outside of the official ones for the OS
(Gentoo overlays, for example).
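
To make the legal-review case concrete, here is a minimal Go sketch of the kind of gate an internal "central" repository could enforce. The allowlist contents, the `checkImports` function, and the example import paths are all hypothetical; a real setup would presumably load the approved list from the internal repository rather than hard-coding it.

```go
package main

import (
	"fmt"
	"sort"
)

// approved is a hypothetical allowlist of third-party import paths that
// Legal has already reviewed. A real internal repository would serve this
// list instead of it being hard-coded.
var approved = map[string]bool{
	"github.com/example/logging": true,
	"github.com/example/config":  true,
}

// checkImports returns the third-party import paths that are not on the
// allowlist, sorted so the report is stable. Standard-library imports are
// assumed to have been filtered out already.
func checkImports(imports []string) []string {
	var unreviewed []string
	for _, p := range imports {
		if !approved[p] {
			unreviewed = append(unreviewed, p)
		}
	}
	sort.Strings(unreviewed)
	return unreviewed
}

func main() {
	imports := []string{"github.com/example/logging", "github.com/other/yaml"}
	if blocked := checkImports(imports); len(blocked) > 0 {
		fmt.Println("blocked pending legal review:", blocked)
	}
}
```

The point of centralizing is that the allowlist lives in one place; with vendoring, each project would carry its own copy of this decision.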

"Vendoring" in Go sort of addresses those use cases for applications,
but doesn't help at all for libraries, where I think we (almost?) all
agree that vendoring doesn't make sense. Also, vendoring distributes the
problem of legal review rather than centralizing it, which increases the
chance of errors.

Eric


On 7/29/16 9:22 AM, Matt Farina wrote:
> Central repositories, like those for other programming language
> communities, have come up repeatedly. Without giving an opinion on
> what the Go community should do, I wrote up a doc that highlights what
> they do along with how other communities find them to be useful
> <https://docs.google.com/document/d/12FscANpkcznMTMRXqtgbK9JJo0dkT0xbe9IL-nlRbHI/edit#>.

Sam Boyer

Jul 29, 2016, 1:02:53 PM
to Go Package Management, matt....@gmail.com
There isn't agreement on the "vendoring in libraries is bad" point (though it also depends on whether you're talking about the world we have now or the world we might like to have).

Matt Farina

Jul 29, 2016, 1:27:29 PM
to Go Package Management, matt....@gmail.com
There are reasons I didn't include your points, though they have crossed my mind. I've tried to explain them below.

These are all problems that need to be solved. I'm just not sure they make a good case for a central package repo.


On Friday, July 29, 2016 at 12:58:17 PM UTC-4, Eric Johnson wrote:
> Not exactly sure where to incorporate this thought into your document,
> so contributing here, rather than suggesting there.
>
> You've missed additional reasons for "central" repositories.
> Specifically, the scenarios inside an organization.
>
>   * Legal review - companies might wish to limit the use of "third
>     party" packages to only those in their internal "central"
>     repository. A central repository enforces a workflow wherein
>     packages are reviewed by Legal before they can be incorporated into
>     an "official" release.

You can do a legal review from the source alone, without a central repo; there are tools for that. That said, a central repo does make the license easier to see when searching than something like godoc does. That's useful, and I think I noted it.

>   * Mirroring - Saying that code is available on the internet may not
>     help if you work someplace where connectivity to the outside world
>     is unpredictable, or over a slow connection.

You can mirror today, and there are some projects to help with it, but the situation could be better. A central repo doesn't necessarily make this better or worse.

>   * Development sandbox for less-stable forks, or not-yet-fully-tested
>     for production libraries. I'm thinking of OS-level package managers
>     that let one identify other repositories outside of the official
>     ones for the OS (Gentoo overlays, for example).

You don't need a central package service for this. If someone has on-prem Git with GitHub Enterprise or GitLab, they can do it today. No special middleman is needed.
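
Matt's point that you can mirror today mostly comes down to redirecting fetches to an internal host. A minimal Go sketch, assuming an organization keeps a table of upstream import-path prefixes and the internal locations that mirror them; the hostnames and the `resolve` helper are hypothetical, not part of any existing tool.

```go
package main

import (
	"fmt"
	"strings"
)

// mirrors maps upstream import-path prefixes to hypothetical internal
// mirror locations.
var mirrors = map[string]string{
	"github.com":             "git.internal.example/github",
	"github.com/example/big": "git.internal.example/big-mirror",
}

// hasPathPrefix reports whether prefix matches path on a whole path-segment
// boundary, so "github.com/example/big" does not match ".../bigger".
func hasPathPrefix(path, prefix string) bool {
	return path == prefix || strings.HasPrefix(path, prefix+"/")
}

// resolve rewrites an import path to its mirror using the longest matching
// prefix. The bool reports whether any mirror applied.
func resolve(importPath string) (string, bool) {
	best := ""
	for prefix := range mirrors {
		if hasPathPrefix(importPath, prefix) && len(prefix) > len(best) {
			best = prefix
		}
	}
	if best == "" {
		return importPath, false
	}
	return mirrors[best] + importPath[len(best):], true
}

func main() {
	for _, p := range []string{"github.com/example/big/sub", "github.com/foo/bar", "golang.org/x/image"} {
		mirrored, ok := resolve(p)
		fmt.Println(p, "->", mirrored, ok)
	}
}
```

Longest-prefix matching lets a specific repository use its own mirror while everything else under the same host falls through to a general one.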
 


Eric Johnson

Jul 29, 2016, 1:54:52 PM
to Matt Farina, Go Package Management
Hi Matt,

Thanks for the quick responses.

On 7/29/16 10:27 AM, Matt Farina wrote:
> There are reasons I didn't include your points but they have crossed
> my mind. I tried to share them below.
>
> These are all problems that need to be solved. Just not sure they are
> a good case for a central package repo.
I didn't say they were *good* cases. Just cases. I leave the value
judgments to others.
>
> On Friday, July 29, 2016 at 12:58:17 PM UTC-4, Eric Johnson wrote:
>
> Not exactly sure where to incorporate this thought into your
> document,
> so contributing here, rather than suggesting there.
>
> You've missed additional reasons for "central" repositories.
> Specifically, the scenarios inside an organization.
>
> * Legal review - companies might wish to limit the use of "third
> party" packages to only those in their internal "central"
> repository. A central repository enforces a workflow wherein
> packages are reviewed by Legal before they can be incorporated
> into
> an "official" release.
>
>
> You can do a legal review from just the source without a central repo.
> There are tools for that. But, a central repo does help you see the
> license when searching compared to something like godoc. That's useful
> and I think I noted that.

Legal departments like control. Developers don't. I'm pointing out that
a central repository for *control* purposes seems to be important to some.
>
> * Mirroring - Saying that code is available on the internet may not
> help if you work someplace where connectivity to the outside
> world
> is unpredictable, or over a slow connection.
>
>
> You can mirror today. There are some projects to help you with it.
> But, the situation could be better. A central repo doesn't necessarily
> make this better or worse.
A central repository centralizes the problem. In some cases, that can
make it much easier to address.
>
> * Development sandbox for less-stable forks, or
> not-yet-fully-tested
> for production libraries. I'm thinking of OS-level package
> managers
> that let one identify other repositories outside of the official
> ones for the OS (Gentoo overlays, for example).
>
>
> You don't need a central package service for this. If someone has on
> prem git with GitHub Enterprise or GitLab today they can do it. No
> special middle man needed to solve this.

Perhaps not, but a central package manager brings some benefits.
(Are you assuming just Git? What about Mercurial, Subversion, Perforce,
etc.?)

I read your document as being about all the use cases where a
centralized repository might help solve a problem, rather than just
those cases where it is a requirement.

Perhaps the document deserves a section about use-cases that benefit
from a central repository, even if there are other possible ways to
solve the problem? Perhaps that's where you could stick these cases?

Eric

mhhcbon

Jul 29, 2016, 3:53:28 PM
to Go Package Management
Hi,

Two quick comments:

* I'd love to see a distributed storage engine over BitTorrent. There are enough Go users to make it possible, no? :) The central repository would then only hold the metadata and provide query and search APIs, rather than being a huge centralized database containing the alpha and omega. That said, I'll agree that because Go is mostly used in enterprises, there's little chance of that happening.
Unless you have some ideas?

>>   * Mirroring - Saying that code is available on the internet may not
>>     help if you work someplace where connectivity to the outside world
>>     is unpredictable, or over a slow connection.
>
> You can mirror today. There are some projects to help you with it. But, the situation could be better. A central repo doesn't necessarily make this better or worse.

A good thing about a central repository is that it would force packages to be published as tarballs.
That would make it very easy to add a caching layer on the end user's storage.
And if we had an HTTP API between the end user and the rest of the world, it would be even easier to cache at the organization level by using sources other than the central repository.

The Go tools fetch from VCS repositories today, which makes this much more difficult to implement.

I also like the tarball approach because my packages hold a lot of side-job material that isn't necessary for someone depending on them. A tarball would contain only what downstream users strictly need.

I upvote that idea; my only regret is that it would create a huge single point of failure.
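
The caching argument rests on published archives being immutable. A minimal Go sketch under that assumption: once the archive for a given name@version has been fetched, it is served from memory. The `Fetcher` type and `ArchiveCache` are hypothetical names, and the fetch function stands in for whatever HTTP API a central or organizational source would expose.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// Fetcher downloads a published package archive. In practice this might be
// an HTTP GET against a central repository or an organization's own source.
type Fetcher func(name, version string) ([]byte, error)

// ArchiveCache serves archives from memory once fetched. Caching by
// name@version is only safe because a published tarball is assumed to be
// immutable.
type ArchiveCache struct {
	mu     sync.Mutex
	fetch  Fetcher
	store  map[string][]byte
	Misses int // number of calls that went to the fetcher
}

func NewArchiveCache(f Fetcher) *ArchiveCache {
	return &ArchiveCache{fetch: f, store: make(map[string][]byte)}
}

// Get returns the archive and its SHA-256 so callers can record or verify
// it. The lock is held across the fetch for simplicity, which serializes
// downloads.
func (c *ArchiveCache) Get(name, version string) ([]byte, string, error) {
	key := name + "@" + version
	c.mu.Lock()
	defer c.mu.Unlock()
	data, ok := c.store[key]
	if !ok {
		c.Misses++
		var err error
		data, err = c.fetch(name, version)
		if err != nil {
			return nil, "", err
		}
		c.store[key] = data
	}
	sum := sha256.Sum256(data)
	return data, hex.EncodeToString(sum[:]), nil
}

func main() {
	upstream := func(name, version string) ([]byte, error) {
		return []byte("tarball for " + name + "@" + version), nil
	}
	cache := NewArchiveCache(upstream)
	for i := 0; i < 3; i++ {
		cache.Get("example/pkg", "1.2.0")
	}
	fmt.Println("upstream fetches:", cache.Misses)
}
```

With a VCS-based fetch there is no single immutable artifact to key on, which is part of why the current tooling makes this harder.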

Matt Farina

Jul 29, 2016, 5:16:29 PM
to Go Package Management


On Friday, July 29, 2016 at 3:53:28 PM UTC-4, mhhcbon wrote:
> I up vote that idea, i only regret there will be this huge point of failure.

Given the other language ecosystems that have them, I expect this will be a point of discussion.

I'll keep your ideas in mind. I know of many others who share them.


John Souvestre

Aug 1, 2016, 2:42:35 PM
to Go Package Management
Hi Eric.

I believe that with Go it's not safe to have two copies (even if they are the same version) of a package in the same exe. So such situations need to be resolved to a single copy.

John

John Souvestre - New Orleans LA


-----Original Message-----
From: Eric Johnson [mailto:er...@tibco.com]
Sent: 2016 August 01, Mon 13:23
To: Matt Farina
Cc: John Souvestre
Subject: Re: [go-pm] A look at central repos

Hi Matt,

On 8/1/16 11:13 AM, Matt Farina wrote:
> Side note, the A & B relying on different versions of C problem is
> called a diamond dependency. You can search for it, and it's listed on
> the Dependency Hell Wikipedia page.
>
Yes, but that's another use case. Right now, if I'm just building
libraries, and the applications I use them in never bring them in
together, I still can't use the standard tools to put these two
libraries on the same GOPATH unless I vendor each library's
dependencies. So that's a pain. Presumably, I can use something like "gb"?

Eric.

Eric Johnson

Aug 1, 2016, 2:48:50 PM
to John Souvestre, Go Package Management
Hi John,


On 8/1/16 11:42 AM, John Souvestre wrote:
> Hi Eric.
>
> I believe that with Go it's not safe to have two copies (even if they are the same version) of a package in the same exe. So such situations need to be resolved to a single copy.

That's not the scenario I'm describing.

Just the libraries, A & B. Consider them open-source, and developed by
two independent people.

Now I find myself needing to work with both of them. If A & B vendor
their dependency on C, then any application that tries to use them
together will have problems, as you mention.

But what if I'm working on two different applications, one using A and
the other using B?

Without separate GOPATHs, this gets complicated, because A and B need
different versions of C and a single GOPATH can hold only one.
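
Eric's scenario can be stated mechanically: a GOPATH entry holds one checkout per import path, so any dependency that the libraries you work on request at two different versions cannot be satisfied there. A small Go sketch that finds those cases; the `requires` table and the version strings are hypothetical inputs, not something the go tool produces.

```go
package main

import (
	"fmt"
	"sort"
)

// requires maps each library to the versions of its dependencies it needs.
// The data is hypothetical, mirroring the A/B/C scenario above.
type requires map[string]map[string]string

// conflicts returns, for every dependency needed at more than one version,
// the sorted list of versions requested. Each entry is a place where a
// single GOPATH is not enough, short of vendoring.
func conflicts(libs requires) map[string][]string {
	seen := map[string]map[string]bool{}
	for _, deps := range libs {
		for dep, ver := range deps {
			if seen[dep] == nil {
				seen[dep] = map[string]bool{}
			}
			seen[dep][ver] = true
		}
	}
	out := map[string][]string{}
	for dep, vers := range seen {
		if len(vers) > 1 {
			list := make([]string, 0, len(vers))
			for v := range vers {
				list = append(list, v)
			}
			sort.Strings(list)
			out[dep] = list
		}
	}
	return out
}

func main() {
	libs := requires{
		"A": {"C": "v1.0.0"},
		"B": {"C": "v2.0.0"},
	}
	fmt.Println(conflicts(libs)) // map[C:[v1.0.0 v2.0.0]]
}
```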

Eric.

mhhcbon

Aug 7, 2016, 7:05:45 AM
to Go Package Management
Another quick note about some implementation details.

If you've ever worked with apt/rpm package sources, you may know that,
in the end, they only require a simple web server to provide a package source.
All the required files are pre-generated and hosted on the web server, and the client
figures out everything from them.

And I believe this is really neat because, with the right tools and process, anyone
could host their own package source very easily.

I also suspect, though this remains to be proven, that creating and maintaining
such a source is easier than using a database.

As a real-world example, it let me create and host my own package source
via GitHub (https://github.com/mh-cbon/gh-api-cli/tree/gh-pages/apt).

In practice (well, apt is not a good example, tbh), it's just a matter of having the package files in a directory
and running a command like aptly or createrepo to make things happen.
Easy.

As a counterexample, Chocolatey requires running a specific web server to make things happen, so as a publisher
I'm locked into their system (unless I deploy the necessary hardware resources myself).
The same goes for npm.

Given that, it's worth noting that other features such as search, auth, and a web interface are,
once again, not core to what the client needs to resolve, locate, query, download, and install packages.

BTW, it may be a good idea to let the client use multiple sources, as Linux distros do.
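
A client with multiple sources could be as simple as consulting each source's static index in order. A minimal Go sketch, assuming each source publishes a plain-text index of lines `<file> <size> <sha256>` (a made-up format); `Source`, `Entry`, and `lookup` are hypothetical, and the first match wins, whereas real distro package managers apply richer priority and version rules.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Source is one package source, e.g. an organization mirror or a public
// repository. Index holds the text of its static index file; fetching it
// (for example with http.Get) is left out of this sketch.
type Source struct {
	Name  string
	Index string
}

// Entry is one line of the hypothetical index format "<file> <size> <sha256>".
type Entry struct {
	Source, File, SHA256 string
}

// lookup searches sources in the order given and returns the first entry
// whose file name matches.
func lookup(sources []Source, file string) (Entry, bool) {
	for _, src := range sources {
		sc := bufio.NewScanner(strings.NewReader(src.Index))
		for sc.Scan() {
			fields := strings.Fields(sc.Text())
			if len(fields) == 3 && fields[0] == file {
				return Entry{Source: src.Name, File: fields[0], SHA256: fields[2]}, true
			}
		}
	}
	return Entry{}, false
}

func main() {
	sources := []Source{
		{Name: "internal", Index: "mylib-0.1.0.tar.gz 1024 aaaa\n"},
		{Name: "public", Index: "mylib-0.1.0.tar.gz 1024 bbbb\nother-2.0.0.tar.gz 2048 cccc\n"},
	}
	e, ok := lookup(sources, "other-2.0.0.tar.gz")
	fmt.Println(e.Source, ok) // public true
}
```

Putting an internal source first lets an organization override or pin packages while still falling back to a public source.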

Matt Farina

Aug 8, 2016, 12:34:35 PM
to Go Package Management


On Monday, August 1, 2016 at 2:42:35 PM UTC-4, John Souvestre wrote:
> Hi Eric.
>
> I believe that with Go it's not safe to have two copies (even if they are the same version) of a package in the same exe. So such situations need to be resolved to a single copy.


That's not completely true. You can have multiple copies of a package in different locations. As long as instances are isolated (no outside interaction or instances passed between locations) it will work. You will have binary bloat as each version is in there.

It's not wise to do this in most cases but technically possible.

nos...@iandavis.com

Aug 8, 2016, 1:25:35 PM
to go-package...@googlegroups.com

On Mon, Aug 8, 2016, at 05:34 PM, Matt Farina wrote:
> On Monday, August 1, 2016 at 2:42:35 PM UTC-4, John Souvestre wrote:
>> Hi Eric.
>>
>> I believe that with Go it's not safe to have two copies (even if they are the same version) of a package in the same exe. So such situations need to be resolved to a single copy.
>
> That's not completely true. You can have multiple copies of a package in different locations. As long as instances are isolated (no outside interaction or instances passed between locations) it will work. You will have binary bloat as each version is in there.
>
> It's not wise to do this in most cases but technically possible.

It's not safe because the package may be acquiring resources in package init functions which could conflict with multiple imported copies.

-- Ian
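
Ian's concern is easiest to see with shared global state. In this self-contained Go sketch, `register` and `initCopy` are hypothetical stand-ins for two copies of one package running the same init logic against a registry owned by some other package. The standard library's `database/sql.Register`, which is documented to panic when the same driver name is registered twice, has the same shape.

```go
package main

import (
	"fmt"
	"sync"
)

// registry stands in for global state owned by another package, such as a
// driver or codec registry, which every copy of a package writes into.
var (
	mu       sync.Mutex
	registry = map[string]func() string{}
)

// register panics on a duplicate name, as database/sql.Register does.
func register(name string, f func() string) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[name]; dup {
		panic("register called twice for " + name)
	}
	registry[name] = f
}

// initCopy simulates the init function of one copy of a hypothetical
// driver package; two copies at different import paths both run it.
func initCopy(copyName string) {
	register("mydriver", func() string { return copyName })
}

func main() {
	defer func() {
		fmt.Println("recovered:", recover())
	}()
	initCopy("vendor/a/mydriver")
	initCopy("vendor/b/mydriver") // panics: the second copy registers the same name
}
```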

John Souvestre

Aug 8, 2016, 3:53:25 PM
to Go Package Management

Hello Matt.

Understood. It’s not always unsafe, just sometimes unsafe.

> It's not wise to do this in most cases but technically possible.

Yes. Considering the risk, I would only consider doing it under extraordinary circumstances. Thus I would expect to have to force such a solution manually.

John

John Souvestre - New Orleans LA

 


John Souvestre

Aug 8, 2016, 10:39:26 PM
to go-package...@googlegroups.com

Hi Ian.

> It's not safe because the package may be acquiring resources in package init functions which could conflict with multiple imported copies.

Yes, but this could also be done somewhere else in the package besides the init function. In other words, you aren’t “safe” just because the package in question has no init function.

John

John Souvestre - New Orleans LA

 


Ian Davis

Aug 9, 2016, 6:49:22 AM
to go-package...@googlegroups.com
On Tue, Aug 9, 2016, at 03:39 AM, John Souvestre wrote:
> Hi Ian.
>
>> It's not safe because the package may be acquiring resources in package init functions which could conflict with multiple imported copies.
>
> Yes, but this could also be done somewhere else in the package besides the init function. In other words, you aren’t “safe” just because the package in question has no init function.



True, but that's not what I said.

-- Ian

roger peppe

Aug 9, 2016, 1:04:03 PM
to nos...@iandavis.com, go-package...@googlegroups.com
Could you provide a real-world example of this?

I've seen problems arising from multiple imports of the "same" package
at different paths, but nothing actually unsafe.

FWIW I have found it very useful in the past to be able to simultaneously import
different API versions of the same package. This happens
when the package in question is a utility package used only internally,
and has no type-fragile reflection code.

However, I don't consider this an argument against always merging imports
when possible; it's always possible to make a new fork and import
that.

cheers,
rog.

nos...@iandavis.com

Aug 10, 2016, 7:12:05 AM
to go-package...@googlegroups.com
On Tue, Aug 9, 2016, at 06:04 PM, roger peppe wrote:
>
> Could you provide a real-world example of this?
>
> I've seen problems arising from multiple imports of the "same" package
> at different paths, but nothing actually unsafe.
>

Here's a contrived example using a real package, golang.org/x/image/bmp.

That package registers a bmp image decoder in its init function. When
two copies of the package are imported then both will register their
decoder, which 99% of the time is fine since they have identical
behaviour. However, that package also exports an error:

    var ErrUnsupported = errors.New("bmp: unsupported BMP image")

That means that code like this will start failing unexpectedly,
depending on which package's init function gets run first:

    package main

    import (
        "bytes"
        "image"
        "log"

        "golang.org/x/image/bmp"
    )

    func main() {
        data := bytes.NewBufferString("BM????\x00\x00\x00\x00bad dataxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

        // image.Decode dispatches to whichever bmp copy registered first;
        // if that is not the copy imported here, err is a different value.
        _, _, err := image.Decode(data)
        if err == bmp.ErrUnsupported {
            log.Fatalf("image was unsupported")
        }
    }



Matt Farina

Aug 10, 2016, 9:20:56 AM
to Go Package Management
This init function here is a good example of a place where you can only have one version of a package.

For anyone following, the init function registers something at a global level within the image package. The first one to register will win, I believe.

When code calls the second one it will fail in unexpected ways. Emphasis on the unexpected.
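
The first-registration-wins behaviour can be reproduced without golang.org/x/image. In this Go sketch, `pkgCopy`, `registerFormat`, and `decode` are hypothetical stand-ins that loosely mirror `image.RegisterFormat` and `image.Decode` (the real sniffing also supports `?` wildcards in the magic string, which is omitted here). The key fact is that two `errors.New` calls with the same message produce values that compare unequal.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// pkgCopy models one copy of a decoder package: each copy creates its own
// sentinel error with its own errors.New call.
type pkgCopy struct {
	ErrUnsupported error
}

func newCopy() *pkgCopy {
	return &pkgCopy{ErrUnsupported: errors.New("bmp: unsupported BMP image")}
}

// decode always reports the copy's own sentinel, standing in for a decoder
// that rejects the input.
func (p *pkgCopy) decode(data string) error { return p.ErrUnsupported }

// format and formats loosely mirror image.RegisterFormat: registrations are
// appended, and decoding uses the first format whose magic prefix matches.
type format struct {
	magic string
	fn    func(string) error
}

var formats []format

func registerFormat(magic string, fn func(string) error) {
	formats = append(formats, format{magic, fn})
}

func decode(data string) error {
	for _, f := range formats {
		if strings.HasPrefix(data, f.magic) {
			return f.fn(data)
		}
	}
	return errors.New("unknown format")
}

func main() {
	first, second := newCopy(), newCopy()
	registerFormat("BM", first.decode)  // first copy's init
	registerFormat("BM", second.decode) // second copy's init
	err := decode("BM bad data")
	fmt.Println(err == first.ErrUnsupported, err == second.ErrUnsupported) // true false
}
```

Code holding the second copy's sentinel never sees a match, which is exactly the unexpected failure described above.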

Dave Cheney

Aug 10, 2016, 5:53:51 PM
to Matt Farina, Go Package Management

s/version/copy of the package

