Re: Erlang package repository

Eric Merritt

Feb 23, 2012, 10:20:46 AM2/23/12
to Torben Hoffmann, erlware-...@googlegroups.com
On Thu, Feb 23, 2012 at 4:43 AM, Torben Hoffmann
<torben...@gmail.com> wrote:
> Hi Eric,
>
> Just had a chat with Jesper Louis about package repository - only the meta
> data, not storage of packages.
>
> After discussing a bit he noted that what we were talking about is quite
> similar to hackage.haskell.org
>
> Hackage has some problems that one should strive to avoid.
>
> Single point of failure - the DB has to be distributed in some way.

Let's take a page from Agner and put the metadata in git. No need to
reinvent the wheel, and having multiple sources then becomes trivial.

> Control of package versions - major mayhem with dependencies down your
> dependency tree.

We should adopt Sinan's spec format, or something like it.

{<name>, lt, <vsn>}
{<name>, lte, <vsn>}
{<name>, gt, <vsn>}
{<name>, gte, <vsn>}
{<name>, between, <vsn1>, <vsn2>}
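
For instance, a concrete dependency list in this format might look like
the following (the application names and versions are made up, just to
show the shape):

{cowboy, gte, "0.4.0"}
{lager, lt, "2.0.0"}
{getopt, between, "0.3.0", "0.5.0"}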

> Lack of a tinderbox auto builder (Jenkins comes to mind) - it should rebuild

We can do this pretty easily. We need to prune the metadata in any
case: if we are not storing the tarballs ourselves, they could easily
go away, and if they go away or become unbuildable they need to be
pruned from the metadata.

> the packages on a regular basis to see what still works.
> They use black listing - white listing should be preferred. This links with
> the tinderbox auto builder, but it should also be possible for the users to
> mark different combinations as working.

More detail?

> Something like the Cabal format is required - I am thinking app/rel-file
> with some extensions, but this is where you have done some more homework
> than me... ;-)


I am thinking the same thing. Sinan already supports a
versioned_dependencies tuple in the app file. Maybe something like
that?
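
A sketch of how that could look in the .app file (the constraint syntax
is borrowed from the spec format above; the exact shape Sinan expects
may differ):

{application, my_app,
 [{description, "An example application"},
  {vsn, "0.1.1"},
  {modules, [my_app, my_app_sup]},
  {registered, []},
  {applications, [kernel, stdlib]},
  {versioned_dependencies,
   [{cowboy, gte, "0.4.0"},
    {lager, between, "1.0.0", "2.0.0"}]}]}.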

> Basically we need a distributed hash map that ties application versions
> together with instructions on where to get the source (and later on
> binaries)

Not only that, but also how to build. Something like debian/rules

> For starters a github repo with a hash table should be a good way to store
> the hash table. I think that homebrew started like this, so there might be
> something to learn from that.

I have been thinking about this. Why a hash table? Why not a simple
directory layout? Much easier to query, and git should handle all the
merging and the like without conflicts since there should only ever
be additions or removals (never changes).

maybe something like:

.
├── apps
│   ├── my_app
│   │   ├── 0.0.1
│   │   ├── 0.0.2
│   │   ├── 0.0.3
│   │   ├── 0.1.0
│   │   └── 0.1.1
│   │       ├── my_app.app
│   │       └── rules
│   └── your_app
│       ├── 1.0.0
│       ├── 1.0.1
│       ├── 1.2.1
│       └── 1.2.3
└── releases

my_app.app is the dot-app file plus extensions.
rules is something like the debian/rules makefile: just something we
can pass standard commands to and have it do the right thing (build,
test, etc.).

We might also need an 'install' file that tells us where to look for
the built result, since each build system has its own layout.
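
Purely as a sketch of the idea (every key and path below is invented),
the install file could just be a term mapping standard locations to
wherever the build system leaves its output:

{install,
 [{ebin, "ebin"},        %% compiled beams
  {include, "include"},  %% public .hrl files
  {priv, "priv"}]}.      %% port/driver binaries and other resources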

> This calls for a small tool that can make it easy for people to publish
> their application to the system.
>
> After writing all this this feels like something that Agner and possibly
> cean already sort of covers.

Agner covers parts of this, but only about 80% of the easy stuff. No
version dependencies or anything like that. We should use it as a
source of ideas.

> Though Agner does not make it easy to publish
> your stuff. cean uses a .pub file which has to be gatekeeper approved before
> it can go on cean.

I think we should probably take the ideas that work from each one. From
Agner we take the idea of using git and expand on it; from cean, the
parts of the pub file that work. I think Agner is not complete enough
and cean has too many infrastructure requirements; that is, it's not
easy enough. Some combination + extension of the two might work.


> Take it as some input on what to think about.
>
> Cheers,
> Torben
>
> --
> http://www.linkedin.com/in/torbenhoffmann

Tim Watson

Feb 23, 2012, 6:31:30 PM2/23/12
to erlware-...@googlegroups.com, Torben Hoffmann
On 23 February 2012 15:20, Eric Merritt <ericbm...@gmail.com> wrote:
> On Thu, Feb 23, 2012 at 4:43 AM, Torben Hoffmann
> <torben...@gmail.com> wrote:
>> Hi Eric,
>>
>> Just had a chat with Jesper Louis about package repository - only the meta
>> data, not storage of packages.
>>
>> After discussing a bit he noted that what we were talking about is quite
>> similar to hackage.haskell.org
>>
>> Hackage has some problems that one should strive to avoid.
>>

This is true, but Hackage is one of the better solutions around.

>> Single point of failure - the DB has to be distributed in some way.
>
> Lets take a page from agner and put the metadata in git. No need to
> reinvent the wheel and then having multiple sources is really trivial.
>

I think this is a very reasonable approach.

>> Control of package versions - major mayhem with dependencies down your
>> dependency tree.
>
> We should adopt sinans spec format or something like it.
>
> {<name>, lt, <vsn>}
> {<name>, lte, <vsn>}
> {<name>, gt, <vsn>}
> {<name>, gte, <vsn>}
> {<name>, between, <vsn1>, <vsn2>}
>

OK, but you should also consider 'publisher' (i.e., the analogue of
Maven's group id), as this allows for the fact that you might have
people publishing multiple forks of the same application, so that
appname + version isn't enough to differentiate between them.
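
One illustrative way to fold that in (not an agreed format, just the
obvious extension of the tuples above) would be to qualify each
dependency with a publisher:

{nebularis, my_app, gte, "0.1.0"}
{erlware, my_app, between, "0.1.0", "0.2.0"}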

>> Lack of a tinderbox auto builder (Jenkins comes to mind) - it should rebuild
>
> We can do this pretty easily. We need to prune the metadata in any
> case. If we are not storing the tarballs they could easily go away. If
> they go away or become unbuildable they need to be pruned from the
> metadata.
>

Where are the tarballs going to be stored? Also why not have published
artefacts as .ez archives where possible (i.e., when they're pure
Erlang code with no headers in ./include or port/driver code) - just a
thought.

>> the packages on a regular basis to see what still works.
>> They use black listing - white listing should be preferred. This links with
>> the tinderbox auto builder, but it should also be possible for the users to
>> mark different combinations as working.
>
> More detail?
>

What is the interface between the builder and the source packages
going to look like? What about folks like me that would prefer to
publish binary packages (possibly in addition to source packages) as
well?

>> Something like the Cabal format is required - I am thinking app/rel-file
>> with some extensions, but this is where you have done some more homework
>> than me... ;-)
>

Eric discussed this in his *nix tool chain post.

The problem with this is that the proposed folder structure doesn't leave room for:

- publisher
- multiple installed erts versions
- builds that contain C code (i.e., ports, drivers, NIFs) when I need
both the 32 and 64 bit versions available on the same machine

I think there's a lot to be said for using a git repo to make the
metadata available. I do think that the index (whether it is a file or
a directory structure based thing) should be easy to fetch, and with
github/bitbucket/gitorious (and pretty much any scm out there) you
don't even need git installed to fetch a download.

Using a single index file might, on the other hand, have the advantage
that fetching each repository index simply requires an HTTP GET (given
that every scm has some kind of web api for getting the raw files in
the source tree).
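
Fetching such an index from Erlang would be trivial; a minimal sketch,
assuming the index is a single file served at a known raw URL (the URL
below is hypothetical):

fetch_index() ->
    inets:start(),
    ssl:start(),
    Url = "https://raw.example.org/erlware/package-index/index.config",
    %% Standard OTP httpc; Body comes back as a string.
    {ok, {{_Vsn, 200, _Reason}, _Headers, Body}} =
        httpc:request(get, {Url, []}, [], []),
    Body.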

Just some more food for thought.

>
> We might also need an 'install' file that tells us where to look for
> the built result since each a build system has its own layout.
>
>> This calls for a small tool that can make it easy for people to publish
>> their application to the system.
>>
>> After writing all this this feels like something that Agner and possibly
>> cean already sort of covers.
>
> Agner covers parts of this, but only about 80% of the easy stuff. No
> version dependencies or anything like that. We should use it to take
> ideas.
>
>> Though Agner does not make it easy to publish
>> your stuff. cean uses a .pub file which has to be gatekeeper approved before
>> it can go on cean.
>
> I think we should probably take ideas that work from each one. From
> agner we take the idea of using git and expand on it, from cean the
> parts of the pub file that work. I think agner is not complete enough
> and cean has too much infrastructure requirements. that is its not
> easy enough. Some combination + extension of the two might work.
>
>

Personally I want to produce an archive (ending in .ez or .zip) that
contains beams, resource files (config, etc.) and maybe some
headers, and publish that. I build my own sources in CI and want to be
able to publish them to the repository each time a snapshot build
succeeds.

I do recognise that the ability to build from source is needed though,
in the case that pre-built artefacts are not available.

Eric Merritt

Feb 23, 2012, 6:47:24 PM2/23/12
to erlware-...@googlegroups.com, Torben Hoffmann
>
>>> Control of package versions - major mayhem with dependencies down your
>>> dependency tree.
>>
>> We should adopt sinans spec format or something like it.
>>
>> {<name>, lt, <vsn>}
>> {<name>, lte, <vsn>}
>> {<name>, gt, <vsn>}
>> {<name>, gte, <vsn>}
>> {<name>, between, <vsn1>, <vsn2>}
>>
>
> Ok but you also should consider 'publisher' (i.e., the corollary to
> maven's group id) as this allows for the fact that you might have
> people publishing multiple forks of the same application so that
> appname + version isn't enough to differentiate between them.

Agreed. God, I hate that, but you are right. It needs to be supported.

>>> Lack of a tinderbox auto builder (Jenkins comes to mind) - it should rebuild
>>
>> We can do this pretty easily. We need to prune the metadata in any
>> case. If we are not storing the tarballs they could easily go away. If
>> they go away or become unbuildable they need to be pruned from the
>> metadata.
>>
>
> Where are the tarballs going to be stored? Also why not have published
> artefacts as .ez archives where possible (i.e., when they're pure
> Erlang code with no headers in ./include or port/driver code) - just a
> thought.

Again, I think that is pretty trivial to support. I actually don't know
where to put the tarballs; someplace we don't have to manage as
infrastructure. Maybe Amazon S3? Or perhaps something like Rackspace's
file storage? We could probably find someone to sponsor that, maybe
even Afiniate.

Since this wouldn't hold the actual code, just the metadata, all of
that might go in the metadata. We may need to add publisher to the
hierarchy though.

> I think there's a lot to be said for using a git repo to make the
> metadata available. I do think that the index (whether it is a file or
> a directory structure based thing) should be easy to fetch, and with
> github/bitbucket/gitorious (and pretty much any scm out there) you
> don't even need git installed to fetch a download.

I suspect this can be done with erlanggit or something (assuming erld
is built in Erlang). We could probably find other options. However, for
version 0.1 I wouldn't worry about it.

>
> Using a single index file might, on the other hand, have the advantage
> that fetching each repository index simply requires an HTTP GET (given
> that every scm has some kind of web api for getting the raw files in
> the source tree).

True, and a valid point. The metadata is small enough that getting it
fresh each time shouldn't be a big deal. The downside is that the whole
file needs to be updated every time it changes, and there needs to be
some way to merge multiple repositories into a single 'index'.

>
> Just some more food for thought.

Very good thoughts.

>>
>> We might also need an 'install' file that tells us where to look for
>> the built result since each a build system has its own layout.
>>
>>> This calls for a small tool that can make it easy for people to publish
>>> their application to the system.
>>>
>>> After writing all this this feels like something that Agner and possibly
>>> cean already sort of covers.
>>
>> Agner covers parts of this, but only about 80% of the easy stuff. No
>> version dependencies or anything like that. We should use it to take
>> ideas.
>>
>>> Though Agner does not make it easy to publish
>>> your stuff. cean uses a .pub file which has to be gatekeeper approved before
>>> it can go on cean.
>>
>> I think we should probably take ideas that work from each one. From
>> agner we take the idea of using git and expand on it, from cean the
>> parts of the pub file that work. I think agner is not complete enough
>> and cean has too much infrastructure requirements. that is its not
>> easy enough. Some combination + extension of the two might work.
>>
>>
>
> Personally I want to produce an archive (ending in .ez or .zip) that
> contains beam, resource files (e.g., config, etc) and maybe some
> headers, and publish that. I build my own sources in CI and want to be
> able to publish them to the repository each time a snapshot build
> succeeds.
>
> I do recognise that the ability to build from source is needed though,
> in the case that pre-built artefacts are not available.

I agree we should support both. Though binaries add a lot more complexity.

Tim Watson

Feb 23, 2012, 7:30:53 PM2/23/12
to erlware-...@googlegroups.com, Torben Hoffmann
On 23 February 2012 23:47, Eric Merritt <ericbm...@gmail.com> wrote:
>>
>>>> Control of package versions - major mayhem with dependencies down your
>>>> dependency tree.
>>>
>>> We should adopt sinans spec format or something like it.
>>>
>>> {<name>, lt, <vsn>}
>>> {<name>, lte, <vsn>}
>>> {<name>, gt, <vsn>}
>>> {<name>, gte, <vsn>}
>>> {<name>, between, <vsn1>, <vsn2>}
>>>
>>
>> Ok but you also should consider 'publisher' (i.e., the corollary to
>> maven's group id) as this allows for the fact that you might have
>> people publishing multiple forks of the same application so that
>> appname + version isn't enough to differentiate between them.
>
> I agreed. god I hate that but you are right. It needs to be supported.
>

I'm actually annoyed that I'm right. This has as much to do with the
flat namespace as anything. There have been other discussions on
erlang-questions about packaging and namespaces that bring up
interesting ideas (isolation of beam/names to an application boundary,
copying Eiffel's LACE framework) but they're all too much of a
distraction.

Reality is you could have two applications with the same name and
version and that isn't necessarily invalid if they're published by
different organisations/people and represent different forks of the
same work. It's only invalid if they're included in the same
dependency graph.

We *could* make people solve this themselves of course. When scala
artefacts have to get published to a maven/nexus repository, you need
to know which version of scala they're valid to use with, and if the
publisher is supporting parallel scala language versions there's no
way to do that unless you modify that artefact name, so you get the
delightful:

<dependency>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest_2.9.1</artifactId>
    <version>1.7.1</version>
    <scope>test</scope>
</dependency>

where _2.9.1 is the runtime version that this particular build
supports. It's hideous but it provides a work-around.

In my opinion, if we could provide a cleaner - or at least better
structured - approach to dealing with this problem (which we have in
Erlang), that would be a good thing.

> Since this wouldn't hold the actual code, just the metadata, all of
> that might go in the metadata. We may need to add publisher to the
> hierarchy though.

Yes exactly - pushing those details into the metadata is better, so I
guess in that case the choice of directory structure isn't such a big
deal.

>> I think there's a lot to be said for using a git repo to make the
>> metadata available. I do think that the index (whether it is a file or
>> a directory structure based thing) should be easy to fetch, and with
>> github/bitbucket/gitorious (and pretty much any scm out there) you
>> don't even need git installed to fetch a download.
>
> I suspect this can be done with erlanggit or something (assuming erld
> is built in erlang). We could probably other options. However, for
> version 0.1 I wouldnt worry about it.
>

What is erlanggit? Is it a binding? If so, you still need libgit
installed don't you, and if you're talking about
https://github.com/AlainODea/erlang_git then it's incomplete and
appears dead. Fair enough though, let's work out the details when
we're further along.

>>
>> Using a single index file might, on the other hand, have the advantage
>> that fetching each repository index simply requires an HTTP GET (given
>> that every scm has some kind of web api for getting the raw files in
>> the source tree).
>
> true, and a valid point. The metadata is small enough that getting
> fresh each time shouldnt be a big deal. The downside is that the whole
> file needs to be updated every time it changes and there needs to be
> some way to merge multple repositories into a single 'index'.
>

Hmn, that's an interesting one - pros and cons both ways.

> I agree we should support both. Though binaries add a lot more complexity.

Yeah you're right. I reckon that support could come after 1.0.0 is out
though. After all, the only other tools in that space are CEAN (which
hardly anyone seems to use for some reason) and maven-erlang-plugin
(which is highly unlikely to win any popularity contests in the Erlang
community).

I am really going to sleep now though....

:)

Eric Merritt

Feb 23, 2012, 7:53:32 PM2/23/12
to erlware-...@googlegroups.com, Torben Hoffmann
>>> Ok but you also should consider 'publisher' (i.e., the corollary to
>>> maven's group id) as this allows for the fact that you might have
>>> people publishing multiple forks of the same application so that
>>> appname + version isn't enough to differentiate between them.
>>
>> I agreed. god I hate that but you are right. It needs to be supported.
>>
>
> I'm actually annoyed that I'm right. This has as much to do with the
> flat namespace as anything. There have been other discussions on
> erlang-questions about packaging and namespaces that bring up
> interesting ideas (isolation of beam/names to an application boundary,
> copying Eiffel's LACE framework) but they're all too much of a
> distraction.

Let's stay away from this. If we try to fix the VM we will fail. Let's
concentrate on the resolution of apps and just take the position that
only one <name>-<vsn> can exist in an Erlang code path (basically a
local repo) at any one time. If we take that as reality it will be much
better for us.
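
To illustrate what that amounts to (paths are hypothetical): the local
repo is just a set of <name>-<vsn> directories, and exactly one of them
per application ever ends up on the code path:

%% Pick the single my_app version we resolved to and start it.
true = code:add_patha("/usr/local/erlware/lib/my_app-0.1.1/ebin"),
ok = application:start(my_app).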

> Reality is you could have two applications with the same name and
> version and that isn't necessarily invalid if they're published by
> different organisations/people and represent different forks of the
> same work. It's only invalid if they're included in the same
> dependency graph.

right exactly.


>
> We *could* make people solve this themselves of course. When scala
> artefacts have to get published to a maven/nexus repository, you need
> to know which version of scala they're valid to use with, and if the
> publisher is supporting parallel scala language versions there's no
> way to do that unless you modify that artefact name, so you get the
> delightful:
>
> <dependency>
>    <groupId>org.scalatest</groupId>
>    <artifactId>scalatest_2.9.1</artifactId>
>    <version>1.7.1</version>
>    <scope>test</scope>
> </dependency>
>
> where _2.9.1 is the runtime version that this particular build
> supports. It's hideous but it provides a work-around.

This might actually work, though I would not want to restrict that
artifact id to a single version of the Erlang runtime.

> In my opinion, if we could provide a cleaner - or at least better
> structured - approach to dealing with this problem (we have in
> Erlang), that would be a good thing.

agreed.


>>
>> Since this wouldn't hold the actual code, just the metadata all of
>> that might go in the metadata. We may need to add publisher to the
>> heirarchy though.
>>
>
> Yes exactly - pushing those details into the metadata is better, so I
> guess in that case the choice of directory structure isn't such a big
> deal.
>
>>> I think there's a lot to be said for using a git repo to make the
>>> metadata available. I do think that the index (whether it is a file or
>>> a directory structure based thing) should be easy to fetch, and with
>>> github/bitbucket/gitorious (and pretty much any scm out there) you
>>> don't even need git installed to fetch a download.
>>
>> I suspect this can be done with erlanggit or something (assuming erld
>> is built in erlang). We could probably other options. However, for
>> version 0.1 I wouldnt worry about it.
>>
>
> What is erlanggit? Is it a binding? If so, you still need libgit

Nope, it's a pure Erlang implementation of git.

The one reason I like using git or something like it is that we get
remote repos, merges and a lot of other functionality for free. I like
getting things for free rather a lot.

> installed don't you, and if you're talking about
> https://github.com/AlainODea/erlang_git then it's incomplete and
> appears dead. Fair enough though, let's work out the details when
> we're further along.

No, that's dead and very incomplete.

https://github.com/schacon/erlangit seems more complete and usable,
though I have never used it in anger.

>>
>> I agree we should support both. Though binaries add a lot more complexity.
>>
>
> Yeah you're right. I reckon that support could come after 1.0.0 is out
> though. After all, the only other tools in that space are CEAN (which
> hardly anyone seems to use for some reason) and maven-erlang-plugin
> (which is highly unlikely to win any popularity contests in the Erlang
> community).

agreed.


>
> I am really going to sleep now though....
>
> :)
>


Eric Merritt

Feb 23, 2012, 8:17:36 PM2/23/12
to erlware-...@googlegroups.com, Torben Hoffmann
For the canonical repo, integrating with erldocs might be interesting.

On Thu, Feb 23, 2012 at 6:53 PM, Eric Merritt <ericbm...@gmail.com> wrote:
>>>> Ok but you also should consider 'publisher' (i.e., the corollary to
>>>> maven's group id) as this allows for the fact that you might have
>>>> people publishing multiple forks of the same application so that
>>>> appname + version isn't enough to differentiate between them.
>>>
>>> I agreed. god I hate that but you are right. It needs to be supported.
>>>
>>
>> I'm actually annoyed that I'm right. This has as much to do with the
>> flat namespace as anything. There have been other discussions on

Torben Hoffmann

Feb 23, 2012, 10:54:04 AM2/23/12
to Eric Merritt, erlware-...@googlegroups.com

On 23/2/12 16:20 , Eric Merritt wrote:
> On Thu, Feb 23, 2012 at 4:43 AM, Torben Hoffmann
> <torben...@gmail.com> wrote:
>> Hi Eric,
>>
>> Just had a chat with Jesper Louis about package repository - only the meta
>> data, not storage of packages.
>>
>> After discussing a bit he noted that what we were talking about is quite
>> similar to hackage.haskell.org
>>
>> Hackage has some problems that one should strive to avoid.
>>
>> Single point of failure - the DB has to be distributed in some way.
> Lets take a page from agner and put the metadata in git. No need to
> reinvent the wheel and then having multiple sources is really trivial.

Yes - first get started and then get fancy... that was my intention,
but I forgot to write it ;-)


>
>> Control of package versions - major mayhem with dependencies down your
>> dependency tree.
> We should adopt sinans spec format or something like it.
>
> {<name>, lt, <vsn>}
> {<name>, lte, <vsn>}
> {<name>, gt, <vsn>}
> {<name>, gte, <vsn>}
> {<name>, between, <vsn1>, <vsn2>}

That is exactly what is needed!!


>
>> Lack of a tinderbox auto builder (Jenkins comes to mind) - it should rebuild
> We can do this pretty easily. We need to prune the metadata in any
> case. If we are not storing the tarballs they could easily go away. If
> they go away or become unbuildable they need to be pruned from the
> metadata.

I would not remove them, but mark them as failing so that people have a
chance to clean up and see what has stopped working.


>
>> the packages on a regular basis to see what still works.
>> They use black listing - white listing should be preferred. This links with
>> the tinderbox auto builder, but it should also be possible for the users to
>> mark different combinations as working.
> More detail?

Things should be considered not to work unless verified to work. So you
may publish something, but until there is some sort of evidence that it
works it should have a status of unknown.
The white list would be all the things known to work and the only ones
that should be offered for download.
Some main page should then show the status of the entry.

The key point is that it should be easy to find the things known to work.
Hackage has many stale packages that do not work any more - an automated
environment would solve that issue.


>
>> Something like the Cabal format is required - I am thinking app/rel-file
>> with some extensions, but this is where you have done some more homework
>> than me... ;-)
>
> I am thinking the same thing. Sinan already supports a
> versioned_dependencies tuple in the app file. Maybe something like
> that?

Yes, that sounds like the way to go.

>
>> Basically we need a distributed hash map that ties application versions
>> together with instructions on where to get the source (and later on
>> binaries)
> Not only that, but also how to build. Something like debian/rules

You're right.
Something like {build, "some command"} and {test, "..."}, and then add
to that as it matures.
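
For illustration only (the commands are made up), the rules data could
then be as simple as:

[{build, "make"},
 {test, "make test"},
 {clean, "make clean"}].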


>
>> For starters a github repo with a hash table should be a good way to store
>> the hash table. I think that homebrew started like this, so there might be
>> something to learn from that.
> I have been thinking about this. Why a hash table, why not a simple
> directory layout? Much easier to quiery and git should handle all the
> merging and the like without conflicts since there should only every
> be additions or removals (never changes).

That will work too - and it has the merit of being a tree that is
simple and understandable. Let's go for that.
One can always change later on.


>
> maybe something like:
>
> .
> ├── apps
> │   ├── my_app
> │   │   ├── 0.0.1
> │   │   ├── 0.0.2
> │   │   ├── 0.0.3
> │   │   ├── 0.1.0
> │   │   └── 0.1.1
> │   │       ├── my_app.app
> │   │       └── rules
> │   └── your_app
> │       ├── 1.0.0
> │       ├── 1.0.1
> │       ├── 1.2.1
> │       └── 1.2.3
> └── releases
>
> my_app.app is the dotapp+extensions
> rules is something like the debian/rules make file. Just something we
> can pass standard commands have have it do the right thing. build,
> test, etc.
>
> We might also need an 'install' file that tells us where to look for
> the built result since each a build system has its own layout.

Makes sense.


>
>> This calls for a small tool that can make it easy for people to publish
>> their application to the system.
>>
>> After writing all this this feels like something that Agner and possibly
>> cean already sort of covers.
> Agner covers parts of this, but only about 80% of the easy stuff. No
> version dependencies or anything like that. We should use it to take
> ideas.
>
>> Though Agner does not make it easy to publish
>> your stuff. cean uses a .pub file which has to be gatekeeper approved before
>> it can go on cean.
> I think we should probably take ideas that work from each one. From
> agner we take the idea of using git and expand on it, from cean the
> parts of the pub file that work. I think agner is not complete enough
> and cean has too much infrastructure requirements. that is its not
> easy enough. Some combination + extension of the two might work.

That was my thinking - if we add more details to the desired workflow
we talked about yesterday, I think we should be able to identify what is
needed.

Eric Merritt

Feb 24, 2012, 11:49:03 AM2/24/12
to Torben Hoffmann, erlware-...@googlegroups.com
>> More detail?
>
> Things should be considered not to work unless verified to work. So you may
> publish something, but until there is some sort of evidence that it works it
> should have a status of unknown.
> The white list would be all the things know to work and the only ones that
> should be offered for download.
> Some main page should then show the status of the entry.
>
> The key point is that it should be easy to find the things known to work.
> Hackage has many stale packages that do not work any more - an automated
> environment would solve that issue.

We should mark that in the metadata somehow. We should probably think
about signing in the metadata too. If we are going to have a distributed
repo it would be good if people knew what information they could trust
and the like. I will think about that too.
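
A minimal sketch of what signing a metadata entry could look like with
stock OTP (module name and key handling are assumptions; PrivKey/PubKey
would be already-decoded keys, e.g. via public_key:pem_decode/1 and
public_key:pem_entry_decode/1):

-module(meta_sign).
-export([sign/2, verify/3]).

%% The publisher signs the metadata term with their private key.
sign(MetaTerm, PrivKey) ->
    public_key:sign(term_to_binary(MetaTerm), sha256, PrivKey).

%% Consumers verify the entry against the publisher's public key.
verify(MetaTerm, Signature, PubKey) ->
    public_key:verify(term_to_binary(MetaTerm), sha256, Signature, PubKey).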

>>
>>> Basically we need a distributed hash map that ties application versions
>>> together with instructions on where to get the source (and later on
>>> binaries)
>>
>> Not only that, but also how to build. Something like debian/rules
>
> You're right.
> Something like {build, "some command"} and {test, "..."} and then add on
> that as it matures.


Let's not invent something if at all possible. Better to simply use
make and have a few known commands that will be called (build, test,
etc.) than to try to reinvent it. If not make, then something similar
at least: shell scripts with the arguments, etc.

Tim Watson

Feb 24, 2012, 3:28:53 PM2/24/12
to erlware-...@googlegroups.com, Torben Hoffmann
On 24 February 2012 16:49, Eric Merritt <ericbm...@gmail.com> wrote:
>>> More detail?
>>
>> Things should be considered not to work unless verified to work. So you may
>> publish something, but until there is some sort of evidence that it works it
>> should have a status of unknown.
>> The white list would be all the things know to work and the only ones that
>> should be offered for download.
>> Some main page should then show the status of the entry.
>>
>> The key point is that it should be easy to find the things known to work.
>> Hackage has many stale packages that do not work any more - an automated
>> environment would solve that issue.
>
> We should mark that in the metadata somehow. We should probably think
> about signing in the metadata too. I are going to have a distributed
> repo it would be good if people knew what information they could trust
> and the like. I will think about that too.
>

Agreed. Please do consider signing as well. It's really important to
businesses that have heavy regulatory compliance undertakings to
consider, and verifiable sources (of anything that is used in
business-critical software) are part of the due diligence.

>>>
>>>> Basically we need a distributed hash map that ties application versions
>>>> together with instructions on where to get the source (and later on
>>>> binaries)
>>>
>>> Not only that, but also how to build. Something like debian/rules
>>
>> You're right.
>> Something like {build, "some command"} and {test, "..."} and then add on
>> that as it matures.
>
>
> Lets not invent something if at all possible. Better to simply use
> make and have a few known commands that will be called, build, test,
> etc. then try to reinvent it. If not make then something similar at
> least, shell scripts with the arguments etc.
>

There are some common things you need to be able to do. Take a look at
https://github.com/hyperthunk/rebar_alien_plugin which does something
similar for running commands, and also at
https://github.com/hyperthunk/epm which does the same.

I tell you what though, we have another problem if we do this. It
becomes far more complicated if someone is using a build system that
we do not have installed. That is one of the 'big' advantages of going
for binary artefacts instead of building from source. The publisher is
responsible for getting the thing to build and package properly, then
they sign the package and deploy it to the repository using their
private key.

So in a sense, I guess I'm asking why we would take on this massive
amount of work (building the publisher's sources, regardless of the
build system they're using - assuming it will be installed, with the
right version and any dependencies of its own, etc) and there's more
to consider here too. What about packages that have driver code *and*
additional native dependencies of their own, for example a driver
based application that requires a specific version of iconv to be
available/installed. Now we're either forcing the package maintainer
to distribute a binary copy of iconv or we're committing to being able
to find and install it, etc. This is starting to look like madness.

Also there is another, serious issue with those situations, like the
(depending on iconv or something else) one above. If we're building
from source, and the driver has native dependencies of its own, these
need to be available for whatever architecture we're building on. This
might not be terribly onerous on the build server, but it needs to
work on every client that builds from source too. Now we're forcing
the maintainer to either embed the source code for the native
dependencies (for the driver), or to provide (as part of his build) a
means of fetching them, or to bundle a binary dependency for each
supported os/architecture.

This is just too much maintenance overhead, and it's not an uncommon
situation: https://github.com/hyperthunk/erlxsl requires
https://github.com/hyperthunk/erlxsl-sablotron which depends on
https://github.com/hyperthunk/sablotron. One is built with rebar, one
with waf and one with autotools and make.

Also, we should support Windows. It is wrong to push out developers
who're stuck on that platform. The rebar support for Windows gets
better as it relies less and less on cygwin/msys to provide a
'pretend' unix environment, and sticks to the native tool chain.

I think it's clear why the hackage guys decided to steer clear of this
mess. It is the package maintainers' responsibility to make sure their
code builds and works and to set up a CI server somewhere to prove
this. If someone is on travis-ci and they're regularly committing
their code then I'm going to consider their signed packages viable.
Otherwise, I'll investigate. It's my responsibility as a developer to
do that.

So IMVHO we should stay clear of this, otherwise we'll likely never be
code complete and even if we did get that far, every man and his dog
would want it to work some other way.

Painful though the challenges of supporting binary packages are, I
personally think they're much easier to solve than this.

Eric Merritt

Feb 24, 2012, 3:44:16 PM2/24/12
to erlware-...@googlegroups.com, Torben Hoffmann
>>
>> We should mark that in the metadata somehow. We should probably think
>> about signing in the metadata too. I are going to have a distributed
>> repo it would be good if people knew what information they could trust
>> and the like. I will think about that too.
>>
>
> Agreed. Please do consider signing as well.

I think we need to support it. It shouldn't get in the way if it's not
there, but it should be available to folks if it is.

>>
>
> There are some common things you need to be able to do. Take a look at
> https://github.com/hyperthunk/rebar_alien_plugin which does something
> similar for running commands, and also at
> https://github.com/hyperthunk/epm which does the same.
>
> I tell you what though, we have another problem if we do this. It
> becomes far more complicated if someone is using a build system that
> we do not have installed.

Agreed. Build-time dependencies solve this somewhat. The non-Erlang OS
dependencies are what is really going to screw us, at least in the
binary case.

> That is one of the 'big' advantages of going
> for binary artefacts instead of building from source. The publisher is
> responsible for getting the thing to build and package properly, then
> they sign the package and deploy it to the repository using their
> private key.

The only thing I worry about here is that they simply won't do it. That
is what we found with Faxien. The other problem is that folks will
only build for the platforms they have access to, which will be i686
or amd64 in the vast majority of cases, and probably one or the other
rather than both. And you (the consumer) won't be able to solve that
problem without going out and downloading it yourself.

So I agree with you, there is just complexity on both sides. Do you
want the complexity in handling the build problems (which would be
mostly on us) or do you want complexity on the user in getting
packages? I keep going back and forth on this one.

>
> So in a sense, I guess I'm asking why we would take on this massive
> amount of work (building the publisher's sources, regardless of the
> build system they're using - assuming it will be installed, with the
> right version and any dependencies of its own, etc) and there's more
> to consider here too. What about packages that have driver code *and*
> additional native dependencies of their own, for example a driver
> based application that requires a specific version of iconv to be
> available/installed. Now we're either forcing the package maintainer
> to distribute a binary copy of iconv or we're committing to being able
> to find and install it, etc. This is starting to look like madness.

I agree, it's a nasty problem. Though it's easy enough to only support
the simple case, which will cover 98% of the apps out there, and expand
to the non-trivial cases later. We should do that in any case.

> Also there is another, serious issue with those situations, like the
> (depending on iconv or something else) one above. If we're building
> from source, and the driver has native dependencies of its own, these
> need to be available for whatever architecture we're building on. This
> might not be terribly onerous on the build server, but it needs to
> work on every client that builds from source too. Now we're forcing
> the maintainer to either embed the source code for the native
> dependencies (for the driver), or to provide (as part of his build) a
> means of fetching them, or to bundle a binary dependency for each
> supported os/architecture.
>
> This is just too much maintenance overhead, and it's not an uncommon
> situation: https://github.com/hyperthunk/erlxsl requires
> https://github.com/hyperthunk/erlxsl-sablotron which depends on
> https://github.com/hyperthunk/sablotron. One is built with rebar, one
> with waf and one with autotools and make.

If that's the case though, what do we do about the holes in the
packages, the whole 'not built for my platform' problem? I am OK with
saying 'just don't worry about it', but it needs to be an explicit
decision.

> Also we should support windows. It is wrong to push out developers
> who're stuck on that platform. The rebar support for windows gets
> better as it relies less and less on cygwin/msys to provide an
> 'pretend' unix environment, and sticks to the native tool chain.

This is more problematic. I don't use Windows at all. I haven't even
had access to a Windows box for ten years or so. It's just a platform I
couldn't care less about. Now, if someone wants to step up and be the
Windows guy, I am very happy to write cross-platform code and make
choices that don't explicitly disallow Windows, etc. I guess I just
don't see much value in this, but am willing to be convinced otherwise.

> I think it's clear why the hackage guys decided to steer clear of this
> mess. It is the package maintainers' responsibility to make sure their
> code builds and works and to set up a CI server somewhere to prove
> this. If someone is on travis-ci and they're regularly committing
> their code then I'm going to consider their signed packages viable.
> Otherwise, I'll investigate. It's my responsibility as a developer to
> do that.
>
> So IMVHO we should stay clear of this, otherwise we'll likely never be
> code complete and even if we did get that far, every man and his dog
> would want it to work some other way.
>
> Painful though the challenges of supporting binary packages are, I
> personally think they're much easier to solve than this.

I am actually pretty OK with this. The less infrastructure we have to
host and manage, the better off we are going to be. If we push 'holes'
and complaints onto the package owners, that's probably the way to go
overall.

Tim Watson

Feb 24, 2012, 5:31:22 PM2/24/12
to erlware-...@googlegroups.com, Torben Hoffmann
Ok let's go a few more rounds on this until we feel comfortable. I
think you and I have very similar concerns and are both just as
conflicted about what 'The Right Thing To Do (TM)' looks like.

On 24 February 2012 20:44, Eric Merritt <ericbm...@gmail.com> wrote:
>>>
>>> We should mark that in the metadata somehow. We should probably think
>>> about signing in the metadata too. I are going to have a distributed
>>> repo it would be good if people knew what information they could trust
>>> and the like. I will think about that too.
>>>
>>
>> Agreed. Please do consider signing as well.
>
> I think we need to support it. It shouldn't get in the way if its  not
> there, but be available to folks if it is.
>

I'm a bit conflicted with this one. If I'm uploading packages to a
repository, I want to know (for sure) that nobody else can interfere
with them. I don't want someone downloading my code and getting their
hard drive wiped because the package got corrupted 'somehow' or other.
So authenticating the maintainers is important. CEAN deals with this
by having a manual process, which is obviously not going to work for
us. We need alternatives.

In terms of code/package signing, I agree that should be optional. At
a minimum, I'd expect the behaviour that sinan/faxien had around MD5
checking to remain, but that's enough as long as the user is
comfortable installing code from an unverifiable source.

>>>
>>
>> There are some common things you need to be able to do. Take a look at
>> https://github.com/hyperthunk/rebar_alien_plugin which does something
>> similar for running commands, and also at
>> https://github.com/hyperthunk/epm which does the same.
>>
>> I tell you what though, we have another problem if we do this. It
>> becomes far more complicated if someone is using a build system that
>> we do not have installed.
>
> Agreed. build time dependencies solve this somewhat. The non-erlang,
> os dependencies is what is really going to screw us at least in the
> binary case.
>
>> That is one of the 'big' advantages of going
>> for binary artefacts instead of building from source. The publisher is
>> responsible for getting the thing to build and package properly, then
>> they sign the package and deploy it to the repository using their
>> private key.
>
> The only thing I worry about here is that they simply wont do it. That
> is what we found with faxien. The other problem is that folks will
> only build for the platforms they have access to. Which will be i686
> or amd64 in the vast majority of cases and probably one or the other
> and both.  and you (the consumer) wont be able to solve that problem
> without going out and downloading yourself.
>

I think people didn't use faxien because they weren't using sinan. I
think people weren't using sinan because they either (a) didn't get
it, (b) misunderstood it - I fell into this category at the time - or
(c) were happy with Make and didn't see a reason to change, which is a
subset of (a) IMO. I suspect people also didn't understand how to
'push' their artefacts to the repository - it certainly wasn't clear
to me and I spent quite some time looking at the documentation, though
I didn't trawl through the code to figure it out back then.

I am ok with pushing this issue to the consumer. That's pretty much
how cabal works - if there is a non-haskell dependency that's required
by the package but cannot be located by GHC (or whatever), then it
pukes and tells you to go sort it out yourself.

I think the way to get buy-in for this is to make it too simple to
ignore. Once all the moving parts are in place, all the user should
need do to get going is a few CLI instructions and the tool(s) should
do the rest. Whatever tools we make available, I'll write the relevant
rebar plugins to make sure that people don't *have* to change their
tool chain unless they want to. It should look something like this

$ rebar register publisher=nebularis mode=oauth/openid
location=https://github.com/hyperthunk
Please enter your password: ********
<<welcome text......>>
$
$ cd ~/work/nebularis/myproject
$ git tag -a 0.0.6 -m "Release 0.0.6"
$ git describe --abbrev=0
0.0.6
$
$ rebar clean compile generate dist
$ ls -la dist/
total 140376
drwxr-xr-x 4 t4 nlmp 136 19 Nov 02:09 .
drwxr-xr-x 23 t4 nlmp 782 19 Nov 02:09 ..
-rw-r--r-- 1 t4 nlmp 71863486 19 Nov 02:08 myproject-0.0.6.zip
$
$ # the central repo URL is defined in some config somewhere
$ # you might have set yourself up as multiple publishers
$
$ rebar publish namespace=nebularis repository=central
artefact=dist/myproject-0.0.6.zip
[INFO] Publishing artefact [myproject] to namespace [nebularis] in
repository [central]
[DEBUG] Pub MetaData:
[DEBUG] * Minimum erts version = 5.7.5 (R13B04)
[DEBUG] * Maximum erts version (inferred) = 5.9.* (R15)
[DEBUG] * No Driver Code Detected: os/arch flags set to [generic]
[DEBUG] * Not built using HIPE: erts os/arch flags set to [generic]
[WARN] You are publishing an unsigned package.
[WARN] For details on how to sign your code, go to http://wtf.com/crazy-stuff
[INFO] Uploading to http://......
[INFO] Progress [========================>] 100%
[INFO] Publishing is complete.
$

It really should *not* be more complicated than that. In summary,
these ideas consist of

- let people register *really easily* on the command line
- use oauth/openid to verify the person if possible, maybe support other methods
- make sure that publication is *really easy* once an artefact is built
- detect the environment/build constraints for the package wherever possible
- allow the package to override the detected constraints if the
maintainer wants to do that
- make code signing possible, but optional (as you've already said)

Personally, I think that if we go onto Erlang questions and put that
demo session in front of people, they'll bite. Some won't care, but
they're not the target audience because they don't care about doing
it. Some will complain about exactly the stuff we're covering below -
I've now got to build it for n different environments and I don't have
access to do that, etc - but that's a *different* problem space to
solve I think.

Now in terms of the consumers, if they are running a 64bit R15 VM on
FreeBSD and they want ErlXSL as a dependency and there isn't a binary
artefact available, then these are the options that I think make the
most sense.

1. We do what maven does (which is very similar to cabal's approach in
some ways) - I'll describe this in a minute
2. We do what rebar/agner does - I'll describe this in a minute
3. We do all the bonkers things we've just agreed not to do, because
they're too complicated
4. We come up with some magical alternative that I've yet to hear about

I'm going to go through these in reverse order. Choice (4) doesn't
seem to exist. Choice (3) seems to be out, because it's too hard.
Choice (2) doesn't seem likely to help us - if we felt those tools
were doing what we wanted in terms of dependency management, we
wouldn't be having these conversations in the first place. Which leads
us to....

#1 - The maven-esque approach.

When maven can't resolve a dependency, it simply stops the build and
says 'go find and build this yourself, and when you're done, run this
command <.....> to install the thing into the right place in your
local repository'. What *most* maven users actually do is this:

1. download and build the missing artefact from source
2. install it into their local repository with `maven deploy-file ...'
3. deploy it into their corporate maven/nexus repository using `maven
deploy ....'

After step 3, anybody else in the organisation that wants that
particular artefact will just get it pre-built, so long as they've
configured the corporate repository in their local machine's maven
settings. Personally I think this is 'the right thing' for most cases,
as it makes the user/team consuming the artefact responsible for how
they handle the situation.

What I'd like to suggest we consider, as a corollary to what we offer
publishers, is the concept of a 'companion namespace' which is
basically a way to support the following process.

- publisher publishes package X for 32/64-bit Windows and generic
Linux platforms on i686 or amd64 architectures
- package X depends on some native crap and the publisher 'deals with
that' in their build and (re)distributes a binary dependency in their
package
- the publisher *should* provide some info (even if it's just
library/package names) about the native dependencies that they're
explicitly handling, but I guess this would have to be optional
- a consumer running OpenBSD wants to install package X
- the dependency manager says 'I cannot do this as there is no
pre-built package for your platform/architecture/etc'
- the dependency manager provides detailed information about 'what' it
cannot resolve (os, arch, erts min/max, etc)
- the dependency manager says 'go build this yourself (and good luck)'
- the consumer goes and does the download+configure+make dance
- the consumer installs X into their local repository using the tool
(which provides a command to do this)
- the consumer wants to make sure their colleagues and friends, plus
anyone who downloads their project, can get the artefact without
having to go through all this malarky
- the consumer publishes the X they've built into the repository

Now how can that last step work, if the consumer doesn't have the
oauth login to push a package into the nebularis namespace on the
central repository? A companion namespace basically provides a way for
the consumer to say "Hey guys, I've built nebularis/X-0.0.6 for
<insert complicated environment dependencies here> and I've put it in
*my* namespace".

Will the dependency resolver now recognise <the-consumer>/X-0.0.6? No.
That's almost what's happened, but we want a way to connect the
artefact back to the original publisher, but say that 'this is
effectively like a fork of a project's source code'. It's almost like
saying

Package Info: artefact=X, version=0.0.6, namespace=(nebularis, via
<the consumer's namespace>)

Or perhaps, it is more like saying

namespace=consumer/companions/nebularis
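
In metadata terms (the format is entirely illustrative), a companion
entry might be recorded as something like:

{companion,
 {nebularis, my_app, "0.0.6"},      %% canonical publisher/name/vsn
 {published_by, some_consumer},     %% who actually built and published it
 [{os, openbsd}, {arch, amd64}, {erts, "5.9"}]}.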

Now if someone else asks for nebularis/X-0.0.6 on OpenBSD they won't
simply get this 'companion package' by magic. That would be sick and
wrong - deceitful even. But when the dependency manager pukes and
tells them it cannot support their platform, it will offer up the fact
that this 'companion package' has been built by someone else and let
them decide if they 'trust' that source or not.

It would also make sense that the original publisher, nebularis, get
notified that someone has published X in a 'companion namespace' and
then they can either

(a) ignore this
(b) build their own version for OpenBSD and then the companion will
likely never be used again
(c) contact the consumer (person) and figure out some kind of
arrangement with them, which might lead to
(d) give the consumer "Publication Rights" to publish X (and only X
and/or only on this very specific architecture) directly into the
nebularis namespace, OR
(e) choose to "promote" the artefact built by the consumer, and
effectively migrate (or copy) it into their own (canonical) namespace

This provides a useful bit of collaboration tooling. If lots of people
start becoming 'companions' as it were, the maintainer can either
filter them for the ones he wants to collaborate with, or blacklist
them as people he is going to ignore. Or maybe it should be a
whitelist, as a nod to the points about white listing being preferable
for packages/repositories, which may well be a generally good idea
anyway, regardless of context.

Either way, the package manager will only recognise the canonical
namespace. To support companion namespaces, you've got to explicitly
say "I trust this 'companion package' and want to use it" or even "I
trust <publisher-X> to provide 'companion packages' for any of the
following namespaces: [nebularis, basho, rabbitmq]" or something like
that. Whether you, as a consumer, state this per project or per
development environment, I'm not sure. I suspect the latter, as
basically you want to make sure that this kind of explicit
white-listing of 'companion' stuff works for transitive dependencies
as well as your own projects.

A lot of people are effectively 'forced' to use Windows as a
development environment at work. That's the situation we're in. It is
effectively illegal (and a sack-able offense) to use an unauthorised
machine on the corporate network, and if it's not a server, then it's
running windows. We do lots of open source stuff to get around this,
as the source is on github and the build/test environments can be
accessed over VPN. We also use virtual machines so that running things
on windows can be circumvented to some extent.

Anyway, I will be the 'Windows guy' although I spend 80% of my time on
a mac and the rest on Linux or FreeBSD. I care about supporting the
platform, because it widens our audience and silences the "it doesn't
work on Windows" arguments about Erlang development tools.

>> I think it's clear why the hackage guys decided to steer clear of this
>> mess. It is the package maintainers' responsibility to make sure their
>> code builds and works and to set up a CI server somewhere to prove
>> this. If someone is on travis-ci and they're regularly committing
>> their code then I'm going to consider their signed packages viable.
>> Otherwise, I'll investigate. It's my responsibility as a developer to
>> do that.
>>
>> So IMVHO we should stay clear of this, otherwise we'll likely never be
>> code complete and even if we did get that far, every man and his dog
>> would want it to work some other way.
>>
>> Painful though the challenges of supporting binary packages are, I
>> personally think they're much easier to solve than this.
>
> I am actually pretty ok with this. The less infrastructure we have to
> host and manage the better off we are going to be. If we push 'holes'
> and complaints onto the package owners thats probably the way to go
> over all.
>

I think it's less work, pain and expense (i.e., seeking of sponsorship
for environments, etc) for us. I suspect that what would be *really*
nice for package maintainers is if we log the requests for specific
os/arch combinations and provide them with some stats and maybe
notifications along the lines of '1000 people tried to download this
for <<64bit OpenSUSE 10.1 running on R15B 64bit HIPE>> in the last
week'; then at least they'd know they're losing potential target
audience.

The only way I can see getting around the host+build approach would be this:

(1) agree on a common specification of environment configuration
requirements for packages
(2) have the ability to fire up virtualised environments (using
virtualbox or equivalent)
(3) have the ability to resolve packaging requirements onto a virtual
machine and run the build

This is very much how travis-ci works. It uses the ruby libvirt
bindings and basically has virtualbox instances running for a couple
of linuxes and windows and has build queues for the jobs that need to
go to one environment or another. I believe it uses Chef to deploy the
required stuff onto the virtual machines.

You *could* make (1) either Puppet or Chef and reuse common package
definitions etc. This would work very nicely, but it's quite an ask of
maintainers and would probably put people off. You could also generate
Chef/Puppet configuration from the package requirement specifications,
but this is quite complex, as we've done some of it at work and it
takes a bit of understanding what's what.

Essentially Puppet (and to some extent, Chef) will have a recipe that
says either

1. Use the local system package manager to fetch X (and it'll go to
Yum, Apt or whatever) *or*
2. Download the sources from here and ./configure && make && make install, *or*
3. Download the binary for this platform/arch/etc from <here>

That's for library dependencies of course. Application and service
dependencies add more complexity still, but we're not really talking
about that.

So again, this is really much more like CI land than what we're trying
to do (package repositories). None of the other solutions do this, at
least not that I'm aware of. Now admittedly it would be 'bloody
useful' to have a hosted service that'll take your stuff and go build
it on 10 different platforms in parallel whenever you create a new
release/tag, but whilst it's a useful adjunct to what we're trying to
do, I think it's waaaay out of scope.

Tim Watson

unread,
Feb 24, 2012, 6:05:47 PM2/24/12
to erlware-...@googlegroups.com, Torben Hoffmann
I'm top posting again. The thread (below) is for context.

What I'm thinking is that github appears to offer *all* the right
infrastructure to do this. Consider these features of the github API,
for example:

- I can authenticate the user (who is calling the API)
- I can get information about the user
- I can get information about the user's repositories
- I can get information about individual repositories
- I can add/edit downloads for a repository using the API
- I can read binary (blob) data from repositories via the API (no need
for git to be installed locally)
- I can write binary (blob) data as a commit, back to a specific path
in the remote repository via the API
- I can get/list notifications and events
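
Just to make that concrete, here's roughly what poking one of those
endpoints looks like from Erlang with the stock inets/httpc stack (a
sketch only; the endpoint path follows the public v3 API docs, and the
user name and token are placeholders):

%% list a user's repositories via the github API (shell-style sketch)
inets:start(), ssl:start(),
Url = "https://api.github.com/users/someuser/repos",
Headers = [{"User-Agent", "erlware-pkg-client"},
           {"Authorization", "token <oauth-token>"}],
{ok, {{_Vsn, 200, _Reason}, _RespHdrs, Body}} =
    httpc:request(get, {Url, Headers}, [], []),
%% Body is a JSON list of repository objects; parse it with whatever
%% JSON library we settle on
io:format("~s~n", [Body]).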

So you *could* publish the artefacts to github. Now this might sound
bonkers, as you're thinking sheesh what about the disk space
constraints (interestingly bitbucket don't seem to have any
'documented' limits, but their API sucks and fewer people are on
there). I think there are a number of options to potentially work
around that though.

(1) each publisher hosts their own repository (or repositories) for
each namespace
(2) artefacts are uploaded (to add to the project downloads page) and
then we update a global index stored in github

Both options make the publisher responsible for their own disk space
usage. With my organisation membership included, I have around 80 - 90
github repositories, some small and some larger. This includes
nebularis/public-releases in which I was playing around with uploading
binary packages for applications into a github repository to see what
the space usage would be like.

I am using around 25% of my free account's 300mb soft space limit. The
public-releases repository contains 7 artefacts for 6 small
applications, one of them with two versions uploaded. It is around
377kb. Once you get into uploading files and native binaries (i.e.,
drivers, etc) then a lot more space will be used, but some intensive
compression might actually keep this manageable. If each publisher is
only pushing up around 30 applications/libraries of which only 4 or 5
contain native code, and of the non-native artefacts there are on
average 10 different versions, and for each version there is one
compiled with R13 and one with R15, we might get something like

- 10 versions * 25 repos * 2 erts versions for the pure erlang repos
(with everything compressed to the max in the zip files for each
version)
- 5 native apps * 10 versions * 4 operating systems * 2 architectures
* 2 erts versions
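
(Purely in artefact counts that's 10 * 25 * 2 = 500 pure-erlang
archives plus 5 * 10 * 4 * 2 * 2 = 800 native ones, so roughly 1,300
files per publisher before you factor in per-file sizes.)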

I'm not sure what kind of space usage that will work out to, but it's
worth considering. Most of the workflow I discussed below could be
supported like this, and that includes 'companion namespaces': you can
register your 'companion package' in the main metadata index (which is
in *our* repository and is updated whenever the tool interacts with
the repositories), and if the tool uses the github API to push a
blob+commit when uploading an artefact, then all you'd need to do to
'trust' someone is add them as a collaborator to your repository or
organisation. That's not quite as fine grained as
trusting them for only one project's artefacts (as your organisational
artefact repository contains *all* your built packages), which is why
I mentioned the option to store the artefacts as project downloads
instead.

Eric Merritt

unread,
Feb 24, 2012, 6:16:04 PM2/24/12
to erlware-...@googlegroups.com, Torben Hoffmann
On Fri, Feb 24, 2012 at 4:31 PM, Tim Watson <watson....@gmail.com> wrote:
> Ok let's go a few more rounds on this until we feel comfortable. I
> think you and I have very similar concerns and are both just as
> conflicted about what 'The Right Thing To Do (TM)' looks like.

I think so too.

>>> Agreed. Please do consider signing as well.
>>
>> I think we need to support it. It shouldn't get in the way if its  not
>> there, but be available to folks if it is.
>>
>
> I'm a bit conflicted with this one. If I'm uploading packages to a
> repository, I want to know (for sure) that nobody else can interfere
> with them. I don't want someone downloading my code and getting their
> harddrive wiped because the package got corrupted 'somehow' or other.
> So authenticating the maintainers is important. CEAN deals with this
> by having a manual process, which is obviously not going to work for
> us. We need alternatives.

I think package signing + a whitelist should work. You specify the
people you trust, or say just trust everyone (which most folks will
actually do). That should cover all the basics.
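
Something along these lines on the consumer side, maybe (a purely
hypothetical config; the file name and term shapes are made up just to
illustrate the whitelist idea):

%% hypothetical ~/.erlware/trust.config
{trusted_signers, all}.
%% ...or, for the more careful:
{trusted_signers, [
    {"some_publisher",    "<key fingerprint>"},
    {"another_publisher", any}   %% trust whatever key they sign with
]}.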

>
> In terms of code/package signing, I agree that should be optional. At
> a minimum, I'd expect the behaviour that sinan/faxien had around MD5
> checking to remain,

That should be there for everything, but that's not really code signing.

> but that's enough as long as the user is
> comfortable installing code from an unverifiable source.
>
>

> I think the way to get buy-in for this is to make it too simple to
> ignore. Once all the moving parts are in place, all the user should
> need do to get going is a few CLI instructions and the tool(s) should
> do the rest. Whatever tools we make available, I'll write the relevant
> rebar plugins to make sure that people don't *have* to change their
> tool chain unless they want to. It should look something like this

I will probably write a non-rebar driver for all of this myself.
Something like the git command that organizes it all together. It
doesn't really make much sense to me to drive it from rebar (that is
probably my huge negative bias against rebar talking). That said, I
don't see any reason why you shouldn't do it if you want to.

>
> - let people register *really easily* on the command line

Everything should be doable from the command line.

> - use oauth/openid to verify the person if possible, maybe support other methods

I really, really hope we don't end up hosting anything. If we can
figure out how to push all this to existing services, that would be a
huge win.

>
> #1 - The maven-esque approach.
>
> When maven can't resolve a dependency, it simply stops the build and
> says 'go find and build this yourself, and when you're done, run this
> command <.....> to install the thing into the right place in your
> local repository'. What *most* maven users actually do is this:
>
> 1. download and build the missing artefact from source
> 2. install it into their local repository with `maven deploy-file ...'
> 3. deploy it into their corporate maven/nexus repository using `maven
> deploy ....'

This seems reasonable to me.

It's not really a namespace; it's more 'unapproved' packages for an
existing namespace. I think we should be careful about muddying the
waters there.


> Will the dependency resolver now recognise <the-consumer>/X-0.0.6? No.
> That's almost what's happened, but we want a way to connect the
> artefact back to the original publisher, but say that 'this is
> effectively like a fork of a project's source code'. It's almost like
> saying

I like the idea, but I don't think we should treat it as a
namespace/fork. It's really not at all. It's just that someone besides
the maintainer has built and signed the packages. The package signing
should take that into account, I think, without much need for a
separate namespace. In this model the package owner can upload
packages, and those packages supersede any of the unapproved packages.
However, anyone should be able to upload a binary for any package as
long as they sign the package. It's up to the person pulling it down
to make the trust decisions.

I like this much better than having a different namespace, because the
code is actually from a single namespace. It's not a fork or anything
like that.

Supporting third-party built binaries, regardless of how we end up
doing it, is a great idea.

>
> Package Info: artefact=X, version=0.0.6, namespace=(nebularis, via
> <the consumer's namespace>)
>
> Or perhaps, it is more like saying
>
> namespace=consumer/companions/nebularis
>
> Now if someone else asks for nebularis/X-0.0.6 on OpenBSD they won't
> simply get this 'companion package' by magic. That would be sick and
> wrong - deceitful even. But when the dependency manager pukes and
> tells them it cannot support their platform, it will offer up the fact
> that this 'companion package' has been built by someone else and let
> them decide if they 'trust' that source or not.

If we do this via signing/signatures, that should cover things without
requiring much additional complication. Then it would be us saying:
this package is available from X, but you don't trust X.

>
> It would also make sense that the original publisher, nebularis, get
> notified that someone has published X in a 'companion namespace' and

> then they can either.


>
>
> (a) ignore this
> (b) build their own version for OpenBSD and then the companion will
> likely never be used again
> (c) contact the consumer (person) and figure out some kind of
> arrangement with them, which might lead to
> (d) give the consumer "Publication Rights" to publish X (and only X
> and/or only on this very specific architecture) directly into the
> nebularis namespace, OR
> (e) choose to "promote" the artefact buit by the consumer, and
> effectively migrate (or copy) it into their own (canonical) namespace

Notification is a good idea.


> This provides a useful bit of collaboration tooling.

Yup, and a fair amount of extra complexity in the app. I would rather
not have to do that if possible.

> If lots of people
> start becoming 'companions' as it were, the maintainer can either
> filter them for the ones he wants to collaborate with, or blacklist
> them as people he is going to ignore. Or maybe it should be a
> whitelist, as a nod to the points about white listing being preferable
> for packages/repositories, which may well be a generally good idea
> anyway, regardless of context.
>
> Either way, the package manager will only recognise the canonical
> namespace. To support companion namespaces, you've got to explicitly
> say "I trust this 'companion package' and want to use it"

Which is very similar to 'trust this signer'; for that reason I think
we can simply do it on the signing side.

> or even "I
> trust <publisher-X> to provide 'companion packages' for any of the
> following namespaces: [nebularis, basho, rabbitmq]" or something like
> that. Whether you, as a consumer, state this per project or per
> development environment, I'm not sure. I suspect that latter, as
> basically you want to make sure that this kind of explicit
> white-listing of 'companion' stuff works for transitive dependencies
> as well as your own projects.
>
>

>> This is more problematic. I don't use windows at all. I haven't even
>> had access to a windows box for ten years or so. Its just a platform I
>> couldn't care less about. Now, if someone wants to step up and be the
>> windows guy, I am very happy to write cross platform code, make
>> choices that don't explicitly disallow windows etc. I guess I just
>> don't see much value in this but am willing to be convinced otherwise.
>>
>
> A lot of people are effectively 'forced' to use Windows as a
> development environment at work. That's the situation we're in. It is
> effectively illegal (and a sack-able offense) to use an unauthorised
> machine on the corporate network, and if it's not a server, then it's
> running windows. We do lots of open source stuff to get around this,
> as the source is on github and the build/test environments can be
> accessed over VPN. We also use virtual machines so that running things
> on windows can be circumvented to some extent.
>
> Anyway, I will be the 'Windows guy' although I spend 80% of my time on
> a mac and the rest on Linux or FreeBSD. I care about supporting the
> platform, because it widens our audience and silences the "it doesn't
> work on Windows" arguments about Erlang development tools.

I am more than happy to support you on this. It shouldn't really
affect us much as long as we stay with pure Erlang.

>>
>> I am actually pretty ok with this. The less infrastructure we have to
>> host and manage the better off we are going to be. If we push 'holes'
>> and complaints onto the package owners thats probably the way to go
>> over all.
>>
>
> I think it's less work, pain and expense (i.e., seeking of sponsorship
> for environments, etc) for us. I suspect that what would be *really*
> nice for package maintainers is if we are logging the requests for
> specific os/arch combinations, that we provide them with some stats
> and maybe notifications about '1000 people tried to download this for
> << 64bit OpenSUSE 10.1 running on R15B 64bit HIPE>> in the last week'
> then they'd at least they'd know they're loosing potential target
> audience.

I agree, hopefully we can do this via some hosted service.


>
> So again, this is really much more like CI land than what we're trying
> to do (package repositories). None of the other solutions do this, at
> least not that I'm aware of. Now admittedly it would be 'bloody
> useful' to have a hosted service that'll take your stuff and go build
> it on 10 different platform in parallel whenever you create a new
> release/tag, but whilst it's a useful adjunct to what we're trying to
> do, I think it's waaaay out of scope.

I am not sure it's way out of scope, but in general I agree with you.
The model you have proposed above should work. We need to come to
agreement on the non-maintainer packages and then we can do some
speccing out.

Eric Merritt

unread,
Feb 24, 2012, 6:20:24 PM2/24/12
to erlware-...@googlegroups.com, Torben Hoffmann
On Fri, Feb 24, 2012 at 5:05 PM, Tim Watson <watson....@gmail.com> wrote:
> I'm top posting again. The thread (below) is for context.

No worries.

>
> What I'm thinking is that github appears to offer *all* the right
> infrastructure to do this. Consider these features of the github API,
> for example:

I have been thinking the same thing.

> - I can authenticate the user (who is calling the API)
> - I can get information about the user
> - I can get information about the user's repositories
> - I can get information about individual repositories
> - I can add/edit downloads for a repository using the API
> - I can read binary (blob) data from repositories via the API (no need
> for git to be installed locally)
> - I can write binary (blob) data as a commit, back to a specific path
> in the remote repository via the API
> - I can get/list notifications and events
>
> So you *could* publish the artefacts to github. Now this might sounds
> bonkers, as you're thinking sheesh what about the disk space
> constraints (interestingly bitbucket don't seem to have any
> 'documented' limits, but their API sucks and fewer people are on
> there). I think there are a number of options to potentially work
> around that though.
>
> (1) each publisher hosts their own repository (or repositories) for
> each namespace
> (2) artefacts are uploaded (to add to the project downloads page) and
> then we update a global index stored in github

I like model 2.

> Both options make the publisher responsible for their own disk space
> usage. With my organisation membership included, I have around 80 - 90
> github repositories, some small and some larger. This includes
> nebularis/public-releases in which I was playing around with uploading
> binary packages for applications into a github repository to see what
> the space usage would be like.
>
> I am using around 25% of my free account's 300mb soft space limit. The
> public-releases repository contains 7 artefacts for 6 small
> applications, one of them with two versions uploaded. It is around
> 377kb. Once you get into uploading files and native binaries (i.e.,
> drivers, etc) then a lot more space will be used, but some intensive
> compression might actually keep this manageable. If each publisher is
> only pushing up around 30 applications/libraries of which only 4 or 5
> contain native code, and of the non-native artefacts there are on
> average 10 different versions, and for each version there is one
> compiled with R13 and one with R15, we might get something like

I think this will work in the short term. In the longer term we really
don't want versions of packages to go away, and with space constraints
that is exactly what will happen. However, that's something we can
consider at a later date, and it might be as easy as caching the
packages somewhere and teaching the tools how to fall back to that
cache.

> - 10 versions * 25 repos * 2 erts versions for the pure erlang repos
> (with everything compressed to the max in the zip files for each
> version)
> - 5 native apps * 10 versions * 4 operating systems * 2 architectures
> * 2 erts versions
>
> I'm not sure what kind of space usage that will work out to, but it's
> worth considering. Most of the workflow I discussed below could be
> supported like this and that includes 'companion namespaces' where you
> can not only register your 'companion package' in the main metadata
> index (which is in *our* repository and is updated whenever the tool
> interacts with the repositories) but also if the tool was using the
> github API to push a blob+commit when uploading and artefact, then all
> you'd need to do to 'trust' someone is add them as a collaborator to
> your repository or organisation. That's not quite as fine grained as
> trusting them for only one project's artefacts (as your organisational
> artefact repository contains *all* your built packages), which is why
> I mentioned the option to store the artefacts as project downloads
> instead.

I think it's a great idea, especially as a way to get started with things.

Tim Watson

unread,
Feb 24, 2012, 6:27:42 PM2/24/12
to erlware-...@googlegroups.com, Torben Hoffmann
On 24 February 2012 23:16, Eric Merritt <ericbm...@gmail.com> wrote:
>
> I think package signing+a white list should work. You specify the
> people you trust or say just trust everyone (which most folks will
> actually do). That should cover all the basics.
>

Yes you're right.

>>
>> In terms of code/package signing, I agree that should be optional. At
>> a minimum, I'd expect the behaviour that sinan/faxien had around MD5
>> checking to remain,
>
> That should be there for everything. but thats not really code signing.
>

Yes I know. Preaching to the choir - bad habit of mine. :)

>> but that's enough as long as the user is
>> comfortable installing code from an unverifiable source.
>>
>>
>> I think the way to get buy-in for this is to make it too simple to
>> ignore. Once all the moving parts are in place, all the user should
>> need do to get going is a few CLI instructions and the tool(s) should
>> do the rest. Whatever tools we make available, I'll write the relevant
>> rebar plugins to make sure that people don't *have* to change their
>> tool chain unless they want to. It should look something like this
>
> I will probably write a non-rebar driver for all of this myself.
> Something like the git command that organizes it all together. It
> doesnt really make much sense to me to drive it from rebar (that is
> probably my huge negative bias against rebar talking). That said I
> don't see any reason why you shouldnt do it if you want to.
>

Oh I completely agree that what we build has 'nothing' to do with
rebar. I will do this simply so that people who already drive 99% of
their development process from rebar can simply include the right
plugin as a dependency and then use the extra commands it provides,
without having to install or configure anything themselves. It allows
them not to think too much, which is actually not a good thing, but
seems popular nowadays.

>>
>> - let people register *really easily* on the command line
>
> everything should be doable from the command line.
>

+1.

>> - use oauth/openid to verify the person if possible, maybe support other methods
>
> I really, really hope we dont end up hosting anything. If we can
> figure out how to push all this to existing services that would be a
> huge win.
>

Yes for sure.

>>
>> #1 - The maven-esque approach.
>>
>> When maven can't resolve a dependency, it simply stops the build and
>> says 'go find and build this yourself, and when you're done, run this
>> command <.....> to install the thing into the right place in your
>> local repository'. What *most* maven users actually do is this:
>>
>> 1. download and build the missing artefact from source
>> 2. install it into their local repository with `maven deploy-file ...'
>> 3. deploy it into their corporate maven/nexus repository using `maven
>> deploy ....'
>
> This seems reasonable to me.
>

Good. I'm happy to settle on that broad approach unless anyone else
wants to chime in and disagree.

Ok.

>
>> Will the dependency resolver now recognise <the-consumer>/X-0.0.6? No.
>> That's almost what's happened, but we want a way to connect the
>> artefact back to the original publisher, but say that 'this is
>> effectively like a fork of a project's source code'. It's almost like
>> saying
>
> I like the idea, but I dont think we should treat it as a
> namespace/fork. Its really not at all. Its just that someone besides
> the maintainer has built and signed the packages. The package signing
> should take that into account I think without much need for a separate
> namespace. In this model the package owner can upload packages, those
> packages supersede any of the unapproved packages. However, anyone
> should be able to upload a binary for any package as long as they sign
> the package. Its up to the person pulling it down to make the trust
> decisions.
>
> I like this much better then having a different namespace because the
> code is actually from a single namespace. ITs not a fork or anything
> like that.
>
> supporting third party built binaries regardless of how we end up
> doing it is a great idea.
>

Ok you've totally convinced me that this is all about signing (and
whether the consumer trusts the signing party) and nothing to do with
namespaces. That will throw some of my suggestions about using github
over the wall though, but you're absolutely right and we should stick
with the approach you've just outlined.

Ok fair enough.

>> If lots of people
>> start becoming 'companions' as it were, the maintainer can either
>> filter them for the ones he wants to collaborate with, or blacklist
>> them as people he is going to ignore. Or maybe it should be a
>> whitelist, as a nod to the points about white listing being preferable
>> for packages/repositories, which may well be a generally good idea
>> anyway, regardless of context.
>>
>> Either way, the package manager will only recognise the canonical
>> namespace. To support companion namespaces, you've got to explicitly
>> say "I trust this 'companion package' and want to use it"
>
> which is very similar to 'trust this signer', for that reason I think
> we can simply do it on the signing side.
>

Agreed.

Yes exactly. As long as the local tool chains don't do too much
calling out to the shell or rely on symlinks and stuff like that, we
should be fine.

>>>
>>> I am actually pretty ok with this. The less infrastructure we have to
>>> host and manage the better off we are going to be. If we push 'holes'
>>> and complaints onto the package owners thats probably the way to go
>>> over all.
>>>
>>
>> I think it's less work, pain and expense (i.e., seeking of sponsorship
>> for environments, etc) for us. I suspect that what would be *really*
>> nice for package maintainers is if we are logging the requests for
>> specific os/arch combinations, that we provide them with some stats
>> and maybe notifications about '1000 people tried to download this for
>> << 64bit OpenSUSE 10.1 running on R15B 64bit HIPE>> in the last week'
>> then they'd at least they'd know they're loosing potential target
>> audience.
>
> I agree, hopefully we can do this via some hosted service.
>
>
>>
>> So again, this is really much more like CI land than what we're trying
>> to do (package repositories). None of the other solutions do this, at
>> least not that I'm aware of. Now admittedly it would be 'bloody
>> useful' to have a hosted service that'll take your stuff and go build
>> it on 10 different platform in parallel whenever you create a new
>> release/tag, but whilst it's a useful adjunct to what we're trying to
>> do, I think it's waaaay out of scope.
>
> I am not sure its way out of scope, but in general I agree with you.
> the model you have proposed above should work. We need to come to
> agreement on the non-maintainer packages and then we can do some
> speccing out.
>

Cool.

Tim Watson

unread,
Feb 24, 2012, 6:40:43 PM2/24/12
to erlware-...@googlegroups.com, Torben Hoffmann
On 24 February 2012 23:20, Eric Merritt <ericbm...@gmail.com> wrote:
>> (1) each publisher hosts their own repository (or repositories) for
>> each namespace
>> (2) artefacts are uploaded (to add to the project downloads page) and
>> then we update a global index stored in github
>
> I like model 2
>

Ok. This makes supporting 3rd party built binaries impossible though.
You can't upload to someone else's repo unless they've explicitly
given you permission. You can upload 'downloads' to your own
cache/repo or fork, and that can go into the central metadata index I
suppose.

>> Both options make the publisher responsible for their own disk space
>> usage. With my organisation membership included, I have around 80 - 90
>> github repositories, some small and some larger. This includes
>> nebularis/public-releases in which I was playing around with uploading
>> binary packages for applications into a github repository to see what
>> the space usage would be like.
>>
>> I am using around 25% of my free account's 300mb soft space limit. The
>> public-releases repository contains 7 artefacts for 6 small
>> applications, one of them with two versions uploaded. It is around
>> 377kb. Once you get into uploading files and native binaries (i.e.,
>> drivers, etc) then a lot more space will be used, but some intensive
>> compression might actually keep this manageable. If each publisher is
>> only pushing up around 30 applications/libraries of which only 4 or 5
>> contain native code, and of the non-native artefacts there are on
>> average 10 different versions, and for each version there is one
>> compiled with R13 and one with R15, we might get something like
>
> I think this will work in the short term. In the longer term we really
> dont want versions of packages to go away and with space constraints
> that will be exactly what happens. However, thats something we can
> consider at a later date and it might be as easy as caching the
> packages somewhere and teaching the tools how to fall back to that
> cache.
>

In the longer term, I think if we pull it off it might be worth simply
talking to github about what we're doing and seeing what they suggest.
They might think it's such a cool thing that they'd like to sponsor
it with unlimited disk space, in which case we could just create one
canonical repo and work off that, allowing users to create their own
repos on github for private artefacts that they don't want to share.

I also think that we should do some investigation and try to work out
what the population would be like for a decent sized organisation that
uses plenty of native (driver) solutions as well as pure erlang and
probably wants to support at least 2 or 3 major platforms. Probably
picking basho and/or rabbitmq would make for a good start. Build
everything they've got and then do the maths for the cartesian product
of two (min/max constrained) erts versions, three major operating
systems, two major architectures, both common word sizes.
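(That's 2 * 3 * 2 * 2 = 24 build combinations per version of each
native artefact, before the version count itself is factored in.)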

If that's grossly above 0.5Gb then we might need to seriously
reconsider this. Perhaps we should undertake that task before going
much further with any planning.

Eric Merritt

unread,
Feb 24, 2012, 6:44:34 PM2/24/12
to erlware-...@googlegroups.com, Torben Hoffmann
On Fri, Feb 24, 2012 at 5:40 PM, Tim Watson <watson....@gmail.com> wrote:
> On 24 February 2012 23:20, Eric Merritt <ericbm...@gmail.com> wrote:
>>> (1) each publisher hosts their own repository (or repositories) for
>>> each namespace
>>> (2) artefacts are uploaded (to add to the project downloads page) and
>>> then we update a global index stored in github
>>
>> I like model 2
>>
>
> Ok. This makes supporting 3rd party built binaries impossible though.
> You can't upload to someone else's repo unless they've explicitly
> given you permissions. You can upload 'downloads' to your own
> cache/repo or fork and that can go into the central metadata index I
> suppose.

Ok, then a mix of 1 and 2. Repos are hosted on github. I think that's
what I was thinking in any case.

Fair enough.

>
> I also think that we should do some investigation and try to work out
> what the population would be like for a decent sized organisation that
> uses plenty of native (driver) solutions as well as pure erlang and
> probably wants to support at least 2 or 3 major platforms. Probably
> picking basho and/or rabbitmq would make for a good start. Build
> everything they've got and then do the maths for the cartesian product
> of two (min/max constrained) erts versions, three major operating
> systems, two major architectures, both common word sizes.

I suspect it's still going to be fairly small. However, this approach
makes sense.

>
> If that's grossly above 0.5Gb then we might need to seriously
> reconsider this. Perhaps we should undertake that task before going
> much further with any planning.

Agreed.

Tim Watson

unread,
Feb 24, 2012, 6:56:20 PM2/24/12
to erlware-...@googlegroups.com, Torben Hoffmann
On 24 February 2012 23:44, Eric Merritt <ericbm...@gmail.com> wrote:
>>
>> If that's grossly above 0.5Gb then we might need to seriously
>> reconsider this. Perhaps we should undertake that task before going
>> much further with any planning.
>
> agreed.
>

Cool. I'll try and find some time to do this over the weekend and post
back my findings.

Cheers,

Tim

Eric Merritt

unread,
Feb 24, 2012, 9:36:53 PM2/24/12
to erlware-...@googlegroups.com, Torben Hoffmann
I was a bit worried, but it looks like we can do the signing in pure
Erlang. With the Erlang public_key lib and the keygen stuff from
ecrypt it should be a complete solution.
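
For example, the sign/verify round trip would be something like this
(just a sketch; the file names are made up, and key generation itself
is left to ecrypt or an external tool):

%% sign a package tarball with the publisher's RSA key (PEM encoded)
{ok, Tarball} = file:read_file("my_app-0.1.1.ez"),
{ok, PrivPem} = file:read_file("publisher_key.pem"),
[PrivEntry]   = public_key:pem_decode(PrivPem),
PrivKey       = public_key:pem_entry_decode(PrivEntry),
Signature     = public_key:sign(Tarball, sha, PrivKey),

%% the consumer verifies against the publisher's public key
{ok, PubPem}  = file:read_file("publisher_pub.pem"),
[PubEntry]    = public_key:pem_decode(PubPem),
PubKey        = public_key:pem_entry_decode(PubEntry),
true          = public_key:verify(Tarball, sha, Signature, PubKey).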

Tim Watson

unread,
Feb 25, 2012, 9:47:20 AM2/25/12
to erlware-...@googlegroups.com, Torben Hoffmann
On 25 February 2012 02:36, Eric Merritt <ericbm...@gmail.com> wrote:
> I was a bit worried, but it looks like we can do the signing in pure
> erlang. With the erlang public key like and the keygen stuff from
> ecrypt it should be a complete solution.
>

That's a relief. The last thing we need to worry about is having to
distribute a NIF-based solution just to make it easier to distribute
packages with native code. :)

Tim Watson

unread,
Feb 25, 2012, 12:40:25 PM2/25/12
to erlware-...@googlegroups.com, Torben Hoffmann

So I haven't dealt with the native dependencies yet, as I've only had
30 mins free this afternoon, but I built all of the rabbitmq
repositories apart from hstcp and toke (the two that use drivers) and
copied all the generated .ez files into /tmp/repo. The total size is
2mb.

Now between version 1.4.0 and the current version of the server (which
probably goes through the most churn in terms of version numbers),
there were 31 intermediate releases (based on `hg tags | wc`) and
there are 24 .ez archives. So worst case if we're going to have circa
30 versions of each, published for two min/max erts releases (one
combined for R13 and the other for R15), then the pure erlang
repositories will come to 2mb * 24 artefacts * 30 releases * 2 erts
versions per release.

That's coming up to almost 3Gb of storage, so I think either we need
to rethink github as a binary artefact repository, think about how to
get around the storage limits (such as moving to another git hosting
solution that doesn't impose them), or reconsider whether we're doing
the right thing.

Whilst you can pay for that kind of storage on github, it's quite
expensive and you'd be using most of it for your binaries. Plus I
haven't even added in the size cost of storing native stuff. I also
don't think that this example (rabbitmq) is particularly heavy on
repositories - at work we have a *lot* more code than this that needs
to go into our nexus repository.

Perhaps we ought to do some canvassing and see if there are any notable
organisations or companies in the community that might consider
hosting something for us, just in terms of them sucking up the disk
space issue. Perhaps we even ought to try and find out if CEAN 2.0
actually has some kind of API, although the way they deal with
publication probably means that's a non-starter.

I'll have a chat with the open source division at work as well, but
I think that's unlikely to go anywhere if I'm honest, and even if it
did the timescales would be geological.

Eric Merritt

unread,
Feb 25, 2012, 1:04:01 PM2/25/12
to erlware-...@googlegroups.com, Torben Hoffmann
I think these will be much more the exception than the rule really.
Maybe we can say this.

Git exists to store metadata, not the actual tarball. The metadata
contains a pointer to a URL that is HTTP-accessible and serves the
tarball.
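
Off the top of my head, one entry might look something like this (a
totally hypothetical shape; every field name and value below is
illustrative only, and the real format is still to be specced out):

{binary, my_app, "0.1.1",
 [{url,       "https://github.com/downloads/someorg/my_app/my_app-0.1.1.ez"},
  {md5,       "<md5-of-tarball>"},
  {erts_vsn,  "5.9"},
  {signed_by, "someorg"}]}.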

In the default case we use git and we upload automatically to the
github downloads area when people publish. There is no problem with
that. That will serve 95% of the population.

For folks who have non-normal space needs (riak, rabbit, etc.) we can
do a few things. We can support automatically pushing to a couple of
other services, like s3 or dropbox or whatever, or we can plop out a
tarball, let them upload it somewhere (even their own servers) and
then publish the metadata via the normal tools. This should allow a
lot more flexibility in the long term while allowing us to do the
simple thing in the short term.

I don't think there is really any reason why metadata and tarballs
need to be stored together.

Eric

Tim Watson

unread,
Feb 25, 2012, 5:08:35 PM2/25/12
to erlware-...@googlegroups.com, Torben Hoffmann
On 25 Feb 2012, at 18:04, Eric Merritt wrote:

> I think these will be much more the exception then the rule really.
> Maybe we can say this.

I never said we have to store the metadata and the tarballs together, so I'm fine with this. I do think that the issue I'm worried about isn't so much the volume of repositories - of the 70 - 80 I have, only 20 or so contain applications that are of real value - as the number of different versions etc. I do suspect that you're right that probably most of them will be looking at much smaller disk space requirements. The problem is that they need to be made aware of exactly what kind of cost in disk space they'll pay for publishing.

I must admit that I'm quite worried that this will actually put people off using it, and then we're back to where you were with faxien two years ago. Great concept, not enough uptake from the community, goes nowhere. I've asserted that the rabbit team do not have a massive amount of code and that despite this, if they have a lot of versions of everything then they'll be stuck on space. You've asserted that they're not the norm and that most people don't have to worry about it.

I'm going to go look at a sample group of github users who regularly commit to Erlang source repositories. I'd like to see how they compare to the rabbit team. I'm also going to go away and work out exactly how many different artefacts rabbitmq really would have to deal with.

If we're not careful this will go nowhere. We're already pushing back the responsibility for building valid artefacts for all supported environments to the package maintainers - don't get me wrong, I actually think this really is the right thing to do on reflection. Now we're pushing back the responsibility for providing disk space as well. That's understandable, given that none of us wants to suck up having to deal with it, but I'm not sure if it'll stick. And if we don't make the whole process/experience completely seamless, then it definitely won't.

>
> Git exists to store metadata, not the actual tarball. The metadata
> contains a pointer to a url that is http accessable and has the
> tarball.

I do agree that the index need not be stored along with the tarball(s), and if this is the way we go then a URL is absolutely the right way to address the artefact's location.

>
> In the default case we use git and we upload automatically to the git
> download area when people publish. There is no problem with that. That
> will serve 95% of the population.
>

You also have to merge the central index. It will only serve 95% of the population who're using github, and who are willing to have their github disk usage taken up with this. I must admit I was hoping the sizes would be much smaller and that we'd get away with this better. Not that many people are uploading binary releases to github you know, and I suspect it's primarily because they don't want to overrun their quota on the free account. This is far less the case with sourceforge, where 99% do have downloads available.

> For folks who have non-normal space needs, riak, rabbit, etc. We can
> do a few things. We can support automatically pushing to a couple of
> other services, like s3 and dropbox whatever or we can plop out a
> tarball give, let them upload it somewhere (even their own servers)
> then publish the metadata via the normal tools. This should allow a
> lot more flexibility in the long term while allowing us to do the
> simple thing in the short term.

I agree that we will need multiple repository implementations. At work we have a very large volume of data that will need to be stored, for which we currently rely on nexus and may continue to do so in the future, as it provides <org>/<artefact>/<version> which is sufficient for our needs as long as we munge the artefact name (in nexus) to include all the os/arch/erts/etc crap so that we can have multiples of the same app+version for different environments. Obviously in the central index we'll need to state only one app+version in our namespace - BTW I really don't like this word, I think we should find another, less overloaded term - and use the proper metadata to link off to the right place in nexus.

I don't see github being any use for us, but it will hopefully do for my personal projects.

>
> I dont think there is really any reason why metadata and tarballs need
> to be stored together.
>

Yes I'm ok with them being separate. On these points we definitely agree:

1. metadata and tarballs need not reside in the same place
2. the metadata in the central index should link to artefacts based on a URL, which could be provided anywhere, by anything

I do think that using the downloads area brings some issues with it, versus uploading to a git repository using the github API. Firstly you've got to deal with clashing names. Github will generate <appname>-<tag-version-number> for any git tags people create (which if they're bothered about versioning at all, they will be doing) so you've got to generate different names for the binary artefacts. Secondly, we have to generate unique (and valid) names. So for a driver based project, we're probably looking at something like

erlxsl-0.5.6-64bit-Darwin-i686-erts5.8.2.zip
erlxsl-0.5.6-32bit-Darwin-i686-erts5.8.2.zip
etc....
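
That naming is trivial to generate from the build metadata, e.g. (a
sketch only; the variable names here are just for illustration):

%% derive the artefact file name from the build environment
Name = "erlxsl", Vsn = "0.5.6",
Word = "64bit", Os = "Darwin", Arch = "i686", Erts = "5.8.2",
FileName = lists:flatten(
    io_lib:format("~s-~s-~s-~s-~s-erts~s.zip",
                  [Name, Vsn, Word, Os, Arch, Erts])).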

I guess I'm ok with this, as the alternatives would use folder structures in the same way, and they're both really a sort of naming convention. We need to make sure that we don't generate file names that are unacceptable to github for any reason. And the final thing is that we're tied into github, though I suppose any usage of the github API means that we're tied in either way, so using the downloads versus option 1 doesn't change that either way.

I suppose the only other thing I've got on my mind here is that I want to make sure it is easy for people to publish their own repository index and for consumers to selectively choose it if they want. I guess the easy way to do that would be

1. always serve a repository index as a straight resource that you can acquire with HTTP GET
2. always link (from the index to the artefact) using an HTTP URL, so downloading artefacts requires nothing more than HTTP GET
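
In Erlang terms the consumer side of that is then nothing fancier than
a couple of plain GETs, e.g. (a sketch only; the URLs and file name
here are made up):

%% fetch a published index, then fetch one artefact it points at
inets:start(),
{ok, {{_, 200, _}, _, Index}} =
    httpc:request(get, {"http://example.org/erlware/index", []}, [], []),
%% ...resolve dependencies against Index, yielding an artefact URL...
ArtefactUrl = "http://example.org/erlware/my_app-0.1.1.ez",
{ok, {{_, 200, _}, _, Zip}} =
    httpc:request(get, {ArtefactUrl, []}, [], [{body_format, binary}]),
ok = file:write_file("my_app-0.1.1.ez", Zip).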

Then the only complex stuff that remains is merging all the available indexes (that the user has configured the tools to look for) and resolving the right dependencies, which hopefully is mostly 'there' already in terms of what sinan can do, with the obvious caveat that you need to add support for 'publisher' as well as app + version.

To create my own index, I just upload an index file to github, or create a local one and check it in as a git repository which will have the same effect. Either way, it's very simple to do. Users can add my index to their config (local to the project or machine wide? I'd prefer both options) by hand or on the command line. Nice and simple.

We download an index from the internet somewhere and use this to resolve a dependency graph. Once resolved, the nodes contain the URLs for each artefact and we provide the downloading part as well. We should also support downloading multiple indexes and searching through these. Now it just so happens that we also provide a mechanism to push the index back up to the internet, and for that matter to push an artefact somewhere too.

I think making it easy for people to publish their own indexes alongside the central index is a good idea, especially for organisations that may wish to use our tool set to manage private (e.g., closed source) projects.

Tim Watson

unread,
Feb 25, 2012, 5:33:40 PM2/25/12
to Watson,T,Tim,DK21 R Tim, erlware-...@googlegroups.com, Torben Hoffmann
Also, how will the MD5 thing be expected to work? Will the client/publisher need to upload this in a separate file as maven does?

Interestingly, Github downloads are stored on S3 and you have to interact with S3 to do the uploading part, so we'll have that part of the code base built from the start either way.

Eric B Merritt

unread,
Feb 25, 2012, 7:04:07 PM2/25/12
to erlware-...@googlegroups.com, Torben Hoffmann
At Sat, 25 Feb 2012 22:08:35 +0000,

Tim Watson wrote:
>
> On 25 Feb 2012, at 18:04, Eric Merritt wrote:
>
> > I think these will be much more the exception then the rule really.
> > Maybe we can say this.
>
> I never said we have to store the metadata and the tarballs
> together, so I'm fine with this.

I misunderstood then. We are on the same page.

> I do think that the issue I'm worried about is so much with the
> volume of repositories - of the 70 - 80 I have, only 20 or so
> contain applications that are of real value, but it's also the
> number of different versions etc. I do suspect that you're right at
> probably most of them will be looking at much smaller disk space
> requirements. The problem is that they need to be made aware of
> exactly what kind of cost in disk space they'll pay for publishing.

Hmm, I suspect we can give them some tools to make this easier and I
think integrating with the cloud providers out there will make a
difference.

I should make two notes. The first is that this isn't something we
have to address in version 0.0.1. The second is that we should see if
we can get some provider out there to provide some hosting - not sure
that's possible, but it does not hurt to try. That is, hosting for
binary tarballs.

>
> I must admit that I'm quite worried that this will actually put
> people off using it, and then we're back to where you were with
> faxien two years ago. Great concept, not enough uptake from the
> community, goes nowhere. I've asserted that the rabbit team do not
> have a massive amount of code and that despite this, if they have a
> lot of versions of everything then they'll be stuck on space. You've
> asserted that they're not the norm and that most people don't have
> to worry about it.

I think we have to make it trivial to use and understand. That was one
of the big problems with faxien. It was far from trivial to get
started. I think by leveraging github we can do that.

My assertion may or may not be correct. I assume it is correct but you
never know.

>
> I'm going to go look at a sample group of github users who regularly
> commit to Erlang source repositories. I'd like to see how they
> compare to the rabbit team. I'm also going to go away and work out
> exactly how many different artefacts rabbitmq really would have to
> deal with.
>
> If we're not careful this will go nowhere. We're already pushing
> back the responsibility for building valid artefacts for all
> supported environments to the package maintainers - don't get me
> wrong, I actually think this really is the right thing to do on
> reflection. Now we're pushing back the responsibility for providing
> disk space as well.

We are, but hopefully we are pushing them back onto things they
already use. I hope that makes the difference.

> That's understandable, given that none of us
> wants to suck up having to deal with it, but I'm not sure if it'll
> stick. And if we don't make the whole process/experience completely
> seamless, then it definitely won't.


Seamless and easy to understand, both. It can be seamless, but if you
don't have a freaking clue what's going on, that will put you off as
well.


>
> >
> > Git exists to store metadata, not the actual tarball. The metadata
> > contains a pointer to a url that is http accessable and has the
> > tarball.
>
> I do agree that the index need not be stored along with the
> tarball(s), and if this is the way we go then a URL is absolutely
> the right way to address the artefact's location.
>
> >
> > In the default case we use git and we upload automatically to the git
> > download area when people publish. There is no problem with that. That
> > will serve 95% of the population.
> >
>
> You also have to merge the central index.

I don't think there should be any such thing as a central index (if we
are talking about the ether). We should provide an index, but there
should never be a problem pointing to multiple indexes.

In any case, merging should be trivial. It's a merge request, and since
things should only ever be added and not removed, the actual merge
should be straightforward.

> It will only serve 95% of
> the population who're using github, and who are willing to have
> their github disk usage taken up with this. I must admit I was
> hoping the sizes would be much smaller and that we'd get away with
> this better. Not that many people are uploading binary releases to
> github you know, and I suspect it's primarily because they don't
> want to overrun their quota on the free account. This is far less
> the case with sourceforge, where 99% do have downloads available.

Fair enough. However, the URL model lets us start with github
(probably with our own projects) and branch out. We should be able to
support anything that's referenceable via the net. That being the case,
having stuff in 3rd party places (github, s3, rackspace storage)
should not be a problem.

> > For folks who have non-normal space needs, riak, rabbit, etc. We can
> > do a few things. We can support automatically pushing to a couple of
> > other services, like s3 and dropbox whatever or we can plop out a
> > tarball give, let them upload it somewhere (even their own servers)
> > then publish the metadata via the normal tools. This should allow a
> > lot more flexibility in the long term while allowing us to do the
> > simple thing in the short term.
>
> I agree that we will need multiple repository implementations. At
> work we have a very large volume of data that will need to be
> stored, for which we currently rely on nexus and may continue to do
> so in the future, as it provides <org>/<artefact>/<version> which is
> sufficient for our needs as long as we munge the artefact name (in
> nexus) to include all the os/arch/erts/etc crap so that we can have
> multiples of the same app+version for different
> environments. Obviously in the central index we'll need to state
> only one app+version in our namespace - BTW I really don't like this
> word, I think we should find another, less overloaded term - and use
> the proper metadata do link off to the right place in nexus.

Let's just call it 'organization'. That should cover all the bases.

>
> I don't see github being any use for us, but it will hopefully do
> for my personal projects.

To your company, not to us ('erlware') working on this project?

>
> >
> > I dont think there is really any reason why metadata and tarballs need
> > to be stored together.
> >
>
> Yes I'm ok with them being separate. On these points we definitely agree:
>
> 1. metadata and tarballs need not reside in the same place
>
> 2. the metadata in the central index should link to artefacts based
> on a URL, which could be provided anywhere, by anything
>
> I do think that using the downloads area brings some issues with it,
> versus uploading to a git repository using the github API. Firstly
> you've got to deal with clashing names. Github will generate
> <appname>-<tag-version-number> for any git tags people create (which
> if they're bothered about versioning at all, they will be doing) so
> you've got to generate different names for the binary
> artefacts.

This should not be a problem. Downloads generated from tags and files
in the downloads area live at different URLs. That means that you can
have the same <appname>-<tag> as a download and it causes no problem.


> Secondly, we have to generate unique (and valid)
> names. So for a driver based project, we're probably looking at
> something like

Why is that? The metadata is the canonical source for all of this
information. The name/url is just a way for us to reference the right
thing. Granted, it would be nice if you could tell something about the
tarball from its name, but that's not needed. When we generate a
tarball we will probably want to do that, but we shouldn't require
it. If they want to do something debian-ish and put all the arch
information in the directory structure, that's just fine. We shouldn't
care.

>
> erlxsl-0.5.6-64bit-Darwin-i686-erts5.8.2.zip
> erlxsl-0.5.6-32bit-Darwin-i686-erts5.8.2.zip
> etc....
>
> I guess I'm ok with this, as the alternatives would use folder
> structures in the same way, and they're both really a sort of naming
> convention. We need to make sure that we don't generate file names
> that are unacceptable to github for any reason. And the final thing
> is that we're tied into github, though I suppose any usage of the
> github API means that we're tied in either way, so using the
> downloads versus option 1 doesn't change that either way.
>

As I said, we don't actually care. The only property that matters to
us is uniqueness of the full path. That is, one URL points to one
thing.

> I suppose the only other thing I've got on my mind here is that I
> want to make sure it is easy for people to publish their own
> repository index and for consumers to selectively choose it if they

> want. I guess the easy way to do that would be.

If the repository is a git repo it should be trivial for anyone to
publish their own on github and the like.

>
> 1. always serve a repository index as a straight resource that you
> can acquire with HTTP GET

I see what you are saying. However, hosting a git repo is pretty
trivial. Git supports a ton of protocols (including ssh and http), and
I think we will get more bang for our short-term buck that way than
trying to create something ourselves.

> 2. always link (from the index to the artefact) using an HTTP URL,
> so downloading artefacts requires nothing more than HTTP GET

I agree with this one.

>
> Then the only complex stuff that remains is merging all the
> available indexes (that the user has configured the tools to look
> for) and resolving the right dependencies,

If it's in git it's just a merge, or at least it should be. If the
merge fails we should inform the user that repo x is broken and they
should remove it. Then we can try again. That's one of the really nice
things about using a DVCS.

> which hopefully is mostly 'there' already in terms of what sinan can
> do, with the obvious caveats that you need to add support for
> 'publisher' as well as app + version.

This should be pretty simple. Well, it's not simple at all, but it's
mostly already done. I have a few ideas for better error messages that
I would probably work in as we move forward.

>
> To create my own index, I just upload an index file to github, or
> create a local one and check it in as a git repository which will
> have the same effect.

I suspect we should put up a 'starter' repo that folks can fork,
too. Of course, they can fork the main repo as well. It shouldn't
matter.

> Either way, it's very simple to do. Users can add my index to their
> config (local to the project or machine wide? I'd prefer both
> options) by hand or on the command line. Nice and simple.

Yea. If we are just consuming the repo we don't care at all where it
is or where it's coming from.

>
> We download an index from the internet somewhere and use this to
> resolve a dependency graph. Once resolved, the nodes contain the
> URLs for each artefact and we provide the downloading part as
> well.

Exactly, though we need to figure out the user experience here. I
suspect the user should somehow indicate that he wants to pull
something down.

> We should also support downloading multiple indexes and
> searching through these.

Absolutely. You should be able to have any number of indexes. Since
indexes are additive there should be no conflicts.

> Now it just so happens that we also provide a mechanism to push the
> index back up to the internet, and for that matter to push an
> artefact somewhere to.

Yes, but this should probably be part of the publishing experience
rather than the consuming experience.

> I think making it easy for people to publish their own indexes
> alongside the central index is a good idea,

As I said, in the software I don't think there is any concept of a
central index. We will provide one, and I think most folks will use
it. But the stuff we create should be index-agnostic.

> especially for organisations that may wish to use our tool set to
> manage private (e.g., closed source) projects.
>

Eric B Merritt

unread,
Feb 25, 2012, 7:05:45 PM2/25/12
to erlware-...@googlegroups.com, Watson,T,Tim,DK21 R Tim, Torben Hoffmann
At Sat, 25 Feb 2012 22:33:40 +0000,

Tim Watson wrote:
>
> Also, how will the MD5 thing be expected to work? Will the
> client/publisher need to upload this in a separate file as maven
> does?

I suspect it should be part of the 'binary url' metadata. That is,
when you publish the URL it publishes the md5 hash along with it. We
have yet to work out the metadata for that, so I am not yet sure how
it would work.
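
E.g., at publish time it could be as simple as (a sketch; the file
name is made up):

%% compute the hex digest of the tarball and store it alongside the
%% url in the metadata entry
{ok, Bin} = file:read_file("my_app-0.1.1.ez"),
Md5Hex = lists:flatten([io_lib:format("~2.16.0b", [B])
                        || B <- binary_to_list(erlang:md5(Bin))]).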

Tim Watson

unread,
Feb 25, 2012, 8:28:50 PM2/25/12
to erlware-...@googlegroups.com, Watson,T,Tim,DK21 R Tim, Torben Hoffmann

On 26 Feb 2012, at 00:05, Eric B Merritt wrote:

> At Sat, 25 Feb 2012 22:33:40 +0000,
> Tim Watson wrote:
>>
>> Also, how will the MD5 thing be expected to work? Will the
>> client/publisher need to upload this in a separate file as maven
>> does?
>
> I suspect it should be part of the 'binary url' metadata. That is when
> you publish the url it publishes the md5 hash along with that. We have
> yet to work out the metadata for that so I am not yet sure how it
> would work.
>

Ok that makes good sense to me.

Tim Watson

unread,
Feb 25, 2012, 9:01:04 PM2/25/12
to erlware-...@googlegroups.com, Torben Hoffmann
On 26 Feb 2012, at 00:04, Eric B Merritt wrote:

> At Sat, 25 Feb 2012 22:08:35 +0000,
> Tim Watson wrote:
>>
>> On 25 Feb 2012, at 18:04, Eric Merritt wrote:
>>
>>> I think these will be much more the exception then the rule really.
>>> Maybe we can say this.
>>
>> I never said we have to store the metadata and the tarballs
>> together, so I'm fine with this.
>
> I misunderstood then. We are on the same page.
>
>> I do think that the issue I'm worried about isn't just the
>> volume of repositories - of the 70 - 80 I have, only 20 or so
>> contain applications that are of real value - but it's also the
>> number of different versions etc. I do suspect that you're right and
>> probably most of them will be looking at much smaller disk space
>> requirements. The problem is that they need to be made aware of
>> exactly what kind of cost in disk space they'll pay for publishing.
>
> Hmm, I suspect we can give them some tools to make this easier and I
> think integrating with the cloud providers out there will make a
> difference.
>
> I should make two notes. The first is that this isn't something we
> have to address in version 0.0.1 and we should see if we can get some
> provider out there to provide some hosting. Not sure this is possible
> but it does not hurt to try. That is hosting for binary tarballs.

Yes you're right, we don't have to worry about it too much until later.

>
>>
>> I must admit that I'm quite worried that this will actually put
>> people off using it, and then we're back to where you were with
>> faxien two years ago. Great concept, not enough uptake from the
>> community, goes nowhere. I've asserted that the rabbit team do not
>> have a massive amount of code and that despite this, if they have a
>> lot of versions of everything then they'll be stuck on space. You've
>> asserted that they're not the norm and that most people don't have
>> to worry about it.
>
> I think we have to make it trivial to use and understand. That was one
> of the big problems with faxien. It was far from trivial to get
> started. I think by leveraging github we can do that.

Yes I think we'll have to start with Github - that's why I suggested it anyway.

>
> My assertion may or may not be correct. I assume it is correct but you
> never know.
>

I'm going to do some more analysis, but I think as you say we'll start with github and figure things out as we iterate towards 1.0.0.

>>
>> I'm going to go look at a sample group of github users who regularly
>> commit to Erlang source repositories. I'd like to see how they
>> compare to the rabbit team. I'm also going to go away and work out
>> exactly how many different artefacts rabbitmq really would have to
>> deal with.
>>
>> If we're not careful this will go nowhere. We're already pushing
>> back the responsibility for building valid artefacts for all
>> supported environments to the package maintainers - don't get me
>> wrong, I actually think this really is the right thing to do on
>> reflection. Now we're pushing back the responsibility for providing
>> disk space as well.
>
> We are, but hopefully we are pushing them back on using things they
> already use. I hope that makes the difference.

Possibly, if the disk space usage doesn't turn out to be exponential, the fact we're using github might be a plus.

>
>> That's understandable, given that none of us
>> wants to suck up having to deal with it, but I'm not sure if it'll
>> stick. And if we don't make the whole process/experience completely
>> seamless, then it definitely won't.
>
>
> seamless and easy to understand both. It can be seamless but if you
> dont have a freaking clue whats going on that will put you off as
> well.
>
>>
>>>
>>> Git exists to store metadata, not the actual tarball. The metadata
>>> contains a pointer to a url that is http accessable and has the
>>> tarball.
>>
>> I do agree that the index need not be stored along with the
>> tarball(s), and if this is the way we go then a URL is absolutely
>> the right way to address the artefact's location.
>>
>>>
>>> In the default case we use git and we upload automatically to the git
>>> download area when people publish. There is no problem with that. That
>>> will serve 95% of the population.
>>>
>>
>> You also have to merge the central index.
>
> I dont think there should be any such thing as a central index (if we
> are talking about the ether). We should provide an index, but there
> should never be a problem pointing to multiple indexes.

Yes ok 'central' is a poor choice of words.

>
> In any case, merging should be trivial. It's a merge request, and since
> things should only ever be added and not removed the actual merge
> should be straightforward.
>

I agree that because the indexes are additive (as you pointed out below) the work will be simple.

>> It will only serve 95% of
>> the population who're using github, and who are willing to have
>> their github disk usage taken up with this. I must admit I was
>> hoping the sizes would be much smaller and that we'd get away with
>> this better. Not that many people are uploading binary releases to
>> github you know, and I suspect it's primarily because they don't
>> want to overrun their quota on the free account. This is far less
>> the case with sourceforge, where 99% do have downloads available.
>
> fair enough. However, the url model lets us start with github
> (probably with our own projects) and branch out. We should be able to
> support anything that's referable via the net. That being the case
> having stuff in 3rd party places (github, s3, rackspace storage)
> should not be a problem.

Good, I think that will work and gives us the flexibility to grow as and when we need/want to.

>
>>> For folks who have non-normal space needs, riak, rabbit, etc. We can
>>> do a few things. We can support automatically pushing to a couple of
>>> other services, like s3 and dropbox whatever or we can plop out a
>>> tarball give, let them upload it somewhere (even their own servers)
>>> then publish the metadata via the normal tools. This should allow a
>>> lot more flexibility in the long term while allowing us to do the
>>> simple thing in the short term.
>>
>> I agree that we will need multiple repository implementations. At
>> work we have a very large volume of data that will need to be
>> stored, for which we currently rely on nexus and may continue to do
>> so in the future, as it provides <org>/<artefact>/<version> which is
>> sufficient for our needs as long as we munge the artefact name (in
>> nexus) to include all the os/arch/erts/etc crap so that we can have
>> multiples of the same app+version for different
>> environments. Obviously in the central index we'll need to state
>> only one app+version in our namespace - BTW I really don't like this
>> word, I think we should find another, less overloaded term - and use
>> the proper metadata to link off to the right place in nexus.
>
> Lets just call it organization. That should cover all the bases.
>

I think that's probably best too.

>>
>> I don't see github being any use for us, but it will hopefully do
>> for my personal projects.
>
> to your company, not to us 'erlware' working on this project?
>

Correct. For us we'll be fine as I don't anticipate we'll have too many projects, although I can see us having lots of minor releases. I wonder if we can get the .ez (or .zip) generating code to compress the archive a bit more ruthlessly. That's something we'll have to think about carefully.

>>
>>>
>>> I dont think there is really any reason why metadata and tarballs need
>>> to be stored together.
>>>
>>
>> Yes I'm ok with them being separate. On these points we definitely agree:
>>
>> 1. metadata and tarballs need not reside in the same place
>>
>> 2. the metadata in the central index should link to artefacts based
>> on a URL, which could be provided anywhere, by anything
>>
>> I do think that using the downloads area brings some issues with it,
>> versus uploading to a git repository using the github API. Firstly
>> you've got to deal with clashing names. Github will generate
>> <appname>-<tag-version-number> for any git tags people create (which
>> if they're bothered about versioning at all, they will be doing) so
>> you've got to generate different names for the binary
>> artefacts.
>
> This should not be a problem. Downloads via tags and downloads via
> downloads are at different URLs. That means that you can have the same
> <appname>-<tag> as a download and it causes no problem.
>

Ok, I didn't know that.

>
>> Secondly, we have to generate unique (and valid)
>> names. So for a driver based project, we're probably looking at
>> something like
>
> Why is that? The metadata is the canonical source for all of this
> information. The name/url is just a way for us to reference the right
> thing. Granted it would be nice if you could tell something about the
> tarball from its name, but that's not needed. When we generate a
> tarball we will probably want to do that, but we shouldn't require
> it. If they want to do something debianish and put all the arch
> information in the directory structure that's just fine. We shouldn't
> care.

This is not about the metadata, it's about the name of the file you're uploading to github, which uses S3 behind the scenes. If you've got 15 different flavours of myapp-1.0.2 for various operating systems and whatnot, then you're going to need to upload 15 different binaries, one for each combination of os/arch/etc. Surely github isn't going to let you upload the same file name 15 times!?

>
>>
>> erlxsl-0.5.6-64bit-Darwin-i686-erts5.8.2.zip
>> erlxsl-0.5.6-32bit-Darwin-i686-erts5.8.2.zip
>> etc....
>>
>> I guess I'm ok with this, as the alternatives would use folder
>> structures in the same way, and they're both really a sort of naming
>> convention. We need to make sure that we don't generate file names
>> that are unacceptable to github for any reason. And the final thing
>> is that we're tied into github, though I suppose any usage of the
>> github API means that we're tied in either way, so using the
>> downloads versus option 1 doesn't change that either way.
>>
>
> As I said, we don't actually care. The only property that matters to
> us is uniqueness of the full path. That is, one url points to one
> thing.
>

Ok that's fine, but my point is that if we're uploading to github (in the publication part of this) then we absolutely must care, because we cannot (I'm assuming) upload the same file name multiple times. And even if the user was uploading manually, they'd have to deal with this issue, which makes their life more complicated - which we've already agreed will put people off. Either we munge the filenames to make sure they're unique or we go for a git repository (per organisation) to put the binaries into and use folder structure for all the differences. Or we upload files with completely arbitrary names (using a UUID), but I think people won't want their download pages full of odd-looking file names like that.

>> I suppose the only other thing I've got on my mind here is that I
>> want to make sure it is easy for people to publish their own
>> repository index and for consumers to selectively choose it if they
>> want. I guess the easy way to do that would be.
>
> If the repository is a git repo it should be trivial for anyone to
> publish their own on github and the like.
>

Yep, I like this part.

>>
>> 1. always serve a repository index as a straight resource that you
>> can acquire with HTTP GET
>
> I see what you are saying. However, hosting a git repo is pretty
> trivial. Git supports a ton of protocols (including ssh and
> http), so I think we will get more bang for our short-term buck that way
> than trying to create something ourselves.
>

Yes, I do agree. Even if we decided not to go with github, I'd be tempted to use something pre-existing rather than build a custom solution. After all, our server-side behaviour consists almost entirely of just serving up the index(es) somewhere and something else (maybe somewhere else) serving up the artefacts.

>> 2. always link (from the index to the artefact) using an HTTP URL,
>> so downloading artefacts requires nothing more than HTTP GET
>
> I agree with this one.
>
>>
>> Then the only complex stuff that remains is merging all the
>> available indexes (that the user has configured the tools to look
>> for) and resolving the right dependencies,
>
> If its in git its just a merge, should be at least. If the merge fails
> we should inform the user that repo x is broken and they should remove
> it. Then we can try again. Thats one of the really nice things about
> using a DVCS.
>

Wow, so you're proposing that to merge the index we simply do a 'git merge' operation? I'm not clear on how this would work, especially in terms of additivity. How do you prevent incoming changes from removing data? Maybe I've misunderstood you here. I had assumed we would go for a really simple strategy where we do something like

1. for each dependency, for each index (that the user is configured to look at)
2. resolve the dependency or fail

Obviously you process the indexes sequentially, such that once you've resolved something you stop (and recurse into the transitive dependencies), but if you can't find it in index1, then you try index2, etc. I guess my thinking here is a lot more simplistic because I'm used to maven's approach of making you declare the version you require explicitly. It doesn't support this notion of '>= v1.2.3' and therefore you're not looking for the 'best match', you're just looking for the specific thing that's been asked for. If you can't find it via the organisation itself, you'd then consider 3rd party signed stuff if the user allows it (globally or via a whitelist).
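
In code, the strategy I had in mind is roughly this - a sketch only,
where index_api:lookup/2 and index_api:deps/1 stand in for whatever the
index API turns out to be:

%% Try each configured index in order; stop at the first match and
%% recurse into its transitive dependencies against the full index list.
resolve(Dep, Indexes) ->
    resolve(Dep, Indexes, Indexes).

resolve(Dep, [], _All) ->
    {error, {unresolved, Dep}};
resolve(Dep, [Index | Rest], All) ->
    case index_api:lookup(Index, Dep) of
        {ok, Artefact} ->
            {ok, {Artefact, [resolve(D, All, All) || D <- index_api:deps(Artefact)]}};
        not_found ->
            resolve(Dep, Rest, All)
    end.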

>> which hopefully is mostly 'there' already in terms of what sinan can
>> do, with the obvious caveats that you need to add support for
>> 'publisher' as well as app + version.
>
> This should be pretty simple. well, its not simple at all but its
> mostly already done. I have a few ideas for better error messages that
> I would probably work in as we moved forward.

Cool.

>>
>> To create my own index, I just upload an index file to github, or
>> create a local one and check it in as a git repository which will
>> have the same effect.
>
> I suspect we should put up a 'starter' repo that folks can fork
> too. Of course, they can fork the main repo as well. It shouldn't matter.
>

This is interesting. Based on what you've said, I'm kind of inferring that the index will have a directory structure rather than just be a simple file. I'm also assuming that the metadata will be human readable/editable. Is that what you're currently thinking?

>> Either way, it's very simple to do. Users can add my index to their
>> config (local to the project or machine wide? I'd prefer both
>> options) by hand or on the command line. Nice and simple.
>
> Yeah, if we are just consuming the repo we don't care at all where it
> is or where it's coming from.
>

That's good. What I was thinking here was that the user explicitly states which indexes they want to be searched when they're looking for something. It's basically the same thing as configuring apt (or whatever) to point to certain repos.

>>
>> We download an index from the internet somewhere and use this to
>> resolve a dependency graph. Once resolved, the nodes contain the
>> URLs for each artefact and we provide the downloading part as
>> well.
>
> Exactly, though we need to figure out the user experience here. I
> suspect the user should somehow indicate that he wants to pull
> something down.
>

I do like what maven does here. If I try to build something, it will go and verify the dependencies. If there are dependencies scoped to build/compile time, it will not execute that (build) phase unless they can be resolved. If there are dependencies scoped to test, then it will go ahead and compile but won't run any tests until the test scoped ones are resolved. If there are dependencies marked as some other scope - e.g., the concept of runtime scope that you have in sinan, which I'm guessing would be required for packaging and possibly other stuff like integration testing - then it does the same thing.

This is a bit sticky, because half of that is the build/test tool's responsibility (i.e., it is up to sinan/rebar to figure out when to enforce dependency resolution and what to do when it fails) and the other half depends on the way the dependencies are declared. I *really* want this so that I don't *have* to pull down PropEr unless I'm testing - this is particularly important if I'm assembling a big release with lots of components. I don't want all the various testing libraries that are transitive dependencies to get fetched. I just want the stuff I need to build the release. If I need things (like PropEr, Erlang-Hamcrest, etc) for testing all the components of the release, then I'll make them direct dependencies as and when I want them.

Now obviously the index metadata and artefacts themselves are completely ignorant of this, so perhaps it's just up to the tool developers to make sure that they expose the dependency configuration in such a way that they can capture this. If we leave this completely out of the tools for resolving/fetching (which we should) then I think we end up with an API that just takes a set of dependency specifications and resolves them. The build tools can pass each set of dependencies as is appropriate to the phase they're currently dealing with.
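
If we do leave it out, the API surface could be as small as something
like this - module, type and function names are all placeholders I have
made up, not a proposal:

%% Hand the solver whichever set of dependency specs matters for the
%% phase you are in (compile, test, release assembly, ...) plus the
%% indexes to consult, and get back concrete artefacts or an error.
-module(dep_solver).
-export([solve/2]).

-type dep_spec() :: {atom(), string()}                         %% {Name, Vsn}
                  | {atom(), gte | gt | lte | lt, string()}.   %% {Name, Op, Vsn}
-type artefact() :: {atom(), string(), string()}.              %% {Name, Vsn, Url}

-spec solve([dep_spec()], [file:filename()]) ->
          {ok, [artefact()]} | {error, term()}.
solve(DepSpecs, IndexDirs) ->
    dep_solver_impl:solve(DepSpecs, IndexDirs).   %% hypothetical backing module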

>> We should also support downloading multiple indexes and
>> searching through these.
>
> absolutely. You should be able to have any number of indexes. Since
> indexes are additive there should be no conflicts.
>

Yes that's good.

>> Now it just so happens that we also provide a mechanism to push the
>> index back up to the internet, and for that matter to push an
>> artefact somewhere to.
>
> Yes, but this should probably be part of the publishing experience
> rather than the consuming experience.
>

Yes it is.

>> I think making it easy for people to publish their own indexes
>> alongside the central index is a good idea,
>
> As I said, in the software I don't think there is any concept of a
> central index. We will provide one, and I think most folks will use
> it. But the stuff we create should be index-agnostic.
>

Yes, you're absolutely right, and my use of the word 'central' is an artefact of my time spent using maven and its central repository, which is also just one of hundreds.

Tim Watson

unread,
Feb 27, 2012, 5:41:37 AM2/27/12
to Tim Watson, erlware-...@googlegroups.com, Torben Hoffmann
Some other thoughts, inline...

So one of the things I've done now is compare my usage profile with that of the rabbitmq organisation. As a matter of interest, I decided to compare the volume of source code to the post-build output (in terms of artefacts). Rabbit have c. 103 MB of source code in their repositories (as a whole), versus my combined profiles, which come to around 3 GB (inclusive of private repositories and the like). So what I'm thinking I will do for my own profile is attempt to introduce all of my artefacts to a git repository on bitbucket, which (as a provider) appears not to have any hard or soft limits. If bitbucket allows me to do this then I'm happy enough, and we can consider the following two options for publication support:

1. upload the artefact to the github project's downloads section
2. commit the artefacts to a git repository (somewhere)

I will take the time out to support (2) if it proves necessary, as this is clearly an edge case for those users who have a larger volume of data to handle. Obviously this functionality isn't 'core' in terms of what the publisher should do, so I propose that we provide a mechanism that makes it easy for the publisher to support 'plugins' that provide access to alternative repository implementations at the back end. The things I'm asking for then are:

1. an API for publishing to a repository
2. a means to get the 'plugin' code on the code path easily
3. a means to make the publication tool(s) aware that I want to use the 'plugin'

Item (2) is probably best served by resolving the 'plugin' as you would an ordinary dependency; once the plugin is downloaded into the user's local repository, it should be added to the code path. As for item (3), I suspect a bit of simple (machine-local) configuration would suffice.
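
To make (1) concrete, I am picturing something behaviour-shaped - all of
the names below are invented, just to show the kind of contract I mean:

%% A back-end 'plugin' (github downloads, a plain git repo, nexus, s3, ...)
%% would implement these callbacks; the publisher resolves the plugin like
%% any other dependency and picks it based on local configuration.
-module(repo_publisher).

-callback init(Config :: [{atom(), term()}]) ->
    {ok, State :: term()} | {error, Reason :: term()}.

-callback publish_artefact(Path :: file:filename(),
                           Meta :: [{atom(), term()}],
                           State :: term()) ->
    {ok, Url :: string()} | {error, Reason :: term()}.

-callback publish_index(IndexDir :: file:filename(), State :: term()) ->
    ok | {error, Reason :: term()}.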

This mechanism will also allow me to write a maven/nexus repository provider, which I can use and test at work as well.

Eric B Merritt

unread,
Feb 27, 2012, 11:51:35 AM2/27/12
to erlware-...@googlegroups.com, Torben Hoffmann
At Sun, 26 Feb 2012 02:01:04 +0000,
Tim Watson wrote:

> >
> > Why is that? The metadata is the canonical source for all of this
> > information. The name/url is just a way for us to reference the right
> > thing. Granted it would be nice if you could tell something about the
> > tarball from its name, but that's not needed. When we generate a
> > tarball we will probably want to do that, but we shouldn't require
> > it. If they want to do something debianish and put all the arch
> > information in the directory structure that's just fine. We shouldn't
> > care.
>
> This is not about the metadata, it's about the name of the file
> you're uploading to github, which uses S3 behind the scenes.

I understand. All I am saying is that the names and the path need to
be unique nothing more. The metadata has all the relevant information
that we need. The tarball is just a blob that has to be identified in
some way.

It would be good if they have reasonable names, but they don't have
to. That is, of course, for the remote repos. It would probably be
different for the local repo, but renaming would happen there no
matter what.


> If you've got 15 different flavours of myapp-1.0.2 for various
> operating systems and whatnot, then you're going to need to upload
> 15 different binaries, one for each combination of
> os/arch/etc. Surely github isn't going to let you upload the same
> file name 15 times!?

No, not at all. But they just need to have different names.

>
> >
> >>
> >> erlxsl-0.5.6-64bit-Darwin-i686-erts5.8.2.zip
> >> erlxsl-0.5.6-32bit-Darwin-i686-erts5.8.2.zip
> >> etc....
> >>
> >> I guess I'm ok with this, as the alternatives would use folder
> >> structures in the same way, and they're both really a sort of naming
> >> convention. We need to make sure that we don't generate file names
> >> that are unacceptable to github for any reason. And the final thing
> >> is that we're tied into github, though I suppose any usage of the
> >> github API means that we're tied in either way, so using the
> >> downloads versus option 1 doesn't change that either way.
> >>
> >
> > As I said, we don't actually care. The only property that matters to
> > us is uniqueness of the full path. That is, one url points to one
> > thing.
> >
>
> Ok that's fine, but my point is that if we're uploading to github (in
> the publication part of this) then we absolutely must care, because
> we cannot (I'm assuming) upload the same file name multiple
> times. And even if the user was uploading manually, they'd have to
> deal with this issue, which makes their life more complicated -
> which we've already agreed will put people off. Either we munge the
> filenames to make sure they're unique or we go for a git repository
> (per organisation) to put the binaries into and use folder structure
> for all the differences. Or we upload files with completely
> arbitrary names (using a UUID), but I think people won't want their
> download pages full of odd looking file names like that.

I think we just rename when you are pushing up with a tool and rely on
the user to handle it themselves if they are doing it manually.

The tool should give them reasonable names, e.g.
<os>-<flavor>-<arch>-<erl-flavor>-<package-name>.tar.gz. That's pretty
standard. However, again we don't really care and we shouldn't enforce it.

Hmm, let me step back and try to explain this better.

1. All important information is in the metadata
2. The file is just a thing to hold a blob of binary data that we need
3. We don't want to force more work on the user than we need to, or
force them to change if they don't need to.

The only property we require from a url is uniqueness. That is, a
url for a binary only ever points to that one binary.

Things we will encourage, and support in the tools:

We can support decent identifying names in the publishing tools, but
that doesn't break the above principles.
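
For what it is worth, the default name could be derived from the local
system along these lines - a sketch, the exact fields and ordering are
not settled:

%% Sketch of a default artefact name for the publishing tool.
default_name(Name, Vsn) ->
    {Family, Os} = os:type(),                        %% e.g. {unix, linux}
    Arch = erlang:system_info(system_architecture),  %% e.g. "x86_64-unknown-linux-gnu"
    Erts = erlang:system_info(version),              %% e.g. "5.8.2"
    lists:flatten(io_lib:format("~s-~s-~s-erts~s-~s-~s.tar.gz",
                                [Family, Os, Arch, Erts, Name, Vsn])).

Ugly names, but unique, and nothing forces anyone to use them.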

> >
> > I see what you are saying. However, hosting a git repo is pretty
> > trivial. Git supports a ton of protocols (including ssh and
> > http), so I think we will get more bang for our short-term buck that way
> > than trying to create something ourselves.
> >
>
> Yes I do agree. Even if we decided not to go with github, I'd be
> tempted to use something pre-existing rather than build a custom
> solution. After all, our server side behaviour consists almost
> entirely of just serving up the index(es) somewhere and something
> else (maybe somewhere else) serving up the artefacts.

We are on the same page here.

> > If its in git its just a merge, should be at least. If the merge fails
> > we should inform the user that repo x is broken and they should remove
> > it. Then we can try again. Thats one of the really nice things about
> > using a DVCS.
> >
>
> Wow, so you're proposing that to merge the index we simply do a 'git
> merge' operation? I'm not clear on how this would work, especially
> in terms of additivity. How do you prevent incoming changes from
> removing data? Maybe I've misunderstood you here. I had assumed we
> would go for a really simple strategy where we do something like
>
> 1. for each dependency, for each index (that the user is configured to look at)

So it's an optimization, but why look at multiple indexes when you don't
have to? It's pretty easy to make sure things are additive before the
merge. If nothing else you could look at the diff and make sure no
line starts with a '-'. Simplistic, but it would work, and
having one merged index would make it much simpler for a user to
actually look at and understand.
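
The check really could be that dumb. Something like this over the
output of 'git diff', ignoring the '---' file headers:

%% Naive additivity check over a unified diff given as a string.
is_additive(Diff) ->
    Lines = string:tokens(Diff, "\n"),
    not lists:any(fun is_removal/1, Lines).

is_removal([$-, $-, $- | _]) -> false;   %% "--- a/..." file header, not a removal
is_removal([$- | _])         -> true;    %% a removed line
is_removal(_)                -> false.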

> 2. resolve the dependency or fail

True in any case :) though it's a bit more subtle than that.



> Obviously you process the indexes sequentially, such that once
> you've resolved something you stop (and recurse into the transitive
> dependencies) but if you can't find it in index1, then you try
> index2, etc. I guess my thinking here is a lot more simplistic
> because I'm used to maven's approach of making you declare the
> version you require explicitly. It doesn't support this notion of
> '>= v1.2.3' and therefore you're not looking for the 'best match'
> you're just looking for the specific thing that's been asked for. If
> you can't find it via the organisation itself, you'd then consider
> 3rd party signed stuff if the user allows it (globally or via a
> whitelist).

When you are trying to do the solving you actually want the index to
be as simple as possible. With this approach it's really hard to give
the user exact information. If they have a non-trivial conflict they
will end up going down the index trying to figure out where the
conflict is. In that situation having multiple indexes becomes much more
painful.

I am not opposed to either route, I was just assuming that having one
on-disk store of knowledge was simpler. That may be an invalid
assumption on my part.

> >>
> >> To create my own index, I just upload an index file to github, or
> >> create a local one and check it in as a git repository which will
> >> have the same effect.
> >
> > I suspect we should put up a 'starter' repo that folks can fork
> > too. Of course, they can fork the main repo as well. It shouldn't matter.
> >
>
> This is interesting. Based on what you've said, I'm kind of
> inferring that the index will have a directory structure rather than
> just be a simple file. I'm also assuming that the metadata will be
> human readable/editable. Is that what you're currently thinking?

That's how it has been forming up in my head.

> >>
> >> We download an index from the internet somewhere and use this to
> >> resolve a dependency graph. Once resolved, the nodes contain the
> >> URLs for each artefact and we provide the downloading part as
> >> well.
> >
> > Exactly, though we need to figure out the user experience here. I
> > suspect the user should somehow indicate that he wants to pull
> > something down.
> >
>
> I do like what maven does here. If I try to build something, it will
> go and verify the dependencies. If there are dependencies scoped to
> build/compile time, it will not execute that (build) phase unless
> they can be resolved. If there are dependencies scoped to test, then
> it will go ahead and compile but won't run any tests until the test
> scoped ones are resolved. If there are dependencies marked as some
> other scope - e.g., the concept of runtime scope that you have in
> sinan, which I'm guessing would be required for packaging and
> possibly other stuff like integration testing - then it does the
> same thing.


So remember, we are thinking about these as different tools. There will
probably be an 'over tool' that strings things together. However, the
fetcher, the resolver and the builder are three possibly different
things.

>
> This is a bit sticky, because half of that is the build/test tool's
> responsibility (i.e., it is up to sinan/rebar to figure out when to
> enforce dependency resolution and what to do when it fails) and the
> other half depends on the way the dependencies are declared.

Well, sinan has the idea of 'excluding' dependencies. Basically, for
the purposes of resolution you ignore an app. That allows things to
progress when there is a conflict you can't fix.

However, in the case of compile-time vs runtime dependencies, those
should be two different resolution attempts in some way, I
think. Though in almost every case runtime dependencies are a subset
of compile-time dependencies.


> I *really* want this so that I don't *have* to pull down PropEr
> unless I'm testing - this is particularly important if I'm
> assembling a big release with lots of components. I don't want all
> the various testing libraries that are transitive dependencies to
> get fetched. I just want the stuff I need to build the release. If I
> need things (like PropEr, Erlang-Hamcrest, etc) for testing all the
> components of the release, then I'll make them direct dependencies
> as and when I want them.
>
> Now obviously the index metadata and artefacts themselves are
> completely ignorant of this, so perhaps it's just up to the tool
> developers to make sure that they expose the dependency
> configuration in such a way that they can capture this. If we leave
> this completely out of the tools for resolving/fetching (which we
> should) then I think we end up with an API that just takes a set of
> dependency specifications and resolves them. The build tools can
> pass each set of dependencies as is appropriate to the phase they're
> currently dealing with.


Well, it sounds like we need runtime, compile-time, and test-time
dependencies. Though I think in most cases test-time and compile-time
are one and the same. In any case, it should be pretty doable all
around. We just need to come up with the way to tell the solver what
the starting deps are in each case.
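
Purely as a sketch, the project config could carry something like the
following and the 'over tool' would hand the solver a different
starting set per phase - names and shape here are placeholders, not a
proposal for the final format:

{dependencies, [
    {runtime, [{my_dep, "1.0.0"}]},
    {compile, [{my_parse_tool, gte, "0.2.0"}]},
    {test,    [{proper, gte, "1.0"}, {hamcrest, "0.1.0"}]}
]}.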


Eric B Merritt

unread,
Feb 27, 2012, 11:53:36 AM2/27/12
to erlware-...@googlegroups.com, Tim Watson, Torben Hoffmann
At Mon, 27 Feb 2012 10:41:37 +0000,

Agreed on the above.

Tim Watson

unread,
Feb 27, 2012, 12:44:46 PM2/27/12
to Tim Watson, erlware-...@googlegroups.com, Torben Hoffmann
I've uploaded a bunch more binary artefacts into my bitbucket repository (ignore the metadata files there - I was just messing around before we decided to collaborate on this) and because there are no releases, the repo is still tiny. I haven't uploaded any native stuff yet (I'll try that tonight) but even with multiple versions, I don't think there will be an issue. Releases, though, are another thing altogether.

After all this time, I think I finally understand why faxien had packages for specific OTP releases, which was always something I struggled to grok previously. My concerns about disk space are, I realise, based on my assumptions about the way releases are packaged. When rebar produces a release, it builds an embedded system with the whole erts runtime (i.e., all the libraries, etc) embedded in the release folder. This makes for quite a large tarball, so multiple versions of multiple releases makes up the bulk of my disk space usage.

So the question is, how should release packaging work? My worries about the faxien method (of providing me with Erlang/OTP from the Erlware repository) really goes away once we have proper code signing in place. I'll happily trust a binary distribution that has come from a trusted source, or use an internally signed one if I'm really concerned. So I could take R15 from an internal BT repository, or from Erlware or from Erlang Solutions, as long as it is signed.

With that concern (about trust) out of the way, how *should* releases be packaged then? My assumption is that building an embedded system works for many people, but uploading the same (erts) binaries multiple times for each release (or product even) seems a bit crazy - I get why faxien did things differently now I think.

Do we need some multiple packaging types so that you can choose to upload/download

1. just an application
2. a release minus the OTP runtime and libraries - 'release'
3. a release combined with the OTP runtime and libraries - 'release-embedded'

What would be *really* nice is if I can upload one of (2) but download one of (3), i.e., the client program fetches the release (2) and the additional libraries required for it to work as an embedded system (3) and combines them before putting it into the local repository.

I guess what I'm thinking about is that at work, we take the binary artefacts from nexus when we're doing an automated deployment. In the case of Java applications, these are jar/war/ear files that get deployed into a container (such as Tomcat or Weblogic) and we don't redistribute the JVM each time (although we do 'install' the JVM from a separate artefact in nexus the first time the server is automatically built) but for Erlang applications (and a few other kinds of deployable) we do actually store the whole embedded system (runtime libraries and all) in nexus. We do this because disk space isn't an issue, but I wonder about the waste and whether there is a better way.

Anyway, sorry for dragging this conversation out so much - I think I get how this is going to work now. Am very interested to hear your thoughts about releases.

Cheers,

Tim

Eric B Merritt

unread,
Feb 27, 2012, 2:35:57 PM2/27/12
to erlware-...@googlegroups.com, Tim Watson, Torben Hoffmann
At Mon, 27 Feb 2012 17:44:46 +0000,
Tim Watson wrote:
>
> snip ...

>
> After all this time, I think I finally understand why faxien had
> packages for specific OTP releases, which was always something I
> struggled to grok previously. My concerns about disk space are, I
> realise, based on my assumptions about the way releases are
> packaged. When rebar produces a release, it builds an embedded
> system with the whole erts runtime (i.e., all the libraries, etc)
> embedded in the release folder. This makes for quite a large
> tarball, so multiple versions of multiple releases makes up the bulk
> of my disk space usage.

I have been thinking about this. I think the only things we store from
a release are the non-otp apps. That is the config, the rel/boot/etc
scripts and nothing more. That presupposes that all OTP apps are
published for the release, however, I think that should be doable.


> So the question is, how should release packaging work? My worries
> about the faxien method (of providing me with Erlang/OTP from the
> Erlware repository) really goes away once we have proper code
> signing in place. I'll happily trust a binary distribution that has
> come from a trusted source, or use an internally signed one if I'm
> really concerned. So I could take R15 from an internal BT
> repository, or from Erlware or from Erlang Solutions, as long as it
> is signed.

I would really like to avoid having erts in the repo. That was a
massive pain in the butt for the faxien repo. I don't think it would be
too much to ask that people have Erlang in place on disk already. This
would work the same way that gems, or npms, or any other
language-specific package manager works.

>
> With that concern (about trust) out of the way, how *should*
> releases be packaged then? My assumption is that building an
> embedded system works for many people, but uploading the same (erts)
> binaries multiple times for each release (or product even) seems a
> bit crazy - I get why faxien did things differently now I think.

We should have a tool that assembles it when you ask for a
release. It pulls the release metadata from the repo, along with any
applications it needs, and combines that with the correct version of
the local erts.

Things I think we will probably need (a rough sketch follows this list):

* A way to give a range of erts versions in the same way we give a
range of app deps.

* A way to hold on to organization information in the release
metadata.
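
A rough sketch of what that release metadata might carry - every name
and key below is provisional:

%% Provisional release entry in the index; purely illustrative.
{release, my_release, "1.2.0",
 [{organization, "erlware"},
  {erts, [{gte, "5.8.2"}, {lt, "5.10"}]},   %% a range, like app deps
  {apps, [{my_app, "0.1.1"}, {kernel, "2.14.4"}, {stdlib, "1.17.4"}]}
 ]}.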


>
> Do we need some multiple packaging types so that you can choose to upload/download
>
> 1. just an application
> 2. a release minus the OTP runtime and libraries - 'release'
> 3. a release combined with the OTP runtime and libraries - 'release-embedded'

I think we have two types: OTP applications and OTP releases. If you
publish a release, the tool automatically pulls it apart and publishes
it in the right way. The release assembler should be able to assemble any
embedded release you might want to have.


> What would be *really* nice is if I can upload one of (2) but
> download one of (3), i.e., the client program fetches the release
> (2) and the additional libraries required for it to work as an
> embedded system (3) and combines them before putting it into the
> local repository.

I think we do 2 and the assembler or packager can do the packaging in
a correct way.

>
> I guess what i'm thinking about is that at work, we take the binary
> artefacts from nexus when we're doing an automated deployment. In
> the case of java applications, these are jar/war/ear files that get
> deployed into a container (such as Tomcat or Weblogic) and we don't
> redistribute the JVM each time (although we do 'install' the JVM
> from a separate artefact in nexus the first time the server is
> automatically built) but for Erlang applications (and a few other
> kinds of deployable) we do actually store the whole embedded system
> (runtime libraries and all) in nexus. We do this because disk space
> isn't an issue, but I wonder about the waste and whether there is a
> better way.

I don't think we should actually store that. Well, we kind of do, but no
reason to store redundant information.

>
> Anyway, sorry for dragging this conversation out so much - I think I
> get how this is going to work now. Am very interested to hear your
> thoughts about releases.


No, it's all good. It's much cheaper to talk about this stuff before the
fact than after, and I would like to have it well thought out, then blog
about it on the erlware blog and try to get some feedback from the
community.

BTW, I am not fixed on any of this stuff at all; I am just putting out
how I am thinking now.

(As an aside, I am a bit worried about maintaining organization in OTP
metadata.)

Tim Watson

unread,
Feb 27, 2012, 2:59:53 PM2/27/12
to Eric B Merritt, erlware-...@googlegroups.com, Torben Hoffmann

On 27 Feb 2012, at 19:35, Eric B Merritt wrote:

> At Mon, 27 Feb 2012 17:44:46 +0000,
> Tim Watson wrote:
>>
>> snip ...
>>
>> After all this time, I think I finally understand why faxien had
>> packages for specific OTP releases, which was always something I
>> struggled to grok previously. My concerns about disk space are, I
>> realise, based on my assumptions about the way releases are
>> packaged. When rebar produces a release, it builds an embedded
>> system with the whole erts runtime (i.e., all the libraries, etc)
>> embedded in the release folder. This makes for quite a large
>> tarball, so multiple versions of multiple releases makes up the bulk
>> of my disk space usage.
>
> I have been thinking about this. I think the only things we store from
> a release are the non-otp apps. That is the config, the rel/boot/etc
> scripts and nothing more. That presupposes that all OTP apps are
> published for the release, however, I think that should be doable.
>
>

Ok I'm happy with this approach.

>> So the question is, how should release packaging work? My worries
>> about the faxien method (of providing me with Erlang/OTP from the
>> Erlware repository) really goes away once we have proper code
>> signing in place. I'll happily trust a binary distribution that has
>> come from a trusted source, or use an internally signed one if I'm
>> really concerned. So I could take R15 from an internal BT
>> repository, or from Erlware or from Erlang Solutions, as long as it
>> is signed.
>
> I would really like to avoid having erts in the repo. That was a
> massive pain in the butt for the faxien repo. I don't think it would be
> too much to ask that people have erlang in place on disk already. This
> would work the same way that gems, or npms or any other language
> specific package manager works.
>

Yes that's very true.

>>
>> With that concern (about trust) out of the way, how *should*
>> releases be packaged then? My assumption is that building an
>> embedded system works for many people, but uploading the same (erts)
>> binaries multiple times for each release (or product even) seems a
>> bit crazy - I get why faxien did things differently now I think.
>
> We should have a tool that assembles it when you ask for a
> release. Pulls the release metadata from the repo, along with any
> applications it needs and combines that with the correct version of
> the local erts.
>

I like this idea in general. What I think could work is allowing the user to publish only the metadata and scripts for a release as you say. When they want to install it locally, we do the assembly on the local machine. I do think we should give people the ability to define a release in one of two ways

1. embedded system - when we assemble the release (locally) we embed the whole runtime system
2. 'non-embedded' system - relies on the runtime ($ERL_TOP) being pre-installed on the machine(s) it will run on (the user defines the location of $ERL_TOP or we use the environment variable)

That way people can do it both ways, depending on what suits their target environment. What do you think?
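
However the choice ends up being expressed, the assembly step could key
off it with something as simple as this - the function and option names
are invented and rel_assembler:build/3 is a stand-in for the real
assembler:

%% Two assembly modes for the same published release metadata.
assemble(RelName, Vsn, embedded) ->
    %% copy the locally installed erts and OTP libs into the assembled release
    rel_assembler:build(RelName, Vsn, [{runtime, code:root_dir()}]);
assemble(RelName, Vsn, local_runtime) ->
    %% leave the runtime out; the target machine's $ERL_TOP is used instead
    rel_assembler:build(RelName, Vsn, [{runtime, os:getenv("ERL_TOP")}]).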

> Things I think we will probably need
>
> * A way to give a range of erts versions in the same way we give a
> range of app deps.
>

Ok that seems reasonable.

> * A way to hold on to organization information in the release
> metadata.
>

+1 - we should definitely do this somehow.

>
>>
>> Do we need some multiple packaging types so that you can choose to upload/download
>>
>> 1. just an application
>> 2. a release minus the OTP runtime and libraries - 'release'
>> 3. a release combined with the OTP runtime and libraries - 'release-embedded'
>
> I think we have two types: OTP applications and OTP releases. If you
> publish a release, the tool automatically pulls it apart and publishes
> it in the right way. The release assembler should be able to assemble any
> embedded release you might want to have.
>
>

I'm definitely sold on this being the best approach.

>> What would be *really* nice is if I can upload one of (2) but
>> download one of (3), i.e., the client program fetches the release
>> (2) and the additional libraries required for it to work as an
>> embedded system (3) and combines them before putting it into the
>> local repository.
>
> I think we do 2 and the assembler or packager can do the packaging in
> a correct way.
>
>>
>> I guess what i'm thinking about is that at work, we take the binary
>> artefacts from nexus when we're doing an automated deployment. In
>> the case of java applications, these are jar/war/ear files that get
>> deployed into a container (such as Tomcat or Weblogic) and we don't
>> redistribute the JVM each time (although we do 'install' the JVM
>> from a separate artefact in nexus the first time the server is
>> automatically built) but for Erlang applications (and a few other
>> kinds of deployable) we do actually store the whole embedded system
>> (runtime libraries and all) in nexus. We do this because disk space
>> isn't an issue, but I wonder about the waste and whether there is a
>> better way.
>
> I dont think we should actually store that. Well we kind of do, but no
> reason to store redundant information.
>

Yes agreed.

>>
>> Anyway, sorry for dragging this conversation out so much - I think I
>> get how this is going to work now. Am very interested to hear your
>> thoughts about releases.
>
>
> No, it's all good. It's much cheaper to talk about this stuff before the
> fact than after, and I would like to have it well thought out, then blog
> about it on the erlware blog and try to get some feedback from the
> community.
>

I think that's absolutely the right way to go. I think we'll see quite a positive response on this.

> btw, I am not fixed on any of this stuff at all I am just putting out
> how i am thinking now.
>

Fair enough. As you've intimated, it'll be feedback from the community at large that will help shape the final outcome anyway.

> (as an aside i am a bit worried about maintaining organization in otp
> metadata)
>

Isn't it just a bit of additional 'env' metadata? Or a 'term config' file in ./priv or something? I think it only needs to be 'present' - we don't actually use it at runtime anyway.
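
Concretely, it might be as little as this - app name and key are
placeholders:

%% Organization carried as plain 'env' metadata in the .app file.
{application, my_app,
 [{description, "An example application"},
  {vsn, "0.1.1"},
  {applications, [kernel, stdlib]},
  {env, [{organization, "erlware"}]}
 ]}.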

Eric B Merritt

unread,
Feb 28, 2012, 11:09:08 AM2/28/12
to Tim Watson, Eric B Merritt, erlware-...@googlegroups.com, Torben Hoffmann
At Mon, 27 Feb 2012 19:59:53 +0000,

Tim Watson wrote:
>
>
> >>
> >> With that concern (about trust) out of the way, how *should*
> >> releases be packaged then? My assumption is that building an
> >> embedded system works for many people, but uploading the same (erts)
> >> binaries multiple times for each release (or product even) seems a
> >> bit crazy - I get why faxien did things differently now I think.
> >
> > We should have a tool that assembles it when you ask for a
> > release. Pulls the release metadata from the repo, along with any
> > applications it needs and combines that with the correct version of
> > the local erts.
> >
>
> I like this idea in general. What I think could work is allowing the
> user to publish only the metadata and scripts for a release as you
> say. When they want to install it locally, we do the assembly on the
> local machine. I do think we should give people the ability to
> define a release in one of two ways
>
> 1. embedded system - when we assemble the release (locally) we embed the whole runtime system
> 2. 'non-embedded' system - relies on the runtime ($ERL_TOP) being pre-installed on the machine(s) it will run on (the user defines the location of $ERL_TOP or we use the environment variable)
>
> That way people can do it both ways, depending on what suits their target environment. What do you think?

I think that works. It should be up to the bundler - the consumer - to
specify that, though, not the release creator.

> > (as an aside i am a bit worried about maintaining organization in otp
> > metadata)
> >
>
> Isn't it just a bit of additional 'env' metadata? Or a 'term config' file
> in ./priv or something? I think it only needs to be 'present' - we
> don't actually use it at runtime anyway.

That might be the best way to do it. However, I don't want to pollute
the app too much with additional metadata artifacts. For the OTP apps
I am very sure we can add tuples to the tuple list there without
causing problems. For the release it's more problematic. Maybe an
additional file in with the release metadata, or in the comments of the
release, though I probably like that even less.

The thing I worry about is just making sure that it's kept around
through the entire process. That is, for each dependency we always know
the organization name it comes from. I think this might fall out when
we start talking about metadata in more detail.

I am going to make a wiki page on the erlware github project so we can
get this written up in some reasonable way and have a point of
collaboration. I think the first steps are to describe what we have
come up with now and get some feedback on it.

Tim Watson

unread,
Feb 28, 2012, 11:28:38 AM2/28/12
to Eric B Merritt, erlware-...@googlegroups.com, Torben Hoffmann
On 28 Feb 2012, at 16:09, Eric B Merritt wrote:

> At Mon, 27 Feb 2012 19:59:53 +0000,
> Tim Watson wrote:
>>
>>
>>>>
>>>> With that concern (about trust) out of the way, how *should*
>>>> releases be packaged then? My assumption is that building an
>>>> embedded system works for many people, but uploading the same (erts)
>>>> binaries multiple times for each release (or product even) seems a
>>>> bit crazy - I get why faxien did things differently now I think.
>>>
>>> We should have a tool that assembles it when you ask for a
>>> release. Pulls the release metadata from the repo, along with any
>>> applications it needs and combines that with the correct version of
>>> the local erts.
>>>
>>
>> I like this idea in general. What I think could work is allowing the
>> user to publish only the metadata and scripts for a release as you
>> say. When they want to install it locally, we do the assembly on the
>> local machine. I do think we should give people the ability to
>> define a release in one of two ways
>>
>> 1. embedded system - when we assemble the release (locally) we embed the whole runtime system
>> 2. 'non-embedded' system - relies on the runtime ($ERL_TOP) being pre-installed on the machine(s) it will run on (the user defines the location of $ERL_TOP or we use the environment variable)
>>
>> That way people can do it both ways, depending on what suits their target environment. What do you think?
>
> I think that works. It should be up to the bundler, the consumer, to
> specify that though not the release creator.
>

Agreed.

>
>>> (as an aside i am a bit worried about maintaining organization in otp
>>> metadata)
>>>
>>
>> Isn't it just a bit of additional 'env' metadata? Or a 'term config' file
>> in ./priv or something? I think it only needs to be 'present' - we
>> don't actually use it at runtime anyway.
>
> That might be the best way to do it. However, I dont want to pollute
> the app too much with additional metadata artifacts. For the OTP Apps
> I am very sure we can add tuples to the tuple list there without
> causing problems. For the release its more problematic. Maybe an
> additional file in with the release metadata or in the comments of the
> release though I probably like that even less.
>
> The thing I worry about is just making sure that its kept around
> through the entire process. That is for each dependency we always know
> the organization name it comes from. I think this might fall out when
> we start talking about metadata in more detail.
>
>
> I am going to make a wiki page on the erlware github project so we can
> get this written up in some reasonable way and have a point of
> collaboration. I think the first steps are to describe what we have
> come up with now and get some feedback on it.

Sounds good to me.

>
