metadata repos (one or many)


Doug Tangren

Feb 22, 2013, 12:55:13 AM2/22/13
to adep...@googlegroups.com
Still wrapping my head around Mark's brain dump ( https://github.com/sbt/adept/wiki/NEScala-Proposal ) and am currently thinking about the metadata side of things.

I'm totally for the split between metadata repo and artifact repo. +1 on that. 

What I'm not clear on, though, is whether the thought was for one metadata repo that others push changes to, or many metadata repos. I feel like having one uber metadata repo containing metadata about every library adds a bit of friction to publishing and management. It introduces questions like who can push and who manages/oversees those pushes and pulls. It creates a sense of stress for new authors who are all ready to go but need access to this "blessed" publishing circle. It took months for me to get blessed into scala-tools, thankfully less for Sonatype (though it felt like a 1000-step process). I believe that in order to have a flourishing community of Scala libraries you want to remove that kind of friction. 

So then I started thinking about many metadata repos and what that means. 

When thinking about a frictionless model for hosting metadata for implicit.ly/ls that still guarantees authenticity, I came up with a scheme where the user used a tool (sbt) to serialize metadata within their project's repo, committed it, pushed it to GitHub, then told the ls service to synchronize with this hosted version on GitHub. This system has some of the properties Mark outlined as pluses, using DVCS to handle many tasks out of the box. The metadata is now versioned and hosted, and authentication is handled via implicit knowledge of push access to a given repo. 

I was wondering if this could be expanded to fit Mark's vision of a local repo of metadata. What if publishing meant storing metadata in a specific location ( or branch ) of an author-owned git repo and pushing to a remote like GitHub? Then, instead of telling a remote service to sync with that repo, you'd just register your git URL with it once. Adept could grab git URLs for repos containing metadata from that service, clone them, and then just git pull to fetch changes from the locally cloned repos. This is kind of like how bower ( http://twitter.github.com/bower/ ) works: the bower service basically just tracks names and git repos.
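A minimal sketch of that bower-style model, under stated assumptions (all class and method names here are invented; adept has no such API): the registry service only needs to map names to author-owned git URLs, and a resolver only needs to decide between an initial clone and a subsequent pull:

```java
// Hypothetical sketch of a bower-style metadata registry. The service only
// tracks project names and git URLs; a resolver clones a repo the first time
// it sees it and just pulls afterwards. Illustrative only, not a real API.
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class MetadataRegistry {
    private final Map<String, String> entries = new HashMap<>();

    // An author registers their metadata repo's git URL exactly once.
    public void register(String name, String gitUrl) {
        entries.put(name, gitUrl);
    }

    public Optional<String> lookup(String name) {
        return Optional.ofNullable(entries.get(name));
    }

    // The git command a resolver would run for a project: a clone the first
    // time, a pull once a local copy already exists in the cache directory.
    public Optional<String> syncCommand(String name, File cacheDir) {
        return lookup(name).map(url -> {
            File local = new File(cacheDir, name);
            return local.isDirectory()
                    ? "git -C " + local.getPath() + " pull"
                    : "git clone " + url + " " + local.getPath();
        });
    }
}
```

The point of the sketch is how little state the central service holds: names and URLs, nothing else, with authenticity delegated to push access on the registered repo.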

Thoughts on one uber repo vs many author managed repos?

-Doug Tangren
http://lessis.me

Josh Suereth

Feb 22, 2013, 7:16:39 AM2/22/13
to Doug Tangren, adep...@googlegroups.com

I've been talking with the "bintray" folks; bintray is JFrog's "github for binaries". I think we should try to pair with this model. We can make user-hosted and uber repos simple *and* social.

Uber repositories can grow organically with contributions from any person or community. It's also simple to request your repo be added to an uber repo in bintray.

Then again, there's always user pain if you don't have good default repositories.  Still, I look at these new tools/services and think we should leverage what will be.

Doug Tangren

Feb 22, 2013, 8:50:59 AM2/22/13
to Josh Suereth, adep...@googlegroups.com
On Fri, Feb 22, 2013 at 7:16 AM, Josh Suereth <joshua....@gmail.com> wrote:

I've been talking with the "bintray" folks; bintray is JFrog's "github for binaries". I think we should try to pair with this model. We can make user-hosted and uber repos simple *and* social.

Uber repositories can grow organically with contributions from any person or community. It's also simple to request your repo be added to an uber repo in bintray.

Then again, there's always user pain if you don't have good default repositories.  Still, I look at these new tools/services and think we should leverage what will be.


I just looked at bintray ( https://bintray.com/beta ). Besides the robot (I do like robots :) ), it's not very clear from their site exactly what the "new way" is. Can you be a good about page and explain what their service does exactly? I don't know what it means to be able to "fork" a jar :)

To be clear, in this thread I'm talking about repositories for metadata: a place where I can download a descriptor for a project or projects, not jars. 

I'm wondering if project metadata makes more sense living closer to the project itself, maybe in a branch of the repo hosting the source code, etc., to make it easy for the library author to access and modify locally. With an uber repo of metadata, I think you'd need someone to make sure edits by library authors are limited to what they should be able to edit, or you'd have to introduce some maintenance step where the developer submits a request to include or update metadata and a manager approves and commits it. I'm not a huge fan of this, but I feel like it would be required. 

Wes Freeman

Feb 22, 2013, 9:58:51 AM2/22/13
to Doug Tangren, Josh Suereth, adept-dev
Yeah, initially from Mark's talk I imagined something like pointing to a github project in sbt/adept:

metadataResolver += "https://github.com/dispatch/reboot.git" % "metadata"

And then having a branch named "metadata" where you'd keep the metadata, including a list of pointers to the jars. The whole branch could be maintained programmatically with an sbt plugin or something... for people who don't want to actually type commands for adept (easier to adopt this way). And everyone already has a git repository for their project, right? :P

sbt adept-generate

which would read the sbt build files and build the metadata in that branch. Maybe allow the branch to be specified, with the default being "metadata"?

Using github/bitbucket to host the metadata seems like an awesome idea. A central repository could still clone those repos and host mirrors if desired... and there could be a default metadata resolver like maven central, but it would just be a big collection of .git files.
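A rough model of the resolver idea above, with hypothetical names (neither `MetadataResolver` nor `fetchCommand` exists in sbt or adept): pairing a git URL with a branch name and fetching it as a single-branch clone would keep the download limited to the metadata branch, not the whole source history:

```java
// Illustrative model of a metadata resolver entry: a git URL plus a branch,
// fetched via a single-branch clone. Names are hypothetical, not a real API.
public class MetadataResolver {
    private final String gitUrl;
    private final String branch;

    public MetadataResolver(String gitUrl, String branch) {
        this.gitUrl = gitUrl;
        this.branch = branch;
    }

    // Default to a branch named "metadata", mirroring the suggested convention.
    public MetadataResolver(String gitUrl) {
        this(gitUrl, "metadata");
    }

    public String branch() {
        return branch;
    }

    // The git invocation a tool could run to fetch only the metadata branch.
    public String fetchCommand(String targetDir) {
        return "git clone --branch " + branch + " --single-branch "
                + gitUrl + " " + targetDir;
    }
}
```

The `--single-branch` flag is what makes this cheap: the resolver never downloads the project's source history, only the metadata branch it was pointed at.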

Just thinking aloud here. 

Wes

eugene yokota

Feb 22, 2013, 12:02:26 PM2/22/13
to Doug Tangren, adep...@googlegroups.com
I think I like having the metadata in a centralized place. Maybe not just one, but not 100 GitHub repos either.
Self-hosting metadata sounds similar to the attempts to host a maven repo using GitHub pages.
When it works, it works, but it's kind of flaky, and you're at the mercy of the author not killing the files, etc.
Plus I think you might lose some of the goodness that comes from metadata tracked in version control (locking, syncing, going back in time, forking?, etc.).

If the authentication process is decentralized using GitHub (see the orgid thread), you could get the best of both worlds.
Just sign the metadata using PGP and post it to the central repo. It could automatically accept the pull request as long as the pubkey matches the paired orgid.
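A toy sketch of that auto-accept rule, with all names invented and real PGP verification stubbed out by bare fingerprint strings:

```java
// Toy model of the auto-accept rule: the central repo pairs each orgid with a
// PGP public key fingerprint and accepts a metadata pull request only when the
// submitting signature's key matches. Real signature verification (e.g. via
// Bouncy Castle) is stubbed out; fingerprint strings stand in for key checks.
import java.util.HashMap;
import java.util.Map;

public class CentralGatekeeper {
    // orgid -> registered PGP key fingerprint
    private final Map<String, String> orgKeys = new HashMap<>();

    public void registerOrg(String orgid, String fingerprint) {
        orgKeys.put(orgid, fingerprint);
    }

    // A pull request carries the orgid it claims plus the fingerprint of the
    // key that signed its metadata; accept only on an exact pairing.
    public boolean autoAccept(String orgid, String signingFingerprint) {
        return signingFingerprint.equals(orgKeys.get(orgid));
    }
}
```

The appeal of this shape is that the central repo stays unattended: no human gatekeeper, just a one-time orgid/key pairing.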

-eugene

Wes Freeman

Feb 22, 2013, 12:38:40 PM2/22/13
to eugene yokota, Doug Tangren, adept-dev
I'm ok with centralized as long as it's extremely easy for a new library author who just wants to publish their stuff to sign up. I looked at publishing to maven central and balked, and just made my own repo instead (including registering a domain name for the project). I suppose I should go back now and follow the 37 steps for maven central, since people are actually using it. Seriously, 15 pages of instructions? Granted, there are a lot of screenshots, but if we can keep our instructions down to less than a page, I think that's something to strive for.

Clojars (sorry I keep comparing to Clojars, but I think they have some things we can learn from), on the other hand, was relatively easy: I got my first library published in 15 minutes. I did a live demo of publishing PGP-signed projects to Clojars with Lein just a couple of weeks ago, and even people following along were able to finish in under 15 minutes, including signing up for Clojars and generating a PGP key. Even people who didn't have GPG installed. Even people who didn't have GPG installed running Windows <gasp>.

Wes

Mark Harrah

Feb 22, 2013, 6:22:15 PM2/22/13
to adep...@googlegroups.com
On Fri, 22 Feb 2013 12:38:40 -0500
Wes Freeman <freem...@gmail.com> wrote:

> I'm ok with centralized as long as it's extremely easy for a new library
> author who just wants to publish their stuff to sign up. I looked at
> publishing to maven central and balked, and just made my own repo instead
> (including registering a domain name for the project). I suppose I should
> go back now and follow the 37 steps for maven central, since people are
> actually using it. Seriously, 15 pages of instructions? Granted, there are
> a lot of screenshots, but if we can keep our instructions down to less
> than a page, I think that's something to strive for.
>
> Clojars (sorry I keep comparing to Clojars, but I think they have some
> things we can learn from),

Examples from existing tools are great. There have been many efforts here, and it would be good not to repeat mistakes.

> on the other hand, was relatively easy: I got my
> first library published in 15 minutes. I did a live demo of publishing
> PGP-signed projects to Clojars with Lein just a couple of weeks ago, and
> even people following along were able to finish in under 15 minutes,
> including signing up for Clojars and generating a PGP key. Even people who
> didn't have GPG installed. Even people who didn't have GPG installed
> running Windows <gasp>.

I think this is what we need to aim for as well.

> Wes
>
> On Fri, Feb 22, 2013 at 12:02 PM, eugene yokota <eed3...@gmail.com> wrote:
>
> > I think I like having the metadata in a centralized place. Maybe not
> > just one, but not 100 GitHub repos either.
> > Self-hosting metadata sounds similar to the attempts to host a maven
> > repo using GitHub pages.
> > When it works, it works, but it's kind of flaky, and you're at the mercy
> > of the author not killing the files, etc.

Agree.

-Mark

Mark Harrah

Feb 22, 2013, 6:22:26 PM2/22/13
to adep...@googlegroups.com
On Fri, 22 Feb 2013 00:55:13 -0500
Doug Tangren <d.ta...@gmail.com> wrote:

> Still wrapping my head around Mark's brain dump (
> https://github.com/sbt/adept/wiki/NEScala-Proposal ) and am currently
> thinking about the metadata side of things.
>
> I'm totally for the split between metadata repo and artifact repo. +1 on
> that.
>
> What I'm not clear on, though, is whether the thought was for one metadata
> repo that others push changes to, or many metadata repos. I feel like
> having one uber metadata repo containing metadata about every library adds
> a bit of friction to publishing and management. It introduces questions
> like who can push and who manages/oversees those pushes and pulls. It
> creates a sense of stress for new authors who are all ready to go but need
> access to this "blessed" publishing circle. It took months for me to get
> blessed into scala-tools, thankfully less for Sonatype (though it felt
> like a 1000-step process). I believe that in order to have a flourishing
> community of Scala libraries you want to remove that kind of friction.

We are definitely in agreement on having less than 1000 steps in the publishing process.

> So then I started thinking about many metadata repos and what that means.
>
> When thinking about a frictionless model for hosting metadata for
> implicit.ly/ls that still guarantees authenticity, I came up with a scheme
> where the user used a tool (sbt) to serialize metadata within their
> project's repo, committed it, pushed it to GitHub, then told the ls
> service to synchronize with this hosted version on GitHub. This system has
> some of the properties Mark outlined as pluses, using DVCS to handle many
> tasks out of the box. The metadata is now versioned and hosted, and
> authentication is handled via implicit knowledge of push access to a given
> repo.

The authentication is weak, though. It isn't part of the metadata (no auditing, for example), it's dependent on the hosting service, and it isn't granular (you can't trace a change to the actual individual who pushed it). I like signing git commits, but that has the problem of how to deal with merging.

> I was wondering if this could be expanded to fit Mark's vision of a local
> repo of metadata. What if publishing meant storing metadata in a specific
> location ( or branch ) of an author-owned git repo and pushing to a remote
> like GitHub? Then, instead of telling a remote service to sync with that
> repo, you'd just register your git URL with it once. Adept could grab git
> URLs for repos containing metadata from that service, clone them, and then
> just git pull to fetch changes from the locally cloned repos. This is kind
> of like how bower ( http://twitter.github.com/bower/ ) works: the bower
> service basically just tracks names and git repos.

I don't think storing the metadata in the actual source repository will work, because you have to clone the whole repository to get at the metadata. It also means you have a tool like adept writing to your git repository directly. (Of course we won't have any serious bugs, but I'd still rather not touch people's source repositories.)

I like the general idea of author-managed repositories and aggregating repositories, though. We probably need to aggregate binaries as well as metadata. I think that's the idea behind bintray (we'll have to wait for Josh to confirm).

> Thoughts on one uber repo vs many author managed repos?

I'm interested in alternatives to a single repository. They are expensive to host, harder to mirror, block work when they go down, etc. I think some things already proposed for adept may mitigate this and make a centralized repository more feasible, but I'd like to see where torrents and author-managed repos, or at least less centralized repos, go.

-Mark

> -Doug Tangren
> http://lessis.me

Josh Suereth

Feb 23, 2013, 7:35:56 AM2/23/13
to adept-dev

Yeah, bintray is just an aggregation service.   Authors are free to publish their repos to something under their "name/project" location on bintray.

Then organizations and others get "repo merge requests", i.e. "will you proxy my maven repo?" This allows organically grown "central repos" to emerge from trusted sources.

When I suggest bintray, I'm suggesting aiming for the best of all worlds. We have decentralized repositories, and we hook our "proxy/merge" metadata code into bintray to form a "central repo", but we leave control of that aspect up to the community (on bintray). If someone makes a better port of maven central, it should be a couple of clicks to migrate from one merged repo to the other...
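One way to picture that proxy/merge idea, as a sketch with invented names: a "central repo" is just an ordered list of source repos merged together, with earlier entries taking precedence, so migrating to a different central view means swapping the list, not republishing anything:

```java
// Hypothetical sketch of a merged "central repo": an ordered list of
// author-owned repos folded into one view, earlier sources winning on
// conflicts. A plain string stands in for a real metadata descriptor.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergedRepo {
    // Each source repo is modeled as a map from project name to metadata.
    public static Map<String, String> merged(List<Map<String, String>> sources) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, String> repo : sources) {
            // putIfAbsent keeps entries from earlier, higher-priority sources.
            repo.forEach(result::putIfAbsent);
        }
        return result;
    }
}
```

The precedence order is the only policy decision the aggregator makes; everything else stays owned by the individual author repos.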

In any case, just a suggestion.   I find deployment of OSS using bintray to be the way things are moving....
