The final two bits of the puzzle


Eric Merritt

Mar 15, 2012, 12:47:46 PM3/15/12
to Tim Watson, erlware-...@googlegroups.com
Tim,

I think the final two bits we need to work out here are blob storage
and making sure we don't have to pull down all the binaries when we do
a pull. The other is the metadata in one file.

On blobs,

I have done a bit of research and I can't figure out how to pull down
only certain subdirectories/repo parts. I think it might be possible
to do it below the git api layer, that is, by implementing parts of
git or using a library to interact directly with the server; I could be
very wrong there though. Going down into the depths might be acceptable
but it makes me nervous.

On metadata in a single file.

The only negative that I can bring up on this is that you lose the
ability to introspect what's in the library trivially with ls or tree.
That is, in the multiple-file version I can ls the organizations
directory and see all the orgs in the repo. In the single-file model
we need to implement introspection in the tool. I am not at all sure
this is a benefit, and so it is something to worry about.

Eric

Tim Watson

Mar 16, 2012, 8:11:35 AM3/16/12
to Eric Merritt, erlware-...@googlegroups.com
On 15 Mar 2012, at 16:47, Eric Merritt wrote:

Tim,

I think the final two bits we need to work out here are blob storage
and making sure we don't have to pull down all the binaries when we do
a pull. The other is the metadata in one file.

On blobs,

I have done a bit of research and I can't figure out how to pull down
only certain subdirectories/repo parts. I think it might be possible
to do it below the git api layer, that is, by implementing parts of
git or using a library to interact directly with the server; I could be
very wrong there though. Going down into the depths might be acceptable
but it makes me nervous.


I don't actually know why we have to make this 'pure git only' for a first release, as I thought we were going to suck up being tied to github initially. In the 'tied to github' case, you can find and download individual blobs easily using the github REST api.

This is possible using pure git anyway, I think. Looking at http://progit.org/book/ch9-6.html I can see that there are ways to introspect the commit history against branches and tags and to download specific bundles of data. If we keep the index in the master branch, then I think we can do something like the following when a new artefact+version is added:

- add the organisation/artefact/version data to the index in branch 'master' and commit
- create and checkout a branch for the organisation/artefact/version that is being published: 'git co -b os_env-0.0.1' 
- delete the index from this branch
- add the physical folder structure and zip/tar/ez data for the artefact to the file system (within the branch), add to git and commit: 'mkdir nebularis/os_env/0.0.1 && git add <folder> && git ci -m "publish <details>"'
- create an annotated tag named for the artefact+version: 'git tag -a pub-os_env-0.0.1 -m "Publish os_env-0.0.1"'
- push the changes to the remote repository: 'git push origin os_env-0.0.1 --tags'
- go back and checkout the master branch and push the updated index: 'git co master && git push origin' 
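
To make those steps concrete, here is a runnable sketch of the publish flow against a local throwaway repository. The repo layout and names (nebularis/os_env/0.0.1, the lib- tag prefix) follow the examples elsewhere in this thread; the final 'git push' calls are shown only as comments since they need a real remote.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com
git config user.name you
# 1. master holds only the index; record the new artefact+version in it
echo "nebularis/os_env/0.0.1" > index.meta
git add index.meta
git commit -qm "index os_env-0.0.1"
# 2. create and checkout a branch per artefact+version
git checkout -q -b os_env-0.0.1
# 3. delete the index from this branch
git rm -q index.meta
# 4. add the folder structure and artefact data, then commit
mkdir -p nebularis/os_env/0.0.1
echo "zip data goes here" > nebularis/os_env/0.0.1/os_env-0.0.1.zip
git add nebularis
git commit -qm "publish os_env-0.0.1"
# 5. annotated tag naming the published artefact+version
git tag -a lib-os_env-0.0.1 -m "Publish os_env-0.0.1"
# 6. with a real remote, you would now do:
#    git push origin os_env-0.0.1 --tags
#    git checkout master && git push origin
git tag
```

The end state is a branch and tag that contain only the artefact data, with the index living solely on master.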

Now when a client wants the index or a specific release of something, they can either use HTTP and download from github using the API *or* if we want to be pure git, then we should be able to do this by finding either the SHA for the current HEAD on master branch (which contains the index) or looking for a specific tag:

t4@malachi:tmp $ ssh -x g...@github.com "git-receive-pack 'hyperthunk/gitfoo.git'"
00725fcf7c6e0e54324bb7d9564d9998a5937640cde9 refs/heads/master report-status delete-refs side-band-64k ofs-delta
00455fd50ae1a78496cc2159c51a94302697da07d760 refs/heads/os_env-0.0.1
00485fd50ae1a78496cc2159c51a94302697da07d760 refs/tags/lib-os_env-0.0.1
0000

I haven't finished figuring out how the rest of this interaction goes, but based on that chapter of the pro-git book and poking around in the man pages for 5 mins, I get the impression that with a combination of this and the fetch-pack/git-upload-pack commands, it should be possible to obtain the index and pack file (containing the data) for just one SHA - 00485fd50ae1a78496cc2159c51a94302697da07d760 clearly contains *only* the pack data for the tag, and the tag was built from a branch where we

* deleted the index
* added the specific artefact/version __only__

So the pack file should contain *only* the data we want for os_env-0.0.1 and nothing else. Managing this on the client shouldn't be terrible, as there is an ssh capability in OTP - although I suspect we may fall foul of deliberate github API limitations if we're not careful. Also the publication (locally into your own organisation repo) shouldn't be too difficult, as it's just a matter of branching, tagging and a bit of file system manipulation, plus a 'git push' when you're ready to make the changes public.

On metadata in a single file.

The only negative that I can bring up on this is that you lose the
ability to introspect what's in the library trivially with ls or tree.
That is, in the multiple-file version I can ls the organizations
directory and see all the orgs in the repo. In the single-file model
we need to implement introspection in the tool. I am not at all sure
this is a benefit, and so it is something to worry about.


Ok fair enough, I can go with your point of view here and actually I agree with your point about discovery and introspection being much easier. Based on what I've said above about the use of low level git commands, we'll be able to checkout the master branch with the complete index at will, pulling only the index metadata without the published binaries. The individual binaries will be accessible separately using the low level git commands and these can be stored either in a parallel location or whatever. 

Eric

Eric Merritt

Mar 16, 2012, 10:21:51 AM3/16/12
to Tim Watson, erlware-...@googlegroups.com
On Fri, Mar 16, 2012 at 7:11 AM, Tim Watson <watson....@gmail.com> wrote:
> I don't actually know why we have to make this 'pure git only' for a first
> release, as I thought we were going to suck up being tied to github
> initially. In the 'tied to github' case, you can find and download
> individual blobs easily using the github REST api.

I have neither the interest nor the intention of making it pure git on
the first release. I just want to make sure that we can eventually go
to a pure git solution in the not-too-distant future, preferably
without a ground-up rewrite.

I have been living too much in the porcelain I guess.

> I get the impression that with a combination of this and the
> fetch-pack/git-upload-pack commands, it should be possible to obtain the
> index and pack file (containing the data) for just one SHA
> - 00485fd50ae1a78496cc2159c51a94302697da07d760 clearly contains *only* the
> pack data for the tag, and the tag was built from a branch where we
>
> * deleted the index
> * added the specific artefact/version __only__
>
> So the pack file should contain *only* the data we want for os_env-0.0.1 and
> nothing else. Managing this on the client shouldn't be terrible, as there is
> an ssh capability in OTP - although I suspect we may fall foul of deliberate
> github API limitations if we're not careful. Also the publication (locally
> into your own organisation repo) shouldn't be too difficult, as it's just a
> matter of branching, tagging and a bit of file system manipulation, plus a
> 'git push' when you're ready to make the changes public.

Actually, now that you mention it, we could probably do this trivially
by just sticking the binary on its own branch and merging that branch
as needed into the 'core' working branch. I don't know why that didn't
occur to me.

>
>
> Ok fair enough, I can go with your point of view here and actually I agree
> with your point about discovery and introspection being much easier. Based
> on what I've said above about the use of low level git commands, we'll be
> able to checkout the master branch with the complete index at will, pulling
> only the index metadata without the published binaries. The individual
> binaries will be accessible separately using the low level git commands and
> these can be stored either in a parallel location or whatever.

This is good enough, I think. Let's start migrating this to a document
so we can follow up to the erlang mailing list. If I can get some time
this weekend I will mine the history to do just that.

> Eric
>
>

Tim Watson

Mar 16, 2012, 1:25:10 PM3/16/12
to Eric Merritt, erlware-...@googlegroups.com

Hang on a minute, how will that work for people consuming the repository? If they do git clone <repo> they get everything by default, even if the binaries aren't merged into the main branch. Keeping them in separate branches (+ immutable tags) provides a cleaner separation and allows clients to download only the bits required.

>>
>>
>> Ok fair enough, I can go with your point of view here and actually I agree
>> with your point about discovery and introspection being much easier. Based
>> on what I've said above about the use of low level git commands, we'll be
>> able to checkout the master branch with the complete index at will, pulling
>> only the index metadata without the published binaries. The individual
>> binaries will be accessible separately using the low level git commands and
>> these can be stored either in a parallel location or whatever.
>
> This is good enough I think. Lets start migrating this to a document
> so we can follow up to the erlang mailing list. If I can get some time
> this weekend I will mine the history to do just that.
>

Ok cool, thanks, sounds good.

>> Eric
>>
>>

Eric Merritt

Mar 16, 2012, 2:48:01 PM3/16/12
to Tim Watson, erlware-...@googlegroups.com
>>
>> Actually now that you mention it. We could probably do this trivially
>> by just sticking the binary on its own branch and merging that branch
>> as needed into the 'core' working branch. I don't know why that didn't
>> occur to me.
>>
>
> Hang on a minute, how will that work for people consuming the repository? If they do git clone <repo> they get everything by default, even if the binaries aren't merged into the main branch. Keeping them in separate branches (+ immutable tags) provides a cleaner separation and allows clients to download only the bits required.

Whatever works. It's actually pretty trivial to clone just a single
branch, though you are right that a default clone pulls down the
entire repo.

Whatever accomplishes the goal of pulling down only the required
binary is fine with me.
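
For reference, cloning just a single branch looks like the sketch below (git's `--single-branch` option, available from git 1.7.10 onwards). A local source repository stands in for the github remote here, and the branch name follows the os_env-0.0.1 convention used earlier in the thread.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# build a source repository with an index commit and an artefact branch
git init -q src
git -C src config user.email you@example.com
git -C src config user.name you
echo "index v1" > src/index.meta
git -C src add index.meta
git -C src commit -qm "index"
git -C src branch os_env-0.0.1
# fetch only the artefact branch, nothing else
git clone -q --branch os_env-0.0.1 --single-branch src dst
git -C dst rev-parse --abbrev-ref HEAD
```

The resulting clone tracks only the os_env-0.0.1 branch, so a consumer never pulls down the other published binaries.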

Tim Watson

Mar 16, 2012, 5:28:38 PM3/16/12
to Eric Merritt, erlware-...@googlegroups.com
On 16 Mar 2012, at 18:48, Eric Merritt wrote:

>>>
>>> Actually now that you mention it. We could probably do this trivially
>>> by just sticking the binary on its own branch and merging that branch
>>> as needed into the 'core' working branch. I don't know why that didn't
>>> occur to me.
>>>
>>
>> Hang on a minute, how will that work for people consuming the repository? If they do git clone <repo> they get everything by default, even if the binaries aren't merged into the main branch. Keeping them in separate branches (+ immutable tags) provides a cleaner separation and allows clients to download only the bits required.
>
> Whatever works. It's actually pretty trivial to clone just a single
> branch, though you are right that a default clone pulls down the
> entire repo.
>
> Whatever accomplishes the goal of pulling down only the required
> binary is fine with me.
>

Good. I think as you say, it's time to write it up and see what kind of feedback we get from the community. Want me to do some writing up of bits as well, or would you rather put the initial draft together?

Tim Watson

Mar 16, 2012, 7:16:17 PM3/16/12
to Tim Watson, Eric Merritt, erlware-...@googlegroups.com
On 16 Mar 2012, at 17:25, Tim Watson wrote:

On 16 Mar 2012, at 14:21, Eric Merritt wrote:

On Fri, Mar 16, 2012 at 7:11 AM, Tim Watson <watson....@gmail.com> wrote:
I don't actually know why we have to make this 'pure git only' for a first
release, as I thought we were going to suck up being tied to github
initially. In the 'tied to github' case, you can find and download
individual blobs easily using the github REST api.

I have neither the interest nor the intention of making it pure git on
the first release. I just want to make sure that we can eventually go
to a pure git solution in the not-too-distant future, preferably
without a ground-up rewrite.

Actually I suspect a pure git based solution will be quite simple to build, now that I'm poking around with it. Given my 'branch+tag per item' idea, I've set up the following demo repo:

t4@malachi:gitfoo $ git co master
Switched to branch 'master'
t4@malachi:gitfoo $ ls -la
total 8
drwxr-xr-x    4 t4  staff   136 16 Mar 23:03 .
drwxr-xr-x  160 t4  staff  5440 16 Mar 10:00 ..
drwxr-xr-x   13 t4  staff   442 16 Mar 23:03 .git
-rw-r--r--    1 t4  staff    13 16 Mar 23:03 index.meta
t4@malachi:gitfoo $ git co os_env-0.0.1
Switched to branch 'os_env-0.0.1'
t4@malachi:gitfoo $ ls -la nebularis/os_env/0.0.1/
total 16
drwxr-xr-x  3 t4  staff   102 16 Mar 23:03 .
drwxr-xr-x  3 t4  staff   102 16 Mar 23:03 ..
-rw-r--r--  1 t4  staff  8061 16 Mar 23:03 os_env-0.0.1.zip
t4@malachi:gitfoo $ git tag
lib-os_env-0.0.1

And this is published on github. Now if I want to obtain only a particular subset of the branches/tags without downloading everything, I can do so fairly easily - note that the data is pulled out correctly and into the right place:

t4@malachi:tmp $ mkdir tmp-clone
t4@malachi:tmp $ cd tmp-clone/
t4@malachi:tmp-clone $ git init
Initialized empty Git repository in /private/tmp/tmp-clone/.git/
t4@malachi:tmp-clone $ git fetch-pack --include-tag -v g...@github.com:hyperthunk/gitfoo.git refs/tags/lib-os_env-0.0.1
Server supports multi_ack_detailed
Server supports side-band-64k
Server supports ofs-delta
want 5fd50ae1a78496cc2159c51a94302697da07d760 (refs/tags/lib-os_env-0.0.1)
done
remote: Counting objects: 10, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 10 (delta 0), reused 10 (delta 0)
Unpacking objects: 100% (10/10), done.
5fd50ae1a78496cc2159c51a94302697da07d760 refs/tags/lib-os_env-0.0.1
t4@malachi:tmp-clone $ mkdir -p nebularis/os_env-0.0.1
t4@malachi:tmp-clone $ git archive 5fd50ae1a78496cc2159c51a94302697da07d760 >> archive.zip
t4@malachi:tmp-clone $ unzip archive.zip -d nebularis/os_env-0.0.1/
Archive:  archive.zip
warning [archive.zip]:  4096 extra bytes at beginning or within zipfile
  (attempting to process anyway)
  inflating: nebularis/os_env-0.0.1/os_env/ebin/os_env.beam  
  inflating: nebularis/os_env-0.0.1/os_env/ebin/os_env.app  
  inflating: nebularis/os_env-0.0.1/os_env/include/os_env.hrl  
t4@malachi:tmp-clone $ 

You can do `git archive --remote ...` but github actually blocks this, probably because they want to track download usage and the like.

Anyway, based on this sketching, I think it'll be quite easy to do in pure git if/when the time comes.

Eric Merritt

Mar 19, 2012, 6:06:29 PM3/19/12
to Tim Watson, erlware-...@googlegroups.com
Top posting.

Sweet. Let's start getting this written up and pushed out (I may have
already mentioned this). I can do the general stuff if you want to
write something specific on repo organization and handling.

Eric

Tim Watson

Mar 20, 2012, 6:25:57 AM3/20/12
to Eric Merritt, erlware-...@googlegroups.com
Ok, will find some time for that in the next day or two.

Tim Watson

Mar 20, 2012, 10:54:43 AM3/20/12
to Tim Watson, Eric Merritt, erlware-...@googlegroups.com
Top posting again.

Just one other thing I wanted to cover before we finalise and start documenting. For packages that contain native code, I feel that the publisher should be able to override the auto-selected 'supported-platform' or perhaps add additional 'supported-platforms' such that we can manually distinguish between builds that only work on certain flavours of linux, versus generic linux, versus generic (posix compliant) unix platforms, e.g. any platform supporting glibc >= version X. This will make it much easier when we know we can produce a binary that will work across various unix-based platforms.

In order for that to work, I think the OS hierarchy will need to have basic support for something like:

{os_platforms, [
    {windows, [.....]},
    {unix, [
        generic,    %% no version information required....
        {linux, [
            {generic, [">= 2.6"]},
            {linux_<flavour>, [">= 2"]}
        ]},
        {bsd, [
            {darwin, ["10.6.8"]},
            {free_bsd, [...]}
            %% etc
        ]}
    ]}
]}.


Thoughts???

Eric Merritt

Mar 20, 2012, 11:17:32 AM3/20/12
to Tim Watson, erlware-...@googlegroups.com
On Tue, Mar 20, 2012 at 9:54 AM, Tim Watson <watson....@gmail.com> wrote:
> Top posting again.
>
> Just one other thing I wanted to cover before we finalise and start documenting. For packages that contain native code, I feel that the publisher should be able to override the auto-selected 'supported-platform' or perhaps add additional 'supported-platforms' such that we can manually distinguish between builds that only work on certain flavours of linux, versus generic linux, versus generic (posix compliant) unix platforms, e.g. any platform supporting glibc >= version X. This will make it much easier when we know we can produce a binary that will work across various unix-based platforms.
>
> In order for that to work, I think the OS hierarchy will need to have basic support for something like:
>
> {os_platforms, [
>    {windows, [.....]},
>    {unix, [
>        generic,    %% no version information required....
>        {linux, [
>            {generic, [">= 2.6"]},
>            {linux_<flavour>, [">= 2"]}
>        ]},
>        {bsd, [
>            {darwin, ["10.6.8"]},
>            {free_bsd, [...]}
>            %% etc
>        ]}
>    ]}
> ]}.
>
>
> Thoughts???

The first thing that comes to my head (and I am far from sure this is
valid) is that you will have a fair amount of mapping to do with this
approach. That is, the information you get back from erlang
or uname will be something like linux, free_bsd, darwin, etc., so with
a hierarchical structure you will need to query someplace to find what
'family' a particular thing belongs to. That is, I don't believe the
family information is provided through any api. Again, that mapping
should be pretty static, and having the hierarchy is probably a win
there.
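
That static mapping could be as small as a lookup table. The sketch below does it in shell for illustration (the real tool would presumably use erlang's os:type/0); the family names mirror the example os_platforms term above, and the table entries are an assumption, not a complete list.

```shell
# Illustrative static mapping from the kernel name reported by
# `uname -s` to the 'family' slot in the os_platforms hierarchy.
os_family() {
    case "$1" in
        Linux)            echo "unix/linux"   ;;
        Darwin|FreeBSD)   echo "unix/bsd"     ;;
        CYGWIN*|MINGW*)   echo "windows"      ;;
        *)                echo "unix/generic" ;;  # fall back to generic unix
    esac
}
os_family "$(uname -s)"
```

Since the table is static, keeping it in the tool (rather than querying anything at runtime) seems fine.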

On a side note (and I realize this is just an example), I am not a big
fan of including the constraint in the version string. It just
introduces a parsing problem. We can easily have a tuple there and it
should be just as readable and have no parsing issue at all.

Tim Watson

Mar 20, 2012, 12:30:55 PM3/20/12
to Eric Merritt, erlware-...@googlegroups.com

Sounds right to me.

> On a side note (and I realize this is just an example), I am not a big
> fan of including the constraint in the version string. It just
> introduces a parsing problem. We can easly have a tuple there and it
> should be just as readable and have no parsing issue at all.

Indeed. I was just hacking out an example, but I do concur that {atom(), predicate(), semver()} is a much cleaner approach, where we've got something like...

predicate() :: equals | greater_than | greater_than_or_equals | less_than | lteq... | '=' | '>' | '>=' | '<' | '=<'.

I also think that a two-tuple should be shorthand for equals, so that these two definitions are semantically equivalent: {Thing, equals, Vsn} === {Thing, Vsn}.


Eric Merritt

Mar 20, 2012, 12:32:14 PM3/20/12
to Tim Watson, erlware-...@googlegroups.com

This is exactly what sinan's constraint solver does. So I am on board
with all of this. :P

Tim Watson

Mar 20, 2012, 12:48:41 PM3/20/12
to Eric Merritt, erlware-...@googlegroups.com

Awesome. :D

Eric Merritt

Mar 22, 2012, 12:00:58 PM3/22/12
to Tim Watson, erlware-...@googlegroups.com
I think we should start working on a name for the suite. It may sound
trivial, but I think it's actually important.

I will try to come up with some candidates.

Eric

Tim Watson

Mar 22, 2012, 6:46:24 PM3/22/12
to Eric Merritt, erlware-...@googlegroups.com
On 22 Mar 2012, at 16:00, Eric Merritt wrote:

> I think we should start working on a name for the suite. It may sound
> trivial, but I think it's actually important.
>

Yes, I think you're right, and it does really matter.

> I will try to come up with some candidates
>

Ok cool.
