Is anyone out there using SparkleShare to share documents? I remember looking at this a while back and it was a bit half-baked, but it looks like it's quite usable now. It basically looks and feels like Dropbox, but uses all open source tech behind the scenes (e.g. git for syncing). You can set it up to sync against a centralized repository (Dropbox style) or to distribute peer-to-peer. Might be worth setting this up at least for our individual labs. Not sure if we have enough files to share for an educoder SparkleShare repo just yet.
Saw this flash by and wanted to point out that it's maybe not quite
the right way of putting it for people who are new to git, and that
knowing what's really going on can help you decide what the problems
might be if you use git in the case described upthread.
Here are two excellent references for knowing what's "really going on"
when you use git:
http://tom.preston-werner.com/2009/05/19/the-git-parable.html and
http://www.gitguys.com/topics/ (Definitely work through the latter;
start with "All Git Object Types: Blob, Tree, Commit And Tag", study
the diagrams, try the commands, and keep hitting "next".)
Basically, git stores every file you add to its object database (the
directory .git/objects) as a so-called "blob" that is indexed by a
40-character SHA-1 hash of the blob's contents (which are the contents
of your file prepended with a short header made up of the word "blob",
the file length, and a null byte; the file name and path are *not*
included). That means that, across your entire
history of commits to a single repository, whenever the identical
version of your file is found--however it got put there, and whatever
its name and directory location happen to be at that moment--it will
be represented by the exact same "blob" file in your repo's object
database (because the SHA will be identical for identical blobs, and
hash collisions are astronomically unlikely).
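To make the hashing concrete, here is a minimal sketch that computes a blob ID two ways: once by asking git, and once by hashing the header plus content by hand. It assumes `git` and the coreutils `sha1sum` are on your PATH and that git is using its default SHA-1 object format; the sample content "hello" is just an illustration.

```shell
# Illustrative content only; any bytes would do.
content='hello'

# 1) Ask git for the blob ID. Reading from stdin means no file name exists at all.
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)

# 2) Rebuild the same ID by hand: SHA-1 over "blob <size>" + NUL + the raw bytes.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')
manual_id=$( { printf 'blob %s\000' "$size"; printf '%s\n' "$content"; } | sha1sum | cut -d' ' -f1 )

echo "git:    $git_id"
echo "manual: $manual_id"
```

Both lines should print the same 40-character ID, which is the point: only the bytes (plus that small header) go into the hash, never the name or location of the file.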
The upshot is that if you add a file (say, a large binary file) to the
git index and don't change it, you're free to rename it, add it to
every commit or only some, move it around in the directory structure,
even copy it into several different places in the directory
structure, all without storing a second copy--until you change its
contents. Once you change the contents or length of a large binary
file and add that to git, you now have a second large blob in your
object database. It's up to you to decide whether that's ok.
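If you want to see this for yourself, the following is a rough experiment in a throwaway repository. The file names (big.bin, copy.bin, sub/dir/renamed.bin) and the 1 MB size are made up for illustration, and the sketch relies on the fact that `git add` by itself only writes blobs, so `git count-objects` (which counts loose objects) is effectively counting blobs here.

```shell
# Throwaway repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Helper (hypothetical name): number of loose objects in .git/objects.
count() { git count-objects | cut -d' ' -f1; }

# Add a 1 MB file of random bytes, standing in for a large binary file.
head -c 1048576 /dev/urandom > big.bin
git add big.bin
after_add=$(count)

# Copy the identical bytes to other names and locations, then stage everything.
mkdir -p sub/dir
cp big.bin copy.bin
cp big.bin sub/dir/renamed.bin
git add .
after_copies=$(count)

# Change the contents by a single byte and stage again.
printf 'x' >> big.bin
git add big.bin
after_edit=$(count)

echo "after first add:  $after_add"
echo "after copies:     $after_copies"
echo "after one change: $after_edit"
```

The count should stay put across the copies and only go up once the contents change, at which point the whole file is stored again as a new loose blob.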
That's what people are referring to when they say git isn't great for
binary files (at least if they change often).
I know what is described above is probably what Matt was getting at by
saying "in effect only the diffs are stored", but it's handy to have
the right mental model when you ask yourself whether your repo will
grow as you add binary files.
Also, I'm not aware of git doing any optimization to store "deltas" or
"diffs" in its object tree to accommodate large files that change
often; there are tools for handling this as per
http://stackoverflow.com/questions/540535/managing-large-binary-files-with-git
But, also, here I've reached the end of my knowledge of git internals.
By the way, git also stores in the object database, and also indexes
by SHA, "tree" objects which it uses to reconstruct the directory
structure of any given commit, and "commit" objects, each of which
points (1) to a tree object that represents a snapshot of your working
directory at one "point in time" and (2) to one or more parent commits
from which the child is derived. Many, many different commits and the
corresponding files and directories are overlaid in .git/objects at
the same time without "blowing up" its size. (Branches are basically
constantly-updated pointers to the "latest" commit object in a stream
of commits descended from a particular commit, and tags are
frozen-in-time pointers to a single commit.)
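The commit, tree, and blob objects described above can be inspected directly with `git cat-file`. Here is a small sketch in a scratch repository; the directory and file names (docs/readme.txt) and the author identity are placeholders, not anything from a real project.

```shell
# Scratch repository with one file inside one directory.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir docs
echo 'hello' > docs/readme.txt
git add .
# Placeholder identity passed on the command line so no global config is needed.
git -c user.name='Example' -c user.email='example@example.invalid' \
    commit -q -m 'initial commit'

# The commit object: names one tree, plus author, committer, and message.
git cat-file -p HEAD

# The top-level tree: one entry pointing at the "docs" subtree.
git cat-file -p 'HEAD^{tree}'

# The subtree: one entry mapping the name readme.txt to a blob ID.
git cat-file -p 'HEAD:docs'

# Object types and the blob ID, captured for a closer look.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_id=$(git rev-parse 'HEAD:docs/readme.txt')
echo "$commit_type $tree_type $blob_id"
```

Note that the file name only appears in the tree listing, next to the blob ID; the blob itself carries nothing but content.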
HTH,
--Richard
Matt,
Thanks, I found the same reference after posting, and was in the
process of writing something about how I shouldn't have opened my
mouth about diffing since I've certainly *heard of* `git gc' and
packfiles and I certainly knew enough to know, when writing, that they
*do* have something or other to do with diffing files and packing the
diffed results. :)
--R
P.S. One thing to be mindful of: typically a git repo will at least double your storage requirements, i.e. if you're storing 1 GB of files in a directory, git-ifying that directory will add at least another 1 GB.
Plus using it to store large video files is probably a bad idea in general... I don't want multi-gigabyte .MOVs downloaded to my machine automatically, without notice.
I think if you run some actual experiments you'll be surprised.
Git's optimized for dealing with thousands of text files (the kind that would be in a large source code repository) and generating the smallest info representing changes to the state of those files as fast as possible.
Git is VERY VERY VERY good at this job.
However git is also pretty good at managing larger binary files.
I have a git clone of this CC subversion repository with over 25000 commits:
https://svn.concord.org/svn/projects/trunk/common
I just manually ran git gc in the directory (the reason git doesn't do this automatically is that git is more concerned about speed than disk space).
The total size of the .git directory is 936 MB.
The total size of the rest of the files in the working directory is 2238 MB.
In other words the .git directory is about 60% SMALLER than all the other files in the working directory!
But ... then consider this ... that .git directory has not only all of the files in the working directory .. it also can re-generate EVERY version of all of those files over the last 25000+ commits!
Here's an experiment with my documentation directory -- it has lots of big opaque files and it's 5.6 GB:
$ du -ch -d 0 documentation
5.6G documentation
$ cd documentation
$ git init .
$ git add .
$ git commit -m 'initial commit'
$ du -ch -d 0 .git
4.7G .git
After committing the files the .git directory is about 16% smaller than the rest of the files (4.7G vs. 5.6G).
$ git gc
Counting objects: 50161, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (46816/46816), done.
Writing objects: 100% (50161/50161), done.
Total 50161 (delta 27650), reused 0 (delta 0)
Removing duplicate objects: 100% (256/256), done.
$ du -ch -d 0 .git
4.4G .git
Running git gc shrinks the .git directory by about another 6% (4.7G to 4.4G).
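For anyone who wants to check the size comparisons, here is a small sketch that recomputes the ratios from the du figures quoted in this message. The helper name pct_smaller is made up for illustration; it just reports how much smaller the first number is than the second, as a rounded percentage.

```shell
# pct_smaller A B: how much smaller A is than B, as a whole-number percentage.
pct_smaller() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f", (1 - a / b) * 100 }'
}

# svn clone: 936 MB in .git vs. 2238 MB of working files.
svn_clone=$(pct_smaller 936 2238)

# documentation directory: 4.7G in .git after the commit vs. 5.6G of files.
docs_commit=$(pct_smaller 4.7 5.6)

# documentation directory: 4.4G in .git after git gc vs. 4.7G before.
docs_gc=$(pct_smaller 4.4 4.7)

echo "svn clone .git vs. working files: ${svn_clone}% smaller"
echo "docs .git vs. files after commit: ${docs_commit}% smaller"
echo "docs .git after gc vs. before gc: ${docs_gc}% smaller"
```

Since du -h rounds to one decimal place, the two documentation figures are only approximate.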