SparkleShare?


Matt Zukowski

Jan 26, 2012, 3:15:14 PM
to educ...@googlegroups.com
Is anyone out there using SparkleShare to share documents? I remember looking at this a while back and it was a bit half-baked, but it looks like it's quite usable now. It basically looks and feels like Dropbox, but uses all open source tech behind the scenes (e.g. git for syncing). You can set it up to sync against a centralized repository (Dropbox style) or peer-to-peer distribution.

Might be worth setting this up at least for our individual labs. Not sure if we have enough files to share for an educoder SparkleShare repo just yet.

gbby

Jan 26, 2012, 3:17:27 PM
to educ...@googlegroups.com
Armin spent a couple of hours on it last Saturday.
Also, ginger coons is a longtime user. I can put you in touch with her if you like...




Armin Krauss

Jan 26, 2012, 3:19:41 PM
to educ...@googlegroups.com
Hey guys,

I tried to set up the server on the lab Mac and somehow it does not work. Tried it on
my private Linux server and no problem. Right now I am in the process of getting a
nice server at OISE (>100 GB) to give our team a repo for videos and stuff.

So yes, I am using it a bit, but wouldn't mind diving deeper into things like peer-to-peer
sharing.

Armin

Matt Zukowski

Jan 26, 2012, 3:36:01 PM
to educ...@googlegroups.com
Armin, I was able to get it up quickly just by creating a bare git repo on the remote server (git init --bare) and then [SparkleShare] --> Add Hosted Project. From the looks of it there's very little magic going on here. It's just straight up git. The only difference is that SparkleShare does the commits for you automatically, and there are some post-commit hooks to take care of notifying peers via the SparkleShare server. If you go into a hosted directory, it's just plain-old git.

Curious to see how it will handle conflicts. Haven't seen that come up yet.

Armin Krauss

Jan 26, 2012, 3:38:26 PM
to educ...@googlegroups.com
Hey Matt,

As I said, on Linux there was no problem. Not sure what the issue on the Mac is/was.

Armin

Matt Zukowski

Jan 26, 2012, 3:39:29 PM
to educ...@googlegroups.com
P.S. one thing to be mindful of, typically a git repo will at least double your storage requirements. i.e. if you're storing 1 GB of files in a directory, git-ifying that directory will add at least another 1 GB.

Plus using it to store large video files is probably a bad idea in general... I don't want multi-gigabyte .MOVs downloaded to my machine automatically, without notice.

Alessandro Gnoli

Jan 26, 2012, 3:40:09 PM
to educ...@googlegroups.com
This might be well worth a look. I remember Armin trying to set it up at the same time Cheryl was trying to copy 100 GB of video data from an external hard drive and she ended up staying up all night. :-)

My biggest concerns are privacy (because of all the IRB/ethics restrictions we have) and usability (no Windows or browser client yet?)

-gugo



Matt Zukowski

Jan 26, 2012, 3:41:15 PM
to educ...@googlegroups.com
People still use Windows? :P

Armin Krauss

Jan 26, 2012, 3:45:55 PM
to educ...@googlegroups.com
Hey Gugo,

If the server is at OISE edu commons, would this be private enough to fulfill the ethics requirements?
The missing Windows client and web interface are a problem, but for Mac users it is fine. Seems like most people
have a Mac (nerds can always use git directly).

Didn't know git adds so much overhead, thought it was super efficient. Well, if that is the case
SparkleShare is a no-go since the primary usage for the server is to exchange large video
files without having them sitting on servers we don't control.

We can also use ssh/scp and find some client extensions that mount these as if they are
a drive or use a transfer client that speaks sftp.

Thoughts?

Armin

Stian Håklev

Jan 26, 2012, 4:35:10 PM
to educ...@googlegroups.com
That's not my experience at all. I just did a git init in my Movies folder, which has 14 GB... all that did was add a 60 KB .git folder...

Now of course if you are carrying multiple versions of those video files around, that makes the story very different... I think git is able to do binary deltas, but typically a slightly edited video file that is recompressed could be completely different from the binary point of view, so that would take up twice the space... So for just storing and transferring unique files, it should work well, but you definitely wouldn't want to put your iMovie working directory under git!

Stian
http://reganmian.net/blog -- Random Stuff that Matters

Matt Zukowski

Jan 26, 2012, 4:48:32 PM
to educ...@googlegroups.com
In git, each commit is actually a (gzipped) snapshot of the entire repo. This is one of the big differences between git and svn. Fortunately git will generally pack the snapshots together so that in effect only the diffs are stored on disk. But like you said, I'm not sure how well that packing works for video files where the codecs use all kinds of crazy compression, potentially making the diffs between even two small edits huge.
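
A rough way to see why compression buys little for video: git deflates its objects with zlib, and data that a codec has already compressed behaves much like random bytes. The sketch below uses only the Python standard library; the random buffer is just a stand-in for real video data, and `compression_ratio` is a name made up for this example.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means better)."""
    return len(zlib.compress(data)) / len(data)

# Highly repetitive text, similar in spirit to source code or logs.
text = b"the quick brown fox jumps over the lazy dog\n" * 20_000

# Random bytes of the same length, standing in for codec output.
video_like = os.urandom(len(text))

print("text ratio:      ", round(compression_ratio(text), 4))
print("video-like ratio:", round(compression_ratio(video_like), 4))
```

The text shrinks to a tiny fraction of its size, while the video-like buffer stays at roughly its original size.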

The .git directory in your Movies folder will almost certainly add another 14 GB. Check .git/objects. If it's empty, then maybe you haven't committed any real data yet.

Armin Krauss

Jan 26, 2012, 4:58:15 PM
to educ...@googlegroups.com
Another alternative is http://owncloud.org/

Which gives you a web interface on the server and allows sharing and stuff. Could be worth a try
once version 3 is out.

Armin

Stian Håklev

Jan 26, 2012, 5:01:33 PM
to educ...@googlegroups.com
You are completely right, of course :) My mistake! I just did add * and commit, and that made a big difference... Of course it is possible to have a headless repository where you have nothing in the directory, only the objects. That way it would take the same space, but it wouldn't be practically usable for anything...

Too bad,
Stian


Richard Klancer

Jan 26, 2012, 6:32:20 PM
to educ...@googlegroups.com
On Thu, Jan 26, 2012 at 4:48 PM, Matt Zukowski <ma...@roughest.net> wrote:
> In git, each commit is actually a (gzipped) snapshot of the entire repo.
> This is one of the big differences between git and svn. Fortunately git will
> generally pack the snapshots together so that in effect only the diffs are
> stored on disk.

Saw this flash by and wanted to point out that it's maybe not quite
the right way of putting it for people who are new to git, and that
knowing what's really going on can help you decide what the problems
might be if you use git in the case described upthread.

Here are two excellent references for knowing what's "really going on"
when you use git:
http://tom.preston-werner.com/2009/05/19/the-git-parable.html and
http://www.gitguys.com/topics/ (Definitely work through the latter;
start with "All Git Object Types: Blob, Tree, Commit And Tag", study
the diagrams, try the commands, and keep hitting "next".)

Basically, git stores every file you add to its object database (the
directory .git/objects) as a so-called "blob" that is indexed by a
40-character SHA hash of the blob's contents (which are the contents
of your file prepended with the file length and a null byte--but *not*
including the file name or path). That means that, across your entire
history of commits to a single repository, whenever the identical
version of your file is found--however it got put there, and whatever
its name and directory location happen to be at that moment--it will
be represented by the exact same "blob" file in your repo's object
database (because the SHA will be identical for identical blobs, and
hash collisions are astronomically unlikely).
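
A minimal sketch of the blob naming described above, in Python with only the standard library. One detail worth flagging: the header git prepends is the word "blob", a space, the decimal length, and a null byte, which is slightly more than just "the file length and a null byte". The helper name `git_blob_id` is made up for illustration.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with these bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The ID depends only on the bytes, never on a file name or path.
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
print(git_blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

If git is installed, `git hash-object <file>` should print the same ID for a file holding the same bytes.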

The upshot is that if you add a file (say, a large binary file) to the
git index and don't change it, you're free to rename it, add it to
every commit or only some, move it around in the directory structure,
even copy it into several different places in the directory
structure--until you change its contents. Once you change the contents
or length of a large binary file and add that to git, you now have a
second large blob in your object database. It's up to you to decide whether
that's ok.

That's what people are referring to when they say git isn't great for
binary files (at least if they change often).
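
To make the storage consequence concrete, here is a toy content-addressed store in Python. It is only a sketch: `ToyObjectStore` is a hypothetical stand-in for .git/objects, and it ignores zlib compression and packfiles entirely.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """SHA-1 over git's blob header plus the raw bytes."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

class ToyObjectStore:
    """Hypothetical stand-in for .git/objects: a dict keyed by blob ID."""

    def __init__(self):
        self.objects = {}

    def add(self, content: bytes) -> str:
        oid = blob_id(content)
        self.objects[oid] = content  # same bytes map to the same key
        return oid

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.objects.values())

store = ToyObjectStore()
video = bytes(1_000_000)       # stand-in for a large binary file
store.add(video)               # first commit
store.add(video)               # renamed or copied: same bytes, same blob
size_after_copies = store.stored_bytes()

edited = b"\x01" + video[1:]   # change a single byte
store.add(edited)              # a whole second blob is stored
size_after_edit = store.stored_bytes()

print(size_after_copies, size_after_edit)  # 1000000 2000000
```

Renames and copies cost nothing extra, while a one-byte edit stores the full file again.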

I know what is described above is probably what Matt was getting at by
saying "in effect only the diffs are stored", but it's handy to have
the right mental model when you ask yourself whether your repo will
grow as you add binary files.

Also, I'm not aware of git doing any optimization to store "deltas" or
"diffs" in its object tree to accommodate large files that change
often; there are tools for handling this as per
http://stackoverflow.com/questions/540535/managing-large-binary-files-with-git
But, also, here I've reached the end of my knowledge of git internals.

By the way, git also stores in the object database, and also indexes
by SHA, "tree" objects which it uses to reconstruct the directory
structure of any given commit, and "commit" objects, each of which
points (1) to a tree object that represents a snapshot of your working
directory at one "point in time" and (2) to one or more parent commits
from which the child is derived. Many, many different commits and the
corresponding files and directories are overlaid in .git/objects at
the same time without "blowing up" its size. (Branches are basically
constantly-updated pointers to the "latest" commit object in a stream
of commits descended from a particular commit, and tags are
frozen-in-time pointers to a single commit.)
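
A sketch of how blobs, trees, and commits reference one another, again in plain Python. It handles only regular files in a single directory; the function names are made up for this example, and the author line and timestamp are hypothetical constants.

```python
import hashlib

def object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 over '<kind> <length>' + NUL + body, as used for git objects."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries):
    """Serialize (file name, blob ID) pairs as regular-file tree entries."""
    body = b""
    for name, blob in sorted(entries, key=lambda e: e[0].encode("utf-8")):
        # Each entry: mode, space, name, NUL, then 20 raw bytes of the blob ID.
        body += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(blob)
    return body

def commit_body(tree_id, parent_ids, message):
    """Serialize a commit that points at one tree and zero or more parents."""
    who = "Example Author <author@example.com> 1327600000 -0500"  # hypothetical
    lines = ["tree " + tree_id]
    lines += ["parent " + p for p in parent_ids]
    lines += ["author " + who, "committer " + who, "", message, ""]
    return "\n".join(lines).encode("utf-8")

blob = object_id(b"blob", b"hello\n")
tree_1 = object_id(b"tree", tree_body([("notes.txt", blob)]))
commit_1 = object_id(b"commit", commit_body(tree_1, [], "initial commit"))

# Renaming the file changes the tree and the commit but reuses the same blob.
tree_2 = object_id(b"tree", tree_body([("renamed.txt", blob)]))
commit_2 = object_id(b"commit", commit_body(tree_2, [commit_1], "rename"))

print(blob, tree_1, commit_1, tree_2, commit_2, sep="\n")
```

The file name lives in the tree, not in the blob, which is why a rename produces a new tree and commit without storing the file contents again.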

HTH,

--Richard

Matt Zukowski

Jan 26, 2012, 6:43:51 PM
to educ...@googlegroups.com
Richard, thanks for the clarification.

Regarding optimization in storing "deltas" or "diffs", see the second section in  http://book.git-scm.com/7_how_git_stores_objects.html 

I was referring to packfiles, but trying to phrase it in Stian's terms.

Richard Klancer

Jan 26, 2012, 6:56:00 PM
to educ...@googlegroups.com
On Thu, Jan 26, 2012 at 6:43 PM, Matt Zukowski <ma...@roughest.net> wrote:
> Richard, thanks for the clarification.
>
> Regarding optimization in storing "deltas" or "diffs", see the second
> section in  http://book.git-scm.com/7_how_git_stores_objects.html
>
> I was referring to packfiles, but trying to phrase it in Stian's terms.

Matt,

Thanks, I found the same reference after posting, and was in the
process of writing something about how I shouldn't have opened my
mouth about diffing since I've certainly *heard of* `git gc' and
packfiles and I certainly knew enough to know, when writing, that they
*do* have something or other to do with diffing files and packing the
diffed results. :)

--R

Jim Slotta

Jan 26, 2012, 9:36:55 PM
to educ...@googlegroups.com
On Thu, Jan 26, 2012 at 3:39 PM, Matt Zukowski <ma...@roughest.net> wrote:
> P.S. one thing to be mindful of, typically a git repo will at least double your storage requirements. i.e. if you're storing 1 GB of files in a directory, git-ifying that directory will add at least another 1 GB.
>
> Plus using it to store large video files is probably a bad idea in general... I don't want multi-gigabyte .MOVs downloaded to my machine automatically, without notice.

A good point, especially for us Air users. You could create a drop box bigger than my hard drive!!

Is there a setting where the drop box becomes more like an external directory, with permissions, and a desktop image?
j

Stephen Bannasch

Jan 27, 2012, 1:10:30 AM
to educ...@googlegroups.com
At 3:39 PM -0500 1/26/12, Matt Zukowski wrote:
>P.S. one thing to be mindful of, typically a git repo will at least double your storage requirements. i.e. if you're storing 1 GB of files in a directory, git-ifying that directory will add at least another 1 GB.

I think if you run some actual experiments you'll be surprised.

Git's optimized for dealing with thousands of text files (the kind that would be in a large source code repository) and generating the smallest info representing changes to the state of those files as fast as possible.

Git is VERY VERY VERY good at this job.

However git is also pretty good at managing larger binary files.

I have a git clone of this CC subversion repository with over 25000 commits:

https://svn.concord.org/svn/projects/trunk/common

I just manually ran git gc in the directory (the reason git doesn't do this automatically is that git is more concerned about speed than disk space).

The total size of the .git directory is 936 MB.

The total size of the rest of the files in the working directory is 2238 MB.

In other words the .git directory is about 60% SMALLER than all the other files in the working directory!

But ... then consider this ... that .git directory has not only all of the files in the working directory .. it also can re-generate EVERY version of all of those files over the last 25000+ commits!

Here's an experiment with my documentation directory -- it has lots of big opaque files and it's 5.6 GB:

$ du -ch -d 0 documentation
5.6G documentation


$ cd documentation
$ git init .
$ git add .
$ git commit -m 'initial commit'
$ du -ch -d 0 .git
4.7G .git

After committing the files, the .git directory is about 16% smaller than the rest of the files.

$ git gc
Counting objects: 50161, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (46816/46816), done.
Writing objects: 100% (50161/50161), done.
Total 50161 (delta 27650), reused 0 (delta 0)
Removing duplicate objects: 100% (256/256), done.

$ du -ch -d 0 .git
4.4G .git

Running git gc shrinks the .git directory by roughly another 6%.
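
As a rough check, the ratios can be recomputed from the figures quoted in this message. Note that `du -h` rounds to one decimal place, so the percentages are approximate; `percent_smaller` is a helper name made up for this sketch.

```python
def percent_smaller(part: float, whole: float) -> float:
    """How much smaller `part` is than `whole`, in percent, to one decimal."""
    return round(100 * (1 - part / whole), 1)

# Subversion clone: 936 MB .git vs 2238 MB working files.
print(percent_smaller(936, 2238))  # 58.2

# Documentation directory: .git vs working files, before and after git gc.
print(percent_smaller(4.7, 5.6))   # 16.1
print(percent_smaller(4.4, 5.6))   # 21.4

# Effect of git gc on the .git directory itself.
print(percent_smaller(4.4, 4.7))   # 6.4
```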
