allowing largish files and preserving web-based interaction

James Battat

Aug 18, 2016, 11:17:36 AM
to gitit-discuss
In previous posts, folks have suggested storing large, static, binary files (like PDFs) separate from the git repository (i.e. not in the wikidata/ directory).

I'd like to use gitit as a research lab notebook and want to enable my collaborators to edit with me, all through the web interface.  I agree that image and PDF files don't need git, but as far as I know, the recommendation in other threads (put the papers in a different directory) is not possible via the web interface.

* Is it really a problem to dump large-ish files (~10MB or less) into the wikidata/ dir?  
* Is it possible for a user to upload images into the static/ dir through the web interface? I've not yet found a way.
* Can someone recommend an approach that allows multi-user web-based modification of the gitit site, while preserving the ability to upload large files?

Many thanks,
James

Henning Thielemann

Aug 18, 2016, 11:23:02 AM
to gitit-discuss

On Thu, 18 Aug 2016, James Battat wrote:

> * Is it really a problem to dump large-ish files (~10MB or less) into the wikidata/ dir?  

If you simply copy a document to wikidata, gitit will not find it. Gitit
only shows what git stores as the latest committed version. Putting a
large file into a directory is not a problem, but as soon as you start
managing it with Git, every new version of the PDF will be stored in
addition to all previous versions. This becomes pretty big pretty
soon.
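
As a rough illustration of that growth, here is a minimal sketch. It assumes each committed revision of a binary file costs about its full size, i.e. that git's compression and delta packing save little on PDFs or images; the real figure depends on the files. The function name is hypothetical.

```python
def estimated_history_mb(file_size_mb: float, revisions: int) -> float:
    """Rough upper bound on the space one binary file's history occupies.

    Assumption: every committed revision is stored at about full size,
    because compression and deltas save little on PDFs or images.
    """
    if file_size_mb < 0 or revisions < 0:
        raise ValueError("size and revision count must be non-negative")
    return file_size_mb * revisions


# A static 10 MB PDF committed once costs about 10 MB.
print(estimated_history_mb(10, 1))
# The same PDF re-committed 20 times with changes could approach 200 MB.
print(estimated_history_mb(10, 20))
```

Under this assumption the cost is driven by the number of revisions, so files that are uploaded once and never changed add only their own size.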

James Battat

Aug 18, 2016, 11:51:18 AM
to gitit-...@googlegroups.com

>
>> * Is it really a problem to dump large-ish files (~10MB or less) into the wikidata/ dir?
>
> If you simply copy a document to wikidata, gitit will not find it. Gitit only shows what git stores as the latest committed version. Putting a large file into a directory is not a problem, but as soon as you start managing it with Git, every new version of the PDF will be stored additionally to all previous versions. This becomes pretty big pretty soon.


Thanks for the reply, but I meant having users upload 10MB files through the gitit web interface. So git will know.

So what is the recommended approach for a user, with only web-based access to the site, to upload ~10MB files?


Henning Thielemann

Aug 18, 2016, 11:56:02 AM
to gitit-...@googlegroups.com

On Thu, 18 Aug 2016, James Battat wrote:

> Thanks for the reply, but I meant having users upload 10MB files through the gitit web interface. So git will know.
>
> So what is the recommended approach for a user, with only web-based access to the site, to upload ~10MB files?

Maybe they put their files on public servers, say dropbox, and then add
URLs to a gitit wiki page.

James Battat

Aug 18, 2016, 12:10:02 PM
to gitit-...@googlegroups.com
It’s important to me to keep the site self-contained — i.e. not rely on other servers/hosts.

So the criteria are:
* users are restricted to web-based interaction with the site (view/upload/edit)
* all content located within my machine (maybe in git, maybe not)

Can gitit make use of git Large File Storage (or similar)?
https://git-lfs.github.com

Also, I guess I’m not clear on what the drawbacks are to having a huge git repository (i.e. just throw all of the images into git). Especially if the PDFs are static (not changing). Yes, the repo will be big, which makes cloning it a pain, but are there major performance issues otherwise?

James

Simon Heath

Aug 19, 2016, 1:44:15 AM
to gitit-...@googlegroups.com
Git-LFS is made specifically for that very situation. Unfortunately it
is also developed by Github... so the client is open, the specification
is open, but the server is closed (so they can charge you to use it).
There's some open source server implementations but I don't know how
good they are.

If all the files truly are read-only, then putting them in git should be
okay. There's a couple downsides that I'm aware of: First, they will
always be in the repo, so even if you "clean" them out they're still
taking up disk space (unless you descend into the guts of git to excise
them). Second, checking in a very large file (hundreds of MB) can take a
long time (several minutes per file).

It sounds like what you really want is a content management system,
which is a bit of a different beast from Gitit.

Simon

Henning Thielemann

Aug 19, 2016, 2:01:55 AM
to gitit-...@googlegroups.com

On Fri, 19 Aug 2016, Simon Heath wrote:

> If all the files truly are read-only, then putting them in git should be
> okay. There's a couple downsides that I'm aware of: First, they will always
> be in the repo, so even if you "clean" them out they're still taking up disk
> space (unless you descend into the guts of git to excise them).

That's also my top concern. If a user only wants to inspect the small text
portion of the Wiki he cannot simply omit the MBs of PDFs when cloning the
repository.

James Battat

Aug 19, 2016, 7:59:50 AM
to gitit-...@googlegroups.com
Thanks Simon, Henning.
For my use, there are no users cloning the wiki (at least not often), and the whole point of the repo is to be a lab record, so the “once in the repo, always in the repo” property is, in principle, a feature that I like. So having static PDFs in the git repo will likely be fine. For my use, it’s certainly preferable to farming out the responsibility for hosting PDFs to each individual user.

I’m going to proceed by simply increasing the file size limit in the gitit config file, and have users upload images/PDFs through the web interface.
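
A minimal sketch of that change, assuming the relevant setting is `max-upload-size` as in the default configuration shipped with gitit (where it is given with a `K` suffix). The value below is only an illustrative choice that leaves headroom above ~10MB files; it is not a recommendation.

```
# Upper limit on the size of files uploaded through the web interface.
# Illustrative value (assumption): roughly 20 MB.
max-upload-size: 20000K
```

If no config file exists yet, one can be generated with `gitit --print-default-config > my.conf`, edited, and then used by starting the wiki with `gitit -f my.conf` (the file name `my.conf` is just an example).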

If others have suggestions/comments, I’m all ears.

Thanks for the help,
James