GitHub space plan

Bill Stumbo

Jun 8, 2021, 10:23:59 AM
to Larry Masinter, Interlisp core, Bill Stumbo
In the 7 June meeting, Larry mentioned I had proposed a second option for resolving space issues on GitHub.  Since I had to drop off early for a work meeting, I thought I'd send an email and hopefully spur on the discussion.

The initial conversation is documented at:  https://github.com/Interlisp/medley/discussions/102

The short synopsis is that there is a lot of bloat in the repo - driven primarily by the addition of a collection of PDF documents that were later moved out of the repository.  And, as Larry has also mentioned, saving old tilde versions of files into git has also led to some clutter and wasted space.

We've discussed two potential options.  Option one is to 'surgically' remove unneeded content from git storage and remove any mention of it from git's history.  We have tested this approach with the docs/irm.pdf directory.  Using a third-party Python package, I was able to remove all the files in this directory and its subdirectories, and rewrite git history so there is no knowledge of them within the git Medley repository.  The discussion mentioned above describes the package and steps in greater detail.

This action by itself significantly reduces the size of the Medley repository.
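
As a rough sketch of what option one involves, the snippet below only builds (and does not run) a command line for git-filter-repo, one third-party Python tool for this kind of history rewrite.  That this is the package used in the test is an assumption; the discussion linked above has the actual details.  The helper name filter_repo_command is hypothetical.

```python
from typing import List


def filter_repo_command(paths: List[str]) -> List[str]:
    """Build a git-filter-repo invocation that drops `paths` from all history.

    Assumption: git-filter-repo is the tool in use. Nothing is executed here;
    the command is only assembled so it can be reviewed first.
    """
    if not paths:
        raise ValueError("need at least one path to remove")
    # --invert-paths means "keep everything EXCEPT the listed --path entries".
    cmd = ["git", "filter-repo", "--invert-paths"]
    for p in paths:
        cmd += ["--path", p]
    return cmd


if __name__ == "__main__":
    print(" ".join(filter_repo_command(["docs/irm.pdf"])))
```

Building the command separately from running it leaves room to review exactly what will be scrubbed before anything is force pushed.  git-filter-repo expects to be run in a fresh clone, which also keeps an untouched copy around for comparison.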

There are a couple of drawbacks to this approach.  First, it rewrites git history, meaning that anyone who has forked our repository is going to suddenly find their repository no longer matches ours.  Essentially, after running the operation and updating a local git repository, we need to do a force push to reset the GitHub version of the repo to match the revised local one.  Anyone who uses the repository is going to need to update their local version and, depending on how much work they have done locally, may need to do more than stashing and popping from the stash to be resynced with the repo.

Given it's a potentially painful exercise, the next concern is minimizing the number of times we rewrite global history.  It would be best to only force push a new version of the repo once.  Which leads to the question, how much scrubbing should be done?  When have we removed the right amount of cruft?   And, if we are removing stuff and rewriting history, is there a point where we accidentally break something?  What is our level of confidence that the toolset we use won't make a mistake - and what is our ability to identify an error before committing the changes?

Option two is much simpler.  Decide on a point in time to start clean.  Archive the existing repository.  Delete the existing repository from GitHub and create a new git repository using the current source code base.  This way, you're guaranteed the only files in the repository are what we deemed worthy of being under source control at time zero.  There is no need to hunt down old files or worry about corrupting history.  Of course, the downside is that there is no history.  History, if needed, could be recovered by consulting the archived repository.

This approach has the same issue that the first approach does - we're replacing the repository on GitHub and anyone that has forked a copy or pulled from our repo would suddenly find themselves out of sync with the existing repository.  Again, there would likely be some effort required to resync with the repo.

On the positive side, with this approach, the repository is going to start at its smallest possible size.  Another positive: option two is much simpler to implement.  There is no risk of creating a version of master that somehow has a corrupted history.
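
For option two, deciding what goes into the repository at time zero could be sketched as below.  This is a minimal illustration, not a tested procedure: the function name snapshot_files is hypothetical, and the pattern for "tilde versions" (NAME.~12~ or NAME~) is an assumption about how those files are named.

```python
import re
from pathlib import Path
from typing import List

# Assumption: "tilde versions" look like NAME.~12~ (numbered) or NAME~.
TILDE_VERSION = re.compile(r"(\.~\d+~|~)$")


def snapshot_files(root: Path) -> List[Path]:
    """Return the files under `root` (as relative paths) worth committing."""
    keep = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".git" in rel.parts:
            continue  # the old history stays behind in the archived repository
        if path.is_file() and not TILDE_VERSION.search(path.name):
            keep.append(rel)
    return keep
```

The resulting list is what would be copied into a freshly initialized repository and committed as the first commit; everything else stays in the archive.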

A couple questions to consider as we determine the path forward:
 - What is the value of maintaining an uninterrupted history?
 - How much of the early work, grappling with the available content and making decisions on where things should go is worth keeping?

Thoughts?  Questions?
-- 
Bill Stumbo
wst...@charter.net

Larry Masinter

Jun 8, 2021, 5:49:00 PM
to Bill Stumbo, Interlisp core

The other big change that is queued up and interacts with GitHub is the renaming of ALL-CAPS-NO-EXTENSION to lower-case-names.ilisp (for files managed by the file package).

This means a break in the version history because git-versions doesn’t do the right thing with the “move” (on Linux) of FILENAME to filename.  (Getting git mv to do the right thing is hard on macOS and Windows case-insensitive file systems.)

There is some question about whether this change is “worth it”.  I know there will be some changes needed for the file package, and we need to find out whether there is still a problem with FONTSAVAILABLE and the case of the font filenames.

On the other hand, the benefit is one less strangeness on case-sensitive file systems.
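
To illustrate the case problem with renames, here is a small sketch that plans the git mv steps without running them.  The function name rename_plan and the ".renaming" temporary suffix are hypothetical.  Going through an intermediate name is a common workaround when the old and new names differ only by case, since a case-insensitive file system treats those two names as the same file.

```python
from typing import List, Tuple


def rename_plan(names: List[str], suffix: str = ".ilisp") -> List[Tuple[str, str]]:
    """Return (old, new) pairs, one per `git mv` step, for lower-casing names."""
    steps: List[Tuple[str, str]] = []
    for old in names:
        new = old.lower() + suffix
        if new.lower() == old.lower():
            # Case-only change: use a temporary name so the file system
            # sees two renames between clearly distinct names.
            tmp = old + ".renaming"
            steps += [(old, tmp), (tmp, new)]
        else:
            steps.append((old, new))
    return steps
```

With the .ilisp extension added, each rename is a single step because the names differ by more than case; with an empty suffix the plan falls back to the two-step form.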

 

Probably some maiko changes will be needed to make {DSK} do Unicode file names.

I think for implementing things like git-versions.sh

Larry Masinter

Jul 16, 2021, 12:32:11 AM
to Bill Stumbo, Interlisp core

Lately I’ve been leaning toward some reconsideration:

Nick Briggs

Jul 16, 2021, 2:21:51 AM
to Larry Masinter, Bill Stumbo, Interlisp core
I just tried installing git-lfs, both from the prebuilt binaries and building from source -- and it won't run on my everyday Mac because the OS is too old (and can't be upgraded).  Sigh.  There was really no reason for them to write it in Go rather than C++ or C.

-- Nick

Bill Stumbo

Jul 17, 2021, 10:40:27 PM
to Larry Masinter, Interlisp core
The one problem with moving to Git LFS is that it only solves the problem for new files.  Files that are already checked into the repository will still be managed by the standard Git repository system. 

Our choices with existing files are either to live with it or rewrite history to remove them from the repository so that they can be managed by LFS going forward.

See:  https://docs.github.com/en/github/managing-large-files/versioning-large-files/moving-a-file-in-your-repository-to-git-large-file-storage
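
For reference, tracking a pattern with Git LFS amounts to an entry in .gitattributes.  The sketch below generates the lines that `git lfs track` writes for a set of patterns; the function name lfs_attribute_lines is hypothetical and the "*.pdf" pattern is only an example.  As noted above, such entries affect files added from then on, not blobs already stored in history.

```python
from typing import List


def lfs_attribute_lines(patterns: List[str]) -> List[str]:
    """Return .gitattributes lines marking each pattern as LFS-managed."""
    # Same attributes that `git lfs track "<pattern>"` appends to .gitattributes.
    return [f"{p} filter=lfs diff=lfs merge=lfs -text" for p in patterns]


if __name__ == "__main__":
    print("\n".join(lfs_attribute_lines(["*.pdf"])))
```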


Bill Stumbo
wst...@charter.net