Master 5-26

Skye Severy

Aug 5, 2024, 12:54:05 PM

to histbicacoup

Then when I do a git status it tells me that my branch is ahead by X commits (presumably the same number of commits that I have made). Is it because when you push the code it doesn't actually update your locally cached files (in the .git folders)? git pull seems to 'fix' this strange message, but I am still curious why it happens. Maybe I am using git wrong?

Note: this particular question is, at the time I write this answer, quite old. It was posted three years before the first release of a version of Git that fixed many of these problems. It seems worth adding a modern answer along with explainers, though. The existing accepted answer suggests running git fetch -p,¹ which is a good idea, although less often required these days. It was much more necessary before Git version 1.8.2 came out; that Git was released three years after the original question.

The "this" in question is the fact that after git push origin master, the OP runs git status and sees the message "On branch master" followed by "Your branch is ahead of 'origin/master' by 1 commit." To properly answer the question, we need to break it up into pieces.

This claim is actually just a bit too strong. Each of your own local branches, in your own Git repository, can have one setting that Git calls an upstream. Or, that branch can have no upstream. Old versions of Git were not very consistent about calling this an upstream setting, but in modern Git, it's more consistent. We also have git branch --set-upstream-to and git branch --unset-upstream for setting, or clearing, an upstream.

If you run git branch --set-upstream-to=origin/xyzzy, that sets the current branch's upstream to origin/xyzzy. For a branch named xyzzy, this would be the typical correct setting. Some acts of creating branches automatically set the (typically-correct) upstream, and some don't, so if you used a branch-creating operation that set the right upstream automatically, you need not do anything. If you want a different upstream, or if you used a branch-creating operation that set no upstream, you can use this to change the upstream.

The ahead and behind counts can be obtained with a command of the form git rev-list --count --left-right $branch...$upstream, where $branch is the current branch name, and $upstream is the string from its upstream setting (from git branch --set-upstream-to above). There are three dots between the two names here, and the --count, --left-right, and three dots are all required to get git rev-list to spit out the two numbers.
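As a rough sketch of how one might collect those two numbers from a script: the helper names below (parse_left_right_counts, ahead_behind) are my own and hypothetical, not part of Git. The git rev-list options are the ones described above, and @{upstream} is Git's revision syntax for "the upstream of the current branch". The second function assumes it runs inside a repository whose current branch has an upstream set.

```python
import subprocess
from typing import Tuple


def parse_left_right_counts(output: str) -> Tuple[int, int]:
    """Parse the two whitespace-separated numbers printed by
    `git rev-list --count --left-right A...B`."""
    left, right = output.split()
    return int(left), int(right)


def ahead_behind(branch: str = "HEAD", upstream: str = "@{upstream}") -> Tuple[int, int]:
    """Return (ahead, behind) for `branch` relative to `upstream`.

    Hypothetical helper: it only works inside a Git repository, and
    raises CalledProcessError if the branch has no upstream.
    """
    result = subprocess.run(
        ["git", "rev-list", "--count", "--left-right", f"{branch}...{upstream}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_left_right_counts(result.stdout)
```

The left number counts commits reachable only from the first name, and the right number counts commits reachable only from the second.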

² If you have Git 2.23 or later, it's a good idea to move to git switch because it avoids some tricky git checkout behaviors that have historically led beginners into trouble (and occasionally even tripped up Git experts). However, if you are used to git checkout, you can keep using it as long as you like, as it's still supported. The real problem is basically that git checkout was overly powerful and could destroy work unexpectedly. The new git switch is deliberately less powerful and won't do that; the "destroy my work on purpose" operations were moved into git restore.

³ It's possible to be on no branch, in what Git calls detached HEAD mode. If you use git checkout it can put you in this mode suddenly (though it prints a big scary warning, so if you don't see the scary warning, it didn't do that), but if you use git switch, you must allow detached-HEAD mode with git switch --detach. There's nothing wrong with this mode. You just need to be careful, once you're in it, not to lose any new commits you make, which is easy to do. In normal mode, Git won't lose new commits like this.

This part is a bit technical and I will outsource much of it to a web site, Think Like (a) Git. I will summarize it here though like this: branch names (like main or xyzzy) and remote-tracking names (origin/main, origin/xyzzy) are how Git finds commits. Git is all about the commits. Your branch names only matter for finding your commits. Of course, if you can't find them, you're in trouble, so your branch names do matter. But the key is reachability, which is the technical term.

As Think Like (a) Git notes, this is a bit like a railway train. It's great once you're on the train, which in this case will take you backwards, automatically, to all the earlier train stops. But first you have to find your way to a train station. A Git branch name will do that: it holds the hash ID of the latest commit on your branch.

The branch name branch holds the hash ID of the latest commit. We say that the name points to the commit. Whatever big ugly hash ID that really is, we've just used the letter H to stand in for it here.

G is of course also an actual commit: it has a saved snapshot and some metadata. In the metadata for G, Git saved the hash ID of an earlier commit F. We say that G points to F, and now Git can find F using this saved hash ID.

This repeats forever, or rather, until we get to the very first commit ever. That commit (presumably we'd call it A here) doesn't point backwards to an earlier commit, because there is no earlier commit.

This concept of reachability is basically a summary of what happens if we start at commit H, as found by branch name branch, and work backwards. We reach commit H, which reaches backwards to commit G, which reaches back to F, and so on.
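To make the backwards walk concrete, here is a toy model in Python. It is only a sketch: single letters stand in for real hash IDs, each commit has exactly one parent (real Git commits can have several, as with merges), and the names parents, branches, and reachable are mine, not Git's.

```python
# Each commit stores only the ID of its parent; A is the very first commit.
parents = {
    "A": None, "B": "A", "C": "B", "D": "C",
    "E": "D", "F": "E", "G": "F", "H": "G",
}

# A branch name holds the ID of the latest commit on that branch.
branches = {"branch": "H"}


def reachable(start, parents):
    """Walk backwards from `start`, yielding every commit reached on the way."""
    commit = start
    while commit is not None:
        yield commit
        commit = parents[commit]


history = list(reachable(branches["branch"], parents))
print(history)  # ['H', 'G', 'F', 'E', 'D', 'C', 'B', 'A']
```

The walk stops at A because A has no parent to point back to.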

When you use git checkout or git switch to get on a branch, and then make a new commit, Git automatically updates the branch name's stored hash ID. That is, suppose we have a series of commits like this:

We're "on" branch xyzzy, which I like to denote by attaching the special name HEAD to it. This is useful when there's more than one branch name in the diagram. Note that H is, at the moment, the newest commit. But now we'll make another one, in the usual way.

This new commit gets a new, unique, big ugly hexadecimal hash ID, just like any commit. Git makes sure that the new commit points backwards to commit H, since that's the commit we used to make the new commit. We'll use the letter I to represent this new commit. Let's draw it in:

This picture is actually mid-commit: Git has made I, but isn't done with the git commit action yet. Ask yourself this question: how will we find commit I later? We're going to need its hash ID. Where can we store a hash ID?

That's how branch names work. It's really pretty simple, in the end: it just requires getting your head around several things at once. The name finds the commit. It finds the latest commit. From there, Git works backwards because each commit finds an earlier commit.
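The same idea as a small Python sketch. It is a toy model under assumptions, not how Git is implemented: letters stand in for hash IDs, F is treated as the first commit only to keep the example short, and the commit function is a hypothetical stand-in for git commit.

```python
# Existing history: F <- G <- H, with branch xyzzy pointing to H.
parents = {"F": None, "G": "F", "H": "G"}
branches = {"xyzzy": "H"}
head = "xyzzy"  # HEAD is attached to the branch name xyzzy


def commit(new_id, parents, branches, head):
    """Add a new commit and move the current branch name to it."""
    # The new commit points backwards to the commit we were on.
    parents[new_id] = branches[head]
    # The branch name now stores the new commit's ID, so the name
    # still finds the latest commit.
    branches[head] = new_id


commit("I", parents, branches, head)
print(branches["xyzzy"], parents["I"])  # I H
```

Nothing about H changed; only the name xyzzy moved.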

What about the remote-tracking names? Well, the trick here is that your Git talks to some other Git. Each Git has its own branch names. You have your master or main; they have theirs. You have your xyzzy branch and they can have theirs too.

There's a problem though. Their main or master, or their xyzzy if they have one, doesn't necessarily mean the same commit as your main or master or xyzzy. The solution is simple though: Git just takes their branch name and turns it into your remote-tracking name.

If origin's main or master or xyzzy has moved, you simply run git fetch or git fetch origin, perhaps with --prune. Your Git calls up their Git. They list out their branch names and commit hash IDs. Your Git gets any new commits from them, if necessary: commits they have, that you don't. Then your Git turns their branch names into your remote-tracking names and creates or updates your remote-tracking names to remember where their branch names pointed, at the moment you ran this git fetch.

If you use --prune, this handles the case where they deleted some branch name(s). Let's say they had a branch named oldstuff. You got it earlier so you have origin/oldstuff in your remote-tracking names. Then they deleted oldstuff, so this time they ... just don't have it any more. Without --prune, your Git ignores this. You keep your old origin/oldstuff even though it's dead now. With --prune, your Git says: Oh, huh, this looks dead now and prunes it away: a remote-tracking name in your Git that doesn't correspond to one of their branch names, just gets deleted.
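The name-updating half of git fetch, with and without --prune, can be sketched in Python like this. It is a simplified model: the fetch function and its parameters are hypothetical, letters stand in for hash IDs, and the step where your Git actually copies their new commits is left out.

```python
def fetch(their_branches, my_remote_tracking, remote="origin", prune=False):
    """Return updated remote-tracking names after talking to `remote`.

    their_branches:     their branch name -> commit ID
    my_remote_tracking: my remote-tracking name -> commit ID
    """
    updated = dict(my_remote_tracking)

    # Turn each of their branch names into my remote-tracking name.
    for name, commit_id in their_branches.items():
        updated[f"{remote}/{name}"] = commit_id

    if prune:
        # Drop names for this remote that no longer match one of their branches.
        live = {f"{remote}/{name}" for name in their_branches}
        updated = {
            name: commit_id
            for name, commit_id in updated.items()
            if not name.startswith(f"{remote}/") or name in live
        }
    return updated


theirs = {"main": "K"}  # they deleted oldstuff
mine = {"origin/main": "H", "origin/oldstuff": "C"}

print(fetch(theirs, mine))              # {'origin/main': 'K', 'origin/oldstuff': 'C'}
print(fetch(theirs, mine, prune=True))  # {'origin/main': 'K'}
```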

⁴ That's less common now in 2021 than it was back in 2010. It was substantially more common in 2005 when Git was first released. It used to be the case that, say, on an airline flight to a Linux conference, you couldn't get any access to the Internet, for any price.

What git rev-list does here is count reachable commits. The three-dot syntax, described in the gitrevisions documentation, produces what in set theory is called a symmetric difference. In non-math-jargon, though, we can just think about this as doing two commit reachability tests, which we can draw like this:

At the same time, commit K is reachable from your remote-tracking name origin/xyzzy. Commit H is reachable from K. From commit H on back, commits G and F and so on are all also reachable. But the two "railroad tracks" join up at commit H: commit H and all earlier commits are reachable from both names.

This makes commits I-J special in that they're reachable only from the name xyzzy, and K special in that it's reachable only from the name origin/xyzzy. The three-dot notation finds these commits: the ones reachable only from one name, or only from the other.
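The two reachability tests can be modelled with Python sets. Again this is a sketch with assumptions: letters stand in for hash IDs, F is treated as the first commit to keep the graph small, and every commit has a single parent.

```python
# H is the shared commit; I and J are only on xyzzy, K only on origin/xyzzy.
parents = {"F": None, "G": "F", "H": "G", "I": "H", "J": "I", "K": "H"}
names = {"xyzzy": "J", "origin/xyzzy": "K"}


def reachable(start):
    """Return the set of commits reachable by walking backwards from `start`."""
    seen = set()
    commit = start
    while commit is not None:
        seen.add(commit)
        commit = parents[commit]
    return seen


mine = reachable(names["xyzzy"])
theirs = reachable(names["origin/xyzzy"])

only_mine = mine - theirs    # the "left" side of xyzzy...origin/xyzzy
only_theirs = theirs - mine  # the "right" side

print(sorted(only_mine), sorted(only_theirs))  # ['I', 'J'] ['K']
print(len(only_mine), len(only_theirs))        # 2 1
```

The union of the two one-sided sets is the symmetric difference; keeping them separate is what the left and right counts do.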

For these commits, git rev-list --count --left-right xyzzy...origin/xyzzy prints the two numbers 2 and 1 as the output, telling us that there are two commits on xyzzy that aren't on origin/xyzzy, and one commit that's on origin/xyzzy that is not on xyzzy. These are commits J-and-I (on xyzzy) and K (on origin/xyzzy) respectively.

This is an easy way to see which commits are on your branch, and which commits are on the upstream. It's usually more useful with --decorate, --oneline, and --graph (and you might want to add --boundary as well in some cases).

0 and 0: Our branch name and our remote-tracking name (or whatever is in the upstream) point to the same commit. Nobody is ahead or behind. The git status command will say Your branch is up to date with '<upstream>', with the upstream's name filled in.

zero, nonzero: There are no commits on the current branch that are not on the upstream, but there are some on the upstream that are not on the current branch. This means our branch is behind the upstream.