[PATCH 0/2] avoid refreshing submodule pointers by default

19 views
Skip to first unread message

Jacob Keller

unread,
Nov 28, 2017, 7:04:54 PM11/28/17
to st...@googlegroups.com, Jacob Keller
This patch series implements a couple of submodule-related changes and
fixes. The first patch is a pure bug fix and should not be
controversial. The second patch is a behavior change which may spark
more discussion.

First, patch a bug with how submodules are handled when checking for a
clean working tree. Because git does not move submodule pointers
automatically (without calls to git submodule update), our current code
flow will always report a non-clean working tree whenever submodules are
out of date. This is a huge pain for users of stgit with submodules, as
it confuses them, since most git commands do not complain. Additionally,
in some situations it is difficult to understand how to get out of the
situation cleanly. More detail of the problems is explained in the
specific patch.

The second patch changes behavior of stg refresh to ignore submodules
when refreshing. This is due to feedback from various users I work with
who use submodules, and find it too easy to accidentally commit a
submodule pointer update.

These updates occur because stg refresh wants to always update the
entire working tree. However, since submodules in many workflows are
often out-of-date, or include new changes not yet submitted to the
submodule upstream, this can cause unexpected changes to the submodule
pointer. Additionally, it is easy for less experienced users to not
understand how to revert such a change if they do include it.

It is possible with rigor to avoid such issues, but remembering to
always use "stgit refresh --index", but one slip of the mind and you can
accidentally include changes with no easy way to escape.

I thought it best to change the default for a few reasons:
a) most users I've discussed with here seem to agree that the new
behavior is an improvement in regards to submodules
b) few people use submodules, even fewer are likely to use both stgit
and submodules, especially due to the pain points noted above, they
are likely to have abandoned stgit for other tools.
c) you can work around this in numerous ways including using git add
and stg refresh --index
d) those that like the old behavior can re-enable it.

Because the pain points above are very real for those who do use
submodules, I thought it best to not require them to discover the
configuration variable and enable it. Instead, I thought it best to fix
the issues but provide an out. This ensures that the maximum number of
people receive the new updated behavior, at the slight risk of creating
problems for users who were not expecting a change. I believe the risk
is small enough, and that most users will gladly prefer the new default
behavior.

Jacob Keller (2):
stgit: ignore submodules when checking if working tree is clean
refresh: do not include submodules by default

stgit/commands/refresh.py | 33 +++++++++++++++++++++++++---
stgit/config.py | 29 +++++++++++++------------
stgit/lib/git.py | 11 +++++++++-
t/t1200-push-modified.sh | 16 ++++++++++++++
t/t5000-refresh-submodules.sh | 50 +++++++++++++++++++++++++++++++++++++++++++
5 files changed, 121 insertions(+), 18 deletions(-)
create mode 100755 t/t5000-refresh-submodules.sh

--
2.15.0

Jacob Keller

unread,
Nov 28, 2017, 7:04:54 PM11/28/17
to st...@googlegroups.com, Jacob Keller
If a patch contains a change to a submodule, and a user attempts to pop
and then push the patch from the queue, errors similar to the following
can occur:

$stg pop
Popped a-new-patch
No patch applied
$stg push
stg push: Worktree not clean. Use "refresh" or "reset --hard"

This error occurs because after popping the patch off the queue, the
submodule contents are not updated. This is due to the way that git
treats submodules and does not recursively check them out.

Later, once we go to push the patch back on, we check if the working
tree is clean, by using update-index. Since the submodule remained
checked out at the previous location, we now complain. Worse yet, the
two suggestions we've given do not help.

First, if the user tries to "reset --hard" to throw away these changes,
nothing happens. This, again, is because the submodule is not
automatically moved. Thus, the user can try running reset --hard as many
time as they like, and it will not resolve the problem.

Second, the user can use refresh. This is actually worse, because it
will superficially appear to resolve the problem. The refresh will
simply merge in this submodule pointer change into the previous patch on
the queue. Now, the user can continue with the push, which will notice
the submodule change was already part of the previous patch from the
queue and will essentially remove it from the original patch.

This is much worse now, we've accidentally moved the submodule change to
the wrong patch!

This issue and similar other issues exist due to the default way
submodules behave. We can fix all of them by telling "git update-index"
to ignore submodules. Doing so, now it will not consider submodules
which aren't checked out correctly as a problem for a dirty working
tree. This solves the problem, and makes stgit behave more how git
itself behaves with submodules by default.

Add a new test which demonstrates that popping and subsequently pushing
a patch that changes a submodule will now pass.

Signed-off-by: Jacob Keller <jacob.e...@intel.com>
---
stgit/lib/git.py | 2 +-
t/t1200-push-modified.sh | 16 ++++++++++++++++
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/stgit/lib/git.py b/stgit/lib/git.py
index e9d9856d0522..82b7a900afb7 100644
--- a/stgit/lib/git.py
+++ b/stgit/lib/git.py
@@ -1086,7 +1086,7 @@ class IndexAndWorktree(RunWithEnvCwd):
def worktree_clean(self):
"""Check whether the worktree is clean relative to index."""
try:
- self.run(['git', 'update-index', '--refresh']).discard_output()
+ self.run(['git', 'update-index', '--ignore-submodules', '--refresh']).discard_output()
except RunException:
return False
else:
diff --git a/t/t1200-push-modified.sh b/t/t1200-push-modified.sh
index e04dac59680a..705e5463adbf 100755
--- a/t/t1200-push-modified.sh
+++ b/t/t1200-push-modified.sh
@@ -71,4 +71,20 @@ test_expect_success \
)
'

+test_expect_success \
+ 'pop then push a patch with a change to a submodule should not produce a conflict' '
+ (
+ cd bar &&
+ stg clone ../foo baz &&
+ stg new p3 -m p3 &&
+ git submodule add ../foo baz &&
+ stg refresh &&
+ (cd baz && git reset --hard HEAD^) &&
+ stg new p4 -m p4 &&
+ stg refresh &&
+ stg pop &&
+ stg push
+ )
+'
+
test_done
--
2.15.0

Jacob Keller

unread,
Nov 28, 2017, 7:04:55 PM11/28/17
to st...@googlegroups.com, Jacob Keller
When refreshing patches, the default action is to include every path
which is changed relative to the current working directory. This common
workflow creates problems when one uses a submodule. Due to the nature
of submodules, under some workflows they can often be out-of-sync, such
as when working on new updates within a submodule.

Accidental refreshes which include the submodule changes often occur,
which leads to commits which unintentionally include changes to
submodule pointers. These commits likely cause submodules to point to
incorrect locations.

Since it is not easy to convince many developers to always use "refresh
--index", instead, modify the default refresh behavior.

Instead of including all files, now check to make sure that we exclude
submodule files.

We can do this by using git ls-tree to list the files and their modes.
To avoid re-running ls-tree many times, we'll use a compiled regex to
match lines that look like submodules, and extract their filename.

Add a new option to refresh "--submodules" which tells refresh to
include submodules regardless. Additionally, add a "--no-submodules" and
a stg.refreshsubmodules configuration so that users may opt to configure
this behavior.

Since the number of users who use submodules is likely small, make this
the new default behavior.

Signed-off-by: Jacob Keller <jacob.e...@intel.com>
---
stgit/commands/refresh.py | 33 +++++++++++++++++++++++++---
stgit/config.py | 29 +++++++++++++------------
stgit/lib/git.py | 9 ++++++++
t/t5000-refresh-submodules.sh | 50 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 104 insertions(+), 17 deletions(-)
create mode 100755 t/t5000-refresh-submodules.sh

diff --git a/stgit/commands/refresh.py b/stgit/commands/refresh.py
index f6c0bf371436..631167cf5445 100644
--- a/stgit/commands/refresh.py
+++ b/stgit/commands/refresh.py
@@ -6,6 +6,7 @@ from stgit import argparse, utils
from stgit.argparse import opt
from stgit.commands import common
from stgit.lib import git, transaction, edit
+from stgit.config import config, GitConfigException
from stgit.out import out

__copyright__ = """
@@ -71,7 +72,11 @@ options = [
opt('-e', '--edit', action = 'store_true',
short = 'Invoke an editor for the patch description'),
opt('-a', '--annotate', metavar = 'NOTE',
- short = 'Annotate the patch log entry')
+ short = 'Annotate the patch log entry'),
+ opt('-s', '--submodules', action = 'store_true',
+ short = 'Include submodules when refreshing patch contents'),
+ opt('', '--no-submodules', action = 'store_false', dest = 'submodules',
+ short = 'Exclude submodules when refreshing patch contents')
] + (argparse.message_options(save_template = False) +
argparse.hook_options() +
argparse.sign_options() + argparse.author_options())
@@ -91,7 +96,7 @@ def get_patch(stack, given_patch):
'Cannot refresh top patch, because no patches are applied')
return stack.patchorder.applied[-1]

-def list_files(stack, patch_name, args, index, update):
+def list_files(stack, patch_name, args, index, update, submodules):
"""Figure out which files to update."""
if index:
# --index: Don't update the index.
@@ -102,6 +107,15 @@ def list_files(stack, patch_name, args, index, update):
# --update: Restrict update to the paths that were already
# part of the patch.
paths &= stack.patches.get(patch_name).files()
+ else:
+ # Avoid including submodule files by default. This is to ensure that
+ # users in repositories with submodueles do not accidentally include
+ # submodule changes to patches just because they happen to have not
+ # run "git submodule update" prior to running stg refresh. We won't
+ # exclude them if we're explicitly told to include them, or if we're
+ # given explicit paths.
+ if not args and not submodules:
+ paths -= stack.repository.submodules(stack.head.data.tree)
return paths

def write_tree(stack, paths, temp_index):
@@ -245,9 +259,22 @@ def func(parser, options, args):
raise common.CmdException(
'You cannot --force a full refresh when using --index mode')

+ if options.update and options.submodules:
+ raise common.CmdException(
+ '--submodules is meaningless when only updating modified files')
+
+ if options.index and options.submodules:
+ raise common.CmdException(
+ '--submodules is meaningless when keeping the current index')
+
+ # If submodules was not specified on the command line, infer a default
+ # from configuration.
+ if options.submodules is None:
+ options.submodules = (config.get('stgit.refreshsubmodules') == 'yes')
+
stack = directory.repository.current_stack
patch_name = get_patch(stack, options.patch)
- paths = list_files(stack, patch_name, args, options.index, options.update)
+ paths = list_files(stack, patch_name, args, options.index, options.update, options.submodules)

# Make sure there are no conflicts in the files we want to
# refresh.
diff --git a/stgit/config.py b/stgit/config.py
index e6134cc3c82d..4cb609407aec 100644
--- a/stgit/config.py
+++ b/stgit/config.py
@@ -32,20 +32,21 @@ class GitConfigException(StgException):

class GitConfig(object):
__defaults = {
- 'stgit.smtpserver': ['localhost:25'],
- 'stgit.smtpdelay': ['5'],
- 'stgit.pullcmd': ['git pull'],
- 'stgit.fetchcmd': ['git fetch'],
- 'stgit.pull-policy': ['pull'],
- 'stgit.autoimerge': ['no'],
- 'stgit.keepoptimized': ['no'],
- 'stgit.shortnr': ['5'],
- 'stgit.pager': ['less'],
- 'stgit.alias.add': ['git add'],
- 'stgit.alias.rm': ['git rm'],
- 'stgit.alias.mv': ['git mv'],
- 'stgit.alias.resolved': ['git add'],
- 'stgit.alias.status': ['git status -s']
+ 'stgit.smtpserver': ['localhost:25'],
+ 'stgit.smtpdelay': ['5'],
+ 'stgit.pullcmd': ['git pull'],
+ 'stgit.fetchcmd': ['git fetch'],
+ 'stgit.pull-policy': ['pull'],
+ 'stgit.autoimerge': ['no'],
+ 'stgit.keepoptimized': ['no'],
+ 'stgit.refreshsubmodules': ['no'],
+ 'stgit.shortnr': ['5'],
+ 'stgit.pager': ['less'],
+ 'stgit.alias.add': ['git add'],
+ 'stgit.alias.rm': ['git rm'],
+ 'stgit.alias.mv': ['git mv'],
+ 'stgit.alias.resolved': ['git add'],
+ 'stgit.alias.status': ['git status -s']
}

__cache = None
diff --git a/stgit/lib/git.py b/stgit/lib/git.py
index 82b7a900afb7..62b3377c7cc5 100644
--- a/stgit/lib/git.py
+++ b/stgit/lib/git.py
@@ -815,6 +815,15 @@ class Repository(RunWithEnv):
return None
finally:
index.delete()
+ def submodules(self, tree):
+ """Given a L{Tree}, return list of paths which are submodules."""
+ assert isinstance(tree, Tree)
+ # A simple regex to match submodule entries
+ regex = re.compile(r'160000 commit [0-9a-f]{40}\t(.*)$')
+ # First, use ls-tree to get all the trees and links
+ files = self.run(['git', 'ls-tree', '-d', '-r', '-z', tree.sha1]).output_lines('\0')
+ # Then extract the paths of any submodules
+ return set(m.group(1) for m in map(regex.match, files) if m)
def diff_tree(self, t1, t2, diff_opts, binary = True):
"""Given two L{Tree}s C{t1} and C{t2}, return the patch that takes
C{t1} to C{t2}.
diff --git a/t/t5000-refresh-submodules.sh b/t/t5000-refresh-submodules.sh
new file mode 100755
index 000000000000..927d16afe6af
--- /dev/null
+++ b/t/t5000-refresh-submodules.sh
@@ -0,0 +1,50 @@
+#!/bin/sh
+#
+# Copyright (c) 2017 Intel Corporation
+#
+
+test_description='Refresh with submodules'
+
+. ./test-lib.sh
+
+test_expect_success 'refresh with a submodule does not include by default' '
+ test_create_repo foo &&
+ git submodule add ./foo foo &&
+ git commit -m "submodule" &&
+ stg init &&
+ (
+ cd foo &&
+ touch file1 &&
+ git add file1 &&
+ git commit -m "change in submodule"
+ ) &&
+ stg new p1 -m p1 &&
+ stg refresh &&
+ [ "$(stg status)" == " M foo" ]
+'
+
+test_expect_success 'refresh includes non-submodule changes' '
+ touch file2 &&
+ git add file2 &&
+ stg refresh &&
+ [ "$(stg status)" == " M foo" ]
+'
+
+test_expect_success 'refresh with --submodules' '
+ stg refresh --submodules &&
+ [ "$(stg status)" == "" ]
+'
+
+test_expect_success 'refresh --no-submodules overrides config' '
+ stg undo && stg undo &&
+ git config stgit.refreshsubmodules yes &&
+ stg refresh --no-submodules &&
+ [ "$(stg status)" == " M foo" ]
+'
+
+test_expect_success 'refresh with config' '
+ stg refresh &&
+ [ "$(stg status)" == "" ]
+'
+
+test_done
--
2.15.0

Reply all
Reply to author
Forward
0 new messages