[PATCH 2/2] refresh: do not include submodules by default

10 views
Skip to first unread message

Jacob Keller

unread,
Nov 15, 2017, 5:26:57 AM11/15/17
to st...@googlegroups.com, Jacob Keller
When refreshing patches, the default action is to include every path
which is changed relative to the current working directory. This common
workflow creates problems when one uses a submodule. Due to the nature
of submodules, under some workflows they can often be out-of-sync, such
as when working on new updates within a submodule.

Accidental refreshes which include the submodule changes often occur,
which leads to commits which unintentionally include changes to
submodule pointers. These commits likely cause submodules to point to
incorrect locations.

Since it is not easy to convince many developers to always use "refresh
--index", instead, modify the default refresh behavior.

Instead of including all files, now check to make sure that we exclude
submodule files.

We can do this by using git ls-tree to list the files and their modes.
To avoid re-running ls-tree many times, we'll use a compiled regex to
match lines that look like submodules, and extract their filename.

Add a new option to refresh "--submodules" which tells refresh to
include submodules regardless. Additionally, add a "--no-submodules" and
a stg.refreshsubmodules configuration so that users may opt to configure
this behavior.

Since the number of users who use submodules is likely small, make this
the new default behavior.

Signed-off-by: Jacob Keller <jacob.e...@intel.com>
---
stgit/commands/refresh.py | 33 ++++++++++++++++++++++++++++++---
stgit/config.py | 29 +++++++++++++++--------------
stgit/lib/git.py | 9 +++++++++
3 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/stgit/commands/refresh.py b/stgit/commands/refresh.py
index f6c0bf371436..631167cf5445 100644
--- a/stgit/commands/refresh.py
+++ b/stgit/commands/refresh.py
@@ -6,6 +6,7 @@ from stgit import argparse, utils
from stgit.argparse import opt
from stgit.commands import common
from stgit.lib import git, transaction, edit
+from stgit.config import config, GitConfigException
from stgit.out import out

__copyright__ = """
@@ -71,7 +72,11 @@ options = [
opt('-e', '--edit', action = 'store_true',
short = 'Invoke an editor for the patch description'),
opt('-a', '--annotate', metavar = 'NOTE',
- short = 'Annotate the patch log entry')
+ short = 'Annotate the patch log entry'),
+ opt('-s', '--submodules', action = 'store_true',
+ short = 'Include submodules when refreshing patch contents'),
+ opt('', '--no-submodules', action = 'store_false', dest = 'submodules',
+ short = 'Exclude submodules when refreshing patch contents')
] + (argparse.message_options(save_template = False) +
argparse.hook_options() +
argparse.sign_options() + argparse.author_options())
@@ -91,7 +96,7 @@ def get_patch(stack, given_patch):
'Cannot refresh top patch, because no patches are applied')
return stack.patchorder.applied[-1]

-def list_files(stack, patch_name, args, index, update):
+def list_files(stack, patch_name, args, index, update, submodules):
"""Figure out which files to update."""
if index:
# --index: Don't update the index.
@@ -102,6 +107,15 @@ def list_files(stack, patch_name, args, index, update):
# --update: Restrict update to the paths that were already
# part of the patch.
paths &= stack.patches.get(patch_name).files()
+ else:
+ # Avoid including submodule files by default. This is to ensure that
+ # users in repositories with submodueles do not accidentally include
+ # submodule changes to patches just because they happen to have not
+ # run "git submodule update" prior to running stg refresh. We won't
+ # exclude them if we're explicitly told to include them, or if we're
+ # given explicit paths.
+ if not args and not submodules:
+ paths -= stack.repository.submodules(stack.head.data.tree)
return paths

def write_tree(stack, paths, temp_index):
@@ -245,9 +259,22 @@ def func(parser, options, args):
raise common.CmdException(
'You cannot --force a full refresh when using --index mode')

+ if options.update and options.submodules:
+ raise common.CmdException(
+ '--submodules is meaningless when only updating modified files')
+
+ if options.index and options.submodules:
+ raise common.CmdException(
+ '--submodules is meaningless when keeping the current index')
+
+ # If submodules was not specified on the command line, infer a default
+ # from configuration.
+ if options.submodules is None:
+ options.submodules = (config.get('stgit.refreshsubmodules') == 'yes')
+
stack = directory.repository.current_stack
patch_name = get_patch(stack, options.patch)
- paths = list_files(stack, patch_name, args, options.index, options.update)
+ paths = list_files(stack, patch_name, args, options.index, options.update, options.submodules)

# Make sure there are no conflicts in the files we want to
# refresh.
diff --git a/stgit/config.py b/stgit/config.py
index e6134cc3c82d..4cb609407aec 100644
--- a/stgit/config.py
+++ b/stgit/config.py
@@ -32,20 +32,21 @@ class GitConfigException(StgException):

class GitConfig(object):
__defaults = {
- 'stgit.smtpserver': ['localhost:25'],
- 'stgit.smtpdelay': ['5'],
- 'stgit.pullcmd': ['git pull'],
- 'stgit.fetchcmd': ['git fetch'],
- 'stgit.pull-policy': ['pull'],
- 'stgit.autoimerge': ['no'],
- 'stgit.keepoptimized': ['no'],
- 'stgit.shortnr': ['5'],
- 'stgit.pager': ['less'],
- 'stgit.alias.add': ['git add'],
- 'stgit.alias.rm': ['git rm'],
- 'stgit.alias.mv': ['git mv'],
- 'stgit.alias.resolved': ['git add'],
- 'stgit.alias.status': ['git status -s']
+ 'stgit.smtpserver': ['localhost:25'],
+ 'stgit.smtpdelay': ['5'],
+ 'stgit.pullcmd': ['git pull'],
+ 'stgit.fetchcmd': ['git fetch'],
+ 'stgit.pull-policy': ['pull'],
+ 'stgit.autoimerge': ['no'],
+ 'stgit.keepoptimized': ['no'],
+ 'stgit.refreshsubmodules': ['no'],
+ 'stgit.shortnr': ['5'],
+ 'stgit.pager': ['less'],
+ 'stgit.alias.add': ['git add'],
+ 'stgit.alias.rm': ['git rm'],
+ 'stgit.alias.mv': ['git mv'],
+ 'stgit.alias.resolved': ['git add'],
+ 'stgit.alias.status': ['git status -s']
}

__cache = None
diff --git a/stgit/lib/git.py b/stgit/lib/git.py
index 82b7a900afb7..32a4ef4ddf52 100644
--- a/stgit/lib/git.py
+++ b/stgit/lib/git.py
@@ -815,6 +815,15 @@ class Repository(RunWithEnv):
return None
finally:
index.delete()
+ def submodules(self, tree):
+ """Given a L{Tree}, return list of paths which are submodules."""
+ assert isinstance(tree, Tree)
+ # A simple regex to match submodule entries
+ regex = re.compile(r'^160000 commit [0-9a-f]{40}\t(.*)$')
+ # First, use ls-tree to get all the file information
+ files = self.run(['git', 'ls-tree', '-r', '-z', tree.sha1]).output_lines('\0')
+ # Then extract the paths of any submodules
+ return set([m.group(1) for f in files for m in [regex.search(f)] if m])
def diff_tree(self, t1, t2, diff_opts, binary = True):
"""Given two L{Tree}s C{t1} and C{t2}, return the patch that takes
C{t1} to C{t2}.
--
2.15.0

Peter Grayson

unread,
Nov 18, 2017, 8:25:42 AM11/18/17
to Jacob Keller, StGit (Stacked Git)
Having dulled my sword on git submodules, I am empathetic to your desire for these submodule-aware features in stgit. I defer to others with more submodules experience, but I would sign-off on these patches.

I do have some suggestions for minor improvements to this patch. See inline comments.

Cheers,
Pete

Spelling: "submodueles"
It is not clear to me that submodule "commit" are guaranteed to have
the mode of 160000 in all cases. Perhaps using '^[0-7]{6}' or even
'^.{6}' would be more fail-safe?
 
+        # First, use ls-tree to get all the file information
+        files = self.run(['git', 'ls-tree', '-r', '-z', tree.sha1]).output_lines('\0')

This files list will capture every single blob (file) in the tree. This has the
potential to be very large in repos with very many files. I believe we can
get a smaller list by also passing the '-d' option which will limit the output
to trees (directories) and commits (submodules), excluding blobs.
 
+        # Then extract the paths of any submodules
+        return set([m.group(1) for f in files for m in [regex.search(f)] if m])

Nit: we do not need a list comprehension (which builds a list that
immediately becomes garbage), but can instead just use a generator
expression (i.e. by removing the square brackets):

    set(m.group(1) for f in files for m in [regex.search(f)] if m)

But I think we can do somewhat better by using map().

    set(m.group(1) for m in map(regex.search, files) if m)

With Python3, map() returns a generator which avoids building any
temporary lists. And with Python2, using map() means that only one
temporary list will be built instead of one for each file.

One last potential improvement would be to use regex.match() instead
of regex.search(). The former matches from the beginning of the string
instead of searching for matches anywhere in the string. We could thus
drop the '^' from regex and match() might perform slightly better.
 
     def diff_tree(self, t1, t2, diff_opts, binary = True):
         """Given two L{Tree}s C{t1} and C{t2}, return the patch that takes
         C{t1} to C{t2}.
--
2.15.0

--
You received this message because you are subscribed to the Google Groups "StGit (Stacked Git)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stgit+unsubscribe@googlegroups.com.
To post to this group, send an email to st...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/stgit/20171115003326.24239-2-jacob.e.keller%40intel.com.
For more options, visit https://groups.google.com/d/optout.

Keller, Jacob E

unread,
Nov 20, 2017, 6:00:42 PM11/20/17
to Peter Grayson, StGit (Stacked Git)

I’ll include those suggestions in a re-roll, thanks!

 

According to the git documentation, git links *are* by definition 160000 mode: at least as documented in git-fast-import, and the git code relies on 0160000 being a gitlink, by seeing S_IFGITLINK to 0160000 and having S_ISGITLINK(m) be defined to enforce the mode of 160000, (see cache.h), so I do not believe we should allow other modes.

 

Regards,

Jake

 

To unsubscribe from this group and stop receiving emails from it, send an email to stgit+un...@googlegroups.com.

Reply all
Reply to author
Forward
0 new messages