[PATCH 3 of 6 STABLE V2] dirstate: add 'rdirs()' to examine 'directory pattern' matching correctly

FUJIWARA Katsunori

unread,

Mar 22, 2012, 11:06:33 AM3/22/12

to mercuri...@selenic.com

# HG changeset patch
# User FUJIWARA Katsunori <fo...@lares.dti.ne.jp>
# Date 1332428327 -32400
# Branch stable
# Node ID 200e12e1d1f146e52bf01aed854562df30089c45
# Parent d09da50da3b2a89eb2addc4deaea23b8b1683076
dirstate: add 'rdirs()' to examine 'directory pattern' matching correctly

when files are marked as 'removed' in working context:

- entry in 'dirstate' is dropped, and
- if there is no other files in target directory:
- entry in 'dirstate._dirs' is dropped, and
- directory itself is removed from filesystem

so, 'f in dirstate.dirs()' can't examine whether specified pattern is
related to removed largefiles or not correctly, if it is 'directory
pattern'.

this patch adds 'dirstate.rdirs()' (removed-dirs) to know what
directories are related to removed files.

'dirstate.rdirs()' is not for serious purpose, but for checking of
directory matching to removed files, so it is updated only in
'_droppath()'.

in almost all cases, 'dirstate.rdirs()' is probably invoked after
examination with 'dirstate.dirs()', so 'dirstate._rdirs' property is
built in 'dirstate._dirs()' to prevent manifest from being scanned twice.

diff -r d09da50da3b2 -r 200e12e1d1f1 mercurial/dirstate.py
--- a/mercurial/dirstate.py Thu Mar 22 23:58:47 2012 +0900
+++ b/mercurial/dirstate.py Thu Mar 22 23:58:47 2012 +0900
@@ -46,6 +46,15 @@
return
del dirs[base]

+def _addrdirs(dirs, path):
+ pos = path.rfind('/')
+ while pos != -1:
+ path = path[:pos]
+ if path in dirs:
+ break # dirs already contains this and above
+ dirs.add(path)
+ pos = path.rfind('/')
+
class dirstate(object):

def __init__(self, opener, ui, root, validate):
@@ -113,9 +122,13 @@
@propertycache
def _dirs(self):
dirs = {}
+ rdirs = set()
for f, s in self._map.iteritems():
if s[0] != 'r':
_incdirs(dirs, f)
+ else:
+ _addrdirs(rdirs, f)
+ self._rdirs = rdirs
return dirs

def dirs(self):
@@ -261,8 +274,9 @@
self._pl = p

def invalidate(self):
- for a in ("_map", "_copymap", "_foldmap", "_branch", "_pl", "_dirs",
- "_ignore"):
+ for a in ("_map", "_copymap", "_foldmap", "_branch", "_pl",
+ "_dirs", "_rdirs",
+ "_ignore"):
if a in self.__dict__:
delattr(self, a)
self._lastnormaltime = 0
@@ -285,8 +299,10 @@
return self._copymap

def _droppath(self, f):
- if self[f] not in "?r" and "_dirs" in self.__dict__:
- _decdirs(self._dirs, f)
+ if self[f] not in "?r":
+ if "_dirs" in self.__dict__:
+ _decdirs(self._dirs, f)
+ _addrdirs(self._rdirs, f)

def _addpath(self, f, check=False):
oldstate = self[f]
@@ -425,8 +441,9 @@

def clear(self):
self._map = {}
- if "_dirs" in self.__dict__:
- delattr(self, "_dirs")
+ for a in ("_dirs", "_rdirs"):
+ if a in self.__dict__:
+ delattr(self, a)
self._copymap = {}
self._pl = [nullid, nullid]
self._lastnormaltime = 0
@@ -742,3 +759,14 @@

return (lookup, modified, added, removed, deleted, unknown, ignored,
clean)
+
+ @propertycache
+ def _rdirs(self):
+ '''not for strict purpose, but for checking of directory
+ matching to removed files'''
+
+ self._dirs
+ return self._rdirs
+
+ def rdirs(self):
+ return self._rdirs
_______________________________________________
Mercurial-devel mailing list
Mercuri...@selenic.com
http://selenic.com/mailman/listinfo/mercurial-devel

FUJIWARA Katsunori

unread,

Mar 22, 2012, 11:06:34 AM3/22/12

to mercuri...@selenic.com

# HG changeset patch
# User FUJIWARA Katsunori <fo...@lares.dti.ne.jp>
# Date 1332428327 -32400
# Branch stable

# Node ID dac978818d9bcd735a3135d584e514d76a1cef01
# Parent 200e12e1d1f146e52bf01aed854562df30089c45
largefiles: use 'dirstate.rdirs()' to match against removed directory

original implementation uses '__contains__()' and 'rdirs()' of
dirstate to examine whether specified pattern is related to largefiles
in target context.

but the directory removed recursively by 'hg remove' is not recognized
in both, so 'directory pattern' for it can't be recognized correctly.

this patch uses 'dirstate.rdirs()' for 'directory pattern' matched
against removed directory.

diff -r 200e12e1d1f1 -r dac978818d9b hgext/largefiles/reposetup.py
--- a/hgext/largefiles/reposetup.py Thu Mar 22 23:58:47 2012 +0900
+++ b/hgext/largefiles/reposetup.py Thu Mar 22 23:58:47 2012 +0900
@@ -140,7 +140,8 @@
if working:
sf = lfutil.standin(file)
dirstate = repo.dirstate
- if sf in dirstate or sf in dirstate.dirs():
+ if (sf in dirstate or sf in dirstate.dirs() or
+ sf in dirstate.rdirs()):
return sf
return file

@@ -173,7 +174,8 @@
def sfindirstate(f):
sf = lfutil.standin(f)
dirstate = repo.dirstate
- return sf in dirstate or sf in dirstate.dirs()
+ return (sf in dirstate or sf in dirstate.dirs() or
+ sf in dirstate.rdirs())
match._files = [f for f in match._files
if sfindirstate(f)]
# Don't waste time getting the ignored and unknown
diff -r 200e12e1d1f1 -r dac978818d9b tests/test-largefiles.t
--- a/tests/test-largefiles.t Thu Mar 22 23:58:47 2012 +0900
+++ b/tests/test-largefiles.t Thu Mar 22 23:58:47 2012 +0900
@@ -1014,3 +1014,26 @@

$ cd ..
+
+test of 'hg status' with 'directory pattern' for recursively removed
+directory.
+
+ $ hg init removeddirpat
+ $ cd removeddirpat
+
+ $ mkdir -p sub/sub/sub
+ $ echo a > sub/sub/sub/a.large
+ $ echo b > b.large
+ $ hg add --large sub/sub/sub/a.large b.large
+ $ hg commit -m '#0'
+ Invoking status precommit hook
+ A b.large
+ A sub/sub/sub/a.large
+
+ $ hg remove sub/sub/sub/a.large
+ $ test ! -d sub/sub/sub
+ $ hg status -A b.large sub/sub/sub
+ R sub/sub/sub/a.large
+ C b.large
+
+ $ cd ..

FUJIWARA Katsunori

unread,

Mar 22, 2012, 11:06:36 AM3/22/12

to mercuri...@selenic.com

# HG changeset patch
# User FUJIWARA Katsunori <fo...@lares.dti.ne.jp>
# Date 1332428327 -32400
# Branch stable

# Node ID b2eb2c3bfd8860500448c31392c982347ef230d1
# Parent 0218251e818b57226cbed23a10434f586a4d6ed8
largefiles: check also another context for files known only in it

original implementation examines whether specified pattern is related
to largefiles in target context only with dirstate.

but this can't recognize patterns related to largefiles only known in
non-working context.

this patch checks also another context for such case.

this patch defines 'finctx()' and 'dinctx()' separately, because
building up 'dirs()' and 'rdirs()' of dirstate/changectx can be
avoided, if all specified patterns match against largefiles directly.

this patch treats '.' as special character, because:

- '.' is not recognized in dirstate/changectx:
this patch avoids unnecessary examination by finctx/dinctx.

- it means 'matches all files in working directory':
this patch chooses(or avoids) 'performance boost' route correctly
(also related to issue 3327)

diff -r 0218251e818b -r b2eb2c3bfd88 hgext/largefiles/reposetup.py

--- a/hgext/largefiles/reposetup.py Thu Mar 22 23:58:47 2012 +0900
+++ b/hgext/largefiles/reposetup.py Thu Mar 22 23:58:47 2012 +0900

@@ -120,14 +120,31 @@
if match is None:
match = match_.always(self.root, self.getcwd())

+ dirstate = repo.dirstate
+ if parentworking:
+ finctx = lambda f: f in dirstate
+ dinctx = lambda d: (d in dirstate.dirs() or
+ d in dirstate.rdirs())
+ elif working:
+ finctx = lambda f: f in dirstate or f in ctx1
+ dinctx = lambda d: (d in dirstate.dirs() or
+ d in dirstate.rdirs() or
+ d in ctx1.dirs())
+ else:
+ finctx = lambda f: f in ctx2 or f in ctx1
+ dinctx = lambda d: d in ctx2.dirs() or d in ctx1.dirs()
+
# First check if there were files specified on the
# command line. If there were, and none of them were
# largefiles, we should just bail here and let super
# handle it -- thus gaining a big performance boost.
lfdirstate = lfutil.openlfdirstate(ui, self)
if match.files() and not match.anypats():
- for f in lfdirstate:
- if match(f):
+ for f in match.files():
+ if f == '.':
+ break
+ sf = lfutil.standin(f)
+ if finctx(sf) or dinctx(sf):
break
else:
return super(lfiles_repo, self).status(node1, node2,
@@ -137,11 +154,9 @@
# Create a copy of match that matches standins instead
# of largefiles.
def tostandin(file):
- if working:
+ if working and file != '.':
sf = lfutil.standin(file)
- dirstate = repo.dirstate
- if (sf in dirstate or sf in dirstate.dirs() or
- sf in dirstate.rdirs()):
+ if finctx(sf) or dinctx(sf):
return sf
return file

diff -r 0218251e818b -r b2eb2c3bfd88 tests/test-largefiles.t

--- a/tests/test-largefiles.t Thu Mar 22 23:58:47 2012 +0900
+++ b/tests/test-largefiles.t Thu Mar 22 23:58:47 2012 +0900

@@ -1059,3 +1059,62 @@
C b.large

$ cd ..
+
+Test 'hg status' for largefiles known only in another revision.
+
+ $ hg init knownanother
+ $ cd knownanother
+
+ $ echo a > a.normal
+ $ hg add a.normal

+ $ hg commit -m '#0'
+ Invoking status precommit hook

+ A a.normal
+ $ echo b > b.normal
+ $ hg add b.normal
+ $ hg commit -m '#1'

+ Invoking status precommit hook

+ A b.normal

+ $ mkdir -p sub/sub/sub

+ $ echo c > sub/sub/sub/c.large
+ $ hg add --large sub/sub/sub/c.large
+ $ hg commit -m '#2'

+ Invoking status precommit hook

+ A sub/sub/sub/c.large
+
+ $ hg update 1
+ 0 files updated, 0 files merged, 1 files removed, 0 files unresolved
+ getting changed largefiles
+ 0 largefiles updated, 1 removed

+ $ test ! -d sub/sub/sub
+

+'file pattern' for the file known also in working context should be
+specified for certain test, because it prevents dirstate from full
+scan which may show correct status unexpectedly.
+
+ $ hg status --rev 2 sub/sub/sub/c.large a.normal
+ R sub/sub/sub/c.large
+ $ hg status --rev 0 --rev 2 sub/sub/sub/c.large a.normal
+ A sub/sub/sub/c.large
+
+ $ hg status --rev 2 sub/sub/sub a.normal
+ R sub/sub/sub/c.large
+ $ hg status --rev 0 --rev 2 sub/sub/sub a.normal
+ A sub/sub/sub/c.large

+
+ $ cd ..

+
+treat '.' as the pattern (maybe) related to largefiles (issue3327)
+
+ $ hg init issue3327
+ $ echo a > issue3327/large
+ $ hg -R issue3327 add --large issue3327/large
+ $ hg -R issue3327 commit -m '#0' issue3327/large

+ Invoking status precommit hook

+ A large
+ $ hg --cwd issue3327 -R . status --all
+ C large
+ $ hg --cwd issue3327 -R . status --all .
+ C large
+ $ hg --cwd issue3327 -R . status --all -I '*' .
+ C large

FUJIWARA Katsunori

unread,

Mar 22, 2012, 11:06:30 AM3/22/12

to mercuri...@selenic.com

this patch series fixes bugs of "hg status" with largefiles extension.

refactoring/new-features are minimized than previous post, so this is
posted with STABLE flag, too.

FUJIWARA Katsunori

unread,

Mar 22, 2012, 11:06:31 AM3/22/12

to mercuri...@selenic.com

# HG changeset patch
# User FUJIWARA Katsunori <fo...@lares.dti.ne.jp>
# Date 1332428327 -32400
# Branch stable

# Node ID 447a62ee6b761f083fb839a2f65df9b1d9579cc2
# Parent 2338ab19b23690e74e1ba3936667399dfeb9490c
largefiles: suppress unexpected warning of 'hg status' for removed files

original implementation queries whether specified pattern is related
or not to largefiles, to target context.

but changectx/workingctx returns False about relationship with files
marked as removed.

So, 'hg status' with 'file pattern' for removed file shows unexpected
warning message in below process:

1. 'tostandin()' returns non-STANDIN filename for removed file,
because changectx/workingctx returns False about relationship
with it

2. 'match.files()' contains non-STANDIN filename, which is already
removed from working directory

3. 'dirstate.walk()' invoked via 'localrepository.status()' treats
non-STANDIN filename as bad filename, because there is no entry
for it in dirstate: only STANDIN is managed in dirstate

4. 'dirstate.walk()' invokes 'match.bad()', which is defined in
'localrepository.status()' as 'bad()'

5. 'bad()' shows warning message for non-STANDIN, because it is
not related to source context: only STANDIN is related to it

this patch queries to dirstate instead of changectxt/workingctx,
because dirstate returns expected result for removed files.

'match.files()' is used by 'localrepository.status()' only in
'working' case, so this patched code also works correctly in
non-'working' case.

diff -r 2338ab19b236 -r 447a62ee6b76 hgext/largefiles/reposetup.py
--- a/hgext/largefiles/reposetup.py Sun Mar 18 18:21:32 2012 -0500

+++ b/hgext/largefiles/reposetup.py Thu Mar 22 23:58:47 2012 +0900

@@ -137,7 +137,7 @@

# Create a copy of match that matches standins instead
# of largefiles.
def tostandin(file):

- if inctx(lfutil.standin(file), ctx2):
+ if working and lfutil.standin(file) in repo.dirstate:
return lfutil.standin(file)
return file

diff -r 2338ab19b236 -r 447a62ee6b76 tests/test-largefiles.t
--- a/tests/test-largefiles.t Sun Mar 18 18:21:32 2012 -0500

+++ b/tests/test-largefiles.t Thu Mar 22 23:58:47 2012 +0900

@@ -68,6 +68,8 @@
Remove both largefiles and normal files.

$ hg remove normal1 large1
+ $ hg status large1
+ R large1
$ hg commit -m "remove files"
Invoking status precommit hook
R large1

FUJIWARA Katsunori

unread,

Mar 22, 2012, 11:06:32 AM3/22/12

to mercuri...@selenic.com

# HG changeset patch
# User FUJIWARA Katsunori <fo...@lares.dti.ne.jp>
# Date 1332428327 -32400
# Branch stable

# Node ID d09da50da3b2a89eb2addc4deaea23b8b1683076
# Parent 447a62ee6b761f083fb839a2f65df9b1d9579cc2
largefiles: use 'dirstate.dirs()' for 'directory pattern' relation check

original implementation queries whether specified pattern is related

or not to largefiles in target context by 'dirstate.__contains__()'.

but this can't recognize 'directory pattern' correctly, so this patch
uses 'dirstate.dirs()' for it.

this patch uses dirstate instead of lfdirstate in 'working' route
(second patch hunk for 'hgext/largefiles/reposetup.py'), because
'dirs()' information may be already built for dirstate but not yet for
lfdirstate at this point. this prevents lfdirstate from building up
and having 'dirs()' information.

diff -r 447a62ee6b76 -r d09da50da3b2 hgext/largefiles/reposetup.py
--- a/hgext/largefiles/reposetup.py Thu Mar 22 23:58:47 2012 +0900

+++ b/hgext/largefiles/reposetup.py Thu Mar 22 23:58:47 2012 +0900

@@ -137,8 +137,11 @@

# Create a copy of match that matches standins instead
# of largefiles.
def tostandin(file):

- if working and lfutil.standin(file) in repo.dirstate:
- return lfutil.standin(file)
+ if working:
+ sf = lfutil.standin(file)
+ dirstate = repo.dirstate
+ if sf in dirstate or sf in dirstate.dirs():
+ return sf
return file

# Create a function that we can use to override what is
@@ -167,8 +170,12 @@
orig_ignore = lfdirstate._ignore
lfdirstate._ignore = _ignoreoverride

- match._files = [f for f in match._files if f in
- lfdirstate]
+ def sfindirstate(f):
+ sf = lfutil.standin(f)
+ dirstate = repo.dirstate
+ return sf in dirstate or sf in dirstate.dirs()
+ match._files = [f for f in match._files
+ if sfindirstate(f)]

# Don't waste time getting the ignored and unknown

# files again; we already have them
s = lfdirstate.status(match, [], False,
diff -r 447a62ee6b76 -r d09da50da3b2 tests/test-largefiles.t
--- a/tests/test-largefiles.t Thu Mar 22 23:58:47 2012 +0900

+++ b/tests/test-largefiles.t Thu Mar 22 23:58:47 2012 +0900

@@ -251,6 +251,13 @@
A sub2/large6
A sub2/large7

+Test "hg status" with combination of 'file pattern' and 'directory
+pattern' for largefiles:
+
+ $ hg status sub2/large6 sub2
+ A sub2/large6
+ A sub2/large7
+
Config settings (pattern **.dat, minsize 2 MB) are respected.

$ echo testdata > test.dat

Matt Mackall

unread,

Mar 23, 2012, 4:45:32 PM3/23/12

to FUJIWARA Katsunori, mercuri...@selenic.com

On Fri, 2012-03-23 at 00:06 +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <fo...@lares.dti.ne.jp>
> # Date 1332428327 -32400
> # Branch stable
> # Node ID 200e12e1d1f146e52bf01aed854562df30089c45
> # Parent d09da50da3b2a89eb2addc4deaea23b8b1683076
> dirstate: add 'rdirs()' to examine 'directory pattern' matching correctly

Ok, let's hit pause here.

This patch just further confirms my impression that something is deeply
wrong with the largefiles implementation. And the rule is: extensions
are not allowed to infect the core with their broken design. The core
doesn't need or want any of this extra complexity.

Obviously, this could be done in the largefiles code. For instance, we
could subclass dirstate. But it really seems to me there ought to be a
way to fix the abstraction to avoid the need for this sort of hack
entirely.

For instance, I think we should never have to ask the question "is this
a largefile?" outside a very small set of functions. But we're checking
this or converting largefile paths to standin paths or vice-versa in
hundreds of places. That means there are probably dozens of places where
we've done it wrong or forgotten to do it.

--
Mathematics is the supreme nostalgia of our time.

Reply all

Reply to author

Forward