I'm using iniparse in an extension to parse Mercurial (http://mercurial.selenic.com/) configuration files. I received a number of complaints/requests that the file being produced was particularly legible - specifically there wasn't always an empty line at the end of the file and sometimes extra spaces were inserted. To resolve this, I had to modify iniparse with the following function. I am now using my modified iniparse, but would like to see this functionality included in a future release of iniparse (so I may go back to using the 'official' version). Please let me know what you think :)
Best,
Paul
diff --git a/ini.py b/ini.py
--- a/ini.py
+++ b/ini.py
@@ -485,6 +485,24 @@
yield x.name
d.add(x.name)
+ def clean_format(self):
+ '''
+ this functions makes the configuration look
+ clean and handwritten - two consecutive EmptyLines are removed,
+ and one is guaranteed to be at the end of the file
+ '''
+ cont = self._data.contents
+ if (len(cont) > 1):
+ for i in range(1, len(cont)):
+ if i > len(cont)-1:
+ break
+ while isinstance(cont[i-1], EmptyLine) and isinstance(cont[i], EmptyLine):
+ cont.remove(cont[i])
+ if i > len(cont)-1:
+ break
+ if cont and not isinstance(cont[len(cont) - 1], EmptyLine):
+ self._data.add(EmptyLine())
+
def _new_namespace(self, name):
if self._data.contents:
self._data.add(EmptyLine())
I think it would be a useful addition. The one change I would make is
to prefix the function name with an underscore, so that it has a
smaller chance of clashing with an INI section name. Another option
is for clean_format to be a helper function in the module, outside the
class. Which do you think is better?
-param
> --
> To post to this group, send email to iniparse...@googlegroups.com
> To unsubscribe from this group, send email to iniparse-discu...@googlegroups.com
> For more options, visit this group at http://groups.google.com/group/iniparse-discuss?hl=en
Well in most Python I'm familiar with, prefixing with an underscore denotes the author intends the variable/function to be a 'private' member. So if we prefixed it we an underscore, I wouldn't be able to call clean_format from my client code (extension) without violating that convention. Doing that would probably get me in trouble with the Mercurial people ;). So I'd think the second option, having clean_format as a helper function in the module makes the most sense.
Best,
Paul
I'll commit it in a few days and release a new version.
-param
So sorry about this delay. When I announced my extension on the mercurial mailing list, it came up that iniparse was incompatible with the %include syntax mercurial uses for config files. See:
http://selenic.com/pipermail/mercurial-devel/2010-February/018320.html
http://selenic.com/pipermail/mercurial-devel/2010-February/018323.html
I took Steve Borho's monkey patch that he used for TortoiseHG, and am including it my revisited patch (along with clean_format) attached here. I've also attached a little unit test for clean_format - I wasn't sure of the exact implementation you wanted for including it as a helper function in the module, so I just used the API I had (as a method of INIParse), but I think it should be pretty trivial to switch to whatever final implementation you choose.
Best,
Paul
Regarding comment syntax:
* leading "rem": until you reminded me, I had forgotten about this
oddity. The Python documentation doesn't refer to it either.
However, the ConfigParser module supports that syntax.
* Subversion include syntax: you'd like the include statements to be
treated as comments. That means that you could set a value using
iniparse, and the include could override it. Is that OK?
One of the primary goals of iniparse is to stay compatible with the
standard library's ConfigParser module. So the new regexp cannot be
used unconditionally - it would have to be enabled explicitly, maybe
using a module-level flag.
Is Mercurial's include syntax specific to it, or is it more generally used?
-param
> Patch:
>
> diff --git a/a/ini.py b/b/ini.py
> index d58c38f..6fb0eff 100644
> --- a/a/ini.py
> +++ b/b/ini.py
> @@ -159,8 +159,9 @@ class OptionLine(LineType):
>
>
> class CommentLine(LineType):
> - regex = re.compile(r'^(?P<csep>[;#]|[rR][eE][mM])'
> - r'(?P<comment>.*)$')
> +# Mercurial-safe comment line regex, as given by Steve Borho
> +# bitbucket.org/tortoisehg/stable/src/tip/tortoisehg/hgtk/thgconfig.py#cl-1084
> + regex = re.compile(r'^(?P<csep>[%;#])(?P<comment>.*)$')
>
> def __init__(self, comment='', separator='#', line=None):
> super(CommentLine, self).__init__(line)
> @@ -485,6 +486,24 @@ class INIConfig(config.ConfigNamespace):
> yield x.name
> d.add(x.name)
>
> + def clean_format(self):
> + '''
> + this functions makes the configuration look
> + clean and handwritten - two consecutive EmptyLines are removed,
> + and one is guaranteed to be at the end of the file
> + '''
> + cont = self._data.contents
> + if (len(cont) > 1):
> + for i in range(1, len(cont)):
> + if i > len(cont)-1:
> + break
> + while isinstance(cont[i-1], EmptyLine) and isinstance(cont[i], EmptyLine):
> + cont.remove(cont[i])
> + if i > len(cont)-1:
> + break
> + if (not cont) or (cont and not isinstance(cont[len(cont) - 1], EmptyLine)):
> + self._data.add(EmptyLine())
> +
> def _new_namespace(self, name):
> if self._data.contents:
> self._data.add(EmptyLine())
>
>
> Unit Test:
>
> import unittest
> from ini import INIConfig, EmptyLine
>
>
> class CleanFormatTest(unittest.TestCase):
>
> def setUp(self):
> self.cfg = INIConfig()
>
> def testfinalnewline(self):
> self.assertTrue(not self.cfg._data.contents)
> self.cfg.clean_format()
> cont = self.cfg._data.contents
> self.assertTrue(isinstance(cont[0], EmptyLine))
>
> def testremovethreenewlines(self):
> self.cfg.newsection.newproperty = "Ok"
> self.cfg.newsection2.newproperty2 = "Ok"
> cont = self.cfg._data.contents
> self.assertTrue(isinstance(cont[1], EmptyLine) and isinstance(cont[2], EmptyLine))
> self.cfg.clean_format()
> self.assertFalse(isinstance(cont[1], EmptyLine) and isinstance(cont[2], EmptyLine))
>
>
> if __name__ == '__main__':
> unittest.main()
clean_format: I renamed it to tidy() - it's a module function that
takes an INIConfig object (or a ConfigParser object) as input. It
will remove consecutive blank lines (both within and outside
sections), remove all blank lines at the start of the file, and ensure
that the file ends with a blank line, unless the file is empty. Let
me know whether it works for your use case.
Also, as I realized that iniparse on its own should not cause
consecutive blank lines to appear in the file... so I'm unsure of how
you were ending up with untidy files. Are the files being changed in
other ways as well, or have I missed a way for iniparse to cause
consecutive blank lines?
Finally, I also added a change_comment_syntax() function with the
following signature:
def change_comment_syntax(comment_chars='%;#', allow_rem=False):
If you call it without arguments, it should give you the
mercurial-safe comment syntax. Calling it with (';#', True) will
result in backward-compatible syntax. The default comment syntax
remains unchanged.
Please try out the latest code from SVN and let me know if it works
for you. Once you confirm, I'll do a release.
-param
> Patch:
>
> diff --git a/a/ini.py b/b/ini.py
> index d58c38f..6fb0eff 100644
> --- a/a/ini.py
> +++ b/b/ini.py
> @@ -159,8 +159,9 @@ class OptionLine(LineType):
>
>
> class CommentLine(LineType):
> - regex = re.compile(r'^(?P<csep>[;#]|[rR][eE][mM])'
> - r'(?P<comment>.*)$')
> +# Mercurial-safe comment line regex, as given by Steve Borho
> +# bitbucket.org/tortoisehg/stable/src/tip/tortoisehg/hgtk/thgconfig.py#cl-1084
> + regex = re.compile(r'^(?P<csep>[%;#])(?P<comment>.*)$')
>
> def __init__(self, comment='', separator='#', line=None):
> super(CommentLine, self).__init__(line)
> @@ -485,6 +486,24 @@ class INIConfig(config.ConfigNamespace):
> yield x.name
> d.add(x.name)
>
> + def clean_format(self):
> + '''
> + this functions makes the configuration look
> + clean and handwritten - two consecutive EmptyLines are removed,
> + and one is guaranteed to be at the end of the file
> + '''
> + cont = self._data.contents
> + if (len(cont) > 1):
> + for i in range(1, len(cont)):
> + if i > len(cont)-1:
> + break
> + while isinstance(cont[i-1], EmptyLine) and isinstance(cont[i], EmptyLine):
> + cont.remove(cont[i])
> + if i > len(cont)-1:
> + break
> + if (not cont) or (cont and not isinstance(cont[len(cont) - 1], EmptyLine)):
> + self._data.add(EmptyLine())
> +
> def _new_namespace(self, name):
> if self._data.contents:
> self._data.add(EmptyLine())
>
>
> Unit Test:
>
> import unittest
> from ini import INIConfig, EmptyLine
>
>
> class CleanFormatTest(unittest.TestCase):
>
> def setUp(self):
> self.cfg = INIConfig()
>
> def testfinalnewline(self):
> self.assertTrue(not self.cfg._data.contents)
> self.cfg.clean_format()
> cont = self.cfg._data.contents
> self.assertTrue(isinstance(cont[0], EmptyLine))
>
> def testremovethreenewlines(self):
> self.cfg.newsection.newproperty = "Ok"
> self.cfg.newsection2.newproperty2 = "Ok"
> cont = self.cfg._data.contents
> self.assertTrue(isinstance(cont[1], EmptyLine) and isinstance(cont[2], EmptyLine))
> self.cfg.clean_format()
> self.assertFalse(isinstance(cont[1], EmptyLine) and isinstance(cont[2], EmptyLine))
>
>
> if __name__ == '__main__':
> unittest.main()
>
>
>
> On 2010-01-28, at 6:53 PM, Paramjit Oberoi wrote:
>