New issue 24 by chris.morganiser: Inaccurate parsing compared to proper
Windows functions
http://code.google.com/p/iniparse/issues/detail?id=24
There are a number of issues in parsing which produce inaccurate results
when compared to normal INI parsing methods (i.e. Windows' *PrivateProfile*
functions).
Here are a few which I've noticed which are inconsistent with how
they "should" be parsed for INI files.
Quoted values: if the read value is wrapped in single or double quotes, the
quotes should be removed, permitting leading and trailing whitespace which
would otherwise not work. And it is not a "valid string" check, it is
purely a first and last character check - so Key="Val"ue" should be Val"ue,
and Key="Value1" "Value2" should be Value1" "Value2.
Comments:
* The semicolon is the only accepted comment character:
http://msdn.microsoft.com/en-us/library/ms724348%28VS.85%29.aspx, "Comments
(any line that starts with a semicolon) ..."
* There is no support for inline comments in option lines. I'm not certain
about section lines though; I didn't test that when I was working directly
with the PrivateProfile functions, but I expect that they're not allowed.
In short, the only acceptable comment form in INI files is a line starting
with a semicolon (/^;/, no leading whitespace).
Combining the quotes and comments things, it would simplify parsing
immensely if you removed support for the inline comments altogether - just
string.strip(); string[0] == string[-1] and string[0] in ('"', "'") and
string[1:-1] or string. But I quite understand that you won't want to do it
that way seeing as people use this for non-INI files with other comment
characters and inline comments and things like that. A better solution will
be attributes like "comment_chars = ';'", "inline_comments = False"
and "unquote_values = True" in your INIDialect you talked of in issue 23.
Examples of erroneous INI parsing:
[Section]
A="value"
B = ' value'
C=a ;b
+-----+----------+----------+
| Key | Expected | Actual |
+-----+----------+----------+
| A | value | "value" |
| B | value | " value" |
| C | a ;b | a |
+-----+----------+----------+
Thanks for this library, it's handy in the ways it's better than
ConfigParser :-)
Just another thing I'll add to this now; INIConfig's __init__ has one of
its parameters optionxformvalue=lower - which means that when fp is given,
it converts all the keys to lowercase. This can be overridden with
INIConfig(thingummy, optionxformvalue=None), but for an INI file it's just
plain bad - combining this with case sensitivity was just breaking my use
utterly where most values start with an uppercase letter and I'd written
all my validation like that too. For real INI files on Windows it's quite
normal to use mixed case, normally camel case. Whereas with .cfg files it's
more normal to use lowercase sections (I note that sections aren't
converted to lowercase..?) and keys, all snake case.
This time though I don't think it's really an INIDialect matter. Converting
values to lowercase when read, but not when working with it (e.g. setting
ini.Section.Key sets Key, not key) is counter-productive. And when it comes
to writing, it would break things in my application and in any others which
happen to have any upper case key names - they'd lose their upper case.
optionxformvalue=None would be a far better default.
With all of this (including trying making a system for dialects) I'm
willing to help - if you want to email me to discuss it, you should be able
to work out my email address from my account name.
Patches are most welcome. I have a three month old at home, and so the
time I can spend on iniparse these days is very limited. Use the
iniparse...@googlegroups.com mailing list to hash out the design or to
ask questions.
Regarding the case insensitivity for option names: I wasn't able to
reproduce the problem... It may be that I haven't understood exactly what
you meant. Can you post a short code snippet demonstrating the issue?
I just went over the case insensitivity thing again and couldn't find it
for a while, using it directly, but I finally worked out what it was.
Accessing [Section]:Key in the INIConfig worked, but due to the
optionxformvalue=lower, when accessing the INISection as an iterable - e.g.
set(ini.Section) - returned set(['key']) rather than set(['Key']), which
was breaking it for me as I was then doing set differences for validation
purposes.
Test case:
>>> import iniparse, StringIO
>>> list(iniparse.INIConfig(StringIO.StringIO('[Section]\nKey=value')).Section)
['Key']
# Error: actual value will be ['key'] rather than ['Key']
Sorry I got you to go to the trouble of trying to reproduce it before... I
think /I/ didn't understand exactly what I meant, because I was using it
that way. I just knew that my validator was reporting "required value Foo
is missing" and "extra invalid value foo" and whatnot.
(Since writing the patch below, one thing has occurred to me; I think this
will remove case sensitivity in accessing keys; currently the section is
case sensitive but the key is insensitive. A better fix may be needed if
you want to retain case sensitivity but still get the iterator reporting
the correct thing.)
Patch to fix it:
--- a/iniparse/ini.py
+++ b/iniparse/ini.py
@@ -439,10 +439,6 @@
yield line
-def lower(x):
- return x.lower()
-
-
class INIConfig(config.ConfigNamespace):
_data = None
_sections = None
@@ -454,7 +450,7 @@
_parse_exc = None
_bom = False
def __init__(self, fp=None, defaults=None, parse_exc=True,
- optionxformvalue=lower, optionxformsource=None,
+ optionxformvalue=None, optionxformsource=None,
sectionxformvalue=None, sectionxformsource=None):
self._data = LineContainer()
self._parse_exc = parse_exc
Comment #4 on issue 24 by psoberoi: Inaccurate parsing compared to proper
Windows functions
http://code.google.com/p/iniparse/issues/detail?id=24
Ah, I see. I think iteration should preserve case, so this is a bug.
The fix is more complicated than your patch though, because case
insensitive lookup needs to work for ConfigParser compatibility.