Patches: INI values in quotes, no sections, BOM elimination

Dwayne Bailey
Mar 12, 2008, 7:15:00 PM
to iniparse-discuss
Hi All,

I've hacked together some patches, which I uploaded to the Files
section in Google Groups. Hope you can retrieve them. I use iniparse
in the Translate Toolkit for localisation work, so my needs are
slightly different: I often need to read INI files that are not
quite standard. But some of these patches might be useful to other
general users and worth committing to the official iniparse. I have not
tested them against the tests within iniparse.

ini2po-with-quotes.diff
I implemented this to be able to read INI files where the value is
quoted, without the quotes becoming part of the value.

Thus:
name = "Something"

will result in the value of name being 'Something', not
'"Something"'.
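
The patch itself is not reproduced in this thread. As a rough illustration of the behaviour described, here is a minimal Python 3 sketch. The function name `unquote` is hypothetical, not part of iniparse, and the choice to accept both single and double quotes (only when the same character opens and closes the value) is an assumption.

```python
def unquote(value):
    """Strip one pair of matching surrounding quotes from an INI value.

    Assumption: both " and ' count, but only when the same quote
    character opens and closes the value; anything else is returned as-is.
    """
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
        return value[1:-1]
    return value


# Usage: the quoted form and the bare form yield the same value.
quoted_result = unquote('"Something"')  # 'Something'
bare_result = unquote('Something')      # 'Something'
```

Unbalanced input such as `'"unbalanced'` is left untouched, so values that merely begin with a quote are not silently altered.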

ini-nosections2.diff
This I implemented to be able to read INI files with parts that are
not in any [sections]. I have a few files that either have no
sections at all or have a part at the top which is not part of a
section but then they do have other sections later on.

ini-bom.diff
This is a horrible hack. I had files that started with the Unicode
BOM (byte order mark). The parser fails when it finds one. BOMs can
quite realistically be present if someone edits an INI file on
Windows and their editor adds them. But I couldn't work
out an elegant way to detect all cases.

Paramjit Oberoi
Mar 15, 2008, 1:32:26 PM
to iniparse...@googlegroups.com
I'm out of town - will be back in a week.

Tim, can you take a look at these if you get the chance? If not, I
should be able to spend some time on this around March 22nd.

-param

Paramjit Oberoi
Mar 22, 2008, 1:05:51 PM
to iniparse...@googlegroups.com
I just looked at the patches briefly. But first: thanks for posting
them, and also, thanks for describing how you are using iniparse.

So, regarding the patches: I think the features would all be good
additions. The tricky thing will be to figure out a way of
integrating the functionality without breaking backward compatibility.

BOM removal: I think this would be best implemented within
readline_iterator(). The file should be opened with codecs.open()
instead of the builtin open(), and if calling readline() on the file
returns a unicode string, the BOM would be stripped from it (for the
first line only). What do you think?
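
A minimal Python 3 sketch of the proposal, where `str` plays the role of Python 2's `unicode`. The generator name `readline_iterator_sketch` is hypothetical; it illustrates the idea and is not iniparse's actual `readline_iterator()`.

```python
import io

BOM = "\ufeff"  # U+FEFF, the byte order mark as a decoded character


def readline_iterator_sketch(f):
    """Yield lines from f, dropping a leading BOM from the first line only.

    Only text (str) lines are touched; byte strings pass through
    unchanged, mirroring the "only if readline() returns unicode" rule.
    """
    first = True
    while True:
        line = f.readline()
        if not line:
            break
        if first and isinstance(line, str) and line.startswith(BOM):
            line = line[len(BOM):]
        first = False
        yield line


# Usage: the BOM on the first line is removed; the rest is untouched.
lines = list(readline_iterator_sketch(io.StringIO("\ufeff[section]\nname = value\n")))
# lines == ['[section]\n', 'name = value\n']
```

Restricting the check to text lines keeps byte-oriented callers on the old code path, which is one way to avoid affecting compatibility.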

For convenience, maybe there should also be a function that takes a
filename and an encoding, instead of an open file object like
readfp().
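
A sketch of such a convenience function in Python 3. The name `read_ini_lines` and its return value (a plain list of decoded lines) are hypothetical; a real version would hand the opened file to the parser instead of returning lines.

```python
import io
import os
import tempfile


def read_ini_lines(filename, encoding="utf-8"):
    """Open an INI file by name with an explicit encoding; return decoded lines.

    Hypothetical helper: readfp() takes an already-open file object, whereas
    this takes a filename plus an encoding that defaults to UTF-8.
    """
    with io.open(filename, "r", encoding=encoding) as f:
        return f.readlines()


# Usage: write a small UTF-16 file, then read it back by name.
fd, path = tempfile.mkstemp(suffix=".ini")
os.close(fd)
with io.open(path, "w", encoding="utf-16") as f:
    f.write("[s]\nname = café\n")
lines = read_ini_lines(path, encoding="utf-16")
os.remove(path)
```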

INI files with quotes: This needs to be added in such a way that the
old behavior - i.e. the quotes being part of the value - continues to
work by default.

This may fit in with something else I had in mind: an optional
function that transforms the value when it is read or written. This
could be used to implement interpolation (right now interpolation of
%(VAR)s only works if compat.ConfigParser is used). The application
could then provide a value-transforming function that removes and adds
the quotes as needed. Or, given that quote handling is probably
commonly needed, the library could provide such a
function, and all the application would need to do is to hook up the
function to the INIConfig object.
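
One possible shape for such a hook, as a hedged Python 3 sketch. `TransformingSection`, `strip_quotes` and `add_quotes` are hypothetical names standing in for an INIConfig section and a library-supplied transform pair; they are not existing iniparse APIs. Both hooks default to identity, so the old behaviour (quotes kept in the value) is what you get unless a transform is attached.

```python
class TransformingSection:
    """Dict-backed stand-in for a config section with optional value hooks.

    on_read is applied when a value is fetched, on_write before a value is
    stored. Both default to identity, preserving the original behaviour.
    """

    def __init__(self, raw=None, on_read=None, on_write=None):
        self._raw = dict(raw or {})
        self._on_read = on_read or (lambda v: v)
        self._on_write = on_write or (lambda v: v)

    def __getitem__(self, key):
        return self._on_read(self._raw[key])

    def __setitem__(self, key, value):
        self._raw[key] = self._on_write(value)

    def raw(self, key):
        """Return the text exactly as it would be written back to the file."""
        return self._raw[key]


def strip_quotes(value):
    """Read hook: remove one pair of surrounding double quotes."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    return value


def add_quotes(value):
    """Write hook: wrap the value in double quotes."""
    return '"%s"' % value


# Usage: same raw file contents, with and without the quote hooks attached.
plain = TransformingSection({"name": '"Something"'})
quoted = TransformingSection({"name": '"Something"'},
                             on_read=strip_quotes, on_write=add_quotes)
quoted["other"] = "x"
```

Pairing a read hook with its inverse write hook is what keeps the file round-trippable: `quoted["other"]` reads back as `x`, while the stored text is `"x"`.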

INI files with parts outside sections: The comment about backward
compatibility applies here as well. I think the easiest way to
achieve this would be to inject a "[DEFAULTSECT]" line at the start of
the file - the rest of the parsing would then just work. This could
be done by a wrapper class.
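
The injection idea can be sketched as a small wrapper around the line source. The generator below (`with_default_section`, a hypothetical name) is written in Python 3, and Python's standard `configparser` is used only to show that the wrapped input parses. It is a stand-in for iniparse, not iniparse itself. Two caveats, both assumptions: a file that already starts with a section gets an extra empty section, and a round-tripping writer would need to drop the synthetic header again.

```python
import configparser
import io


def with_default_section(lines, name="DEFAULTSECT"):
    """Yield a synthetic section header, then the original lines unchanged."""
    yield "[%s]\n" % name
    for line in lines:
        yield line


# Usage: keys before the first real section land in the injected one.
text = "top = 1\n[later]\nother = 2\n"
cp = configparser.ConfigParser()
cp.read_file(with_default_section(io.StringIO(text)))
top_value = cp.get("DEFAULTSECT", "top")    # '1'
other_value = cp.get("later", "other")      # '2'
```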

I'll see if I can implement some of this in the next few days to see
what it looks like. If you have time to rework the patches, let me
know and we could split the tasks between us.

-param

Dwayne Bailey
Mar 24, 2008, 12:51:07 PM
to iniparse-discuss
Hi,

Thanks for looking at these patches. I'm afraid I won't have time to
rework them at the moment; I've got some new things starting in April.

BOM: I liked the proposal for the BOM. My only concern is that
codecs.open seems to want you to specify the encoding, so you'd need to
iterate through some BOM options I guess.

Quotes: a string unquoting fn would be a good example of such
interpolation functionality.

Global section: Yip, injecting a global section could also work. I
could try falling back to that wrapper functionality if the normal
parsing does not work.

Paramjit Oberoi
Mar 25, 2008, 11:14:42 AM
to iniparse...@googlegroups.com
> BOM: I liked the proposal for the BOM. My only concern is that
> codecs.open seems to want you to specify the encoding, so you'd need to
> iterate through some BOM options I guess.

The user would have to specify the encoding, and then reading from the
file would return unicode objects. So we'd just have to check for
U+FEFF, right?

-param

Dwayne Bailey
Mar 25, 2008, 12:08:36 PM
to iniparse-discuss
The user would need to know if it's UTF-16 or UTF-8 (although it is
possible to guess). The actual BOM will depend on the byte order.
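
Both points in this exchange can be checked with Python 3's standard `codecs` module. The BOM bytes differ by encoding and byte order, but once the application names the encoding, the decoded text starts with the same character U+FEFF in every case. The helper `starts_with_bom` is a hypothetical name used only for this illustration.

```python
import codecs

BOM_CHAR = "\ufeff"  # the byte order mark as a decoded character

# On disk the BOM looks different for each encoding / byte order.
bom_bytes = {
    "utf-8": codecs.BOM_UTF8,          # b'\xef\xbb\xbf'
    "utf-16-le": codecs.BOM_UTF16_LE,  # b'\xff\xfe'
    "utf-16-be": codecs.BOM_UTF16_BE,  # b'\xfe\xff'
}


def starts_with_bom(raw, encoding):
    """Decode raw bytes with a caller-specified encoding and test for U+FEFF."""
    return raw.decode(encoding).startswith(BOM_CHAR)


# With the encoding known, one check on the decoded text covers all three.
samples = {enc: bom + "[s]\n".encode(enc) for enc, bom in bom_bytes.items()}
results = {enc: starts_with_bom(raw, enc) for enc, raw in samples.items()}
```

The explicit-endian codec names (`utf-16-le`, `utf-16-be`) are used deliberately here because they leave U+FEFF in the decoded text instead of consuming it.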

Paramjit Oberoi
Mar 25, 2008, 12:45:32 PM
to iniparse...@googlegroups.com
> The user would need to know if it's UTF-16 or UTF-8 (although it is
> possible to guess). The actual BOM will depend on the byte order.

Ah, so the real question is: should the library attempt to guess the
correct encoding, or should the application be responsible for
explicitly specifying the encoding?

I would be hesitant to add code for guessing the encoding to iniparse,
unless it turns out that INI files use a variety of encodings in the
wild, and applications often have no idea what kind of encoding they
are dealing with.

-param

Tim Lauridsen
Mar 25, 2008, 12:54:35 PM
to iniparse...@googlegroups.com
+1 for the application specifying the encoding (the default should be
UTF-8)

Tim

Dwayne Bailey
Mar 28, 2008, 9:42:39 AM
to iniparse-discuss


On Mar 25, 6:54 pm, Tim Lauridsen <tim.laurid...@googlemail.com>
wrote:
+1 Yip, don't guess; let the app decide/user specify. Also agree on
UTF-8. Not sure this completely sorts out the BOM issue, though, as it's
possible to specify all of these with and without a BOM.
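
The concern is easy to see with Python 3's standard codecs: naming the encoding does not by itself say whether a BOM is present, and different codec variants treat a leading BOM differently. This is an illustration of standard-library behaviour only, not of anything iniparse does.

```python
import codecs

text = "[s]\nname = value\n"
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
without_bom = text.encode("utf-8")

# Plain "utf-8" keeps a BOM as a leading U+FEFF character in the decoded text.
plain_with = with_bom.decode("utf-8")
plain_without = without_bom.decode("utf-8")

# "utf-8-sig" drops a leading BOM if there is one and also accepts files without one.
sig_with = with_bom.decode("utf-8-sig")
sig_without = without_bom.decode("utf-8-sig")

# The generic "utf-16" codec reads the BOM to choose the byte order and removes it.
utf16_with = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
generic = utf16_with.decode("utf-16")
```

Either the application picks a BOM-aware codec variant, or the parser strips a leading U+FEFF after decoding, which is the route proposed in the next message.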

Paramjit Oberoi
Mar 28, 2008, 11:11:53 AM
to iniparse...@googlegroups.com
> +1 Yip, don't guess; let the app decide/user specify. Also agree on
> UTF-8. Not sure this completely sorts out the BOM issue, though, as it's
> possible to specify all of these with and without a BOM.

My plan is this: modify the function that iterates over the file to
skip a leading BOM *if* the readline() function of the file object
returns Unicode strings. In addition, maybe I'll add a function that
takes a file name and encoding instead of a file object, and its
encoding will default to utf-8.

-param

Paramjit Oberoi
Mar 31, 2008, 11:05:19 AM
to iniparse...@googlegroups.com
> My plan is this: modify the function that iterates over the file to
> skip a leading BOM *if* the readline() function of the file object
> returns Unicode strings.

I implemented this yesterday (and checked it into SVN), and then I
realized that this was not enough. One of the main goals of iniparse
is the ability to round-trip - and although this approach works just
fine for parsing, it makes round-tripping difficult/impossible. To
recreate the original file on disk, we must know what its encoding
was, whether or not it started with a BOM, etc.
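
A minimal Python 3 sketch of the extra state round-tripping needs: decode while remembering whether a BOM was present, then restore it on write. The helper names are hypothetical and this is not the code checked into SVN. It assumes an encoding name that neither adds nor removes a BOM on its own (such as `utf-8` or `utf-16-le`, rather than `utf-8-sig` or `utf-16`), and it ignores other details such as line endings.

```python
import codecs

BOM_CHAR = "\ufeff"


def decode_remembering_bom(raw, encoding="utf-8"):
    """Decode raw bytes; return (text without BOM, whether a BOM was present)."""
    text = raw.decode(encoding)
    had_bom = text.startswith(BOM_CHAR)
    if had_bom:
        text = text[len(BOM_CHAR):]
    return text, had_bom


def encode_restoring_bom(text, had_bom, encoding="utf-8"):
    """Re-encode text, putting the BOM back only if the original had one."""
    if had_bom:
        text = BOM_CHAR + text
    return text.encode(encoding)


# Usage: a UTF-8 file with a BOM and a non-ASCII value survives the trip.
original = codecs.BOM_UTF8 + "name = café\n".encode("utf-8")
text, had_bom = decode_remembering_bom(original)
round_tripped = encode_restoring_bom(text, had_bom)
```

The parser only ever sees `text`, while `had_bom` and the encoding travel alongside it so the writer can reproduce the original bytes.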

Implementing full-featured round-tripping for arbitrarily encoded
files (including values that are not representable as ASCII) goes
beyond simply ignoring BOMs at the beginning of ASCII files... but I
think that would be a good feature to have. I'm planning to work on
that next.

-param
