parse a string of parameters and values

bsneddon

Dec 12, 2009, 7:16:32 PM

I have a problem that I can solve by brute force, but it occurred to me
that there may be a
"one-- and preferably only one --obvious way to do it".

I am going to read a text file that is an export from a control
system.
It has lines with information like

base=1 name="first one" color=blue

I would like to put this info into a dictionary for processing.
I have looked at optparse and getopt; maybe they are the answer, but
there could be a very straightforward way to do this task.

Thanks for your help

Steven D'Aprano

Dec 12, 2009, 10:27:49 PM

On Sat, 12 Dec 2009 16:16:32 -0800, bsneddon wrote:

> I have a problem that I can come up with a brute force solution to solve
> but it occurred to me that there may be a
> "one-- and preferably only one --obvious way to do it".

I'm not sure that "brute force" is the right description here. Generally,
"brute force" is used for situations where you check every single
possible value rather than calculate the answer directly. One classical
example is guessing the password that goes with an account. The brute
force attack is to guess every imaginable password -- eventually you'll
find the matching one. A non-brute force attack is to say "I know the
password is a recent date", which reduces the space of possible passwords
from many trillions to mere millions.

So I'm not sure that brute force is an appropriate description for this
problem. One way or another you have to read every line in the file.
Whether you read them or you farm the job out to some pre-existing
library function, they still have to be read.

> I am going to read a text file that is an export from a control system.
> It has lines with information like
>
> base=1 name="first one" color=blue
>
> I would like to put this info into a dictionary for processing.

Have you looked at the ConfigParser module?
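
For reference, here is a minimal sketch of what the ConfigParser route might look like in Python 3, where the module is named configparser. It assumes one key=value pair per line. The section name "export" and the sample text are made up for illustration; the parser requires a section header, which an export file like this presumably lacks.

```python
import configparser

# Hypothetical sample: one key=value pair per line, no section header.
text = 'base=1\nname="first one"\ncolor=blue\n'

# interpolation=None so that a literal '%' in a value is not treated specially.
parser = configparser.ConfigParser(interpolation=None)

# configparser insists on a section header, so prepend a dummy one.
parser.read_string("[export]\n" + text)

d = dict(parser["export"])
print(d)
```

Note that the quotes around "first one" stay in the value, and that this does nothing for several pairs on a single line.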

Assuming that ConfigParser isn't suitable, you can do this if each
key=value pair is on its own line:

d = {}
for line in open(filename, 'r'):
    if not line.strip():
        # skip blank lines
        continue
    key, value = line.split('=', 1)
    d[key.strip()] = value.strip()

If you have multiple keys per line, you need a more sophisticated way of
splitting them. Something like this should work:

d = {}
for line in open(filename, 'r'):
    if not line.strip():
        continue
    terms = line.split('=')
    keys = terms[0::2]  # every second item starting from the first
    values = terms[1::2]  # every second item starting from the second
    for key, value in zip(keys, values):
        d[key.strip()] = value.strip()

--
Steven

John Machin

Dec 13, 2009, 12:52:04 AM

to pytho...@python.org

Steven D'Aprano <steve <at> REMOVE-THIS-cybersource.com.au> writes:

>
> On Sat, 12 Dec 2009 16:16:32 -0800, bsneddon wrote:
>

>
> > I am going to read a text file that is an export from a control system.
> > It has lines with information like
> >
> > base=1 name="first one" color=blue
> >
> > I would like to put this info into a dictionary for processing.
>
> Have you looked at the ConfigParser module?
>
> Assuming that ConfigParser isn't suitable, you can do this if each
> key=value pair is on its own line:

> [snip]

> If you have multiple keys per line, you need a more sophisticated way of
> splitting them. Something like this should work:
>
> d = {}
> for line in open(filename, 'r'):
>     if not line.strip():
>         continue
>     terms = line.split('=')
>     keys = terms[0::2]  # every second item starting from the first
>     values = terms[1::2]  # every second item starting from the second
>     for key, value in zip(keys, values):
>         d[key.strip()] = value.strip()
>

There appears to be a problem with the above snippet, or you have a strange
interpretation of "put this info into a dictionary":

| >>> line = 'a=1 b=2 c=3 d=4'
| >>> d = {}
| >>> terms = line.split('=')
| >>> print terms
| ['a', '1 b', '2 c', '3 d', '4']
| >>> keys = terms[0::2] # every second item starting from the first
| >>> values = terms[1::2] # every second item starting from the second
| >>> for key, value in zip(keys, values):
| ...     d[key.strip()] = value.strip()
| ...
| >>> print d
| {'a': '1 b', '2 c': '3 d'}
| >>>

Perhaps you meant

terms = re.split(r'[= ]', line)

which is an improvement, but this fails on cosmetic spaces e.g. a = 1 b = 2 ...

Try terms = filter(None, re.split(r'[= ]', line))
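
A small sketch of that suggestion, written for Python 3, where filter() returns an iterator rather than a list, so a list comprehension is used instead. The helper names split_terms and to_dict are invented for this illustration.

```python
import re

def split_terms(line):
    # Split on '=' or a space, then drop the empty strings that
    # consecutive separators (as in "a = 1") leave behind.
    return [t for t in re.split(r'[= ]', line) if t]

def to_dict(line):
    terms = split_terms(line)
    # Even positions are keys, odd positions are values.
    return dict(zip(terms[0::2], terms[1::2]))

print(to_dict('a=1 b=2 c=3 d=4'))
print(to_dict('a = 1 b = 2'))

# A quoted value containing a space is still cut in two.
print(split_terms('name="first one"'))
```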

Now we get to the really hard part: handling the name="first one" in the OP's
example. The splitting approach has run out of steam.

The OP will need to divulge what is the protocol for escaping the " character if
it is present in the input. If nobody knows of a packaged solution to his
particular scheme, then he'll need to use something like pyparsing.
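
If it turned out that quoted values never contain a " character at all, a regular expression might be enough. The following is only a sketch under that assumption; the pattern PAIR and the function parse_pairs are made up for this illustration and are not taken from any library.

```python
import re

# A key, '=', then either a double-quoted string (assumed to contain
# no embedded quotes) or a run of non-space characters.
PAIR = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|(\S+))')

def parse_pairs(line):
    result = {}
    for key, quoted, bare in PAIR.findall(line):
        # findall() yields '' for the alternative that did not match.
        result[key] = quoted if quoted else bare
    return result

print(parse_pairs('base=1 name="first one" color=blue'))
```

Any real escaping scheme for the " character would need a different pattern or a proper parser.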

Steven D'Aprano

Dec 13, 2009, 1:45:45 AM

On Sun, 13 Dec 2009 05:52:04 +0000, John Machin wrote:

> Steven D'Aprano <steve <at> REMOVE-THIS-cybersource.com.au> writes:
[snip]
>> If you have multiple keys per line, you need a more sophisticated way
>> of splitting them. Something like this should work:

[...]

> There appears to be a problem with the above snippet, or you have a
> strange interpretation of "put this info into a dictionary":

D'oh!

In my defence, I said it "should" work, not that it did work!

--
Steven

Peter Otten

Dec 13, 2009, 5:28:24 AM

bsneddon wrote:

Have a look at shlex:

>>> import shlex
>>> s = 'base=1 name="first one" color=blue equal="alpha=beta" empty'
>>> dict(t.partition("=")[::2] for t in shlex.split(s))
{'color': 'blue', 'base': '1', 'name': 'first one', 'empty': '', 'equal': 'alpha=beta'}
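
Applied to a whole file, the same idea might look like the sketch below (Python 3). The function names parse_line and parse_export and the sample data are invented for illustration; io.StringIO stands in for the real export file.

```python
import io
import shlex

def parse_line(line):
    # shlex.split() honours the quotes; partition() splits each token
    # on its first '=' only, and [::2] keeps (key, value).
    return dict(token.partition("=")[::2] for token in shlex.split(line))

def parse_export(lines):
    # One dictionary per non-blank line.
    return [parse_line(line) for line in lines if line.strip()]

# Stand-in for open(filename): two records separated by a blank line.
sample = io.StringIO(
    'base=1 name="first one" color=blue\n'
    '\n'
    'base=2 name="second one" color=red\n'
)
records = parse_export(sample)
print(records)
```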

Peter

bsneddon

Dec 13, 2009, 10:23:19 AM

Thanks to all for your input.

It seems I misstated the problem. Text is always quoted, so blue
above -> "blue".

Peter,

The part I was missing was t.partition("=") and slicing with a step of
two.
It looks like a normal split will work for me to get the arguments I
need.
To my way of thinking yours is very clean and maybe the "--obvious way
to do it",
although it was not obvious to me until seeing your post.
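
Whether a plain split is enough probably depends on whether the quoted text can contain spaces, which the post does not say. A quick comparison on a made-up line where every text value is quoted:

```python
import shlex

# Hypothetical line in which all text values are quoted.
line = 'base=1 name="first one" color="blue"'

# str.split() knows nothing about quotes: it cuts inside "first one"
# and leaves the quote characters in place.
plain = line.split()

# shlex.split() keeps quoted text together and removes the quotes.
quoted_aware = shlex.split(line)

print(plain)
print(quoted_aware)
```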

Bill

Gabriel Genellina

Dec 14, 2009, 7:23:38 PM

to pytho...@python.org

On Sun, 13 Dec 2009 07:28:24 -0300, Peter Otten <__pet...@web.de>
wrote:
> bsneddon wrote:

>> I am going to read a text file that is an export from a control
>> system.
>> It has lines with information like
>>
>> base=1 name="first one" color=blue
>>
>> I would like to put this info into a dictionary for processing.
>

> >>> import shlex
> >>> s = 'base=1 name="first one" color=blue equal="alpha=beta" empty'
> >>> dict(t.partition("=")[::2] for t in shlex.split(s))
> {'color': 'blue', 'base': '1', 'name': 'first one', 'empty': '', 'equal':
> 'alpha=beta'}

Brilliant!

--
Gabriel Genellina

Tim Chase

Dec 14, 2009, 7:51:06 PM

to Gabriel Genellina, pytho...@python.org

Gabriel Genellina wrote:
> Peter Otten wrote:

>> bsneddon wrote:
>>> I am going to read a text file that is an export from a control
>>> system.
>>> It has lines with information like
>>>
>>> base=1 name="first one" color=blue
>>>
>>> I would like to put this info into a dictionary for processing.

>> >>> import shlex
>> >>> s = 'base=1 name="first one" color=blue equal="alpha=beta" empty'
>> >>> dict(t.partition("=")[::2] for t in shlex.split(s))
>> {'color': 'blue', 'base': '1', 'name': 'first one', 'empty': '', 'equal':
>> 'alpha=beta'}
>

> Brilliant!

The thing I appreciated about Peter's solution was learning a
purpose for .partition() as I've always just used .split(), so I
would have done something like

>>> dict('=' in s and s.split('=', 1) or (s, '') for s in shlex.split(s))
{'color': 'blue', 'base': '1', 'name': 'first one', 'empty': '',
'equal': 'alpha=beta'}

Using .partition() makes that a lot cleaner. However, it looks
like .partition() was added in 2.5, so for my code stuck in 2.4
deployments, I'll stick with the uglier .split()
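
For code that cannot rely on .partition(), the .split() fallback could be wrapped in a small helper along these lines. The name split_pair is made up for this sketch. It uses a plain if/else statement, since conditional expressions also arrived in 2.5.

```python
def split_pair(token):
    """Return (key, value), splitting on the first '=' only."""
    if '=' in token:
        key, value = token.split('=', 1)
    else:
        # No '=' at all: treat the token as a key with an empty value.
        key, value = token, ''
    return key, value

# Where .partition() is available, the two approaches agree.
tokens = ['base=1', 'equal=alpha=beta', 'empty', 'blank=']
for token in tokens:
    assert split_pair(token) == token.partition('=')[::2]

print(dict(split_pair(t) for t in tokens))
```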

-tkc