Strange behavior with non-ascii strings on newer version

Luca Lesinigo

Feb 29, 2012, 3:03:42 PM

to python...@googlegroups.com

I have some DBF files from an oldish Win32 business management program, and I'm using the excellent python-dbf package to read them and import the data into a newer system.

First of all, thank you guys for providing us with this module.

Second, I ran my first tests with Ubuntu's packaged version, python-dbf-0.88.16-1, and so far so good. Then I wanted to see what PyPI had to offer and downloaded version 0.90.002 from there. Here things started to go wrong: with everything else unchanged, the newer dbf chokes when reading records with "high ascii" characters, whereas the older one gives no error and decodes them correctly to unicode.

The older, correctly working version is installed through Ubuntu's package system and lives under /usr/share/pyshared/dbf. I tested the newer 0.90.002 by just extracting dbf.py from the zip file downloaded from PyPI and putting it in the same directory as my script.

Actual error from 0.90.002 is:

  File "/foo/bar/baz/dbf.py", line 1651, in retrieveCharacter
    return fielddef['class'](data)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 3: ordinal not in range(128)
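
(For readers following along: this kind of failure can be reproduced without dbf at all. The sketch below is written for Python 3, while the thread itself uses Python 2, and the byte string is a made-up example rather than data from the actual file. It only shows that a byte such as 0xe9 is rejected by the ascii codec but decodes cleanly under cp1252.)

```python
# Hypothetical field contents: the byte 0xE9 sits at index 3,
# matching the "position 3" reported in the error message.
raw = b"caf\xe9"

ascii_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    # The ascii codec only accepts bytes below 0x80.
    ascii_error = exc
    print("ascii failed:", exc)

# cp1252 (Windows ANSI) maps 0xE9 to U+00E9.
decoded = raw.decode("cp1252")
print(repr(decoded))
```

So an error like this usually means the bytes were decoded with the wrong codec, not that the data itself is damaged.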

Is it a regression in dbf.py or am I doing something wrong?

Thanks.

Ethan Furman

Feb 29, 2012, 3:26:33 PM

to python...@googlegroups.com

Luca Lesinigo wrote:
> I have some DBF files from an oldish Win32 business management software
> and I'm using the excellent python-dbf package to read them out and
> import data in a newer system.
> First of all, thank you guys for providing us with this module.

Glad to help.

> Second, I started out the first tests using Ubuntu's packaged version:
> that's python-dbf-0.88.16-1
> <http://packages.ubuntu.com/search?keywords=python-dbf&searchon=names&suite=all&section=all>,
> so far so good. Then I wanted to look at what pypi had to offer and
> downloaded version 0.90.002 <http://pypi.python.org/pypi/dbf> from
> there. Here things started to go wrong: all other things unchanged, the
> newer dbf chokes when reading out records with "high ascii" characters,
> whereas the older one does not give any error and decodes them correctly
> to unicode.
>
> The older, correctly working, version is installed with Ubuntu's package
> system and lies in some files under /usr/share/pyshared/dbf, on the
> other hand I tested the newer 0.90.002 by just extracting dbf.py from
> the zip file downloaded from pypi and putting it in the same directory
> of my script.
>
> Actual error from 0.90.002 is:
> File "/foo/bar/baz/dbf.py", line 1651, in retrieveCharacter
> return fielddef['class'](data)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 3:
> ordinal not in range(128)
>
> Is it a regression in dbf.py or am I doing something wrong?

With both versions, run:

    test_table = dbf.Table('/whatever/tablename')
    print test_table

The newer version should say

Table: ...
Type: ...
Codepage: ascii (plain ol' ascii)
...

What does the older version say?

~Ethan~

Luca Lesinigo

Mar 7, 2012, 4:24:36 PM

to python...@googlegroups.com

On Wednesday, February 29, 2012 at 21:26:33 UTC+1, Leaf wrote:

> Using both versions do a
> test_table = dbf.Table('/whatever/tablename')
> print test_table
> The newer version should say
> Table: ...
> Type: ...
> Codepage: ascii (plain ol' ascii)
> ...
> What does the older version say?

Both the older, working, version (Ubuntu's 0.88.16) and the newer, not working version (0.90.002 from PyPi) say:

Type: FoxPro w/memos

Codepage: cp1252 (Windows ANSI)

but then if I do:

for record in table:
    foo = record.scatter_fields()
    # and nothing else, really!

the newer one will screw up:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    foo = record.scatter_fields()
  File "/home/luca/dbf.py", line 1427, in scatter_fields
    values = [yo[field] for field in keys]
  File "/home/luca/dbf.py", line 1277, in __getitem__
    return yo.__getattr__(item)
  File "/home/luca/dbf.py", line 1261, in __getattr__
    value = yo._retrieveFieldValue(index, name)
  File "/home/luca/dbf.py", line 1191, in _retrieveFieldValue
    datum = retrieve(record_data, fielddef, yo._layout.memo)
  File "/home/luca/dbf.py", line 1651, in retrieveCharacter

Ethan Furman

Mar 13, 2012, 6:17:17 PM

to Luca Lesinigo, python...@googlegroups.com

Luca Lesinigo wrote:

> On Wednesday, February 29, 2012 at 21:26:33 UTC+1, Leaf wrote:
>
> Using both versions do a
>
> test_table = dbf.Table('/whatever/tablename')
> print test_table
>
> The newer version should say
>
> Table: ...
> Type: ...
> Codepage: ascii (plain ol' ascii)
> ...
>
> What does the older version say?
>
> Both the older, working, version (Ubuntu's 0.88.16) and the newer, not
> working version (0.90.002 from PyPi) say:
> Type: FoxPro w/memos
> Codepage: cp1252 (Windows ANSI)

Luca,

Can you send me the dbf file?

~Ethan~

Ethan Furman

Mar 14, 2012, 11:42:05 PM

to python...@googlegroups.com

Luca Lesinigo wrote:

> On Wednesday, February 29, 2012 at 21:26:33 UTC+1, Leaf wrote:
>
> Using both versions do a
>
> test_table = dbf.Table('/whatever/tablename')
> print test_table
>
> The newer version should say
>
> Table: ...
> Type: ...
> Codepage: ascii (plain ol' ascii)
> ...
>
> What does the older version say?
>
> Both the older, working, version (Ubuntu's 0.88.16) and the newer, not
> working version (0.90.002 from PyPi) say:
> Type: FoxPro w/memos
> Codepage: cp1252 (Windows ANSI)
> but then if I do:
>
> for record in table:
>     foo = record.scatter_fields()
>     # and nothing else, really!

Found the problem -- I'll have a new version released shortly.

~Ethan~

krobin

Apr 28, 2012, 3:44:21 PM

to python...@googlegroups.com

Is this fixed in 0.90.004? I ask because I got a similar issue using 0.90.004:

Traceback (most recent call last):
  File "dbtrans.py", line 33, in <module>
    sys.exit(main(sys.argv))
  File "dbtrans.py", line 29, in main
    reader.get_symbols()
  File "dbtrans.py", line 22, in get_symbols
    print table[i][j]
  File "d:\foo\bar\dbf.py", line 1271, in __getitem__
    return yo[yo._layout.fields[item]]
  File "d:\foo\bar\dbf.py", line 1278, in __getitem__
    return yo.__getattr__(item)
  File "d:\foo\bar\dbf.py", line 1262, in __getattr__
    value = yo._retrieveFieldValue(index, name)
  File "d:\foo\bar\dbf.py", line 1192, in _retrieveFieldValue
    datum = retrieve(record_data, fielddef, yo._layout.memo)
  File "d:\foo\bar\dbf.py", line 1824, in retrieveVfpMemo
    return typ(memo.get_memo(block, fielddef))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 342: ordinal not in range(128)

On Wednesday, March 14, 2012 11:42:05 PM UTC-4, Leaf wrote:
> Luca Lesinigo wrote:

Ethan Furman

Apr 28, 2012, 7:27:00 PM

to python...@googlegroups.com

krobin wrote:
> Is this fixed in 0.90.004? because I got a similar issue using 0.90.004:
>
> Traceback (most recent call last):
> File "dbtrans.py", line 33, in <module>
> sys.exit(main(sys.argv))
> File "dbtrans.py", line 29, in main
> reader.get_symbols()
> File "dbtrans.py", line 22, in get_symbols
> print table[i][j]
> File "d:\foo\bar\dbf.py", line 1271, in __getitem__
> return yo[yo._layout.fields[item]]
> File "d:\foo\bar\dbf.py", line 1278, in __getitem__
> return yo.__getattr__(item)
> File "d:\foo\bar\dbf.py", line 1262, in __getattr__
> value = yo._retrieveFieldValue(index, name)
> File "d:\foo\bar\dbf.py", line 1192, in _retrieveFieldValue
> datum = retrieve(record_data, fielddef, yo._layout.memo)
> File "d:\foo\bar\dbf.py", line 1824, in retrieveVfpMemo
> return typ(memo.get_memo(block, fielddef))
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 342: ordinal not in range(128)

I thought it was, but if you're getting an error then something's not
right. Can you send me a portion of the dbf file that is not working?

~Ethan~

Manuel Cameselle

Jun 12, 2012, 4:50:39 PM

to python...@googlegroups.com

Hello,

Thanks for this module.

I have a similar issue, but with even more errors. Maybe there's something wrong with my DBF files, or something is not implemented in the module. I found a dirty workaround that works for me, so I'm describing the problem in case it is a bug and this helps.

The attached file has codepage 'cp850 International MS-DOS' and has a Spanish character in the 12th record.

The code:

source = dbf.Table('zonas_cp850.dbf', read_only=True)
print source

source.open()
for record in source:
    print record['zona'] + ':', record['nombre']
source.close()

With 0.88.16 I got:

Type: dBase III Plus

Codepage: ascii (plain ol' ascii)

and

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa5 in position 7: ordinal not in range(128)

With 0.90.003 I got:

Type: dBase III Plus

Codepage: ascii (plain ol' ascii)

and

Exception AttributeError: "'NoneType' object has no attribute 'seek'" in <bound method _DbfRecord.__del__ of > ignored

and

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa5 in position 7: ordinal not in range(128)

With 0.93.011 I got:

dbf.DbfError: record data not correct -- first character should be a ' ' or a '*'.

Then I changed this code in tables.py or dbf.py:

def codepage(yo, cp=None):

to

def codepage(yo, cp='\x02'):

and then

With 0.88.16 I got:

Codepage: cp850 (International MS-DOS)

and everything looks good.

With 0.90.003 I got:

Codepage: cp850 (International MS-DOS)

and

Exception AttributeError: "'NoneType' object has no attribute 'seek'" in <bound method _DbfRecord.__del__ of > ignored

but the UnicodeDecodeError disappears and everything looks good.

With 0.93.011 I got the same error as before.

Regards,

Manuel Cameselle.

zonas_cp850.dbf

Ethan Furman

Jun 12, 2012, 6:06:48 PM

to python...@googlegroups.com

Manuel Cameselle wrote:
> Hello,
>
> Thanks for this module.
>
> I have similar issue, but with even more errors. Maybe there's something
> wrong with my DBF files or is something not implemented in the module. I
> found a dirty workaround that it works to me, so i described the problem
> just in case is a bug and can help.
>
> The attached file has codepage 'cp850 International MS-DOS' and has a
> spanish character on 12th record.

Manuel, thanks for the detailed report. There are two problems with your
dbf file (both of which the next version will handle better): 1) the
codepage specified in the file is incorrect (ascii instead of cp850);
and 2) null bytes are being used instead of spaces for active records.
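
(As a rough illustration of the first problem: in the common DBF layouts the code page mark is a single byte at offset 29 of the header, so a file written with cp850 text but with that byte left at zero gets reported as plain ascii. The sketch below is Python 3, standard library only, and reads that byte from a synthetic header. `describe_header`, `CODEPAGE_MARKS`, and all the header values are made up for illustration and are not part of python-dbf.)

```python
import struct

# Code page marks stored at header offset 29. This is a small subset,
# and the 0x00 -> "ascii" entry is an assumption that mirrors what the
# thread shows python-dbf printing for such files.
CODEPAGE_MARKS = {
    0x00: "ascii",
    0x01: "cp437",
    0x02: "cp850",
    0x03: "cp1252",
}


def describe_header(header):
    """Return a few fields from the first 32 bytes of a DBF header."""
    if len(header) < 32:
        raise ValueError("a DBF header is at least 32 bytes long")
    # Offset 4: record count (uint32), header length and record length
    # (uint16 each), all little-endian.
    record_count, header_len, record_len = struct.unpack_from("<IHH", header, 4)
    mark = header[29]
    return {
        "record_count": record_count,
        "header_len": header_len,
        "record_len": record_len,
        "mark": mark,
        "codec": CODEPAGE_MARKS.get(mark, "unknown"),
    }


# Synthetic header for illustration only (not the attached file).
fake = bytearray(32)
fake[0] = 0x03                                  # version byte
struct.pack_into("<IHH", fake, 4, 12, 97, 40)   # 12 records, made-up sizes
fake[29] = 0x00                                 # no code page recorded

info = describe_header(bytes(fake))
print(info["codec"], info["record_count"])
```

Running something like this against the first 32 bytes of a real file is a quick way to see which codec the file claims before blaming the library.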

This is how to get around the codepage issue (will work in the next
release):

source = dbf.Table('zonas_cp850.dbf', read_only=True, codepage='cp850')

This will temporarily use cp850 for the encoding/decoding functions. To
set it permanently, create the table in normal read-write mode, open it,
then set the codepage (this way it gets saved to disk):

source = dbf.Table('zonas_cp850')
source.open()
source.codepage = 'cp850'
source.close()

The reason for the continuing errors in the latest release even after
your workaround is that I just added checking to make sure the first
byte of the record was either a space or a '*' (the only two allowable
characters), but your file is using null bytes instead of spaces. The
next version will work with that (although it will change the nulls to
spaces when records are updated).
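
(The first-byte check described above can be sketched as follows. This is not the code in python-dbf, only a guess at what a lenient version might look like: `normalize_flag` is a hypothetical helper that treats a leading null byte as an active record and rewrites it as a space. Python 3, standard library only.)

```python
ACTIVE = b" "    # first byte of a live record
DELETED = b"*"   # first byte of a record marked as deleted


def normalize_flag(record):
    """Return the record with a valid first byte, or raise ValueError."""
    flag = record[:1]
    if flag in (ACTIVE, DELETED):
        return record
    if flag == b"\x00":
        # Lenient path: accept a null byte as "active" and store a space.
        return ACTIVE + record[1:]
    raise ValueError("first byte of a record should be ' ' or '*'")


print(normalize_flag(b"\x00ZONA01"))
print(normalize_flag(b"*ZONA02"))
```

A strict reader would drop the null-byte branch and raise in all other cases, which matches the error reported for 0.93.011.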

Your workaround for the codepage issue is fine if all your files are
actually cp850, but not everybody's is -- so keep using it until the
next version comes out. :)

Thanks again for the very good bug report!

~Ethan~

Manuel Cameselle

Jun 13, 2012, 4:26:01 AM

to python...@googlegroups.com

Hello Ethan,

> source = dbf.Table('zonas_cp850.dbf', read_only=True, codepage='cp850')
>
> This will temporarily use cp850 for the encoding/decoding functions. To

Great. I'll do that when the next version arrives, because I cannot modify those files (they are still in production with an old app and I can only access them in read-only mode).

> Thanks again for the very good bug report!

Glad to help.

Thank you very much for your fast and detailed answer!

Regards,
Manuel Cameselle.
