On 03/01/2013 10:20 AM, Lucas Taylor wrote:
> On Feb 28, 2013, at 7:09 PM, Ethan Furman wrote:
>>
>> Were you able to test this, and did it work?
>
> I did, but there are a few issues:
>
> 'latin1' isn't an available codepage. I thought cp1252 would be the best match, but it happens to have a few
> undefined/unused code points.
>
> However, it appears that regardless of the codepage specified, the ascii codec is used for decoding:
>
> table = Table('test_cp1252', 'memo M', dbf_type='fp', codepage='cp1252')
> with table:
>     table.append({'memo': 'Test, NOPE' + chr(143)})
> DbfError: unable to write updates to disk, original data restored: UnicodeDecodeError('ascii', 'Test, NOPE\x8f', 10, 11,
> 'ordinal not in range(128)')
>
> table = Table('test_mac_roman', 'memo M', dbf_type='fp', codepage='mac_roman')
> with table:
>     table.append({'memo': 'Test, NOPE' + chr(143)})
> DbfError: unable to write updates to disk, original data restored: UnicodeDecodeError('ascii', 'Test, NOPE\x8f', 10, 11,
> 'ordinal not in range(128)')
>
>
> Now, in my case I don't want *any* decoding to occur...I just want to treat the Memo as binary. I was looking for a way
> to pass a flag to Table(...) or somehow specify that the Memo should be treated as binary.
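[For reference, the traceback above can be reproduced outside dbf entirely: the stdlib ascii codec rejects any byte above 0x7f. A minimal Python 3 sketch, with the memo value spelled as bytes:]

```python
# The ascii codec only covers bytes 0x00-0x7f; byte 0x8f (chr(143))
# raises the same UnicodeDecodeError shown in the dbf traceback above.
raw = b'Test, NOPE\x8f'
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    # 'Test, NOPE' is 10 characters, so the bad byte sits at index 10
    print(exc.encoding, exc.start, exc.end, exc.reason)
    # -> ascii 10 11 ordinal not in range(128)
```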
You'll need to change the default encoding:

    dbf.default_codepage = 'latin1'

Then, for a test, swap the ascii codec out for latin1 as well:

    dbf.code_pages['\x00'] = ('latin1', 'no translation')
This should make it so that any non-unicode data round-trips back to what it started as.
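[The round-trip claim holds because latin1 (ISO-8859-1) maps each of the 256 byte values to the Unicode code point of the same number, so decoding can never fail and encoding restores the original bytes. A quick check, independent of dbf:]

```python
# latin1 is a bijection between bytes 0x00-0xff and code points
# U+0000-U+00FF, so decode/encode is lossless for arbitrary binary data.
# (cp1252, by contrast, raises on its undefined bytes such as 0x8f.)
data = bytes(range(256))          # every possible byte value
text = data.decode('latin1')      # always succeeds
assert text.encode('latin1') == data
print('round-trip ok')
```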
--
~Ethan~