Hey All,
Invalid characters cause the csv import to fail
(s3xml.py ln:1501 - text = cls.xml_encode(unicode(text.decode("utf-8"))))
Any suggestions on the best method for “cleaning” invalid characters from a csv file to be imported? Is it legitimate to convert them all to _, or remove them?
Cheers
Michael
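For reference, the failure and the two clean-up options asked about can be reproduced in a few lines. This is only a sketch in Python 3 (the line quoted above is Python 2); the sample bytes are made up, and both clean-up options lose information from the source file.

```python
# Hypothetical sample: "café" saved as Latin-1, so byte 0xE9 is not valid UTF-8.
raw = b"name\ncaf\xe9\n"

# Strict decoding fails, which is what breaks the import.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lossy option 1: substitute U+FFFD for each undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# Lossy option 2: silently drop undecodable bytes.
removed = raw.decode("utf-8", errors="ignore")
```

In both cases the original character cannot be recovered afterwards.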
Modifying input data due to encoding issues is generally a bad idea. I am very
reluctant to do that.
Generally, the input should be UTF-8 or plain ASCII to be on the safe side
with the Python csv module. However, I'm aware that Windows applications often
don't do proper UTF-8 encoding, so we probably have to build a UTF-8 encoder
into the reader.
Let me add that and then we try again.
Could you please send me the respective source so that I can test for the
issue?
Dominic
I made "csv2tree" guess the character encoding of the source and re-encode it
as UTF-8 before import.
You can simply add the encodings you want to support. Keep this list short,
though, with only the most likely encodings - we do not really want or need to
guess through all possible values here (otherwise we should use chardet).
Encoding all source files properly as UTF-8 is still the best option.
Dominic
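The approach Dominic describes (try a short list of likely encodings, then re-encode as UTF-8) might look roughly like the sketch below. This is not the actual csv2tree code: the names to_utf8, read_rows and CANDIDATE_ENCODINGS, and the choice of encodings, are assumptions for illustration, and it is written for Python 3.

```python
import csv
import io

# Assumption: a short, ordered list of likely encodings. "latin-1" accepts
# any byte sequence, so it acts as a last-resort fallback and must come last.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def to_utf8(raw, encodings=CANDIDATE_ENCODINGS):
    """Return (utf8_bytes, encoding_used) for the first encoding that decodes raw."""
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
        return text.encode("utf-8"), enc
    raise ValueError("none of the candidate encodings matched")


def read_rows(raw):
    """Re-encode the raw CSV bytes as UTF-8, then parse them with the csv module."""
    utf8_bytes, _ = to_utf8(raw)
    stream = io.StringIO(utf8_bytes.decode("utf-8"), newline="")
    return list(csv.reader(stream))
```

The order of the list matters: a guess that decodes without error is not necessarily the right one, which is one more reason to keep the list short and to prefer source files that are already UTF-8.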
Thanks a lot - all seems to work fine now!
Cheers
Michael