Best option to check and/or convert encoding for csv files

Fellipe Henrique

Sep 22, 2017, 8:34:05 AM
to Django Users
Hello guys,

I have several CSV files to open using pyexcel, but I have started to have issues with CSVs saved from Excel in a different encoding.

Is there any way to verify the encoding of a file, or to change the encoding?


regards


T.·.F.·.A.·.     S+F
Fellipe Henrique P. Soares

e-mail: > echo "lkrrovknFmsgor4ius" | perl -pe \ 's/(.)/chr(ord($1)-2*3)/ge'
Twitter: @fh_bash

Mike Dewhirst

Sep 25, 2017, 7:09:14 PM
to Django users
On 26/09/2017 1:53 AM, Fellipe Henrique wrote:
> Thanks Mike,
>
> But I think my problem goes deeper...
>
> I use pyexcel to try to open a CSV file, to work with it in my
> software...
>
> I receive this error every time: 'utf-8' codec can't decode byte 0xa1
> in position 14: invalid start byte
> My code is below:

Fellipe

My heartfelt sympathy. Surely there is a robot somewhere which can fix
coding and decoding so us poor humans don't have to think about it. That
is the perfect use-case for AI in Python's standard library. Anyone
listening? Has anyone done it already?

Recently someone here posted
https://djangodeployment.com/2017/06/19/encodings-part-1/ and in
particular http://www.i18nqa.com/debug/utf8-debug.html

I think the right approach is to recognise that the problem is in
reality and you have to cope with it. I have never properly dealt with
it myself. Just writing this makes me realise it. I will have to use
those debugging charts and gradually build a set of functions to
"magically" recover from decode exceptions. But not until later next month.
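For what it is worth, one such recovery function might look like the sketch below. It is only an illustration under assumptions: the list of candidate encodings is a guess (Excel on Western-European Windows often writes cp1252), and decode_with_fallback is a made-up name, not part of pyexcel or Django. The sample bytes reproduce the reported situation, with byte 0xa1 at position 14.

```python
# Candidate encodings are an assumption; adjust them to the files you receive.
# "latin-1" accepts every byte, so it only makes sense as the last resort.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def decode_with_fallback(raw, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Byte 0xa1 (index 14) is an invalid start byte in UTF-8 but valid in cp1252.
sample = b"name,greeting\n\xa1Hola!,x\n"
text, used = decode_with_fallback(sample)
```

A clean decode only shows that the bytes are legal in that encoding, not that the guess is right, so it is worth eyeballing the accented characters in the result.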

I haven't used the csv library you mention. All my conversions so far
have been small-scale LibreOffice save-as-csv which have let me use my
editor's replace function to get rid of problem chars and repair csv
files prior to importing data. I know this is the wimp's way out but I
have never had time to do it properly.
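The save-as-UTF-8 step can also be scripted rather than done by hand. Below is a minimal sketch using only Python's built-in open(); convert_to_utf8 and the file names are hypothetical, and the default source encoding of cp1252 is an assumption about what Excel produced, not something detected from the file.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Rewrite src_path as UTF-8 at dst_path, trusting source_encoding."""
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        with open(dst_path, "w", encoding="utf-8", newline="") as dst:
            for line in src:
                dst.write(line)


# Demonstration: a temporary file stands in for an Excel export.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "excel_export.csv")
dst_path = os.path.join(workdir, "excel_export_utf8.csv")
with open(src_path, "wb") as f:
    f.write("nome,cidade\r\nJo\u00e3o,S\u00e3o Paulo\r\n".encode("cp1252"))

convert_to_utf8(src_path, dst_path)
with open(dst_path, "rb") as f:
    converted = f.read()
```

If the assumed source encoding is wrong, the conversion may still succeed but produce the wrong characters, so check a few rows before importing the converted file.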

I'm really sorry I can't help

Mike


>
> @property
> def sheet(self):
>     """ Returns the file content in a format based on pyexcel api. """
>     if not self.file:
>         return None
>     file_extension = self.file.name.split('.')[-1]
>     file_type = mimetypes.types_map['.' + file_extension]
>     if file_type not in settings.ALOWED_DATA_FILE_CONTENT_TYPES:
>         return None
>     if self.transpose_columns:
>         data_file = self._get_transposed_file()
>         file_extension = data_file.name.split('.')[-1]
>     else:
>         data_file = self.file
>
>     self._sheet = pyexcel.get_sheet(file_type=file_extension, file_content=data_file.read(), name_columns_by_row=0)
>     data_file.seek(0)  # it is necessary to return file cursor after read
>     return self._sheet.to_dict()
>
>
> It works fine with pyexcel version 0.4.4, but I started to have
> other issues with the old version, which made me update to
> the new version... here is the error:
>
>
> Inline image 1
>
> Have you seen this error before?
>
> I spent more than 2 weeks trying to solve that, and nothing.. :(
>
>
> Thanks a lot
>
> Regards!
>
>
>
> T.·.F.·.A.·.     S+F
> *Fellipe Henrique P. Soares*
>
> e-mail: > echo "lkrrovknFmsgor4ius" | perl -pe \
> 's/(.)/chr(ord($1)-2*3)/ge'
> /Fedora Ambassador: https://fedoraproject.org/wiki/User:Fellipeh/
> /Blog:/ http://www.fellipeh.eti.br
> /GitHub: https://github.com/fellipeh/
> /Twitter: @fh_bash/
>
> On Sun, Sep 24, 2017 at 5:02 AM, Mike Dewhirst <mi...@dewhirst.com.au
> <mailto:mi...@dewhirst.com.au>> wrote:
>
> On 22/09/2017 10:32 PM, Fellipe Henrique wrote:
>
> Hello guys,
>
> I have several CSV files to open using pyexcel, but I have
> started to have issues with CSVs saved from Excel in a different
> encoding.
>
> Is there any way to verify the encoding of a file, or to change
> the encoding?
>
>
> I use LibreOffice, which provides an option to choose from any
> number of encodings, including utf-8, when saving xlsx Excel files
> as csv.
>
>
>
> regards
>
>
> T.·.F.·.A.·.     S+F
> *Fellipe Henrique P. Soares*
>
> e-mail: > echo "lkrrovknFmsgor4ius" | perl -pe \
> 's/(.)/chr(ord($1)-2*3)/ge'
> /Fedora Ambassador:
> https://fedoraproject.org/wiki/User:Fellipeh/
> <https://fedoraproject.org/wiki/User:Fellipeh/>
> /Blog: /http:www.fellipeh.eti.br <http://www.fellipeh.eti.br>
> /GitHub: https://github.com/fellipeh/
> /Twitter: @fh_bash/
>
>
>
