Encoding filepaths for Windows & Mac issues Python 3


Benjam901

Jan 24, 2021, 9:38:01 AM
to Python Programming for Autodesk Maya
Hello community,

I am having some trouble with the Windows build of my project. This has happened twice now with 2 separate users, so it needs to be addressed. The thing is, I find this very weird...

The user inputs a directory they wish to iterate over. I use good old os.walk, replace backslashes with forward slashes, and then normalize the path to make sure the SQL database displays special characters correctly. I think that normalization might be the cause of my problems: normalize('NFC', file_path)
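A minimal sketch of the walk described above, for readers who want something to run. It is not the gist mentioned in this post; the function name `gather_paths` and its return shape are assumptions made for illustration.

```python
import os
from unicodedata import normalize


def gather_paths(root):
    """Walk root and return forward-slash, NFC-normalized file paths.

    Hypothetical helper: a guess at the steps described in the post.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            raw = os.path.join(dirpath, name)
            # Replace backslashes so paths look the same on every platform.
            forward = raw.replace("\\", "/")
            # NFC composes "u" + combining diaeresis into a single "ü".
            # Note that the result may no longer match the name on disk.
            paths.append(normalize("NFC", forward))
    return paths
```

The returned strings are what would go into the database. Whether they can still be used to open the files depends on how the filesystem compares names.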

Mac has 0 issues with this but Windows is another story.

Is there a foolproof way to encode characters for both platforms that I am missing?

Here is the gist of the simple function that gathers the paths:

Cheers,

Ben

Benjam901

Jan 24, 2021, 9:56:42 AM
to Python Programming for Autodesk Maya
Here is the error the user gets on their machine. It happens when trying to set up a data container for the path. I am confused because the path exists and was found when I walked the user's directory...

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'G:/De sortat 2020 apr-dec/06 - brüc - Second Live.flac'

Justin Israel

Jan 24, 2021, 1:29:44 PM
to python_in...@googlegroups.com
What is the implementation of your custom normalize function? I see you are calling it, and mentioned it might be the cause of your problems, but don't see where it is defined. Does it simply call os.path.normpath? 
Have you tried py3 pathlib to handle all of this, instead of os and manual string formatting? 
Can you easily reproduce the problem with just some simple operations on that one problematic path? For me, it seems valid in both linux and windows, so something must be going on in the string conversion as you suggested.

Justin



Benjam901

Jan 24, 2021, 3:32:12 PM
to Python Programming for Autodesk Maya
Hey Justin,

My bad, here is some more detail. The reason this is implemented is that without it, database entries that contain, for example, Swedish characters such as
Adam Strömsted
come out looking like this
Screenshot 2021-01-24 at 21.30.39.png

Function:
from unicodedata import normalize

def normalize_filepath(file_path):
    # NFC composes characters, so "o" + combining diaeresis becomes one "ö".
    return normalize('NFC', file_path)

I was thinking about using the pathlib module. I have read it can help with a lot of issues such as file encoding, but I was worried about the speed difference between it and scandir, which os.walk uses internally.
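A small sketch of what pathlib does and does not cover here. The drive and folder names are made up for illustration. It suggests that pathlib can replace the manual backslash handling, but that it compares paths by their code points and so does not settle the NFC/NFD question by itself.

```python
from pathlib import PureWindowsPath

# Hypothetical paths: the same visible name in two Unicode forms.
composed = "G:/music/br\u00fcc.flac"      # "ü" as one code point
decomposed = "G:/music/bru\u0308c.flac"   # "u" + combining diaeresis

# as_posix() gives forward slashes without manual string replacement.
forward = PureWindowsPath(r"G:\music\track.flac").as_posix()

# Windows-style paths compare case-insensitively...
same_ignoring_case = PureWindowsPath("G:/Music") == PureWindowsPath("g:/music")

# ...but the two normalization forms are still treated as different paths.
same_path = PureWindowsPath(composed) == PureWindowsPath(decomposed)
```

`PureWindowsPath` is used so the sketch behaves the same on any operating system; real file access would go through `Path`.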

What are your thoughts on making the change?

// Ben


Benjam901

Jan 24, 2021, 3:48:53 PM
to Python Programming for Autodesk Maya
I have a repro case now with one of the beta users files. I can run some tests on the path in question now!

Benjam901

Jan 24, 2021, 4:15:40 PM
to Python Programming for Autodesk Maya
An update: if I run os.path.exists(file_path) on the path in question it returns False. However, if I apply unicodedata.normalize('NFD', file_path) and run the same query, it returns True.

After some more testing with my wildchars folder, it appears that whether NFC or NFD works depends on what is in the string.
For example, my wildchars folder has filenames like these, which all work fine with normalize('NFC'):
pîrvu - aлфавит 44.wav
Triptil - c  hÉr , c  mÉr .wav
Venda - 1.2 [pământ].wav

However, the path in the error example will ONLY work when converted to NFD (decomposed) Unicode form:

06 - brüc - Second Live.flac
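The test described above can be sketched as a small diagnostic that checks the path as given, its NFC form, and its NFD form. The name `which_forms_exist` and the injectable `exists` parameter are assumptions added so the check can be tried without touching a real disk.

```python
import os
from unicodedata import normalize


def which_forms_exist(file_path, exists=os.path.exists):
    """Report which spelling of file_path the filesystem recognizes.

    Hypothetical helper; `exists` defaults to os.path.exists and can be
    swapped for a stand-in when experimenting.
    """
    candidates = {
        "raw": file_path,
        "NFC": normalize("NFC", file_path),
        "NFD": normalize("NFD", file_path),
    }
    return {label: exists(candidate) for label, candidate in candidates.items()}
```

If only the "NFD" entry comes back True, the file was most likely saved with decomposed characters, and NFC-normalizing the path produces a name that is not on disk.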

Am I missing something with my character encoding perhaps? 

// Ben

Benjam901

Jan 24, 2021, 4:51:05 PM
to Python Programming for Autodesk Maya
A further update:

I have amended my function to NFC-normalize the path only if it detects that it is running on Mac. Without the normalization it all seems totally fine on Windows. I have sent a newer build out for testing and will let the community know if it works out OK.
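A minimal sketch of that amendment, assuming the platform check is done with `sys.platform`. The `platform` parameter is an addition for illustration so both branches can be exercised on one machine; the actual amended function was not posted.

```python
import sys
from unicodedata import normalize


def normalize_filepath(file_path, platform=sys.platform):
    """NFC-normalize only on macOS; leave the path untouched elsewhere."""
    if platform == "darwin":
        return normalize("NFC", file_path)
    # On Windows (and anything else) keep the exact name os.walk returned,
    # so the path still matches the file on disk.
    return file_path
```

One possible alternative design is to keep the untouched path for file access on every platform and store a separate normalized copy only for display in the database.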

Whatever was going on with the string on windows when normalized was making it behave double strange...

Justin Israel

Jan 24, 2021, 5:32:56 PM
to python_in...@googlegroups.com
Between my work experience in the US and New Zealand, I have been lucky enough to avoid complicated unicode handling. That is, until more recently when dealing with Python 2 to Python 3 code updates. So my experience with all of the nitty-gritty conversions and normalizations has been pretty limited. But reading over the docs on the unicodedata.normalize() function, it talks about choosing between different forms and how a single unicode character can be represented by different codes under different forms. So this makes me wonder if normalizing with the wrong form could make a string that is visually the same actually be a different filepath? Your tests seem to indicate that this might be the case, since you see different results with different unicode forms. I have no idea how to go about choosing the right form in terms of still matching the Windows filesystem.
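The point about one visible character having different codes can be checked directly. This is a small illustration using the "brüc" fragment from the failing path; the helper `code_points` is hypothetical.

```python
from unicodedata import normalize

composed = "br\u00fcc"      # "ü" stored as one code point (NFC form)
decomposed = "bru\u0308c"   # "u" followed by a combining diaeresis (NFD form)


def code_points(text):
    """Return the code points of text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Both strings render the same, but Python sees them as different.
looks_equal_but_differs = composed != decomposed

# Normalizing converts one form into the other.
round_trip_ok = (
    normalize("NFC", decomposed) == composed
    and normalize("NFD", composed) == decomposed
)
```

A filesystem that compares names code point by code point would treat these as two different file names, which would be consistent with the FileNotFoundError above.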


Benjam901

Jan 25, 2021, 5:30:24 AM
to Python Programming for Autodesk Maya
I thought the same thing: perhaps Windows has some kind of standard "Windows NFC" setup, and running the Python NFC normalization messed the string up somehow.