Unicode in file names


Bruce B

Nov 12, 2011, 1:16:34 AM
to mymedia-rok...@googlegroups.com
I have a fair number of files with Unicode in their filenames. It seems that neither the master nor the next branch likes the Unicode. I can take a look at fixing it (I've played with Unicode a bit in the past). Is this already being worked on?

Also... I've done most of my Python with Python 3. Do you plan to convert from 2.6 to 3 anytime soon?

-Bruce

Brian Taylor

Nov 12, 2011, 8:12:56 AM
to mymedia-rok...@googlegroups.com
Try out the channel branch for the current state of the art. If that doesn't support the filenames you're trying, then go for it. I don't think anyone else is working on it.

Python 3 isn't on the roadmap. The reason I picked Python 2.5/2.6 was that it comes already installed virtually everywhere. It was an effort to minimize the difficulty of getting into MyMedia for non-programmers.

That said, if you want to maintain a Python 3 port, that's fine with me. I'd really like to have a proper installer for MyMedia, and that would make the Python version choice less of an issue.

Sent from my iPhone
--
You received this message because you are subscribed to the Google Groups "MyMedia Roku Developers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/mymedia-roku-developers/-/iNcCAH-iay0J.
To post to this group, send email to mymedia-rok...@googlegroups.com.
To unsubscribe from this group, send email to mymedia-roku-deve...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mymedia-roku-developers?hl=en.

Bruce B

Nov 13, 2011, 1:19:41 PM
to MyMedia Roku Developers
The problem isn't actually in MyMedia. It's in the Python system
library. The system library assumes that file system paths are strings
that can be decoded as Unicode. For Linux... that's not true. File
names are arbitrary sequences of octets that do not include the NUL
or "/" character. Anything else is valid.
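
To make that concrete, here is a minimal sketch (written for Python 3, not the 2.x that MyMedia targets) of a name that is perfectly legal on a POSIX file system but is not valid UTF-8. The sample name and the helper functions are made up for illustration; the surrogateescape error handler is the standard one Python 3 provides for carrying undecodable bytes through a str.

```python
# Sketch (Python 3): a POSIX file name is bytes, not necessarily valid UTF-8.
# The sample name below is hypothetical.
raw_name = b"caf\xe9.mp3"  # "cafe" with an accented e, encoded as Latin-1


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_legal_posix_name(name: bytes) -> bool:
    """A single path component may hold any byte except NUL and '/'."""
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


# surrogateescape carries undecodable bytes through str and back unchanged.
as_text = raw_name.decode("utf-8", "surrogateescape")
round_tripped = as_text.encode("utf-8", "surrogateescape")

print(is_legal_posix_name(raw_name))
print(is_valid_utf8(raw_name))
print(round_tripped == raw_name)
```

The point is only that "legal file name" and "decodable text" are two separate checks.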

For example, one of the issues may be worked around by modifying the
system library:

def join(a, *p):
    """Join two or more pathname components, inserting '/' as needed.
    If any component is an absolute path, all previous path components
    will be discarded."""
    path = a
    for b in p:
        if b.startswith('/'):
            path = b
        elif path == '' or path.endswith('/'):
            # path += b
            path = (path.encode('hex') +
                    b.encode('hex')).decode('hex')
        else:
            # path += '/' + b
            path = (path.encode('hex') + '/'.encode('hex') +
                    b.encode('hex')).decode('hex')
    return path

"/usr/lib/python2.7/posixpath.py" [Modified] line 25 of 419 --5%-- col 5
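
For comparison, a rough sketch of how the same join behaves when everything stays as bytes. This assumes Python 3, where posixpath.join accepts byte strings and refuses to mix them with text; the directory and file name are hypothetical.

```python
import posixpath

directory = b"/media/music"   # hypothetical directory
file_name = b"caf\xe9.mp3"    # hypothetical name, not valid UTF-8

# All-bytes input: the components are joined without any decoding step.
joined = posixpath.join(directory, file_name)
print(joined)


def mixing_is_rejected() -> bool:
    """Return True if joining a str with a bytes component raises TypeError."""
    try:
        posixpath.join("/media/music", file_name)
    except TypeError:
        return True
    return False


print(mixing_is_rejected())
```

No hex round trip is needed here, because no implicit conversion between text and bytes is ever attempted.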


This isn't very good code... but it does serve to demonstrate the issue.
os.* really needs to be recoded to use bytearrays instead of Unicode
strings.
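
As a sketch of what a bytes-only approach could look like (again assuming Python 3, where os.listdir returns bytes when given a bytes path): the helper name list_names_as_bytes and the sample file name are made up for this example, and the file is created in a throwaway temporary directory.

```python
import os
import tempfile


def list_names_as_bytes(directory):
    """List a directory, returning every entry name as raw bytes."""
    # A bytes argument makes os.listdir return bytes, with no decoding.
    return os.listdir(os.fsencode(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, handled as bytes from start to finish.
    name = "caf\u00e9.mp3".encode("utf-8")
    full_path = os.path.join(os.fsencode(tmp), name)
    with open(full_path, "wb"):
        pass  # create an empty file
    names = list_names_as_bytes(tmp)

print(names)
```

Names obtained this way can be passed straight back to open() or os.path.join without ever being decoded.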



Brian Taylor

Nov 13, 2011, 1:28:23 PM
to mymedia-rok...@googlegroups.com
That's really interesting. Is this a known issue in the Python community? This seems like something that should land at http://bugs.python.org/.

Does this fix your problem if you swap in your version of join?

And wow, you have file names that can't be expressed in Unicode? Where did those come from?