Issue 198 in pyftpdlib: Patch to add unicode support to trunk

pyft...@googlecode.com

unread,

Jan 10, 2012, 11:42:21 AM1/10/12

to pyftpdli...@googlegroups.com

Status: New
Owner: ----
Labels: Type-Review Priority-Medium

New issue 198 by darren.w...@gmail.com: Patch to add unicode support to
trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

Hey guys - we've forked the code and ported the unicode support from the
py3k branch over to it. I've rebased against trunk today and you can find
our code here:

https://github.com/iwebhosting/pyftpdlib/

Purpose of code changes on this branch:
To add utf-8 support to trunk/python 2.x

When reviewing my code changes, please focus on:
We have one failing test that I'm not sure what to do with - test_oob_abor.

One of these two tests may be incorrect. The Python 2 version sends 0xf4
0xff but the Python 3 version sends 0xc3 0xb4 0xc3 0xb4 - according to
http://support.microsoft.com/kb/231866 it looks like it *should* be 0xff
0xf4 (the *reverse* of the python 2 test) - which version should we attempt
to get the code to pass?

Here's the 3k test:
http://code.google.com/p/pyftpdlib/source/browse/branches/trunk-3k/test/test_ftpd.py#1563

Also I've omitted test_wrong_encoding:

http://code.google.com/p/pyftpdlib/source/browse/branches/trunk-3k/test/test_ftpd.py#2277

It wasn't raising a UnicodeDecodeError because all of the bytes sent fit
into utf-8.

After the review, I'll merge this branch into:
/trunk

pyft...@googlecode.com

unread,

Jan 10, 2012, 12:45:07 PM1/10/12

to pyftpdli...@googlegroups.com

Comment #1 on issue 198 by g.rodola: Patch to add unicode support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

Oh! This is interesting.
I took a quick look at what you've done.
Starting from py3k branch was a good idea but it shouldn't be imported into
trunk as-is as it uses some statements which will work on python >= 2.6
only and we need to support python 2.4.
I mean this:

return b''.join(buffer)
self.set_terminator(b"\r\n")
...

Basically all the b'' occurrences must be replaced with ''.
Also, any other reference which refers to the python 3.x porting must be
removed. Example:

str(line, 'utf8')

Other than that and your failing test issue, how's the status of this?
Have you tried it in production?
With what encodings?

Basically, if you successfully manage to add UTF8 support in your branch
and enrich the test suite to prove it works and doesn't break the current
bytes-based implementation, the next step would consist in producing a
patch from your branch and apply it to main pyftpdlib trunk.

pyft...@googlecode.com

unread,

Jan 10, 2012, 12:58:14 PM1/10/12

to pyftpdli...@googlegroups.com

Comment #2 on issue 198 by darren.w...@gmail.com: Patch to add unicode

support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

Yep, we're using it with a low volume of traffic and its working fine for
us using utf-8. I wasnt aware of the >=python 2.4 requirement, I'll get
that sorted and remove those shamelessly copied docstrings :)

There are a couple of tests (MKD and MSLT) but I'll put a few more in there
now I'm more familiar with how the tests work - thanks for the feedback. We
have ran through a suite of operations using Cyberduck and Filezilla
though, and they're fine in those clients at least
(upload/download/mkdir/list/rename/delete).

pyft...@googlecode.com

unread,

Jan 11, 2012, 5:51:08 AM1/11/12

to pyftpdli...@googlegroups.com

Comment #3 on issue 198 by darren.w...@gmail.com: Patch to add unicode

support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

Ok, I think thats covered what you asked for - tests pass (except for the
oob_abor) on python 2.4-2.7 (tested on Ubuntu 2.6.38 x64). I've attached a
patch against trunk - how does that look?

Attachments:
unicode.patch 14.3 KB

pyft...@googlecode.com

unread,

Jan 11, 2012, 5:56:11 AM1/11/12

to pyftpdli...@googlegroups.com

Comment #4 on issue 198 by darren.w...@gmail.com: Patch to add unicode

support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

Lets try that again :)

Attachments:
unicode.patch 14.2 KB

pyft...@googlecode.com

unread,

Jan 11, 2012, 4:57:29 PM1/11/12

to pyftpdli...@googlegroups.com

Comment #5 on issue 198 by g.rodola: Patch to add unicode support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

I'm probably missing something but I extracted the tests from your patch
(tests only, no changes to ftpserver.py) and they work fine with the
current bytes-based version.
Tried with filezilla as a client, and I was able to delete:

127.0.0.1:41796 <== DELE ☃ci☃ao
127.0.0.1:41796 ==> 250 File removed.

I'm confused here. Wasn't that supposed to NOT work? =)
And yes, I admit I've never fully understood how to work with unicode in
python.

pyft...@googlecode.com

unread,

Jan 11, 2012, 5:15:40 PM1/11/12

to pyftpdli...@googlegroups.com

Comment #6 on issue 198 by darren.w...@gmail.com: Patch to add unicode

support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

Heh, I'm not 100% sure either :)

When we forked the code before Christmas (October or so I think) it didnt
work, maybe something has changed since then? We only rebased against trunk
a few days ago.

pyft...@googlecode.com

unread,

Jan 12, 2012, 10:32:12 AM1/12/12

to pyftpdli...@googlegroups.com

Comment #7 on issue 198 by darren.w...@gmail.com: Patch to add unicode

support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

Ok, I see what's happened here.

http://docs.python.org/howto/unicode.html#unicode-filenames

listdir() will return unicode strings *if called* with a unicode string,
which is what we were doing in our filesystem subclass. If we switch to
byte strings, pyftpdlib doesn't care and the clients figure out the
encoding on their own.

So it's up to you if you want to keep this patch - it will explicitly make
sure the that we're dealing with bytes. May as well keep the tests in
either case :) Sorry for the confusion!

pyft...@googlecode.com

unread,

Jan 13, 2012, 1:26:02 PM1/13/12

to pyftpdli...@googlegroups.com

Comment #8 on issue 198 by g.rodola: Patch to add unicode support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

I had a chat with Josiah Carlson who kindly explained to me how this is
supposed to work (chat log is in attachment).
Basically what we have to do is this:

- read data (bytes) from the socket and convert that to unicode (line =
sock.read(1024); line = line.decode('utf8'))
- pass the unicode string the the underlying filesystem functions (e.g. ls
= os.listdir(u'path')) which will return unicode strings rather than bytes
- before sending the result back to client, encode the unicode string (ls =
ls.encode('utf8'); sock.sendall(ls))

This is a pretty huge change but I think it's worth the effort as it should
make the python 3 porting a lot easier.
Being a major change, it's likely we're going to break existing code. As
such, this should happen in a major release (0.8.0 maybe).

The only thing which is not clear to me is what to do in case the client
sends a line with a malformed encoding (!= UTF8).
RFC-2640 is not clear about this.

Attachments:
chat.txt 5.3 KB

pyft...@googlecode.com

unread,

Jan 14, 2012, 9:19:37 AM1/14/12

to pyftpdli...@googlegroups.com

Comment #9 on issue 198 by g.rodola: Patch to add unicode support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

I added your tests as r958.

pyft...@googlecode.com

unread,

Jan 14, 2012, 12:14:08 PM1/14/12

to pyftpdli...@googlegroups.com

Updates:
Status: Started

Comment #10 on issue 198 by g.rodola: Patch to add unicode support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

Ok, preliminary patch in attachment (applied to r963) adds 2.x unicode
support.
It seems to work and also passes all tests.
I fixed OOB test failure was with:

line = line.decode('utf8', errors='replace')

Not sure whether it is better to return an error response though.

Attachments:
unicode.patch 10.1 KB

pyft...@googlecode.com

unread,

Jan 14, 2012, 3:22:02 PM1/14/12

to pyftpdli...@googlegroups.com

Comment #11 on issue 198 by g.rodola: Patch to add unicode support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

New patch (based on r967) including test changes I previously left out by
accident.

Attachments:
unicode2.patch 26.1 KB

pyft...@googlecode.com

unread,

Jan 22, 2012, 12:37:34 PM1/22/12

to pyftpdli...@googlegroups.com

Comment #12 on issue 198 by darren.w...@gmail.com: Patch to add unicode

support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

Sorry for the big delay - busy work week and a new born at home is keeping
me occupied :)

The patch looks great to me - I've ran it internally and it certainly
behaves the same as well.

Is the 0.7 release imminent?

pyft...@googlecode.com

unread,

Jan 22, 2012, 2:08:58 PM1/22/12

to pyftpdli...@googlegroups.com

Updates:
Labels: Compatibility

Comment #13 on issue 198 by g.rodola: Patch to add unicode support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

Yep, I hope to release it next week.
After then, we can start working on this.
We'll first introduce unicode support for python 2.x, stabilize it as much
as possible, and then start the transition towards python 3.x as I already
did with psutil:
http://code.google.com/p/psutil/issues/detail?id=75
That way both python 2.x and 3.x will share the same code base and, most
important, the same API.

It's likely that the new version supporting unicode and python 3 will be
marked as 1.0.0 since that's going to break existent code using a
customized filesystem class.

pyft...@googlecode.com

unread,

May 22, 2012, 8:17:18 AM5/22/12

to pyftpdli...@googlegroups.com

Comment #14 on issue 198 by g.rodola: Patch to add unicode support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

Issue 121 has been merged into this issue.

pyft...@googlecode.com

unread,

May 22, 2012, 8:33:42 AM5/22/12

to pyftpdli...@googlegroups.com

Updates:
Status: FixedInSVN
Labels: Milestone-1.0.0

Comment #15 on issue 198 by g.rodola: Patch to add unicode support to trunk
http://code.google.com/p/pyftpdlib/issues/detail?id=198

Ladies and gents, we made it: we now have full unicode support (RFC-2640)
as part of the process which ported pyftpdlib to python 3.
Attached is a patch including all the changes which were submitted against
the py3-unicode branch and which are now merged back into trunk as r1046.

The change was massive in that we now use unicode instead of bytes pretty
much everywhere, so it's likely that existent code bases overriding the
base authorizer or filesystem classes won't work.
I'll write down instructions on how to make the porting later on.

Attachments:
py3-unicode.patch 99 KB

Reply all

Reply to author

Forward