On 17/03/12 10:11, Sudharshan S wrote:
> but I'd like to know why whoosh doesn't do something like this.
You can't convert a sequence of bytes (str) to unicode without knowing
the encoding. That is often allowed to slide when every byte in the str is
below 128, on the assumption that the data is ASCII, but that assumption is
not necessarily correct.
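To illustrate the point, here is a minimal sketch, not whoosh's code. It is written for Python 3, where bytes plays the role of the Python 2 str discussed above and str plays the role of unicode. The helper name to_text is hypothetical and exists only for this example. It shows the same bytes giving different text under different encodings, which is why the caller has to supply the encoding.

```python
# The same byte sequence decodes to different text depending on the
# encoding you assume, so the encoding cannot be guessed from the bytes.
raw = b"caf\xc3\xa9"  # these bytes are the UTF-8 encoding of "café"

as_utf8 = raw.decode("utf-8")      # 'café'
as_latin1 = raw.decode("latin-1")  # 'cafÃ©' (no error, just the wrong text)


def to_text(data, encoding):
    """Hypothetical helper: decode bytes with a caller-supplied encoding.

    Text that is already decoded is returned unchanged.
    """
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


# Assuming ASCII only fails loudly when a byte of 128 or above shows up.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Note that the latin-1 decode succeeds silently with the wrong result, which is the more dangerous failure: nothing tells you the guess was wrong.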
Pragmatic Unicode (Python-specific)
http://nedbatchelder.com/text/unipain.html
The Absolute Minimum Every Software Developer Absolutely, Positively Must
Know About Unicode and Character Sets (No Excuses!)
http://www.joelonsoftware.com/articles/Unicode.html
Roger