Proper SQLite Unicode Case-Folding finally solved

312 views
Skip to first unread message

Markus Gritsch

unread,
Jun 6, 2009, 10:25:28 AM6/6/09
to web...@googlegroups.com
Hi,

vanilla SQLite supports a case insensitive LIKE statement only for
ASCII characters. So the following works
SELECT 'a' LIKE 'A'
-> 1
but for characters outside the ASCII range the LIKE statement is case sensitive:
SELECT 'ä' LIKE 'Ä'
-> 0

The same holds true for the LOWER and UPPER functions which work only
for ASCII characters:
SELECT UPPER 'abc'
-> ABC
SELECT UPPER 'μαöä'
-> μαöä

To solve this, SQLite supports the ICU library [1] via the ICU
extension [2]. The problem [3] and a solution [4] has already been
presented on this list. However, there are several drawbacks in using
the ICU library:

• Compared to the quite small and lightweight SQLite library, ICU is
quite large and weights in with several MB.
• The ICU library is a C++ library which brings in a dependeny to the
C++ runtime library.
• ICU should be compiled with the same compiler the Python version was
compiled with. Since Python 2.5 on Windows was compiled using MSVC
2003 and ICU only ships with a solution file for MSVC 2009 this is no
easy task.
• Queries using LIKE become about 3 times slower.

This bothered me quite some time now, and with some luck I recently
came across some simpler solution. sqlite3_unicode.zip [5] is a small
extension for SQLite which tries to solve this case folding problem.
It has been around a while and has been updated to support also more
recent SQLite versions [6]. I was finally able to compile it, and the
latest version, which contains bugfixed tables for accented latin
characters can be found attached to this email. The precompiled
Windows binary for Python 2.5 can be found here: [7] This binary was
compiled to support proper case folding for the LIKE statement. It
supports the UPPER and LOWER functions for characters outside the
ASCII range. The only thing changed from the default #defines was the
automatic unaccenting, which was turned off. So with this build 'ä'
is different from 'a' which is not the case using the default
settings.

• The resulting library is only 9 kB larger than the original one.
• No dependency to the C++ runtime.
• I used MSVC 2003 to compile it, so the same as used for Python 2.5.
• Queries run at the same speed as without the extension.

Currently, the head version of web2py has to be used, which was
recently modified [8] to use a locally installed pysqlite library
before trying to import the version which ships with Python. With
this version you can try in the web2py shell the following:
print db.executesql('SELECT UPPER("ä")')[0][0].encode('utf-8')
which will output the character Ä.

Massimo: Should this version of SQLite be included in the default
web2py binary distribution? The used source file is in the public
domain and has no copyright. I am only able to provide the Windows
binary.

Kind regards,
Markus

[1] http://site.icu-project.org/
[2] http://www.sqlite.org/cvstrac/fileview?f=sqlite/ext/icu/README.txt
[3] http://groups.google.com/group/web2py/browse_thread/thread/a98973adb0225649/
[4] http://groups.google.com/group/web2py/browse_thread/thread/43fe62d6da330833/
[5] http://ioannis.mpsounds.net/blog/2007/12/19/sqlite-native-unicode-like-support/
[6] http://ioannis.mpsounds.net/blog/2009/01/11/sqlite3_unicode-updated-for-sqlite3-v367/
[7] http://xile.org/le/pysqlite-2.5.5.win32-py2.5.exe
[8] http://bazaar.launchpad.net/~mdipierro/web2py/devel/revision/870

sqlite3_unicode.c.zip

mdipierro

unread,
Jun 6, 2009, 10:55:34 AM6/6/09
to web2py Web Framework

Markus Gritsch

unread,
Jun 24, 2009, 3:18:36 AM6/24/09
to web...@googlegroups.com
On Sat, Jun 6, 2009 at 4:55 PM, mdipierro<mdip...@cs.depaul.edu> wrote:
>
> Probably yes, thanks Markus.
> Any objection?

Since there were no objections, it should probably be included:

http://groups.google.com/group/web2py/browse_thread/thread/a7688ce510d46357/

Markus

DenesL

unread,
Jun 24, 2009, 5:21:13 AM6/24/09
to web2py Web Framework
I have not seen this thread, nice job Markus!.
One question: does it work with other versions of Python?.
Some people here use 2.6 or even 2.4, would it affect them?.

Markus Gritsch

unread,
Jun 24, 2009, 5:33:31 AM6/24/09
to web...@googlegroups.com

I precompiled it only for Python 2.5 for Windows since I use this
version. The official web2py binary download also uses 2.5, so it
would be usable there too.

For 2.4 or 2.6 one has to compile the pysqlite extension and
incorporate sqlite3_unicode.c in the compilation.

Markus

mdipierro

unread,
Jun 29, 2009, 12:31:58 AM6/29/09
to web2py Web Framework
Markus,

sorry for my delay about this issue. Two questions:
1) will this make sqlite files created by web2py windows binary
incompatible with web2py osx binary and or web2py running form source?
If so how bad is the incompatibility?
2) Can you send me step by step instructions for including the file
[7] that you linked into web2py_win.zip

Massimo

On Jun 24, 4:33 am, Markus Gritsch <m.grit...@gmail.com> wrote:

Markus Gritsch

unread,
Jun 29, 2009, 8:33:25 AM6/29/09
to web...@googlegroups.com
On Mon, Jun 29, 2009 at 6:31 AM, mdipierro<mdip...@cs.depaul.edu> wrote:
>
> Markus,
>
> sorry for my delay about this issue. Two questions:
> 1) will this make sqlite files created by web2py windows binary
> incompatible with web2py osx binary and or web2py running form source?

No. The data in the database is not affected by this patch. It only
changes how non-ASCII characters are treated in queries when using
UPPER, LOWER, LIKE, ...

> 2) Can you send me step by step instructions for including the file
> [7] that you linked into web2py_win.zip

• Download the source version of pysqlite from
http://oss.itsystementwicklung.de/trac/pysqlite/

• You need to have MSVC 2003 installed.

• Try building pysqlite by calling
C:\Python25\python setup.py build_static
C:\Python25\python setup.py bdist_wininst

• Place the attached file sqlite3_unicode.c from my original email
into the 'amalgamation' directory.

• Apply the patch which is attached to this email.

• Rebuild pysqlite using
C:\Python25\python setup.py build_static
C:\Python25\python setup.py bdist_wininst

Now you have an installabel pysqlite extension in the 'dist' directory.

Kind regards,
Markus

sqlite3_unicode.patch

mdipierro

unread,
Jun 29, 2009, 9:39:51 AM6/29/09
to web2py Web Framework
I do not have a windows machine.
Can you build this, unzip web2py_win.zip, place the file in the proper
place and rezip it?

Massimo

On Jun 29, 7:33 am, Markus Gritsch <m.grit...@gmail.com> wrote:
> On Mon, Jun 29, 2009 at 6:31 AM, mdipierro<mdipie...@cs.depaul.edu> wrote:
>
> > Markus,
>
> > sorry for my delay about this issue. Two questions:
> > 1) will this make sqlite files created by web2py windows binary
> > incompatible with web2py osx binary and or web2py running form source?
>
> No.  The data in the database is not affected by this patch.  It only
> changes how non-ASCII characters are treated in queries when using
> UPPER, LOWER, LIKE, ...
>
> > 2) Can you send me step by step instructions for including the file
> > [7] that you linked into web2py_win.zip
>
> • Download the source version of pysqlite fromhttp://oss.itsystementwicklung.de/trac/pysqlite/
>
> • You need to have MSVC 2003 installed.
>
> • Try building pysqlite  by calling
> C:\Python25\python setup.py build_static
> C:\Python25\python setup.py bdist_wininst
>
> • Place the attached file sqlite3_unicode.c from my original email
> into the 'amalgamation' directory.
>
> • Apply the patch which is attached to this email.
>
> • Rebuild pysqlite using
> C:\Python25\python setup.py build_static
> C:\Python25\python setup.py bdist_wininst
>
> Now you have an installabel pysqlite extension in the 'dist' directory.
>
> Kind regards,
> Markus
>
>  sqlite3_unicode.patch
> 2KViewDownload

Markus Gritsch

unread,
Jun 29, 2009, 9:56:04 AM6/29/09
to web...@googlegroups.com
On Mon, Jun 29, 2009 at 3:39 PM, mdipierro<mdip...@cs.depaul.edu> wrote:
>
> I do not have a windows machine.
> Can you build this, unzip web2py_win.zip, place the file in the proper
> place and rezip it?

http://xile.org/le/web2py_win.zip

I will leave the file there for about a week.

Markus

mdipierro

unread,
Jun 29, 2009, 10:07:24 AM6/29/09
to web2py Web Framework
Fantastic! i will try process it tonight.

Massimo

On Jun 29, 8:56 am, Markus Gritsch <m.grit...@gmail.com> wrote:
Reply all
Reply to author
Forward
0 new messages