-----BEGIN PGP SIGNED MESSAGE-----
Hi Pygments users,
There's been a bug in Pygments for a while now, with the way it
handled non-BMP characters in the XQuery lexer. I don't think many
people used this lexer, and there were no testcases for this
particular issue, so I don't think anybody noticed.
The behavior, before March when I "fixed" it, was that non-BMP
characters would never match.
The behavior, after I "fixed" it, was that non-BMP characters would
match on wide Python builds. But on narrow builds, like the ones that
apparently ship on OS X, the regex would fail to compile.
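
To illustrate the narrow-build failure, here is a minimal sketch that
runs on a current Python 3. It is not code from the patch: the helper
names are hypothetical, and the range U+10000-U+EFFFF is an assumption
chosen for illustration. It spells out the UTF-16 code units a narrow
build would store, so the character class ends up containing a
backwards range and is rejected.

```python
import re


def utf16_units(cp):
    """Return the two surrogate code units a narrow build stores for cp."""
    offset = cp - 0x10000
    return chr(0xD800 + (offset >> 10)) + chr(0xDC00 + (offset & 0x3FF))


def compiles(pattern):
    """Report whether the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# What a wide build sees: one class holding one range of two characters.
wide_class = "[\U00010000-\U000EFFFF]"

# What a narrow build sees for the same source text (assumed range):
# '[' D800 DC00 '-' DB7F DFFF ']', so the range is DC00-DB7F, which is
# reversed and therefore invalid.
narrow_class = "[%s-%s]" % (utf16_units(0x10000), utf16_units(0xEFFFF))

wide_ok = compiles(wide_class)
narrow_ok = compiles(narrow_class)
```

With this assumed range, wide_ok comes out true and narrow_ok false.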
I'd like to fix it properly before the next release goes out.
Attached is a patch I'd like to propose, which adds a new function
'unirange' which will construct the appropriate regex to match a
non-BMP range against the internal representation of a string.
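
For readers without the attachment, the idea can be sketched roughly
as follows. This is not the patch itself: the function name
nonbmp_range_pattern and its "narrow" parameter are hypothetical,
added so that both code paths can be exercised on a single
interpreter. On a wide build a plain character class is enough; on a
narrow build the range has to be expressed as alternatives over
surrogate pairs.

```python
import re
import sys


def surrogate_pair(cp):
    """Split a non-BMP code point into (high, low) UTF-16 surrogates."""
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def nonbmp_range_pattern(lo, hi, narrow=None):
    """Build a regex fragment matching code points lo..hi (both non-BMP).

    narrow=None detects the build via sys.maxunicode; passing True or
    False forces one representation (a hypothetical testing aid).
    """
    if not (0x10000 <= lo <= hi <= 0x10FFFF):
        raise ValueError("expected a non-BMP range with lo <= hi")
    if narrow is None:
        narrow = sys.maxunicode <= 0xFFFF
    if not narrow:
        # One character per code point: an ordinary class works.
        return "[%s-%s]" % (chr(lo), chr(hi))

    lo_h, lo_l = surrogate_pair(lo)
    hi_h, hi_l = surrogate_pair(hi)
    if lo_h == hi_h:
        # Same high surrogate: only the low surrogate varies.
        return "(?:%s[%s-%s])" % (chr(lo_h), chr(lo_l), chr(hi_l))

    # First high surrogate: from lo's low surrogate up to DFFF.
    parts = ["%s[%s-%s]" % (chr(lo_h), chr(lo_l), chr(0xDFFF))]
    if hi_h - lo_h > 1:
        # High surrogates strictly in between: any low surrogate.
        parts.append("[%s-%s][%s-%s]" % (
            chr(lo_h + 1), chr(hi_h - 1), chr(0xDC00), chr(0xDFFF)))
    # Last high surrogate: from DC00 up to hi's low surrogate.
    parts.append("%s[%s-%s]" % (chr(hi_h), chr(0xDC00), chr(hi_l)))
    return "(?:" + "|".join(parts) + ")"


def as_surrogates(cp):
    """The string a narrow build would hold for one non-BMP code point."""
    high, low = surrogate_pair(cp)
    return chr(high) + chr(low)


narrow_re = re.compile(nonbmp_range_pattern(0x10000, 0xEFFFF, narrow=True))
wide_re = re.compile(nonbmp_range_pattern(0x10000, 0xEFFFF, narrow=False))
```

The same call then yields a compilable pattern for either internal
representation, which is the portability property the patch is after.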
I've tested against
- - narrow python 2.6.7/2.7.1 (OS X Lion)
- - wide python 2.7.3 (Arch Linux)
- - wide python 3.2.3 (Arch Linux)
- - jython 2.5 
Please test on other platforms/versions, give it a look, and comment,
especially with regard to 2to3 compatibility. I hope this is an
acceptable level of ugliness in order to have portable support for all
of Unicode. And sorry for the breakage. I'm updating regexlint to
detect cases like this explicitly.
In a nutshell, BMP = \u0000-\uffff, and non-BMP is anything > \uffff.
There's also an unrelated change in vimbuiltins that is necessary to
work around a code size limitation in Jython. One test still fails,
due to what appears to be a limit on alternation repetitions for the
way we typically match strings; I intend to report a bug for that.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
-----END PGP SIGNATURE-----