[vim/vim] Python syntax highlighting does not support Unicode characters in identifiers (Issue #12059)


Gabriel Dupras

Feb 25, 2023, 12:32:19 PM
to vim/vim, Subscribed

Steps to reproduce

  1. vim --clean unicode.py
  2. :set termguicolors (optional, only to make the colors match the image below)
  3. Paste the following code:

     def ma_décoration(f):
         pass

     @ma_décoration
     def lire_données(d):
         pass

     class Donnée:
         pass

  4. Note that "ma_décoration", "lire_données", and "Donnée" are only partially highlighted because of the "é" character.

[Screenshot: Screenshot_20230225_121423]

Expected behaviour

Identifiers should be fully highlighted even when they contain Unicode characters.

The code is valid. It runs without issue.

Python supports Unicode characters in identifiers as documented on this page:

https://docs.python.org/3/reference/lexical_analysis.html?highlight=identifier#identifiers
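
As a quick check of the claim above, here is a minimal sketch using Python's built-in str.isidentifier(). The list and loop are only illustrative; the names come from the reproduction steps.

```python
# Names taken from the reproduction steps above.
names = ["ma_décoration", "lire_données", "Donnée"]

# str.isidentifier() reports whether a string is a valid Python identifier.
for name in names:
    print(name, name.isidentifier())

# For contrast: a name that starts with a digit is not an identifier.
print("1donnée", "1donnée".isidentifier())
```

Each of the three names from the report should be accepted, while the digit-led name should not.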

@zvezdan

Version of Vim

9.0.1333

Environment

OS: Kubuntu 22.10
Terminal: Konsole

Logs and stack traces

No response


Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/12059@github.com>

lacygoill

Feb 25, 2023, 1:22:31 PM
to vim/vim, Subscribed

diff --git a/runtime/syntax/python.vim b/runtime/syntax/python.vim
index e67bf58b0..6789efcb1 100644
--- a/runtime/syntax/python.vim
+++ b/runtime/syntax/python.vim
@@ -100,7 +100,7 @@ syn match   pythonConditional   "^\s*\zsmatch\%(\s\+.*:\s*\%(#.*\)\=$\)\@="
 " Decorators
 " A dot must be allowed because of @MyClass.myfunc decorators.
 syn match   pythonDecorator	"@" display contained
-syn match   pythonDecoratorName	"@\s*\h\%(\w\|\.\)*" display contains=pythonDecorator
+syn match   pythonDecoratorName	"@\s*\h\%(\i\|\.\)*" display contains=pythonDecorator
 
 " Python 3.5 introduced the use of the same symbol for matrix multiplication:
 " https://www.python.org/dev/peps/pep-0465/.  We now have to exclude the
@@ -122,7 +122,7 @@ syn match   pythonMatrixMultiply
       \ contains=ALLBUT,pythonDecoratorName,pythonDecorator,pythonFunction,pythonDoctestValue
       \ transparent
 
-syn match   pythonFunction	"\h\w*" display contained
+syn match   pythonFunction	"\h\i*" display contained
 
 syn match   pythonComment	"#.*$" contains=pythonTodo,@Spell
 syn keyword pythonTodo		FIXME NOTE NOTES TODO XXX contained



Gabriel Dupras

Feb 25, 2023, 6:06:08 PM
to vim/vim, Subscribed

I tried your patch and it looks good to me.



Zvezdan Petkovic

Feb 26, 2023, 4:15:28 AM
to vim/vim, Subscribed

@gdupras and @lacygoill, I am not sure that the patch will work as requested by this issue.
For example, \h still limits the first character to [A-Za-z_].
Thus, if your program contains def échange, it probably won't work because that first letter is not in the range of \h.

Furthermore, according to the Vim documentation, \i covers only a subset of the first 256 characters. So, while it may work for Western European languages, it will not work for Asian languages, which require full Unicode support, or for Cyrillic alphabets.

This will require more investigation for the correct approach.
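
The two limitations described above can be sketched in Python. This is only an approximation under stated assumptions: \h is taken as [A-Za-z_], \w as [0-9A-Za-z_], and \i as \w plus code points 192-255 (the usual default 'isident' range). The helper name highlighted_prefix is hypothetical, and Python's re module is not Vim's regex engine.

```python
import re

# Rough Python stand-ins for the Vim atoms discussed above (assumptions):
#   \h -> [A-Za-z_]
#   \w -> [0-9A-Za-z_]
#   \i -> \w plus code points 192-255 (assumed default 'isident' range)
OLD_PATTERN = re.compile(r"[A-Za-z_][0-9A-Za-z_]*")               # \h\w*
FIRST_PATCH = re.compile(r"[A-Za-z_][0-9A-Za-z_\u00C0-\u00FF]*")  # \h\i*


def highlighted_prefix(pattern, name):
    """Return the leading part of name that the pattern would cover."""
    match = pattern.match(name)
    return match.group(0) if match else ""


for name in ("ma_décoration", "échange", "lire_данные"):
    print(
        name,
        repr(highlighted_prefix(OLD_PATTERN, name)),
        repr(highlighted_prefix(FIRST_PATCH, name)),
    )
```

Under these assumptions, the first patch covers all of ma_décoration, still covers nothing of échange because of the leading é, and stops at the first Cyrillic letter of lire_данные.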



Gabriel Dupras

Feb 27, 2023, 9:21:10 AM
to vim/vim, Subscribed

Then how about this:

diff --git a/runtime/syntax/python.vim b/runtime/syntax/python.vim
index e67bf58b0..90cd241ae 100644
--- a/runtime/syntax/python.vim
+++ b/runtime/syntax/python.vim
@@ -100,7 +100,7 @@ syn match   pythonConditional   "^\s*\zsmatch\%(\s\+.*:\s*\%(#.*\)\=$\)\@="
 " Decorators
 " A dot must be allowed because of @MyClass.myfunc decorators.
 syn match   pythonDecorator	"@" display contained
-syn match   pythonDecoratorName	"@\s*\h\%(\w\|\.\)*" display contains=pythonDecorator
+syn match   pythonDecoratorName	"@\s*\K\%(\k\|\.\)*" display contains=pythonDecorator
 
 " Python 3.5 introduced the use of the same symbol for matrix multiplication:
 " https://www.python.org/dev/peps/pep-0465/.  We now have to exclude the
@@ -122,7 +122,7 @@ syn match   pythonMatrixMultiply
       \ contains=ALLBUT,pythonDecoratorName,pythonDecorator,pythonFunction,pythonDoctestValue
       \ transparent
 
-syn match   pythonFunction	"\h\w*" display contained
+syn match   pythonFunction	"\K\k*" display contained
 
 syn match   pythonComment	"#.*$" contains=pythonTodo,@Spell
 syn keyword pythonTodo		FIXME NOTE NOTES TODO XXX contained


In my quick tests this worked with def échange, and seems to handle Cyrillic and Asian languages correctly. Highlight could be incorrect however if the user modifies iskeyword.
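
As a loose analogy for what the \K\k* pattern is meant to achieve, the sketch below uses Python's Unicode-aware character classes. It assumes \w can stand in for \k and [^\W\d] for \K. That is only an approximation: Vim derives \k from the 'iskeyword' setting, which is why changing that setting can affect the result. The helper name fully_matched is hypothetical.

```python
import re

# Assumption: Python's Unicode-aware \w stands in for Vim's \k, and
# [^\W\d] (a word character that is not a digit) stands in for \K.
# This is only an analogy; Vim derives \k from the 'iskeyword' setting.
UNICODE_NAME = re.compile(r"[^\W\d]\w*")


def fully_matched(name):
    """Return True when the whole name is covered by the pattern."""
    return UNICODE_NAME.fullmatch(name) is not None


for name in ("échange", "данные", "数据", "lire_données"):
    print(name, fully_matched(name))
```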



lacygoill

Feb 27, 2023, 9:31:00 AM
to vim/vim, Subscribed

> Highlight could be incorrect however if the user modifies iskeyword.

You could include this command:

syntax iskeyword @,48-57,_,192-255

See :help :syn-iskeyword.
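
To make the format of that value easier to read, here is a simplified sketch that expands such a spec into a set of code points. It is an approximation under assumptions: "@" is modelled with str.isalpha(), only code points below 256 are considered, and exclusions ("^") and other special forms are not handled. The function name parse_iskeyword is hypothetical and not part of Vim.

```python
def parse_iskeyword(spec):
    """Expand a simplified 'iskeyword'-style spec into a set of code points."""
    allowed = set()
    for part in spec.split(","):
        if part == "@":
            # "@" stands for alphabetic characters; approximated here
            # with str.isalpha() over code points below 256.
            allowed.update(c for c in range(256) if chr(c).isalpha())
        elif "-" in part and len(part) > 1:
            # A numeric range such as 48-57 (the digits 0-9).
            low, high = part.split("-")
            allowed.update(range(int(low), int(high) + 1))
        elif part.isdigit():
            # A single character given by its code point.
            allowed.add(int(part))
        else:
            # A literal character such as "_".
            allowed.add(ord(part))
    return allowed


keyword_chars = parse_iskeyword("@,48-57,_,192-255")
for ch in ("a", "5", "_", "é", "-"):
    print(repr(ch), ord(ch) in keyword_chars)
```

In this model, letters, digits, the underscore, and é (code point 233) are keyword characters, while the hyphen is not.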



Zvezdan Petkovic

Feb 28, 2023, 6:03:17 AM
to vim/vim, Subscribed

After the first report, I started looking into using \k and \K together with an explicit, syntax-only definition of iskeyword.
More testing is needed, though, and I can only do that next weekend. Thanks!


