[vim/vim] Impossible to pass non-UTF-8 strings from vim to Python 3 (#1053)

Marius Gedminas

Sep 10, 2016, 3:13:39 PM

to vim/vim

When a vim variable has a value that is not a valid UTF-8 string, :py3 vim.eval('variable') raises UnicodeDecodeError.

This is causing plugins such as UltiSnips to crash when they try to process a list of mappings that involve <A-@>, since Vim thinks <A-@> is byte 0xC0. Here's what happens in the plugin:

* it runs vim.command('redir => _tmp_smaps | smap | redir END') to get the mappings
* it tries to parse vim.eval('_tmp_smaps').splitlines() to see what select-mode mappings there are

I have the following mapping in my .vimrc:

map! <Esc>@ <A-@>

(I have many more, but this one seems to be the one causing the problem.)

When I run smap, vim shows this mapping as

but when I do

:redir => tmp | smap <Esc>@ | redir END
:echo tmp

I see

and I can reproduce the crash with

:py3 import vim; vim.eval('tmp')

Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 18: invalid start byte
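
The same failure can be reproduced in plain Python 3, outside Vim. This is a minimal sketch: the byte string below is made up for illustration and is not the real `smap` output; only the 0xC0 byte matters.

```python
# Hypothetical stand-in for the bytes Vim hands to Python.
raw = b"s  <Esc>@   \xc0"

try:
    raw.decode("utf-8")  # strict decoding is the default error handler
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    bad_byte = raw[exc.start]  # the offending byte, here 0xC0
    print(f"cannot decode byte {bad_byte:#x} at position {exc.start}: {exc.reason}")
```

0xC0 can never start a valid UTF-8 sequence, so a strict decode rejects it wherever it appears in the string.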

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub

Björn Linse

Sep 11, 2016, 10:14:42 AM

to vim/vim

Here is another test case

:0put =printf('%c',0xFF)
:py3 print(repr(vim.current.buffer[0]))
:py3 vim.current.buffer[0] += "x"
:py3 print(vim.eval("getbufline('', 1)"))

Apparently the first :py3 is OK because it uses surrogateescape, but the other two do not. Probably :python3 should always use surrogateescape, both when decoding and when encoding. (This is what neovim's :python3 implementation does, BTW.)
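
For readers unfamiliar with the handler, here is a short sketch of what "surrogateescape in both directions" means in plain Python 3. The sample bytes are arbitrary and not taken from a Vim buffer.

```python
raw = b"abc\xffdef"  # not valid UTF-8: 0xFF never appears in UTF-8

# Decoding: each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))

# Encoding with the same handler turns those surrogates back into the
# original bytes, so nothing is lost on the way back.
restored = text.encode("utf-8", errors="surrogateescape")
```

If only the decoding side uses the handler, the resulting `str` cannot be encoded again with the default strict handler, which matches the asymmetry seen in the test case above.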

Björn Linse

Sep 11, 2016, 10:28:17 AM

to vim/vim

Maybe like this:

diff --git a/src/if_py_both.h b/src/if_py_both.h
index c44fc93..19f8584 100644
--- a/src/if_py_both.h
+++ b/src/if_py_both.h
@@ -134,7 +134,7 @@ StringToChars(PyObject *obj, PyObject **todecref)
     {
        PyObject        *bytes;

-       if (!(bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, NULL)))
+       if (!(bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, CODEC_ERROR_HANDLER)))
            return NULL;

        if(PyBytes_AsStringAndSize(bytes, (char **) &str, NULL) == -1
@@ -4117,7 +4117,7 @@ StringToLine(PyObject *obj)
     }
     else if (PyUnicode_Check(obj))
     {
-       if (!(bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, NULL)))
+       if (!(bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, CODEC_ERROR_HANDLER)))
            return NULL;

        if (PyBytes_AsStringAndSize(bytes, &str, &len) == -1
@@ -6197,7 +6197,7 @@ _ConvertFromPyObject(PyObject *obj, typval_T *tv, PyObject *lookup_dict)
        PyObject        *bytes;
        char_u  *str;

-       bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, NULL);
+       bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, CODEC_ERROR_HANDLER);
        if (bytes == NULL)
            return -1;

diff --git a/src/if_python.c b/src/if_python.c
index 622634d..edb6400 100644
--- a/src/if_python.c
+++ b/src/if_python.c
@@ -90,6 +90,9 @@ struct PyMethodDef { Py_ssize_t a; };
 # define PySequenceMethods Py_ssize_t
 #endif

+/* The "surrogateescape" error handler is new in Python 3.1 */
+#define CODEC_ERROR_HANDLER NULL
+
 #if defined(PY_VERSION_HEX) && PY_VERSION_HEX >= 0x02070000
 # define PY_USE_CAPSULE
 #endif
diff --git a/src/if_python3.c b/src/if_python3.c
index 53a1313..5d9c058 100644
--- a/src/if_python3.c
+++ b/src/if_python3.c
@@ -96,7 +96,7 @@
 # define PyString_Check(obj) PyUnicode_Check(obj)
 #endif
 #define PyString_FromString(repr) \
-    PyUnicode_Decode(repr, STRLEN(repr), ENC_OPT, NULL)
+    PyUnicode_Decode(repr, STRLEN(repr), ENC_OPT, CODEC_ERROR_HANDLER)
 #define PyString_FromFormat PyUnicode_FromFormat
 #ifndef PyInt_Check
 # define PyInt_Check(obj) PyLong_Check(obj)

nuko8

Sep 11, 2016, 11:32:16 AM

to vim/vim

To incorporate @bfredl's patch into our build system, it looks like we need an additional patch:

diff --git a/src/if_python.c b/src/if_python.c
index edb6400..ffad23e 100644
--- a/src/if_python.c
+++ b/src/if_python.c
@@ -91,7 +91,9 @@ struct PyMethodDef { Py_ssize_t a; };
 #endif

 /* The "surrogateescape" error handler is new in Python 3.1 */
-#define CODEC_ERROR_HANDLER NULL
+#if defined(PY_VERSION_HEX) && PY_VERSION_HEX < 0x03010000
+# define CODEC_ERROR_HANDLER NULL
+#endif


 #if defined(PY_VERSION_HEX) && PY_VERSION_HEX >= 0x02070000
 # define PY_USE_CAPSULE

With those patches, I confirm that the resulting vim works well with the examples @bfredl mentioned above.

But I don't think I am a good python tester. It would be far better if someone else could confirm that, ideally, with other examples.

Björn Linse

Sep 11, 2016, 11:42:30 AM

to vim/vim

Hmm, but isn't PY_VERSION_HEX < 0x03010000 always true when you are in if_python.c? If the version hex is >= 0x03000000, if_python3.c should be used...
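
The comparison can be checked from Python itself, since `sys.hexversion` uses the same layout as `PY_VERSION_HEX`. A small sketch follows; the helper name `has_surrogateescape` and the three sample constants are made up for illustration.

```python
import sys

def has_surrogateescape(hexversion: int) -> bool:
    """True for Python 3.1 and later, where "surrogateescape" exists."""
    return hexversion >= 0x03010000

# Layout: major, minor, micro, then release level and serial.
PY27 = 0x020700F0  # 2.7.0 final
PY30 = 0x030000F0  # 3.0.0 final
PY31 = 0x030100F0  # 3.1.0 final

# Every 2.x value starts with 0x02, so it is always below 0x03010000.
print(has_surrogateescape(PY27), has_surrogateescape(PY30), has_surrogateescape(PY31))
print(has_surrogateescape(sys.hexversion))
```

In other words, within a file that is only ever compiled against Python 2, the `< 0x03010000` test cannot distinguish anything.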

Nikolay Aleksandrovich Pavlov

Sep 11, 2016, 12:09:03 PM

to vim_dev, reply+00b1d198484b2e50035e442bcdedcac5744d6ea...@reply.github.com, vim/vim

I would suggest something like below, but I have no tests.

commit 615351832d75df3dfbc3f22694e675583e0b325d
Author: ZyX <kp-...@yandex.ru>
Date: Tue Aug 16 21:42:24 2016 +0300

Use surrogateescape when appropriate

diff --git a/src/if_py_both.h b/src/if_py_both.h
index 35ad5d0..6709300 100644
--- a/src/if_py_both.h
+++ b/src/if_py_both.h
@@ -134,7 +134,8 @@ StringToChars(PyObject *obj, PyObject **todecref)
     {
        PyObject        *bytes;

-       if (!(bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, NULL)))
+       if (!(bytes = PyUnicode_AsEncodedString(obj, ENC_OPT,
+                       ERRORS_ENCODE_ARG)))
            return NULL;

        if(PyBytes_AsStringAndSize(bytes, (char **) &str, NULL) == -1
@@ -4117,7 +4118,8 @@ StringToLine(PyObject *obj)
     }
     else if (PyUnicode_Check(obj))
     {
-       if (!(bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, NULL)))
+       if (!(bytes = PyUnicode_AsEncodedString(obj, ENC_OPT,
+                       ERRORS_ENCODE_ARG)))
            return NULL;

        if (PyBytes_AsStringAndSize(bytes, &str, &len) == -1
@@ -6197,7 +6199,7 @@ _ConvertFromPyObject(PyObject *obj, typval_T *tv, PyObject *lookup_dict)
        PyObject        *bytes;
        char_u  *str;

-       bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, NULL);
+       bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, ERRORS_ENCODE_ARG);
        if (bytes == NULL)
            return -1;

diff --git a/src/if_python.c b/src/if_python.c
index 622634d..1cafe34 100644
--- a/src/if_python.c
+++ b/src/if_python.c
@@ -70,6 +70,9 @@
 # undef PY_SSIZE_T_CLEAN
 #endif

+#define ERRORS_DECODE_ARG NULL
+#define ERRORS_ENCODE_ARG ERRORS_DECODE_ARG
+
 #if defined(MACOS) && !defined(MACOS_X_UNIX)
 # include "macglue.h"
 # include <CodeFragments.h>
diff --git a/src/if_python3.c b/src/if_python3.c
index 10984cd..9085c3c 100644
--- a/src/if_python3.c
+++ b/src/if_python3.c
@@ -91,12 +91,15 @@
 /* Python 3 does not support CObjects, always use Capsules */
 #define PY_USE_CAPSULE

+#define ERRORS_DECODE_ARG CODEC_ERROR_HANDLER
+#define ERRORS_ENCODE_ARG ERRORS_DECODE_ARG
+
 #define PyInt Py_ssize_t
 #ifndef PyString_Check
 # define PyString_Check(obj) PyUnicode_Check(obj)
 #endif
 #define PyString_FromString(repr) \
-    PyUnicode_Decode(repr, STRLEN(repr), ENC_OPT, NULL)
+    PyUnicode_Decode(repr, STRLEN(repr), ENC_OPT, ERRORS_DECODE_ARG)
 #define PyString_FromFormat PyUnicode_FromFormat
 #ifndef PyInt_Check
 # define PyInt_Check(obj) PyLong_Check(obj)
@@ -969,8 +972,8 @@ DoPyCommand(const char *cmd, rangeinitializer init_range, runner run, void *arg)
     /* PyRun_SimpleString expects a UTF-8 string. Wrong encoding may cause
      * SyntaxError (unicode error). */
     cmdstr = PyUnicode_Decode(cmd, strlen(cmd),
-                                       (char *)ENC_OPT, CODEC_ERROR_HANDLER);
-    cmdbytes = PyUnicode_AsEncodedString(cmdstr, "utf-8", CODEC_ERROR_HANDLER);
+                                       (char *)ENC_OPT, ERRORS_DECODE_ARG);
+    cmdbytes = PyUnicode_AsEncodedString(cmdstr, "utf-8", ERRORS_ENCODE_ARG);
     Py_XDECREF(cmdstr);

     run(PyBytes_AsString(cmdbytes), arg, &pygilstate);
@@ -1642,7 +1645,7 @@ LineToString(const char *str)
     }
     *p = '\0';

-    result = PyUnicode_Decode(tmp, len, (char *)ENC_OPT, CODEC_ERROR_HANDLER);
+    result = PyUnicode_Decode(tmp, len, (char *)ENC_OPT, ERRORS_DECODE_ARG);

     vim_free(tmp);
     return result;



Nikolai Aleksandrovich Pavlov

Sep 11, 2016, 12:11:12 PM

to vim/vim, vim-dev ML, Comment


Nikolai Aleksandrovich Pavlov

Sep 11, 2016, 12:14:14 PM

to vim/vim, vim-dev ML, Comment

615351832d75df3dfbc3f22694e675583e0b325d


Nikolai Aleksandrovich Pavlov

Sep 11, 2016, 12:15:26 PM

to vim/vim, vim-dev ML, Comment

ZyX-I/vim@6153518


Björn Linse

Sep 11, 2016, 12:23:21 PM

to vim/vim, vim-dev ML, Comment

What is wrong with just using CODEC_ERROR_HANDLER directly?


Björn Linse

Sep 11, 2016, 12:59:41 PM

to vim_dev, vim-dev...@256bit.org

BTW, something is wrong with the github <-> vim-dev bridge, this comment (and the edited version with the correct link below) is really by ZyX-I, but posted by the bridge on vim-dev under my name (Björn Linse).

Christian Brabandt

Sep 11, 2016, 1:51:59 PM

to vim_dev

Hi Björn!

On Sun, 11 Sep 2016, Björn Linse wrote:

> On Sunday, September 11, 2016 at 6:14:14 PM UTC+2, Björn Linse wrote:
> > 615351832d75df3dfbc3f22694e675583e0b325d

> BTW, something is wrong with the github <-> vim-dev bridge, this comment (and the edited version with the correct link below) is really by ZyX-I, but posted by the bridge on vim-dev under my name (Björn Linse).

I received the same mail with the wrong From header from github.

Best,
Christian
--
If you want to ruin your day, just look in the mirror.

Nikolay Aleksandrovich Pavlov

Sep 11, 2016, 4:49:43 PM

to vim_dev, reply+00b1d198126a5e5513230433f03eaae48c72b39...@reply.github.com, vim/vim, vim-dev ML, Comment

2016-09-11 19:23 GMT+03:00 Björn Linse <vim-dev...@256bit.org>:
> What is wrong with just using CODEC_ERROR_HANDLER directly?

AFAIR this is because Python 2 may need the "surrogateescape" argument in
the Python->Vim direction (to make writing cross-Python scripts easier), but
it definitely does not need it when converting from Vim to Python.
That commit does nothing like this; it is the minimal change required to
fix the issue.



Björn Linse

Sep 11, 2016, 5:03:55 PM

to vim/vim, vim-dev ML, Comment

But python2 does not have surrogateescape, in python2 one would represent a byte string as str with no problems. The very point of "surrogateescape" is that you use it in both directions, so that a bytestring can be roundtripped losslessly as a python3 str and then back to a bytestring (but only in that direction).

I'm asking because after your patch CODEC_ERROR_HANDLER is only ever used to #define another macro. Why not just use it directly as in my patch? If indirection is needed later, it can be added when it's needed.
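
The "only in that direction" point can be seen in a few lines of plain Python 3. This is an illustrative sketch with arbitrary sample values, not code from either patch.

```python
# bytes -> str -> bytes is lossless for any byte string.
for raw in (b"\x80", b"caf\xc3\xa9", bytes(range(256))):
    assert raw.decode("utf-8", "surrogateescape").encode("utf-8", "surrogateescape") == raw

# str -> bytes -> str is not: two escaped surrogates can encode to bytes
# that happen to form a valid UTF-8 sequence.
lossy = "\udcc3\udca9"
as_bytes = lossy.encode("utf-8", "surrogateescape")
back = as_bytes.decode("utf-8", "surrogateescape")  # one real character now

# A surrogate outside U+DC80..U+DCFF cannot be encoded at all.
try:
    "\ud800".encode("utf-8", "surrogateescape")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

So the handler guarantees that bytes coming from Vim survive a trip through `str`, but it makes no promise about arbitrary `str` values created on the Python side.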


Nikolai Aleksandrovich Pavlov

Sep 11, 2016, 6:16:57 PM

to vim/vim, vim-dev ML, Comment

@bfredl Python2 has unicode() strings. And if you want to write python23 scripts you will use from __future__ import unicode_literals (thus Python->Vim will receive unicode strings with no non-unicode characters unless explicitly requested), but Vim API will still produce byte strings in Python 2, so using this in “both directions” is impossible. Also my Python2 has surrogateescape error handler, though after some investigation it appeared that it was coming from some site package provided for compatibility with Python3 so this idea does not make much sense.


Christian Brabandt

Sep 12, 2016, 4:57:39 AM

to vim_dev

Hi Björn!

On Sun, 11 Sep 2016, Björn Linse wrote:

> On Sunday, September 11, 2016 at 6:14:14 PM UTC+2, Björn Linse wrote:
> > 615351832d75df3dfbc3f22694e675583e0b325d

> BTW, something is wrong with the github <-> vim-dev bridge, this
> comment (and the edited version with the correct link below) is really
> by ZyX-I, but posted by the bridge on vim-dev under my name (Björn
> Linse).

No, the mirror script works correctly:
https://groups.google.com/d/msg/vim_dev/nFp_J257Yyc/DKisrqYEAQAJ

If you look into the original, with all headers:
https://groups.google.com/forum/#!original/vim_dev/nFp_J257Yyc/DKisrqYEAQAJ
you can see the From header:

,----
| From: Nikolai Aleksandrovich Pavlov (Vim Github Repository) <vim-dev...@256bit.org>
`----

I am not sure why your mailer put your name into the attribution.

Best,
Christian
--
Why does lemon soda consist mostly of artificial ingredients,
while dish soap contains real lemon juice?

Björn Linse

Sep 12, 2016, 5:09:42 AM

to vim_dev

Not sure what you mean with "mailer", I was replying to how it looked at the google groups web interface. But today it looks correct, so there could've been a bug at the web interface that was fixed.

Christian Brabandt

Sep 12, 2016, 6:21:44 AM

to vim...@googlegroups.com

On 2016-09-12 11:09, Björn Linse wrote:
> On Monday, September 12, 2016 at 10:57:39 AM UTC+2, Christian Brabandt
> wrote:
>> On So, 11 Sep 2016, Björn Linse wrote:
>> > On Sunday, September 11, 2016 at 6:14:14 PM UTC+2, Björn Linse wrote:
>> > > 615351832d75df3dfbc3f22694e675583e0b325d
>> > BTW, something is wrong with the github <-> vim-dev bridge, this
>> > comment (and the edited version with the correct link below) is really
>> > by ZyX-I, but posted by the bridge on vim-dev under my name (Björn
>> > Linse).
>>
>> No, the mirror script works correctly:
>> https://groups.google.com/d/msg/vim_dev/nFp_J257Yyc/DKisrqYEAQAJ
>>
>> If you look into the original, with all headers:
>> https://groups.google.com/forum/#!original/vim_dev/nFp_J257Yyc/DKisrqYEAQAJ
>> you can see the From header:
>>
>> ,----
>> | From: Nikolai Aleksandrovich Pavlov (Vim Github Repository)
>> <vim-dev...@256bit.org>
>> `----
>>
>> I am not sure, why your mailer put your name into the attribution.
>>

> Not sure what you mean with "mailer", I was replying to how it looked
> at the google groups web interface. But today it looks correct, so
> there could've been a bug at the web interface that was fixed.

I meant mailer as in MUA, since I saw the problem only in the mail you
quoted.
So it looked like a problem with your mailer/MUA.

Best,
Christian

lacygoill

Dec 21, 2020, 9:38:55 AM

to vim/vim, vim-dev ML, Comment

In case it helps, here is an updated patch from @ZyX-I:

diff --git a/src/if_py_both.h b/src/if_py_both.h
index 7b748b25e..e657624dd 100644
--- a/src/if_py_both.h
+++ b/src/if_py_both.h
@@ -130,7 +130,8 @@ StringToChars(PyObject *obj, PyObject **todecref)
     {
 	PyObject	*bytes;

-	if (!(bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, NULL)))
+	if (!(bytes = PyUnicode_AsEncodedString(obj, ENC_OPT,
+			ERRORS_ENCODE_ARG)))
 	    return NULL;
 
 	if(PyBytes_AsStringAndSize(bytes, (char **) &str, NULL) == -1

@@ -4243,7 +4244,8 @@ StringToLine(PyObject *obj)
     }
     else if (PyUnicode_Check(obj))
     {

-	if (!(bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, NULL)))
+	if (!(bytes = PyUnicode_AsEncodedString(obj, ENC_OPT,
+			ERRORS_ENCODE_ARG)))
 	    return NULL;
 
 	if (PyBytes_AsStringAndSize(bytes, &str, &len) == -1

@@ -6290,7 +6292,7 @@ _ConvertFromPyObject(PyObject *obj, typval_T *tv, PyObject *lookup_dict)
 	PyObject	*bytes;
 	char_u	*str;

-	bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, NULL);
+	bytes = PyUnicode_AsEncodedString(obj, ENC_OPT, ERRORS_ENCODE_ARG);
 	if (bytes == NULL)
 	    return -1;
 
diff --git a/src/if_python.c b/src/if_python.c
index 6338a5b8d..29f7ed560 100644
--- a/src/if_python.c
+++ b/src/if_python.c
@@ -69,6 +69,9 @@
 # undef PY_SSIZE_T_CLEAN
 #endif
 
+#define ERRORS_DECODE_ARG NULL
+#define ERRORS_ENCODE_ARG ERRORS_DECODE_ARG
+
 #undef main // Defined in python.h - aargh
 #undef HAVE_FCNTL_H // Clash with os_win32.h
 
diff --git a/src/if_python3.c b/src/if_python3.c
index a51be2949..ea4fd7dd8 100644
--- a/src/if_python3.c
+++ b/src/if_python3.c
@@ -81,12 +81,15 @@
 // Python 3 does not support CObjects, always use Capsules
 #define PY_USE_CAPSULE

+#define ERRORS_DECODE_ARG CODEC_ERROR_HANDLER
+#define ERRORS_ENCODE_ARG ERRORS_DECODE_ARG
+
 #define PyInt Py_ssize_t
 #ifndef PyString_Check
 # define PyString_Check(obj) PyUnicode_Check(obj)
 #endif
 #define PyString_FromString(repr) \
-    PyUnicode_Decode(repr, STRLEN(repr), ENC_OPT, NULL)
+    PyUnicode_Decode(repr, STRLEN(repr), ENC_OPT, ERRORS_DECODE_ARG)
 #define PyString_FromFormat PyUnicode_FromFormat
 #ifndef PyInt_Check
 # define PyInt_Check(obj) PyLong_Check(obj)

@@ -1088,8 +1091,8 @@ DoPyCommand(const char *cmd, rangeinitializer init_range, runner run, void *arg)
     // PyRun_SimpleString expects a UTF-8 string. Wrong encoding may cause
     // SyntaxError (unicode error).
     cmdstr = PyUnicode_Decode(cmd, strlen(cmd),

-					(char *)ENC_OPT, CODEC_ERROR_HANDLER);
-    cmdbytes = PyUnicode_AsEncodedString(cmdstr, "utf-8", CODEC_ERROR_HANDLER);
+					(char *)ENC_OPT, ERRORS_DECODE_ARG);
+    cmdbytes = PyUnicode_AsEncodedString(cmdstr, "utf-8", ERRORS_ENCODE_ARG);
     Py_XDECREF(cmdstr);
 
     run(PyBytes_AsString(cmdbytes), arg, &pygilstate);

@@ -1745,7 +1748,7 @@ LineToString(const char *str)
     }
     *p = '\0';

-    result = PyUnicode_Decode(tmp, len, (char *)ENC_OPT, CODEC_ERROR_HANDLER);
+    result = PyUnicode_Decode(tmp, len, (char *)ENC_OPT, ERRORS_DECODE_ARG);
 
     vim_free(tmp);
     return result;

As for the question asked in the relevant todo item:

https://github.com/vim/vim/blob/ef2dff52de52c17fe1bd7c06cbb32d8955901f5a/runtime/doc/todo.txt#L1276-L1277

Yes, the patch works. I tested it against the original example. Without the patch:

vim -Nu NONE -S <(cat <<'EOF'
    smap <Esc>@ <A-@>
    py3 vim.command('redir => _tmp_smaps | smap | redir END')
    py3 vim.eval('_tmp_smaps').splitlines()
EOF
)

Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 18: invalid start byte

With the patch:

./src/vim -Nu NONE -S <(cat <<'EOF'
    smap <Esc>@ <A-@>
    py3 vim.command('redir => _tmp_smaps | smap | redir END')
    py3 vim.eval('_tmp_smaps').splitlines()
EOF
)

# no error

Note that someone has asked a question on vi.stackexchange which I think has the same cause. This command:

:let variable = "\<bs>" | py3 print(vim.eval('variable'))

raises this error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

I've tried the patch, and unfortunately the error persists; but it changes from UnicodeDecodeError to UnicodeEncodeError:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 0: surrogates not allowed
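
The change from a decode error to an encode error is what surrogateescape would be expected to produce: decoding now succeeds, but the resulting `str` holds a lone surrogate that a strict UTF-8 output stream refuses. Here is a sketch in plain Python 3. The byte content `b"\x80kb"` is an assumption about how Vim stores `"\<bs>"`; only the leading 0x80 is confirmed by the traceback. The in-memory streams stand in for wherever `print()` writes.

```python
import io

raw = b"\x80kb"  # assumed value; only the 0x80 at position 0 is confirmed

text = raw.decode("utf-8", "surrogateescape")  # succeeds, yields a lone surrogate

# A strictly encoding UTF-8 stream rejects the surrogate, like print() in the report.
strict_out = io.TextIOWrapper(io.BytesIO(), encoding="utf-8", errors="strict")
try:
    print(text, file=strict_out)
    strict_out.flush()
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A stream that also uses surrogateescape writes the original bytes back out.
buf = io.BytesIO()
escaped_out = io.TextIOWrapper(buf, encoding="utf-8",
                               errors="surrogateescape", newline="\n")
print(text, file=escaped_out)
escaped_out.flush()
```

Under these assumptions the remaining failure is on the output side rather than in `vim.eval()` itself.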


ZhiyuanLck

Dec 21, 2020, 9:52:14 AM

to vim/vim, vim-dev ML, Comment

> Note that someone has asked a question on vi.stackexchange which I think has the same cause. This command:
>
> :let variable = "\<bs>" | py3 print(vim.eval('variable'))
>
> raises this error:
>
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
>
> I've tried the patch, and unfortunately the error persists; but it changes from UnicodeDecodeError to UnicodeEncodeError:
>
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 0: surrogates not allowed

So it is an unsolved issue now?


Christian Brabandt

Dec 21, 2020, 9:57:19 AM

to vim/vim, vim-dev ML, Comment

it's probably unsolved until it has been merged successfully.


Bram Moolenaar

Dec 21, 2020, 10:03:41 AM

to vim/vim, vim-dev ML, Comment

Closed #1053 via 2e2f52a.


Bram Moolenaar

Dec 21, 2020, 10:08:35 AM

to vim/vim, vim-dev ML, Comment

Please check for any remaining encoding/decoding issues.


lacygoill

Dec 21, 2020, 12:18:34 PM

to vim/vim, vim-dev ML, Comment

> Please check for any remaining encoding/decoding issues.

The patch has fixed the original issue. But a similar one persists for a string containing <bs>. This command:

:let variable = "\<bs>" | py3 print(vim.eval('variable'))

Raises:

UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 0: surrogates not allowed


Bram Moolenaar

Dec 21, 2020, 2:25:11 PM

to vim/vim, vim-dev ML, Comment

Reopened #1053.

puremo...@gmail.com

Dec 22, 2020, 5:05:52 AM

to vim_dev

Bit off the wall, but perhaps we should provide a py3 API, something like `vim.eval_bytes()`, returning a `bytes` instance rather than trying to decode the evaluated bytes into a (unicode) `str` object. In the general case a vim string variable can contain any bytes (except NUL?); to read that into a Python `str`, we have to decode it, but if it's just a variable with no context, we can't assume that it's encoded in any particular way.

I believe that's actually what happens when you access `vim.vars`: you get a `bytes` instance:

i.e. doing "let g:test = 'byte me'" and then

* Ctrl-r= py3eval( 'vim.eval( "g:test" )' )

* Ctrl-r= py3eval( 'type( vim.eval( "g:test" ) ).__name__' )

and

* Ctrl-r=py3eval( 'vim.vars[ "g:test" ]' )

* Ctrl-r=py3eval( 'type( vim.vars[ "g:test" ] ).__name__' )

```

Eval: byte me

Type: str

vim.vars: byte me

Type: bytes

```

This would push the burden of decoding onto the script author, who may have better knowledge of what the variable contains (encoding-wise)?
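
To illustrate the idea: if a bytes-returning call such as the suggested `vim.eval_bytes()` existed (it is only a proposal here), a script could choose the decoding itself. The sketch below uses a hypothetical helper, `decode_with_fallback`, and runs on plain `bytes` values rather than on real Vim data.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8",)) -> str:
    """Try each candidate encoding strictly, in order; return the first success."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # No candidate fit: keep the bytes recoverable instead of raising.
    return raw.decode("utf-8", errors="surrogateescape")


print(decode_with_fallback(b"byte me"))
print(repr(decode_with_fallback(b"caf\xe9", ("utf-8", "latin-1"))))
print(repr(decode_with_fallback(b"\xc0")))
```

The caller decides which encodings are plausible for a given variable, which is exactly the knowledge the Vim side lacks.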

Just a thought.
