[vim/vim] TermDebug DecodeMessage adjustment for un-escaping (PR #9228)

Simon Sobisch

Nov 26, 2021, 6:01:52 AM
to vim/vim, Subscribed

Those escapes (octal ones) were seen in #2417.


You can view, comment on, or merge this pull request online at:

  https://github.com/vim/vim/pull/9228

Commit Summary

  • 65422df TermDebug DecodeMessage adjustment for un-escaping

File Changes

(1 file)


Bram Moolenaar

Nov 26, 2021, 11:51:16 AM
to vim/vim, Subscribed

I'll include it, but we should probably handle those TODOs.

Bram Moolenaar

Nov 26, 2021, 11:51:21 AM
to vim/vim, Subscribed

Closed #9228.

Simon Sobisch

Nov 26, 2021, 2:40:40 PM
to vim/vim, Subscribed

> we should probably handle those TODOs.

Totally, but as noted in the referenced issue, that's beyond my "viml" skills: both the "conversion" part and the question of whether Vim could handle the resolved value.

Maybe the best way to do all of that unescaping is to just evaluate the complete message after we have removed \n and the "0" from "\0x", because then everything else like \\, \t and all escaped numbers would be converted that way? As :echo of this works fine, it will likely also handle multi-byte strings correctly, wouldn't it?
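
A narrower variant of that idea can be sketched as follows: instead of evaluating the whole message, only the three-digit octal escapes are evaluated as double-quoted strings. DecodeOctal() is a hypothetical name chosen for this illustration, not the function changed in the PR, and the sketch assumes 'encoding' is utf-8.

```vim
" Hypothetical helper: turn \ooo octal escapes into the bytes they stand for.
" Only the matched escape is passed to eval(), never the whole message.
function! DecodeOctal(text) abort
  return substitute(a:text, '\\\o\o\o',
        \ {-> eval('"' .. submatch(0) .. '"')}, 'g')
endfunction

echo DecodeOctal('r\303\251sum\303\251')
" résumé
```

Two limits of this sketch: a \000 escape cannot be represented in a Vim string and would need separate handling, and an escaped backslash followed by three octal digits (\\303) would be decoded by mistake unless \\ is dealt with first.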

Bram Moolenaar

Nov 27, 2021, 5:49:57 AM
to vim/vim, Subscribed


> > we should probably handle those TODOs.
>
> Totally - but as noted in the referenced issue - that's above my
> "viml", both the "conversion" part and the question if Vim could
> handle the resolved value.
>
> Maybe the best way to do all of those escaping is to just evaluate the
> complete message after we removed \n and the "0" from "\0x" - because
> then everything else like `\\`, `\t` and all escaped numbers will be
> converted that way? As `:echo` of this works fine it likely will also
> handle multi-byte strings correctly, wouldn't it?

It is a matter of making it look good for the user.
Is there a way to reproduce this, to see these messages?

--
INSPECTOR END OF FILM: Move along. There's nothing to see! Keep moving!
[Suddenly he notices the cameras.]
INSPECTOR END OF FILM: (to Camera) All right, put that away sonny.
[He walks over to it and puts his hand over the lens.]
"Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Simon Sobisch

Nov 27, 2021, 11:46:19 AM
to vim/vim, Subscribed

> Is there a way to reproduce this, to see these messages?

Yes. Just create the folders:

mkdir -p 俺/開発/プログラム

vim 俺/開発/プログラム/test.c

Then add minimal code, compile it with full path and debugging info, then run TermDebug on it.

I did so with the Termdebug shipped with my Debian installation and had the same issue as in #2417 (Vim just did not recognize the file names, so it couldn't follow at all). Then I fetched the current termdebug.vim with wget, sourced it and retested: now everything works as expected.

Bram Moolenaar

Nov 28, 2021, 5:53:27 AM
to vim/vim, Subscribed

I tried debugging a command in a directory with Asian characters, and I don't see a problem.
The name shows up properly in the gdb window, and when setting a breakpoint the value of "argv" shows the directory name.
So, what is the problem?

Simon Sobisch

Nov 28, 2021, 5:57:03 AM
to vim/vim, Subscribed

With the new termdebug.vim there is no issue. With the old one there was an issue if the source reference in the debugging info contained Asian characters (this happens when you compile with the full or a relative path, or if the source file name itself contains such characters). As noted, I've verified both the failure and the fix, so there is no longer a problem with that.

Simon Sobisch

Nov 28, 2021, 8:31:30 AM
to vim/vim, Subscribed

Sorry, my test missed one thing:

  • execution works
  • cursor following works
  • recognizing breakpoints works
  • setting breakpoints with position specifier works in most cases (because that is passed to GDB as is)
  • deleting breakpoints always works, because we just pass the number
  • setting breakpoints via position doesn't work, as Vim reasonably sends the full path, and obviously we'd need to escape those characters when sending the -break-insert. I could PR a change where every file name is converted to octal escapes before being sent to GDB, but the only way I can imagine is to loop over the parameter and (try to) use printf on each character to do the conversion. @lacygoill do you have an idea for a better conversion ("eee" -> "\145\145\145") and maybe even one that only converts the non-ascii parts?

lacygoill

Nov 28, 2021, 10:35:58 AM
to vim/vim, Subscribed

> do you have an idea for a better conversion ("eee" -> "\145\145\145")

I would try something like this:

:echo 'eee'->substitute('.', {-> '\' .. printf('%o', submatch(0)->char2nr())}, 'g')

\145\145\145


> and maybe even one that only converts the non-ascii parts?

I think you can match non-ascii characters with this regex:

[^\x00-\x7f]

But I'm not sure the previous command would work as expected on non-ascii characters. Consider this simple test on the string containing the text résumé:

:echo 'résumé'->substitute('[^\x00-\x7f]', {-> '\' .. printf('%o', submatch(0)->char2nr())}, 'g')

r\351sum\351

Now, if you evaluate this in a double-quoted string, you don't get back résumé. Instead you get r<e9>sum<e9>:

:echo "r\351sum\351"

r<e9>sum<e9>
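
The mismatch can be made visible with a few built-in functions: char2nr() returns the code point of the character, while the string itself holds the encoded bytes. This is only an illustrative check and assumes 'encoding' is utf-8.

```vim
" Assumes 'encoding' is utf-8; the second argument of char2nr() forces
" UTF-8 interpretation of the string.
echo char2nr('é', 1)                 " 233, the code point U+00E9
echo printf('%o', char2nr('é', 1))   " 351, the code point in octal
echo strchars('é')                   " 1 character
echo strlen('é')                     " 2 bytes (0xc3 0xa9)
```

So \351 names the code point, not either of the two bytes that make up the character in UTF-8.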

AFAIK, there is no builtin function to encode a character into its actual byte sequence, as reported by the normal g8 command. I guess you would have to defer the task to xxd(1) which should ship with Vim:

:echo 'résumé'->substitute('[^\x00-\x7f]', {-> systemlist('xxd -plain', submatch(0))->get(0, submatch(0))->split('..\zs')->map({_, v -> ('0x' .. v)->eval()->printf('\%o')})->join('')}, 'g')

r\303\251sum\303\251

The latter sequence is correctly evaluated back into résumé:

:echo "r\303\251sum\303\251"

résumé

Unfortunately, calling an external process might cause the plugin to be too slow. If that's the case, we would need a new Vim script function providing the equivalent of the normal g8 command.

Simon Sobisch

Nov 28, 2021, 10:52:07 AM
to vim/vim, Subscribed

The match with a negated range is good (but I guess that should be [^\x21-\x7e]).

Spawning an external command for each file name the plugin sends to gdb seems bad (even if this only happens when such characters are present) and is possibly not portable either. Having a new Vim function would be nice, but that would break 8.2 compatibility, so it would not be good "as sole option".

From what I've read, I think it should be possible to put the output of g8 into a register and then use that as the replacement, but I have not worked with either g8 or registers before... maybe this would need a new function within termdebug.vim that calls another function for the replacement?

lacygoill

Nov 28, 2021, 12:17:14 PM
to vim/vim, Subscribed

> Having a new vim function would be nice - but that would break 8.2 compatibility so it would not be good "as sole option".

Well, the plugin is shipped with Vim, and AFAIK, it's not maintained in a separate repo, so I don't see an issue.

There is even a todo item for using Vim9 for runtime files. If people install an old Vim, they have an old version of the plugin. If they install a recent Vim, they have a recent version of the plugin.


> From what I've read I think it should be possible to put the output of g8 into a register and then use that as replacement, but I have not worked with either g8 or registers before... maybe this would need a new function within termdebug.vim that calls another function for the replacement?

The problem is that g8 expects to interact with some text in a buffer. Not with a string received from a function. If you know the position of the character in an existing buffer, you might get the output of g8 via something like:

let g8_output = execute('normal! g8')->split('\n')->get(0, '')

That's assuming the buffer is displayed in the current window. If it's in another window, whose ID is known, then you could get g8's output with win_execute():

let g8_output = win_execute(ID, 'normal! g8')->split('\n')->get(0, '')
                            ^^
                            can be obtained with win_getid(winnr)

If there is no easy way to know the position of the character, then you would need to write it into an ad-hoc buffer displayed in a hidden popup window (hidden so as to not disturb the user). For all of that you would probably need:

  • popup_create() with the {'hidden': v:true} option argument to create a hidden popup
  • win_execute() to execute commands in the hidden popup
  • maybe popup_settext() to set or reset the text in the hidden popup at some point

In any case, see this PR for an example.

All of this looks a bit hacky though. But it could work; I don't know.
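
However the g8 output ends up being captured, turning it into octal escapes is the easy part. The sketch below assumes the output is a list of two-digit hex bytes separated by spaces (for example "c3 a9") and skips any other token. G8ToOctal() is a hypothetical name used only for this illustration.

```vim
" Hypothetical helper: convert g8-style output ("c3 a9") into octal escapes.
" Tokens that are not exactly two hex digits are ignored.
function! G8ToOctal(g8_output) abort
  return a:g8_output
        \ ->split()
        \ ->filter({_, h -> h =~# '^\x\x$'})
        \ ->map({_, h -> printf('\%o', str2nr(h, 16))})
        \ ->join('')
endfunction

echo G8ToOctal('c3 a9')
" \303\251
```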

Yegappan Lakshmanan

Nov 29, 2021, 11:24:23 AM
to vim_dev, reply+ACY5DGA6DLFP57DSKM...@reply.github.com, vim/vim, Subscribed
Hi,

On Sun, Nov 28, 2021 at 7:35 AM lacygoill <vim-dev...@256bit.org> wrote:

do you have an idea for a better conversion ("eee" -> "\145\145\145")

I would try something like this:

:echo 'eee'->substitute('.', {-> '\' .. printf('%o', submatch(0)->char2nr())}, 'g')

\145\145\145


and maybe even one that only converts the non-ascii parts?

I think you can match non-ascii characters with this regex:

[^\x00-\x7f]

But I'm not sure the previous command would work as expected on non-ascii characters. Consider this simple test on the string containing the text résumé:

:echo 'résumé'->substitute('[^\x00-\x7f]', {-> '\' .. printf('%o', submatch(0)->char2nr())}, 'g')

r\351sum\351

Now, if you evaluate this in a double-quoted string, you don't get back résumé. Instead you get r<e9>sum<e9>:

:echo "r\351sum\351"

r<e9>sum<e9>

AFAIK, there is no builtin function to encode a character into its actual byte sequence, as reported by the normal g8 command. I guess you would have to defer the task to xxd(1) which should ship with Vim:


Can you use the str2list() function for this?

- Yegappan

lacygoill

Nov 29, 2021, 11:36:13 AM
to vim/vim, vim-dev ML, Comment

> Can you use the str2list() function for this?

Oh, you're right. It would be much easier to read and more efficient:

:echo 'eee'->str2list()->map({_, v -> v->printf('\%o')})->join('')

\145\145\145

It's OK for ascii characters, but for non-ascii characters, the issue remains, as for those, it's the byte sequence which seems to be necessary here:

:echo str2list('é')

[233]

I guess that what would be needed is a function which would output [303, 251]. Maybe something like str2byte():

:echo str2byte('é')

[303, 251]
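
Without such a function, the bytes can still be computed from the code points that str2list() returns, using the standard UTF-8 encoding rules. The following is a minimal sketch under that assumption. Utf8Bytes() and OctalEscape() are hypothetical names invented here, not existing Vim functions. Note that the numeric byte values for é are 195 and 169, which print as 303 and 251 in octal.

```vim
" Hypothetical helper: UTF-8 byte values (as numbers) of one code point.
function! Utf8Bytes(cp) abort
  if a:cp < 0x80
    return [a:cp]
  elseif a:cp < 0x800
    return [0xC0 + a:cp / 0x40, 0x80 + a:cp % 0x40]
  elseif a:cp < 0x10000
    return [0xE0 + a:cp / 0x1000,
          \ 0x80 + a:cp / 0x40 % 0x40,
          \ 0x80 + a:cp % 0x40]
  endif
  return [0xF0 + a:cp / 0x40000,
        \ 0x80 + a:cp / 0x1000 % 0x40,
        \ 0x80 + a:cp / 0x40 % 0x40,
        \ 0x80 + a:cp % 0x40]
endfunction

" Hypothetical helper: octal-escape only the non-ASCII characters.
function! OctalEscape(text) abort
  let out = ''
  for cp in str2list(a:text, 1)
    if cp < 0x80
      let out ..= nr2char(cp)
    else
      for b in Utf8Bytes(cp)
        let out ..= printf('\%o', b)
      endfor
    endif
  endfor
  return out
endfunction

echo Utf8Bytes(char2nr('é', 1))
" [195, 169]
echo OctalEscape('résumé')
" r\303\251sum\303\251
```

This avoids the external xxd call and leaves ASCII characters untouched. Whether GDB accepts the escaped form in a -break-insert command is not verified here.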


lacygoill

Nov 29, 2021, 11:38:32 AM
to vim/vim, vim-dev ML, Comment

> I guess that what would be needed is a function which would output [303, 251]. Maybe something like str2byte():
>
> :echo str2byte('é')
>
> [303, 251]

Actually, bytes are usually represented with hexadecimal numbers:

:echo str2byte('é')

['c3', 'a9']

