[vim/vim] How to decode utf-16le streams? (Discussion #19163)

ubaldot

Jan 11, 2026, 10:23:40 PM
to vim/vim

I am running IPython in a terminal buffer and I am trying to capture its stdout in a script by means of an out_cb callback.
The stdout stream contains pretty much everything (ANSI escape codes, etc.), which I can somehow filter out. The main problem is that the stream is UTF-16 (I am working on Windows), and I kinda feel that I am reinventing the wheel by writing my own UTF-16 decoder, which is starting to become very tedious due to all the corner cases that may occur.

At the same time, I don't think I am the first person on Earth to have bumped into such a problem, and I am wondering whether anyone has already written a UTF-16 decoder that I can borrow, or where to find one.
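Most of the corner cases that make a hand-written decoder tedious come from chunk boundaries: a callback may receive only one byte of a 2-byte UTF-16 code unit, or only the first half of a surrogate pair (characters outside the Basic Multilingual Plane, such as most emoji, take 4 bytes). A stateful, incremental decoder handles this by holding back incomplete trailing bytes until the next chunk arrives. The sketch uses Python's standard `codecs` module purely to illustrate that idea outside Vim. The helper name `decode_chunks` is hypothetical, and `errors="replace"` is an assumed policy for a stream that ends in the middle of a character.

```python
import codecs


def decode_chunks(chunks):
    """Decode an iterable of UTF-16LE byte chunks into one string.

    A chunk may end in the middle of a 2-byte code unit or between the two
    halves of a surrogate pair; the incremental decoder keeps such an
    incomplete tail buffered until the next chunk arrives.
    """
    decoder = codecs.getincrementaldecoder("utf-16-le")(errors="replace")
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the decoder; a dangling odd byte becomes U+FFFD.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


text = "h\u00e9llo \U0001F600"
data = text.encode("utf-16-le")
# Split inside "é" and between the two halves of the emoji's surrogate pair.
print(decode_chunks([data[:3], data[3:14], data[14:]]) == text)  # True
```

Inside Vim the conversion itself can be done with the built-in iconv(). The remaining work is making sure each call receives complete characters, which is what the buffering above takes care of.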

Thanks!


Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
Message ID: <vim/vim/repo-discussions/19163@github.com>

ubaldot

Jan 17, 2026, 11:37:49 PM
to vim/vim

FWIW, here is what I came up with, and it seems to work. I went with a brute-force approach, checking character by character what the terminal emulator inside Vim sends, including the "garbage".

vim9script

var is_utf16: bool = true
var raw_buf: string = ''

def HandleLine(line: string)
  echo line
enddef

def StripAnsiEscapeSequences(msg: string): string
  # Strip ANSI escape sequences. Since \= makes the ESC optional, bracketed
  # text such as the "[1]" in "In [1]:" is removed as well.
  return msg->substitute('\e\=\[[0-9;?]*[@-~]', '', 'g')
enddef

def FeedChars(bytes: string)
  var line = ''
  # Length of a line terminator: 2 bytes in UTF-16, 1 byte otherwise
  var nbytes = is_utf16 ? 2 : 1

  # Accumulate bytes as they appear on the terminal stdout
  raw_buf ..= bytes

  while true
    # Note: if \r\n is received, you get an extra blank line as a result.
    # Also, "\x00" ends a double-quoted string (see :help expr-quote), so the
    # UTF-16 pattern below effectively matches only the CR byte.
    # The other pattern is single-quoted so that \| stays a regex alternation.
    var idx = is_utf16
      ? match(raw_buf, "\x0D\x00\|\x0A\x00")
      : match(raw_buf, '\r\|\n')

    if idx < 0
      break
    endif

    # match() returns a byte index, whereas string[a : b] in Vim9 script uses
    # character indexes; strpart() works on bytes.
    # Extract one full line (without terminator)
    line = strpart(raw_buf, 0, idx)

    try
      if is_utf16
        HandleLine(StripAnsiEscapeSequences(iconv(line, 'utf-16le', 'utf-8')))
      else
        HandleLine(StripAnsiEscapeSequences(line))
      endif
    catch
      raw_buf = ''
      echohl ErrorMsg
      echomsg "Cannot convert utf-16 string (raw buf)"
      echohl None
      break
    endtry

    # Leftovers: drop the line and its terminator
    raw_buf = strpart(raw_buf, idx + nbytes)
  endwhile
enddef

def OutCb(_: channel, msg: string)
  FeedChars(msg)
enddef
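For comparison, here is a minimal Python sketch of the same buffering idea as FeedChars, operating on raw bytes. It is not a translation of the Vim code. The class name `Utf16LineSplitter` is hypothetical, and the sketch assumes the input really is UTF-16LE. It deliberately differs from the script in three ways:

- It only looks for terminators at even byte offsets, so a `0D 00` byte pair that straddles two code units is not mistaken for a CR.
- It treats CR LF as a single line ending, which avoids the extra blank line mentioned in the script's comment.
- It requires ESC at the start of an escape sequence.

A trailing lone CR is held back until the next chunk shows whether an LF follows.

```python
import re

# CSI escape sequences such as "\x1b[32m" or "\x1b[?25l"; unlike the Vim
# pattern, the leading ESC is required here.
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[@-~]")

CR = b"\r\x00"  # U+000D encoded as UTF-16LE
LF = b"\n\x00"  # U+000A encoded as UTF-16LE


class Utf16LineSplitter:
    """Buffer raw UTF-16LE bytes and return complete, decoded lines."""

    def __init__(self):
        self.buf = bytearray()

    def feed(self, chunk):
        """Append a chunk of bytes; return the lines it completed."""
        self.buf += chunk
        lines = []
        start = i = 0
        # Step through whole 2-byte code units only, so a 0D 00 byte pair
        # straddling two code units is never mistaken for a CR.
        while i + 2 <= len(self.buf):
            unit = bytes(self.buf[i:i + 2])
            if unit not in (CR, LF):
                i += 2
                continue
            if unit == CR and i + 4 > len(self.buf):
                break  # CR is the last unit: wait to see whether LF follows
            lines.append(self._decode(self.buf[start:i]))
            i += 2
            if unit == CR and bytes(self.buf[i:i + 2]) == LF:
                i += 2  # CR LF is one line ending, not two
            start = i
        # Keep the unfinished line (and any odd trailing byte) for next time.
        del self.buf[:start]
        return lines

    @staticmethod
    def _decode(raw):
        # start and i are both even, so raw holds whole code units.
        text = bytes(raw).decode("utf-16-le", errors="replace")
        return ANSI_CSI.sub("", text)


splitter = Utf16LineSplitter()
stream = "In [1]: \x1b[32mprint(1)\x1b[0m\r\n1\r\nIn [2]: ".encode("utf-16-le")
received = []
for b in stream:  # worst case: one byte per callback
    received += splitter.feed(bytes([b]))
print(received)  # ['In [1]: print(1)', '1']
```

Because the decoder only ever sees complete lines, each line can be decoded in one go, and no incremental decoder state is needed. The unfinished prompt `In [2]: ` stays in the buffer until its line ending arrives.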


Message ID: <vim/vim/repo-discussions/19163/comments/15529014@github.com>
