[vim/vim] GVIM not reporting correct byte offsets (Issue #13731)

3052

Dec 19, 2023, 5:34:51 PM

to vim/vim, Subscribed

Steps to reproduce

Using this file (the MP4 inside the archive, not the zip itself):

https://github.com/vim/vim/files/13720982/index_video_5_0_1.zip

If I open the file in GVim, type /mdat, press Enter, and then press g Ctrl-G, I get:

Byte 2785

Expected behaviour

if I run this Go program:

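package main

import (
   "bytes"
   "os"
)

func main() {
   // Read the raw MP4 bytes, with no encoding or line-ending conversion.
   b, err := os.ReadFile("index_video_5_0_1.mp4")
   if err != nil {
      panic(err)
   }
   // bytes.Index returns the 0-based offset of the first "mdat", or -1 if absent.
   i := bytes.Index(b, []byte("mdat"))
   println(i)
}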

I get 2578. Why is Vim off by over 200 bytes?

Version of Vim

https://github.com/vim/vim-win32-installer/releases/tag/v9.0.2175

Environment

https://github.com/vim/vim-win32-installer/releases/tag/v9.0.2175

Logs and stack traces

No response

—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.

zeertzjq

Dec 19, 2023, 5:44:27 PM

to vim/vim, Subscribed

You need to edit the file with vim -b or :edit ++bin, otherwise Vim will convert the file encoding and the line endings.

3052

Dec 19, 2023, 6:09:28 PM

to vim/vim, Subscribed

If I use your option I get 2579, which is OK since Vim counts bytes starting from 1 while the Go index starts from 0. Thank you!

3052

Dec 19, 2023, 6:09:29 PM

to vim/vim, Subscribed

Closed #13731 as completed.

Gary Johnson

Dec 20, 2023, 2:20:42 AM

to reply+ACY5DGCMYT6GQJS32Y...@reply.github.com, vim...@googlegroups.com

On 2023-12-19, 3052 wrote:
> Steps to reproduce
>
> using this file (inside, not the zip):
>
> https://github.com/vim/vim/files/13720982/index_video_5_0_1.zip
>
> If I open the same file in GVIM and enter /mdat, enter, g, ctrl+g I get:
>
> Byte 2785
>
> Expected behaviour
>
> if I run this Go program:
>
> package main
>
> import (
>    "bytes"
>    "os"
> )
>
> func main() {
>    b, err := os.ReadFile("index_video_5_0_1.mp4")
>    if err != nil {
>       panic(err)
>    }
>    i := bytes.Index(b, []byte("mdat"))
>    println(i)
> }
>
> I get 2578. why is Vim off by over 200 bytes?
>
> Version of Vim
>
> https://github.com/vim/vim-win32-installer/releases/tag/v9.0.2175
>
> Environment
>
> https://github.com/vim/vim-win32-installer/releases/tag/v9.0.2175

I can replicate it with vim 9.0.2130 on Ubuntu 20.04, but I can't
explain it. Rather than use a custom program to count the bytes,
I just used hexdump. The full output of g Ctrl-G in vim is this:

Col 551-983 of 1063-1559; Line 8 of 4914; Word 713 of 35890; Char 2579 of 1282363; Byte 2785 of 1920490

Note that it reports "Char 2579", which is the value I would expect for the byte offset.

Regards,
Gary


3052

Dec 20, 2023, 8:54:03 AM

to vim/vim, vim-dev ML, Comment

Reopened #13731.

3052

Dec 20, 2023, 8:56:09 AM

to vim/vim, vim-dev ML, Comment

I think I agree with the previous user. The byte offset is 2578 (or 2579 in Vim), so it should not be possible for Vim to return a higher number such as 2785, and I would consider that an error. The byte is the smallest "unit of measurement", so the byte count should always be the largest of the counts. Reopening this.

zeertzjq

Dec 20, 2023, 8:58:45 AM

to vim/vim, vim-dev ML, Comment

As I said in the previous comment, if 'binary' is not set, Vim converts the line endings of a file to be LF, CR or CRLF.

3052

Dec 20, 2023, 9:02:46 AM

to vim/vim, vim-dev ML, Comment

That doesn't explain the increased count. Can you please clarify?

zeertzjq

Dec 20, 2023, 9:02:53 AM

to vim/vim, vim-dev ML, Comment

And, some bytes in the file correspond to a multibyte char in latin-1 encoding, so such a byte counts as two bytes.

3052

Dec 20, 2023, 9:03:41 AM

to vim/vim, vim-dev ML, Comment

Closed #13731 as completed.

zeertzjq

Dec 20, 2023, 9:08:51 AM

to vim/vim, vim-dev ML, Comment

To be clearer, in Vim, 'encoding' is set to utf-8 by default, while the file is recognized as a latin1 file. g_CTRL-G returns the byte count in Vim's current 'encoding', and some bytes in the file correspond to a single-byte char in Latin-1, but a multi-byte char in UTF-8.

Therefore, this will also not happen if you run Vim with --cmd 'set encoding=latin1'.

Gary Johnson

Dec 20, 2023, 3:09:28 PM

to reply+ACY5DGALQXOOZW5JDH...@reply.github.com, vim...@googlegroups.com

On 2023-12-20, zeertzjq wrote:
> And, some bytes in the file correspond to a multibyte char in latin-1 encoding,
> so such a byte counts as two bytes.

I didn't understand that statement at first, but now I do. Thanks.

When Vim's 'encoding' is utf-8 and it reads a file it sees as having
a 'fileencoding' of latin1, it expands the latin1-encoded characters
into utf-8-encoded characters in the buffer. Latin1-encoding uses
1 byte per character while UTF-8 uses 1, 2, 3 or 4 bytes per
character. So the number of bytes in Vim's buffer may exceed the
number of bytes in the file, as it does in the OP's case.

If that's a problem, you can fix it by forcing Vim to use latin1
internally:

$ vim --cmd 'set enc=latin1 nofixeol' index_video_5_0_1.mp4

or set binary mode:

$ vim -b --cmd 'set noeol' index_video_5_0_1.mp4

Regards,
Gary
