Capture column numbers of matches ending with double-byte chars


rameo

Apr 21, 2016, 3:03:28 PM
to vim_use
Ever since I started using Vim, I have had trouble with double-byte characters.

I want to capture every matched string together with the start column and end column of each match (line by line). I need not only the strings but also the column numbers for other functions.

For the last few years I used match()/matchend(), until I noticed that they did not correctly capture double-byte characters within the string.
Then I adapted everything to use searchpos(), but today I found out that it has problems with strings that end in a double-byte character.


"mylist = list with all linenrs having matches
for n in range(0, len(mylist)-1)
  let idx = []
  let edx = []
  let matches_between_cols = []

  "FIND ALL IDX MATCHES
  "idx --> forward search
  call cursor(mylist[n], 1)
  while line(".") == mylist[n]
    let S = searchpos(@/, '')
    if S[0] == mylist[n]
      call add(idx, S[1]-1)
    endif
  endwhile
  "idx --> backward search (to include matches on first column)
  call cursor(mylist[n], len(getline(mylist[n])))
  while line(".") == mylist[n]
    let S = searchpos(@/, 'b')
    if S[0] == mylist[n]
      call add(idx, S[1]-1)
    endif
  endwhile

  "FIND ALL EDX MATCHES
  "edx --> forward search
  call cursor(mylist[n], 1)
  while line(".") == mylist[n]
    let E = searchpos(@/, 'e')
    if E[0] == mylist[n]
      call add(edx, E[1])
    endif
  endwhile
  "edx --> backward search (to include matches on first column)
  call cursor(mylist[n], len(getline(mylist[n])))
  while line(".") == mylist[n]
    let E = searchpos(@/, 'eb')
    if E[0] == mylist[n]
      call add(edx, E[1])
    endif
  endwhile

  if len(idx) > 0
    for i in range(0, len(idx)-1)
      let r = strpart(getline(mylist[n]), idx[i], edx[i]-idx[i])
      call add(matches_between_cols, r)
    endfor
  endif
endfor

-----------------------------------
Buffer:
city | Felicità
whatever | Peach
pmg00000001 | Perché
text| Céline
bMgbXuEWo | Université


@/ = "| \zs\S\+"
It captures:
Felicit<c3>
Peach
Perch<c3>
Céline
Universit<c3>

Expected:
Felicità
Peach
Perché
Céline
Université

Can you please tell me what I did wrong?
(Is it not possible to make every character a single-byte char, as in languages such as Python?)

rameo

Apr 23, 2016, 8:37:15 AM
to vim_use
searchpos() doesn't return the correct end value of a match if the match ends with a double-byte character (èéòìùá...). (encoding utf-8)
Isn't this a bug?

Would it be possible to add a feature to Vim like Python's finditer?
searchpos() searches the entire file up to the stopline.
finditer returns the start values, end values and matched text of all matches of a search in a string (or line).
That would be great!

From the Python website:
re.finditer(pattern, string[, flags])

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.

>>> text = "He was carefully disguised but captured quickly by police."
>>> for m in re.finditer(r"\w+ly", text):
...     print('%02d-%02d: %s' % (m.start(), m.end(), m.group(0)))
07-16: carefully
40-47: quickly

https://docs.python.org/3/library/re.html
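For comparison with Vim's byte columns, here is a small Python sketch (not from the original messages) that runs `re.finditer` over the buffer lines from the first message. The lookbehind pattern `(?<=\| )\S+` is an assumed stand-in for the Vim pattern `| \zs\S\+`. It records both the character offsets Python reports and the UTF-8 byte offsets that Vim columns count; the two diverge as soon as a match contains a multibyte character.

```python
import re

# Sample buffer lines from the first message.
lines = [
    "city | Felicità",
    "whatever | Peach",
    "pmg00000001 | Perché",
    "text| Céline",
    "bMgbXuEWo | Université",
]

# Assumed Python equivalent of the Vim pattern '| \zs\S\+'
# (\zs is expressed as a lookbehind here).
pattern = re.compile(r"(?<=\| )\S+")

spans = []
for line in lines:
    for m in pattern.finditer(line):
        # m.start()/m.end() count characters (code points) ...
        char_span = (m.start(), m.end())
        # ... whereas Vim columns count UTF-8 bytes.
        byte_span = (len(line[:m.start()].encode("utf-8")),
                     len(line[:m.end()].encode("utf-8")))
        spans.append((m.group(0), char_span, byte_span))

for text, (cs, ce), (bs, be) in spans:
    print(f"chars {cs:02d}-{ce:02d}  bytes {bs:02d}-{be:02d}: {text}")
```

Only "Peach", which is pure ASCII, has identical character and byte spans; every other match ends one byte later than its character offset suggests.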

Nikolay Aleksandrovich Pavlov

Apr 23, 2016, 9:44:55 AM
to vim...@googlegroups.com
I cannot say what you did wrong, but calling `searchpos()` multiple
times for each occurrence of a pattern is rather wasteful. Check how I did
this in [formatvim][1]: I collect matches there to highlight them
later, so it gets the start and end positions of each match.

[1]: https://bitbucket.org/ZyX_I/formatvim/src/a00edc4c7032bde5c7e970bca7871e9317ee2265/autoload/format.vim#format.vim-1457

> (Is it not possible to make every character a single-byte char, as in languages such as Python?)

This has already been discussed many times. No, it is not, because of
backward compatibility. There are special functions, but they are useless
here: a column is a byte index, not a character index, and a virtual
column is measured in screen cells, which also do not correspond to characters.

Also, any character above U+00FF in Python 3 is *not* a single byte; it
is a single-byte character only in the same sense that 0xFFFF is a single
byte in `[0xFFFF, 0xFFFE, 0xFFFD][0]`. It is simply a different way of
storing and indexing strings.
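A quick Python 3 sketch of the distinction (illustrative only): a `str` is indexed per code point regardless of how it is stored, while the UTF-8 encoding of the same text, whose byte offsets are what Vim columns count, is longer.

```python
word = "Felicità"

# Python 3 indexes a str by code point, whatever its internal storage.
print(len(word), word[-1])          # 8 à

# The UTF-8 bytes are one longer, and the last two bytes together
# encode the single character "à".
encoded = word.encode("utf-8")
print(len(encoded), encoded[-2:])   # 9 b'\xc3\xa0'

# A character above U+00FF is still one element of the str, although
# it takes three bytes in UTF-8.
euro = "\u20ac"  # EURO SIGN
print(len(euro), len(euro.encode("utf-8")))  # 1 3
```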


Ken Takata

Apr 25, 2016, 4:26:27 PM
to vim_use
Hi rameo,

2016/4/23 Sat 21:37:15 UTC+9 rameo wrote:
> searchpos() doesn't return the correct end value of a match if the match ends with a double-byte character (èéòìùá...). (encoding utf-8)
> Isn't this a bug?

You may have misunderstood the spec of searchpos().
When the 'e' flag is specified, the cursor moves to the last character of
the match and searchpos() returns the position of the cursor. This means
that the returned value is the byte index of the start of the last character,
not the end of that character.
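This explanation accounts exactly for the c3 byte in the question. The Python sketch below (Python only so it runs standalone; the name `match_byte_span` is hypothetical, and `(?<=\| )\S+` is assumed to behave like `| \zs\S\+`) models a buffer line as UTF-8 bytes and computes the match start, the 1-based byte column of the start of the last character (what `searchpos(..., 'e')` reports), and the true exclusive end.

```python
import re


def match_byte_span(line: str, pattern: str):
    """Model (in Python, not Vim) the byte positions of the first match.

    Returns (start0, last_char_col1, end0):
      start0         -- 0-based byte offset of the match start (idx = S[1]-1)
      last_char_col1 -- 1-based byte column of the START of the last
                        matched character, as searchpos(..., 'e') reports
      end0           -- 0-based, exclusive byte offset just past the match
    Assumes a non-empty match exists.
    """
    m = re.search(pattern, line)
    start0 = len(line[:m.start()].encode("utf-8"))
    end0 = len(line[:m.end()].encode("utf-8"))
    last_char_len = len(line[m.end() - 1].encode("utf-8"))
    last_char_col1 = end0 - last_char_len + 1
    return start0, last_char_col1, end0


line = "city | Felicità"
data = line.encode("utf-8")
idx, edx, end0 = match_byte_span(line, r"(?<=\| )\S+")

# The original script takes strpart(line, idx, edx - idx): it keeps only
# the first byte of the two-byte "à", which Vim displays as c3.
buggy = data[idx:edx]
# The exclusive end is (edx - 1) + the byte length of the last character.
fixed = data[idx:end0]

print(buggy)                  # b'Felicit\xc3'
print(fixed.decode("utf-8"))  # Felicità
```

For an ASCII-final match such as "Peach" the last character is one byte long, so `edx` and the exclusive end coincide, which is why only the accented endings were truncated.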


> For the last few years I used match()/matchend(), until I noticed that they did not correctly capture double-byte characters within the string.

Can you show an example?
match()/matchend() return a byte offset, not a character count.
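A hedged illustration of this point, again in Python with the line encoded as UTF-8 to stand in for a Vim buffer line (the variable names are made up): byte offsets such as those from match()/matchend() work correctly with byte-based slicing like strpart(), and only need converting when a character count is actually wanted.

```python
line = "pmg00000001 | Perché"
data = line.encode("utf-8")  # a Vim buffer line is stored as these bytes

# Byte offsets of the match of '| \zs\S\+': it starts right after "| "
# and, for this line, runs to the end of the line.
start_b = data.index(b"| ") + 2
end_b = len(data)

# Byte offsets work with byte slicing (as strpart() does in Vim) ...
print(data[start_b:end_b].decode("utf-8"))  # Perché

# ... but they are not character counts; convert explicitly if needed.
start_c = len(data[:start_b].decode("utf-8"))
end_c = len(data[:end_b].decode("utf-8"))
print(start_b, end_b, start_c, end_c)  # 14 21 14 20
```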

FYI, since patch 7.4.1685, matchstrpos() can be used to get both the start
and end positions.


Regards,
Ken Takata

rameo

Apr 25, 2016, 5:45:10 PM
to vim_use
On Monday, April 25, 2016 22:26:27 UTC+2, Ken Takata wrote:
> Hi rameo,
>
> 2016/4/23 Sat 21:37:15 UTC+9 rameo wrote:
> > searchpos() doesn't return the correct end value of a match if the match ends with a double-byte character (èéòìùá...). (encoding utf-8)
> > Isn't this a bug?
>
> You may have misunderstood the spec of searchpos().
> When the 'e' flag is specified, the cursor moves to the last character of
> the match and searchpos() returns the position of the cursor. This means
> that the returned value is the byte index of the start of the last character,
> not the end of that character.
>

Hello Ken,

thank you for your reply :)

Just a question: why would someone need the byte index of the start of the last character rather than simply the end of the last character?

>
> > For the last few years I used match()/matchend(), until I noticed that they did not correctly capture double-byte characters within the string.
>
> Can you show an example?
> match()/matchend() return a byte offset, not a character count.
>
> FYI, since patch 7.4.1685, matchstrpos() can be used to get both the start
> and end positions.
>
>
> Regards,
> Ken Takata

That's great news.
I have waited a long time for such a feature.

Can you please send me the URL of the site where the patches can be downloaded?
I can't find it anymore.

Nikolay Aleksandrovich Pavlov

Apr 25, 2016, 7:14:25 PM
to vim...@googlegroups.com
It is much better to download sources from https://github.com/vim/vim.

rameo

Apr 26, 2016, 2:00:33 AM
to vim_use

>
> It is much better to download sources from https://github.com/vim/vim.
>

Found it:
https://github.com/vim/vim-win32-installer/releases

However...
Tried:
gvim_7.4.1782_x86.exe
gvim_7.4.1786_x86.exe

Both give an error:
Error detected while processing vimrc_example.vim
line 114:
E919: Directory not found in 'packpath': "pack/*/opt/matchit"

Christian Brabandt

Apr 26, 2016, 2:14:33 AM
to vim_use
Hi rameo!
Yes, your $VIMRUNTIME does not include the pack directory.
So you need to install a new runtime. Check the zip file; it includes
an updated runtime directory.


Best,
Christian
--
The false notion that one could dismiss and eliminate a phenomenon
through calculation or through words.
-- Goethe, Maximen und Reflektionen, No. 1004

rameo

Apr 26, 2016, 3:14:52 AM
to vim_use
On Tuesday, April 26, 2016 08:14:33 UTC+2, Christian Brabandt wrote:
I just commented out 'packadd matchit', now it works.

Thank you very much, Christian, for these new features.

Doesn't matchstrpos() capture all matches on a line?

for n in range(0, len(LinewithMatches)-1)
  let s = getline(LinewithMatches[n])
  let r = matchstrpos(s, @/)
  echo s
  echo r
endfor

It only returns the first match on the 1st line.
Did I do something wrong?

rameo

Apr 26, 2016, 3:48:29 AM
to vim_use
On Tuesday, April 26, 2016 09:14:52 UTC+2, rameo wrote:
Found it: I had to add a start position and update the counter in a while loop.

Just an idea: a function that adds all matches found in the entire string to a list,
e.g. [['match1',2,7], ['match2',12,17], ...]
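The loop described above can be modeled in Python. `matchstrpos_model` and `all_matches` are hypothetical names; the model assumes, as I understand the Vim help, that matchstrpos() on a string returns `[match, start, end]` with byte offsets, `end` being exclusive like matchend(), and `["", -1, -1]` when nothing matches. Feeding each end back in as the next start position collects every match on a line in the `[[match, start, end], ...]` shape suggested above.

```python
import re


def matchstrpos_model(line: str, pattern: str, start: int = 0):
    """Hypothetical stand-in for Vim's matchstrpos() applied to a string.

    `start` and the returned offsets are UTF-8 byte offsets; `end` is
    exclusive (like matchend()). Returns ["", -1, -1] if nothing matches.
    `start` must fall on a character boundary.
    """
    data = line.encode("utf-8")
    char_start = len(data[:start].decode("utf-8"))
    m = re.compile(pattern).search(line, char_start)
    if m is None:
        return ["", -1, -1]
    s = len(line[:m.start()].encode("utf-8"))
    e = len(line[:m.end()].encode("utf-8"))
    return [m.group(0), s, e]


def all_matches(line: str, pattern: str):
    """Collect every match as [match, start, end] by advancing `start`."""
    data = line.encode("utf-8")
    result = []
    start = 0
    while start <= len(data):
        text, s, e = matchstrpos_model(line, pattern, start)
        if s == -1:
            break
        result.append([text, s, e])
        if e > s:
            start = e  # continue right after this match
        else:
            # Empty match: step over one whole character (possibly
            # several bytes) so the loop always makes progress.
            rest = data[e:].decode("utf-8")
            if not rest:
                break
            start = e + len(rest[0].encode("utf-8"))
    return result


print(all_matches("città e Perché", r"\w+"))
# [['città', 0, 6], ['e', 7, 8], ['Perché', 9, 16]]
```

The empty-match branch matters in the Vim version too: if a pattern can match the empty string, restarting at the same end position would find the same match forever.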

Ken Takata

Apr 26, 2016, 8:09:49 AM
to vim_use, vim...@vim.org
(Cc-ing to vim_dev)

Hi,

Oops!
We should have updated the NSIS script when we turned some scripts into packages.
The attached patch should fix the problem.

Regards,
Ken Takata

support-pack-with-nsi.patch

Bram Moolenaar

Apr 26, 2016, 11:49:29 AM
to Ken Takata, vim_use, vim...@vim.org
Thanks! Took a while before someone noticed...

--
TALL KNIGHT: We are now no longer the Knights Who Say Ni!
ONE KNIGHT: Ni!
OTHERS: Sh!
ONE KNIGHT: (whispers) Sorry.
"Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///