[vim/vim] Unicode EN (half an EM) support (#3775)


Nobuhiro Takasaki

Jan 7, 2019, 12:20:09 AM
to vim/vim, Subscribed

In Unicode, there are three EN (half an EM) characters whose display width should be one cell: EN QUAD, EN SPACE, and EN DASH.
This is a problem in CJKV environments: 'ambiwidth' should not be applied to these characters.
When it is applied, the display looks like this:

image

This screen is written in Japanese.
EN DASH (U+2013) is used here; it is a short horizontal bar.
In this example, 'ambiwidth' is "double" (2 cells).
With ConPTY in Vim, the rows are swapped.
With winpty in Vim, the rightmost character is not output.
In cmd.exe, it is displayed normally.

When this patch is applied, the cell width of EN DASH becomes 1 and it is displayed normally.


You can view, comment on, or merge this pull request online at:

  https://github.com/vim/vim/pull/3775

Commit Summary

  • Unicode EN (half an EM) support

You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub

Codecov

Jan 7, 2019, 12:39:28 AM
to vim/vim, Subscribed

Codecov Report

Merging #3775 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3775      +/-   ##
==========================================
+ Coverage   78.37%   78.37%   +<.01%
==========================================
  Files         102      102
  Lines      141275   141277       +2
==========================================
+ Hits       110726   110731       +5
+ Misses      30549    30546       -3

Impacted Files      Coverage Δ
src/mbyte.c         62.97% <100%> (+0.04%) ⬆️
src/if_xcmdsrv.c    84.17% <0%> (-0.18%) ⬇️
src/ex_cmds2.c      84.49% <0%> (-0.1%) ⬇️
src/message.c       75.94% <0%> (-0.05%) ⬇️
src/window.c        83.39% <0%> (+0.03%) ⬆️
src/ui.c            47.37% <0%> (+0.07%) ⬆️
src/terminal.c      74.44% <0%> (+0.08%) ⬆️
src/gui.c           57.98% <0%> (+0.15%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 04b4e1a...9c43f92.

Nobuhiro Takasaki

Jan 7, 2019, 1:13:02 AM
to vim/vim, Subscribed

EN QUAD (U+2000) is normalized to EN SPACE (U+2002) under NFC/NFD.
EN SPACE (U+2002) is normalized to U+0020 under NFKC/NFKD.
Both therefore end up as U+0020, which is a blank space of cell width 1.

Nobuhiro Takasaki

Jan 7, 2019, 2:05:39 AM
to vim/vim, Subscribed

After applying the patch:
the EN DASH is marked with a red underline.

image

mattn

Jan 7, 2019, 4:51:12 AM
to vim/vim, Subscribed

@mattn commented on this pull request.


In src/mbyte.c:

> @@ -1414,6 +1414,15 @@ utf_uint2cells(UINT32_T c)
     int
 utf_char2cells(int c)
 {
+    /* Sorted list of non-overlapping intervals of EN(single width characters
+     * in typesetting) that are not affected by p_ambw and CJKV. */
+    static struct interval singlewidth[] =
+    {
+	{0x2000, 0x2000},
+	{0x2002, 0x2002},
+	{0x2013, 0x2013}
+    };

I don't understand why you made this list. Why not just remove the entries from the variable ambiguous? Also, I noticed that \u2013 is defined as A (ambiguous) in http://www.unicode.org/Public/11.0.0/ucd/EastAsianWidth.txt

mattn

Jan 7, 2019, 4:59:07 AM
to vim/vim, Subscribed

@mattn commented on this pull request.


In src/mbyte.c:

> @@ -1414,6 +1414,15 @@ utf_uint2cells(UINT32_T c)
     int
 utf_char2cells(int c)
 {
+    /* Sorted list of non-overlapping intervals of EN(single width characters
+     * in typesetting) that are not affected by p_ambw and CJKV. */
+    static struct interval singlewidth[] =
+    {
+	{0x2000, 0x2000},
+	{0x2002, 0x2002},
+	{0x2013, 0x2013}
+    };

Neither \u2000 nor \u2002 exists in doublewidth/emoji_width/ambiguous.
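
To make the two review comments above concrete, here is a minimal sketch of how a sorted interval table of this kind can be consulted. It is not Vim's actual code: the names in_table, ambiguous_demo, char2cells_demo, and the ambw_double flag are hypothetical, and ambiguous_demo is only a three-entry excerpt of the East Asian Ambiguous ranges. Only the singlewidth entries come from the patch.

```c
#include <stddef.h>

/* Closed code point range [first, last]. */
struct interval {
    long first;
    long last;
};

#define TABLE_LEN(t) (sizeof(t) / sizeof((t)[0]))

/* Hypothetical helper: binary search over a sorted, non-overlapping table.
 * Returns 1 when c falls inside one of the intervals, 0 otherwise. */
static int in_table(const struct interval *table, size_t n, long c)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (c < table[mid].first)
            hi = mid;
        else if (c > table[mid].last)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}

/* The three entries proposed in the patch. */
static const struct interval singlewidth[] = {
    {0x2000, 0x2000}, {0x2002, 0x2002}, {0x2013, 0x2013}
};

/* Illustrative excerpt only, NOT the full ambiguous-width list. */
static const struct interval ambiguous_demo[] = {
    {0x2010, 0x2010}, {0x2013, 0x2016}, {0x2018, 0x2019}
};

/* Hypothetical width lookup: the singlewidth table is checked first,
 * so it overrides the ambiguous table even when ambw_double is set. */
static int char2cells_demo(long c, int ambw_double)
{
    if (in_table(singlewidth, TABLE_LEN(singlewidth), c))
        return 1;
    if (ambw_double && in_table(ambiguous_demo, TABLE_LEN(ambiguous_demo), c))
        return 2;
    return 1;
}
```

In this sketch, char2cells_demo(0x2013, 1) returns 1 while char2cells_demo(0x2014, 1) returns 2. Because 0x2000 and 0x2002 are not in the ambiguous excerpt, only the 0x2013 entry changes any result, and removing 0x2013 from the ambiguous table would have the same effect without a second table.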

Nobuhiro Takasaki

Jan 7, 2019, 5:03:22 AM
to vim/vim, Subscribed

@ntak commented on this pull request.


In src/mbyte.c:

> @@ -1414,6 +1414,15 @@ utf_uint2cells(UINT32_T c)
     int
 utf_char2cells(int c)
 {
+    /* Sorted list of non-overlapping intervals of EN(single width characters
+     * in typesetting) that are not affected by p_ambw and CJKV. */
+    static struct interval singlewidth[] =
+    {
+	{0x2000, 0x2000},
+	{0x2002, 0x2002},
+	{0x2013, 0x2013}
+    };

I listed the characters that have EN in their names. It was a rough patch. I will review it again.

Nobuhiro Takasaki

Jan 7, 2019, 5:14:15 AM
to vim/vim, Subscribed

\u2013 needs special processing. Otherwise, Vim cannot be compatible with cmd.exe.
As @mattn pointed out, mintty draws it 2 cells wide, as an ambiguous-width character.
I will think about this further.
One idea I have now is to set the cell width of \u2013 through a variable such as 'enwidth'.
Another is special processing when cmd.exe is detected.
Otherwise the display will stay corrupted.

I feel like I should not do anything...

mattn

Jan 7, 2019, 5:27:45 AM
to vim/vim, Subscribed

I wrote some ideas in this comment: #2074 (comment)

Nobuhiro Takasaki

Jan 7, 2019, 6:49:03 AM
to vim/vim, Subscribed

I think that reading a text file that defines the ranges is a good design.
I will write a demonstration sample so that the discussion can move forward.
Until this is resolved, I cannot proceed with ConPTY because of the display problem.

Nobuhiro Takasaki

Jan 9, 2019, 6:49:19 AM
to vim/vim, Subscribed

What I am doing is redundant, like building a roof on top of a roof.
I will work on other things first.
I need to think about this more.

Nobuhiro Takasaki

Jan 18, 2019, 8:06:50 AM
to vim/vim, Push

@ntak pushed 2 commits.


Nobuhiro Takasaki

Jan 18, 2019, 9:50:15 AM
to vim/vim, Subscribed

I started with the detection of cmd.exe.
There is now also a patch for the cause of the incorrect screen on compatible terminals; it will be split out into a separate patch.
I am keeping the ConEmu detection in, so the code is a little untidy; I will be careful.

Nobuhiro Takasaki

Jan 22, 2019, 7:43:28 AM
to vim/vim, Push

@ntak pushed 1 commit.


Nobuhiro Takasaki

Jan 22, 2019, 8:00:16 AM
to vim/vim, Subscribed

I think the patch now handles all of the queries.
It would have been nice if this had been available from the beginning.

The one-cell width of EN DASH is Japanese-specific, so the value of this patch itself is questionable.
EN DASH becomes one cell only when cmd.exe is Japanese, and the possible combinations are endless.
Being able to use queries is still useful, so I will think about other uses for them.

Please ignore this pull request.
At a later date, I will resubmit it as a patch with a different purpose and link to it here.

Christian Brabandt

Jan 28, 2019, 4:23:33 AM
to vim/vim, Subscribed

so closing.

Christian Brabandt

Jan 28, 2019, 4:23:34 AM
to vim/vim, Subscribed

Closed #3775.

mattn

May 17, 2022, 8:11:30 PM
to vim/vim, Subscribed

Ref: 08aac3c


Message ID: <vim/vim/pull/3775/c1129435453@github.com>
