This patch makes Vim break lines according to the line-breaking rules for CJK punctuation (for example, a comma should not start a line).
This is an old patch. It was not written by me but by yswzing, who didn't intend to submit it upstream but has agreed to let me do so. The original patch was posted on a Chinese forum.
I've been using this patch for years, but I'm not familiar with the code. I hope Vim can include it, but I may not be able to answer some questions about the code.
https://github.com/vim/vim/pull/3875
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
Merging #3875 into master will decrease coverage by <.01%.
The diff coverage is 71.91%.
@@            Coverage Diff             @@
##           master    #3875      +/-   ##
==========================================
- Coverage   78.76%   78.76%   -0.01%
==========================================
  Files         104      104
  Lines      141919   142002      +83
==========================================
+ Hits       111789   111852      +63
- Misses      30130    30150      +20
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/ops.c | 83.53% <16.66%> (-0.13%) | ⬇️ |
| src/mbyte.c | 65.44% <62.22%> (-0.1%) | ⬇️ |
| src/edit.c | 85.53% <92.1%> (+0.12%) | ⬆️ |
| src/if_xcmdsrv.c | 84.2% <0%> (-0.18%) | ⬇️ |
| src/os_unix.c | 58.81% <0%> (-0.14%) | ⬇️ |
| src/window.c | 83.36% <0%> (-0.04%) | ⬇️ |
| src/gui.c | 58.05% <0%> (ø) | ⬆️ |
| src/message.c | 76.5% <0%> (+0.04%) | ⬆️ |
| src/gui_gtk_x11.c | 48.42% <0%> (+0.14%) | ⬆️ |
| ... and 1 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 26d9821...ba29015.
The patch generally looks OK. But the testing is inadequate. It would be good if someone who knows how the rules should work adds more testing.
The dbcs_ functions are either unfinished or pointless, since they always return the same value.
I can add more tests. However, I'm not very clear about those dbcs_ functions. They are incomplete and seem to be used when 'encoding' is not utf-8. (I've tested that it works with 'fileencoding' set to gbk, even though the comments above #define DBCS_ mention 'fileencoding'.)
If so, is it OK for this to work only in UTF-8 mode? There are a lot of issues with UI or plugins when 'encoding' is not utf-8 anyway.
Is this still active? Would be nice to have this one eventually.
So how about dropping the dbcs_ part and adding some better testing?
Hi Bram, I'm done with the updates for this patch. Could you take a look?
--
--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
---
You received this message because you are subscribed to the Google Groups "vim_dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/vim_dev/vim/vim/pull/3875/c531864473%40github.com.
@lilydjwg pushed 9 commits.
Failed test here:
1 FAILED:
Found errors in Test_state():
function RunTheTest[40]..Test_state[32]..WaitForAssert[2]..<SNR>10_WaitForCommon[11]..<lambda>10101 line 1: Pattern 'state: mSc; mode: n' does not match 'state: mc; mode: n\[ occurs 39 times]1,1 All'
SKIPPED Test_timer_peek_and_get_char(): only works in the GUI
# without the +eval feature test_result.log is a copy of test.log
Yes, now seems ready 👍
What I don't understand: The ] flag is added to 'formatoptions' but it is not used in tests.
In the code the only check for it checks that it is excluded. That doesn't seem right.
I still do not see a test case where the new "]" flag is added to 'formatoptions'.
If I understand it correctly, the new code is used by default and can be disabled by that flag.
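The question about the direction of the check can be illustrated with a minimal sketch. The helper names below are hypothetical and this is not Vim's option-handling code; it only contrasts gating the new behavior on the "]" flag being present with gating it on the flag being absent.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: true when 'flag' occurs in the option string.
static bool has_flag(const char *formatoptions, char flag)
{
    return flag != '\0' && strchr(formatoptions, flag) != NULL;
}

// Opt-in: the new handling runs only when "]" is set, which is how the
// proposed documentation for the flag reads.
static bool rigorous_tw_opt_in(const char *formatoptions)
{
    return has_flag(formatoptions, ']');
}

// Opt-out: the handling is on by default and "]" switches it off, which
// is how the review comment above reads the submitted code.
static bool rigorous_tw_opt_out(const char *formatoptions)
{
    return !has_flag(formatoptions, ']');
}
```

A test that sets the flag would distinguish the two variants, since they give opposite answers for the same 'formatoptions' value.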
Tests are failing...
The code still uses old /* */ comments.
One place has "{" after the if () instead of on the next line.
@k-takata commented on this pull request.
C99-style comments might be better?
> @@ -1688,6 +1688,10 @@
 B When joining lines, don't insert a space between two multi-byte characters. Overruled by the 'M' flag.
 1 Don't break a line after a one-letter word. It's broken before it instead (if possible).
+] Respect textwidth rigorously. With this flag set, no line can be
+ longer than textwidth, unless line-break-prohibition rules make this
+ impossible. Mainly for multi-byte scripts and work only for UTF-8
+ 'encoding'.
⬇️ Suggested change
- 'encoding'.
+ 'encoding'.
Tab should be used here.
In src/mbyte.c:
> @@ -3842,6 +3842,161 @@ utf_head_off(char_u *base, char_u *p)
return (int)(p - q);
}
+/*
+ * whether space is allowed before/after 'c'
+ * return TRUE if not allowed(eat space)
⬇️ Suggested change
- * return TRUE if not allowed(eat space)
+ * return TRUE if not allowed (eat space)
It seems that Asian people tend to forget the space before the parentheses.
In src/mbyte.c:
> @@ -3842,6 +3842,161 @@ utf_head_off(char_u *base, char_u *p)
return (int)(p - q);
}
+/*
+ * whether space is allowed before/after 'c'
+ * return TRUE if not allowed(eat space)
+ * FALSE otherwise
+ */
+ int
+utf_eat_space(cc)
+ int cc;
+{
+ if ((cc >= 0x2000 && cc <= 0x206F) /* General punctuations */
+ || (cc >= 0x2e00 && cc <= 0x2e7f) /* Supplemental punctuations */
+ || (cc >= 0x3000 && cc <= 0x303f) /* CJK symbols and punctuations */
+ || (cc >= 0xff00 && cc <= 0xffef)) /* Full width ASCII punctuations */
U+FF00 to U+FFEF includes not only punctuations but also full width alphabets and half width katakana.
In src/mbyte.c:
> + {
+ 0x0021, /* ! */
+ 0x0025, /* % */
+ 0x0029, /* ) */
+ 0x002c, /* , */
+ 0x003a, /* : */
+ 0x003b, /* ; */
+ 0x003e, /* > */
+ 0x003f, /* ? */
+ 0x005d, /* ] */
+ 0x007d, /* } */
+ 0x2019, /* ’ right single quotation mark */
+ 0x201d, /* ” right double quotation mark */
+ 0x2020, /* † dagger */
+ 0x2021, /* ‡ double dagger */
+ 0x2026, /* … horizontal ellipis*/
⬇️ Suggested change
- 0x2026, /* … horizontal ellipis*/
+ 0x2026, /* … horizontal ellipsis */
In src/mbyte.c:
> + 0x300b, /* 》 right double angle bracket */
+ 0x300d, /* 」 right corner bracket */
+ 0x300f, /* 』 right white corner bracket */
+ 0x3011, /* 】 right black lenticular bracket */
+ 0x3015, /* 〕 right tortoise shell bracket */
+ 0x3017, /* 〗 right white lenticular bracket */
+ 0x3019, /* 〙 right white tortoise shell bracket */
+ 0x301b, /* 〛 right white square bracket */
+ 0xff01, /* ! fullwidth exclamation mark */
+ 0xff09, /* ) fullwidth right parenthesis */
+ 0xff0c, /* , fullwidth comma */
+ 0xff0e, /* . fullwidth full stop */
+ 0xff1a, /* : fullwidth colon */
+ 0xff1b, /* ; fullwidth semicolon */
+ 0xff1f, /* ? fullwidth question mark */
+ 0xff3d, /* ] fullwidth right squre bracket */
⬇️ Suggested change
- 0xff3d, /* ] fullwidth right squre bracket */
+ 0xff3d, /* ] fullwidth right square bracket */
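The table quoted in these hunks appears to list characters that may not begin a line (closing brackets, commas and similar). A minimal sketch of how such a table could be consulted is shown below. The array holds only the entries visible in this review, the name `is_no_line_start` is hypothetical, and the use of a binary search is an assumption rather than something the hunks confirm.

```c
#include <stdbool.h>
#include <stddef.h>

// Code points that must not start a line. Only the entries quoted in the
// review are listed; the table in the patch has more. The array must stay
// sorted in ascending order for the binary search below.
static const int no_line_start[] = {
    0x0021, 0x0025, 0x0029, 0x002c, 0x003a, 0x003b, 0x003e, 0x003f,
    0x005d, 0x007d,
    0x2019, 0x201d, 0x2020, 0x2021, 0x2026,
    0x300b, 0x300d, 0x300f, 0x3011, 0x3015, 0x3017, 0x3019, 0x301b,
    0xff01, 0xff09, 0xff0c, 0xff0e, 0xff1a, 0xff1b, 0xff1f, 0xff3d,
};

// Hypothetical helper: true when a line break right before 'c' is
// prohibited, i.e. 'c' is found in the sorted table.
static bool is_no_line_start(int c)
{
    size_t lo = 0;
    size_t hi = sizeof(no_line_start) / sizeof(no_line_start[0]);

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (no_line_start[mid] == c)
            return true;
        if (no_line_start[mid] < c)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}
```

Keeping the entries sorted by code point, as the quoted table already is, is what makes the lookup logarithmic instead of a linear scan.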
In src/mbyte.c:
> +}
+
+/*
+ * whether line break is allowed between cc and ncc
+ * return TRUE if allowed
+ * FALSE otherwise
+ */
+ int
+utf_allow_break(cc, ncc)
+ int cc;
+ int ncc;
+{
+ /* don't break between two-letter punctuations */
+ if (cc == ncc
+ && (cc == 0x2014 /* em dash */
+ || cc == 0x2026 /* horizontal ellipis */))
⬇️ Suggested change
- || cc == 0x2026 /* horizontal ellipis */))
+ || cc == 0x2026 /* horizontal ellipsis */))
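For reference, the rule visible in this hunk can be written in isolation as follows. This is a sketch with a hypothetical function name, using an ANSI prototype and // comments; the remainder of utf_allow_break() is not shown in the review and is left out here.

```c
#include <stdbool.h>

// Hypothetical sketch of the one rule shown in the hunk: a doubled em dash
// or a doubled horizontal ellipsis is treated as a single unit, so no line
// break is allowed between its two halves.
static bool allow_break_between(int cc, int ncc)
{
    // don't break between two-letter punctuations
    if (cc == ncc
            && (cc == 0x2014        // em dash
                || cc == 0x2026))   // horizontal ellipsis
        return false;
    return true;
}
```

Here `cc` is the character before the candidate break position and `ncc` is the one after it, following the parameter names in the quoted code.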
@lilydjwg commented on this pull request.
In src/mbyte.c:
> @@ -3842,6 +3842,161 @@ utf_head_off(char_u *base, char_u *p)
return (int)(p - q);
}
+/*
+ * whether space is allowed before/after 'c'
+ * return TRUE if not allowed(eat space)
+ * FALSE otherwise
+ */
+ int
+utf_eat_space(cc)
+ int cc;
+{
+ if ((cc >= 0x2000 && cc <= 0x206F) /* General punctuations */
+ || (cc >= 0x2e00 && cc <= 0x2e7f) /* Supplemental punctuations */
+ || (cc >= 0x3000 && cc <= 0x303f) /* CJK symbols and punctuations */
+ || (cc >= 0xff00 && cc <= 0xffef)) /* Full width ASCII punctuations */
I've changed to only include punctuations in the range.
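One possible narrowing is sketched below. The function name `eats_space` is hypothetical, and the sub-ranges are an assumption based on the layout of the Halfwidth and Fullwidth Forms block: they skip the fullwidth digits (U+FF10 to U+FF19), the fullwidth Latin letters (U+FF21 to U+FF3A and U+FF41 to U+FF5A) and the halfwidth katakana letters from U+FF66 onward. They are not necessarily the ranges that were pushed.

```c
#include <stdbool.h>

// Hypothetical sketch: true when no space should be kept before/after 'cc'.
// The first three ranges are taken from the quoted hunk; the last four
// replace the single 0xff00..0xffef range with punctuation-only sub-ranges.
static bool eats_space(int cc)
{
    return (cc >= 0x2000 && cc <= 0x206f)      // General punctuations
        || (cc >= 0x2e00 && cc <= 0x2e7f)      // Supplemental punctuations
        || (cc >= 0x3000 && cc <= 0x303f)      // CJK symbols and punctuations
        || (cc >= 0xff01 && cc <= 0xff0f)      // Fullwidth ! through /
        || (cc >= 0xff1a && cc <= 0xff20)      // Fullwidth : through @
        || (cc >= 0xff3b && cc <= 0xff40)      // Fullwidth [ through `
        || (cc >= 0xff5b && cc <= 0xff65);     // Fullwidth { through halfwidth middle dot
}
```

With this split, a fullwidth comma (U+FF0C) still matches while a fullwidth letter such as U+FF21 or a halfwidth katakana letter such as U+FF71 does not.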
Thanks for your quick review! (I didn't notice that there were typos...)
I'll include this now and clean it up a bit. However, some of the test cases are failing; I'll add TODO items for these. Please fix!