I have uploaded updated spell files to the ftp site. Does the problem still exist?
—
Reply to this email directly or view it on GitHub.
It does. The spell dump still shows all final sigmas as medial ones (σ), and suggestions always end in a medial sigma where a final one (ς) belongs. I am using the .spl uploaded 01-Sep-2015.
The bug still exists...
—
You are receiving this because you are subscribed to this thread.
I'm not sure if this is really a bug or if it's expected.
Searching for 'ς' matches both 'ς' and 'σ' when ignoring case.
But it does not match when not ignoring case.
'ς' and 'σ' are both Sigma letters with the same uppercase
letter 'Σ'.
Searching for /ς\C will only match 'ς' as expected.
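The case relations described here can be checked outside Vim. The following is a minimal Python sketch; it relies on Python's Unicode tables rather than Vim's own folding code, so it only illustrates why a case-insensitive comparison conflates the two letters.

```python
# Illustration only: uses Python's Unicode case mappings, not Vim's code.
FINAL_SIGMA = "\u03c2"    # ς
MEDIAL_SIGMA = "\u03c3"   # σ
CAPITAL_SIGMA = "\u03a3"  # Σ

# Both lowercase forms share a single uppercase letter.
same_upper = FINAL_SIGMA.upper() == MEDIAL_SIGMA.upper() == CAPITAL_SIGMA

# Unicode case folding maps the final form onto the medial one, so a
# comparison on folded text cannot tell the two apart.
same_fold = FINAL_SIGMA.casefold() == MEDIAL_SIGMA.casefold()

# An exact (case-sensitive) comparison still distinguishes them.
differ_exactly = FINAL_SIGMA != MEDIAL_SIGMA

print(same_upper, same_fold, differ_exactly)
```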
—
I would suggest reopening the issue, as I believe it is a bug. Even though these two characters are the same letter phonetically, they are different things from a spell-checking point of view: you can't use σ at the end of a word and you can't use ς in the middle of a word; it is simply wrong. So an easy fix for the dictionaries would be to apply the following rule: if a word ends with σ, replace it with ς.
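The rule proposed here could be sketched roughly as follows in Python. This is only an illustration of the suggestion, not something Vim or mkspell does; the function names are made up, and the handling of a `/FLAGS` suffix assumes Hunspell-style .dic lines.

```python
MEDIAL_SIGMA = "\u03c3"  # σ
FINAL_SIGMA = "\u03c2"   # ς


def fix_trailing_sigma(word: str) -> str:
    """Replace a medial sigma at the end of a word with a final sigma."""
    # Assumption: one-letter entries are left alone, since a bare "σ"
    # is more likely an elided form than a word ending in sigma.
    if len(word) > 1 and word.endswith(MEDIAL_SIGMA):
        return word[:-1] + FINAL_SIGMA
    return word


def fix_dic_line(line: str) -> str:
    """Apply the rule to one .dic line of the form "word" or "word/FLAGS"."""
    word, sep, flags = line.partition("/")
    return fix_trailing_sigma(word) + sep + flags


# "XY" is a made-up flag string used only for this example.
for entry in ["παίκτησ", "παίκτης", "άσπρο", "παίκτησ/XY"]:
    print(entry, "->", fix_dic_line(entry))
```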
—
@brammool I don't think that's exactly the case. I found this file (https://github.com/wooorm/dictionaries/blob/main/dictionaries/el/index.dic), which is in UTF-8, but even after compiling this .dic file with :mkspell el ~/index.dic and using the result as the spell file, the problem still occurs (I ran file index.dic and it reports UTF-8). Maybe it is a mkspell error.
—
No, you didn't quite get me, I am sorry. The characters in this dictionary (.dic file) are correct, but when I convert it from .dic to a spell file they appear with a medial sigma (σ) at the end instead of a final sigma (ς). Try the word παίκτης as an example: make a deliberate typo like παικτής or παίκτις, then look at the suggestions. The only correct suggestion should be παίκτης, but Vim offers παίκτησ; even worse, παίκτησ is not even recognised as a typo. And the .dic file does not even contain παίκτησ (as it is not a correct word).
Run from anywhere:
mkdir ~/example_spell
cd ~/example_spell
wget https://raw.githubusercontent.com/wooorm/dictionaries/main/dictionaries/el/index.dic
wget https://raw.githubusercontent.com/wooorm/dictionaries/main/dictionaries/el/index.aff
vim
In Vim's command-line mode, type:
:mkspell el2 ~/example_spell/index
Then exit Vim and type:
mkdir -p ~/.vim/spell
cp ~/example_spell/el2.*.spl ~/.vim/spell/
echo "παίκτης παίκτησ παικτις" >> ~/example_spell/text
vim ~/example_spell/text
In Vim's command-line mode, type:
:set spell spelllang=el2
The second word is not highlighted as wrong even though it does not exist in the .dic file. Move the cursor to the third word and press z=: the correct word does not appear among the suggestions; παίκτησ is proposed instead.
I wrote this step-by-step explanation to double-check that I am following the whole process correctly.
You can also search index.dic to verify that the word παίκτης exists and the word παίκτησ does not:
vim index.dic +550061
—
I think this is the only edge case in case folding for the Greek language, so it might be worth continuing to case-fold and applying a small hack. The behaviour in the language is as follows: when, and only when, a non-capitalized word ends* with sigma, that sigma has to be a final sigma (ς). The asterisk is there to make clear when a word ends: an apostrophe does not signal the end of a word, but whitespace or a period does.
For example, "σπόρος" starts with sigma, so that one has to be the normal form, but it also ends with sigma, so that one is the final form.
As another example, "σ'αυτο" is two words, like "doesn't", but the σ before the apostrophe is not at the end of the first word, so it must not be a final sigma (this is already correct in Vim because dictionaries generally contain it). I am also thinking about the 'gu' command; it might have the same error.
https://vi.stackexchange.com/questions/5469/autocorrecting-final-sigma
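The word-end rule described in this comment could be sketched like this in Python. It is a hedged illustration, not Vim's `gu` implementation: `lower_greek` and the set of characters treated as apostrophes are assumptions made for the example, and a lone capital sigma standing by itself is simply treated as word-final.

```python
CAPITAL_SIGMA = "\u03a3"  # Σ
MEDIAL_SIGMA = "\u03c3"   # σ
FINAL_SIGMA = "\u03c2"    # ς
APOSTROPHES = ("'", "\u2019")


def lower_greek(text: str) -> str:
    """Lowercase text, choosing the sigma form by position in the word."""
    out = []
    for i, ch in enumerate(text):
        if ch == CAPITAL_SIGMA:
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # The word continues if a letter or an apostrophe follows.
            word_continues = nxt.isalpha() or nxt in APOSTROPHES
            out.append(MEDIAL_SIGMA if word_continues else FINAL_SIGMA)
        else:
            out.append(ch.lower())
    return "".join(out)


print(lower_greek("ΣΠΟΡΟΣ"))   # first sigma medial, last sigma final
print(lower_greek("Σ'ΑΥΤΟ"))   # sigma before the apostrophe stays medial
```

Whitespace, a period, or the end of the text all count as a word end here, while an apostrophe does not, which matches the asterisked rule above.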
—
I am sorry, I am not very good at explaining my thoughts. Let me try again in stricter development terms. I will take https://www.unicode.org/Public/13.0.0/ucd/CaseFolding.txt as a reference. This document states (at line 326: 03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA) that 03A3 always case-folds to 03C3, but that is not quite the case, as there is one very small exception:
03A3 has to case-fold to 03C2 if it is the last letter of a word; in all other cases it case-folds to 03C3. The explanation in my previous message is mainly about when a word ends, which might be a bit difficult in the context of a real-world text document but is quite easy in a dictionary .dic file, as there is one word per line. Furthermore, the guw Vim command at the start of a word seems to do the job correctly, so we might want to examine some of the code there.
I think I managed to fix the problem, I checked with the examples you gave. Please try it out and reopen if you still have a problem.
I can verify that it works perfectly. Thank you so much for the amazing job!!
I still have the same problem with vim 9 on Debian 12.
Should I find a patched version?
VIM - Vi IMproved 9.0 (2022 Jun 28, compiled May 04 2023 10:24:44)
Included patches: 1-1378, 1499
Modified by team...@tracker.debian.org
Compiled by team...@tracker.debian.org
Huge version with GTK3 GUI. Features included (+) or not (-):
+acl +file_in_path +mouse_urxvt -tag_any_white
+arabic +find_in_path +mouse_xterm +tcl
+autocmd +float +multi_byte +termguicolors
+autochdir +folding +multi_lang +terminal
-autoservername -footer -mzscheme +terminfo
+balloon_eval +fork() +netbeans_intg +termresponse
+balloon_eval_term +gettext +num64 +textobjects
+browse -hangul_input +packages +textprop
++builtin_terms +iconv +path_extra +timers
+byte_offset +insert_expand +perl +title
+channel +ipv6 +persistent_undo +toolbar
+cindent +job +popupwin +user_commands
+clientserver +jumplist +postscript +vartabs
+clipboard +keymap +printer +vertsplit
+cmdline_compl +lambda +profile +vim9script
+cmdline_hist +langmap -python +viminfo
+cmdline_info +libcall +python3 +virtualedit
+comments +linebreak +quickfix +visual
+conceal +lispindent +reltime +visualextra
+cryptv +listcmds +rightleft +vreplace
+cscope +localmap +ruby +wildignore
+cursorbind +lua +scrollbind +wildmenu
+cursorshape +menu +signs +windows
+dialog_con_gui +mksession +smartindent +writebackup
+diff +modify_fname +sodium +X11
+digraphs +mouse +sound -xfontset
+dnd +mouseshape +spell +xim
-ebcdic +mouse_dec +startuptime -xpm
+emacs_tags +mouse_gpm +statusline +xsmp_interact
+eval -mouse_jsbterm -sun_workshop +xterm_clipboard
+ex_extra +mouse_netterm +syntax -xterm_save
+extra_search +mouse_sgr +tag_binary
-farsi -mouse_sysmouse -tag_old_static
system vimrc file: "/etc/vim/vimrc"
user vimrc file: "$HOME/.vimrc"
2nd user vimrc file: "~/.vim/vimrc"
user exrc file: "$HOME/.exrc"
system gvimrc file: "/etc/vim/gvimrc"
user gvimrc file: "$HOME/.gvimrc"
2nd user gvimrc file: "~/.vim/gvimrc"
defaults file: "$VIMRUNTIME/defaults.vim"
system menu file: "$VIMRUNTIME/menu.vim"
fall-back for $VIM: "/usr/share/vim"
Compilation: gcc -c -I. -Iproto -DHAVE_CONFIG_H -DFEAT_GUI_GTK -I/usr/include/gtk-3.0 -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/harfbuzz -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/fribidi -I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/x86_64-linux-gnu -I/usr/include/gio-unix-2.0 -I/usr/include/atk-1.0 -I/usr/include/at-spi2-atk/2.0 -I/usr/include/at-spi-2.0 -I/usr/include/dbus-1.0 -I/usr/lib/x86_64-linux-gnu/dbus-1.0/include -pthread -Wdate-time -g -O2 -ffile-prefix-map=/build/vim-JA6Vy9/vim-9.0.1378=. -fstack-protector-strong -Wformat -Werror=format-security -DSYS_VIMRC_FILE=\"/etc/vim/vimrc\" -DSYS_GVIMRC_FILE=\"/etc/vim/gvimrc\" -D_REENTRANT -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1
Linking: gcc -Wl,-E -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -o vim -lgtk-3 -lgdk-3 -lz -lpangocairo-1.0 -lpango-1.0 -lharfbuzz -latk-1.0 -lcairo-gobject -lcairo -lgdk_pixbuf-2.0 -lgio-2.0 -lgobject-2.0 -lglib-2.0 -lSM -lICE -lXt -lX11 -lXdmcp -lSM -lICE -lm -ltinfo -lselinux -lcanberra -lsodium -lacl -lattr -lgpm -L/usr/lib -llua5.2 -Wl,-E -fstack-protector-strong -L/usr/local/lib -L/usr/lib/x86_64-linux-gnu/perl/5.36/CORE -lperl -ldl -lm -lpthread -lcrypt -L/usr/lib/python3.11/config-3.11-x86_64-linux-gnu -lpython3.11 -ldl -lm -L/usr/lib/x86_64-linux-gnu -ltcl8.6 -ldl -lz -lpthread -lm -lruby-3.1 -lm -L/usr/lib
—
You should have it, since it was included as of patch 8.2.2974.
—
@thanasisn The issue is fixed, but the spell file for the Greek language has not yet been updated on the ftp server.
You can generate those files manually. Personally, I use this Makefile:
.PHONY: all
all: el.utf-8.spl

el.utf-8.spl : el_GR.dic el_GR.aff
	vim --clean --cmd "mkspell! el el_GR" --cmd q

el_GR.diff:
	curl -Lo el_GR.diff https://github.com/vim/vim/raw/master/runtime/spell/el/el_GR.diff
	vim --clean -c 'e ++enc=iso-8859-7 el_GR.diff' -c '3,47d' -c 'exe "4normal 62\<C-a>"' -c 'exe "5normal 62\<C-a>E62\<C-a>"' -c 'w ++enc=utf-8' -c 'q'

el_GR.aff: el_GR.diff
	curl -Lo el_GR.aff https://github.com/wooorm/dictionaries/raw/main/dictionaries/el/index.aff
	patch < el_GR.diff

el_GR.dic:
	curl -Lo el_GR.dic https://github.com/wooorm/dictionaries/raw/main/dictionaries/el/index.dic

.PHONY: clear
clear:
	rm el_GR.{aff,dic,diff} el.utf-8.{spl,sug}

.PHONY: clean
clean:
	rm el.utf-8.{spl,sug}
—
Thank you very much!
It works fine.
I will keep it in .vim/spell forever!
—
A good three years after this fix, I would like to report a small bug that it introduced (or simply did not handle).
Greek words may not contain 0x03C2 (ς) in the middle, yet this is still not detected as a typo.
Commit 4f13527 introduced the condition if (c == 0x03a3 || c == 0x03c2), but this catches too much. I modified the code as follows, compiled it, and tested it, and it works:
--- a/src/spell.c
+++ b/src/spell.c
@@ -2851,13 +2851,15 @@ spell_casefold(
         // Exception: greek capital sigma 0x03A3 folds to 0x03C3, except
         // when it is the last character in a word, then it folds to
         // 0x03C2.
-        if (c == 0x03a3 || c == 0x03c2)
+        if (c == 0x03a3)
         {
             if (p == str + len || !spell_iswordp(p, wp))
                 c = 0x03c2;
             else
                 c = 0x03c3;
         }
+        else if( c == 0x03c2)
+        {}
         else
             c = SPELL_TOFOLD(c);
I have also noticed some more problems with Greek (it is a very idiomatic language with ancient rules) that I am willing to fix, and I think this issue is a good place to discuss them. Could an admin take a look and let me open a PR?
—
Please go ahead and create a PR for your issue.
+ else if( c == 0x03c2)
+ {}
else
This should probably be rather:
else if (c != 0x03c2)
c = SPELL_TOFOLD(c);
with a comment added why we exclude 0x03c2.
Also, when creating a PR, please make sure to add a test for the issue here, so it doesn't regress in the future.
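To see why the extra `c == 0x03c2` test hides the typo, the two variants of the branch can be modelled in a few lines of Python. This is a rough model of the logic in the diff, not the actual spell.c code: `fold_word` and its flag are made up for the illustration, `str.lower()` stands in for SPELL_TOFOLD, and the end of the string stands in for the word-boundary check.

```python
CAPITAL_SIGMA = "\u03a3"  # Σ, 0x03A3
FINAL_SIGMA = "\u03c2"    # ς, 0x03C2
MEDIAL_SIGMA = "\u03c3"   # σ, 0x03C3


def fold_word(word: str, refold_final_sigma: bool) -> str:
    """Fold one word roughly the way the branch in the patch does.

    refold_final_sigma=True models `c == 0x03a3 || c == 0x03c2`;
    False models the revised branch that leaves 0x03c2 untouched.
    """
    out = []
    last = len(word) - 1
    for i, ch in enumerate(word):
        if ch == CAPITAL_SIGMA or (refold_final_sigma and ch == FINAL_SIGMA):
            out.append(FINAL_SIGMA if i == last else MEDIAL_SIGMA)
        elif ch == FINAL_SIGMA:
            out.append(ch)  # keep a misplaced ς visible to the checker
        else:
            out.append(ch.lower())  # stand-in for SPELL_TOFOLD
    return "".join(out)


typo = "\u03ac\u03c2\u03c0\u03c1\u03bf"   # άςπρο, ς in the middle
valid = "\u03ac\u03c3\u03c0\u03c1\u03bf"  # άσπρο

# With the original condition the typo folds onto the valid word.
print(fold_word(typo, refold_final_sigma=True) == valid)
# With the revised condition the two stay distinct.
print(fold_word(typo, refold_final_sigma=False) == valid)
```

In this model the first comparison holds and the second does not, which is the behaviour the revised branch is meant to achieve: a word with ς in the middle no longer folds to the same string as its correctly spelled counterpart.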
—
I will open the PR in a few days. Maybe I'll also fix some other, similar things in the Greek language. Here is a mega test that would improve Greek spelling a lot if we fix lines 2 and 3 as well:
Σχεδόν άσπρο σαν σάπιος ανανάς. Correct. Vim's assumptions are right.
ΣΧΕΔΟΝ ΑΣΠΡΟ ΣΑΝ ΣΑΠΙΟΣ ΑΝΑΝΑΣ. Correct. Vim thinks it is all wrong.
ΣΧΕΔΌΝ ΆΣΠΡΟ ΣΑΝ ΣΆΠΙΟΣ ΑΝΑΝΆΣ. All wrong. Vim thinks it is all correct.
σχεδόν άσπρο σαν σάπιοσ ανανάσ. Only words 4 and 5 are wrong. Vim assumes correctly.
ςχεδόν άςπρο ςαν ςάπιος ανανάς. Words 1-4 are wrong. Vim used to flag only 1, 3 and 4, but my fix corrects this.
Σχεδόν άΣπρο Σαν Σάπιος ανανάΣ. Words 2 and 5 are wrong. Vim assumes correctly.
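The expected verdicts for the two all-lowercase rows (4 and 5) follow from a purely positional rule, which can be sketched without any dictionary. The Python below is only such a sketch: `misplaced_sigma_words` is a made-up helper, it ignores capitalisation and accents entirely, and it says nothing about how Vim reaches its verdicts.

```python
MEDIAL_SIGMA = "\u03c3"  # σ
FINAL_SIGMA = "\u03c2"   # ς


def misplaced_sigma_words(sentence: str) -> list:
    """Return 1-based indices of words whose sigma form is misplaced."""
    bad = []
    for index, raw in enumerate(sentence.split(), start=1):
        word = raw.strip(".,;!?")
        if not word:
            continue
        final_inside = FINAL_SIGMA in word[:-1]        # ς before the end
        medial_at_end = word.endswith(MEDIAL_SIGMA)    # σ at the end
        if final_inside or medial_at_end:
            bad.append(index)
    return bad


row4 = "σχεδόν άσπρο σαν σάπιοσ ανανάσ."
row5 = "ςχεδόν άςπρο ςαν ςάπιος ανανάς."

print(misplaced_sigma_words(row4))
print(misplaced_sigma_words(row5))
```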
—
I would like some help.
Inside src/testdir/test_spell.vim, I created my own testing function:
func Test_spell_greek_idioms()
  new
  set spell
  set spelllang=el
  set encoding=utf-8
  let wrong_string_final_sigma = "\u03AC\u03C2\u03C0\u03C1\u03BF\u03C2"
  let correct_string = "\u03AC\u03C3\u03C0\u03C1\u03BF\u03C2"
  call setline(1, wrong_string_final_sigma)
  call feedkeys(']s1z=', 'tx')
  call assert_equal(correct_string, getline(1))
endfunc
When I try to run my test like so:
./../vim -u NONE -U NONE -i NONE -S test_spell.vim
:call Test_spell_greek_idioms()
I get
Warning: Cannot find word list "en.utf-8.spl" or "en.ascii.spl"
Warning: Cannot find word list "el.utf-8.spl" or "el.ascii.spl"
Error detected while processing function Test_spell_greek_idioms:
line 8:
E756: Spell checking is not possible
So how can I properly run my test once the spell files have been built?
—
@SpyrosMourelatos Have you been able to make any progress since?
—
Thank you for your interest, @ap. I am currently trying to add a test case, but it fails very strangely for me when I run make test:
def Test_spell_greek_idioms()
  set spell
  spellbadword('bycycle')->assert_equal(['bycycle', 'bad'])
  set spelllang=el
  spellbadword('άςπρο')->assert_equal(['άςπρο', 'bad'])
  spellbadword('ςρο')->assert_equal(['ςρο', 'bad'])
  set spelllang=
  set nospell
  bwipe!
enddef
Found errors in Test_spell_greek_idioms():
command line..script /Users/spyros/Documents/nonWork/repos/vim/src/testdir/runtest.vim[607]..function RunTheTest[57]..Test_spell_greek_idioms line 4: Expected ['ά?ς?π?ρ?ο?', 'bad'] but got ['', '']
command line..script /Users/spyros/Documents/nonWork/repos/vim/src/testdir/runtest.vim[607]..function RunTheTest[57]..Test_spell_greek_idioms line 6: Expected ['ς?ρ?ο?', 'bad'] but got ['', '']
—
Let me first qualify this comment by the fact that I’m speaking with zero idea of what I’m talking about: I haven’t run any of the code in this issue nor the PR nor anything else because I don’t have time right now. I’m going purely off of what I see in this discussion thread, so this may well be blithering idiocy. But the question marks in the error message are making my spidey senses tingle, so I want to get this out for now in the hope that it’ll be fruitful for you or anyone else while I can’t devote attention to it myself.
With all that out of the way:
Might this be just an encoding issue with the source code? Just to exclude all else: do you still get the same error if you replace 'άςπρο' with this?
join(map([940,962,960,961,959],'nr2char(v:val,1)'),'')
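For reference, the same string can be built from those code points in Python. This is just a small sketch for checking which characters the numbers stand for; it says nothing about how Vim reads the test file.

```python
# The decimal code points from the Vim expression above.
code_points = [940, 962, 960, 961, 959]

# Build the word from numbers, so no Greek literal appears in the source.
word = "".join(chr(cp) for cp in code_points)

# The same word written with escapes: ά ς π ρ ο.
escaped = "\u03ac\u03c2\u03c0\u03c1\u03bf"

print(word == escaped)
print([f"U+{cp:04X}" for cp in code_points])
# Character count versus UTF-8 byte count.
print(len(word), len(word.encode("utf-8")))
```

Each of these letters takes two bytes in UTF-8, so the five characters occupy ten bytes on disk.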
—
For anyone not knowing the Greek alphabet:
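Most Greek letters exist in two shapes, one uppercase and one lowercase (if we disregard the diacritics which may be used on vowels only), with one exception, namely the letter pronounced like Latin s and named "sigma". That letter has three shapes:
Σ (uppercase)
σ (lowercase, non-final)
ς (lowercase, final)
Their official Unicode codepoints and names, and Vim digraphs are:
Σ U+03A3 GREEK CAPITAL LETTER SIGMA, digraph S*
σ U+03C3 GREEK SMALL LETTER SIGMA, digraph s*
ς U+03C2 GREEK SMALL LETTER FINAL SIGMA, digraph *s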
The reason these three shapes are so different from one another is to be searched for in the history of the Greek alphabet.
When writing a Greek phrase in all-caps, both non-final sigma and final sigma become capital sigma. The reverse operation, however, has to take care of the fact that the final sigma is used only at end-of-word while the non-final sigma is never used at end-of-word. As said in an earlier post in this thread, an apostrophe marks a break before the end of a word so whichever letter precedes it is not the final letter of a word (even though it is the last written letter of the preceding word), as in σ'αγαπώ (s'agapō) "I love you".
Best regards,
Tony.
—