In Vim, I can set the text width and manually re-wrap imported paragraphs, for example:
set textwidth=72
gqq
I can also navigate English text files with the standard 'w', 'b', 'e', '*' commands, etc.
This works well for English. However, Thai and a number of other Brahmic scripts of South and Southeast Asia put spaces between phrases rather than between words. LibreOffice, Word, InDesign, TeX, etc. "know" where line breaks should occur. They also "know" where individual words are, even though there are no spaces, so I can navigate by Thai word in these programs. I can even type English, Thai and Lao in the Chrome address bar and then use Option (Alt) + arrow on my Mac to navigate at the word level in all three languages. It seems that these programs are tapping into work that has already been done at some lower level. If Vim could tap into the same work, then someone could edit a multi-language document without having to do anything fancy: 'w', 'dw', etc. would just work from one word to the next regardless of the language.
Line breaking poses a different challenge. Because these languages space at the phrasal level, the presence or absence of a trailing space at the end of a line carries meaning when breaking and joining lines. By way of example, these spaces work much like a comma or other punctuation in English, and can make the difference between whether or not we had Grandma for breakfast ("Let's eat Grandma." vs. "Let's eat, Grandma."). One, also, doesn't, want, random, spaces, appearing, when, they, are, not, needed.
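To make the line-breaking concern concrete, here is a minimal Python sketch of wrapping that never adds or strips a space, so that joining the lines back together reproduces the original text exactly. It assumes the text has already been split into tokens (words and phrase-level spaces) by some segmenter; the function names `wrap_tokens` and `join_lines` are hypothetical, and width is counted in code points rather than display cells, which is a simplification for Thai combining marks. It is not how Vim wraps text today.

```python
def wrap_tokens(tokens, width):
    """Pack tokens into lines of at most `width` code points.

    Tokens are never altered. A space token is always kept at the end of
    the current line, so a trailing space (or its absence) stays meaningful.
    A single token longer than `width` is left on a line of its own.
    """
    lines = []
    current = ""
    for token in tokens:
        fits = len(current) + len(token) <= width
        if current and not fits and not token.isspace():
            lines.append(current)
            current = ""
        current += token
    if current:
        lines.append(current)
    return lines


def join_lines(lines):
    """Join wrapped lines without inserting or removing any spaces."""
    return "".join(lines)


# Hypothetical pre-segmented input: two phrases separated by one space.
tokens = ["ฉัน", "กิน", "ข้าว", " ", "ฉัน", "กิน", "ข้าวผัด"]
lines = wrap_tokens(tokens, 6)
```

In this example the second line ends with the phrase-level space and the first does not, and `join_lines(lines)` gives back the original text unchanged.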
My question is twofold: 1. How can Vim tap into already available libraries in order to recognize words in languages written in Indic scripts (including, and especially, Thai) for the purpose of navigation and other word-level Vim commands? 2. Is it possible to add language awareness for the purpose of line breaking, so that Vim does not strip or add spaces when breaking or joining lines at words in Thai or other such languages?
--
--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
---
You received this message because you are subscribed to the Google Groups "vim_dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Thank you, Bram. I believe there are C algorithms (someplace) that define what a Thai syllable is. (I am not sure if this is the same as the ICU algorithms or not.) This would allow wrapping, navigation, etc. at the syllable level. I wouldn't recommend it for line breaking, but it would be less CPU-intensive than a dictionary solution. However, for '*' and other nifty word-level commands, tapping into the ICU dictionary algorithms seems necessary. In my naivety, I ask: could we enable a setting that turns the ICU dictionary algorithms on or off for Southeast and South Asian languages "in one fell swoop"? Or does this need to be hammered out one language at a time?
And if the ICU dictionary algorithms were supported, would this mean that Thai spelling would naturally be supported too, or would that be a separate step?
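For readers unfamiliar with what a "dictionary solution" means here, the following is a minimal Python sketch of greedy longest-match segmentation over unspaced text. It is only an illustration of the general idea: the four-word `TOY_DICTIONARY` and the function name `segment` are made up for this example, and this is not the algorithm ICU actually implements.

```python
# Toy dictionary (an assumption for illustration); a real one has many
# thousands of entries.
TOY_DICTIONARY = {"ฉัน", "กิน", "ข้าว", "ข้าวผัด"}


def segment(text, dictionary=TOY_DICTIONARY):
    """Split unspaced text into tokens by greedy longest match.

    At each position the longest dictionary word that matches is taken.
    A character that starts no dictionary word becomes its own token.
    """
    longest = max((len(word) for word in dictionary), default=1)
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for end in range(min(len(text), pos + longest), pos, -1):
            if text[pos:end] in dictionary:
                break
        else:
            end = pos + 1  # no match: emit a single character
        tokens.append(text[pos:end])
        pos = end
    return tokens
```

With this toy dictionary, `segment("ฉันกินข้าวผัด")` yields three tokens, because the longer entry "ข้าวผัด" is preferred over "ข้าว". Greedy matching can pick the wrong split on ambiguous input, which is one reason production segmenters are more elaborate.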
Yes! Thank you. It is the ICU algorithms that I am thinking of.
Correct. Depending on the language, expectations vary. Lao has been under-supported and has only had ICU algorithms for a couple of years now. Thai, on the other hand, is closer to the 99+% mark.
Not sure I understand the question about \< and \> being on the same column.
Brian
On Saturday, November 14, 2015 at 9:02:24 PM UTC-8, ZyX wrote:
> `gq` behaviour is controlled by the &formatexpr and &formatprg option values, and you may use them if you know a program which serves your purposes. `w` and other motions can be remapped; the same goes for `J` (in the last case you may manually choose between `J` (join with spaces) and `gJ` (join without inserting spaces, but also without removing them)). So you can have some minor level of convenience by configuring Vim without patching it. But this does not work for:
>
>
> 1. Motions inside “nore” mappings.
> 2. expand('<cword>') and other means of getting word under the cursor (e.g. :edit <cword>).
> 3. Behaviour when &linebreak option is set.
> 4. `\<`/`\>`. Though I am unsure that this should be fixed: I always parsed these as “place between non-word and word character” and “place between word and non-word character”, and not “place where word starts” and “place where word ends”. The documentation describes the second interpretation, but I have a strong impression (based on wording, actual implementation and the fact that this is my interpretation) that the author meant the first variant.
>
>
Thank you ZyX,
Learning the difference between J and gJ is very helpful.
Regarding remapping of w, I do not know how to remap it so that it would move to the next Thai word.
4. Perhaps an example will clarify what I mean, since I am not sure of the difference between the two interpretations that you give. :) If I type '*' while sitting on a Thai word, I would expect it to go to the next matching word and not try to match the entire unspaced phrase. 'diw' should delete only the Thai word that I am sitting on, not the entire phrase, etc.
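The expected behaviour can be sketched in a few lines of Python. This is only a model of what '*' and 'diw' would do if word boundaries were known; it assumes the phrase has already been split into tokens by some segmenter, and the function names are hypothetical rather than anything Vim provides.

```python
def token_index_at(tokens, offset):
    """Return the index of the token that covers character `offset`."""
    start = 0
    for index, token in enumerate(tokens):
        if start <= offset < start + len(token):
            return index
        start += len(token)
    raise IndexError("offset is outside the text")


def next_match_offset(tokens, offset):
    """Model of '*': start offset of the next token equal to the one
    under the cursor, wrapping around the end of the text."""
    here = token_index_at(tokens, offset)
    starts = []
    pos = 0
    for token in tokens:
        starts.append(pos)
        pos += len(token)
    count = len(tokens)
    for step in range(1, count + 1):
        candidate = (here + step) % count
        if tokens[candidate] == tokens[here]:
            return starts[candidate]


def delete_inner_word(tokens, offset):
    """Model of 'diw': remove only the token under the cursor."""
    here = token_index_at(tokens, offset)
    return "".join(tokens[:here] + tokens[here + 1:])


# Hypothetical pre-segmented text; the cursor sits at offset 4, inside the first "กิน".
tokens = ["ฉัน", "กิน", "ข้าว", " ", "ฉัน", "กิน", "ข้าวผัด"]
```

With the cursor at offset 4, `next_match_offset(tokens, 4)` lands on the second "กิน" rather than matching the whole unspaced phrase, and `delete_inner_word(tokens, 4)` removes just that one word.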
Brian