When I open a UTF text file containing both right-to-left text (Hebrew in
this case) and left-to-right text (English in this case) in gedit, it is
rendered correctly (RTL is displayed as RTL, LTR is displayed as LTR).
But when I open the same file in gvim, the right-to-left text (Hebrew) is
shown left-to-right, just like the rest of the file (i.e. the English).
Is there a way to get the same behaviour as in gedit?
I already searched :help and found things like mlterm, termbidi, set
bomb etc., but I just can't get gvim to show the text the way gedit does.
Thanks in advance,
Adriaan
Moshe
* J.A.J. Pater <jajp...@gmail.com> [02/07/09 08:09]:
> --~--~---------~--~----~------------~-------~--~----~
> You received this message from the "vim_use" maillist.
> For more information, visit http://www.vim.org/maillist.php
> -~----------~----~----~----~------~----~------~--~---
>
No, the whole text is either LTR or RTL.
I have never understood why people put the text in the wrong order in
the file and then change the order when displaying it. The characters
should be in the file in the order they are displayed.
Perhaps there is a filter that changes the order for this kind of file.
--
/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
If you put the characters into the file in the order they are displayed
regardless of the order they are pronounced, you'll get no end of
trouble when trying to reformat (to wider or narrower width) paragraphs
containing sentences (or book titles etc.) in both directions, or even
text containing separate paragraphs (quotations...) in the opposite
direction. To reformat text with mixed LTR and RTL paragraphs you'll
need to (1) reverse the order of characters in every "wrong-direction"
line; (2) reformat; (3) reverse again. If you have paragraphs in one
direction with at least two consecutive words in the opposite direction,
you'll have to take care of all the possibilities of line breaks coming
and going in the middle of the "wrong-direction" text.
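A minimal Vim script sketch of the reverse / reformat / reverse workaround
described above, for a file that stores its text in display order.
ReverseChars(), ReverseLines() and :ReverseLines are hypothetical names, not
Vim built-ins, and the reversal is naive: it flips each whole line character
by character and knows nothing about mixed-direction lines.

```vim
" Hypothetical helper: return the characters of a string in reverse order.
function! ReverseChars(text) abort
  return join(reverse(split(a:text, '\zs')), '')
endfunction

" Hypothetical helper: reverse every line of the given range in place.
function! ReverseLines() range abort
  for lnum in range(a:firstline, a:lastline)
    call setline(lnum, ReverseChars(getline(lnum)))
  endfor
endfunction

command! -range ReverseLines <line1>,<line2>call ReverseLines()
```

With this, the three steps would be: select the "wrong-direction" lines and
run :'<,'>ReverseLines, reformat them (for example with gq), then select the
reformatted lines and run :ReverseLines again. The fact that the line count
may change between steps 1 and 3 is part of the trouble described above.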
The "right" sequence of letters in a file consists of putting the start
of every word before its end, and the words of every sentence in the
order they are pronounced. Then the reordering happens when displaying,
_after_ deciding where line breaks (if any) have to come. IIUC the worst
brain-teaser in that respect lies in the scripts of the Indian
subcontinent (not yet supported by Vim), which are LTR on the whole, but
with some vowels written to the left of the consonant that comes before
them.
Vim can display _each window_ as either LTR or RTL, but not both; use
":setlocal invrightleft" to toggle (see ":help 'rightleft'"). The exception
is Console Vim running in a true-bidi terminal, in which case (IIUC) setting
'termbidi' tells Vim that the terminal, not Vim, is in charge of bidi
display, Arabic shaping, etc.
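As a sketch of how this might look in a vimrc: the <F8> key is an arbitrary
choice for illustration, and the 'termbidi' line is left commented out
because it only makes sense for Console Vim inside a true-bidi terminal such
as mlterm.

```vim
if has('rightleft')
  " Toggle the display direction of the current window only.
  nnoremap <F8> :setlocal invrightleft<CR>
endif

" Only for Console Vim inside a true-bidi terminal:
" set termbidi
```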
If I have an English or French paragraph with one word in Hebrew or
Arabic, I'll keep it in 'norightleft' and know that gvim displays the
RTL word "the wrong way". Conversely for Arabic (RTL) text maybe
including some numbers (LTR even with Arabic-Indic digits), where I'll
use 'rightleft' and know that the numbers are displayed in gvim with the
digits reversed. OTOH if I have a file with long sentences in both LTR
and RTL I'll maybe display it in two split-windows, one of them
'rightleft' and the other 'norightleft'. Or else, I'll be busy with only
one language at a time and orient the window accordingly.
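The two-window arrangement could be set up roughly as follows. This is a
sketch that assumes the buffer to be shown is the current one; it relies on
'rightleft' being local to the window.

```vim
" Open a second window on the same buffer and make it right-to-left.
split
setlocal rightleft
" Go back to the original window and keep it left-to-right.
wincmd p
setlocal norightleft
```

Both windows show and edit the same text; only the display direction differs.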
What I do to view (or print) a text (or HTML) file with mixed LTR and
RTL text is save it to disk, then display it in my favourite browser.
Thus Hebrew and Arabic appear right-to-left, English etc. appear
left-to-right and mixed-direction paragraphs are handled properly.
Best regards,
Tony.
IIUC that is what bidi means. The text is stored in 'correct order' but
displayed bidirectionally. E.g. numbers are LTR in Arabic, so to write
'year 2009' in Arabic on a piece of paper
(pretending these are Arabic letters):
        y
       ey
      aey
     raey
(jump some space before writing the number)
2    raey
20   raey
200  raey
2009 raey
--
regards,
Well, I thought Bram meant that the characters should be in the file in
the order
2
0
0
9
<space>
r
a
e
y
if it's a "LTR file", and in the order
y
e
a
r
<space>
9
0
0
2
if it's a "RTL file" (more probable for Arabic).
The Unicode standard (with which I agree, for reasons outlined in my
previous post, and I think you do too) says that they MUST be in the order
y
e
a
r
<space>
2
0
0
9
at least in a Unicode file. (IIUC, some legacy encodings may require one
of the other two).
Best regards,
Tony.
> Adriaan Pater wrote:
>
>> When I open a UTF-text file with right-to-left-text (hebrew in this
>> case) and left-to-right-text (english in this case) in gedit it is
>> rendered OK (rtl is displayed as rtl, ltr is displayed as ltr).
>>
>> But when I open the same file in gvim the right-to-left text (hebrew)
>> is showed as left-to-right text (just as the rest of the file, cq.
>> english).
>>
>> Is there a way to get the same behaviour as in gedit?
>>
>> I searched for :help already and found things like mlterm, termbidi,
>> set bomb etc. but I just can't gvim to show the text like gedit.
>
> No, the whole text is either LTR or RTL.
>
> I have never understood why people put the text in the wrong order in
> the file and then change the order when displaying it. The characters
> should be in the file in the order they are displayed.
>
> Perhaps there is a filter that change the order for this kind of file.
They're stored in the file in "logical order", which is the order in which
the reader processes them when reading. That means that if he has an English
document with some embedded Hebrew, then when he encounters the first
Hebrew letter, his eyes will skip to the end of the Hebrew phrase (or the
end of the same line if it's a multi-line Hebrew phrase) and start
working backwards until he hits the English, at which point his eyes will
skip again across the Hebrew to the English text that follows it.
This is "logical order", and it's the order he reads in.
It's also the order that a computer would use if it were:
* lexicographically comparing mixed-language strings
* performing text-to-speech conversion
* rewrapping paragraphs
And it is the order in which text is typed at the keyboard.
See pages 19-20 of the Unicode Standard 5.0 (available online at
http://unicode.org/versions/Unicode5.0.0/ch02.pdf).
Since display is the only part of the system that doesn't operate in
logical order, it's logical to put the conversion into the display
routines, rather than putting it into the file itself where it screws up
every other operation the computer has to perform on it.
--Ken
--
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/
"J.A.J. Pater" <jajp...@gmail.com> wrote:
> When I open a UTF-text file with right-to-left-text (hebrew in this
> case) and left-to-right-text (english in this case) in gedit it is
> rendered OK (rtl is displayed as rtl, ltr is displayed as ltr).
>
> But when I open the same file in gvim the right-to-left text (hebrew) is
> showed as left-to-right text (just as the rest of the file, cq. english).
>
> Is there a way to get the same behaviour as in gedit?
As a workaround, I use this function to make editing files with mixed
rtl and ltr words easier:
" make right to left editing easier
" replace Mylang and mykeymap
function! Mylang()
    setlocal keymap=mykeymap rl delcombine
    " use s-tab to switch the direction and keymap
    map <buffer> <s-tab> :let &imi=1-&imi<cr>:setlocal invrl<cr>
    " the same in insert mode
    imap <buffer> <s-tab> <esc><s-tab>a
endfunction
command Mylang call Mylang()
Ali
Although it would be nice to keep h and l going the same direction as in
LTR mode...
> OTOH if I have a file with long sentences in both LTR and RTL I'll
> maybe display it in two split-windows, one of them 'rightleft' and the
> other 'norightleft'. Or else, I'll be busy with only one language at a
> time and orient the window accordingly.
Good idea.
It won't work for editing, because it only works when applying the bidi
algorithm once for each line. I attach the plugin. You should put it in
the same directory as the other urxvt perl plugins (possibly
/usr/lib/urxvt/perl), and add bidi to URxvt.perl-ext-common, or use the
-pe switch (see the man pages for urxvt and urxvtperl for details).
You will need Text::Bidi installed (from CPAN) which, in turn, needs
libfribidi.
There is a resource URxvt.bidiFieldSeparator that can be set to a
sequence of characters, each of which serves as a separator for the bidi
algorithm. This is useful for example to preserve the columns when using
a mail client within urxvt.
Best,
Moshe
Can you give a reference where Unicode specifies this?
> y
> e
> a
> r
> <space>
> 2
> 0
> 0
> 9
>
> at least in a Unicode file. (IIUC, some legacy encodings may require one
> of the other two).
The display is not the only part. Suppose you move your cursor to the
start of a word and type "dw". You expect the word to be deleted.
Since "start of the word" depends on what direction the word is to be
read, the editor needs to understand the meaning of the word to be able
to decide what to do. And it gets worse: What if some of the characters
in the word are LTR and some are RTL? This quickly gets very
complicated.
So Vim uses a simple and reliable method: Display the text either as LTR
or RTL and do the editing assuming all text is to be read that way.
You can open two windows on the same text, one in LTR and one in RTL if
you want to edit mixed text.
It would be really messy to display the text with mixed directions and
then have all edits work one way or perhaps fail with an error. Or
worse: delete the wrong text.
There are actually many more places where it matters: When
concatenating two files with text, "echo -n" in the shell, etc.
That's why i18n is so difficult.
I'm glad Australians don't write upside-down!
http://unicode.org/reports/tr9/?
Anyway, I don't think storing chars in presentation order is a good
idea. Apart from problems when using the file (other tools expect the
logical order), this does not make the editor's task any easier; people
write text in the logical order (not the presentation order). For
instance in your example, although the word looks like RAEY, people
write it as YEAR. So if this is going to work, when inserting the
editor should reverse the order of the characters that appear in parts
which use the characters in an rtl language (this algorithm is explained
in the URL above). So in practice it might be even harder (or at least
as hard).
Ali
Yeah, that's one place, though like most of the "normative" Unicode
texts it is very much "technical" -- the kind which will put non-techies
to sleep before they have a chance to get an idea of what is being
talked about.
This text is about determining how to display Unicode text given the
memory (or disk) representation, but it says near the top that the
representation is "logical" which means that the characters are stored
in memory or on disk in the order they would be pronounced or
handwritten (by someone who knows the language).
Best regards,
Tony.
With logical order, the start of a word is the letter which stands
earliest in memory. If you move your cursor to the leading alif of
Allah, stored in memory logically as ALLH (the second alif is usually
not written, or only as a diacritical mark above the second lam), then
do "dw", the word should be deleted until the heh, even though the alif
is displayed rightmost and the heh leftmost. Memory order is what
matters, and with logical storage the first letter is still the first
(though maybe not the leftmost one), not as if you stored Allah as HLLA
in memory.
As for characters needing reordering within a single word, I suppose
that's one of the reasons why Vim doesn't yet support Devanagari,
Gujarati, and the other Indian-subcontinent scripts of that family.
>
> So Vim uses a simple and reliable method: Display the text either as LTR
> or RTL and do the editing assuming all text is to be read that way.
>
> You can open two windows on the same text, one in LTR and one in RTL if
> you want to edit mixed text.
>
> It would be really messy to display the text with mixed directions and
> then have all edits work one way or perhaps fail with an error. Or
> worse: delete the wrong text.
IIUC it works correctly in Console mode with mlterm (a true-bidi
terminal) though in that case h and l will move the cursor in the
opposite direction when the underlying text is RTL: with my Allah
example, repeatedly hitting l moves from A to L to L to H which is
right-to-left but still logically first-to-last.
>
> There are actually many more places where it matters: When
> concatanating two files with text, "echo -n" in the shell, etc.
> That's why i18n is so difficult.
When concatenating files, assuming there is a paragraph break between
them, logical order gives flawless concatenation in all cases. With
"presentation order", even with a paragraph break you might have to
reverse each line of one of the files if they didn't have the same
direction, and then you would have to somehow know the LTR or RTL
direction of all three files (both inputs and the output) to begin with.
>
> I'm glad Australians don't write upside-down!
>
>
Oh, they do, only they aren't conscious of it. ;-) Happily the mailboat
(or plane, or even the email transport) reverses it on the way when
they're writing to us, or we to them.
Best regards,
Tony.
Hm, I think it's one of those things one could get used to in time, no
harder than deleting with d rather than Ctrl-X, pasting with P rather
than Ctrl-V, and copying with y rather than Ctrl-C. I know there is
mswin.vim for the latter three, but IMO it is the result of a misguided
attempt to make Vim more like Notepad. Indeed, in true-bidi terminals
with 'termbidi' on (which should be the Vim default for mlterm) Vim has
no knowledge of character direction, so lllll goes uniformly
first-to-last, and hhhhh last-to-first, even if the movement is a little
jerky when meeting a direction change within a line of text. (Or did I
misunderstand? AFAICT I haven't got mlterm installed)
> Indeed as far as I'm concerned a command like "dw" in RTL mode should
> delete from R to L.
>
> Since gedit seems to be real-bidi I guess GTK+ has the algorithm
> mentioned by Ali sort of implemented.
> Guess this could be used in gvim.
My notion would be that a true-bidi gvim should work exactly like
vim+mlterm with 'termbidi'.
>
> Indeed it will be quite hard to figure out how vim commands should work.
> So maybe a 'real-bidi mode' could use only a subset of vim commands?
>
> Well just my 2 cents.
>
> Adriaan
Best regards,
Tony.
In the <s-tab> mapping, use  :let &l:imi = !&l:imi  instead:
no need to clobber the global setting
(for 'rl' you properly used :setlocal)
> " the same in insert mode
> imap<buffer> <s-tab> <esc><s-tab>a
> endfunction
> command Mylang call Mylang()
>
> Ali
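Putting the suggestion together with the quoted function gives roughly the
following. This is a sketch, not the poster's final version: "mykeymap" is
still a placeholder to be replaced with a real keymap name, and the use of
nnoremap and command! is an editorial choice rather than something stated
in the thread.

```vim
" make right to left editing easier
" replace Mylang and mykeymap
function! Mylang()
  setlocal keymap=mykeymap rightleft delcombine
  " <S-Tab> toggles the keymap (buffer-local 'iminsert') and the
  " direction of the current window
  nnoremap <buffer> <S-Tab> :let &l:iminsert = !&l:iminsert<CR>:setlocal invrightleft<CR>
  " the same in Insert mode; imap (not inoremap) is used on purpose so
  " that the Normal-mode mapping above is triggered after <Esc>
  imap <buffer> <S-Tab> <Esc><S-Tab>a
endfunction
command! Mylang call Mylang()
```

Setting 'keymap' has the side effect of setting 'iminsert' to 1, so the
first <S-Tab> switches the keymap off and the window back to left-to-right.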
Best regards,
Tony.
Yep, thanks.
Ali
Perhaps their left-right placement leads to user misunderstanding, but I
think l should be understood as "next" and h as "previous". Take the
following sentence, where I've arbitrarily converted all LTR text to
lowercase and all RTL text to uppercase, and which is shown here as it
would be displayed, i.e. as English text with embedded Hebrew and Arabic:
the name of god is written HVHJ in hebrew and HLLA in arabic.
ll (next-next) skips from the n (last letter) of "written" to the Jod (first
letter) of JHVH, and similarly from the second Heh of JHVH to the i of "in",
from the d of "and" to the Alif of ALLH, and from the Heh of ALLH to the i
of "in", all the while following reading (or handwriting) order: from one
letter to the next, and from the last letter of one word to the
word-separating space and then to the first letter of the next word.
Indeed, when hand-copying such a sentence, you'll write JHVH
(Jod-Heh-Vav-Heh) and ALLH (Alif-Lam-Lam-Heh) from right to left, even
though that means first skipping the necessary space (and estimating how
large it will have to be); similarly, you'll skip rightwards from the
second Heh of JHVH, and later from the Heh of ALLH, over the
just-handwritten RTL word, to where you'll be writing the next LTR word in
the sentence. (And that, even, or perhaps especially, if you know that the
"accepted" pronunciation for JHVH is in most cases Adonaï, or Elohim when
immediately preceded or followed by Adonaï written as Adonaï.)
>
>> so lllll goes uniformly first-to-last, and hhhhh last-to-first, even if the movement is a little
>> jerky when meeting a direction change within a line of text. (Or did I
>> misunderstand? AFAICT I haven't got mlterm installed)
>>
> Indeed, jerkiness is another reason to have h and l move in only one way.
>> My notion would be that a true-bidi gvim should work exactly like
>> vim+mlterm with 'termbidi'.
>>
> That would be nice! But I think there should be an option to make h and
> l movement unequivocal
> (like it is with gedit+ViGedit).
> And AFAIUI, it is because of these kind of discussions that Bram doesn't
> like the idea of a real-bidi gvim.
I don't know gedit, but IMHO it is already unequivocal, if we remember
that in Vim the cursor is always "on" a character, never "between"
characters, even in Insert mode where, in gvim, its "25% left-side
vertical bar" shape can make us believe that it is "between" the current
character cell and the one (if any) to its left, or "before" the whole
line if it's on the first character of a line.
As long as true-bidi consoles are a rarity, there is no urgency for a
true-bidi gvim, but I suppose that some years from now, all console
terminals will behave like mlterm, and by then there could be some
demand for a true-bidi gvim. I expect that true-bidi gvim and true-bidi
Console vim will behave the same way, but I suppose that that is still
several years in the future.
>
> Anyway: thanks again!
My pleasure.
>
> Adriaan.
Best regards,
Tony.
By "_hand_-copying" I meant "with pen and paper", not "by typing".
>
> > As long as true-bidi consoles are a rarity, there is no urgency for a
> > true-bidi gvim, but I suppose that some years from now, all console
> > terminals will behave like mlterm, and by then there could be some
> > demand for a true-bidi gvim. I expect that true-bidi gvim and true-bidi
> > Console vim will behave the same way, but I suppose that that is still
> > several years in the future.
>
> Well, I can work with the tips given or (which I think I prefer) mlterm
> and/or ViGedit for so long.
>
> Adriaan.
Best regards,
Tony.
I think a bidi vim should have key strokes for both (a set of motion
keystrokes that operate in logical order, and a set of keystrokes that
operate in display order). If we consider gj and gk as the keystrokes
for operating over display lines versus j and k for operating over
logical lines, then the keystrokes for this purpose are actually
pretty obvious.
(gl is free for use for display order navigation, but unfortunately,
gh seems to be taken already.)
> I don't kow gedit, but IMHO it is already unequivocal, if we remember
> that in Vim the cursor is always "on" a character, never "between"
> characters, even in Insert mode where, in gvim, its "25% left-side
> vertical bar" shape can make us believe that it is "between" the current
> character cell and the one (if any) to its left, or "before" the whole
> line if it's on the first character of a line.
That actually helps a lot in figuring out where the cursor will go
next in logical order. When I have an isolated hebrew word in an
English sentence, I always have a hard time in editors where the
cursor is between letters knowing whether the cursor is logically
after or before the Hebrew word, when it is displayed immediately
after the Hebrew word.
> As long as true-bidi consoles are a rarity, there is no urgency for a
> true-bidi gvim, but I suppose that some years from now, all console
> terminals will behave like mlterm, and by then there could be some
> demand for a true-bidi gvim. I expect that true-bidi gvim and true-bidi
> Console vim will behave the same way, but I suppose that that is still
> several years in the future.
Actually, if we were to explore the possibilities for a true-bidi
(g)vim that didn't depend on a bidi terminal, I think we could come up
with a much better and much more intuitive bidi editor than the
existing editors. (For examples of why I think so, see above.)
AFAICT, g<Up> and g<Down> are equivalent to gk and gj. I suppose it
would be possible to define g<Left> and g<Right> via a future patch
implementing true-bidi in gvim; anyone preferring other keys or key
combos could of course remap them.
>
>> I don't kow gedit, but IMHO it is already unequivocal, if we remember
>> that in Vim the cursor is always "on" a character, never "between"
>> characters, even in Insert mode where, in gvim, its "25% left-side
>> vertical bar" shape can make us believe that it is "between" the current
>> character cell and the one (if any) to its left, or "before" the whole
>> line if it's on the first character of a line.
>
> That actually helps a lot in figuring out where the cursor will go
> next in logical order. When I have an isolated hebrew word in an
> English sentence, I always have a hard time in editors where the
> cursor is between letters knowing whether the cursor is logically
> after or before the Hebrew word, when it is displayed immediately
> after the Hebrew word.
>
>> As long as true-bidi consoles are a rarity, there is no urgency for a
>> true-bidi gvim, but I suppose that some years from now, all console
>> terminals will behave like mlterm, and by then there could be some
>> demand for a true-bidi gvim. I expect that true-bidi gvim and true-bidi
>> Console vim will behave the same way, but I suppose that that is still
>> several years in the future.
>
> Actually, if we were to explore the possiblities for a true-bidi
> (g)vim that didn't depend on a bidi terminal, I think we could come up
> with a much better and much more intuitive bidi editor than the
> existing editors. (For examples of why I think so, see above.)
>
> --Ken
>
Quite possibly. The problem, I think, would be to program the true-bidi
capability without introducing bugs in what we already have. I suppose
quite a lot of testing would be necessary before those changes become
part of mainline Vim -- but I hope they eventually will.
Best regards,
Tony.
Then I recommend creating a new branch in the vim Subversion
repository, or creating a clone of the repository in git or one of the
other distributed version control systems. Implement the true-bidi
version there, and when it's ready then submit a patch, or merge the
results back.
My point wasn't to bikeshed specific key combinations at this point,
rather it was that a highly extensible editor (such as vim or emacs)
would be a good platform for developing better UI concepts for working
with BiDi text.
Unfortunately, none of the real bidi editors out there have
the extensibility and the range of operations allowed by vim,
and it should definitely be a point of development to bring BiDi
support into the powerful editors that people are using today, so that
BiDi editing itself can benefit from such basic ideas as cursor
movement that are implemented much more powerfully in the traditional
UNIX extensible editors (vim, emacs) than they are in regular editors
(gedit, kate, etc).
> Tony Mechelynck <antoine.m...@gmail.com> wrote:
> > On 05/07/09 23:49, Ken Bloom wrote:
> >> Actually, if we were to explore the possiblities for a true-bidi
> >> (g)vim that didn't depend on a bidi terminal, I think we could come up
> >> with a much better and much more intuitive bidi editor than the
> >> existing editors. (For examples of why I think so, see above.)
> >>
> >> --Ken
> >
> > Quite possibly. The problem, I think, would be to program the true-bidi
> > capability without introducing bugs in what we already have. I suppose
> > quite a lot of testing would be necessary before those changes become
> > part of mainline Vim -- but I hope they eventually will.
>
> Then I recommend creating a new branch in the vim Subversion
> repository, or creating a clone of the repository in git or one of the
> other distributed version control systems. Implement the true-bidi
> version there, and when it's ready then submit a patch, or merge the
> results back.
It's probably best done as a separate project. That is easier for
access control. There are a few Vim patches developed that way.
There may be some merging problems, but the code doesn't change
that much, thus these are expected to be minor.
It's hard to estimate how much effort it requires to make Vim work with
bidi support. Best way is to give it a try. Assuming changes such as
"dw" work the same in memory, it's mainly work for the display
updating code and knowing where the cursor actually is (relation between
text column and display column will be very different).