Understanding Word Boundaries

Paul Drummond

Jun 16, 2010, 6:44:41 AM

to help-gn...@gnu.org

I have been an Emacs user for a few years now so definitely still a newbie! While initially I struggled to control its power, I eventually came round. Every issue I've had so far I've been able to fix by a quick search in EmacsWiki, except for one frustrating and re-occurring problem that has plagued me for years - word boundaries.

Before Emacs I used Vim exclusively and the word boundary behaviour in Vim *just worked* - I didn't even have to think about it. No matter what language I used I could navigate and manipulate words without thinking about it. The way word boundaries work in Vim is elegant and I have spent a lot of time trying to find some elisp to replicate the behaviour in Emacs but to no avail.

I could write some elisp myself but I am still very new to it so it will take a while - it's something I would like to do but I don't have time at the moment. Regardless, an elisp solution to the problem is not the point of this post. I want to understand why word boundaries behave the way they do in Vanilla Emacs and I would greatly appreciate some views on this from some Emacs Gurus!

Every time I notice the word boundary behaviour when hacking in Emacs I wonder to myself - "I must be missing something here. Surely, experienced Emacs users don't just *put up* with this!" Yet every forum response, blog post, mailing-list post I have read suggests they do. This is atypical of the Emacs community in my experience. Usually when something behaves wrong in Emacs, it's easy to find some elisp that just fixes the problem full stop. Yet with word-boundaries all I can find is suggestions that fix a particular gripe but nothing that provides a general solution.

I have loads of examples but I will mention just a few here to hopefully kick-start further discussion.

** Example 1

I use org-mode for my journal and today I hit the word-boundary problem while entering my morning journal entry - here's a contrived example of what I entered:

** [10:27] Understanding Word Boundaries in Emacs
                                   ^
With point at the end of the word "Understanding" I hit C-w (which I bind to backward-kill-word) and the word "Understanding" is killed as expected. But when I hit C-w again, the point kills to the colon. Why? Why is colon a word-boundary but the closing square bracket isn't?

** Example 2

When editing C++ files I often need to delete the "ClassName::" part when declaring functions in the header:

void ClassName::function();
       ^

With point at the start of ClassName I want to press M-d twice to delete ClassName and :: but "::" isn't recognised as a word. In Vim I just type "dw" twice and it *just works*.

** Example 3

I have loads of problems when deleting and navigating words over multiple lines. In the following C++ code for instance:

    Page *page = new _Page(this);
    page.load();
           ^

When point is after "page", before the dot on the second line and I hit M-b (backward-word) point ends up at the first opening bracket of "Page(" !!!

Again, vim does the right thing here - pressing 'b' takes the point to the closing bracket of Page(this) so it doesn't recognise the semi-colon as a bracket which is intuitive and what I would expect. This is really the point I am trying to make. I have never taken the time to understand the behaviour of word boundaries in Vim because *it just works*. In Emacs I am forced to think about word boundaries because Emacs keeps surprising me with its weird behaviour!

Note: My examples happen to be C++ but I use lots of other languages too including elisp, Clojure, JavaScript, Python and Java and the word-boundaries seem to be wrong for all of them.

I have tried several different elisp solutions but each one has at least one feature that isn't quite right. Here are some links I kept, I've tried many other solutions but don't have the links to hand:

http://stackoverflow.com/questions/2078855/about-the-forward-and-backward-a-word-behaviour-in-emacs
http://stackoverflow.com/questions/1771102/changing-emacs-forward-word-behaviour/1772365#1772365

So to wrap up, the point of this post is to kick-start a discussion about why the word boundaries in Vanilla Emacs (specifically GNU Emacs 23.1.50.1 in my case) seem to be so awkward and unintuitive.

Regards,
Paul Drummond

Karan Bathla

Jun 16, 2010, 4:07:01 PM

to help-gn...@gnu.org, Paul Drummond

I don't know about the word boundary thing in vim and elisp code for that but the behaviour of backward-kill-word is simple : kill the last word; where a word is something alphanumeric. Any non alphanumeric characters like : and ( are deleted automatically if between point and last word. There is no concept here of : or ( being word boundaries.

So if you do M-d on ":67a" whole thing gets deleted and in "67a:", : remains (with point at beginning of string).
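
A minimal sketch of that behaviour, assuming a scratch buffer in fundamental mode with the standard syntax table. The helper name `my-text-after-kill-word` is made up for illustration. It deletes from point to wherever `forward-word` lands, which is the region M-d (`kill-word`) would kill, but leaves the kill ring alone.

```elisp
(defun my-text-after-kill-word (text)
  "Return what is left of TEXT after an M-d style deletion at its start.
Hypothetical helper for illustration only."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; kill-word removes the text between point and where forward-word stops.
    (delete-region (point) (progn (forward-word 1) (point)))
    (buffer-string)))

(my-text-after-kill-word ":67a")   ; the leading colon goes along with the word
(my-text-after-kill-word "67a:")   ; the trailing colon is left behind
```

The first call should leave an empty string and the second a lone ":", matching the description above.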

--- On Wed, 6/16/10, Paul Drummond <paul.d...@iode.co.uk> wrote:

Stefan Monnier

Jun 16, 2010, 10:20:12 PM

> I have been an Emacs users for a few years now so definitely still a
> newbie! While initially I struggled to control its power, I eventually came
> round. Every issue I've had so far I've been able to fix by a quick search
> in EmacsWiki, except for one frustrating and re-occurring problem that has
> plagued me for years - word boundaries.

Emacs doesn't so much care about word-boundaries as about words.
So when you forward-word, it just skips until the end of the next word,
where "abc" is a word, but ";-( )" is not.
So in many cases, it ends up doing in one step what VI would do in two:
first skip over the non-word chars, and then skip the next few
word-chars, whereas VI would stop after the run of non-word chars and
stop again after the subsequent run of word chars.

I don't think there is a very good reason for doing it like Emacs vs doing
it like VI. Each one has its advantages. VI's approach stops more
often, so there's less chance that it'll skip the position in which
you're interested, which is why you like it. In Emacs's approach OTOH
you'll often get away with fewer operations.
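
The one-step behaviour can be observed with a small sketch like the following. It assumes fundamental mode with the standard syntax table, and `my-forward-word-stops` is a hypothetical helper name, not part of Emacs.

```elisp
(defun my-forward-word-stops (text)
  "Return the positions where `forward-word' stops while crossing TEXT.
Hypothetical helper for illustration only."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (stops)
      ;; forward-word returns nil once it runs into the end of the buffer
      ;; without finding another word.
      (while (and (not (eobp)) (forward-word 1))
        (push (point) stops))
      (nreverse stops))))

(my-forward-word-stops "abc ;-( ) def")
```

Only two stops should be reported: one right after "abc" and one right after "def". The run ";-( )" never gets a stop of its own.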

Stefan

Uday S Reddy

Jun 17, 2010, 6:43:16 AM

On 6/16/2010 11:44 AM, Paul Drummond wrote:

> Again, vim does the right thing here - pressing 'b' takes the point to
> the closing bracket of Page(this) so it doesn't recognise the semi-colon
> as a bracket which is intuitive and what I would expect. This is really
> the point I am trying to make. I have never taken the time to
> understand the behaviour of word boundaries in Vim because *it just
> works*. In Emacs I am forced to think about word boundaries because
> Emacs keeps surprising me with its weird behaviour!

I never thought about this issue actively. I do have a vague recollection of
facing it when I first moved back from vi to Emacs.

Separating words and word boundaries feels more semantic and less mechanical.
And it seems that you can get more done with the same key binding than we
currently can. Seems like a good idea to implement it:

forward-word-or-boundary, kill-word-or-boundary, ...
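
A rough sketch of what such commands might look like. The `my-` names are hypothetical, nothing like this ships with Emacs as far as this thread establishes, and the definition of "boundary" used here (any run of non-word characters, whitespace included) is only one possible choice.

```elisp
(defun my-forward-word-or-boundary ()
  "Move over one run of word characters, or one run of non-word characters.
Hypothetical command sketching vi-like stops."
  (interactive)
  (if (looking-at-p "\\sw")
      (skip-syntax-forward "w")     ; point is on a word: skip to its end
    (skip-syntax-forward "^w")))    ; otherwise skip to the next word start

(defun my-kill-word-or-boundary ()
  "Kill the run that `my-forward-word-or-boundary' would move over."
  (interactive)
  (kill-region (point) (progn (my-forward-word-or-boundary) (point))))
```

With point at the start of "apples, oranges and peaches", the first call to the kill command should remove "apples" and the second should remove ", ".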

My example would be, say "apples, oranges and peaches". Now think of deleting
"apples, ".

Cheers,
Uday

Deniz Dogan

Jun 17, 2010, 9:37:56 AM

to Karan Bathla, help-gn...@gnu.org

2010/6/16 Karan Bathla <karan...@yahoo.com>

>
> I don't know about the word boundary thing in vim and elisp code for that but the behaviour of backward-kill-word is simple : kill the last word; where a word is something alphanumeric. Any non alphanumeric characters like : and ( are deleted automatically if between point and last word. There is no concept here of : or ( being word boundaries.
>
> So if you do M-d on ":67a" whole thing gets deleted and in "67a:", : remains (with point at beginning of string).
>

To be more specific, I think it depends on what the syntax table of
the active mode looks like. You can make your own syntax table to
change the behavior of "word commands" to some extent.
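
As a hedged illustration of the syntax-table route: the sketch below gives chosen characters word syntax in a throwaway copy of the standard syntax table and reports where `forward-word` then stops. `my-forward-word-end` is a made-up helper; in real use the `modify-syntax-entry` call would more likely sit in a mode hook.

```elisp
(defun my-forward-word-end (text &optional word-chars)
  "Return point after one `forward-word' from the start of TEXT.
Each character in the string WORD-CHARS is first given word syntax
in a copy of the standard syntax table.  Hypothetical helper."
  (with-temp-buffer
    (let ((table (copy-syntax-table (standard-syntax-table))))
      (dolist (ch (string-to-list (or word-chars "")))
        (modify-syntax-entry ch "w" table))
      (set-syntax-table table)
      (insert text)
      (goto-char (point-min))
      (forward-word 1)
      (point))))

(my-forward-word-end "ClassName::function()")      ; stops after "ClassName"
(my-forward-word-end "ClassName::function()" ":")  ; stops after "function"
```

Note the limitation: making ":" a word character merges the whole qualified name into one word. It does not give "::" a stop of its own, which is the "to some extent" part.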

--
Deniz Dogan

Elena

Jun 17, 2010, 4:16:52 PM

You may be interested in Emacs' standard library "thingatpt" and
related contributed libraries: http://www.emacswiki.org/emacs/ThingAtPoint
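
For a quick feel of what thingatpt offers, here is a small sketch (fundamental mode assumed, `my-things-at-start` is a hypothetical helper) comparing the "word" and the "symbol" found at the same position.

```elisp
(require 'thingatpt)

(defun my-things-at-start (text)
  "Return a list of the word and the symbol at the start of TEXT.
Hypothetical helper for illustration only."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (list (thing-at-point 'word)
          (thing-at-point 'symbol))))

(my-things-at-start "foo_bar baz")
```

With the standard syntax table the underscore is a symbol constituent rather than a word constituent, so the word should come back as "foo" and the symbol as "foo_bar".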

Xah Lee

Jun 18, 2010, 1:30:32 AM

> http://stackoverflow.com/questions/2078855/about-the-forward-and-back...
> http://stackoverflow.com/questions/1771102/changing-emacs-forward-wor...

>
> So to wrap up, the point of this post is to kick-start a discussion about
> why the word boundaries in Vanilla Emacs (specifically GNU Emacs 23.1.50.1
> in my case) seem to be so awkward and unintuitive.
>
> Regards,
> Paul Drummond

Good point.

I remember i felt something similar some 5 or 7 years ago and was
annoyed. But now i can't remember any detail... i just got used to
emacs and can't say i find it being problem at all.

actually, i think point is a valid one and a bit technically involved
in detail.

i'll have to study this in detail some other day but here's some
points.

For testing, save a file with this line as content:
something in the water does not compute

Now, you can try the word movement in different editors.

I tested this on Notepad, Notepad++, vim, emacs, Mac's TextEdit.

In short, different text editors all have a bit different behavior.
Here, Notepad, Notepad++, vim have the same behavior, while emacs and
TextEdit have similar behavior.

In Notepad, Notepad++, vim, the cursor always ends at the beginning of
each word.

In emacs and TextEdit, they end in the beginning of the word if you
are using backward-word, but ends at the end of the word if you are
using forward-word.

That's the first major difference.

--------------------------------------------------
Now, try this line:

something !! in @@ the ## water $$ does %% not ^^ compute

Now, vim and Notepad++ 's behavior are identical. Their behavior is
pretty simple and like before. They simply put the cursor at the
beginning of each string sequence, doesn't matter what the characters
are. Notepad is similar, except that it moves into between %%.

emacs and TextEdit behaved similarly.
Emacs will skip the symbol clusters entirely, except %%. (this depends
on what mode you are in)
TextEdit will also stop in middle of $$ and ^^, otherwise skip the
other symbols clusters entirely.

So, from this, it is clear that different editors have different
concepts of syntax group, or no such concept at all.

I understand well the emacs case. Emacs has a syntax table concept,
that groups chars into classes such as “whitespace”, “word”,
“symbol”, “punctuation”, ...etc. When you use backward-word, it simply
moves until it reaches a char that's not in the “word” group. So,
depending on which mode you are in, it'll either skip a character
sequence of identical chars entirely, or stop at their boundary. And
if the char sequence is of different symbols such as !@#$%&*() then
emacs may go into middle of them.

The question is whether other editors have a syntax group notion, or
whether their word movement behavior depends on the language mode at all.

--------------------------------------------------

Now, the interesting question is which model is more efficient for
general everyday coding of different languages.

First question is: is it more efficient in general for forward/
backward word motions to always land in front of the word as in vim,
Notepad, Notepad++ ?

Certainly i think it is more intuitive that way. But otherwise i don't
know. I'll have to do research on this some day.

The second question is whether it is good to have the movement
dependent on the language mode. Again i don't know.

Though, i do find emacs syntax table annoying from my experience of
working with it a bit in the past few years... from the little i know,
i felt that it doesn't do much, its power to model syntax is quite
weak, and very complicated to use... but i don't know for sure.

Btw, one of your example, this one:

Page *page = new _Page(this);
page.load();

i cannot duplicate.

Xah
∑ http://xahlee.org/

☄

Xah Lee

Jun 18, 2010, 3:06:30 AM

did some more study on this.

wrote up a cleaned up version here:
http://xahlee.blogspot.com/2010/06/text-editors-cursor-movement-behavior.html

here's an excerpt of the question:

-------------------------
Now, create a file of this content for more test.

something in the water does not compute

something !! in @@ the ## water $$ does %% not ^^ compute

something!!in@@the##water$$does%%not^^compute
(defun insert-p-tag () "Insert at cursor point."
(interactive) (insert "<p></p>") (backward-char 4))
for (my $i = 0; $i < 9; $i++) { print "done!";}
<a>a b c d e</a>

Answer this:

* Do the positions where the cursor stops depend on whether you are
moving left or right?
* Does the word motion behavior change depending on what language
mode you are in?
* What is your editor? on what OS?

Thanks.

Xah

Uday S Reddy

Jun 18, 2010, 3:24:28 AM

On 6/17/2010 3:20 AM, Stefan Monnier wrote:

>
> Emacs doesn't so much care about word-boundaries as about words.
> So when you forward-word, it just skip until the end of the next word,
> where "abc" is a word, but ";-( )" is not.

> So in many cases, it ends up doing in one step what VI would do in [two]:

> first skip over the non-word chars, and then skip the next few
> word-chars, whereas VI would stop after the run of non-word chars and
> stop again after the subsequent run of word chars.

Indeed, reducing two down to one is an advantage.

But if I have "abs;-()" and I want to delete the whole jing bang, Emacs loses
big time!
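
One possible workaround, sketched under the assumption that "the whole jing bang" means everything up to the next whitespace. `my-kill-to-whitespace` is a hypothetical command name. It ignores syntax classes entirely, so it behaves the same in every mode.

```elisp
(defun my-kill-to-whitespace ()
  "Kill from point up to the next space, tab, or newline.
Hypothetical command; it does not consult the syntax table."
  (interactive)
  (kill-region (point)
               (progn (skip-chars-forward "^ \t\n") (point))))
```

With point before "abs;-() rest", one call should kill "abs;-()" and leave " rest".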

Cheers,
Uday

jpkotta

Jun 24, 2010, 10:43:56 AM

On Jun 23, 4:02 am, Gary <help-gnu-em...@garydjones.name> wrote:

> Paul Drummond writes:
> > ** Example 2
>
> > When editing C++ files I often need to delete the "ClassName::" part when
> > declaring functions in the header:
>
> > void ClassName::function();
> > ^
>
> > With point at the start of ClassName I want to press M-d twice to delete
> > ClassName and :: but "::" isn't recognised as a word. In Vim I just
>

> Twice? Three times, shirley? Class and Name are both words...
>
> (As you might guess, my pet peeve about the word boundary recognition is
> when programming using camelcase.)
>

Try c-subword-mode for CamelCase. There is also
capitalized-words-mode, but I've never tried it.

andreas...@easy-emacs.de

Jun 25, 2010, 6:33:53 AM

to help-gn...@gnu.org

> ^

> With point at the end of the word "Understanding" I hit C-w (which I bind to
> backward-kill-word) and the word "Understanding" is killed as expected. But
> when I hit C-w again, the point kills to the colon. Why? Why is colon a
> word-boundary but the closing square bracket isn't?
>

> ** Example 2
>
> When editing C++ files I often need to delete the "ClassName::" part when
> declaring functions in the header:
>
> void ClassName::function();
> ^
>
> With point at the start of ClassName I want to press M-d twice to delete

> ClassName and :: but "::" isn't recognised as a word. In Vim I just type
> "dw" twice and it *just works*.
>

Hi,

seems not a question of word-boundaries, but a feature:

as you describe, Vim says: when word-chars are under cursor, kill them.
When non-word chars are there, kill until next word.

Interesting.

> ** Example 3
>
> I have loads of problems when deleting and navigating words over multiple
> lines. In the following C++ code for instance:
>
> Page *page = new _Page(this);
> page.load();
> ^
>
> When point is after "page", before the dot on the second line and I hit M-b
> (backward-word) point ends up at the first opening bracket of "Page(" !!!
>
> Again, vim does the right thing here - pressing 'b' takes the point to the
> closing bracket of Page(this) so it doesn't recognise the semi-colon as a
> bracket which is intuitive and what I would expect. This is really the
> point I am trying to make. I have never taken the time to understand the
> behaviour of word boundaries in Vim because *it just works*. In Emacs I am
> forced to think about word boundaries because Emacs keeps surprising me with
> its weird behaviour!

Forward-moves stop after the object, backward-moves before.
When a mode defines '()' as word-characters, M-x backward-word will stop
at the semi-colon at your example.

Andreas

>
> Note: My examples happen to be C++ but I use lots of other languages too
> including elisp, Clojure, JavaScript, Python and Java and the
> word-boundaries seem to be wrong for all of them.
>
> I have tried several different elisp solutions but each one has at least one
> feature that isn't quite right. Here are some links I kept, I've tried many
> other solutions but don't have the links to hand:
>

> http://stackoverflow.com/questions/2078855/about-the-forward-and-backward-a-word-behaviour-in-emacs
> http://stackoverflow.com/questions/1771102/changing-emacs-forward-word-behaviour/1772365#1772365

Paul Drummond

Jun 26, 2010, 6:46:56 AM

to Gary, help-gn...@gnu.org

On 23 June 2010 10:02, Gary <help-gn...@garydjones.name> wrote:

Paul Drummond writes:
> ** Example 2
>
> When editing C++ files I often need to delete the "ClassName::" part when
> declaring functions in the header:
>
> void ClassName::function();
> ^
>
> With point at the start of ClassName I want to press M-d twice to delete
> ClassName and :: but "::" isn't recognised as a word. In Vim I just

Twice? Three times, shirley? Class and Name are both words...

Yeah, I agree about CamelCase but I wanted to keep the example simple ;)

Because it needs to be defined somewhat differently for natural
languages and different programming languages, at a guess. What a word
is depends entirely on the context you (and I) decide, and they may well
be different (see two versus three key presses above).

But each context I use has a major mode and I would expect each major mode to have sensible default word boundaries but they don't.

Paul Drummond.

Paul Drummond

Jun 26, 2010, 6:53:08 AM

to help-gn...@gnu.org

Thanks for the responses guys.

I think the point I am trying to make here is that it's a *big* task to fix word boundaries for every case (every word-related key binding multiplied by each language/major mode I use!).

I presume that Emacs hackers either a) put up with it or b) spend a lot of time fixing each case until they are happy.

I suspect the answer is b. ;-)

I wish there was a single minor-mode that fixes all the word boundary issues for every major-mode I use! I can but dream. Or maybe I will get round to doing it myself one day! ;)

Cheers,
Paul Drummond

Thien-Thi Nguyen

Jun 26, 2010, 7:22:08 AM

to Paul Drummond, help-gn...@gnu.org

() Paul Drummond <paul.d...@iode.co.uk>
() Sat, 26 Jun 2010 11:53:08 +0100

I suspect the answer is b. ;-)

There is another answer: (c) looking at sexps instead of words.
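
To illustrate the sexp view, a small sketch assuming fundamental mode with the standard syntax table. `my-forward-sexp-stops` is a hypothetical helper. It records where C-M-f (`forward-sexp`) stops twice in a row.

```elisp
(defun my-forward-sexp-stops (text)
  "Return the two positions reached by calling `forward-sexp' twice on TEXT.
Hypothetical helper for illustration only."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (stops)
      (dotimes (_ 2)
        (forward-sexp 1)
        (push (point) stops))
      (nreverse stops))))

(my-forward-sexp-stops "Page(this);")
```

The first stop should be right after "Page" and the second right after the closing parenthesis, so the balanced "(this)" is crossed as a single unit.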

thi

ken

Jun 26, 2010, 7:49:05 PM

to Paul Drummond, help-gn...@gnu.org

On 06/26/2010 06:53 AM Paul Drummond wrote:
> Thanks for the responses guys.
>
> I think the point I am trying to make here is that it's a *big* task to
> fix word boundaries for every case (every word-related key binding
> multiplied by each language/major mode I use!).
>
> I presume that Emacs hackers either a) put up with it or b) spend a lot
> of time fixing each case until they are happy.
>

> I suspect the answer is b. ;-)
>

> I wish there was a single minor-mode that fixes all the word boundary
> issues for every major-mode I use! I can but dream. Or maybe I will
> get round to doing it myself one day! ;)
>
> Cheers,
> Paul Drummond

Is it possible to specify word boundaries for a particular mode?

--
Find research and analysis on US healthcare, health insurance,
and health policy at: <http://healthpolicydaily.blogspot.com/>

Deniz Dogan

Jun 26, 2010, 11:05:42 PM

to geb...@mousecar.com, help-gn...@gnu.org

2010/6/27 ken <geb...@mousecar.com>:

>
> On 06/26/2010 06:53 AM Paul Drummond wrote:
>> Thanks for the responses guys.
>>
>> I think the point I am trying to make here is that it's a *big* task to
>> fix word boundaries for every case (every word-related key binding
>> multiplied by each language/major mode I use!).
>>
>> I presume that Emacs hackers either a) put up with it or b) spend a lot
>> of time fixing each case until they are happy.
>>
>> I suspect the answer is b. ;-)
>>
>> I wish there was a single minor-mode that fixes all the word boundary
>> issues for every major-mode I use! I can but dream. Or maybe I will
>> get round to doing it myself one day! ;)
>>
>> Cheers,
>> Paul Drummond
>
>
> Is it possible to specify word boundaries for a particular mode?
>

Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.

Regarding camel case word jumping, see subword-mode (previously known
as c-subword-mode) which is part of Emacs.
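
A hedged sketch of both pieces of advice. Hooking into `c-mode-common-hook` is just one plausible place to enable the mode, and `my-subword-stops` is a hypothetical helper that calls `subword-forward` directly to show where subword motion would stop.

```elisp
(require 'subword)

;; Enable CamelCase-aware word commands in CC Mode buffers
;; (an assumption about where you want it; any mode hook works).
(add-hook 'c-mode-common-hook #'subword-mode)

(defun my-subword-stops (text)
  "Return the two positions reached by calling `subword-forward' twice on TEXT.
Hypothetical helper for illustration only."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (stops)
      (dotimes (_ 2)
        (subword-forward 1)
        (push (point) stops))
      (nreverse stops))))

(my-subword-stops "ClassName::function")
```

The stops should fall after "Class" and after "Name", which is the "three times" behaviour Gary was asking about.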

--
Deniz Dogan

Xah Lee

Jun 27, 2010, 10:58:10 AM

Here's the answer again in case you missed it.

• Text Editor's Cursor Movement Behavior (emacs, vi, Notepad++)
http://xahlee.org/emacs/text_editor_cursor_behavior.html

plain text version follows.
-------------------------------------
Text Editor's Cursor Movement Behavior (emacs, vi, Notepad++)

Xah Lee, 2010-06-17

This article discusses some differences of cursor movement behavior
among editors. That is, when you press “Ctrl+→”, on a line of
programming language code with lots of different sequences of symbols,
where exactly does the cursor stop?

--------------------------------------------------
Always End at Beginning of Word?

Type the following in your favorite text editor.

something in the water does not compute

Now, you can try the word movement in different editors.

I tested this on Notepad, Notepad++, vim, emacs, Mac's TextEdit.

In Notepad, Notepad++, vim, the cursor always ends at the beginning of
each word.

In emacs, TextEdit, Xcode, the cursor ends at the beginning of the word
if you are moving backward, but at the end of the word if you are
moving forward.

That's the first major difference.

--------------------------------------------------
Does Movement Depend on the Language Mode?

Now, try this line:

something !! in @@ the ## water $$ does %% not ^^ compute

Now, vim and Notepad++'s behavior are identical. Their behavior is
pretty simple and like before. They simply put the cursor at the
beginning of each string sequence, doesn't matter what the characters
are. Notepad is similar, except that it will move into between %%.

Emacs, TextEdit behaved similarly. Emacs will skip the symbol
clusters !!, @@, ##, ^^ entirely, while stopping at boundaries of $$
and %%. (when emacs is in text-mode) TextEdit will stop in middle of
$$ and ^^, but skip the other symbol clusters entirely.

I don't know about other editors, but i understand the behavior of
emacs well. Emacs has a syntax table concept. Each and every character
is classified into one of “whitespace”, “word”, “symbol”,
“punctuation”, and others. When you use backward-word, it first skips
any non-word chars, then moves until it reaches a char that's not in
the “word” group.

Each major mode usually has its own syntax table. So,
depending on which mode you are in, it'll either skip a character
sequence of identical chars entirely, or stop at their boundary.

(info "(elisp) Syntax Tables")
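
To see the classification for yourself, a small sketch using `char-syntax`. `my-syntax-classes` is a hypothetical helper, and it reports classes for a fresh temporary buffer, i.e. the standard syntax table rather than any particular major mode's table.

```elisp
(defun my-syntax-classes (text)
  "Return an alist mapping each character of TEXT to its syntax class.
Both are given as one-character strings, e.g. (\"a\" . \"w\").
Hypothetical helper; it uses the syntax table of a fresh temp buffer."
  (with-temp-buffer
    (mapcar (lambda (ch)
              (cons (string ch) (string (char-syntax ch))))
            (string-to-list text))))

(my-syntax-classes "a (:$")
```

In the result, "w" means word, a space means whitespace, "(" means open parenthesis, "." means punctuation and "_" means symbol. Running the same `char-syntax` calls inside a buffer in text-mode or a programming mode shows how the classes differ per mode.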

The question is whether other editors' word movement behavior changes
depending on what language mode they are currently in. And if so, how
does the behavior change? Do they use a concept similar to emacs's syntax
table?

In Notepad++, cursor word-motion behavior does not change with respect
to what language mode you are in. A 5 min test suggests the same is
true for vim.

--------------------------------------------------
More Test

Now, create a file of this content for more test.

something in the water does not compute
something !! in @@ the ## water $$ does %% not ^^ compute
something!!in@@the##water$$does%%not^^compute
(defun insert-p-tag () "Insert at cursor point."
(interactive) (insert "<p></p>") (backward-char 4))
for (my $i = 0; $i < 9; $i++) { print "done!";}
<a>a b c d e</a>

Answer this:

* Do the positions where the cursor stops depend on whether you are
moving left or right?
* Does the word motion behavior change depending on what language
mode you are in?
* What is your editor? on what OS?

--------------------------------------------------
Which is More Efficient?

Now, the interesting question is which model is more efficient for
general everyday coding of different languages.

First question is: is it more efficient in general for left/right word
motions to always land on the left boundary of the word as in vim,
Notepad, Notepad++ ?

Certainly i think it is more intuitive that way. But otherwise i don't
know.

The second question is: whether it is good to have the movement change
depending on the language mode.

I don't know. But again it seems more intuitive for the movement not to
depend on the mode, because then users have a good expectation of where
the cursor will stop regardless of what language they are coding in.
Though, of course it MAY be less efficient, because logically one'd
think that it might be better to have word motion behavior adapt to
different languages. But am not sure about this in real world situations.

Though, i do find emacs syntax table annoying from my experience of
working with it a bit in the past few years... from the little i know,
i felt that it doesn't do much, its power to model syntax is quite
weak, and very complicated to use... but i don't know for sure.

This article is inspired by Paul Drummond's question in gnu.emacs.help

--------------------------------------------------
2010-06-18

On 2010-06-17, Elena <egarr...@gmail.com> wrote:

is there some elisp code to move by tokens when a programming mode is
active? For instance, in the following C code:

double value = f ();

the point - represented by | - would move like this:

cc-mode has functions c-forward-token-1 and c-forward-token-2. (thanks
to Andreas Politz)

It is easy to write elisp code to do what you want, though, might be
tedious depending on what you mean by token, and whether you really
want the cursor to move by token. (might be too many stops)

Here's a function i wrote and have been using it for a couple of
years. You can mod it to get what u want. Basically that's the idea.
But depending what you mean by token, might be tedious to get it
right.

(defun forward-block ()
  "Move cursor forward to next occurrence of double newline char.
In most major modes, this is the same as `forward-paragraph', however,
this function behaves the same in any mode.
forward-paragraph is mode dependent, because it depends on
syntax table that has different meaning for “paragraph” depending on
mode."
  (interactive)
  (skip-chars-forward "\n")
  (when (not (search-forward-regexp "\n[[:blank:]]*\n" nil t))
    (goto-char (point-max))))

(defun backward-block ()
  "Move cursor backward to previous occurrence of double newline char.
See: `forward-block'"
  (interactive)
  (skip-chars-backward "\n")
  (when (not (search-backward-regexp "\n[[:blank:]]*\n" nil t))
    (goto-char (point-min))))

actually, you can just mod it so that it always just skip syntax
classes that's white space... but then if you have 1+1+8 that'll skip
the whole thing...
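
One way around the 1+1+8 problem, as a rough sketch: skip whitespace first, then move over a run of characters that share the syntax class of the character at point. `my-forward-syntax-run` is a hypothetical name, and what counts as a "token" here is simply whatever the current syntax table groups together.

```elisp
(defun my-forward-syntax-run ()
  "Skip whitespace, then move over one run of same-syntax characters.
Hypothetical command; a crude stand-in for token-wise motion."
  (interactive)
  (skip-chars-forward " \t\n")
  (unless (eobp)
    ;; char-syntax gives the class of the char at point, e.g. ?w for word;
    ;; skip-syntax-forward takes that class as a one-character string.
    (skip-syntax-forward (string (char-syntax (char-after))))))
```

On "double value = f ();" repeated calls should stop after "double", after "value", after "=" and so on, and on "1+1+8" the first call should stop right after the first "1" instead of skipping the whole expression.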

Xah
∑ http://xahlee.org/

☄

Xah Lee

Jun 27, 2010, 11:02:47 AM

On Jun 26, 8:05 pm, Deniz Dogan <deniz.a.m.do...@gmail.com> wrote:

> Regarding camel case word jumping, see subword-mode (previously known
> as c-subword-mode) which is part of Emacs.

Thanks for the info on subword-mode!

great discovery. Few years ago i searched the web and found one or two
camelCase mode, i installed it and it works, but now a bundled package
is much better!

thanks.

Xah

Samuel Wales

Dec 10, 2012, 9:11:01 PM

to Paul Drummond, help-gn...@gnu.org

On 6/26/10, Paul Drummond <paul.d...@iode.co.uk> wrote:
> I wish there was a single minor-mode that fixes all the word boundary issues
> for every major-mode I use! I can but dream. Or maybe I will get round to
> doing it myself one day! ;)

I have been using Emacs for decades, but I have not gotten used to its
navigation, killing, and marking boundary assumptions yet.

I'm always fixing up whitespace, going back and deleting less so as
not to delete punctuation, wanting the whole word or only part of it,
etc. I think Emacs does the wrong thing somewhat more than it does
the right thing in this case. Or maybe that is because it is more
noticeable when it does the wrong thing.

I keep thinking I should have gotten used to it by now. :)

Given the great libraries out there for other things (e.g. scrolling),
you'd think there might be a customizable library for different
preferences for all syntax levels, perhaps based on thingatpt.

Did you find anything, Paul?

Samuel

--
The Kafka Pandemic: http://thekafkapandemic.blogspot.com

The disease DOES progress. MANY people have died from it. ANYBODY
can get it. There is no hope without action.

ken

Dec 11, 2012, 6:18:59 AM

to Deniz Dogan, help-gn...@gnu.org

On 06/26/2010 11:05 PM Deniz Dogan wrote:
> 2010/6/27 ken<geb...@mousecar.com>:
>>

>> On 06/26/2010 06:53 AM Paul Drummond wrote:
>>> Thanks for the responses guys.
>>>
>>> I think the point I am trying to make here is that it's a *big* task to
>>> fix word boundaries for every case (every word-related key binding
>>> multiplied by each language/major mode I use!).
>>>
>>> I presume that Emacs hackers either a) put up with it or b) spend a lot
>>> of time fixing each case until they are happy.
>>>
>>> I suspect the answer is b. ;-)
>>>

>>> I wish there was a single minor-mode that fixes all the word boundary
>>> issues for every major-mode I use! I can but dream. Or maybe I will
>>> get round to doing it myself one day! ;)
>>>

>>> Cheers,
>>> Paul Drummond
>>
>>
>> Is it possible to specify word boundaries for a particular mode?
>>
>
> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.

Thanks for the pointer to that function.

The behavior I see in need of repair is the role of so-called "comments"
in sentence syntax.</tag>  For instance, immediately before this
sentence are two spaces... which should signify the end of the previous
sentence. But functions like "forward-sentence" and "fill-paragraph"
and "backward-sentence" don't recognize it.

Said another way, the "</tag>" string obscures the relationship between
the period before it and the two spaces after it and so fails to see
that one sentence ends and another starts. This occurs in text-mode and
seems to be inherited by other modes.

If I'm reading "modify-syntax-entry" correctly, the default meanings of
'<' and '>' are, respectively, beginning and end of comment, so
modifying them wouldn't fix this problem. Or can this be remedied by a
change in the syntax table? Or is this a bug?

Eric Abrahamsen

Dec 11, 2012, 7:03:55 AM

to help-gn...@gnu.org

For this particular case, I think you can modify the value of the
`sentence-end' variable (which is returned by the `sentence-end'
function? The whole thing is a little confusing). You'd probably be best
off starting with the docstring for the sentence-end function, and
working back from there.

I think the `sentence-end' variable is automatically buffer-local, which
means if you change it in a mode-hook it ought to work the way you want.
I agree that the whole syntax thing feels like a very well-polished
hack.
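
A minimal sketch of the mode-hook idea. The regexp is a simplified guess, not the stock default: it only allows any number of <...> tags between the closing punctuation and the whitespace. The `my-` names and the choice of `html-mode-hook` are assumptions for illustration.

```elisp
(defvar my-tag-aware-sentence-end
  (concat "[.?!][]\"')}]*"        ; sentence punctuation plus closing quotes etc.
          "\\(?:<[^>\n]*>\\)*"    ; any number of <...> tags
          "\\(?:$\\|\t\\|  \\)"   ; end of line, a tab, or two spaces
          "[ \t\n]*")             ; any further whitespace
  "Hypothetical `sentence-end' regexp that tolerates tags after the period.")

(defun my-use-tag-aware-sentence-end ()
  "Make `sentence-end' buffer-local and tag-aware.  Hypothetical hook function."
  (set (make-local-variable 'sentence-end) my-tag-aware-sentence-end))

(add-hook 'html-mode-hook #'my-use-tag-aware-sentence-end)
```

With this in effect, M-e (`forward-sentence`) on "One sentence.</tag>  Another one here." should stop right after "</tag>" instead of running on to the end of the second sentence.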

E

ken

Dec 11, 2012, 10:17:36 AM

to Eric Abrahamsen, GNU Emacs List

On 12/11/2012 07:03 AM Eric Abrahamsen wrote:
> ken<geb...@mousecar.com> writes:
>
>> On 06/26/2010 11:05 PM Deniz Dogan wrote:
>>> 2010/6/27 ken<geb...@mousecar.com>:
>>>>
>>>> On 06/26/2010 06:53 AM Paul Drummond wrote:
>>>>> Thanks for the responses guys.
>>>>>

>>>>> ....

Eric,

Yes, that would be the variable to adjust. I took a hard look at it and
discussed it (I believe) on this list years ago, but never came up with
a fix. As I see it, there are two problems:

First, "one" of the items in that RE would need to be "zero or more
consecutive instances of '<' followed by any number of other characters
up until the next '>' is found." E.g., the RE would need to be able to
find the end of this sentence.)</q></div> Though
I've used REs successfully in quite a few instances and so with a small
bit of help could probably figure that part out, there's a second issue.

My considered opinion is that in the above and similar examples, the end
of the sentence is immediately after the period ('.')... or question
mark, exclamation mark, etc. and not after the </div>. That is where
the point should go when forward-sentence is executed. This means that
no RE would work because, once it finds the RE-defined sentence-end, it
then needs to go backwards within the found string until it encounters
[.!?]+ and then search forward again to the first character after. IOW,
unless I'm missing some capability of REs, "sentence-end" needs to be a
function rather than an RE and would be a different function than one
which finds the beginning of a sentence.
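
The "match the whole thing, then back up to the punctuation" step can be sketched with a capturing group and `match-end', which supports the point that this takes a function and not just a regexp stored in a variable. The following is a rough illustration only: `my-forward-sentence-before-tags' is a made-up name, and the regexp handles only closing tags and ignores the paragraph boundaries that the real `forward-sentence' respects.

```elisp
(defun my-forward-sentence-before-tags ()
  "Move point to just after the next sentence-ending punctuation.
Closing tags between the punctuation and the following whitespace are
matched but skipped over.  Return the new position, or nil if no
sentence end is found (point then stays put)."
  (interactive)
  (when (re-search-forward
         (concat "\\([.?!]+\\)"          ; group 1: the punctuation itself
                 "\\(?:</[^>\n]*>\\)*"   ; closing tags, not part of group 1
                 "\\(?:$\\|[ \t\n]\\)")  ; end of line or whitespace
         nil t)
    (goto-char (match-end 1))))
```

Called in a buffer containing "One sentence.</q></div> Next one." with point at the start, this should leave point right after the first period.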

Eric Abrahamsen

Dec 12, 2012, 2:02:03 AM

to help-gn...@gnu.org

I'm getting way out of my depth here, both regarding regexps and emacs'
sentence-related shenanigans, but you could consider advising the
`sentence-end' function so that it checks the current major mode, and
delegates to a different sentence-end function depending on the mode (or
declines to handle and bails to the built-in sentence-end).

The individual mode-specific sentence-end functions look at the text
after point, and return a different regexp every time, one specifically
tailored to this particular sentence in this particular mode. The call to
`forward-sentence' or whatever happily uses a different regexp every
time it is called.

Feels hacky, but I guess `sentence-end' is already doing this in a
sense -- potentially returning a different regexp every time.

My brain is exhausted!

E

ken

Dec 12, 2012, 9:32:31 AM

to GNU Emacs List

On 12/12/2012 02:02 AM Eric Abrahamsen wrote:
> ken<geb...@mousecar.com> writes:
>
>> On 12/11/2012 07:03 AM Eric Abrahamsen wrote:
>>> ken<geb...@mousecar.com> writes:
>>>
>>>> On 06/26/2010 11:05 PM Deniz Dogan wrote:
>>>>> 2010/6/27 ken<geb...@mousecar.com>:
>>>>>>
>>>>>> On 06/26/2010 06:53 AM Paul Drummond wrote:
>>>>>>> Thanks for the responses guys.
>>>>>>>
>>>>>>> ....
>>>>>>>

[In my original post the paragraph below was unclear, so I changed it.]

>> My considered opinion is that in the above and similar examples, the
>> end of the sentence is immediately after the period ('.')... or
>> question mark, exclamation mark, etc. and not after the</div>. That
>> is where the point should go when forward-sentence is executed. This
>> means that no RE would work because, once it finds the RE-defined
>> sentence-end, it then needs to go backwards within the found string
>> until it encounters [.!?]+ and then move point one char forward to the
>> character after. IOW, unless I'm missing some capability of REs,
>> "sentence-end" needs to be a function rather than an RE and would be a
>> different function than one which finds the beginning of a sentence.
>
> I'm getting way out of my depth here, both regarding regexps and emacs'
> sentence-related shenanigans, but you could consider advising the
> `sentence-end' function so that it checks the current major mode, and
> delegates to a different sentence-end function depending on the mode (or
> declines to handle and bails to the built-in sentence-end).
>
> The individual mode-specific sentence-end functions look at the text
> after point, and return a different regexp every time, one specifically
> tailored to this particular sentence in this particular mode. The call to
> `forward-sentence' or whatever happily uses a different regexp every
> time it is called.
>
> Feels hacky, but I guess `sentence-end' is already doing this in a
> sense -- potentially returning a different regexp every time.
>
> My brain is exhausted!
>
> E

If one were to write a mode-specific replacement for the existing
"forward-sentence" and "sentence-end", what are some ways in elisp to
ensure that they're invoked when working in that mode? Would it be
enough to include (the recoded) "forward-sentence" and "sentence-end" in
the code for that mode...? or would some kind of specific hook language
need to be included in ~/.emacs?
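
One option that needs no hook at all is command remapping in the mode's own keymap. A minimal sketch, assuming a hypothetical mode `my-markup-text-mode' derived from `text-mode' and a hypothetical replacement command `my-markup-forward-sentence' (neither exists in Emacs; the command body is a deliberately simple stand-in):

```elisp
(defun my-markup-forward-sentence ()
  "Hypothetical replacement for `forward-sentence'.
Stops right after the punctuation, before any closing tags."
  (interactive)
  (when (re-search-forward
         "\\([.?!]+\\)\\(?:</[^>\n]*>\\)*\\(?:$\\|[ \t\n]\\)" nil t)
    (goto-char (match-end 1))))

(define-derived-mode my-markup-text-mode text-mode "MyMarkup"
  "Hypothetical text mode with its own sentence motion.")

;; Inside this mode only, any key bound to `forward-sentence' (M-e by
;; default) runs the replacement instead.
(define-key my-markup-text-mode-map [remap forward-sentence]
  #'my-markup-forward-sentence)
```

Remapping only changes what the keys do. Lisp code that calls `forward-sentence' or `sentence-end' directly is not affected, so filling and similar commands would still need a different mechanism such as a buffer-local `sentence-end' or advice.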

Eric Abrahamsen

Dec 12, 2012, 11:27:45 PM

to help-gn...@gnu.org

I was considering overloading the `sentence-end' function in a
mode-hook, but I think it's highly likely that you'd end up polluting
other modes. So probably the safest thing to do is to advise it at the
top level, i.e. in your ~/.emacs file, and then check the current mode from
there. Something like the following totally untested code:

--8<---------------cut here---------------start------------->8---
(defadvice sentence-end (around my-check-sentence-end activate)
  "Possibly short-circuit the `sentence-end' function."
  ;; `ad-do-it' is only valid in `around' advice, and the result must be
  ;; stored in `ad-return-value' to replace the original return value.
  (cond ((derived-mode-p 'emacs-lisp-mode)
         (setq ad-return-value (emacs-lisp-sentence-end)))
        ((derived-mode-p 'some-other-mode)
         (setq ad-return-value (other-mode-sentence-end)))
        (t ad-do-it)))

(defun emacs-lisp-sentence-end ()
  ;; examine text around point and return an appropriate regexp
  )

(defun other-mode-sentence-end ()
  ;; return a different regexp
  )
--8<---------------cut here---------------end--------------->8---

That ought to work, but I'm not guaranteeing that this is the best
approach!

E

Eric Abrahamsen

Dec 13, 2012, 12:59:57 AM

to help-gn...@gnu.org

> (defadvice sentence-end (around my-check-sentence-end activate)
>   "Possibly short-circuit the `sentence-end' function."
>   (cond ((derived-mode-p 'emacs-lisp-mode)
>          (setq ad-return-value (emacs-lisp-sentence-end)))
>         ((derived-mode-p 'some-other-mode)
>          (setq ad-return-value (other-mode-sentence-end)))
>         (t ad-do-it)))

I'm in the habit of using `derived-mode-p', but on second thought you'll
probably just want to go with the simpler but more exacting test:
(eq major-mode 'emacs-lisp-mode)
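
For comparison, here is a sketch of the same idea using the exact `major-mode' test together with `advice-add', the newer advice interface (Emacs 24.4 and later, so this assumes a more recent Emacs than the thread did). The names `my-html-sentence-end' and `my-sentence-end-by-mode' are hypothetical, as is the regexp, and `html-mode' is just an example target.

```elisp
(defvar my-html-sentence-end
  "[.?!]+\\(?:</[^>\n]*>\\)*\\(?:$\\|[ \t]\\)[ \t\n]*"
  "Hypothetical sentence-end regexp used only in `html-mode'.")

(defun my-sentence-end-by-mode (orig-fun &rest args)
  "Return a mode-specific regexp, or fall back to ORIG-FUN."
  (if (eq major-mode 'html-mode)        ; exact match, derived modes excluded
      my-html-sentence-end
    (apply orig-fun args)))

(advice-add 'sentence-end :around #'my-sentence-end-by-mode)
```

Because the test is `eq' rather than `derived-mode-p', modes derived from `html-mode' keep the built-in behaviour. The advice can be removed again with (advice-remove 'sentence-end #'my-sentence-end-by-mode).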