
changing word boundaries


Ernest Adrogué

Oct 18, 2009, 12:27:40 PM
to help-gn...@gnu.org
Hi there,

The Catalan language has a ligature consisting of one
"l" character, followed by a middle dot ("·"), followed
by another "l". See here for more details:
http://en.wikipedia.org/wiki/L·l#Catalan

Is there a way to make Emacs aware of this, so that it
doesn't treat a word containing "l·l" as two separate
words?

Thanks.

PS. Please CC me, if you reply to this.

--
Ernest


Peter Dyballa

Oct 18, 2009, 3:24:00 PM
to Ernest Adrogué, help-gn...@gnu.org

On 18.10.2009 at 18:27, Ernest Adrogué wrote:

> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?


How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT at
U+0140. The problem is that · becomes a word constituent only between
two l's; in so many other cases it's a multiplication sign, a comma,
a name separator, some kind of bullet sign...

--
Greetings

Pete

The human animal differs from the lesser primates in his passion for
lists of "Ten Best."
– H. Allen Smith

Andreas Politz

Oct 18, 2009, 5:08:08 PM
to help-gn...@gnu.org, Ernest Adrogué
Ernest Adrogué <eadr...@gmx.net> writes:

> Hi there,
>
> The Catalan language has a ligature consisting in one
> "l" character, followed by a middle dot ("·"), followed
> by another "l". See here for more details:
> http://en.wikipedia.org/wiki/L·l#Catalan
>

> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?
>

> Thanks.
>
> PS. Please CC me, if you reply to this.

You could use dynamic syntax-tables via font-lock.

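(add-hook 'text-mode-hook
          (lambda ()
            ;; Let word/sexp scanning honor `syntax-table' text properties.
            ;; `make-local-variable' (not `make-variable-buffer-local')
            ;; keeps the change in the current buffer only.
            (set (make-local-variable 'parse-sexp-lookup-properties) t)
            ;; get font-lock started
            (unless font-lock-defaults
              (setq font-lock-defaults '(nil t)))
            ;; Buffer-local list of syntactic keywords (this variable is
            ;; obsolete since Emacs 24.1 in favor of
            ;; `syntax-propertize-function').
            (add-to-list (make-local-variable 'font-lock-syntactic-keywords)
                         ;; let ! between 2*a have word syntax
                         '("a\\(!\\)a" 1 "w"))))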


Replace `a' and `!' with your characters and it'll work,
hopefully.
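
Applied to the Catalan case, that means `l' for `a' and the middle dot
for `!'. A sketch along those lines (untested here), assuming the init
file is read as UTF-8 so that `·' is U+00B7; the `[lL]' classes are an
addition of this sketch so that upper-case "L·L" is covered as well:

(add-hook 'text-mode-hook
          (lambda ()
            (set (make-local-variable 'parse-sexp-lookup-properties) t)
            (unless font-lock-defaults
              (setq font-lock-defaults '(nil t)))
            ;; Word syntax for a middle dot only when it sits between two
            ;; l's (either case), as in "col·lecció" or "COL·LECCIÓ".
            (add-to-list (make-local-variable 'font-lock-syntactic-keywords)
                         '("[lL]\\(·\\)[lL]" 1 "w"))))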

-ap

Andreas Politz

Oct 18, 2009, 5:09:39 PM
to help-gn...@gnu.org, Ernest Adrogué

Ernest Adrogué

Oct 18, 2009, 5:19:13 PM
to Peter Dyballa, help-gn...@gnu.org
Hello,

18/10/09 @ 21:24 (+0200), thus spake Peter Dyballa:


> How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT at
> U+0140. The problem is that · only between two l becomes a word
> constituent and in so many other cases it's a multiplication sign, a
> comma, a name separator, some kind of bullet sign...

Seems the way to go, yes. Unfortunately, everybody still
uses the middle dot; for example, spell-checkers think ŀ is
a misspelling.

Cheers.

--
Ernest


Ernest Adrogué

Oct 19, 2009, 8:06:36 PM
to help-gn...@gnu.org
18/10/09 @ 23:08 (+0200), thus spake Andreas Politz:

> You could use dynamic syntax-tables via font-lock.
>
> (add-hook 'text-mode-hook
> (lambda nil
> (set (make-variable-buffer-local
> 'parse-sexp-lookup-properties) t)
> ;; get font-lock started
> (unless font-lock-defaults
> (setq font-lock-defaults '(nil t)))
> (add-to-list
> (make-variable-buffer-local
> 'font-lock-syntactic-keywords)
> ;; let ! between 2*a have word syntax
> '("a\\(!\\)a" 1 "w"))))
>
>
> Replace `a' and `!' with your characters and it'll work,
> hopefully.

It does what I wanted. :)
Thanks!

Ernest


Dave Love

Nov 1, 2009, 2:15:23 PM
to
Peter Dyballa <Peter_...@Web.DE> writes:

> How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT at
> U+0140. The problem is that · only between two l becomes a word
> constituent and in so many other cases it's a multiplication sign, a
> comma, a name separator, some kind of bullet sign...

It may be misused, but U+00B7 is MIDDLE DOT (punctuation). BULLET is
U+2022, and the mathematical DOT OPERATOR is U+22C5. It surely doesn't
matter much in this context, though; a lot of character syntaxes have
long been wrong in Emacs anyhow.
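
To see which code point is which, and what syntax the current buffer's
syntax table gives each of them, something like this can be evaluated
with M-: (a small sketch; `get-char-code-property' is available from
Emacs 23 on):

(mapcar (lambda (c)
          ;; Code point, the character itself, its Unicode name, and its
          ;; syntax class in the current buffer (e.g. ?w for word, ?_ for
          ;; symbol, ?. for punctuation).
          (format "U+%04X %c %s syntax=%c"
                  c c (get-char-code-property c 'name) (char-syntax c)))
        '(#x00B7 #x2022 #x22C5 #x0140))

The names should come back as MIDDLE DOT, BULLET, DOT OPERATOR and
LATIN SMALL LETTER L WITH MIDDLE DOT; the syntax column depends on the
buffer's mode and the Emacs version.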

Dave Love

Nov 1, 2009, 2:09:53 PM
to Ernest Adrogué, help-gn...@gnu.org
Ernest Adrogué <eadr...@gmx.net> writes:

> Hi there,
>
> The Catalan language has a ligature consisting in one
> "l" character, followed by a middle dot ("·"), followed
> by another "l". See here for more details:
> http://en.wikipedia.org/wiki/L·l#Catalan
>
> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?

[You're probably not really interested in word boundaries, just word
constituents. For an illustration of the difference, see the variable
`word-combining-categories' and what `capitalized-words-mode' does in
Emacs 23.]

You should define a Catalan language environment to be used in ca_ES
locales. (I'm surprised I didn't do it, as there's a relevant input
method.) It should set the base syntax of · to word, and set a suitable
default input method. The existing one, `catalan-prefix', should
presumably bind `~.' to `·', as in latin-prefix; it doesn't currently,
and maybe needs other fixes.
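
A sketch of that missing binding, untested, assuming (as in the Emacs
sources) that `catalan-prefix' is defined in the Quail file latin-pre;
`quail-defrule' with a package name adds the rule to that package once
the file has been loaded:

(eval-after-load "latin-pre"
  ;; In catalan-prefix, make the key sequence ~ . insert MIDDLE DOT.
  '(quail-defrule "~." ?· "catalan-prefix"))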

The environment would be something like this (untested), which is
probably better than trying to use categories. [The default Latin-1
character set is overridden in, say, ca_ES.UTF-8.]

(push '("ca" . "Catalan") locale-language-names)

(set-language-info-alist
 "Catalan" '((tutorial . "TUTORIAL.es")   ; maybe...
             (charset iso-8859-1)
             (coding-system iso-latin-1 iso-latin-9)
             (coding-priority iso-latin-1)
             (input-method . "catalan-prefix")
             (nonascii-translation . iso-8859-1)
             (unibyte-display . iso-latin-1)
             (setup-function
              . (lambda ()
                  (modify-syntax-entry ?· "w" (standard-syntax-table))))
             (exit-function
              . (lambda ()
                  (modify-syntax-entry ?· "_" (standard-syntax-table))))
             ;; Fixme:
             ;; (sample-text . "Spanish (Español) ¡Hola!")
             (documentation . "\
This language environment uses the Latin-1 character set, sets
the default input method to \"catalan-prefix\", and sets the
syntax of `·' to word. It selects the Spanish tutorial, in the
absence of a Catalan translation."))
 '("European"))

You could make a bug report if you have more luck than me with reports
about stuff I worked on.


Dave Love

Nov 1, 2009, 2:10:26 PM
to Ernest Adrogué, help-gn...@gnu.org
Ernest Adrogué <eadr...@gmx.net> writes:

Well, using font-lock syntactic keywords is a pretty odd way to do it.
If you really only want the ligature treated as part of a word in Text
mode -- and not in programming-language comments, for instance -- just
amend `text-mode-syntax-table'.
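
Amending the Text-mode table could be as short as the form below (a
sketch, assuming text-mode.el is preloaded, as it normally is, so that
`text-mode-syntax-table' already exists when the init file runs).
Unlike the font-lock approach, it makes every `·' a word constituent in
Text mode, not only one between two l's:

;; Give MIDDLE DOT word syntax in Text mode buffers.  Modes whose
;; syntax tables inherit from this one should see the change too.
(modify-syntax-entry ?· "w" text-mode-syntax-table)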


Ernest Adrogué

Nov 8, 2009, 12:07:22 PM
to Dave Love, help-gn...@gnu.org
1/11/09 @ 19:09 (+0000), thus spake Dave Love:

> Ernest Adrogué <eadr...@gmx.net> writes:
>
> > Hi there,
> >
> > The Catalan language has a ligature consisting in one
> > "l" character, followed by a middle dot ("·"), followed
> > by another "l". See here for more details:
> > http://en.wikipedia.org/wiki/L·l#Catalan
> >
> > Is there a way to make emacs aware of this, so that it
> > doesn't treat a word containing "l·l" as two separate
> > words?
>

Thanks a lot. Do you have any idea where this should be
put so that it is loaded automatically at start-up?
I tried init.el, and a file in the "language" directory
in /usr/share/emacs/23.1/lisp/, to no avail.
It says "no match" when I try to set the language
environment to Catalan interactively.

> You could make a bug report if you have more luck than me with reports
> about stuff I worked on.

I will try, once I get it to work :)

Cheers,

Ernest


Kevin Rodgers

Nov 11, 2009, 9:57:04 AM
to help-gn...@gnu.org
Ernest Adrogué wrote:
> 1/11/09 @ 19:09 (+0000), thus spake Dave Love:
>> Ernest Adrogué <eadr...@gmx.net> writes:
>>
>>> Hi there,
>>>
>>> The Catalan language has a ligature consisting in one
>>> "l" character, followed by a middle dot ("·"), followed
>>> by another "l". See here for more details:
>>> http://en.wikipedia.org/wiki/L·l#Catalan
>>>
>>> Is there a way to make emacs aware of this, so that it
>>> doesn't treat a word containing "l·l" as two separate
>>> words?
>> ;; (sample-text . "Spanish (Español) ¡Hola!")

>> (documentation . "\
>> This language environment uses the Latin-1 character set, sets
>> the default input method to \"catalan-prefix\", and sets the
>> syntax of `·' to word. It selects the Spanish tutorial, in the
>> absence of a Catalan translation."))
>> '("European"))
>
> Thanks a lot. Have you got any idea of where this should be
> put in order to be loaded automatically at start-up?

1. C-x C-f ~/.emacs

2. M-x find-library RET default.el

3. M-x find-library RET site-start.el
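
For option 1, a minimal sketch of ~/.emacs, assuming Dave Love's `push'
and `set-language-info-alist' forms are pasted at the marked spot. The
coding cookie makes sure `·' is read as the middle dot; the explicit
call is there because the locale has presumably already been examined
by the time ~/.emacs is loaded, so a ca_ES locale would not pick up the
new entry by itself:

;; -*- coding: utf-8 -*-

;; ... Dave Love's (push '("ca" . "Catalan") locale-language-names)
;; and (set-language-info-alist "Catalan" ...) forms go here ...

;; Select the environment only if the definition above was evaluated;
;; otherwise `set-language-environment' would signal an error.
(if (assoc "Catalan" language-info-alist)
    (set-language-environment "Catalan")
  (message "Catalan language environment not defined"))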

> I tried in init.el, and in a file in the "language" directory
> in /usr/share/emacs/23.1/lisp/ to no avail.
> It says that there's "no match", when I try to set the language
> environment to Catalan interactively.
>
>> You could make a bug report if you have more luck than me with reports
>> about stuff I worked on.
>
> I will try, once I get it to work :)
>
> Cheers,
>
> Ernest
>
>
>


--
Kevin Rodgers
Denver, Colorado, USA
