
changing word boundaries


Ernest Adrogué

Oct 18, 2009, 12:27:40 PM
to help-gn...@gnu.org
Hi there,

The Catalan language has a ligature consisting of one
"l" character, followed by a middle dot ("·"), followed
by another "l". See here for more details:
http://en.wikipedia.org/wiki/L·l#Catalan

Is there a way to make emacs aware of this, so that it
doesn't treat a word containing "l·l" as two separate
words?

Thanks.

PS. Please CC me if you reply to this.

--
Ernest


Peter Dyballa

Oct 18, 2009, 3:24:00 PM
to Ernest Adrogué, help-gn...@gnu.org

On 18.10.2009 at 18:27, Ernest Adrogué wrote:

> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?


How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT, at
U+0140. The problem is that · is a word constituent only between
two l's; in many other cases it's a multiplication sign, a
comma, a name separator, some kind of bullet sign...

--
Greetings

Pete

The human animal differs from the lesser primates in his passion for
lists of "Ten Best."
– H. Allen Smith

Andreas Politz

Oct 18, 2009, 5:08:08 PM
to help-gn...@gnu.org, Ernest Adrogué
Ernest Adrogué <eadr...@gmx.net> writes:

> Hi there,
>
> The Catalan language has a ligature consisting of one
> "l" character, followed by a middle dot ("·"), followed
> by another "l". See here for more details:
> http://en.wikipedia.org/wiki/L·l#Catalan
>

> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?
>

> Thanks.
>
> PS. Please CC me if you reply to this.

You could set the syntax dynamically, using font-lock to put
`syntax-table' text properties on the matching characters.

(add-hook 'text-mode-hook
          (lambda ()
            (set (make-variable-buffer-local
                  'parse-sexp-lookup-properties) t)
            ;; get font-lock started
            (unless font-lock-defaults
              (setq font-lock-defaults '(nil t)))
            (add-to-list
             (make-variable-buffer-local
              'font-lock-syntactic-keywords)
             ;; let ! between 2*a have word syntax
             '("a\\(!\\)a" 1 "w"))))


Replace `a' and `!' with your characters and it'll work,
hopefully.
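For the Catalan case, the same idea would match the middle dot between two l's. The following is an untested adaptation, not the code as posted: it also matches a capital L, and it uses `make-local-variable' instead of `make-variable-buffer-local', since the latter makes the variable automatically buffer-local in every buffer. It assumes Emacs 23-era font-lock; later Emacs releases declared `font-lock-syntactic-keywords' obsolete in favour of `syntax-propertize-function'.

```elisp
;; Untested sketch: give MIDDLE DOT (U+00B7) word syntax only when it
;; sits between two l's (either case), in Text mode buffers.
(add-hook 'text-mode-hook
          (lambda ()
            ;; Let word motion honor `syntax-table' text properties.
            (set (make-local-variable 'parse-sexp-lookup-properties) t)
            ;; Make sure font-lock has something to run with.
            (unless font-lock-defaults
              (setq font-lock-defaults '(nil t)))
            ;; Subexpression 1 (the dot itself) gets word syntax "w".
            (add-to-list (make-local-variable 'font-lock-syntactic-keywords)
                         '("[lL]\\(\u00B7\\)[lL]" 1 "w"))))
```

Because the properties are applied by font-lock, they only take effect in buffers where font-lock is enabled.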

-ap

Andreas Politz

Oct 18, 2009, 5:09:39 PM
to help-gn...@gnu.org, Ernest Adrogué

Ernest Adrogué

Oct 18, 2009, 5:19:13 PM
to Peter Dyballa, help-gn...@gnu.org
Hello,

18/10/09 @ 21:24 (+0200), thus spake Peter Dyballa:


> How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT, at
> U+0140. The problem is that · is a word constituent only between
> two l's; in many other cases it's a multiplication sign, a
> comma, a name separator, some kind of bullet sign...

That seems the way to go, yes. Unfortunately, everybody still
uses the middle dot; spell-checkers, for example, flag ŀ as
a misspelling.

Cheers.

--
Ernest


Ernest Adrogué

Oct 19, 2009, 8:06:36 PM
to help-gn...@gnu.org
18/10/09 @ 23:08 (+0200), thus spake Andreas Politz:

> You could set the syntax dynamically, using font-lock to put
> `syntax-table' text properties on the matching characters.
>
> (add-hook 'text-mode-hook
>           (lambda ()
>             (set (make-variable-buffer-local
>                   'parse-sexp-lookup-properties) t)
>             ;; get font-lock started
>             (unless font-lock-defaults
>               (setq font-lock-defaults '(nil t)))
>             (add-to-list
>              (make-variable-buffer-local
>               'font-lock-syntactic-keywords)
>              ;; let ! between 2*a have word syntax
>              '("a\\(!\\)a" 1 "w"))))
>
>
> Replace `a' and `!' with your characters and it'll work,
> hopefully.

It does what I wanted. :)
Thanks!

Ernest


Dave Love

Nov 1, 2009, 2:15:23 PM
to
Peter Dyballa <Peter_...@Web.DE> writes:

> How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT, at
> U+0140. The problem is that · is a word constituent only between
> two l's; in many other cases it's a multiplication sign, a
> comma, a name separator, some kind of bullet sign...

It may be misused, but U+00B7 is MIDDLE DOT, a punctuation character.
BULLET is U+2022, and the mathematical DOT OPERATOR is U+22C5. It
surely doesn't matter much in this context, though; a lot of character
syntaxes have long been wrong in Emacs anyway.
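The classification of these look-alike characters can be checked from within Emacs. A small sketch (evaluate it with M-: or in *scratch* and look in *Messages*); it uses the standard `get-char-code-property' and `char-syntax' functions, and the syntax class reported depends on the current buffer's syntax table and the Emacs version.

```elisp
;; Report the Unicode name, general category and syntax class (in the
;; current buffer's syntax table) of U+00B7, U+2022 and U+22C5.
(dolist (c '(#xB7 #x2022 #x22C5))
  (message "U+%04X %s, category %s, syntax class `%c'"
           c
           (get-char-code-property c 'name)
           (get-char-code-property c 'general-category)
           (char-syntax c)))
```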

Dave Love

Nov 1, 2009, 2:09:53 PM
to Ernest Adrogué, help-gn...@gnu.org
Ernest Adrogué <eadr...@gmx.net> writes:

> Hi there,
>
> The Catalan language has a ligature consisting of one
> "l" character, followed by a middle dot ("·"), followed
> by another "l". See here for more details:
> http://en.wikipedia.org/wiki/L·l#Catalan
>
> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?

[You're probably not really interested in word boundaries, just word
constituents. For an illustration of the difference, see variable
`word-combining-categories' and what capitalized-words-mode does in
Emacs 23.]

You should define a Catalan language environment to be used in ca_ES
locales. (I'm surprised I didn't do it, as there's a relevant input
method.) It should set the base syntax of · to word, and set a suitable
default input method. The existing one, `catalan-prefix', should
presumably bind `~.' to `·', as in latin-prefix; it doesn't currently,
and maybe needs other fixes.
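Until `catalan-prefix' itself is fixed, the missing `~.' sequence could perhaps be added from an init file. An untested sketch, assuming that activating the input method once is enough to load its Quail package, since `quail-defrule' only adds rules to a package that is already loaded:

```elisp
;; Untested sketch: make "~." insert MIDDLE DOT in catalan-prefix,
;; as latin-prefix does.
(require 'quail)
(with-temp-buffer
  ;; Activating the input method loads its Quail package.
  (activate-input-method "catalan-prefix"))
;; Add the rule to the (now loaded) package named "catalan-prefix".
(quail-defrule "~." ?\u00B7 "catalan-prefix")
```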

The environment would be something like this (untested), which is
probably better than trying to use categories. [The default Latin-1
character set is overridden in, say, ca_ES.UTF-8.]

(push '("ca" . "Catalan") locale-language-names)

(set-language-info-alist
 "Catalan" '((tutorial . "TUTORIAL.es")  ; maybe...
             (charset iso-8859-1)
             (coding-system iso-latin-1 iso-latin-9)
             (coding-priority iso-latin-1)
             (input-method . "catalan-prefix")
             (nonascii-translation . iso-8859-1)
             (unibyte-display . iso-latin-1)
             (setup-function
              . (lambda ()
                  (modify-syntax-entry ?· "w" (standard-syntax-table))))
             (exit-function
              . (lambda ()
                  (modify-syntax-entry ?· "_" (standard-syntax-table))))
             ;; Fixme:
             ;; (sample-text . "Spanish (Español) ¡Hola!")
             (documentation . "\
This language environment uses the Latin-1 character set, sets
the default input method to \"catalan-prefix\", and sets the
syntax of `·' to word. It selects the Spanish tutorial, in the
absence of a Catalan translation."))
 '("European"))

You could make a bug report if you have more luck than me with reports
about stuff I worked on.


Dave Love

Nov 1, 2009, 2:10:26 PM
to Ernest Adrogué, help-gn...@gnu.org
Ernest Adrogué <eadr...@gmx.net> writes:

Well, it's a pretty odd way to do it. If you really only want to use
the ligature in Text mode -- and not programming language comments, for
instance -- just amend `text-mode-syntax-table'.
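Amending `text-mode-syntax-table' takes one line in an init file. A minimal sketch; unlike the font-lock approach, it makes · a word constituent everywhere in Text mode (and typically in modes derived from it), not only between two l's:

```elisp
;; Give MIDDLE DOT (U+00B7) word syntax in Text mode's syntax table.
(require 'text-mode)
(modify-syntax-entry #xB7 "w" text-mode-syntax-table)
```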


Ernest Adrogué

Nov 8, 2009, 12:07:22 PM
to Dave Love, help-gn...@gnu.org
1/11/09 @ 19:09 (+0000), thus spake Dave Love:

> Ernest Adrogué <eadr...@gmx.net> writes:
>
> > Hi there,
> >
> > The Catalan language has a ligature consisting of one
> > "l" character, followed by a middle dot ("·"), followed
> > by another "l". See here for more details:
> > http://en.wikipedia.org/wiki/L·l#Catalan
> >
> > Is there a way to make emacs aware of this, so that it
> > doesn't treat a word containing "l·l" as two separate
> > words?
>

Thanks a lot. Do you have any idea where this should be
put so that it is loaded automatically at start-up?
I tried putting it in init.el, and in a file in the "language"
directory in /usr/share/emacs/23.1/lisp/, to no avail.
Emacs says "no match" when I try to set the language
environment to Catalan interactively.

> You could make a bug report if you have more luck than me with reports
> about stuff I worked on.

I will try, once I get it to work :)

Cheers,

Ernest


Kevin Rodgers

Nov 11, 2009, 9:57:04 AM
to help-gn...@gnu.org
Ernest Adrogué wrote:
> 1/11/09 @ 19:09 (+0000), thus spake Dave Love:
>> Ernest Adrogué <eadr...@gmx.net> writes:
>>
>>> Hi there,
>>>
>>> The Catalan language has a ligature consisting of one
>>> "l" character, followed by a middle dot ("·"), followed
>>> by another "l". See here for more details:
>>> http://en.wikipedia.org/wiki/L·l#Catalan
>>>
>>> Is there a way to make emacs aware of this, so that it
>>> doesn't treat a word containing "l·l" as two separate
>>> words?
>> ;; (sample-text . "Spanish (Español) ¡Hola!")

>> (documentation . "\
>> This language environment uses the Latin-1 character set, sets
>> the default input method to \"catalan-prefix\", and sets the
>> syntax of `·' to word. It selects the Spanish tutorial, in the
>> absence of a Catalan translation."))
>> '("European"))
>
> Thanks a lot. Do you have any idea where this should be
> put so that it is loaded automatically at start-up?

1. C-x C-f ~/.emacs

2. M-x find-library RET default.el

3. M-x find-library RET site-start.el
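With any of these, the definition still has to be evaluated before Catalan shows up among the language environments, and since Emacs probably selects the language environment from the locale before reading those files, an explicit call may also be needed. An untested sketch for the end of the chosen file, assuming Dave Love's `push' and `set-language-info-alist' forms come earlier in the same file:

```elisp
;; Assumes the Catalan definition from Dave Love's message has
;; already been evaluated earlier in this file.
(when (assoc "Catalan" language-info-alist)
  ;; Switch to it explicitly; start-up locale detection may already
  ;; have chosen another language environment.
  (set-language-environment "Catalan"))
```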

> I tried putting it in init.el, and in a file in the "language"
> directory in /usr/share/emacs/23.1/lisp/, to no avail.
> Emacs says "no match" when I try to set the language
> environment to Catalan interactively.
>
>> You could make a bug report if you have more luck than me with reports
>> about stuff I worked on.
>
> I will try, once I get it to work :)
>
> Cheers,
>
> Ernest


--
Kevin Rodgers
Denver, Colorado, USA
