Getting the right dictionary for e-mail and newsgroup messages

Cecil Westerhof

Dec 25, 2009, 5:08:27 AM

At the moment I use two languages for my messages: Dutch and English. So
I need the dictionary to be set correctly. Therefore I wrote the
following code:
(defvar gnus-dictionaries
  '(("^nl\\.\\|\\.nl\\.\\|\\.nl$" . "dutch"))
  "Alist of (GROUP-REGEXP . DICTIONARY) pairs.")

(defun gnus-set-dictionary ()
  "Determine what dictionary to use based on the current message."
  ;; Default: British English for news, Dutch for e-mail.
  (let ((dictionary (if (message-news-p) "british" "dutch")))
    (dolist (item gnus-dictionaries)
      ;; gnus-newsgroup-name can be nil when no group is selected,
      ;; and string-match signals an error on nil, so guard it.
      (when (and gnus-newsgroup-name
                 (string-match (car item) gnus-newsgroup-name))
        (setq dictionary (cdr item))))
    (ispell-change-dictionary dictionary)))

(add-hook 'message-mode-hook 'gnus-set-dictionary)

By default, e-mail is in Dutch and newsgroup messages are in English. But
sometimes it is the other way around. For example, Dutch newsgroups
want Dutch messages. That is why I use gnus-dictionaries to check
for exceptions.

I only have one problem. At the moment I need to use:
"^nl\\.\\|\\.nl\\.\\|\\.nl$"
for the regular expression. I would prefer to use something like:
"[\\.^]nl[\\.$]"

But that does not work. Is there another way to make the regular
expression simpler?

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

Ted Zlatanov

Jan 5, 2010, 2:12:07 PM

On Fri, 25 Dec 2009 11:08:27 +0100 Cecil Westerhof <Ce...@decebal.nl> wrote:

CW> I only have one problem. At the moment I need to use:
CW> "^nl\\.\\|\\.nl\\.\\|\\.nl$"
CW> for the regular expression. I would prefer to use something like:
CW> "[\\.^]nl[\\.$]"

CW> But that does not work. Is there another way to make the regular
CW> expression simpler?

The simplest solution is probably to use split-string, since you'll get
all the path components that way:

(member "nl" (split-string "X.nl.X" "\\."))
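As a rough sketch of how that could fit into a lookup, assuming an alist keyed on a single path component instead of a regexp (the names my-component-dictionaries and my-dictionary-for-group are made up for this illustration and are not part of Gnus):

```elisp
;; Hypothetical alist: each key is one dot-separated component of a
;; group name, not a regular expression.
(defvar my-component-dictionaries
  '(("nl" . "dutch"))
  "Alist of (COMPONENT . DICTIONARY) pairs; illustrative only.")

(defun my-dictionary-for-group (group default)
  "Return the dictionary for GROUP, or DEFAULT if no component matches."
  (let ((components (split-string group "\\."))
        (result default))
    ;; The last matching entry wins, as in the original dolist loop.
    (dolist (item my-component-dictionaries result)
      (when (member (car item) components)
        (setq result (cdr item))))))

(my-dictionary-for-group "nl.comp.os.linux" "british") ; => "dutch"
(my-dictionary-for-group "comp.lang.lisp" "british")   ; => "british"
```

Because the comparison is on whole components, a name such as "X-nl-X" does not count as a match here.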

Your regex character class of [\\.^] doesn't work because inside
brackets ^ is the literal character ^ (or a negation when it comes
first), not the beginning of the line. The same goes for $ in the
[\\.$] class.
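A small illustration of the difference, plus one way to get the intended meaning by putting the anchors in an alternation inside a shy group instead of a character class. The variable name my-nl-component-regexp is only for this sketch:

```elisp
;; Inside [...] the characters ^ and $ are ordinary, so the class
;; version needs a real character on both sides of "nl".
(string-match "[\\.^]nl[\\.$]" "nl.test")  ; => nil
(string-match "[\\.^]nl[\\.$]" "x.nl.y")   ; => 1
(string-match "[\\.^]nl[\\.$]" "^nl$")     ; => 0, literal ^ and $

;; Anchors keep their special meaning after \( or \| and before \) or \|.
(defvar my-nl-component-regexp
  "\\(?:^\\|\\.\\)nl\\(?:\\.\\|$\\)"
  "Match \"nl\" as a whole dot-separated component; illustrative only.")

(string-match my-nl-component-regexp "nl.test")  ; => 0
(string-match my-nl-component-regexp "comp.nl")  ; => 4
(string-match my-nl-component-regexp "x-nl-y")   ; => nil
```

This is stricter than a word-boundary match, at the cost of a longer expression.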

If the '.' character is not in your word class (it shouldn't be), you
can use

(string-match "\\bnl\\b" "X.nl.X")

which is probably the best regex-based solution, because it will work
with your existing code. You could also use \< and \> but that's probably
unnecessary. Look at the ELisp manual, section "Backslash Constructs in
Regular Expressions" for details.

Ted

Cecil Westerhof

Jan 5, 2010, 3:47:34 PM

Ted Zlatanov <t...@lifelogs.com> writes:

> CW> I only have one problem. At the moment I need to use:
> CW> "^nl\\.\\|\\.nl\\.\\|\\.nl$"
> CW> for the regular expression. I would prefer to use something like:
> CW> "[\\.^]nl[\\.$]"
>
> CW> But that does not work. Is there another way to make the regular
> CW> expression simpler?

> If the '.' character is not in your word class (it shouldn't be), you
> can use
>
> (string-match "\\bnl\\b" "X.nl.X")

This also matches:
(string-match "\\bnl\\b" "X-nl-X")

But I do not think that is a problem. So I now use "\\bnl\\b". That is a
lot clearer than "^nl\\.\\|\\.nl\\.\\|\\.nl$" and easier to adapt when
another language has to be added.
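For illustration, adding a language then takes one more line in the alist. This is only a sketch: the "de" entry, the dictionary name "german" (it has to match a dictionary that is actually installed), and the names my-gnus-dictionaries and my-lookup-dictionary are assumptions made for the example.

```elisp
(defvar my-gnus-dictionaries
  '(("\\bnl\\b" . "dutch")
    ("\\bde\\b" . "german"))   ; assumed extra entry
  "Alist of (GROUP-REGEXP . DICTIONARY) pairs; illustrative only.")

(defun my-lookup-dictionary (group default)
  "Return the dictionary whose regexp matches GROUP, else DEFAULT."
  (let ((result default))
    ;; The last matching entry wins.
    (dolist (item my-gnus-dictionaries result)
      (when (string-match (car item) group)
        (setq result (cdr item))))))

(my-lookup-dictionary "nl.test" "british")        ; => "dutch"
(my-lookup-dictionary "de.comp.lang" "british")   ; => "german"
(my-lookup-dictionary "comp.lang.lisp" "british") ; => "british"
```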

Thanks.

Ted Zlatanov

Jan 6, 2010, 10:21:12 AM

On Tue, 05 Jan 2010 21:47:34 +0100 Cecil Westerhof <Ce...@decebal.nl> wrote:

>> (string-match "\\bnl\\b" "X.nl.X")

CW> This also matches:
CW> (string-match "\\bnl\\b" "X-nl-X")

CW> But I do not think that is a problem.

I think you can modify the syntax table to accommodate this, making '-' a
member of the word class, but you have to check the manual for the
details.
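A minimal sketch of that idea, assuming a temporary syntax table is acceptable (the function name my-string-match-hyphen-word is invented for this example):

```elisp
(defun my-string-match-hyphen-word (regexp string)
  "Like `string-match', but with `-' treated as a word constituent."
  ;; make-syntax-table with no argument inherits from the standard table.
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?- "w" table)
    ;; \b is evaluated against the syntax table that is current here.
    (with-syntax-table table
      (string-match regexp string))))

(my-string-match-hyphen-word "\\bnl\\b" "X.nl.X") ; => 2
(my-string-match-hyphen-word "\\bnl\\b" "X-nl-X") ; => nil
```

With '-' in the word class, "X-nl-X" is one long word, so there is no word boundary around "nl" any more.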

Ted

Cecil Westerhof

Jan 6, 2010, 10:47:50 AM

Ted Zlatanov <t...@lifelogs.com> writes:

>>> (string-match "\\bnl\\b" "X.nl.X")
>
> CW> This also matches:
> CW> (string-match "\\bnl\\b" "X-nl-X")
>
> CW> But I do not think that is a problem.
>
> I think you can modify the syntax table to accommodate this, making '-' a
> member of the word class, but you have to check the manual for the
> details.

I would need to put a lot more in the word class, because the original
requirement was that nl must be preceded by a '.' or the beginning of the
line and followed by a '.' or the end of the line. So in principle the
regular expression is too lenient, but the chance that this becomes a real
problem is very small. If it does, I will need to use my original
expression again. But it is used for parsing newsgroup names, so I think
the expression is good enough.