You need to frob `gnus-group-name-charset-group-alist' to something
like `((".*" . utf-8))' though. Per, do you want to commit the change
to what you suggested?
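For reference, the setting suggested above can be written as follows. This is a minimal sketch, assuming it goes in the Gnus init file (conventionally `~/.gnus.el').

```elisp
;; Treat every group name as UTF-8: the single (REGEXP . CHARSET)
;; entry ".*" matches all group names.
(setq gnus-group-name-charset-group-alist '((".*" . utf-8)))
```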
From GNUS-NEWS:
** Group names are treated as UTF-8 by default.
This is supposedly what USEFOR wants to migrate to. See
`gnus-group-name-charset-group-alist' and
`gnus-group-name-charset-method-alist' for customization.
ChangeLog entry:
2001-10-06 Simon Josefsson <j...@extundo.com>
Support UTF-8 group names better.
* message.el (message-check-news-header-syntax): Encode group
names before comparison.
* gnus-msg.el (gnus-copy-article-buffer): Run all
`gnus-article-decode-hook's except `article-decode-charset'
instead of hardcoding a call to one of them.
* gnus-art.el (gnus-article-decode-hook): Add
`article-decode-group-name'.
(article-decode-group-name): New function, use `g-d-n'.
* gnus-group.el (gnus-group-insert-group-line): Decode
gnus-tmp-group using `g-d-n'.
* gnus-util.el (gnus-decode-newsgroups): New function.
The patch:
Index: gnus-art.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/gnus-art.el,v
retrieving revision 6.109
diff -u -r6.109 gnus-art.el
--- gnus-art.el 2001/09/28 11:22:41 6.109
+++ gnus-art.el 2001/10/06 21:04:01
@@ -638,7 +638,8 @@
(face :value default)))))
(defcustom gnus-article-decode-hook
- '(article-decode-charset article-decode-encoded-words)
+ '(article-decode-charset article-decode-encoded-words
+ article-decode-group-name)
"*Hook run to decode charsets in articles."
:group 'gnus-article-headers
:type 'hook)
@@ -1753,6 +1754,22 @@
(save-restriction
(article-narrow-to-head)
(funcall gnus-decode-header-function (point-min) (point-max)))))
+
+(defun article-decode-group-name ()
+  "Decode group names in `Newsgroups:'."
+  (let ((inhibit-point-motion-hooks t)
+        buffer-read-only
+        (method (gnus-find-method-for-group gnus-newsgroup-name)))
+    (when (and (or gnus-group-name-charset-method-alist
+                   gnus-group-name-charset-group-alist)
+               (gnus-buffer-live-p gnus-original-article-buffer)
+               (mail-fetch-field "Newsgroups"))
+      (nnheader-replace-header "Newsgroups"
+                               (gnus-decode-newsgroups
+                                (with-current-buffer
+                                    gnus-original-article-buffer
+                                  (mail-fetch-field "Newsgroups"))
+                                gnus-newsgroup-name method)))))
(defun article-de-quoted-unreadable (&optional force read-charset)
"Translate a quoted-printable-encoded article.
Index: gnus-group.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/gnus-group.el,v
retrieving revision 6.40
diff -u -r6.40 gnus-group.el
--- gnus-group.el 2001/09/29 19:25:31 6.40
+++ gnus-group.el 2001/10/06 21:04:02
@@ -1339,7 +1339,9 @@
(point)
(prog1 (1+ (point))
;; Insert the text.
- (eval gnus-group-line-format-spec))
+ (let ((gnus-tmp-group (gnus-group-name-decode
+ gnus-tmp-group group-name-charset)))
+ (eval gnus-group-line-format-spec)))
`(gnus-group ,(gnus-intern-safe gnus-tmp-group gnus-active-hashtb)
gnus-unread ,(if (numberp number)
(string-to-int gnus-tmp-number-of-unread)
Index: gnus-msg.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/gnus-msg.el,v
retrieving revision 6.47
diff -u -r6.47 gnus-msg.el
--- gnus-msg.el 2001/09/23 06:17:41 6.47
+++ gnus-msg.el 2001/10/06 21:04:03
@@ -579,7 +579,10 @@
(or (message-goto-body) (point-max)))
;; Insert the original article headers.
(insert-buffer-substring gnus-original-article-buffer beg end)
- (article-decode-encoded-words))))
+ ;; Decode charsets.
+ (let ((gnus-article-decode-hook
+ (delq 'article-decode-charset (copy-sequence gnus-article-decode-hook))))
+ (run-hooks 'gnus-article-decode-hook)))))
gnus-article-copy)))
(defun gnus-post-news (post &optional group header article-buffer yank subject
Index: gnus-util.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/gnus-util.el,v
retrieving revision 6.19
diff -u -r6.19 gnus-util.el
--- gnus-util.el 2001/08/24 04:43:11 6.19
+++ gnus-util.el 2001/10/06 21:04:03
@@ -187,6 +187,14 @@
(search-forward ":" eol t)
(point)))))
+(defun gnus-decode-newsgroups (newsgroups group &optional method)
+  (let ((method (or method (gnus-find-method-for-group group))))
+    (mapconcat (lambda (group)
+                 (gnus-group-name-decode group (gnus-group-name-charset
+                                                method group)))
+               (message-tokenize-header newsgroups ", ")
+               ", ")))
+
(defun gnus-remove-text-with-property (prop)
"Delete all text in the current buffer with text property PROP."
(save-excursion
Index: message.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/message.el,v
retrieving revision 6.119
diff -u -r6.119 message.el
--- message.el 2001/09/29 19:25:32 6.119
+++ message.el 2001/10/06 21:04:04
@@ -2915,12 +2915,15 @@
(if followup-to
(concat newsgroups "," followup-to)
newsgroups)))
+ (method (if (message-functionp message-post-method)
+ (funcall message-post-method)
+ message-post-method))
(known-groups
- (mapcar (lambda (n) (gnus-group-real-name n))
- (gnus-groups-from-server
- (if (message-functionp message-post-method)
- (funcall message-post-method)
- message-post-method))))
+ (mapcar (lambda (n)
+ (gnus-group-name-decode
+ (gnus-group-real-name n)
+ (gnus-group-name-charset method n)))
+ (gnus-groups-from-server method)))
errors)
(while groups
(unless (or (equal (car groups) "poster")
> You need to frob `gnus-group-name-charset-group-alist' to something
> like `((".*" . utf-8))' though. Per, do you want to commit the change
> to what you suggested?
I have committed the change.
> + (nnheader-replace-header "Newsgroups"
> + (gnus-decode-newsgroups
> + (with-current-buffer
> + gnus-original-article-buffer
> + (mail-fetch-field "Newsgroups"))
> + gnus-newsgroup-name method)))))
The Followup-To header should get the same treatment. Just look at
J.B. Moreno's posting, or try following up to it.
> + (message-tokenize-header newsgroups ", ")
> + ", ")))
The ", " confuses me; spaces are (or were until recently) illegal in
the Newsgroups header. But it seems to work. Is there some other part
of Gnus that inserts and removes the spaces?
> Simon Josefsson <j...@extundo.com> writes:
>
>> + (nnheader-replace-header "Newsgroups"
>> + (gnus-decode-newsgroups
>> + (with-current-buffer
>> + gnus-original-article-buffer
>> + (mail-fetch-field "Newsgroups"))
>> + gnus-newsgroup-name method)))))
>
> The Followup-To header should get the same treatment. Just look at
> J.B. Moreno's posting, or try following up to it.
Oh. Do you want to make the change? I don't have time today.
>> + (message-tokenize-header newsgroups ", ")
>> + ", ")))
>
> The ", " confuses me; spaces are (or were until recently) illegal in
> the Newsgroups header. But it seems to work. Is there some other part
> of Gnus that inserts and removes the spaces?
No idea, I just thought it looked easier to read. Changing it to ","
is probably better.
(Are "," and " " forbidden in USEFOR group names? Or escaped?)
> Oh. Do you want to make the change? I don't have time today.
Done.
However, you still cannot specify "Followup-To: dk.test.utf8-æøå",
because Gnus will encode the name when posting.
> No idea, I just thought it looked easier to read. Probably changing
> it to "," is better.
Done.
> (Are "," and " " forbidden in USEFOR group names? Or escaped?)
They are not legal; the old limitations on the ASCII part of Unicode
are pretty much still in place. You cannot use all of Unicode either.
> However, you still cannot specify "Followup-To: dk.test.utf8-æøå",
> because Gnus will encode the name when posting.
I "fixed" this, but yuck. `message-send-news' does not support
`gnus-group-name-charset-group-alist' at all! It uses the same
charset for the entire Newsgroups line, and passes "" as the group
name to the function that decides which charset to use. Of course this
matches the new default value of ".*" for g-g-n-c-g-a, but there is no
way to get hierarchy-specific encoding through g-g-n-c-g-a.
I modified the code so that the same accident that made Newsgroups
work now also makes Followup-To work.
Doing it right would not be too hard for the no-crossposting case, but
tough in the case of crossposting.
There is `gnus-group-name-charset-method-alist' as well; using a ""
group indicates that the method should be used to figure things out.
Perhaps this variable is more useful when posting, given the
followup/crossposting problems. One problem, though: there is no ".*"
method for g-g-n-c-m-a.
Also, one easy solution is probably to use the charset specified by
the first entry of `newsgroups' to encode the headers. You can't post
the same article with different encodings, even if different
hierarchies disagree on what you should use. (Maybe Gnus could warn
about this, though, and that you're likely to be flamed in one of
the hierarchies.)
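The workaround proposed above (take the charset of the first group on the line, and perhaps warn when other groups disagree) could look roughly like the following sketch. It is not Gnus code: `my-newsgroups-charset' is a hypothetical helper, and ALIST is assumed to have the same (REGEXP . CHARSET) shape as `gnus-group-name-charset-group-alist'.

```elisp
(defun my-newsgroups-charset (newsgroups alist)
  "Return (CHARSET . CONFLICTS) for the NEWSGROUPS header value.
CHARSET is the one ALIST assigns to the first group on the line;
CONFLICTS lists the other groups whose configured charset differs.
ALIST is a list of (REGEXP . CHARSET).  Hypothetical helper."
  (let* ((case-fold-search nil)
         (groups (split-string newsgroups "," t "[ \t]+"))
         ;; `assoc-default' with `string-match' as TEST returns the cdr
         ;; of the first entry whose regexp matches the group name.
         (charset (assoc-default (car groups) alist #'string-match))
         (conflicts nil))
    (dolist (group (cdr groups))
      (unless (eq (assoc-default group alist #'string-match) charset)
        (push group conflicts)))
    (setq conflicts (nreverse conflicts))
    ;; The warning suggested in the thread: these names will be
    ;; encoded with the first group's charset anyway.
    (when conflicts
      (message "Warning: %s would be encoded as %s too"
               (mapconcat #'identity conflicts ",") charset))
    (cons charset conflicts)))

(my-newsgroups-charset "dk.test,no.test"
                       '(("\\`dk\\." . utf-8) ("\\`no\\." . iso-8859-1)))
;; => (utf-8 "no.test"), after logging a warning about no.test
```

Returning the conflicting groups instead of signaling an error leaves the choice between warning and refusing to post with the caller.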
> There is `gnus-group-name-charset-method-alist' as well; using a ""
> group indicates that the method should be used to figure things out.
I think the idea was that it tests whether any special rules apply to
the server, and then whether any special rules apply to the hierarchy.
It kind of ruins the point if that only works for reading, not posting.
> Also, one easy solution is probably to use the charset specified by
> the first entry of `newsgroups' to encode the headers.
Yes, or just the whole content of the line. If people specify
e.g. ("dk\." utf-8) without any "\<", "^" or "\`", it will also match
the line whenever a dk.* group is mentioned anywhere on it.
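The anchoring point can be checked in plain Emacs Lisp. This sketch uses two hypothetical helpers (not Gnus functions) to contrast matching a regexp against the whole Newsgroups line with matching a regexp against each group separately.

```elisp
(defun my-line-matches-p (regexp newsgroups)
  "Non-nil if REGEXP matches anywhere in the whole NEWSGROUPS line."
  (let ((case-fold-search nil))
    (and (string-match regexp newsgroups) t)))

(defun my-groups-matching (regexp newsgroups)
  "Return the groups on the NEWSGROUPS line that REGEXP matches one by one."
  (let ((case-fold-search nil)
        (result nil))
    (dolist (group (split-string newsgroups "," t "[ \t]+"))
      (when (string-match regexp group)
        (push group result)))
    (nreverse result)))

;; Against the whole line: matches because a dk.* group appears
;; somewhere, even though the line starts with a no.* group.
(my-line-matches-p "dk\\." "no.test,dk.test")       ; => t
;; Anchored with \\` and applied per group: only the dk.* group.
(my-groups-matching "\\`dk\\." "no.test,dk.test")   ; => ("dk.test")
;; Unanchored, even per group, "dk\\." also hits names like this one.
(my-groups-matching "dk\\." "no.dk.misc")           ; => ("no.dk.misc")
```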
That is the workaround I mentioned. It's clearly a 99.99% solution;
crossposting to hierarchies with conflicting character set conventions
for newsgroup names is not going to be common. In those cases,
telling users they must multipost could be viewed as acceptable.
Implementing the remaining 0.01% might not be worthwhile for anything
but the coolness factor.
> You can't post the same article with different encodings, even if
> different hierarchies disagree on what you should use.
Yes you can, if you had
(("dk\." utf-8) ("no\." latin-1))
and crossposted to
Newsgroups: dk.test.utf8-æøå,no.test.utf8-æøå
a perfect Gnus would utf8 encode dk.test.utf8-æøå and latin1 encode
no.test.utf8-æøå.
I believe the display code you wrote already handles this case.
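The claim that one Newsgroups line can carry names in several charsets can be illustrated with a small sketch that encodes each name separately. `my-encode-newsgroups' is hypothetical, not part of Gnus; the alist uses a (REGEXP . CODING-SYSTEM) shape with anchored regexps, and `iso-8859-1' stands in for `latin-1'.

```elisp
(defun my-encode-newsgroups (newsgroups alist)
  "Encode each group on NEWSGROUPS with the coding system ALIST gives it.
ALIST is a list of (REGEXP . CODING-SYSTEM).  Groups that match no
entry are passed through unchanged (assumed to be plain ASCII)."
  (let ((case-fold-search nil))
    (mapconcat
     (lambda (group)
       (let ((coding (assoc-default group alist #'string-match)))
         (if coding
             (encode-coding-string group coding)
           group)))
     (split-string newsgroups "," t "[ \t]+")
     ",")))

;; "\u00e6\u00f8\u00e5" is the string "æøå".
(my-encode-newsgroups
 "dk.test.utf8-\u00e6\u00f8\u00e5,no.test.utf8-\u00e6\u00f8\u00e5"
 '(("\\`dk\\." . utf-8) ("\\`no\\." . iso-8859-1)))
;; The dk.* name ends in the UTF-8 bytes \303\246\303\270\303\245 and
;; the no.* name in the Latin-1 bytes \346\370\345, on the same line.
```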
Hm... can we agree that RFC 2047-encoding the Newsgroups line *never*
makes sense for news? In that case, implementing it might not be
as hard as I feared.
> (Maybe Gnus could warn about this, though, and that you're likely to
> be flamed in one of the hierarchies.)
Heh, the message will not appear in the hierarchy where Gnus got it
wrong, so no flames ;-)
>> You can't post the same article with different encodings, even if
>> different hierarchies disagree on what you should use.
>
> Yes you can, if you had
>
> (("dk\." utf-8) ("no\." latin-1))
>
> and crossposted to
>
> Newsgroups: dk.test.utf8-æøå,no.test.utf8-æøå
>
> a perfect Gnus would utf8 encode dk.test.utf8-æøå and latin1 encode
> no.test.utf8-æøå.
I don't see how this will work: the NNTP command will work, but the
Newsgroups field in the article can only be in one charset, so it will
be incorrect in one of the hierarchies. So you end up posting
UTF-8 headers to no.* or Latin-1 headers to dk.*; neither is what you
told Gnus to do with the above setup, I think. Of course, this
case is bogus, so Gnus should probably just warn.
Maybe I'm missing the point here.
Hm. Come to think of it, maybe we're overloading the semantics of
g-g-n-c-*-a? Maybe it should be for display purposes only, and some
other variables should be consulted for posting.
> Hm... can we agree that RFC 2047-encoding the Newsgroups line *never*
> makes sense for news? In that case, implementing it might not be
> as hard as I feared.
It will probably make sense for news transported via mail... Anyway,
it should be possible to customize some variable to make Gnus work
like that if you want.
> I don't see how this will work: the NNTP command will work, but the
> Newsgroups field in the article can only be in one charset, so it will
> be incorrect in one of the hierarchies.
No, it can easily be in multiple character sets. I don't understand
how you can think otherwise, your own display code implements that.
> So you end up posting UTF-8 headers to no.* or Latin-1 headers to
> dk.*; neither is what you told Gnus to do with the above setup, I
> think.
g-g-n-c-*-a doesn't specify the character set used for the Newsgroups
header. It specifies the character set used for the newsgroup name.
> Of course, this
> case is bogus, so Gnus should probably just warn.
It is only bogus because Gnus doesn't currently implement it for
posting (only for display). There is nothing a priori wrong about
crossposting between hierarchies with conflicting character sets used
for newsgroup names. For most newsreaders it will even work, though at
least one of the names will just look wrong (like dk.test.utf8-æøå).
> Hm. Come to think of it, maybe we're overloading the semantics of
> g-g-n-c-*-a? Maybe it should be for display purposes only, and some
> other variables should be consulted for posting.
I see no advantage in that. We are specifying what the encodings of
newsgroup names are; unlike other parts of the headers, it makes no
sense to use different encodings for displaying and posting. A
newsgroup name must always be exactly the same byte sequence for news
to work.
>> Hm... can we agree that RFC 2047-encoding the Newsgroups line *never*
>> makes sense for news? In that case, implementing it might not be
>> as hard as I feared.
>
> It will probably make sense for news transported via mail...
That's not the case I was talking about.
> Anyway, it should be possible to customize some variable to make
> Gnus work like that if you want.
I don't think that would be useful. Gnus either supports crossposting
between hierarchies with incompatible encodings, or it doesn't.
> Simon Josefsson <j...@extundo.com> writes:
>
>> I don't see how this will work: the NNTP command will work, but the
>> Newsgroups field in the article can only be in one charset, so it will
>> be incorrect in one of the hierarchies.
>
> No, it can easily be in multiple character sets. I don't understand
> how you can think otherwise, your own display code implements that.
You're right. :-)
The problem is that the user will need to enter the unencoded version
of the group name in g-g-n-c-*-a. Otherwise Gnus would need to know
the charset in order to decode the group name before matching it
against g-g-n-c-*-a, and that's a Catch-22. In practice this
isn't a problem, but consider a top-level hierarchy named æøå: you
would need to enter ("æøå.*" utf-8) or something else for that. (A
better example would be a CJK top-level hierarchy, since everything
is already utf-8, so you don't need to specify utf-8 now.)
>> So you end up posting UTF-8 headers to no.* or Latin-1 headers to
>> dk.*; neither is what you told Gnus to do with the above setup, I
>> think.
>
> g-g-n-c-*-a doesn't specify the character set used for the Newsgroups
> header. It specifies the character set used for the newsgroup name.
Well, it does now; see `article-decode-group-name'. And this uses
only the current group name as input; it doesn't look at each group
name in the Newsgroups line to find out what encoding it is in.
And this decoded value is later put into your reply.
>> Of course, this
>> case is bogus, so Gnus should probably just warn.
>
> It is only bogus because Gnus doesn't currently implement it for
> posting (only for display). There is nothing a priori wrong about
> crossposting between hierarchies with conflicting character sets used
> for newsgroup names. For most newsreaders it will even work, though at
> least one of the names will just look wrong (like dk.test.utf8-æøå).
If it doesn't look right, it's not working. :-)
> The problem is that the user will need to enter the unencoded version
> of the group name in g-g-n-c-*-a. Otherwise Gnus would need to know
> the charset in order to decode the group name before matching it
> against g-g-n-c-*-a, and that's a Catch-22. In practice this
> isn't a problem, but consider a top-level hierarchy named æøå: you
> would need to enter ("æøå.*" utf-8) or something else for that. (A
> better example would be a CJK top-level hierarchy, since everything
> is already utf-8, so you don't need to specify utf-8 now.)
Yes, my "mental model" has always been pure ASCII top level hierarchy
names.
>>> So you end up posting UTF-8 headers to no.* or Latin-1 headers to
>>> dk.*; neither is what you told Gnus to do with the above setup, I
>>> think.
>>
>> g-g-n-c-*-a doesn't specify the character set used for the Newsgroups
>> header. It specifies the character set used for the newsgroup name.
>
> Well, it does now; see `article-decode-group-name'. And this uses
> only the current group name as input; it doesn't look at each group
> name in the Newsgroups line to find out what encoding it is in.
>
> And this decoded value is later put into your reply.
Yes and no. It does use a single "method" parameter for the whole
newsgroup line, which is what g-g-n-c-method-a uses. This makes sense,
as you cannot crosspost to different servers (methods).
However, the "group" parameter, which is what g-g-n-c-group-a uses, is
individual for each newsgroup on the line. While the first newsgroup
name is passed as the "group" parameter, this parameter is shadowed by
the individual newsgroup names in the "(lambda (group) ...)".
So I believe `article-decode-group-name' does the right thing by using
a single "method" but an individual "group" for each name when
decoding the line.
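The shadowing argument above (one method for the line, but the lambda's own `group' argument replacing the outer one) can be mirrored in a self-contained sketch. `my-decode-newsgroups' is hypothetical: it replaces the Gnus charset lookups with a plain (REGEXP . CODING-SYSTEM) alist and splits on "," instead of calling `message-tokenize-header'.

```elisp
(defun my-decode-newsgroups (newsgroups group alist)
  "Decode each name on the raw NEWSGROUPS line by the charset it selects.
GROUP, the group the article was read in, is kept only to mirror the
argument list of `gnus-decode-newsgroups': inside the lambda below it
is shadowed, so every name is looked up in ALIST on its own."
  (ignore group)
  (let ((case-fold-search nil))
    (mapconcat
     (lambda (group)                    ; shadows the outer GROUP
       (let ((coding (assoc-default group alist #'string-match)))
         (if coding
             (decode-coding-string group coding)
           group)))
     (split-string newsgroups "," t "[ \t]+")
     ",")))

;; A raw crossposted header: one UTF-8 name, one Latin-1 name.
;; "\u00e6" is "æ".
(my-decode-newsgroups
 (concat (encode-coding-string "dk.test.\u00e6" 'utf-8) ","
         (encode-coding-string "no.test.\u00e6" 'iso-8859-1))
 "dk.test.\u00e6"
 '(("\\`dk\\." . utf-8) ("\\`no\\." . iso-8859-1)))
;; => "dk.test.æ,no.test.æ"
```

Even though the outer GROUP is a dk.* name, the no.* name is still decoded as Latin-1, because the lookup only ever sees the lambda's own argument.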
> Doing it right would not be too hard for the no-crossposting case, but
> tough in the case of crossposting.
I believe I have fixed this for the "no crossposting between groups
with incompatible character sets" case. Chances are that the other
case will never happen.