
Language selection / content negotiation problems


Roland Holzapfel

Jan 20, 2000

Hello,

I'm using Apache 1.3.9 with its content negotiation module for automatic
language selection according to a user's browser preferences.

Works fine, except for the "sub languages" like en_US or de_CH.

I tried to just add more "AddLanguage" directives to map the major and
all sublanguage MIME types to one file extension:

AddLanguage de .de
AddLanguage de-AT .de
AddLanguage de-CH .de
AddLanguage en .en
AddLanguage en-GB .en
AddLanguage en-US .en
AddLanguage fr .fr

This doesn't work. Only the last mapping will be active, which is
de-CH, en-US and fr in the above example.

Any hints ? I don't want to create lots of (unix-) links.

thanks, Roland.
--
----------------------------------------------------------------------
Roland Holzapfel | Fraunhofer-Institut Graphische Datenverarbeitung
| Rundeturmstrasse 6 phone: ++49 (0)6151 155315
| 64283 Darmstadt fax: ++49 (0)6151 155399
holz...@igd.fhg.de| Germany http://www.igd.fhg.de/~holzapfel/
----------------------------------------------------------------------

Alan J. Flavell

Jan 20, 2000
On 20 Jan 2000, Roland Holzapfel wrote:

> I tried to just add more "AddLanguage" directives to map the major and
> all sublanguage MIME types to one file extension:
>
> AddLanguage de .de
> AddLanguage de-AT .de

Right, you can't do that.

> Any hints ?

Use

AddLanguage de .de
AddLanguage de-AT .de-AT
AddLanguage de-CH .de-CH

etc.

In situations where you don't have a specific -AT or -CH variant, call
the generic document e.g. foo.html.de.de-AT.de-CH; then MultiViews will
match any of these.

If you have a document variant that you want to return when _no_
languages match, then you can call it foo.html.html (this trick was
discussed very recently here).

Or you can use the mapping file technique, using a script to investigate
the contents of the directory and write a map file according to whatever
negotiation policy you want to apply. You'd have to re-run the script
each time you add or remove files in the directory, of course.
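A minimal sketch of what such a script would write, in C. The helper format_typemap_entry() is hypothetical; it formats one record of an Apache type map declaring a single file as acceptable for several language tags. It assumes the record layout described in the Apache content negotiation documentation (URI, Content-type and Content-language lines, records separated by a blank line) and that Content-language accepts a comma-separated list; check those docs for your version. Scanning the directory and writing the map file are left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one type-map record that declares `file`
 * (of media type `type`) as a variant for every tag in langs[0..n-1].
 * Output is truncated safely if outlen is too small (outlen must be > 0). */
void format_typemap_entry(char *out, size_t outlen, const char *file,
                          const char *type, const char *const langs[],
                          size_t n)
{
    size_t used;

    snprintf(out, outlen, "URI: %s\nContent-type: %s\nContent-language: ",
             file, type);
    for (size_t i = 0; i < n; i++) {
        used = strlen(out);
        /* comma-separated list of language tags */
        snprintf(out + used, outlen - used, "%s%s", i ? ", " : "", langs[i]);
    }
    used = strlen(out);
    snprintf(out + used, outlen - used, "\n\n"); /* blank line ends record */
}
```

For Roland's case, calling it for foo.html.en with the tags en, en-GB and en-US should produce a record that lets a request for any of those three tags match the one existing file.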

Does that help?


Andreas Prilop

Jan 20, 2000
In article <Pine.HPP.3.95a.100012...@hpplus01.cern.ch>,

"Alan J. Flavell" <fla...@mail.cern.ch> wrote:

> Use
>
> AddLanguage de .de
> AddLanguage de-AT .de-AT
> AddLanguage de-CH .de-CH

Do you have any *realistic* example where such a pedantic distinction
should be necessary? Not even books in the "real world" are translated
between US English and UK English [as far as I know].
I have always wondered what is meant by "Accept-Language: de-AT,en"
in an e-mail message, for example. I don't speak de-AT - so I must
answer in English??
IMHO, those "sublanguages" are not really necessary - only confusing.

--
Wipe the spammers *.cn, *.hk, *.kr, *.tw off the net!

Rainer Scherg

Jan 20, 2000
Hi Alan,

Would it make sense to have

AddLanguage de .de-en
AddLanguage de-AT .de-en
AddLanguage en .de-en
AddLanguage en-US .de-en

working?

This could be helpful if a document contains more than one language,
e.g. a two-column document with one side in English and the
other column in German, or whatever language...

Having all languages encoded in the filename could raise some
problems, either handling problems or OS problems. (Also: symlink
handling is not possible on all OSs...)

- rainer



Alan J. Flavell

Jan 20, 2000
On Thu, 20 Jan 2000, Andreas Prilop wrote:

> > AddLanguage de .de
> > AddLanguage de-AT .de-AT
> > AddLanguage de-CH .de-CH
>

> Do you have any *realistic* example where such a pedantic distinction
> should be necessary?

"Necessary"? No. But I was trying to answer the question as posed. I
have in fact amused myself by setting en-GB,de,en for a while in my
browser, but it rarely made any difference.

> Not even books in the "real world" are translated
> between US English and UK English [as far as I known].

You may well be right, since we British have no problems reading
American, having got acclimatised by films and TV. But I have
definitely seen American editions of British books that had been
"translated" (at least for spelling differences, and in some cases also
for idiom).

> I have always wondered what is meant by "Accept-Language: de-AT,en"
> in an e-mail message, for example. I don't speak de-AT - so I must
> answer in English??

Technically, yes. If the reader wanted to prefer generic German to
English there is a perfectly valid way for them to say so (see my
specimen above). So if they don't say so, theoretically they don't want
it.

If users aren't capable of creating a proper HTTP configuration that
expresses their preferences, then their user agent should help them, e.g.
by prompting them when they create their configuration. (Or if it's
Microsoft, it would likely be done by pre-empting the decision for them,
and secreting the real settings somewhere in a super-advanced menu that
can only be found by wizards who have taken the course).

all the best

Alan J. Flavell

Jan 20, 2000
On Thu, 20 Jan 2000, Rainer Scherg wrote:

> Would it make sense to have
>
> AddLanguage de .de-en
> AddLanguage de-AT .de-en
> AddLanguage en .de-en
> AddLanguage en-US .de-en
>
> working?

I don't think so, but I can't really comment on the internal workings of
the module. I assume that under the present design a file extension
cannot be defined to represent two different languages at the same time.

Keep in mind that this is only a configuration convenience, it isn't
part of the HTTP protocol specification itself. If you don't find it
convenient to use this configuration convenience, then there are other
ways. Either the negotiation map, or some custom handler.

If you can devise something better that works under Win32 Apache, I'm
sure they'll be interested to have your user-contributed module. I mean
that seriously, I'm not being ironic.

But in the protocol there is no way for a user to say directly that they
prefer documents that are in both English and German, they can only put
English and German on their accept list; so I don't see a need for any
special machinery that will declare a document to be in both languages
at once.

If you call it foo.html.en.de, then it can match accept-language
requests that include either en or de. Exactly what will happen if you
have foo.html.en, foo.html.de and foo.html.en.de with various
combinations of user request, I don't know, but I don't think anything
spectacularly wrong will happen.

> To have all langauges decoded in the filename could raise some
> problems - either handling problems or OS problems.

I think I have to say here that Apache was originally designed for
unix-like systems. The Win32 port looks a good piece of work but if the
Win filesystem prevents you from doing something, don't blame Apache for
it.

> (also: symlink handling is not on all OSs possible...)

We've discussed that already; my present suggestions didn't use
symlinks as such.

gruesse


Rainer Scherg

Jan 20, 2000
You are too fast in answering 8-/

I've cancelled my original message, because on second thought
it makes no difference whether a file is named "foo.html.en-de" or
"foo.html.en.de". So I fully agree with your mail...
(happens sometimes... 8-))


But I've taken a look at the Apache source of mod_mime.c.
The reason why Apache only finds one "item" is this call:

/* Parse filename extensions, which can be in any order */
while ((ext = ap_getword(r->pool, &fn, '.')) && *ext) {
[...]
if ((type = ap_table_get(conf->language_types, ext))) {
[...]

The "ap_table_get" call would have to be replaced by a loop through
conf->language_types that checks each language for the given
filename extension. I guess "ap_table_get" returns a result on the
first hit.

Writing the loop would take only a few lines of source, with a
small performance penalty as a result. IMO not really necessary
in real life. But the Apache behavior (only one language per
extension) should be documented (I haven't looked at the
docs...)
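Rainer's proposed change can be sketched as a plain C lookup that keeps every AddLanguage pair and returns all languages for an extension (1:n) instead of a single one. This is a stand-alone illustration, not Apache code: struct lang_mapping, the mappings table and languages_for_extension() are hypothetical, and the static table merely stands in for conf->language_types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for conf->language_types: one entry per
 * AddLanguage directive, in the order read, so the same extension may
 * appear several times. Extensions are stored without the leading dot. */
struct lang_mapping {
    const char *language;  /* e.g. "en-GB" */
    const char *extension; /* e.g. "en" for ".en" */
};

static const struct lang_mapping mappings[] = {
    {"de", "de"}, {"de-AT", "de"}, {"de-CH", "de"},
    {"en", "en"}, {"en-GB", "en"}, {"en-US", "en"},
    {"fr", "fr"},
};

/* Loop over the whole table instead of stopping at the first hit: store
 * up to max languages mapped to ext in out, and return how many were
 * stored. */
size_t languages_for_extension(const char *ext, const char **out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < sizeof mappings / sizeof mappings[0]; i++) {
        if (strcmp(mappings[i].extension, ext) == 0 && found < max)
            out[found++] = mappings[i].language;
    }
    return found;
}
```

With the directives from Roland's first post, the extension en then yields en, en-GB and en-US rather than only the last one configured.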


cu - rainer



Alex Brown

Jan 21, 2000
Roland Holzapfel wrote:

> I'm using apache 1.3.9 with it's content negotiation module for auto-
> matic language selection according to a user's browser preferences.

info taken from http://www.apache.org/docs/mod/mod_mime.html#addlanguage

> I tried to just add more "AddLanguage" directives to map the major and
> all sublanguage MIME types to one file extension:
>
> AddLanguage de .de
> AddLanguage de-AT .de

> AddLanguage de-CH .de
> AddLanguage en .en
> AddLanguage en-GB .en
> AddLanguage en-US .en
> AddLanguage fr .fr
>
> This doesn't work. Only the last mapping will be active, which is
> de-CH, en-US and fr an the above example.

for a start, for reasons of syntax you'll want to use en-us and en-gb as
that's what the docs say.

By using the above directive you are telling Apache to treat a .en file as
GB English and then US English, so it serves US English. It is working
perfectly - doing exactly what you tell it to do.

If you create a set of pages for GB English _and_ one for US English with
extensions (in this case .enGB and .enUS)

and use

AddLanguage en-gb .enGB
AddLanguage en-us .enUS

it should work fine. I haven't tried it, but that's what the docs would
suggest to me.

Hope that helps.

Alex.

--
Alex Brown
www.kama-sooty.co.uk

Grzegorz Staniak

Jan 21, 2000
Andreas Prilop wrote:

> > Use
> >
> > AddLanguage de .de
> > AddLanguage de-AT .de-AT
> > AddLanguage de-CH .de-CH
>
> Do you have any *realistic* example where such a pedantic distinction
> should be necessary? Not even books in the "real world" are translated
> between US English and UK English [as far as I known].

English is not the only language spoken on this planet. And AFAIR (I
attended my linguistics classes some 7 years ago now) the differences
between e.g. de-DE and de-CH are _huge_ in every respect.

> IMHO, those "sublanguages" are not really necessary - only confusing.

In your opinion.




--
Grzegorz Staniak <gsta...@zagiel.pl>

Andreas Prilop

Jan 21, 2000
In article <38884A89...@zagiel.pl>,
Grzegorz Staniak <gsta...@zagiel.pl> wrote:

> English is not the only language spoken on this planet.

Who claimed otherwise?

> And AFAIR (I
> attended my linguistics classes some 7 years ago now) the differences
> between e.g. de-DE amnd de-CH are _huge_ in every respect.

In _spoken_ language. Max Frisch and Friedrich Dürrenmatt don't come
in separate "DE" versions; and neither does the Neue Zürcher Zeitung.
I think it would be overkill to supply on the WWW separate versions
in de-DE and de-CH or in en-GB and en-US, which differ in "colour"/
"color".
Furthermore, I consider the approach of appending a _country_ name
to a language name wrong in principle. For example, the difference
"Samstag"/"Sonnabend" in German cannot be attributed to AT/CH/DE.

Roland Holzapfel

Jan 21, 2000
Hello,

I'm impressed by the discussion I started ... ;-)

>> I tried to just add more "AddLanguage" directives to map the major and
>> all sublanguage MIME types to one file extension:
>>
>> AddLanguage de .de
>> AddLanguage de-AT .de
>
>Right, you can't do that.
>
>> Any hints ?
>
>Use
>
>AddLanguage de .de
>AddLanguage de-AT .de-AT
>AddLanguage de-CH .de-CH
>
>etc.

That's not what I want. My problem is not that I really need
different variants, nor which MIME type the server adds to the documents.

I would like to serve .en documents when somebody requests any English
variant.

People tend to just activate en_US OR de_CH in their browsers, but NOT
ALSO en/de.

Which means they get nothing they want, only the server's default,
because the server only knows plain de, en, fr and so on.

But this probably means we need two language directives: one that tells
the server the Content-Language according to the file extension, and
another that maps __requested__ languages to __existing__ file
extensions.
The latter is missing.

>In situations where you don't have a specific -AT or -CH variant, call
>the generic document e.g foo.html.de.de-AT.de-CH, then MultiViews will
>match any of these.

Tell about 20 people working on PCs to use file extensions like this ...
Even I as a unix-only user wouldn't like to do it like that on about 20k
documents.

Roland

Rainer Scherg

Jan 21, 2000
So we are starting the previous MultiViews discussion again 8-/...
(see the DejaNews archive for the subject:
"Multi-lingual Web Site, Content Negotiation and Cookies"...)

It seems to me, that more and more people are starting to use
the MultiViews feature of apache and are all running into
similar problems...

(That thread was about serving the DefaultLanguage instead of a 406
status, or, as I would prefer, using LanguagePriority to solve the
problem, which should mostly cover the need to serve the default
language as well.)

A new "problem" is that Apache maps each filename extension to
exactly one language (1:1); this should be 1:n.


moo - rainer

Alan J. Flavell

Jan 21, 2000
On 21 Jan 2000, Roland Holzapfel wrote:

> >AddLanguage de .de
> >AddLanguage de-AT .de-AT
> >AddLanguage de-CH .de-CH
> >etc.
>
> That's not what I want.

But it leads to one possible way to do what you asked for, using
MultiViews.

> It's not my problem to need to really have different variants,

That's OK. It is still one possible solution to what you asked for.

The other solution is to generate a mapping file.

> I would like to serve .en documents when somebody requests any english
> variant.

I understood this. But the protocol says what it says.

The authoritative rules for language matching are given in RFC2616
section 14.4. Presumably Apache is implementing these rules. I didn't
make these rules, I am only trying to show you what they mean. If you
think they are wrong, you are free to say so, but I personally (of
course) cannot change them for you! If you think I am misunderstanding
the rules, then please _do_ tell me!!

[Digression: en-UK is wrong: the ISO standard, whether we like it or
not, says that GB is the abbreviation for the United Kingdom!! We
are not talking Internet domain names here (although there have been
interminable threads on whether they all ought to be converted from .uk
to .gb for this reason... PLEASE don't pursue this issue here!!!). ]


In the following text from the RFC, "language-range" refers to the
syntax element which appears in the Accept-Language header, such as en,
en-GB, de, de-AT etc.

A language-range matches a language-tag if
it exactly equals the tag, or if it exactly equals a prefix of the
tag such that the first tag character following the prefix is "-".

Please correct me if I am mis-reading this, but my interpretation of
that is

User acceptance:       Doc. specification:
"language-range"       "language-tag"          Outcome:

en                     en-GB                   accept
en                     en                      accept
en-GB                  en-GB                   accept
en-GB                  en                      _refuse_

Also, if they request only en-GB it certainly means they are _refusing_
documents advertised as en-US etc. If they want to accept any kind of
English, the protocol requires them to include 'en' in their accept
list.
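The matching rule quoted from RFC 2616 fits in a short C function, and the four cases in the table come out as described. This is a sketch: the function name is hypothetical, and case-insensitive comparison is assumed because RFC 2616 treats language tags as case-insensitive.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* RFC 2616, section 14.4: a language-range matches a language-tag if it
 * exactly equals the tag, or equals a prefix of the tag such that the
 * first tag character following the prefix is '-'.
 * Comparison ignores case, since language tags are case-insensitive. */
bool language_range_matches(const char *range, const char *tag)
{
    size_t i = 0;
    /* Walk the range; any mismatch (including the tag ending early) fails. */
    for (; range[i] != '\0'; i++) {
        if (tolower((unsigned char)range[i]) != tolower((unsigned char)tag[i]))
            return false;
    }
    /* Whole range consumed: exact match, or the tag continues with '-'. */
    return tag[i] == '\0' || tag[i] == '-';
}
```

Note that the prefix must end at a '-': the range en does not match a hypothetical tag eng, and en-GB never matches plain en.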

> People tend to just activate en_US OR de_CH in their browsers, but NOT
> ALSO en/de.

Well, then they are making a very definite statement by doing that.
If they do not intend to make that statement, they are making a mistake.
I think I have to tell you that by rights, this is the responsibility of
the client agent, as RFC2616 stresses in careful detail:

Note: When making the choice of linguistic preference available to
the user, we remind implementors of the fact that users are not
familiar with the details of language matching as described above,
and should provide appropriate guidance. As an example, users
might assume that on selecting "en-gb", they will be served any
kind of English document if British English is not available. A
user agent might suggest in such a case to add "en" to get the
best matching behavior.

> Which means they don't get anything they want but the server's default,

Well, if the server is working according to spec, they may even get the
406 None Acceptable response (and an invitation to choose). Unfriendly
or not, this is the _correct_ answer to their _misguided_ request.
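To make the difference between the strict outcome and a server-side default concrete, here is a simplified C sketch of choosing a variant. Each available language tag gets the quality value of the longest matching language-range (as RFC 2616 section 14.4 specifies), the highest wins, and if nothing matches the function returns its fallback argument: NULL stands in for the 406 response, while a language tag plays roughly the role Rainer wants LanguagePriority to play. The names are hypothetical, "*" and header parsing are left out, and Apache's real negotiation algorithm weighs more factors than this.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One entry of an (already parsed) Accept-Language header. */
struct lang_pref {
    const char *range; /* e.g. "en-GB" */
    double q;          /* quality value, 0.0 .. 1.0 */
};

/* RFC 2616 14.4 prefix rule, ignoring case. */
static bool range_matches(const char *range, const char *tag)
{
    size_t i = 0;
    for (; range[i] != '\0'; i++)
        if (tolower((unsigned char)range[i]) != tolower((unsigned char)tag[i]))
            return false;
    return tag[i] == '\0' || tag[i] == '-';
}

/* Pick the variant whose language gets the highest quality value, where a
 * tag's quality is that of the longest matching range. If no variant is
 * acceptable, return fallback: NULL models a strict "406 None Acceptable",
 * a language tag models a server-side default. */
const char *choose_variant(const struct lang_pref *prefs, size_t np,
                           const char *const variants[], size_t nv,
                           const char *fallback)
{
    const char *best = NULL;
    double best_q = 0.0;
    for (size_t v = 0; v < nv; v++) {
        double q = 0.0;
        size_t longest = 0;
        for (size_t p = 0; p < np; p++) {
            size_t len = strlen(prefs[p].range);
            if (len > longest && range_matches(prefs[p].range, variants[v])) {
                longest = len;
                q = prefs[p].q;
            }
        }
        if (q > best_q) { /* ties keep the earlier variant */
            best_q = q;
            best = variants[v];
        }
    }
    return best ? best : fallback;
}
```

With variants de, en and fr and a request for en-GB alone, the strict call returns NULL (the 406 case), while passing en as fallback serves the generic English file, which is what Roland is after.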

> because the server only knows plain de, en, fr and so on.

It's OK: I _did_ understand what you wanted to happen in this case, even
though it goes against what the protocol mandates.

> But this probably means to need two Language directives: One which tells
> the server the Content-Language according to the file extension, and
> another one which maps __requested__ languages to __existing__ file
> extensions.

I repeat, you _can_ use the mapping file for this, if you insist that
you will not change the filename-extensions.

But you can also solve the problem by using multiple filename
extensions, one for each language-range that you want the file to match.

> Tell about 20 people working on PCs to use file extensions like this ...

As I said before: it isn't Apache's fault that you find it inconvenient
to use this mechanism. You can still generate a mapping file.

> Even I as a unix-only user wouldn't like to do it like that on about 20k
> documents.

Then write a script that generates a mapping file, as I said before.

I was showing you a way that you can persuade MultiViews to give (at
least approximately) the effect that you asked for. If you don't find
it convenient, then you don't have to use it. Good luck.


Grzegorz Staniak

Jan 22, 2000
Andreas Prilop wrote:

[...]

> > And AFAIR (I
> > attended my linguistics classes some 7 years ago now) the differences
> > between e.g. de-DE amnd de-CH are _huge_ in every respect.
>
> In _spoken_ language.

Not really. It's not a question of an accent, there's a whole lot of
lexical and syntactic differences, not to mention the idiom. Are you
sure all publishers in Switzerland switch to northern, "literary" German
for all kinds of publications?

> Max Frisch and Friedrich Dürrenmatt don't come
> in separate "DE" versions; and neither does the Neue Zürcher Zeitung.
> I think it would be overkill to supply on the WWW separate versions
> in de-DE and de-CH or in en-GB and en-US, which differ in "colour"/
> "color".

Well, do all Swiss web pages contain only literary works? Honestly, I
don't know. In this particular case the differences between national
variants of nominally "just one" language are - IMVHO - sufficient to
justify a certain provision in the protocol and its implementations.
Nobody's forcing anyone to use an obscure protocol feature, it's about
providing means of specifying finer language-resolution for those who
could possibly need such a distinction. If there are people who publish
"spoken" Schwyz on the Web, that's enough of a reason for me. And
I have no doubt that German is not the only language in
question.

> Furthermore, I consider the approach of appending a _country_ name
> to a language name principally wrong. For example, the difference
> "Samstag"/"Sonnabend" in German cannot be attributed to AT/CH/DE.

That's a good point. It's the same with e.g. dialects of Irish, or
northern/southern Welsh.




--
Grzegorz Staniak <gsta...@zagiel.pl>

Nick Kew

Jan 22, 2000
In article <nhtcapri-ya0240800...@newsserver.rrzn.uni-hannover.de>,

>> attended my linguistics classes some 7 years ago now) the differences
>> between e.g. de-DE amnd de-CH are _huge_ in every respect.

I understand from my Zurich colleague that even he has trouble with
DE-CH from the Wallis. And when I lived in Nurnberg, their language
didn't sound much like de-DE as I (slightly) knew it.

I'm a Brit, but when I lived in Sheffield I had to work hard to understand
EN-GB-Barnsley (near Sheffield)

> Furthermore, I consider the approach of appending a _country_ name
> to a language name principally wrong. For example, the difference
> "Samstag"/"Sonnabend" in German cannot be attributed to AT/CH/DE.

There are well-defined differences between written English and American
that might justify it (I guess the rest is generalisation). The fact
that the differences within each of the two *spoken* languages are
greater than the difference between the two - in their 'canonical'
forms (insofar as that is a valid concept) is a separate issue.

My browser preferences start with UK English, but put the generic "en"
in fifth position. I still seem to see more US English than anything else.

--
Nick Kew

We're so advanced here ... our nearest main road is called the A 386

Alan J. Flavell

Jan 22, 2000
On Sat, 22 Jan 2000, Nick Kew wrote:

> I understand from my Zurich colleague that even he has trouble with
> DE-CH from the Walise. And when I lived in Nurnberg, their language
> didn't sound much like de-DE as I (slightly) knew it.

(I'm going to stay aloof of this part of the argument, because I don't
think it particularly belongs here, and, now that the HTTP
specifications say what they say, we still have a real problem to
solve, whether or not we believe that it's exactly the problem that we
ought to be solving.)

> My browser preferences start with UK english, but put the generic "en"
> in fifth position. I still seem to see more US english than anything else.

But the problem here is that you and I _know_ that we need to put en
into the list if we decide to set a local preference (let's assume
en-GB for the sake of argument).

RFC2616 says explicitly that it's the job of client agents to help
their users get their preferences set correctly. I could imagine this
being done with a preferences menu that contains lines like

[en - generic English]
[en-GB - British English] [X] fallback to en - generic English

with the fallback pre-checked so that they have to take a definite
action to refuse it.

Also, some (many?) will wish to accept any available language if none
of their explicit preferences is available, so there needs to be a
mechanism for appending "*" to the list.
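As an illustration of the kind of help described here, a user agent could build the header with a helper like the hypothetical build_accept_language() below: explicit choices come first, each regional tag's primary language (en-GB -> en) is appended as a lower-priority fallback unless already listed, and "*" is optionally added last, following the order of the en-GB,de,en specimen mentioned earlier in the thread. The q-value scheme (steps of 0.1, never below 0.2 for real languages, 0.1 for "*") is an assumption for illustration, not taken from any browser.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_TAGS 16

/* Append one "tag" or "tag;q=0.N" item to out, comma-separated. */
static void append_item(char *out, size_t outlen, const char *tag, int tenths)
{
    size_t used = strlen(out);
    if (tenths >= 10)
        snprintf(out + used, outlen - used, "%s%s", used ? ", " : "", tag);
    else
        snprintf(out + used, outlen - used, "%s%s;q=0.%d",
                 used ? ", " : "", tag, tenths);
}

/* Hypothetical helper: build an Accept-Language value from the user's
 * explicit choices (most preferred first), adding generic fallbacks. */
void build_accept_language(const char *const prefs[], size_t n,
                           bool add_star, char *out, size_t outlen)
{
    char primary[MAX_TAGS][9]; /* primary subtags are at most 8 chars */
    size_t np = 0;
    int tenths = 10;

    out[0] = '\0';
    for (size_t i = 0; i < n && i < MAX_TAGS; i++) {
        append_item(out, outlen, prefs[i], tenths);
        if (tenths > 2)
            tenths--;
    }
    /* Collect primary subtags of regional tags that are not yet listed. */
    for (size_t i = 0; i < n && i < MAX_TAGS; i++) {
        const char *dash = strchr(prefs[i], '-');
        size_t len = dash ? (size_t)(dash - prefs[i]) : 0;
        if (len == 0 || len > 8)
            continue;
        char p[9];
        memcpy(p, prefs[i], len);
        p[len] = '\0';
        bool seen = false;
        for (size_t j = 0; j < n && !seen; j++)
            seen = strcmp(prefs[j], p) == 0;
        for (size_t j = 0; j < np && !seen; j++)
            seen = strcmp(primary[j], p) == 0;
        if (!seen)
            memcpy(primary[np++], p, len + 1);
    }
    for (size_t i = 0; i < np; i++) {
        append_item(out, outlen, primary[i], tenths);
        if (tenths > 2)
            tenths--;
    }
    if (add_star)
        append_item(out, outlen, "*", 1);
}
```

A user who only ticks en-GB and de, with the "*" option enabled, would then send "en-GB, de;q=0.9, en;q=0.8, *;q=0.1"; one who already lists en gets no duplicate fallback.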

Do the popular browsers have dialogues that work like this? Do they
heck as like!

And thus we have these discussion threads (two quite recent, this
being one) where authors are trying to pre-empt the user's
inappropriate choice. For perfectly understandable reasons, I
concede: but what they are asking for is technically in violation of
the published specification.

On the other hand, if we don't have any of their preferred languages,
how are we supposed to know which language to present the status-406
page to them in? This is a dilemma, even if we could customise a much
more user-friendly version of the current Apache status 406 page. That
page offers all the _functions_ that it needs to, but it sure won't
win any popularity contests. And hence the reluctance of authors to
let their readers ever get there.

all the best


Rainer Scherg

Jan 22, 2000

I've submitted a small "bugfix" for this problem to the apache bug-db.

But someone has to check the behavior change, when inheriting
AddLanguage config params from server to virtual servers to
per-dir-config... There may be a problem...

The code snippet (a patch file) can be found at http://bugs.apache.org

- rainer
