Re: content negotiation of included XML documents


Bill MacArthur

Sep 28, 2011, 4:37:19 PM
to mod-...@googlegroups.com, Carlo Contavalli

On 9/28/2011 3:20 PM, Carlo Contavalli wrote:
> On Wed, Sep 28, 2011 at 12:09 PM, Bill MacArthur<webm...@dhs-club.com> wrote:
>> We developed a lot of XML/XSP content in the past using AxKit on Apache 1.3.
>> Our primary reason for that was we needed to provide translated versions of
>> content and we did not want to duplicate the markup in all that content. XML
>> docs were also easier to provide a translation interface for our
>> translators.
>>
>> We are just getting around to moving to Apache2 and, of course, there is no
>> AxKit for that. I found the mod_xslt package and installed it into a test
>> server and found it seemed to work nicely on a sample page. We have since
>> begun to build out what will become our production machine and I was testing
>> today with an elaborate page which has an XSL pi which is rendered fine.
>> That XSL document also uses XSL includes to pull in common headers and
>> footers. Those XSL documents use XSL document() to pull in XML content. The
>> problem is this, the document specified does not exist literally, only as
>> one of many language extensions.
>>
>> For example, we may specify document(stuff.xml) in our included XSL
>> document. Since we are using content negotiation, all of our content is
>> placed into files like stuff.xml.en stuff.xml.it and so forth. AxKit took
>> care of performing the content negotiation of these resources that were
>> pulled in by the various included XSL templates, even though the resources
>> are local and pulled directly from the file system. Unfortunately, mod_xslt
>> throws an error stating that it cannot find stuff.xml. Is there a way to
>> tell mod_xslt to carry the content negotiation forward as it pulls in
>> subsequent documents?
>
> Very interesting question. The quick answer is that content
> negotiation is not supported right now.
>
> In general, I think the include could fall in 2 cases:
> 1 - needs to include a file from the local filesystem
> 2 - needs to fetch a request via http
>
> In case 1, I believe mod-xslt will try to open the file directly, from
> disk, and will not do anything smart about the language.

I don't recall exactly how AxKit handles these, but I think it performs HTTP requests to localhost and echoes the language preferences header. I realize that building content negotiation into a file opener is far more of a project than can be justified for what is probably an edge case. It would be nice if Apache exposed its content negotiation engine for external use: you know, drop in a file name and let it do the job of returning a file handle or the contents.
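To make the "drop in a file name" idea concrete, here is a minimal sketch in Python (chosen only for brevity). It is not Apache's negotiation algorithm: it simply ranks the Accept-Language entries by q-value, cuts each tag down to its primary subtag, and returns the first file named like stuff.xml.it that exists. The function names and the default="en" fallback are assumptions made for illustration.

```python
import os

def parse_accept_language(header):
    """Return primary language codes from an Accept-Language header, best first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality <= 0:
            continue
        # Keep only the primary subtag, so "it-IT" becomes "it".
        ranked.append((-quality, position, tag.split("-")[0]))
    ranked.sort()
    codes = []
    for _, _, code in ranked:
        if code not in codes:
            codes.append(code)
    return codes

def negotiate_variant(base_path, accept_language, default="en"):
    """Return the first existing base_path.<lang> file, or None if there is none."""
    # default="en" is an assumed fallback language, not something from the thread.
    for code in parse_accept_language(accept_language) + [default]:
        candidate = f"{base_path}.{code}"
        if os.path.isfile(candidate):
            return candidate
    return None
```

With stuff.xml.en and stuff.xml.it on disk, negotiate_variant("stuff.xml", "it-IT, en;q=0.5") would pick stuff.xml.it, and a French-only browser would fall back to stuff.xml.en.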

>
> In case 2, mod-xslt will perform an http request, which means that
> apache might be able to use the normal content negotiation, if only
> mod-xslt added the correct headers.

This seems like the simplest approach. In mod_perl the header is available on the Apache object. Here is a bit of mod_perl 1 code we had to implement when IE7 started sending all the Accept-Language values as it-IT, fr-FR, fr-CH, ... That broke our content negotiation, since we only had simple two-character file extensions. This code strips off the -XX to return to a simple value before Apache uses it in its content negotiation engine.
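use Apache::Constants qw(OK);
use Apache::Request ();

sub handler {
    my $r  = Apache::Request->new(shift);
    my $al = $r->header_in("Accept-Language");
    return OK unless $al;

    # Reduce tags such as it-IT or fr-CH to their two-letter language code.
    $al =~ s/([a-z]{2})-[a-z]*/$1/ig;
    $r->header_in("Accept-Language", $al);

    # Vary is a comma-separated list of header names.
    my $vary = $r->header_out('Vary');
    $r->header_out('Vary', $vary ? "$vary, Accept-Language" : 'Accept-Language');
    return OK;
}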

Perhaps those headers are easily available when mod_xslt is invoked and could be made available throughout the processing cycle. Including the header(s) should not break the request, only add to its robustness.
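As a rough illustration of carrying the header forward on an HTTP fetch, here is a small Python sketch using only the standard library. It builds (but does not send) a request for an included document and copies the client's Accept-Language onto it after stripping region subtags. The function names and the localhost URL in the usage note are hypothetical; this says nothing about how mod_xslt performs its own fetches.

```python
import re
import urllib.request

def simplify_accept_language(value):
    """Strip region subtags, e.g. "it-IT, fr-CH;q=0.8" becomes "it, fr;q=0.8"."""
    return re.sub(r"\b([a-z]{2})(?:-[a-z0-9]+)+\b", r"\1", value, flags=re.IGNORECASE)

def build_include_request(url, client_headers):
    """Build an unsent request for an included document, echoing the language header."""
    request = urllib.request.Request(url)
    language = client_headers.get("Accept-Language")
    if language:
        request.add_header("Accept-Language", simplify_accept_language(language))
    return request
```

For example, build_include_request("http://localhost/stuff.xml", {"Accept-Language": "it-IT"}) yields a request whose Accept-Language value is "it", which the server's normal content negotiation could then act on.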

>
> I can see how much work it would be to implement it. Can you open an
> issue on github to track this down?

Done.

>
> I wonder if you could use yaslt extensions to check the headers and
> pick the includes yourself? would be a bit more work, but I think
> doable?

I just looked at the docs on that functionality. I will have to experiment.

>
> Also, would that be ok to you if I CCed the message to the mailing
> list in future messages?

I have joined the list and am sending this to it and CCing you.

>
> Thanks,
> Carlo

Thanks for the prompt reply, Carlo. mod_xslt fills what looks like a tremendous void in the area of on-the-fly XML processing. I am hoping that you will have a "voila" moment and discover a quick modification that gets one or both of the solutions you mentioned implemented. In the meantime I will experiment with yaslt.

Bill

Carlo Contavalli

Sep 28, 2011, 5:37:45 PM
to mod-...@googlegroups.com

Indeed. I'll check which APIs apache exposes for this kind of thing.
If we're lucky, it may not be that much work.

Yep, this seems easy to implement. What worries me is that internal
http requests are fairly costly right now in apache2. In apache 1.3,
they were handled by invoking internal apache APIs, but with apache2,
they go all the way through sockets and connections.

Might be a good time to have another look at that code.

>
>>
>> I can see how much work it would be to implement it. Can you open an
>> issue on github to track this down?
>
> Done.

Thanks :)!


>> I wonder if you could use yaslt extensions to check the headers and
>> pick the includes yourself? would be a bit more work, but I think
>> doable?
>
> I just looked at the docs on that functionality. I will have to experiment.
>
>>
>> Also, would that be ok to you if I CCed the message to the mailing
>> list in future messages?
>
> I have joined the list and am sending this to it and CCing you.

Great, thanks again.

I'll try to have a look at this in the next few days, and see if we
can come up with something simple but functional.

Thanks,
Carlo

Carlo Contavalli

Sep 29, 2011, 12:08:44 PM
to mod-...@googlegroups.com, webm...@dhs-club.com
On Wed, Sep 28, 2011 at 2:37 PM, Carlo Contavalli
<ccont...@inscatolati.net> wrote:
>> [...]

>> I don't exactly recall how AxKit handles these, but I think it performs http
>> requests to localhost and echoes the language preferences header. I realize
>> that building content negotiation into a file opener is way more of a
>> project that can be justified for what is probably an edge case. It would be
>> nice if Apache exposed their content negotiation engine for external use,
>> you know, drop in a file name and let it do the job of returning a file
>> handle or the contents.
>
> Indeed. I'll check which APIs apache exposes for this kind of things.
> If we're lucky, it may not be that much work.

So, I was looking at the apache2 api and libxml api. I think I have a plan:

- replace the standard entity loader in libxml with a mod-xslt specific
  one (xmlSetExternalEntityLoader())
  -or-
  play some tricks with xmlRegisterInputCallbacks to have mod-xslt's
  own handlers called when loading external entities.

- use ap_sub_req_lookup_file or ap_sub_req_lookup_uri
  from the handler. ap_sub_req_lookup_uri will cause an internal request,
  with all filters being applied.
  ap_sub_req_lookup_file will read the file from disk, I believe, but still
  apply the apache internal logic (e.g., if your .htaccess forbids
  symlinks, they will be forbidden).

- make sure some of the headers are propagated correctly
  in sub-requests. Introduce a parameter to control this?
  Likely a good idea, so one can control whether cookies or
  language-related headers should propagate in sub-requests.
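
The last point, a parameter deciding which headers propagate, could look roughly like the following Python sketch. It models only the filtering logic; the comma-separated setting format and both function names are hypothetical and do not correspond to any existing mod-xslt directive or Apache API.

```python
def parse_propagate_setting(value):
    """Parse a comma-separated header list such as "Accept-Language, Cookie"."""
    # The setting format is an assumption made for this sketch.
    return frozenset(name.strip().lower() for name in value.split(",") if name.strip())

def headers_for_subrequest(client_headers, propagate):
    """Copy only the allowed client headers into the headers for a sub-request."""
    return {name: value for name, value in client_headers.items()
            if name.lower() in propagate}
```

With a setting of "Accept-Language", a sub-request would inherit the language preference but not the client's cookies; adding "Cookie" to the list would opt in to forwarding those as well.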

I think this will be a good idea in general, not just for your use
case. It's probably how it should have been done in the first place.

Currently, for apache2, mod-xslt leaves the external entity loader
used by libxml entirely alone, which means libxml uses its own
code to read files from disk, http or ftp. The change will also finally
bring feature parity with the 1.3 sapi, which used sub-requests
and had nice recursion control checks.
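
For readers wondering what a recursion control check involves, here is a minimal Python sketch of the general technique: remember the chain of documents currently being included, and refuse a new include that is already in the chain or that nests too deeply. It is an assumption about the general approach, not a description of the 1.3 code; the class name and the max_depth default are hypothetical.

```python
class IncludeGuard:
    """Track the chain of documents currently being included."""

    def __init__(self, max_depth=10):
        self.max_depth = max_depth  # hypothetical limit, not a mod-xslt setting
        self._chain = []

    def enter(self, uri):
        """Record that uri is being fetched; refuse loops and deep nesting."""
        if uri in self._chain:
            loop = " -> ".join(self._chain + [uri])
            raise RecursionError(f"include loop: {loop}")
        if len(self._chain) >= self.max_depth:
            raise RecursionError(f"include depth exceeds {self.max_depth}")
        self._chain.append(uri)

    def leave(self):
        """Record that the most recent include has finished."""
        self._chain.pop()
```

A loader would call enter() before each sub-request and leave() once it completes, so a stylesheet that ends up including itself fails with an error instead of looping.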

Let's see if I can get a new version with this running quickly.

Carlo
