Templating and SDCH


Ren

Nov 27, 2008, 10:52:05 AM
to SDCH
Hi,

A lot of web development uses templates, which are essentially
constant strings with placeholders for dynamic values.

Shouldn't it be possible to compile a template into a dictionary,
use SDCH to send only the dynamic values, and have the user agent piece
the page back together?

Jared

Jens Alfke

Nov 28, 2008, 1:21:37 PM
to SDCH
On Nov 27, 7:52 am, Ren <Jared.Willi...@ntlworld.com> wrote:
> Shouldn't it be possible to compile a template into a dictionary,
> use SDCH to send only the dynamic values, and have the user agent piece
> the page back together?

Yup. Any page generated by that template would make a good dictionary
document; the dynamic values wouldn't matter, all they'd do is make
the dictionary a few bytes longer than necessary. Or there could be a
one-time build step on the server that processes the template with
empty placeholders and saves the resulting HTML to use as the
dictionary.
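A minimal Python sketch of that one-time build step, assuming a hypothetical `{{name}}` placeholder syntax (real template engines use their own): rendering with every placeholder empty yields the dictionary, while normal rendering yields the page that gets delta-encoded against it.

```python
import re

# Hypothetical placeholder syntax {{name}}; substitute your template engine's.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def build_dictionary(template: str) -> bytes:
    """One-time build step: render the template with every placeholder empty."""
    return PLACEHOLDER.sub("", template).encode("utf-8")


def render(template: str, values: dict) -> bytes:
    """Normal per-request rendering, producing the page to be encoded."""
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template).encode("utf-8")


template = "<h1>Hello, {{user}}!</h1><p>You have {{count}} messages.</p>"
dictionary = build_dictionary(template)
page = render(template, {"user": "Jared", "count": 3})
```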

The processes of generating and then applying the diff will do the
piecing-together. The nice thing is that the software doesn't have to
know anything about the template system, as the VCDIFF algorithms
generate efficient deltas from arbitrary byte streams.
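To illustrate that the encoder needs no knowledge of the template, here is a rough Python sketch that expresses any target as COPY (from the dictionary) and ADD (literal bytes) instructions. It uses the standard library's `difflib.SequenceMatcher` as a stand-in matcher; a real SDCH server would use a VCDIFF encoder such as open-vcdiff, which finds matches differently and emits the RFC 3284 binary format rather than Python tuples.

```python
from difflib import SequenceMatcher


def make_delta(dictionary: bytes, target: bytes) -> list:
    """Describe target as ("COPY", offset, size) and ("ADD", data) instructions."""
    # autojunk=False: don't let difflib's popularity heuristic ignore common bytes.
    matcher = SequenceMatcher(None, dictionary, target, autojunk=False)
    delta, pos = [], 0
    # Matching blocks are ordered and non-overlapping; the last is a (len, len, 0) sentinel.
    for dict_start, target_start, size in matcher.get_matching_blocks():
        if target_start > pos:  # bytes not found in the dictionary are sent literally
            delta.append(("ADD", target[pos:target_start]))
        if size:
            delta.append(("COPY", dict_start, size))
        pos = target_start + size
    return delta


def apply_delta(dictionary: bytes, delta: list) -> bytes:
    """Decoder side: rebuild the target from the dictionary plus the delta."""
    out = bytearray()
    for inst in delta:
        if inst[0] == "COPY":
            _, offset, size = inst
            out += dictionary[offset:offset + size]
        else:
            out += inst[1]
    return bytes(out)


dictionary = b"<h1>Hello, !</h1><p>You have  messages.</p>"
page = b"<h1>Hello, Jared!</h1><p>You have 3 messages.</p>"
delta = make_delta(dictionary, page)
```

The encoder only ever sees two byte strings, which is why any page rendered from the template (with or without empty placeholders) works as a dictionary.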

A related issue is localization — typically a template contains macros
to substitute localized strings. For best compression, the list of
localized strings could be appended to the dictionary file; but then
you'd probably want to have a separate dictionary per language.
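A sketch of the per-language variant in Python, with hypothetical in-memory catalogs standing in for real message files: each language gets its own dictionary, made of the rendered template followed by that language's localized strings.

```python
# Hypothetical message catalogs; a real site would load these from its localization files.
CATALOGS = {
    "en": {"greeting": "Hello", "inbox": "You have new messages"},
    "fr": {"greeting": "Bonjour", "inbox": "Vous avez de nouveaux messages"},
}


def build_language_dictionaries(base: bytes, catalogs: dict) -> dict:
    """Return one dictionary per language: template output plus its strings."""
    dictionaries = {}
    for lang, strings in catalogs.items():
        # Sorted keys: rebuilding from the same catalog always gives the same
        # bytes, whatever order the entries were loaded in.
        table = "\n".join(strings[key] for key in sorted(strings))
        dictionaries[lang] = base + b"\n" + table.encode("utf-8")
    return dictionaries


dictionaries = build_language_dictionaries(b"<h1>, !</h1>", CATALOGS)
```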

—Jens

Ren

Nov 28, 2008, 4:00:31 PM
to SDCH


On Nov 28, 6:21 pm, Jens Alfke <j...@mooseyard.com> wrote:
> On Nov 27, 7:52 am, Ren <Jared.Willi...@ntlworld.com> wrote:
>
> > Shouldn't it be possible to compile a template into a dictionary,
> > use SDCH to send only the dynamic values, and have the user agent piece
> > the page back together?
>
> Yup. Any page generated by that template would make a good dictionary
> document; the dynamic values wouldn't matter, all they'd do is make
> the dictionary a few bytes longer than necessary. Or there could be a
> one-time build step on the server that processes the template with
> empty placeholders and saves the resulting HTML to use as the
> dictionary.

Yeah, I'm taking the one-time build step route, so I can work out all
the dictionary COPY opcode offsets and sizes once.

I've written a VCDIFF encoder in PHP, so I can generate the required
sequence of COPY and ADD opcodes.
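The PHP encoder itself isn't shown in the thread, so this is only a hypothetical Python sketch of the approach, again assuming a `{{name}}` placeholder syntax. The one-time step walks the template, appends each static chunk to the dictionary, and records its COPY offset and size; per request, the encoder emits those precomputed COPYs and ADDs the dynamic values, with no string matching at all.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")  # hypothetical placeholder syntax


def compile_template(template: str):
    """One-time step: return the dictionary and a plan of COPY ranges and value slots."""
    dictionary, plan, pos = bytearray(), [], 0
    for m in PLACEHOLDER.finditer(template):
        static = template[pos:m.start()].encode("utf-8")
        if static:
            plan.append(("COPY", len(dictionary), len(static)))
            dictionary += static
        plan.append(("SLOT", m.group(1)))
        pos = m.end()
    tail = template[pos:].encode("utf-8")
    if tail:
        plan.append(("COPY", len(dictionary), len(tail)))
        dictionary += tail
    return bytes(dictionary), plan


def encode_request(plan: list, values: dict) -> list:
    """Per request: reuse the precomputed COPYs and ADD each dynamic value."""
    delta = []
    for step in plan:
        if step[0] == "COPY":
            delta.append(step)
        else:
            value = str(values[step[1]]).encode("utf-8")
            if value:  # an empty value needs no instruction
                delta.append(("ADD", value))
    return delta


def apply_delta(dictionary: bytes, delta: list) -> bytes:
    """Decoder side, for checking the result."""
    out = bytearray()
    for inst in delta:
        if inst[0] == "COPY":
            out += dictionary[inst[1]:inst[1] + inst[2]]
        else:
            out += inst[1]
    return bytes(out)


dictionary, plan = compile_template("<p>Hi {{name}}, {{n}} new</p>")
delta = encode_request(plan, {"name": "Jared", "n": 2})
```

Per-request work is reduced to concatenation; the cost is that repeats inside the dynamic values are not matched against anything.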

> A related issue is localization — typically a template contains macros
> to substitute localized strings. For best compression, the list of
> localized strings could be appended to the dictionary file; but then
> you'd probably want to have a separate dictionary per language.

I think it may be easier to do it in two stages: replace the macros in a
language-neutral template with the localized strings, then build the
dictionary from the result. I think that would also handle cases where
inserted values need to be reordered, like ICU's MessageFormat-style
strings.
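A hypothetical Python sketch of the two stages. The macro syntax `[[msg_id:arg,arg]]`, the `{{name}}` placeholders, and the catalogs are all assumptions; localized patterns use ICU-style positional arguments `{0}`, `{1}`, but real MessageFormat also supports plural and select rules, which are ignored here. Stage 1 expands each macro and turns its positional arguments into named placeholders in whatever order the language's pattern puts them; stage 2 blanks those placeholders to get that language's dictionary.

```python
import re

# Hypothetical syntaxes: [[msg_id:arg,arg]] macros in the neutral template,
# {{name}} dynamic values, {0}/{1} positional arguments in localized patterns.
MACRO = re.compile(r"\[\[(\w+)(?::([\w,]+))?\]\]")
POSITIONAL = re.compile(r"\{(\d+)\}")
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def localize(neutral: str, catalog: dict) -> str:
    """Stage 1: expand macros, mapping positional args to named placeholders."""
    def expand(m):
        args = m.group(2).split(",") if m.group(2) else []
        return POSITIONAL.sub(lambda p: "{{" + args[int(p.group(1))] + "}}",
                              catalog[m.group(1)])
    return MACRO.sub(expand, neutral)


def build_dictionary(localized: str) -> bytes:
    """Stage 2: blank the remaining dynamic placeholders to get the dictionary."""
    return PLACEHOLDER.sub("", localized).encode("utf-8")


neutral = "<p>[[shared:user,item]]</p>"
english = localize(neutral, {"shared": "{0} shared {1}"})
german = localize(neutral, {"shared": "{1} wurde von {0} geteilt"})
```

The German pattern puts the item first, yet the same neutral template serves both languages, which is the reordering case the two-stage build handles.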

Jared

Jens Alfke

Nov 29, 2008, 2:19:29 PM
to SDCH


On Nov 28, 1:00 pm, Ren <Jared.Willi...@ntlworld.com> wrote:
> Yeah, I'm taking the one-time build step route, so I can work out all
> the dictionary COPY opcode offsets and sizes once.
>
> I've written a VCDIFF encoder in PHP, so I can generate the required
> sequence of COPY and ADD opcodes.

I was thinking more of generating the page the normal way, then
feeding the output along with the dictionary into a general-purpose
delta generator. That wouldn't require modifications to the
application framework.

However, what you're proposing would be a lot more CPU-efficient. How
complicated is it to generate the VCDIFF opcodes? My only concern
would be that you'd lose some potential compression in the output
since you're only focusing on the template substitution and not other
replacement of repeated strings ... for example, if the body content
of the page uses the word VCDIFF a lot, a general purpose encoder
would tokenize it, whereas your special-case one wouldn't.

—Jens

Jim Roskind

Nov 29, 2008, 3:23:46 PM
to SD...@googlegroups.com
SDCH is generally used in concert with gzip. Gzip handles repeated strings that are not present in the dictionary. The most common form of content encoding (using SDCH) is then:

Content-Encoding: sdch,gzip

The above means the delta compression was done first, and the resulting data was gzipped, which cleans up any repeated strings. Decoding, in a browser, is then done in the reverse order (gunzip and then SDCH decoding).
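A toy Python sketch of that layering. The delta is serialized with a made-up format (not VCDIFF) so the example stays self-contained; the point is only the order of operations: the server delta-encodes and then gzips, and the client gunzips and then applies the delta.

```python
import gzip
import struct

# Toy wire format standing in for VCDIFF, which has its own opcode and integer encoding:
#   b"C" + offset + size   -> COPY from the dictionary
#   b"A" + length + bytes  -> ADD literal bytes


def serialize(delta: list) -> bytes:
    out = bytearray()
    for inst in delta:
        if inst[0] == "COPY":
            out += b"C" + struct.pack(">II", inst[1], inst[2])
        else:
            out += b"A" + struct.pack(">I", len(inst[1])) + inst[1]
    return bytes(out)


def server_encode(delta: list) -> bytes:
    """Content-Encoding: sdch,gzip means delta-encode first, then gzip."""
    return gzip.compress(serialize(delta))


def client_decode(body: bytes, dictionary: bytes) -> bytes:
    """Decode in reverse order: gunzip first, then apply the delta."""
    data, out, i = gzip.decompress(body), bytearray(), 0
    while i < len(data):
        tag = data[i:i + 1]
        i += 1
        if tag == b"C":
            offset, size = struct.unpack_from(">II", data, i)
            out += dictionary[offset:offset + size]
            i += 8
        else:
            (size,) = struct.unpack_from(">I", data, i)
            out += data[i + 4:i + 4 + size]
            i += 4 + size
    return bytes(out)


dictionary = b"<p>Hi ,  new</p>"
delta = [("COPY", 0, 6), ("ADD", b"Jared"), ("COPY", 6, 2), ("ADD", b"2"), ("COPY", 8, 8)]
body = server_encode(delta)
```

Because gzip runs over the delta, literal ADD data that repeats (body text, say) still gets compressed even though the dictionary never contained it.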

Hope that helps,

Jim

Ren

Nov 29, 2008, 7:40:18 PM
to SDCH


On Nov 29, 7:19 pm, Jens Alfke <j...@mooseyard.com> wrote:
> On Nov 28, 1:00 pm, Ren <Jared.Willi...@ntlworld.com> wrote:
>
> > Yeah, I'm taking the one-time build step route, so I can work out all
> > the dictionary COPY opcode offsets and sizes once.
>
> > I've written a VCDIFF encoder in PHP, so I can generate the required
> > sequence of COPY and ADD opcodes.
>
> I was thinking more of generating the page the normal way, then
> feeding the output along with the dictionary into a general-purpose
> delta generator. That wouldn't require modifications to the
> application framework.
>
> However, what you're proposing would be a lot more CPU-efficient. How
> complicated is it to generate the VCDIFF opcodes?

It's not that bad: ~200 lines of PHP code.

Opcode encoding is largely just array lookups; only the COPY opcode
needs address encoding. There's a 2D array for mapping the first
opcode, and a 3D array for combining opcodes, both of which I
pre-generated using the default code table from open-vcdiff.
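For readers unfamiliar with those lookup tables, here is a Python sketch that rebuilds the 256-entry default instruction code table from RFC 3284 (section 5.6) and derives lookup maps from it; Python dicts stand in for the 2D and 3D PHP arrays, which aren't shown in the thread. It also includes the RFC's variable-length integer encoding, used for explicit sizes and addresses. COPY address modes (the near/same caches) are not shown, and the entry order inside the combined ADD+COPY range is worth checking against the RFC before relying on specific indices there.

```python
# Instruction types as numbered in RFC 3284.
NOOP, ADD, RUN, COPY = 0, 1, 2, 3


def build_default_code_table() -> list:
    """Entries are ((type1, size1, mode1), (type2, size2, mode2)); size 0 means
    the size follows the opcode as an explicit integer."""
    none = (NOOP, 0, 0)
    table = [((RUN, 0, 0), none)]                                    # 0
    table += [((ADD, size, 0), none) for size in range(18)]         # 1-18
    for mode in range(9):                                           # 19-162
        table += [((COPY, size, mode), none) for size in [0, *range(4, 19)]]
    for mode in range(6):                                           # 163-234
        table += [((ADD, a, 0), (COPY, c, mode)) for a in range(1, 5) for c in range(4, 7)]
    for mode in range(6, 9):                                        # 235-246
        table += [((ADD, a, 0), (COPY, 4, mode)) for a in range(1, 5)]
    table += [((COPY, 4, mode), (ADD, 1, 0)) for mode in range(9)]  # 247-255
    return table


TABLE = build_default_code_table()
# Single-instruction opcodes keyed by (type, size, mode); paired opcodes keyed by both halves.
SINGLE = {first: op for op, (first, second) in enumerate(TABLE) if second[0] == NOOP}
PAIR = {entry: op for op, entry in enumerate(TABLE) if entry[1][0] != NOOP}


def opcode_for(inst_type: int, size: int, mode: int = 0):
    """Return (opcode, explicit_size); explicit_size is None when the opcode implies it."""
    if size and (inst_type, size, mode) in SINGLE:
        return SINGLE[(inst_type, size, mode)], None
    return SINGLE[(inst_type, 0, mode)], size


def encode_varint(n: int) -> bytes:
    """RFC 3284 integer: base-128 digits, most significant first, with the
    high bit set on every byte except the last."""
    digits = [n & 0x7F]
    n >>= 7
    while n:
        digits.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(digits))
```

For example, `opcode_for(ADD, 5)` gives opcode 6 with no explicit size, while an ADD of 100 bytes falls back to opcode 1 followed by `encode_varint(100)`.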

> My only concern
> would be that you'd lose some potential compression in the output
> since you're only focusing on the template substitution and not other
> replacement of repeated strings ... for example, if the body content
> of the page uses the word VCDIFF a lot, a general purpose encoder
> would tokenize it, whereas your special-case one wouldn't.

If a value is used more than once, I can detect that and output a
COPY opcode instead of another ADD. But as Jim pointed out, combining
SDCH with gzip (as I do) will reduce redundant data anyway.
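Sending a repeated value as a COPY works because VCDIFF's address space covers the source (here, the dictionary) followed by the part of the target already decoded, so an address at or beyond the dictionary length points back into the page itself. A hypothetical Python sketch, using tuple instructions as a stand-in for real VCDIFF output; `min_len` is an arbitrary, assumed threshold, since a COPY of a very short value can cost more bytes than the ADD it replaces.

```python
def encode_with_reuse(pieces: list, dictionary_len: int, min_len: int = 4) -> list:
    """pieces: ("COPY", dict_offset, size) for static text, ("VALUE", bytes) for
    dynamic values. Repeated values of at least min_len bytes become COPYs."""
    delta, first_seen, target_pos = [], {}, 0
    for piece in pieces:
        if piece[0] == "COPY":
            delta.append(piece)
            target_pos += piece[2]
            continue
        value = piece[1]
        if value in first_seen and len(value) >= min_len:
            # An address past the dictionary is an offset into the target decoded so far.
            delta.append(("COPY", dictionary_len + first_seen[value], len(value)))
        else:
            first_seen.setdefault(value, target_pos)
            delta.append(("ADD", value))
        target_pos += len(value)
    return delta


def apply_delta(dictionary: bytes, delta: list) -> bytes:
    out = bytearray()
    for inst in delta:
        if inst[0] == "ADD":
            out += inst[1]
            continue
        _, addr, size = inst
        if addr < len(dictionary):
            out += dictionary[addr:addr + size]
        else:
            # Copy from earlier output. This sketch assumes a COPY never straddles the
            # dictionary/target boundary or overlaps the bytes it is producing.
            start = addr - len(dictionary)
            out += out[start:start + size]
    return bytes(out)


dictionary = b"<b></b><i></i>"
pieces = [("COPY", 0, 3), ("VALUE", b"Jared"), ("COPY", 3, 4),
          ("COPY", 7, 3), ("VALUE", b"Jared"), ("COPY", 10, 4)]
delta = encode_with_reuse(pieces, len(dictionary))
```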

Jared


