On Nov 29, 7:19 pm, Jens Alfke <j...@mooseyard.com> wrote:
> On Nov 28, 1:00 pm, Ren <Jared.Willi...@ntlworld.com> wrote:
>
> > Yeah, I'm taking the one-time build step route, so I can work
> > out all the dictionary COPY opcode offsets and sizes once.
>
> > I've written a VCDIFF encoder in PHP, so I can generate the
> > sequence of COPY and ADD opcodes required.
>
> I was thinking more of generating the page the normal way, then
> feeding the output along with the dictionary into a general-purpose
> delta generator. That wouldn't require modifications to the
> application framework.
>
> However, what you're proposing would be a lot more CPU-efficient. How
> complicated is it to generate the VCDIFF opcodes?
It's not that bad: about 200 lines of PHP code.
Opcode encoding is largely just array lookups; only the COPY opcode
needs address encoding on top of that.
I use a 2D array for mapping the first opcode and a 3D array for
combining opcodes, both pre-generated from the default map in
open-vcdiff.
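
For anyone who wants to see the shape of this, here is a minimal sketch
in Python rather than PHP. It is not the encoder described above. It
assumes the default code table from RFC 3284 and covers only the
single-instruction opcodes (the combined opcodes 163 to 255 are left
out). The names FIRST_OPCODE, encode_int, encode_instruction and
AddressCache are made up for illustration. The address cache follows
the near/same scheme in the RFC with its default sizes (4 near slots,
3 x 256 same slots).

```python
ADD, COPY = "ADD", "COPY"

def build_first_opcode_table():
    """Map (instruction, mode) -> {size: opcode}, per the RFC 3284 default table."""
    add_row = {0: 1}                       # opcode 1: ADD, size sent separately
    for size in range(1, 18):              # ADD sizes 1..17 -> opcodes 2..18
        add_row[size] = 1 + size
    table = {(ADD, 0): add_row}
    for mode in range(9):                  # COPY modes 0..8
        base = 19 + 16 * mode
        row = {0: base}                    # base opcode: size sent separately
        for size in range(4, 19):          # COPY sizes 4..18
            row[size] = base + size - 3
        table[(COPY, mode)] = row
    return table

FIRST_OPCODE = build_first_opcode_table()  # illustrative name, not from the thread

def encode_int(n):
    """VCDIFF integer: base 128, most significant group first, high bit = more."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))

def encode_instruction(inst, size, mode=0):
    """Return (opcode, size_bytes); size_bytes is empty if the opcode implies the size."""
    row = FIRST_OPCODE[(inst, mode)]
    if size != 0 and size in row:
        return row[size], b""
    return row[0], encode_int(size)

class AddressCache:
    """Near/same caches for COPY addresses, following the RFC 3284 pseudocode."""

    def __init__(self, s_near=4, s_same=3):
        self.near = [0] * s_near
        self.same = [0] * (s_same * 256)
        self.next_slot = 0

    def encode(self, addr, here):
        """Pick the mode with the smallest encoded value; return (mode, value)."""
        mode, best = 0, addr                         # mode 0: VCD_SELF
        if here - addr < best:
            mode, best = 1, here - addr              # mode 1: VCD_HERE
        for i, cached in enumerate(self.near):       # modes 2..5: near cache
            d = addr - cached
            if 0 <= d < best:
                mode, best = i + 2, d
        slot = addr % len(self.same)
        if self.same[slot] == addr:                  # modes 6..8: same cache
            mode, best = len(self.near) + 2 + slot // 256, slot % 256
        # Both caches are updated after every COPY address.
        self.near[self.next_slot] = addr
        self.next_slot = (self.next_slot + 1) % len(self.near)
        self.same[slot] = addr
        return mode, best
```

With this table, a 5-byte ADD becomes the single opcode 6 with no
separate size, while a 200-byte ADD falls back to opcode 1 followed by
the size as a two-byte integer. In the "same" modes the returned value
is written as a single byte rather than as a variable-length integer.
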
> My only concern
> would be that you'd lose some potential compression in the output
> since you're only focusing on the template substitution and not other
> replacement of repeated strings ... for example, if the body content
> of the page uses the word VCDIFF a lot, a general purpose encoder
> would tokenize it, whereas your special-case one wouldn't.
If a value is used more than once, I can detect that and output
a COPY opcode instead of another ADD. But as Jim pointed out,
combining SDCH with gzip (as I do) will reduce redundant data.
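
The repeated-value case could be sketched roughly like this, again in
Python and not taken from the PHP encoder. The function name
plan_instructions, the format of `parts`, and the `min_copy` threshold
are assumptions for illustration. It also assumes a single VCDIFF
window with the whole dictionary as the source segment, so that target
addresses start at the dictionary length.

```python
def plan_instructions(parts, dictionary_len, min_copy=4):
    """Turn template parts into ADD/COPY instructions, reusing repeated values.

    parts: sequence of ("dict", offset, length) for template text already in
           the dictionary, or ("value", data) for substituted bytes.
    dictionary_len: length of the source segment; addresses at or beyond it
           refer to bytes already written to the target.
    min_copy: hypothetical cutoff below which a repeat is just ADDed again.
    """
    seen = {}          # value -> target offset where it was first ADDed
    target_pos = 0     # bytes of output produced so far
    plan = []
    for part in parts:
        if part[0] == "dict":
            _, offset, length = part
            plan.append(("COPY", length, offset))
        else:
            value = part[1]
            length = len(value)
            if length >= min_copy and value in seen:
                # Copy from the earlier occurrence in the target itself.
                plan.append(("COPY", length, dictionary_len + seen[value]))
            else:
                plan.append(("ADD", value))
                seen.setdefault(value, target_pos)
        target_pos += length
    return plan
```

This only catches exact repeats of whole substituted values. Repeated
substrings inside the body text would still be left to gzip.
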
Jared