On Apr 4, 2:46 pm, Anoop Johnson <anoop.k.john...@gmail.com> wrote:
> I'm investigating delta compression libraries to compute the delta
> between two large binary files. (could be as big as a couple of gigs)
> I'm new to vcdiff and open-vcdiff. I've read the vcdiff RFC and some
> of the source of open-vcdiff and I don't understand a couple of
> things:
>
> 1. Terminology: What is a dictionary file? Is this the same as the
> source file referred in the RFC? Why use a different terminology?
Yes, a dictionary plays the same role as the source file in the RFC.
Specifically, it is a fixed source file that is used to encode and
decode a large number of target files.
open-vcdiff is primarily intended for use with the SDCH (Shared-
Dictionary Compression over HTTP) protocol, and the term "dictionary"
comes from SDCH.
This document describes SDCH in detail:
http://sdch.googlegroups.com/web/Shared_Dictionary_Compression_over_HTTP.pdf
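
To make the "one dictionary, many targets" idea concrete, here is a toy
sketch in Python. It is not the VCDIFF wire format and not the
open-vcdiff API; the encode() and decode() functions and the tuple-based
delta representation are made up for illustration. It only shows how
several targets can be expressed as COPY/ADD instructions against the
same fixed dictionary:

```python
# Toy illustration only: NOT the VCDIFF format and NOT the open-vcdiff API.
# A delta is a list of ("COPY", offset, length) and ("ADD", data) tuples.

def encode(dictionary: bytes, target: bytes, min_match: int = 4):
    """Greedy encoder: COPY from the dictionary when possible, else ADD."""
    # Map each min_match-byte block of the dictionary to its first offset.
    index = {}
    for i in range(len(dictionary) - min_match + 1):
        index.setdefault(dictionary[i:i + min_match], i)

    delta, literal, pos = [], bytearray(), 0
    while pos < len(target):
        start = index.get(target[pos:pos + min_match])
        if start is None:
            # No match here: accumulate one literal byte.
            literal.append(target[pos])
            pos += 1
            continue
        # Extend the match for as long as both buffers agree.
        length = min_match
        while (start + length < len(dictionary)
               and pos + length < len(target)
               and dictionary[start + length] == target[pos + length]):
            length += 1
        if literal:
            delta.append(("ADD", bytes(literal)))
            literal = bytearray()
        delta.append(("COPY", start, length))
        pos += length
    if literal:
        delta.append(("ADD", bytes(literal)))
    return delta


def decode(dictionary: bytes, delta) -> bytes:
    """Rebuild the target from the dictionary plus the delta instructions."""
    out = bytearray()
    for op in delta:
        if op[0] == "COPY":
            _, offset, length = op
            out += dictionary[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


# One fixed dictionary, reused for every target (hypothetical sample data).
dictionary = b"<html><head><title>Example</title></head><body>"
pages = [
    dictionary + b"Hello</body></html>",
    dictionary + b"Goodbye</body></html>",
]
for page in pages:
    delta = encode(dictionary, page)
    assert decode(dictionary, delta) == page
```

Note that the decoder needs the same dictionary bytes the encoder used,
which is why SDCH has the client and server agree on a dictionary before
any deltas are exchanged.
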
> 2. If the above answer is yes, then why is the whole dictionary file
> loaded into memory? I'm referring to the method
> VCDiffFileBasedCoder::OpenDictionary(). In my case, the source and
> target files can be several gigs.
The open-vcdiff decoder currently requires the entire source to be
loaded into contiguous memory, and its encoder always produces delta
windows that use the entire source file as the source window.
Please see open-vcdiff issue 17, which requests support for keeping
only part of the source file in memory:
http://code.google.com/p/open-vcdiff/issues/detail?id=17