Questions about Dictionary Files

Anoop Johnson

Apr 4, 2011, 5:46:57 PM
to open-vcdiff
Hi,

I'm investigating delta compression libraries to compute the delta
between two large binary files (they could be as big as a couple of
gigs). I'm new to vcdiff and open-vcdiff. I've read the vcdiff RFC and
some of the open-vcdiff source, and there are a couple of things I
don't understand:

1. Terminology: What is a dictionary file? Is this the same as the
source file referred to in the RFC? Why use different terminology?

2. If the above answer is yes, then why is the whole dictionary file
loaded into memory? I'm referring to the method
VCDiffFileBasedCoder::OpenDictionary(). In my case, the source and
target files can be several gigs.

Thanks,
Anoop

open-vcdiff

Apr 4, 2011, 7:44:10 PM
to open-vcdiff
On Apr 4, 2:46 pm, Anoop Johnson <anoop.k.john...@gmail.com> wrote:
> 1. Terminology: What is a dictionary file? Is this the same as the
> source file referred to in the RFC? Why use different terminology?

Yes. A dictionary is a source file (in the RFC's sense) that stays
fixed and is used to encode and decode a large number of target files.
open-vcdiff is primarily intended for use with the SDCH (Shared-
Dictionary Compression over HTTP) protocol, and the term "dictionary"
comes from SDCH.
This document describes SDCH in detail:
http://sdch.googlegroups.com/web/Shared_Dictionary_Compression_over_HTTP.pdf
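
To illustrate the "one fixed dictionary, many targets" idea, here is a toy C++ sketch. It is not open-vcdiff's API and not the RFC 3284 wire format: the `Instruction` struct and the `EncodeAgainstDictionary` / `DecodeWithDictionary` functions are hypothetical, and the brute-force matcher is only for illustration. Each target becomes a list of ADD (literal bytes) and COPY (bytes taken from the dictionary) instructions, and the same dictionary is needed again to decode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction loosely modeled on VCDIFF's ADD and
// COPY. This is NOT the RFC 3284 encoding.
struct Instruction {
  enum Kind { kAdd, kCopy } kind;
  std::size_t addr;  // offset into the dictionary (COPY only)
  std::size_t size;  // number of target bytes this instruction produces
  std::string data;  // literal bytes (ADD only)
};

// Greedy brute-force encoder: at each target position, search the whole
// dictionary for the longest match. Matches shorter than min_match are
// emitted as literal bytes instead.
std::vector<Instruction> EncodeAgainstDictionary(const std::string& dict,
                                                 const std::string& target,
                                                 std::size_t min_match = 4) {
  std::vector<Instruction> out;
  std::size_t pos = 0;
  while (pos < target.size()) {
    std::size_t best_len = 0, best_addr = 0;
    for (std::size_t a = 0; a < dict.size(); ++a) {
      std::size_t len = 0;
      while (a + len < dict.size() && pos + len < target.size() &&
             dict[a + len] == target[pos + len]) {
        ++len;
      }
      if (len > best_len) {
        best_len = len;
        best_addr = a;
      }
    }
    if (best_len >= min_match) {
      out.push_back({Instruction::kCopy, best_addr, best_len, ""});
      pos += best_len;
    } else if (!out.empty() && out.back().kind == Instruction::kAdd) {
      // Extend the previous ADD instead of starting a new one.
      out.back().data += target[pos];
      ++out.back().size;
      ++pos;
    } else {
      out.push_back({Instruction::kAdd, 0, 1, std::string(1, target[pos])});
      ++pos;
    }
  }
  return out;
}

// Rebuilds a target from the same dictionary plus its delta instructions.
std::string DecodeWithDictionary(const std::string& dict,
                                 const std::vector<Instruction>& delta) {
  std::string target;
  for (const Instruction& inst : delta) {
    if (inst.kind == Instruction::kCopy) {
      // A COPY may point anywhere inside the dictionary.
      assert(inst.addr + inst.size <= dict.size());
      target.append(dict, inst.addr, inst.size);
    } else {
      target += inst.data;
    }
  }
  return target;
}
```

Because every delta is expressed relative to the same dictionary, the dictionary can be distributed once and reused for any number of targets, which is the SDCH use case. A practical encoder would index the dictionary rather than rescanning it for every target position.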

> 2. If the above answer is yes, then why is the whole dictionary file
> loaded into memory? I'm referring to the method
> VCDiffFileBasedCoder::OpenDictionary(). In my case, the source and
> target files can be several gigs.

The open-vcdiff decoder currently requires the entire source to be
loaded into contiguous memory, and its encoder always produces delta
windows that use the entire source file as the source window.
Please see open-vcdiff issue 17, which requests support for keeping
only part of the source file in memory:
http://code.google.com/p/open-vcdiff/issues/detail?id=17
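
To see why the whole source window has to be addressable, note that a COPY instruction may refer to any offset inside the source window, and consecutive COPYs may jump around in any order. The hypothetical helper below (not part of open-vcdiff) computes the smallest contiguous range of source bytes that a set of COPY instructions touches; when the encoder uses the entire file as the source window, that range can easily span the whole file.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical record for one COPY instruction: where it reads from in the
// source window and how many bytes it copies.
struct SourceCopy {
  std::size_t addr;  // offset into the source window
  std::size_t size;  // number of bytes copied
};

// Returns the [begin, end) range of source bytes that must be readable to
// apply the given COPY instructions. Because the instructions can appear in
// any order and point anywhere, a decoder needs random access to this whole
// range. Returns {0, 0} when there are no COPY instructions.
std::pair<std::size_t, std::size_t> RequiredSourceRange(
    const std::vector<SourceCopy>& copies) {
  if (copies.empty()) return {0, 0};
  std::size_t begin = copies.front().addr;
  std::size_t end = copies.front().addr + copies.front().size;
  for (const SourceCopy& c : copies) {
    begin = std::min(begin, c.addr);
    end = std::max(end, c.addr + c.size);
  }
  return {begin, end};
}
```

In a made-up example, a delta that copies one 4 KiB block from the end of a 2 GiB source and then one from the start needs the full 2 GiB range available. One plausible way to keep only part of the source in memory would be for the encoder to limit each delta window's source segment, so the decoder only needs that segment resident.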