streaming API


Joe

Oct 21, 2008, 5:25:46 PM
to open-vcdiff
The VCDiffStreamingEncoder looks like it requires the entire source
(the data that the delta is being created from) to be in memory.
The input can be read in chunks with EncodeChunk, but the source cannot.

Is that correct?

joe

open-vcdiff

Oct 21, 2008, 5:33:07 PM
to open-vcdiff
Hi Joe:
Yes, that's right. The encoder looks for matching strings of bytes
between the entire source file and the current target chunk.
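
To make the memory requirement concrete, here is a toy sketch of that kind of matching. It is an illustration only, not open-vcdiff's actual algorithm or API: the names `SourceIndex`, `BuildSourceIndex`, `FindMatches` and the fixed block size are all made up for this example. The point is that the index is built once over the whole source, and every target chunk is looked up against it, so the source has to stay available for the whole encode while the target can arrive piece by piece.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy illustration only: hypothetical names, NOT open-vcdiff's real code.
struct SourceIndex {
  std::size_t block_size;
  // Block contents -> first offset in the source where that block occurs.
  std::unordered_map<std::string, std::size_t> first_offset;
};

// Index every block_size-byte substring of the ENTIRE source.
SourceIndex BuildSourceIndex(const std::string& source, std::size_t block_size) {
  assert(block_size > 0);
  SourceIndex index{block_size, {}};
  for (std::size_t i = 0; i + block_size <= source.size(); ++i) {
    // emplace() keeps the first occurrence if the block repeats.
    index.first_offset.emplace(source.substr(i, block_size), i);
  }
  return index;
}

struct Match {
  std::size_t target_offset;  // offset inside the current target chunk
  std::size_t source_offset;  // offset inside the source
};

// Scan ONE target chunk and report blocks that also occur in the source.
std::vector<Match> FindMatches(const SourceIndex& index,
                               const std::string& chunk) {
  std::vector<Match> matches;
  std::size_t i = 0;
  while (i + index.block_size <= chunk.size()) {
    auto it = index.first_offset.find(chunk.substr(i, index.block_size));
    if (it != index.first_offset.end()) {
      matches.push_back({i, it->second});
      i += index.block_size;  // skip past the matched block
    } else {
      ++i;
    }
  }
  return matches;
}
```

Each call to `FindMatches` only needs the current chunk, but it can return an offset anywhere in the source, which is why the source cannot simply be fed in chunks in this scheme.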

Saludos,

lincoln

Joe

Oct 21, 2008, 5:50:20 PM
to open-vcdiff
Thanks. That means though that it is not usable for large files. You
won't be able to fit the source into memory.

The xdelta library can stream from both source and input. Maybe that
is not a true implementation of the RFC?

joe

Jens Alfke

Oct 21, 2008, 6:11:17 PM
to open-...@googlegroups.com

On Oct 21, 2008, at 2:50 PM, Joe wrote:

> Thanks. That means though that it is not usable for large files. You
> won't be able to fit the source into memory.

That depends. You can memory-map the source file and pass in the
mapped address space. But depending on the access patterns (for
example, random access, or repeatedly scanning the entire file), that
could become very slow.
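
For reference, a minimal POSIX-only sketch of the memory-mapping idea. `MappedFile`, `MapFile` and `UnmapFile` are names invented for this example; only `open`, `fstat`, `mmap`, `munmap` and `close` are real system calls. It assumes a 64-bit process, or a file small enough to fit in the address space.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only view of a file mapped into the address space (hypothetical helper).
struct MappedFile {
  const char* data = nullptr;
  std::size_t size = 0;
};

// Maps `path` read-only. Returns false if the file cannot be opened,
// is empty, or cannot be mapped.
bool MapFile(const char* path, MappedFile* out) {
  assert(path != nullptr && out != nullptr);
  int fd = open(path, O_RDONLY);
  if (fd < 0) return false;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {
    close(fd);
    return false;
  }
  const std::size_t size = static_cast<std::size_t>(st.st_size);
  void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  if (addr == MAP_FAILED) return false;
  out->data = static_cast<const char*>(addr);
  out->size = size;
  return true;
}

// Releases the mapping; safe to call on an unmapped MappedFile.
void UnmapFile(MappedFile* file) {
  assert(file != nullptr);
  if (file->data != nullptr) {
    munmap(const_cast<char*>(file->data), file->size);
    file->data = nullptr;
    file->size = 0;
  }
}
```

The resulting `data` pointer and `size` can then be handed to whatever expects the source as a pointer plus a length. The OS pages the file in on demand, so the speed caveat about access patterns still applies.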

> The xdelta library can stream from both source and input. Maybe that
> is not a true implementation of the RFC?

No, they're both compatible. The VCDIFF RFC only defines the data
format of the diff (as a list of "patching" instructions) and the
algorithm for applying the diff to the original file. It doesn't
specify any algorithm for creating the diff, and there are many
different ways to create one between two files, with varying degrees
of efficiency in time and space. [The same is true of MP3 encoding, if
you're familiar with that.] Open-VCDIFF and Xdelta3 are probably using
significantly different encoding algorithms.
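
To illustrate what "a list of patching instructions" means, here is a toy applier for the three instruction types the RFC describes (ADD, COPY, RUN). It is a sketch under simplifying assumptions: the `Instruction` struct and `ApplyDelta` are invented for this example, and it ignores the real binary encoding entirely (windows, variable-length integers, address caches, code tables). A COPY address indexes into the source followed by the target bytes produced so far.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of VCDIFF-style instructions; not the on-the-wire format.
enum class Op { kAdd, kCopy, kRun };

struct Instruction {
  Op op;
  std::size_t size;  // number of target bytes this instruction produces
  std::size_t addr;  // kCopy only: offset into (source + target so far)
  std::string data;  // kAdd: literal bytes; kRun: the single byte to repeat
};

// Rebuilds the target from the source and a list of instructions.
std::string ApplyDelta(const std::string& source,
                       const std::vector<Instruction>& delta) {
  std::string target;
  for (const Instruction& inst : delta) {
    switch (inst.op) {
      case Op::kAdd:
        assert(inst.data.size() == inst.size);
        target += inst.data;
        break;
      case Op::kRun:
        assert(inst.data.size() == 1);
        target.append(inst.size, inst.data[0]);
        break;
      case Op::kCopy:
        // Byte by byte, so a copy may overlap the bytes it is producing.
        for (std::size_t i = 0; i < inst.size; ++i) {
          const std::size_t pos = inst.addr + i;
          assert(pos < source.size() + target.size());
          const char c = pos < source.size() ? source[pos]
                                             : target[pos - source.size()];
          target += c;
        }
        break;
    }
  }
  return target;
}
```

Any encoder that emits a valid instruction list works with this kind of applier, whatever search strategy it used to find the copies. That is why two encoders can differ a lot internally and still be compatible.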

—Jens

open-vcdiff

Oct 21, 2008, 6:33:37 PM
to open-vcdiff
On Oct 21, 2:50 pm, Joe <lordje...@gmail.com> wrote:
> Thanks. That means though that it is not usable for large files. You
> won't be able to fit the source into memory.
> The xdelta library can stream from both source and input.

This limitation makes Xdelta a better choice for applications that
need to handle large input files, at least for now.

Another user reported open-vcdiff Issue 16
(http://code.google.com/p/open-vcdiff/issues/detail?id=16) because
open-vcdiff cannot handle input files larger than 4 GB. As I mentioned
in my comments on that issue:
"open-vcdiff was not originally designed with very large input sets in
mind, but rather as a tool for implementing the SDCH protocol for
moderately-sized HTTP responses. In that context, 2GB was seen as an
ample limit. I would like to make open-vcdiff as useful as possible
for as many people as possible. Generalizing it to handle very large
input and output files (so that it can be applied, for example, to
revision control of huge text files) will be a good step towards that
goal."

Please feel free to open a new issue at http://code.google.com/p/open-vcdiff/issues/,
requesting that the encoder be able to stream very large source files
without keeping them entirely in memory. You can even check out the
source code at http://code.google.com/p/open-vcdiff/source/checkout
and try adding large file support yourself if you like.

Saludos,

lincoln

Joe

Oct 21, 2008, 6:49:36 PM
to open-vcdiff
Thanks, I've added the issue.


LCID Fire

Jun 6, 2013, 5:40:07 AM
to open-...@googlegroups.com
This is a shocking show stopper. Maybe you should put this limitation on the front page, since I think the use case is quite common.

open-vcdiff

Jun 10, 2013, 3:19:04 PM
to open-...@googlegroups.com
On Thursday, June 6, 2013 2:40:07 AM UTC-7, LCID Fire wrote:
> This is a shocking show stopper. Maybe you should put this limitation on the front page, since I think the use case is quite common.

I've added the following text to the front page: "This implementation requires that the entire contents of the source file (also known as dictionary file) be loaded into memory. Therefore it cannot handle source files larger than 2-4 GB, depending on the particular restrictions of the OS. The target and delta file sizes do not have this limitation."

Thanks much for your feedback!

Saludos,
Lincoln