[Boost-users] gzip_compressor cannot be flushed and closed properly


Mengda Wu

Feb 9, 2008, 3:48:28 AM
to boost...@lists.boost.org
Hi,

   I have a code block trying to output a series of gzip files. I wish to have all the files flushed after the block.
But the files can only be flushed after exiting the whole program.
The files still have zero size after I try to close them. I am using std::vector to store all the pointers to boost filtering_ostream.
Can you help?

  //Open iostreams
 
  char filename[20];
  std::vector<boost::iostreams::filtering_ostream *> os_vector;
  for(i=0; i<4; i++)
  {
     sprintf(filename, "file_%d.gz", i );
     std::ofstream *of = new std::ofstream(filename, std::ios_base::binary);
     boost::iostreams::filtering_ostream* os = new boost::iostreams::filtering_ostream;
     os->push(boost::iostreams::gzip_compressor());
     os->push(*of);
     os_vector->push_back(os);
  }

  //Output something
  for(i=0; i<4; i++)
  {
     boost::iostreams::filtering_ostream* os = os_vector[i];
     os<<"Output something here" << std::endl;
  }

  //Close streams
  for(i=0; i<4; i++)
  {
    boost::iostreams::filtering_ostream* os = os_vector[i];
    os->strict_sync();
    os->pop();
    os->reset();
  }

Thanks,
Mengda

eg

Feb 18, 2008, 6:35:47 PM
to boost...@lists.boost.org

I haven't tried out your sample...
but I am not sure if gzip filters are flushable?


_______________________________________________
Boost-users mailing list
Boost...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Mengda Wu

Feb 21, 2008, 3:04:04 AM
to eg, boost...@lists.boost.org
Hi,

   I am trying to save gzip files and close them in my program without quitting it. And I would like
to read these files using another program at the same time. The problem is I cannot access the gzip
files unless I quit my program. Do you know whether I can properly save and close the gzip files with
boost iostreams?

Thanks,
Mengda


eg

Feb 21, 2008, 5:22:46 PM
to boost...@lists.boost.org
Mengda Wu wrote:
> Hi,
>
> I am trying to save gzip files and close them in my program without
> quitting it. And I would like
> to read these files using another program at the same time. The problem
> is I cannot access the gzip
> files unless I quit my program. Do you know whether I can properly save
> and close the gzip files with
> boost iostreams?
>

The following works for me when I call it in a function using boost
1.33.1 (in Windows XP):

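#include <fstream>
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_stream.hpp>

using namespace std;
namespace io = boost::iostreams;

// infile and outfile are the input and output paths, supplied by the caller.
std::ifstream ifs(infile, std::ios_base::in | std::ios::binary);
io::filtering_ostream out;

// Filters are pushed first; the device (the file sink) goes last.
out.push(io::gzip_compressor());
out.push( io::file_sink(outfile, ios_base::out | ios_base::binary));
out << ifs.rdbuf();   // copy the whole input through the compressor
ifs.close();
out.flush();
out.reset();

// After the reset, the output file is closed.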

Mengda Wu

Feb 25, 2008, 2:18:12 AM
to eg, boost...@lists.boost.org
The code works for me. Thanks,

Mengda


Jonathan Turkanis

Feb 26, 2008, 10:38:18 PM
to boost...@lists.boost.org
Hi Mengda,

I'm sorry I didn't see this post sooner. If you include the library name
in your message subject it is more likely that the library author will
respond quickly.

Mengda Wu wrote:
> Hi,
>
> I have a code block trying to output a series of gzip files. I wish
> to have all the files flushed after the block.
> But the files can only be flushed after exiting the whole program.
> The files still have zero size after I try to close them. I am using
> std::vector to store all the pointers to boost filtering_ostream.
> Can you help?

What you are noticing is that data is not being written to disk until
the streams are closed. This is not actually a bug, as I will explain,
but it still may warrant a change to the library.

flush() is just a suggestion; in general you can't force a filter to
output all the filtered data it is currently storing, except at the end
of the stream. There may be internal constraints, depending on the
format of the data output by the filter, that dictate when new data is
available for flushing. For example, an encryption filter might not be
able to output any new characters until its input has length equal to a
multiple of its block size, or until EOF occurs.
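
To make the block-size point concrete, here is a minimal sketch of a toy filter that can only emit whole blocks. BlockFilter is a hypothetical class written for illustration, not part of Boost.Iostreams, and it performs no real encryption; it assumes a block size greater than zero and pads the final block with zero bytes.

```cpp
#include <cstddef>
#include <string>

// Toy stand-in for a block-oriented filter (hypothetical, not a Boost class).
// It can only emit output in multiples of block_size, so flush() may have
// to hold data back until more input arrives or the stream ends.
class BlockFilter {
public:
    // block_size is assumed to be greater than zero.
    explicit BlockFilter(std::size_t block_size) : block_size_(block_size) {}

    // Accept more input; nothing is emitted yet.
    void write(const std::string& data) { pending_ += data; }

    // Emit every complete block; a trailing partial block stays buffered.
    std::string flush() {
        const std::size_t whole = (pending_.size() / block_size_) * block_size_;
        std::string out = pending_.substr(0, whole);
        pending_.erase(0, whole);
        return out;
    }

    // End of stream: emit the complete blocks, then pad and emit the rest.
    std::string close() {
        std::string out = flush();
        if (!pending_.empty()) {
            pending_.resize(block_size_, '\0');  // zero padding, for illustration only
            out += pending_;
            pending_.clear();
        }
        return out;
    }

    // Number of bytes still held back.
    std::size_t pending() const { return pending_.size(); }

private:
    std::size_t block_size_;
    std::string pending_;
};
```

With a block size of 8, writing the 21-character string "Output something here" and calling flush() yields 16 bytes and leaves 5 buffered; only close() releases the remainder.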

In the case of the gzip filters, Boost.Iostreams simply lets zlib
determine when new characters in the filtered sequence are available.
In your example, the text being compressed is very short (21 characters) and
it looks like zlib is simply waiting for more input before it spits
anything out. When I run your example with 250K of uncompressed data,
there is output written to disk before the streams are closed.

I have opened a ticket (http://svn.boost.org/trac/boost/ticket/1656)
raising the question whether symmetric filters (including gzip) should
attempt to force the underlying filtering algorithm to spit out as many
bytes as possible when flush() is called.

> //Open iostreams
>
> char filename[20];
> std::vector<boost::iostreams::filtering_ostream *> os_vector;
> for(i=0; i<4; i++)
> {
> sprintf(filename, "file_%d.gz", i );
> std::ofstream *of = new std::ofstream(filename, std::ios_base::binary);
> boost::iostreams::filtering_ostream* os = new
> boost::iostreams::filtering_ostream;
> os->push(boost::iostreams::gzip_compressor());
> os->push(*of);
> os_vector->push_back(os);
> }
>
> //Output something
> for(i=0; i<4; i++)
> {
> boost::iostreams::filtering_ostream* os = os_vector[i];
> os<<"Output something here" << std::endl;
> }
>
> //Close streams
> for(i=0; i<4; i++)
> {
> boost::iostreams::filtering_ostream* os = os_vector[i];
> os->strict_sync();
> os->pop();
> os->reset();
> }

There are several other problems with this code.

First, os_vector is not a pointer, so instead of os_vector->push_back(os)
you should use os_vector.push_back(os). Second, the dynamically allocated
ofstreams are leaked; when you add them to a filtering stream, it does
not take ownership of them; it merely stores a reference. Third, the
dynamically allocated filtering_ostreams are in danger of being leaked
if an exception is thrown by any of the code following the allocation;
you should consider some other method of storing the streams -- possibly
using a ptr_vector (http://tinyurl.com/3x4yor). Fourth, it is useless to
call pop() immediately before reset(): pop() removes the last element
in a chain, while reset() removes all the elements in a chain.

> Thanks,
> Mengda

Best Regards,

--
Jonathan Turkanis
CodeRage
http://www.coderage.com

Jonathan Turkanis

Feb 27, 2008, 6:58:04 PM
to boost...@lists.boost.org
Jonathan Turkanis wrote:

> What you are noticing is that data is not being written to disk until
> the streams are closed. This is not actually a bug, as I will explain,
> but it still may warrant a change to the library.

Actually, while everything I said is correct if you are using the Boost
trunk, if you are using 1.34.1, the problem is that the dynamically
allocated fstreams are not being closed until their destructors are
called at program termination.

You can fix this by:

i. changing the way you store the ofstreams, so their destructors are
called earlier;
ii. manually closing the ofstreams after you are done writing to them; or
iii. switching to Boost.Iostreams file_sinks or file_descriptor_sinks.

As I mentioned at the end of my last post, if you continue to use
fstreams, you should change the way you store them, since currently they
represent a resource leak.
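
Options (i) and (ii) can be sketched with plain standard-library streams, leaving the gzip filter out so that only the lifetime question remains. The function names below are hypothetical, and the sketch assumes C++14. In the original program the explicit close would presumably come after os->reset(), so that the compressor has already written its final bytes to the ofstream.

```cpp
#include <fstream>
#include <ios>
#include <memory>
#include <string>
#include <vector>

// Option (ii): close each ofstream explicitly once writing is finished.
void write_then_close(std::ofstream& of, const std::string& text)
{
    of << text;
    of.close();   // flushes the buffer and releases the file right away
}

// Option (i): let a container own the streams, so their destructors run
// when the container goes out of scope instead of at program exit.
void write_files_scoped(const std::vector<std::string>& paths,
                        const std::string& text)
{
    std::vector<std::unique_ptr<std::ofstream>> files;
    for (const std::string& p : paths)
        files.push_back(
            std::make_unique<std::ofstream>(p, std::ios_base::binary));
    for (auto& f : files)
        *f << text;
}   // every ofstream is destroyed, and therefore closed, here

// Helper for checking the result: size in bytes, or -1 if the file
// cannot be opened.
long file_size(const std::string& path)
{
    std::ifstream in(path, std::ios_base::binary | std::ios_base::ate);
    return in ? static_cast<long>(in.tellg()) : -1;
}
```

Either way the data is on disk as soon as the function returns, which is what a second program reading the files needs.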

Best,
