
which OutputStreams are buffered?


Rex Mottram

May 16, 2008, 10:14:04 AM

There is a java.io.BufferedOutputStream whose purpose is well
documented, basically as a good thing to wrap around an unbuffered
OutputStream (at least if you want buffering). However, and surprisingly
to me, a number of the other OutputStreams in java.io do not document
whether they are buffered, and thus it's not clear to me whether I
should wrap them or not.

Take FileOutputStream as an example: the docs say only that it's "... an
output stream for writing data to a File ...". Can we safely infer that
a stream is buffered iff it implements Flushable? Otherwise, what's
the way to know when wrapping in a BufferedOutputStream is a good idea
and when it would lead to redundant buffering?

Thanks,
RM

Knute Johnson

May 16, 2008, 11:42:35 AM

If it doesn't say buffered it isn't.

--

Knute Johnson
email s/nospam/linux/


Rex Mottram

May 16, 2008, 11:54:23 AM

Knute Johnson wrote:
> If it doesn't say buffered it isn't.

Do you mean "if the class name doesn't match /Buffered/, it isn't"?

That would make sense, but it raises the question: why would any
OutputStream implement Flushable if it has no buffer to flush?

RM

Roedy Green

May 16, 2008, 12:02:41 PM

On Fri, 16 May 2008 10:14:04 -0400, Rex Mottram <re...@not.here> wrote,
quoted or indirectly quoted someone who said :

>There is a java.io.BufferedOutputStream whose purpose is well
>documented, basically as a good thing to wrap around an unbuffered
>OutputStream (at least if you want buffering). However, and surprisingly
>to me, a number of the other OutputStreams in java.io do not document
>whether they are buffered, and thus it's not clear to me whether I
>should wrap them or not.

see http://mindprod.com/applet/fileio.html

Tick "buffered" if you want buffering. You will see that by default you
don't get any buffering.

Keep in mind that when you have lots of RAM you can read a file in a
single UNBUFFERED I/O or a number of big chunks faster than you can
read it buffered.

see http://mindprod.com/products1.html#HUNKIO
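Roedy's point about reading a file in one big chunk can be sketched with plain java.io. This is a minimal illustration, assuming the file fits comfortably in memory (and is under 2 GB, since a Java array is int-indexed); `WholeFileRead` and `readWhole` are hypothetical names, not part of the HunkIO package linked above. A single `read` call may return fewer bytes than requested, so the loop keeps reading until the array is full.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class WholeFileRead {
    // Hypothetical helper: read an entire file with a few large unbuffered
    // read() calls instead of a byte-at-a-time loop over a buffered stream.
    static byte[] readWhole(File file) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE) {
            throw new IOException("File too large for a single array: " + file);
        }
        byte[] data = new byte[(int) length];
        try (InputStream in = new FileInputStream(file)) {
            int offset = 0;
            // read() may return fewer bytes than asked for, so loop until full.
            while (offset < data.length) {
                int n = in.read(data, offset, data.length - offset);
                if (n < 0) {
                    throw new IOException("File shrank while reading: " + file);
                }
                offset += n;
            }
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = readWhole(new File(args[0]));
        System.out.println(bytes.length + " bytes read");
    }
}
```

On Java 7 and later, `java.nio.file.Files.readAllBytes(path)` does the same job in one call.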
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Knute Johnson

May 16, 2008, 12:15:53 PM

Rex Mottram wrote:
> Knute Johnson wrote:
>> If it doesn't say buffered it isn't.
>
> Do you mean "if the class name doesn't match /Buffered/, it isn't"?

Yes.

> That would make sense, but it raises the question of why would any
> OutputStream implement Flushable if it doesn't have a buffer to flush?
>
> RM

Don't quote me, but I think that is to signal the underlying OS to flush.

Tom Anderson

May 16, 2008, 12:55:42 PM

On Fri, 16 May 2008, Rex Mottram wrote:

> There is a java.io.BufferedOutputStream whose purpose is well
> documented, basically as a good thing to wrap around an unbuffered
> OutputStream (at least if you want buffering). However, and surprisingly
> to me, a number of the other OutputStreams in java.io do not document
> whether they are buffered, and thus it's not clear to me whether I
> should wrap them or not.

I believe that BufferedOutputStream is the only one that does buffering
*in java* (more or less ...), but others may involve buffers out in native
code or the OS. FileOutputStream, for instance - i believe every write
turns into a call to the OS or C library's write routine, but that may not
immediately put bytes onto a platter. The stream you get from a Socket is
another - all writes go to the TCP implementation, but that won't
necessarily send them immediately.

The point of buffering on the java side is that it saves you native calls
- you make one call when you have a kilobyte of data to send, rather than
one every time you have a morsel of data to write. This can be a big
performance win. Basically, always wrap.
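In practice, "always wrap" looks something like the sketch below. The `BufferedOutputStream` collects the small writes in a Java-side buffer (64 KB here, an arbitrary choice) and hands them to the `FileOutputStream` in large blocks, so a million one-byte writes become a small number of native calls. `BufferedFileWrite` and `writeBytes` are hypothetical names used only for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class BufferedFileWrite {
    // Hypothetical helper: writes count bytes one at a time. Without the
    // BufferedOutputStream, each write(int) would go straight to the OS.
    static void writeBytes(File file, int count) throws IOException {
        try (OutputStream out =
                 new BufferedOutputStream(new FileOutputStream(file), 64 * 1024)) {
            for (int i = 0; i < count; i++) {
                out.write(i & 0xFF);  // lands in the Java-side buffer
            }
            // flush() pushes buffered bytes down to the FileOutputStream;
            // close(), called by try-with-resources, would also do this.
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        writeBytes(new File(args[0]), 1_000_000);
    }
}
```

Closing the outer stream flushes the buffer and then closes the file, so only the outermost stream needs to be managed.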

You still have to worry about the native buffering for correctness, though
- you can't rely on data being written to a file until you've flushed the
FileOutputStream.

Now, that "more or less" above is about the various streams which do
transformations on data passing through them, and which have to do some
buffering to do that. That means GZIPOutputStream, DeflaterOutputStream,
CipherOutputStream, and possibly others. These require special attention
to wring all their bytes out of them. However, i think this is pretty well
documented in each case.
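For the transforming streams, "wringing all the bytes out" means calling `finish()` (or `close()`, which calls it), not just `flush()`: a `GZIPOutputStream` writes its final compressed block and its trailer only when it is finished. A minimal sketch that compresses into a `ByteArrayOutputStream` and decompresses again to check the round trip; `GzipFinish`, `gzip` and `gunzip` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipFinish {
    // Hypothetical helper: compress data and make sure every byte is emitted.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(data);
            // finish() drains the Deflater and writes the GZIP trailer.
            // close() calls it too, but finish() is what you need when the
            // underlying stream must stay open (e.g. a socket).
            gz.finish();
        }
        return sink.toByteArray();
    }

    // Hypothetical helper: decompress, to verify the round trip.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "hello, hello, hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(gunzip(gzip(text)), StandardCharsets.UTF_8));
    }
}
```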

tom

--
It's the 21st century, man - we rue _minutes_. -- Benjamin Rosenbaum

Neil Coffey

May 18, 2008, 12:52:02 AM

Tom Anderson wrote:

> I believe that BufferedOutputStream is the only one that does buffering
> *in java* (more or less ...), but others may involve buffers out in
> native code or the OS.

As far as your Java application is concerned, I think you
should generally treat "secret buffering at the OS level" as
"no buffering" and should wrap in a BufferedInput/OutputStream--
otherwise you have the overhead of the native call on every
single read/write.

I believe you don't need extra buffering in the streams given to you
by some Servlet implementations (they do their own Java-side
buffering to handle the HTTP protocol), though I'd be interested
if anyone has further insight on this.

> Now, that "more or less" above is about the various streams which do
> transformations on data passing through them, and which have to do some
> buffering to do that. That means GZIPOutputStream, DeflaterOutputStream,
> CipherOutputStream, and possibly others.

For similar reasons to above, it's generally best to add a Java buffer
unless you have strong grounds for not doing so. These compression
stream classes may "naturally" work on a buffer, but if the buffer
is held natively, then it's a native call to fetch each individual byte
unless you buffer in Java.
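Neil's suggestion amounts to putting the Java buffer *outside* the compressing stream, so that byte-at-a-time writes reach the `Deflater` in large blocks rather than one by one. A minimal sketch under that assumption; `BufferedGzipWrite` and `writeCompressed` are hypothetical names, and the 8 KB buffer size is an arbitrary choice.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

final class BufferedGzipWrite {
    // Hypothetical helper: the layering is buffer -> gzip -> file, so each
    // single-byte write() stays in Java until 8 KB have accumulated.
    static void writeCompressed(File file, int count) throws IOException {
        try (OutputStream out = new BufferedOutputStream(
                 new GZIPOutputStream(new FileOutputStream(file)), 8192)) {
            for (int i = 0; i < count; i++) {
                out.write('a' + (i % 26));
            }
        } // close() flushes the buffer, then finishes and closes the gzip stream
    }

    public static void main(String[] args) throws IOException {
        writeCompressed(new File(args[0]), 1_000_000);
    }
}
```

Putting the buffer between the gzip stream and the file instead would batch the *compressed* output, but would leave every single-byte write going through the `Deflater` individually.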

If memory serves correctly, it was the flavour of InputStream you get
from ZipFile.getInputStream() whose single-byte read() method creates
a new one-element byte array on each call and then calls the
multi-byte version...
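The cost Neil describes can be made visible with a counter. The sketch below uses a hypothetical `CountingStream` that mimics the pattern he recalls: its single-byte `read()` allocates a one-element array and delegates to the bulk `read`, which counts how often it is called (in a real stream, each of those bulk calls might be a native call). `ReadCallCount` and `countCalls` are also hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadCallCount {
    // Hypothetical stream: read() allocates a one-element array and calls
    // the bulk read, which counts how often it is invoked.
    static final class CountingStream extends InputStream {
        private final InputStream source;
        int bulkCalls = 0;

        CountingStream(byte[] data) {
            this.source = new ByteArrayInputStream(data);
        }

        @Override
        public int read() throws IOException {
            byte[] one = new byte[1];
            int n = read(one, 0, 1);
            return n == -1 ? -1 : one[0] & 0xFF;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            bulkCalls++;
            return source.read(b, off, len);
        }
    }

    // Reads every byte with read() and reports how many bulk calls reached the source.
    static int countCalls(byte[] data, boolean buffered) throws IOException {
        CountingStream counting = new CountingStream(data);
        InputStream in = buffered ? new BufferedInputStream(counting) : counting;
        while (in.read() != -1) {
            // discard the byte; only the call count matters here
        }
        return counting.bulkCalls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        System.out.println("direct:   " + countCalls(data, false));
        System.out.println("buffered: " + countCalls(data, true));
    }
}
```

Read directly, the 10,000 bytes cost 10,001 bulk calls (one per byte plus the end-of-stream call); through the `BufferedInputStream` they cost only a handful.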

Neil

Tom Anderson

May 18, 2008, 8:33:58 AM

On Sat, 17 May 2008, Neil Coffey wrote:

> Tom Anderson wrote:
>
>> I believe that BufferedOutputStream is the only one that does buffering *in
>> java* (more or less ...), but others may involve buffers out in native code
>> or the OS.
>
> As far as your Java application is concerned, I think you should
> generally treat "secret buffering at the OS level" as "no buffering" and
> should wrap in a BufferedInput/OutputStream-- otherwise you have the
> overhead of the native call on every single read/write.

Yes, that's exactly what i said in my post:

"The point of buffering on the java side is that it saves you native calls
- you make one call when you have a kilobyte of data to send, rather than
one every time you have a morsel of data to write. This can be a big
performance win. Basically, always wrap."

Except that you *do* need to be aware of the secret buffering for
correctness reasons:

"you can't rely on data being written to a file until you've flushed the
FileOutputStream."

So, for things like FileOutputStream, you have to treat them as both
unbuffered (by wrapping them in a buffered stream) and buffered (by
remembering to flush) at the same time!

> I believe you don't need extra buffering in the streams given to you by
> some Servlet implementations (they do their own Java-side buffering to
> handle the HTTP protocol), though I'd be interested if anyone has
> further insight on this.

Good point.

>> Now, that "more or less" above is about the various streams which do
>> transformations on data passing through them, and which have to do some
>> buffering to do that. That means GZIPOutputStream, DeflaterOutputStream,
>> CipherOutputStream, and possibly others.
>
> For similar reasons to above, it's generally best to add a Java buffer
> unless you have strong grounds for not doing so. These compression
> stream classes may "naturally" work on a buffer, but if the buffer is
> held natively, then it's a native call to fetch each individual byte
> unless you buffer in Java.

True.

> If memory serves correctly, it was the flavour of InputStream you get
> from ZipFile.getInputStream() whose single-byte read() method creates a
> new one-element byte array on each call and then calls the multi-byte
> version...

Urgh!

tom

--
1 p4WN 3v3Ry+h1n G!!!

Roger Lindsjö

May 20, 2008, 1:21:31 AM

Tom Anderson wrote:
> On Sat, 17 May 2008, Neil Coffey wrote:

> Except that you *do* need to be aware of the secret buffering for
> correctness reasons:
>
> "you can't rely on data being written to a file until you've flushed the
> FileOutputStream."
>
> So, for things like FileOutputStream, you have to treat them as both
> unbuffered (by wrapping them in a buffered stream) and buffered (by
> remembering to flush) at the same time!

Actually, you still can't be sure the data has been written to the
file, only that it has been passed from the Java side to the OS
side. To ensure the data has been committed to disk you should call sync()
on the FileDescriptor (although this is not necessary for most
applications).
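Roger's suggestion, as a minimal sketch: `FileOutputStream.getFD()` returns the stream's `FileDescriptor`, and `FileDescriptor.sync()` asks the OS to push its buffers for that file to the device (with NIO, `FileChannel.force(true)` plays the same role). `DurableWrite` and `writeDurably` are hypothetical names; how far down the storage stack the guarantee reaches still depends on the OS and hardware.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class DurableWrite {
    // Hypothetical helper: write data, then ask the OS to commit it to storage.
    static void writeDurably(File file, byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file)) {
            BufferedOutputStream out = new BufferedOutputStream(fos);
            out.write(data);
            out.flush();          // Java buffer -> OS (not necessarily the disk)
            fos.getFD().sync();   // OS buffers -> storage device, as far as the OS can promise
        } // closing fos is enough here: out has already been flushed
    }

    public static void main(String[] args) throws IOException {
        writeDurably(new File(args[0]), "important record\n".getBytes(StandardCharsets.UTF_8));
    }
}
```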

--
Roger Lindsjö

Tom Anderson

May 20, 2008, 7:49:54 AM

My impression was that this was not the case - that
FileOutputStream.flush() invokes the C library's or the OS's flush mechanism.

Ah, no, OutputStream.flush:

"If the intended destination of this stream is an abstraction provided by
the underlying operating system, for example a file, then flushing the
stream guarantees only that bytes previously written to the stream are
passed to the operating system for writing; it does not guarantee that
they are actually written to a physical device such as a disk drive."

How unhelpful.

tom

--
there is never a wrong time to have your bullets passing further into
someone's face -- D

Arved Sandstrom

May 20, 2008, 11:29:49 AM

"Neil Coffey" <neil....@french-linguistics.co.uk> wrote in message
news:g0od2l$vta$1...@aioe.org...

> Tom Anderson wrote:
>
>> I believe that BufferedOutputStream is the only one that does buffering
>> *in java* (more or less ...), but others may involve buffers out in
>> native code or the OS.
>
> As far as your Java application is concerned, I think you
> should generally treat "secret buffering at the OS level" as
> "no buffering" and should wrap in a BufferedInput/OutputStream--
> otherwise you have the overhead of the native call on every
> single read/write.
>
> I believe you don't need extra buffering in the streams given to you
> by some Servlet implementations (they do their own Java-side
> buffering to handle the HTTP protocol), though I'd be interested
> if anyone has further insight on this.

You ought not to need extra buffering, since as of Servlet 2.2 response
buffering is part of the API. This doesn't so much "handle" the HTTP
protocol as just make it easier to work with, especially as regards error
handling.

Working in cooperation with that would be buffering to support chunked
encoding, which would be directly "handling" the HTTP protocol. Wikipedia
(http://en.wikipedia.org/wiki/HTTP#Persistent_connections) rather
confusingly says that chunking allows data on persistent connections to be
streamed rather than buffered, but of course the mechanism is still going to
be doing buffering.
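To see why chunked encoding still involves buffering, here is a toy sketch of the HTTP/1.1 chunked framing: each chunk is its size in hex, CRLF, the data, CRLF, and a zero-size chunk ends the body. `ChunkedOutputStream`, its fixed `chunkSize` and the `encode` helper are hypothetical, written only for illustration; this is not a servlet container's implementation, and it omits trailers and chunk extensions.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class ChunkedOutputStream extends FilterOutputStream {
    private final byte[] buffer;   // bytes waiting to become the next chunk
    private int count = 0;
    private boolean closed = false;

    ChunkedOutputStream(OutputStream out, int chunkSize) {
        super(out);
        this.buffer = new byte[chunkSize];
    }

    @Override
    public void write(int b) throws IOException {
        buffer[count++] = (byte) b;
        if (count == buffer.length) {
            writeChunk();  // buffer full: emit it as one chunk
        }
    }

    private void writeChunk() throws IOException {
        if (count == 0) return;  // a zero-size chunk would end the body
        out.write((Integer.toHexString(count) + "\r\n").getBytes(StandardCharsets.US_ASCII));
        out.write(buffer, 0, count);
        out.write("\r\n".getBytes(StandardCharsets.US_ASCII));
        count = 0;
    }

    @Override
    public void flush() throws IOException {
        writeChunk();
        out.flush();
    }

    @Override
    public void close() throws IOException {
        if (closed) return;
        closed = true;
        writeChunk();
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));  // last chunk, end of body
        out.close();
    }

    // Hypothetical helper: encode a whole body and return the wire text.
    static String encode(String body, int chunkSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ChunkedOutputStream chunked = new ChunkedOutputStream(sink, chunkSize)) {
            chunked.write(body.getBytes(StandardCharsets.US_ASCII));
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        System.out.print(encode("hello world", 4));
    }
}
```

With a chunk size of 4, `encode("hello world", 4)` yields `4\r\nhell\r\n4\r\no wo\r\n3\r\nrld\r\n0\r\n\r\n`: the data is streamed, but only ever in buffered pieces.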

AHS


Kenneth P. Turvey

May 20, 2008, 2:06:06 PM

On Tue, 20 May 2008 12:49:54 +0100, Tom Anderson wrote:

> "If the intended destination of this stream is an abstraction provided
> by the underlying operating system, for example a file, then flushing
> the stream guarantees only that bytes previously written to the stream
> are passed to the operating system for writing; it does not guarantee
> that they are actually written to a physical device such as a disk
> drive."
>
> How unhelpful.

But to be expected. Many operating systems provide no way to guarantee
that a given write has made it all the way to the disk. Even under some
versions of Unix, sync does not guarantee this.


--
Kenneth P. Turvey <kt-u...@squeakydolphin.com>

Tom Anderson

May 20, 2008, 7:36:47 PM

On Tue, 20 May 2008, Kenneth P. Turvey wrote:

> On Tue, 20 May 2008 12:49:54 +0100, Tom Anderson wrote:
>
>> "If the intended destination of this stream is an abstraction provided
>> by the underlying operating system, for example a file, then flushing
>> the stream guarantees only that bytes previously written to the stream
>> are passed to the operating system for writing; it does not guarantee
>> that they are actually written to a physical device such as a disk
>> drive."
>>
>> How unhelpful.
>
> But to be expected. Many operating systems provide no way to guarantee
> that a given write has made it all the way to the disk.

Really? That's genuinely shocking. Which are the culprits?

(apart from the versions of unix you mention below)

> Even under some versions of Unix, sync does not guarantee this.

Double wow. Could you expand on that?

tom

--
We got our own sense of propaganda. We call it truth. -- Rex Steele,
Nazi Smasher

Kenneth P. Turvey

May 20, 2008, 7:57:16 PM

On Wed, 21 May 2008 00:36:47 +0100, Tom Anderson wrote:

>> Even under some versions of Unix, sync does not guarantee this.
>
> Double wow. Could you expand on that?

Honestly, I can't. I've run into it before and I cataloged it in my head
with some strange behavior in AIX having to do with signals and
security. It may have been another AIX quirk, but I don't recall.

It may also have been only for non-root users. I don't remember the
details.

Tom Anderson

May 20, 2008, 8:27:22 PM

On Wed, 20 May 2008, Kenneth P. Turvey wrote:

> On Wed, 21 May 2008 00:36:47 +0100, Tom Anderson wrote:
>
>>> Even under some versions of Unix, sync does not guarantee this.
>>
>> Double wow. Could you expand on that?
>
> Honestly, I can't. I've run into it before and I cataloged it in my head
> with some strange behavior in AIX having to do with signals and
> security. It may have been another AIX quirk, but I don't recall.
>
> It may also have been only for non-root users. I don't remember the
> details.

Fair enough. I'll keep it in mind though!

Oh christ - i just looked up what the Open Group have to say about it [1],
and according to IEEE Std 1003.1, 2004 Edition, aka POSIX:

"The sync() function shall cause all information in memory that updates
file systems to be scheduled for writing out to all file systems.

"The writing, although scheduled, is not necessarily complete upon return
from sync()."

Although that's the all-files sync, and not the just-this-file fsync,
which says:

"The fsync() function shall request that all data for the open file
descriptor named by fildes is to be transferred to the storage device
associated with the file described by fildes. The nature of the transfer
is implementation-defined. The fsync() function shall not return until the
system has completed that action or until an error is detected."

Which sounds a bit sketchy, but basically what we want. But then it comes
back with:

"If _POSIX_SYNCHRONIZED_IO is not defined, the wording relies heavily on
the conformance document to tell the user what can be expected from the
system. It is explicitly intended that a null implementation is
permitted."

Great!

tom

[1] http://www.opengroup.org/onlinepubs/000095399/functions/sync.html

--
see im down wid yo sci fi crew

Arne Vajhøj

May 20, 2008, 9:44:29 PM

Tom Anderson wrote:
> Oh christ - i just looked up what the Open Group have to say about it
> [1], and according to IEEE Std 1003.1, 2004 Edition, aka POSIX:
>
> "The sync() function shall cause all information in memory that updates
> file systems to be scheduled for writing out to all file systems.
>
> "The writing, although scheduled, is not necessarily complete upon
> return from sync()."
>
> Although that's the all-files sync, and not the just-this-file fsync,
> which says:
>
> "The fsync() function shall request that all data for the open file
> descriptor named by fildes is to be transferred to the storage device
> associated with the file described by fildes. The nature of the transfer
> is implementation-defined. The fsync() function shall not return until
> the system has completed that action or until an error is detected."
>
> Which sounds a bit sketchy, but basically what we want. But then it
> comes back with:
>
> "If _POSIX_SYNCHRONIZED_IO is not defined, the wording relies heavily on
> the conformance document to tell the user what can be expected from the
> system. It is explicitly intended that a null implementation is permitted."

At that level there aren't many guarantees for anything.

I guess one of the reasons is that it can be very difficult to
implement an API that makes it 100% certain the data has reached a
location that is rotating. Cache in RAID controllers, cache in disk drives,
file servers, NAS, SAN, etc.

Arne

Lew

May 20, 2008, 11:27:07 PM

Arne Vajhøj wrote:
> At that level there are not much guarantees for anything.
>
> I guess one of the reasons is that it can be very difficult to
> implement an API that make it 100% sure the data is at location that
> is rotating. Cache in RAID controllers, cache in disk drives,
> file servers, NAS, SAN etc.etc..

From what I understand one can configure certain file systems to be truthful
about their [f]sync activity. The program may not be able to count on that,
but it has to trust the file system to do its part in accordance with system
goals.

The key to safe writes isn't so much how quickly they happen, but that they
don't report completion until they're actually written. Synchronous writes
come back certain of their outcome.
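In Java, the closest thing to Lew's "synchronous writes" is opening a `FileChannel` with `StandardOpenOption.DSYNC` (or `SYNC`, which also covers metadata): each update to the file's content is then required to be written synchronously to the underlying storage device, subject to the same OS and hardware caveats discussed above. A minimal sketch; `SyncWrite` and `appendSynchronously` are hypothetical names, and these options need Java 7 or later.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class SyncWrite {
    // Hypothetical helper: append bytes with DSYNC, so each write() is meant
    // not to return until the content update has reached the storage device.
    static void appendSynchronously(Path path, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND,
                StandardOpenOption.DSYNC)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);  // may write fewer bytes than remain, so loop
            }
        }
    }

    public static void main(String[] args) throws IOException {
        appendSynchronously(Paths.get(args[0]), "committed\n".getBytes(StandardCharsets.UTF_8));
    }
}
```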

--
Lew

Tom Anderson

May 21, 2008, 8:26:30 AM

Exactly. And that's such a fundamental principle of storage that i'd be
surprised if there was any component in the chain that didn't support it -
they may all have caches, but that just means they all also have to
provide a flush mechanism.

An interesting case would be in a super-reliable data shop, where all data
written to disk gets backed up on tape. Would flush() wait for the backups
to be made? :)

tom

--
I need a proper outlet for my tendency towards analytical thought. --
Geneva Melzack
