
What replaces StringBufferInputStream


Patricia Shanahan

Aug 28, 2006, 8:01:19 PM
I need to generate an InputStream from a String containing some test data.

StringBufferInputStream is deprecated, and the documentation points to
StringReader.

However, after looking through java.io several times, I have not found
how to construct an InputStream from a Reader.

What is the proper, undeprecated, replacement code for:

InputStream in = new StringBufferInputStream(someString);

Thanks,

Patricia

Arne Vajhøj

Aug 28, 2006, 8:12:10 PM

InputStream in = new ByteArrayInputStream(someString.getBytes(encoding));

must be a candidate.

Arne
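
A minimal sketch of the approach Arne suggests, assuming the test data is text and that UTF-8 is the encoding the code under test expects. The names `StringStreams` and `toInputStream` are made up for illustration, and `StandardCharsets` assumes Java 7 or later (in 2006 one would have passed a charset name such as "UTF-8" instead).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of java.io.
final class StringStreams {
    // Encode the string with an explicit charset, then wrap the bytes in a stream.
    static InputStream toInputStream(String text, Charset charset) {
        return new ByteArrayInputStream(text.getBytes(charset));
    }

    public static void main(String[] args) throws IOException {
        InputStream in = toInputStream("line one\nline two\n", StandardCharsets.UTF_8);
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        System.out.println(count + " bytes read");
    }
}
```

Naming the charset at the call site keeps the test independent of the platform default encoding.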

Patricia Shanahan

Aug 28, 2006, 9:32:38 PM

Thanks.

That works, and gets rid of the warnings. But why does the
StringBufferInputStream documentation say "As of JDK 1.1, the preferred
way to create a stream from a string is via the StringReader class." if
StringReader cannot do StringBufferInputReader's job?

Patricia

Arne Vajhøj

Aug 28, 2006, 9:42:21 PM
Patricia Shanahan wrote:
> That works, and gets rid of the warnings. But why does the
> StringBufferInputStream documentation say "As of JDK 1.1, the preferred
> way to create a stream from a string is via the StringReader class." if
> StringReader cannot do StringBufferInputReader's job?

I don't know.

A guess could be that a typical usage of StringBufferInputStream
was to wrap it in an InputStreamReader.

But this is pure guessing.

Arne

John W. Kennedy

Aug 28, 2006, 10:09:44 PM

From Java's viewpoint, you're looking at it wrong. In the normal case,
since 1.1, Java doesn't want you to use an InputStream for anything, but
a Reader (unless you need a DataInputStream).

Reader in = new StringReader(someString);

--
John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
-- Charles Williams. "Taliessin through Logres: Prelude"

Patricia Shanahan

Aug 28, 2006, 11:02:13 PM
John W. Kennedy wrote:
> Patricia Shanahan wrote:
>> I need to generate an InputStream from a String containing some test
>> data.
>>
>> StringBufferInputStream is deprecated, and the documentation points to
>> StringReader.
>>
>> However, after looking through java.io several times, I have not found
>> how to construct an InputStream from a Reader.
>>
>> What is the proper, undeprecated, replacement code for:
>>
>> InputStream in = new StringBufferInputStream(someString);
>
> From Java's viewpoint, you're looking at it wrong. In the normal case,
> since 1.1, Java doesn't want you to use an InputStream for anything, but
> a Reader (unless you need a DataInputStream).
>
> Reader in = new StringReader(someString);
>

I'm writing a unit test for a class that reads from InputStream. The
string contains the test data I want it to operate on.

The class I'm testing is intended to deal with the stdout and stderr
streams from a java.lang.Process job, and java.lang.Process does not
seem to have heard of Reader.

Patricia

Chris Uppal

Aug 29, 2006, 3:56:37 AM
John W. Kennedy wrote:

> From Java's viewpoint, you're looking at it wrong. In the normal case,
> since 1.1, Java doesn't want you to use an InputStream for anything, but
> a Reader (unless you need a DataInputStream).

Maybe that came out wrong, but taken literally it is completely false.
InputStreams are central to the IO architecture.

-- chris


Chris Uppal

Aug 29, 2006, 4:40:42 AM
Patricia Shanahan wrote:

> But why does the
> StringBufferInputStream documentation say "As of JDK 1.1, the preferred
> way to create a stream from a string is via the StringReader class." if
> StringReader cannot do StringBufferInputReader's job?

My guess is that they intended to create an InputStream-flavoured wrapper for
Readers, but never got around to it. Probably they argued for a few days about
what to call it, but could neither stomach ReaderInputStream nor agree on
anything else, and just quietly dropped it.

"Nobody's going to need it anyway"

;-)

-- chris


Robert Klemme

Aug 29, 2006, 4:41:26 AM
On 29.08.2006 03:32, Patricia Shanahan wrote:
> Arne Vajhøj wrote:
>> Patricia Shanahan wrote:
>>> I need to generate an InputStream from a String containing some test
>>> data.
>>>
>>> StringBufferInputStream is deprecated, and the documentation points to
>>> StringReader.
>>>
>>> However, after looking through java.io several times, I have not found
>>> how to construct an InputStream from a Reader.

> That works, and gets rid of the warnings. But why does the


> StringBufferInputStream documentation say "As of JDK 1.1, the preferred
> way to create a stream from a string is via the StringReader class." if
> StringReader cannot do StringBufferInputReader's job?

Um, as far as I can see there is no StringBufferInputReader. The
replacement for StringBufferInputStream is InputStreamReader - the major
difference is that the latter properly deals with encodings. In your
case the default constructor is probably most appropriate because the
process will likely emit its output using the default encoding of the
platform. If not, you need a way to make sure the process and your
InputStreamReader use the same encoding.

HTH

Kind regards

robert
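
A minimal sketch of what Robert describes: decoding a byte stream through an InputStreamReader with an explicitly named encoding. It assumes the bytes are ISO-8859-1 text; the class name `ProcessOutputLines`, the method `readLines`, and the sample data are hypothetical, and a ByteArrayInputStream stands in for the stream a real Process would supply.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class ProcessOutputLines {
    // Decode the bytes with the given charset and collect the text line by line.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for process.getInputStream(); the content is invented sample data.
        byte[] fakeStdout = "ans =\n     42\n".getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new ByteArrayInputStream(fakeStdout);
        for (String line : readLines(in, StandardCharsets.ISO_8859_1)) {
            System.out.println(line);
        }
    }
}
```

The charset passed to `readLines` has to match whatever the producing process actually emits; the sketch does not detect it.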

Chris Uppal

Aug 29, 2006, 4:48:50 AM
Patricia Shanahan wrote:

> I'm writing a unit test for a class that reads from InputStream. The
> string contains the test data I want it to operate on.

Wouldn't it make more sense to hold the test data in a byte[] array ? That,
after all, is what your real data will look like -- binary, coming from an
external application using who-knows-what encoding.

-- chris


M.J. Dance

Aug 29, 2006, 5:21:41 AM
Patricia Shanahan wrote:

> What is the proper, undeprecated, replacement code for:
>
> InputStream in = new StringBufferInputStream(someString);

There is no proper replacement. The line of code above is mixing two
superficially similar but inherently different things: bytes and chars. Of
course the two are related but, in order to fully describe that relationship, one
needs additional information: character encoding. Having that, one can
getBytes() from a String and, using those, create a ByteArrayInputStream.

There are some inconsistencies in Java API as far as such situations are
concerned. Sun decided to deprecate a whole StringBufferInputStream (instead of
deprecating only a constructor and adding a new one specifying encoding). Yet
they chose a different approach with String for example. One can construct a new
String from an array of bytes despite the fact that by doing that, one relies on
default/platform character encoding. Assuming that this encoding is the right
one, one can get desired results. But we all know that assumption is the mother
of all FUs, don't we?


So, whatever you do, don't... ;-)

Patricia Shanahan

Aug 29, 2006, 7:18:54 AM

My real data is text. I find it much faster to write text as Strings.

The solution, as Arne pointed out, is to first get a byte array from the
String. If internationalization were an issue, I would specify an
encoding on that conversion.

In the real application, the data will be coming from Matlab. The
application only needs to work on a few hundred systems, three of which
I control and the remaining systems are all part of a UCSD grid
computer, so I do know what encoding.

Patricia

vahan

Aug 29, 2006, 8:02:28 AM
As we know, a Java String is a sequence of chars. When we look at the source
code for the read methods of StringBufferInputStream and StringReader, we can
see the difference:

StringBufferInputStream:
    public synchronized int read() {
        return (pos < count) ? (buffer.charAt(pos++) & 0xFF) : -1;
    }

StringReader:
    public int read() throws IOException {
        synchronized (lock) {
            ensureOpen();
            if (next >= length)
                return -1;
            return str.charAt(next++);
        }
    }

As we see, StringBufferInputStream's read method returns only the low byte
of each char as an int. That is why it is deprecated.

Best,
Vahan
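
To make the data loss concrete, here is a small sketch that applies the same `& 0xFF` rule to a character outside Latin-1 and compares it with a real encoding. The class `LowByteDemo` and its method are illustrative only; the sketch mimics the deprecated class's rule rather than calling it.

```java
import java.nio.charset.StandardCharsets;

final class LowByteDemo {
    // Mimics the deprecated class's per-character rule: keep only the low 8 bits.
    static int lowByte(char c) {
        return c & 0xFF;
    }

    public static void main(String[] args) {
        char euro = '\u20AC'; // EURO SIGN

        // The high byte 0x20 is silently dropped.
        System.out.println(Integer.toHexString(lowByte(euro))); // prints "ac"

        // A real encoding keeps all the information, here as three bytes.
        for (byte b : String.valueOf(euro).getBytes(StandardCharsets.UTF_8)) {
            System.out.print(Integer.toHexString(b & 0xFF) + " "); // prints "e2 82 ac "
        }
        System.out.println();
    }
}
```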

Patricia Shanahan

Aug 29, 2006, 8:19:39 AM
vahan wrote:
> As we know java String is char array. When we look through code source
> for
> StringBufferInputStream and StringReader read method we can see
> difference
> between:
>
> StringBufferInputStream:
> public synchronized int read() {
> return (pos < count) ? (buffer.charAt(pos++) & 0xFF) : -1;
>
> StringReader:
> public int read() throws IOException {
> synchronized (lock) {
> ensureOpen();
> if (next >= length)
> return -1;
> return str.charAt(next++);
> }
> }
>
> As we see StringBufferInputStream's read method return only low byte
> from char as int . That is why it is deprecated.
> Best Vahan

I don't have a problem with StringBufferInputStream being deprecated,
and understand perfectly well why.

The question is what replaces it, when the recommendation in the
documentation, StringReader, does not seem to do the whole job, because
there does not seem to be any way to get from there to an InputStream.

Patricia

Chris Uppal

Aug 29, 2006, 8:54:44 AM
Patricia Shanahan wrote:

> The solution, as Arne pointed out, is to first get a byte array from the
> String. If internationalization were an issue, I would specify an
> encoding on that conversion.

I'd still specify the encoding explicitly -- you may as well be clear on what
the test is actually testing.

BTW, I don't believe that character encodings should be thought of as just an
internationalisation issue. (And even if they were, you -- being a
foreigner -- shouldn't ignore those issues ;-)

Matlab's a powerful program; surely it can be told to produce its output in
UTF-8 or UTF-16...

-- chris


Patricia Shanahan

Aug 29, 2006, 9:48:31 AM

Matlab is extremely powerful in specific directions. The documentation
for the student version I have has no hits for "UTF", "international",
and "unicode", but pages of hits for "eigen". The only hit for
"encoding" is in the Signal Processing Toolbox and "quantizes the
entries in a multidimensional array of floating-point numbers u and
encodes them as integers". "ASCII" is used in the documentation as being
the alternative to "binary".

The downloaded code that I'm using is all matrix-to-matrix internal
operations, not I/O.

I suspect the Matlab stdout and stderr are going to be predominantly in
the ASCII character set. Any non-ASCII text would have to be in the
default encoding for the platform.

Patricia

Lasse Reichstein Nielsen

Aug 29, 2006, 1:43:32 PM
Patricia Shanahan <pa...@acm.org> writes:

> The question is what replaces it, when the recommendation in the
> documentation, StringReader, does not seem to do the whole job, because
> there does not seem to be any way to get from there to an InputStream.

My guess is that a String should not be convertible to an InputStream
of bytes directly, because there are so many ways a String can be made
into bytes with none of them being an obvious default. I.e., asking to
go directly from a String to an InputStream was asking for something
that was better done in two separable steps: String to bytes, bytes to
stream.

Strings contain characters, so the most fitting sequential input to
convert it to would be a Reader.

/L
--
Lasse Reichstein Nielsen - l...@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
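
The point that a String has no single obvious byte form can be illustrated with a short sketch: the same five characters become three different byte sequences under three standard charsets. The class `EncodingsDiffer` and the sample word are chosen only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingsDiffer {
    // Number of bytes the text occupies once encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "na\u00EFve"; // five chars, one of them non-ASCII

        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1)); // 5
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));      // 6
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE));   // 10
    }
}
```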

Mike Schilling

Aug 29, 2006, 2:08:29 PM

"Patricia Shanahan" <pa...@acm.org> wrote in message
news:PAXIg.11588$Qf....@newsread2.news.pas.earthlink.net...

Which may or may not be the default encoding for any particular Java
installation.

I'm with Chris on this. If nothing else, specifying "US-ASCII" or
"ISO-8859-1" (or whatever) is awfully cheap documentation about what sort of
input your code expects, and will make debugging any future miscues far
easier for whoever's maintaining it then.


Oliver Wong

Aug 29, 2006, 3:03:42 PM

"Mike Schilling" <mscotts...@hotmail.com> wrote in message
news:xo%Ig.4293$yO7....@newssvr14.news.prodigy.com...

>
> "Patricia Shanahan" <pa...@acm.org> wrote in message
> news:PAXIg.11588$Qf....@newsread2.news.pas.earthlink.net...
>> Chris Uppal wrote:
>>> Patricia Shanahan wrote:
>>>
>>>> The solution, as Arne pointed out, is to first get a byte array from
>>>> the
>>>> String. If internationalization were an issue, I would specify an
>>>> encoding on that conversion.
>>>
>>> I'd still specify the encoding explicitly -- you may as well be clear on
>>> what
>>> the test is actually testing.
>>>
>>> BTW, I don't believe that character encodings should be thought of as
>>> just an
>>> internationalisation issue. (And even they if were, you -- being a
>>> foreigner -- shouldn't ignore those issues ;-)
>>>
>>> Matlab's a powerful program; surely it can be told to produce its output
>>> in
>>> UTF-8 or UTF-16...
>>
>> Matlab is extremely powerful in specific directions. The documentation
>> for the student version I have has no hits for "UTF", "international",
>> and "unicode", but pages of hits for "eigen".
[...]

>> I suspect the Matlab stdout and stderr are going to be predominantly in
>> the ASCII character set. Any non-ASCII text would have to be in the
>> default encoding for the platform.
[...]

> If nothing else, specifying "US-ASCII" or "ISO_8851_1" (or whatever) is
> awfully cheap documentation about what sort of input your code expects,
> and will make debugging any future miscues far eaiser for whoever's
> maintaining it then.

Alternatively, it might be more accurate to put a TODO with something
like:

/*TODO: I don't know what encoding MatLab actually uses. If you find out,
set it here.*/

Setting an explicit encoding (to me) implies that that's the actual
encoding you want to use, as opposed to you having just chosen an encoding
randomly because you didn't know which one was appropriate.

- Oliver

Mike Schilling

Aug 29, 2006, 3:39:42 PM

"Oliver Wong" <ow...@castortech.com> wrote in message
news:ic0Jg.22473$tP4.17596@clgrps12...

Yes, I did mean setting it *after* finding out the correct value :-)


Patricia Shanahan

Aug 29, 2006, 4:28:34 PM
Lasse Reichstein Nielsen wrote:
...

> Strings contain characters, so the most fitting sequential input to
> convert it to would be a Reader.

Yes, but as far as I can tell Reader is a total dead end when the
objective is InputStream.

I got stuck, and had to ask for help, precisely because I was thinking
Reader, when I should have been taking a detour through byte arrays

Patricia

John W. Kennedy

Aug 29, 2006, 10:18:07 PM

Only for reading and writing raw bytes. There is, so to speak, an
impedance mismatch between strings and streams, which is why, since 1.1,
strings are supposed to be processed by Reader and Writer classes.
That's why StringBufferInputStream is obsolete, producing the warning
messages that were the cause of this whole thread in the first place.

Arne Vajhøj

Aug 29, 2006, 10:30:14 PM
John W. Kennedy wrote:
> Chris Uppal wrote:
>> John W. Kennedy wrote:
>>> From Java's viewpoint, you're looking at it wrong. In the normal case,
>>> since 1.1, Java doesn't want you to use an InputStream for anything, but
>>> a Reader (unless you need a DataInputStream).
>>
>> Maybe that came out wrong, but taken literally it is completely false.
>> InputStreams are central to the IO architecture.
>
> Only for reading and writing raw bytes. There is, so to speak, an
> impedance mismatch between strings and streams, which is why, since 1.1,
> strings are supposed to be processed by Reader and Writer classes.
> That's why StringBufferInputStream is obsolete, producing the warning
> messages that were the cause of this whole thread in the first place.

Your first post said "Java doesn't want you to use an InputStream
for anything" without the "Only for reading and writing raw bytes".

Arne

M.J. Dance

Aug 30, 2006, 3:02:56 AM

That seems to be the problem with this input stream <-> reader (and output
stream <-> writer, for that matter) dichotomy. Every(?) reader has an underlying
input stream which, I imagine, wouldn't be a problem to obtain. But that would
mean inviting problems: reading from both stream and reader simultaneously could
cause "unpredictable" behaviour: a few <khm/> years ago I was trying to make a
jsp serve binary content. No problem, I thought, one can obtain an output stream
from response (implicit object, instanceof (Http)ServletResponse, available in
every jsp) and send data through there. But it (the servlet engine, that is),
said that it already called .getWriter() and that .getOutputStream() cannot be
called after that. Maybe things changed since then, but it's a good
illustration. I think.

Chris Uppal

Aug 30, 2006, 3:52:54 AM
John W. Kennedy wrote:
> Chris Uppal wrote:
> > John W. Kennedy wrote:
> >
> > > From Java's viewpoint, you're looking at it wrong. In the normal
> > > case, since 1.1, Java doesn't want you to use an InputStream for
> > > anything, but a Reader (unless you need a DataInputStream).
> >
> > Maybe that came out wrong, but taken literally it is completely false.
> > InputStreams are central to the IO architecture.
>
> Only for reading and writing raw bytes.

?!? Only !?!

That's hardly a trivial, obscure, or unimportant application !

(In fact, judging from what I read in this ng, many applications, or would-be
applications, of Readers or Writers are technically wrong in that the posters
are trying to treat binary data as if it were "really" text.)

-- chris


Chris Uppal

Aug 30, 2006, 3:59:57 AM
Oliver Wong wrote:

> Setting an explicit encoding (to me) implies that that's the actual
> encoding you want to use, as opposed to you having just chosen an encoding
> randomly because you didn't know which one was appropriate.

Fair point.

With luck (i.e. I haven't bothered to check) the US-ASCII decoder will signal
an error if it is fed bytes outside the [0, 127] range. If so then setting
that would be one way to be explicit about the assumption (almost certainly
correct) that I think Patricia's making.

-- chris

Lasse Reichstein Nielsen

Aug 30, 2006, 5:41:47 AM
"M.J. Dance" <mjd...@hotmail.com> writes:

> That seems to be the problem with this input stream <-> reader (and
> output stream <-> writer, for that matter) dichotomy. Every(?) reader
> has an uderlying input stream which, I imagine, wouldn't be a problem
> to obtain.

Except, e.g., StringReader.

If you are communicating between processes, or even computers, then at
some point you'll represent your data as the lowest common
denominator: bytes, but working inside a single program, you can start
out with characters and keep it that way.

There is no way to meaningfully convert a generic Writer to an
OutputStream, and it's an implementation detail whether there is an
underlying OutputStream for a given Writer, so the Writer interface
can't meaningfully expose a method for giving out an OutputStream.

On the opposite end, you shouldn't blindly convert an InputStream
to a Reader without knowing that the bytes do represent characters
in the encoding you have chosen.

Chris Uppal

Aug 30, 2006, 7:38:58 AM
Lasse Reichstein Nielsen wrote:

> There is no way to meaningfully convert a generic Writer to an
> OutputStream, and it's an implementation detail whether there is an
> underlying OutputStream for a given Writer, so the Writer interface
> can't meaningfully expose a method for giving out an OutputStream.

Certainly there is. You can create an OutputStream decorator which wraps a
Writer (or an InputStream which wraps a Reader) in just the same way as an
OutputStreamWriter wraps an OutputStream. All you need is a CharsetDecoder
(or CharsetEncoder).

The java.io package lacks such a beast, but it's trivial enough to create your
own if you have a need for it (and -- as I said before -- if you can think of
an acceptable name for it).

I admit there's not an /awful/ lot of use for it though...

-- chris
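
A rough sketch of the kind of decorator Chris describes: an InputStream that wraps a Reader and encodes its characters with a CharsetEncoder. This is an assumption-laden illustration, not an existing java.io class; the name `ReaderInputStream`, the buffer sizes, and the choice to replace unencodable characters are all arbitrary.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical adapter: serves the bytes obtained by encoding a Reader's characters.
final class ReaderInputStream extends InputStream {
    private final Reader reader;
    private final CharsetEncoder encoder;
    private final CharBuffer chars = CharBuffer.allocate(1024);
    private final ByteBuffer bytes = ByteBuffer.allocate(8192);
    private boolean endOfInput = false;
    private boolean flushed = false;

    ReaderInputStream(Reader reader, Charset charset) {
        this.reader = reader;
        this.encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        chars.flip(); // nothing to encode yet
        bytes.flip(); // nothing to serve yet
    }

    @Override
    public int read() throws IOException {
        while (!bytes.hasRemaining()) {
            if (flushed) {
                return -1;
            }
            fill();
        }
        return bytes.get() & 0xFF;
    }

    // Refill the byte buffer by reading and encoding more characters.
    private void fill() throws IOException {
        if (!endOfInput) {
            chars.compact(); // keep leftovers, e.g. half of a surrogate pair
            if (reader.read(chars) == -1) {
                endOfInput = true;
            }
            chars.flip();
        }
        bytes.clear();
        CoderResult result = encoder.encode(chars, bytes, endOfInput);
        if (endOfInput && result.isUnderflow()) {
            // All characters are consumed; let the encoder emit any trailing bytes.
            flushed = encoder.flush(bytes).isUnderflow();
        }
        bytes.flip();
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new ReaderInputStream(new StringReader("h\u00e9llo"), StandardCharsets.UTF_8)) {
            int b;
            while ((b = in.read()) != -1) {
                System.out.print(Integer.toHexString(b) + " ");
            }
            System.out.println();
        }
    }
}
```

Note that going from a Reader to an InputStream needs an encoder, and that the intermediate buffers are what absorb characters that expand to several bytes.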

Mike Schilling

Aug 30, 2006, 8:41:44 AM

"Chris Uppal" <chris...@metagnostic.REMOVE-THIS.org> wrote in message
news:44f550df$0$637$bed6...@news.gradwell.net...


Not quite so much luck.

import java.io.*;

public class BadAscii
{
    public static void main(String[] args) throws Exception
    {
        byte arr[] = { (byte)0x40, (byte)0x80};

        ByteArrayInputStream bais = new ByteArrayInputStream(arr);
        InputStreamReader isr = new InputStreamReader(bais, "US-ASCII");
        while (true)
        {
            int r = isr.read();
            if (r < 0)
                break;
            System.out.println(
                (char)r + "(" + Integer.toHexString(r) + ")");
        }
    }
}

results in

% java -cp . BadAscii
@(40)
?(fffd)

So, no exception, but the FFFD is a clear indication that there's been a
decoding error.


Mike Schilling

Aug 30, 2006, 12:48:24 PM

"Chris Uppal" <chris...@metagnostic.REMOVE-THIS.org> wrote in message
news:44f578f9$0$641$bed6...@news.gradwell.net...

>
> Certainly there is. You can create an OutputStream decorator which wraps
> a
> Writer (or an InputStream which wraps a Reader) in just the same way as an
> OutputStreamWriter wraps an OutputStream. All you need is a
> CharacterDecoder
> (or Encoder).
>
> The java.io package lacks such a beast, but it's trivial enough to create
> your
> own if you have a need for it (and -- as I said before -- if you can think
> of
> an acceptable name for it).

WriterOutputStream and ReaderInputStream will do.


Chris Uppal

Aug 31, 2006, 5:22:13 AM
Mike Schilling wrote:

> WriterOutputStream and ReaderInputStream will do.

Ugh!

;-)

-- chris


Chris Uppal

Aug 31, 2006, 5:48:08 AM
Mike Schilling wrote:

> So, no exception, but the FFFD is a clear indication that there's been a
> decoding error.

Thanks for checking. A pity that it doesn't throw an exception. "Suppressing"
the error is probably the best behaviour for most purposes, but it would be
nice (now I come to think of it) if we could tell a decoding/encoding stream
that we want it to be strict (just as we can tell an actual CharsetDecoder how
it should treat "wrong" inputs).

-- chris

Piotr Kobzda

Aug 31, 2006, 8:38:17 AM
Chris Uppal wrote:

> A pity that it doesn't throw an exception. "Suppressing"
> the error is probably the best behaviour for most purposes, but it would be
> nice (now I come to think of it) if we could tell a decoding/encoding stream
> that we want it to be strict (just as we can tell an actual CharserDecoder how
> it should treat "wrong" inputs).

You can easily achieve that setting "strictness" on CharsetDecoder
produced by your desired Charset.

Strictness you can express this way:

CharsetDecoder dec = Charset.forName("US-ASCII").newDecoder();
dec.onMalformedInput(CodingErrorAction.REPORT);
dec.onUnmappableCharacter(CodingErrorAction.REPORT);

Applying it to Mike's example as:

InputStreamReader isr = new InputStreamReader(bais, dec);

should give you the expected result.


piotr

Mike Schilling

Aug 31, 2006, 11:45:59 AM

"Piotr Kobzda" <pi...@gazeta.pl> wrote in message
news:ed6l7p$lfh$1...@inews.gazeta.pl...

Thank you; I've changed my example to use it, resulting in

import java.io.*;
import java.nio.charset.*;

public class BadAscii
{
    public static void main(String[] args) throws Exception
    {
        byte arr[] = { (byte)0x40, (byte)0x80};

        CharsetDecoder dec = Charset.forName("US-ASCII").newDecoder();
        dec.onMalformedInput(CodingErrorAction.REPORT);
        dec.onUnmappableCharacter(CodingErrorAction.REPORT);

        ByteArrayInputStream bais = new ByteArrayInputStream(arr);
        InputStreamReader isr = new InputStreamReader(bais, dec);
        while (true)
        {
            int r = isr.read();
            if (r < 0)
                break;
            System.out.println(
                (char)r + "(" + Integer.toHexString(r) + ")");
        }
    }
}

and it now produces

Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
        at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:463)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:182)
        at sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:131)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:117)
        at java.io.InputStreamReader.read(InputStreamReader.java:151)
        at BadAscii.main(BadAscii.java:18)

By the way, you can observe that the decoder tries to process the entire
array at once, since the first character, which is legitimate ASCII, is
never returned.

Mike Schilling

Aug 31, 2006, 11:47:01 AM

"Chris Uppal" <chris...@metagnostic.REMOVE-THIS.org> wrote in message
news:44f6b0ce$0$639$bed6...@news.gradwell.net...

> Mike Schilling wrote:
>
>> WriterOutputStream and ReaderInputStream will do.
>
> Ugh!
>
> ;-)

Those are worse than InputStreamReader and OutputStreamWriter because ... ?
:-)


Dale King

Sep 1, 2006, 12:35:10 AM
M.J. Dance wrote:
> Patricia Shanahan wrote:
>
>> What is the proper, undeprecated, replacement code for:
>>
>> InputStream in = new StringBufferInputStream(someString);
>
> There is no proper replacement. The line of code above is mixing two
> superficially similar but inherently different things: bytes and chars.
> Of course the two are related but, in order to fully describe that
> relatinship, one needs additional information: character encoding.
> Having that, one can getBytes() from a String and, using those, create a
> ByteArrayInputStream.


The essential issue here is that Java (rightly so in my opinion)
associates direction with crossing the boundary between characters and
bytes. Going from character to bytes is only supported in the output or
writing direction. The implication being that conversion between the two
is associated with an external entity (a file, a server). The assumption
with Java is that once you have it as a character the program itself
should only deal with it as characters. It should only be converted to
bytes in order to send it outside of Java.

That is a fairly reasonable way to do things in my opinion. In almost all
cases it is correct. There are some cases where you do want to go the
other way, but they are rare. And if they did support it, it would
probably be encouraging abuse by those that try to handle text data as
bytes which is just wrong.

This assumption also simplifies the whole character encoder/decoder
system. Consider the fact that one character can map to multiple bytes,
but the reverse is not true. One byte doesn't map to multiple characters
in any known encoding. If I am reading a byte from a "ReaderInputStream"
and one character from the String can map to multiple bytes you end up
having to have some form of buffer in between because the byte you read
may only be one of several for that first character. You would also end
up buffering in the reverse direction for a "WriterOutputStream" class
because the byte you write may not be enough to write the character. But
in the directionality supported by Java there is no need for such a
buffer. The idea of converting from characters to bytes is not as
straightforward as it seems on the surface.

In this case the reason Patricia needs it is a poorly designed class.
Since that class expects to read textual data it should support Readers.
It could in addition support InputStream (although I would mark that
support as deprecated because users should not be using it).

--
Dale King

Mike Schilling

Sep 1, 2006, 2:03:38 AM

"Dale King" <Dale...@gmail.com> wrote in message
news:fuydnQLtL-VEJWrZ...@insightbb.com...

>
> This assumption also simplifies the whole character encoder/decoder
> system. Consider the fact that one character can map to multiple bytes,
> but the reverse is not true. One byte doesn't map to multiple characters
> in any known encoding. If I am reading a byte from a "ReaderInputStream"
> and one character from the String can map to multiple bytes you end up
> having to have some form of buffer in between because the byte you read
> may only be one of several for that first character. You would also end up
> buffering in the reverse direction for a "WriterOutputStream" class
> because the byte you write may not be enough to write the character. But
> in the directionality supported by Java there is no need for such a
> buffer. The idea of converting from characters to bytes is not as
> straightforward as it seems on the surface.

In UTF-8, a group of 4 bytes can map to two characters (when the code point
is > FFFF, and is represented in Java by a pair of 16-bit characters.) At
any rate, decoders don't decode a character at a time, as you'll note if you
check the Javadoc for CharsetDecoder; they decode an array or stream of
bytes into the appropriate characters, and there's always (at least
potentially) a buffer involved.
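
The UTF-8 case Mike mentions can be checked with a short sketch: four bytes that encode one code point above FFFF decode into two Java chars (a surrogate pair). The class `SupplementaryDemo` and the chosen code point, U+1D11E, are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;

final class SupplementaryDemo {
    // Decode a UTF-8 byte sequence into a Java String.
    static String decodeUtf8(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF, encoded in UTF-8 as four bytes.
        byte[] utf8 = { (byte) 0xF0, (byte) 0x9D, (byte) 0x84, (byte) 0x9E };
        String s = decodeUtf8(utf8);

        System.out.println(s.length());                             // 2 chars
        System.out.println(s.codePointCount(0, s.length()));        // 1 code point
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
    }
}
```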


M.J. Dance

Sep 1, 2006, 4:15:22 AM
Dale King wrote:
> M.J. Dance wrote:
>> Patricia Shanahan wrote:
>>
>>> What is the proper, undeprecated, replacement code for:
>>>
>>> InputStream in = new StringBufferInputStream(someString);
>>
>> There is no proper replacement. The line of code above is mixing two
>> superficially similar but inherently different things: bytes and
>> chars. Of course the two are related but, in order to fully describe
>> that relatinship, one needs additional information: character
>> encoding. Having that, one can getBytes() from a String and, using
>> those, create a ByteArrayInputStream.
>
>
> The essential issue here is that Java (rightly so in my opinion)
> associates direction with crossing the boundary between characters and
> bytes. Going from character to bytes is only supported in the output or
> writing direction. The implication being that conversion between the two
> is associated with an external entity (a file, a server). The assumption
> with Java is that once you have it as a character the program itself
> should only deal with it as characters. It should only be converted to
> bytes in order to send it outside of Java.
>
> That is a fairly resonable way to do things in my opinion. In almost all
> cases it is correct. There are some cases where you do want to go the
> other way, but they are rare. And if they did support it, it would
> probably be encouraging abuse by those that try to handle text data as
> bytes which is just wrong.

Well. There are cases where you can't do without bytes. Cryptography, digests,
signing etc. all operate on bytes. And people do want to encrypt, digest and/or
sign text (i.e. a string of characters) from time to time.
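
As an illustration of the digest case, here is a minimal sketch that hashes a String by first choosing an encoding. UTF-8 and SHA-256 are assumptions picked for the example, and `TextDigest` is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class TextDigest {
    // Digests operate on bytes, so the text must be encoded first.
    static byte[] sha256(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return md.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : sha256("abc")) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        System.out.println(hex);
    }
}
```

Two parties that encode the same text differently will compute different digests, which is why the encoding has to be part of the agreement.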

> This assumption also simplifies the whole character encoder/decoder
> system. Consider the fact that one character can map to multiple bytes,
> but the reverse is not true. One byte doesn't map to multiple characters
> in any known encoding. If I am reading a byte from a "ReaderInputStream"
> and one character from the String can map to multiple bytes you end up
> having to have some form of buffer in between because the byte you read
> may only be one of several for that first character. You would also end
> up buffering in the reverse direction for a "WriterOutputStream" class
> because the byte you write may not be enough to write the character. But
> in the directionality supported by Java there is no need for such a
> buffer. The idea of converting from characters to bytes is not as
> straightforward as it seems on the surface.
>
> In this case the reason Patricia needs it is a poorly designed class.
> Since that class expects to read textual data it should support Readers.
> It could in addition support InputStream (although I would mark that
> support as deprecated because users should not be using it).

Even if not all the readers/writers are wrapped around a stream, there could be
a public getInputStream(...) or getOutputStream(...). It would just throw an
OperationNotSupportedException or something.

Chris Uppal

Sep 1, 2006, 3:33:25 AM
Mike Schilling wrote:

[me:]


> > > WriterOutputStream and ReaderInputStream will do.
> >
> > Ugh!
> >
> > ;-)
>
> Those are worse than InputStreamReader and OutputStreamWriter because ...
> ? :-)

Well, since you ask...

They are confusingly similar to pre-existing classes.

They violate rules of English phrase construction, and class name construction
based on that.

(Incidentally, part of the problem is the non-symmetry in the existing names.
"Reader" and "InputStream" do not follow the same grammatical pattern.
While -- as it chances -- you can qualify "Reader" with "InputStream", the
reverse doesn't sit happily).

-- chris


Chris Uppal

Sep 1, 2006, 3:34:24 AM
Piotr Kobzda wrote:

[me:]


> > A pity that it doesn't throw an exception. "Suppressing"
> > the error is probably the best behaviour for most purposes, but it
> > would be nice (now I come to think of it) if we could tell a
> > decoding/encoding stream that we want it to be strict (just as we can
> > tell an actual CharsetDecoder how it should treat "wrong" inputs).
>
> You can easily achieve that by setting "strictness" on the CharsetDecoder
> produced by your desired Charset.

[...]


> InputStreamReader isr = new InputStreamReader(bais, dec);

Ah! I hadn't realised you could do that. Thanks for the suggestion.
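
For reference, a sketch of how that suggestion might look in full. The class name StrictDecoding and the method decodeStrict are assumptions for illustration; the elided part of the quoted post is not reproduced here. The decoder is configured with CodingErrorAction.REPORT, so bad input raises a CharacterCodingException instead of being silently replaced.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class StrictDecoding {

    /** Decodes data with cs, failing on malformed or unmappable input. */
    static String decodeStrict(byte[] data, Charset cs) throws IOException {
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteArrayInputStream bais = new ByteArrayInputStream(data);
        try (Reader isr = new InputStreamReader(bais, dec)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = isr.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decodeStrict(new byte[] {0x68, 0x69}, StandardCharsets.UTF_8));
        try {
            // 0xC3 must be followed by a continuation byte; 0x28 is not one.
            decodeStrict(new byte[] {(byte) 0xC3, 0x28}, StandardCharsets.UTF_8);
        } catch (IOException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```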

-- chris


Thomas Hawtin

Sep 1, 2006, 9:22:54 AM
Chris Uppal wrote:
>
> (Incidentally, part of the problem is the non-symmetry in the existing names.
> "Reader" and "InputStream" do not follow the same grammatical pattern.
> While -- as it chances -- you can qualify "Reader" with "InputStream", the
> reverse doesn't sit happily).

There isn't really a particularly good reason to expose the class.
Instead of introducing a new public class, a method would do:

public static InputStream asInputStream(Reader reader, Charset cs)

As Reader is a class, you could even add it as a member.
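
A rough sketch of what such a method could look like, not a definitive implementation. Only the method signature comes from the line above; the holder class ReaderStreams, the buffer sizes, and the choice to replace unencodable characters rather than fail are assumptions. It also shows the intermediate buffering discussed earlier in the thread: one character may encode to several bytes, so encoded bytes have to be held until the caller reads them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical holder class, for illustration only.
final class ReaderStreams {

    public static InputStream asInputStream(final Reader reader, Charset cs) {
        final CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new InputStream() {
            private final CharBuffer chars = CharBuffer.allocate(1024);
            private final ByteBuffer bytes = ByteBuffer.allocate(4096);
            private boolean eof;   // the Reader is exhausted
            private boolean done;  // the encoder has been fully flushed

            { chars.flip(); bytes.flip(); } // both buffers start empty, ready for get()

            @Override
            public int read() throws IOException {
                while (!bytes.hasRemaining()) {
                    if (done) return -1;
                    fill();
                }
                return bytes.get() & 0xff;
            }

            // Pulls more characters from the Reader and encodes them into bytes.
            private void fill() throws IOException {
                bytes.clear();
                if (!eof) {
                    chars.compact(); // keep leftovers, e.g. half of a surrogate pair
                    if (reader.read(chars) == -1) eof = true;
                    chars.flip();
                }
                CoderResult r = enc.encode(chars, bytes, eof);
                if (eof && r.isUnderflow() && enc.flush(bytes).isUnderflow()) {
                    done = true;
                }
                bytes.flip();
            }

            @Override
            public void close() throws IOException {
                reader.close();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        InputStream in = asInputStream(new StringReader("h\u00e9llo"), StandardCharsets.UTF_8);
        for (int b; (b = in.read()) != -1; ) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```

For a String that is already in memory, someString.getBytes(encoding) wrapped in a ByteArrayInputStream remains the simpler route.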

Tom Hawtin
--
Unemployed English Java programmer
http://jroller.com/page/tackline/

Patricia Shanahan

Sep 1, 2006, 9:55:19 AM
Thomas Hawtin wrote:
> Chris Uppal wrote:
>>
>> (Incidentally, part of the problem is the non-symmetry in the existing
>> names.
>> "Reader" and "InputStream" do not follow the same grammatical pattern.
>> While -- as it chances -- you can qualify "Reader" with
>> "InputStream", the
>> reverse doesn't sit happily).
>
> There isn't really a particularly good reason to expose the class.
> Instead of introducing a new public class, a method would do:
>
> public static InputStream asInputStream(Reader reader, Charset cs)
>
> As Reader is a class, you could even add it as a member.

Although it certainly COULD be done that way, it would be inconsistent
with the general design of java.io.

It might have been better if it were done that way from the start, with
a visible class if, and only if, the new class adds some functionality,
such as a readLine() method.

For example, FileInputStream could have been replaced by a series of get
methods equivalent to its constructors:

public static InputStream fileAsInputStream(File f)

etc.

However, I don't think it is worth breaking the pattern now. To me, an
InputStreamReader sounds like a Reader that reads an InputStream, so I
like the name.

Patricia

Mike Schilling

Sep 1, 2006, 10:46:29 AM

"Chris Uppal" <chris...@metagnostic.REMOVE-THIS.org> wrote in message
news:44f7f026$0$638$bed6...@news.gradwell.net...

A FileInputStream is an input stream that gets bytes from a file, and a
ReaderInputStream would be an input stream that gets bytes from a Reader.


Dale King

Sep 5, 2006, 9:49:59 AM

And I didn't say that people could do without bytes. Of course, they
want to do those things, the question is directionality (read vs.
write). It makes a lot of sense to encrypt a string to write it to some
external entity. It's hard to come up with a compelling case where you
need to read from a String as encrypted bytes.

>> In this case the reason Patricia needs it is a poorly designed class.
>> Since that class expects to read textual data it should support
>> Readers. It could in addition support InputStream (although I would
>> mark that support as deprecated because users should not be using it).
>
> Even if not all the readers/writers are wrapped around a stream, there
> could be a public getInputStream(...) or getOutputStream(...). It would
> just throw an OperationNotSupportedException or something.

Don't follow you here.

--
Dale King

Patricia Shanahan

Sep 5, 2006, 10:08:26 AM
Dale King wrote:
>...

> In this case the reason Patricia needs it is a poorly designed class.
> Since that class expects to read textual data it should support Readers.
> It could in addition support InputStream (although I would mark that
> support as deprecated because users should not be using it).
>

I'm not so sure that classes such as Process should supply Reader/Writer
interfaces themselves. Process is providing access to byte stream data.
If it does it through Reader and Writer it would have to be told the
encoding (even if it also had an option for going with a default encoding).

Some applications might want direct, unconverted, byte stream access to
exactly what the job said. That might be important, for example, for debug.

Given the general layering of I/O interfaces, I think it may be better
to just provide the Stream interfaces, and expect the caller to build
any Reader or Writer, passing the encoding information to the constructor.

ReaderInputStream and WriterOutputStream would have solved my problem
completely.

Patricia

Dale King

Sep 5, 2006, 11:46:44 AM
Patricia Shanahan wrote:
> Dale King wrote:
>> ...
>> In this case the reason Patricia needs it is a poorly designed class.
>> Since that class expects to read textual data it should support
>> Readers. It could in addition support InputStream (although I would
>> mark that support as deprecated because users should not be using it).
>>
>
> I'm not so sure that classes such as Process should supply Reader/Writer
> interfaces themselves. Process is providing access to byte stream data.
> If it does it through Reader and Writer it would have to be told the
> encoding (even if it also had an option for going with a default encoding).
>
> Some applications might want direct, unconverted, byte stream access to
> exactly what the job said. That might be important, for example, for debug.

I wasn't suggesting that Process should. If I understood your
explanation you are trying to test a class that processes the output
streams of a Process. So I am imagining a class that has a constructor
that takes two InputStream objects, one for the output stream and one
for the error stream from the Process.

Since this class is really processing text information it should really
take two Reader objects. This makes the class independent of where the
actual data comes from (in your case from a String). If you wanted to
hook it up to the output of a Process you simply wrap the streams from
the Process in an InputStreamReader with the appropriate encoding.
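
The design described above might be sketched roughly as follows. The class ProcessOutputParser and its methods are hypothetical stand-ins, since the actual class under test is not shown in the thread. The consumer depends only on Reader, a test feeds it StringReaders, and the production path wraps the Process streams in InputStreamReaders with a caller-supplied encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumer of a process's text output, for illustration only.
final class ProcessOutputParser {
    private final Reader out;
    private final Reader err;

    ProcessOutputParser(Reader out, Reader err) {
        this.out = out;
        this.err = err;
    }

    // Production wiring: the caller decides the encoding when wrapping the byte streams.
    static ProcessOutputParser forProcess(Process p, Charset cs) {
        return new ProcessOutputParser(
                new InputStreamReader(p.getInputStream(), cs),
                new InputStreamReader(p.getErrorStream(), cs));
    }

    List<String> outputLines() throws IOException {
        return lines(out);
    }

    List<String> errorLines() throws IOException {
        return lines(err);
    }

    // Reads the Reader to its end. With a live process, both streams should be
    // drained concurrently so the child cannot block on a full pipe.
    private static List<String> lines(Reader r) throws IOException {
        List<String> result = new ArrayList<>();
        BufferedReader br = new BufferedReader(r);
        for (String line; (line = br.readLine()) != null; ) {
            result.add(line);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Test wiring: no process and no byte streams, only in-memory text.
        ProcessOutputParser parser = new ProcessOutputParser(
                new StringReader("line 1\nline 2\n"),
                new StringReader("warning\n"));
        System.out.println(parser.outputLines());
        System.out.println(parser.errorLines());
    }
}
```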

This makes the class more general-purpose. Imagine for example if
instead of the data coming from another process on this machine you
instead sent off a SOAP message to another server and got a text
response back.

> Given the general layering of I/O interfaces, I think it may be better
> to just provide the Stream interfaces, and expect the caller to build
> any Reader or Writer, passing the encoding information to the constructor.

If I understand what you meant, I think that is what I am suggesting.

> ReaderInputStream and WriterOutputStream would have solved my problem
> completely.

But unfortunately would invite much abuse by those who do not understand
the differences between text and raw bytes. I could see many newbies
using a ReaderInputStream when in reality they should be converting to
Reader.

--
Dale King
