StringBufferInputStream is deprecated, and the documentation points to
StringReader.
However, after looking through java.io several times, I have not found
how to construct an InputStream from a Reader.
What is the proper, undeprecated, replacement code for:
InputStream in = new StringBufferInputStream(someString);
Thanks,
Patricia
InputStream in = new ByteArrayInputStream(someString.getBytes(encoding));
must be a candidate.
Arne
Thanks.
That works, and gets rid of the warnings. But why does the
StringBufferInputStream documentation say "As of JDK 1.1, the preferred
way to create a stream from a string is via the StringReader class." if
StringReader cannot do StringBufferInputReader's job?
Patricia
I don't know.
A guess could be that a typical usage of StringBufferInputStream
was to wrap it in an InputStreamReader.
But this is pure guessing.
Arne
From Java's viewpoint, you're looking at it wrong. In the normal case,
since 1.1, Java doesn't want you to use an InputStream for anything, but
a Reader (unless you need a DataInputStream).
Reader in = new StringReader(someString);
--
John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
-- Charles Williams. "Taliessin through Logres: Prelude"
I'm writing a unit test for a class that reads from InputStream. The
string contains the test data I want it to operate on.
The class I'm testing is intended to deal with the stdout and stderr
streams from a java.lang.Process job, and java.lang.Process does not
seem to have heard of Reader.
Patricia
> From Java's viewpoint, you're looking at it wrong. In the normal case,
> since 1.1, Java doesn't want you to use an InputStream for anything, but
> a Reader (unless you need a DataInputStream).
Maybe that came out wrong, but taken literally it is completely false.
InputStreams are central to the IO architecture.
-- chris
> But why does the
> StringBufferInputStream documentation say "As of JDK 1.1, the preferred
> way to create a stream from a string is via the StringReader class." if
> StringReader cannot do StringBufferInputReader's job?
My guess is that they intended to create an InputStream-flavoured wrapper for
Readers, but never got around to it. Probably they argued for a few days about
what to call it, but could neither stomach ReaderInputStream nor agree on
anything else, and just quietly dropped it.
"Nobody's going to need it anyway"
;-)
-- chris
> That works, and gets rid of the warnings. But why does the
> StringBufferInputStream documentation say "As of JDK 1.1, the preferred
> way to create a stream from a string is via the StringReader class." if
> StringReader cannot do StringBufferInputReader's job?
Um, as far as I can see there is no StringBufferInputReader. The
replacement for StringBufferInputStream is InputStreamReader - the major
difference is that the latter properly deals with encodings. In your
case the constructor without an explicit encoding is probably most appropriate, because the
process will likely emit its output using the default encoding of the
platform. If not, you need a way to make sure the process and your
InputStreamReader use the same encoding.
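Something like this, for example (the command string and the "US-ASCII"
charset name below are only placeholders for whatever actually applies):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.Reader;

public class WrapProcessOutput {
    public static void main(String[] args) throws Exception {
        // "some-command" is a placeholder; substitute whatever really gets launched.
        Process p = Runtime.getRuntime().exec("some-command");

        // Constructor without a charset: platform default encoding.
        Reader stdout = new InputStreamReader(p.getInputStream());

        // Constructor with a charset: use this if you know what the process emits.
        Reader stderr = new InputStreamReader(p.getErrorStream(), "US-ASCII");

        BufferedReader br = new BufferedReader(stdout);
        for (String line; (line = br.readLine()) != null; ) {
            System.out.println(line);
        }
        stderr.close();
    }
}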
HTH
Kind regards
robert
> I'm writing a unit test for a class that reads from InputStream. The
> string contains the test data I want it to operate on.
Wouldn't it make more sense to hold the test data in a byte[] array? That,
after all, is what your real data will look like -- binary, coming from an
external application using who-knows-what encoding.
-- chris
> What is the proper, undeprecated, replacement code for:
>
> InputStream in = new StringBufferInputStream(someString);
There is no proper replacement. The line of code above is mixing two
superficially similar but inherently different things: bytes and chars. Of
course the two are related but, in order to fully describe that relationship, one
needs additional information: character encoding. Having that, one can
getBytes() from a String and, using those, create a ByteArrayInputStream.
There are some inconsistencies in the Java API as far as such situations are
concerned. Sun decided to deprecate the whole StringBufferInputStream class (instead of
deprecating only a constructor and adding a new one specifying encoding). Yet
they chose a different approach with String for example. One can construct a new
String from an array of bytes despite the fact that by doing that, one relies on
default/platform character encoding. Assuming that this encoding is the right
one, one can get desired results. But we all know that assumption is the mother
of all FUs, don't we?
So, whatever you do, don't... ;-)
My real data is text. I find it much faster to write text as Strings.
The solution, as Arne pointed out, is to first get a byte array from the
String. If internationalization were an issue, I would specify an
encoding on that conversion.
In the real application, the data will be coming from Matlab. The
application only needs to work on a few hundred systems, three of which
I control and the remaining systems are all part of a UCSD grid
computer, so I do know what encoding to expect.
Patricia
StringBufferInputStream:
public synchronized int read() {
    return (pos < count) ? (buffer.charAt(pos++) & 0xFF) : -1;
}
StringReader:
public int read() throws IOException {
synchronized (lock) {
ensureOpen();
if (next >= length)
return -1;
return str.charAt(next++);
}
}
As we see, StringBufferInputStream's read method returns only the low byte
of each char as an int. That is why it is deprecated.
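For instance (using the deprecated class deliberately, just to show the
truncation; the Greek letter is an arbitrary example):

import java.io.InputStream;
import java.io.StringBufferInputStream;

public class LostHighByte {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws Exception {
        // Greek small alpha is U+03B1; the stream keeps only the low byte.
        InputStream in = new StringBufferInputStream("\u03B1");
        System.out.println(Integer.toHexString(in.read())); // prints "b1", not "3b1"
    }
}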
Best Vahan
I don't have a problem with StringBufferInputStream being deprecated,
and understand perfectly well why.
The question is what replaces it, when the recommendation in the
documentation, StringReader, does not seem to do the whole job, because
there does not seem to be any way to get from there to an InputStream.
Patricia
> The solution, as Arne pointed out, is to first get a byte array from the
> String. If internationalization were an issue, I would specify an
> encoding on that conversion.
I'd still specify the encoding explicitly -- you may as well be clear on what
the test is actually testing.
BTW, I don't believe that character encodings should be thought of as just an
internationalisation issue. (And even if they were, you -- being a
foreigner -- shouldn't ignore those issues ;-)
Matlab's a powerful program; surely it can be told to produce its output in
UTF-8 or UTF-16...
-- chris
Matlab is extremely powerful in specific directions. The documentation
for the student version I have has no hits for "UTF", "international",
and "unicode", but pages of hits for "eigen". The only hit for
"encoding" is in the Signal Processing Toolbox and "quantizes the
entries in a multidimensional array of floating-point numbers u and
encodes them as integers". "ASCII" is used in the documentation as being
the alternative to "binary".
The downloaded code that I'm using is all matrix-to-matrix internal
operations, not I/O.
I suspect the Matlab stdout and stderr are going to be predominantly in
the ASCII character set. Any non-ASCII text would have to be in the
default encoding for the platform.
Patricia
> The question is what replaces it, when the recommendation in the
> documentation, StringReader, does not seem to do the whole job, because
> there does not seem to be any way to get from there to an InputStream.
My guess is that a String should not be convertible to an InputStream
of bytes directly, because there are so many ways a String can be made
into bytes with none of them being an obvious default. I.e., asking to
go directly from a String to an InputStream was asking for something
that was better done in two separable steps: String to bytes, bytes to
stream.
Strings contain characters, so the most fitting sequential input to
convert it to would be a Reader.
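As a sketch of those two steps (UTF-8 below is only a stand-in for whichever
encoding actually applies):

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;

public class StringToStream {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String someString = "test data";
        // Step 1: String -> bytes, naming the encoding explicitly.
        byte[] bytes = someString.getBytes("UTF-8");
        // Step 2: bytes -> stream.
        InputStream in = new ByteArrayInputStream(bytes);
        System.out.println("stream holds " + bytes.length + " bytes; first is " + in.read());
    }
}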
/L
--
Lasse Reichstein Nielsen - l...@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
Which may or may not be the default encoding for any particular Java
installation.
I'm with Chris on this. If nothing else, specifying "US-ASCII" or
"ISO-8859-1" (or whatever) is awfully cheap documentation about what sort of
input your code expects, and will make debugging any future miscues far
easier for whoever's maintaining it later.
> If nothing else, specifying "US-ASCII" or "ISO-8859-1" (or whatever) is
> awfully cheap documentation about what sort of input your code expects,
> and will make debugging any future miscues far easier for whoever's
> maintaining it later.
Alternatively, it might be more accurate to put a TODO with something
like:
/*TODO: I don't know what encoding MatLab actually uses. If you find out,
set it here.*/
Setting an explicit encoding (to me) implies that that's the actual
encoding you want to use, as opposed to you having just chosen an encoding
randomly because you didn't know which one was appropriate.
- Oliver
Yes, I did mean setting it *after* finding out the correct value :-)
Yes, but as far as I can tell Reader is a total dead end when the
objective is InputStream.
I got stuck, and had to ask for help, precisely because I was thinking
Reader, when I should have been taking a detour through byte arrays.
Patricia
Only for reading and writing raw bytes. There is, so to speak, an
impedance mismatch between strings and streams, which is why, since 1.1,
strings are supposed to be processed by Reader and Writer classes.
That's why StringBufferInputStream is obsolete, producing the warning
messages that were the cause of this whole thread in the first place.
Your first post said "Java doesn't want you to use an InputStream
for anything" without the "Only for reading and writing raw bytes".
Arne
That seems to be the problem with this input stream <-> reader (and output
stream <-> writer, for that matter) dichotomy. Every(?) reader has an underlying
input stream which, I imagine, wouldn't be a problem to obtain. But that would
mean inviting problems: reading from both the stream and the reader simultaneously
could cause "unpredictable" behaviour. A few <khm/> years ago I was trying to
make a JSP serve binary content. No problem, I thought: one can obtain an output
stream from response (the implicit object, instanceof (Http)ServletResponse,
available in every JSP) and send the data through there. But it (the servlet
engine, that is) said that it had already called .getWriter() and that
.getOutputStream() cannot be called after that. Maybe things have changed since
then, but it's a good illustration. I think.
?!? Only !?!
That's hardly a trivial, obscure, or unimportant application!
(In fact, judging from what I read in this ng, many applications, or would-be
applications, of Readers or Writers are technically wrong in that the posters
are trying to treat binary data as if it were "really" text.)
-- chris
> Setting an explicit encoding (to me) implies that that's the actual
> encoding you want to use, as opposed to you having just chosen an encoding
> randomly because you didn't know which one was appropriate.
Fair point.
With luck (i.e. I haven't bothered to check) the US-ASCII decoder will signal
an error if it is fed bytes outside the [0, 127] range. If so then setting
that would be one way to be explicit about the assumption (almost certainly
correct) that I think Patricia's making.
-- chris
> That seems to be the problem with this input stream <-> reader (and
> output stream <-> writer, for that matter) dichotomy. Every(?) reader
> has an underlying input stream which, I imagine, wouldn't be a problem
> to obtain.
Except, e.g., StringReader.
If you are communicating between processes, or even computers, then at
some point you'll represent your data as the lowest common
denominator: bytes. But working inside a single program, you can start
out with characters and keep it that way.
There is no way to meaningfully convert a generic Writer to an
OutputStream, and it's an implementation detail whether there is an
underlying OutputStream for a given Writer, so the Writer interface
can't meaningfully expose a method for giving out an OutputStream.
On the opposite end, you shouldn't blindly convert an InputStream
to a Reader without knowing that the bytes do represent characters
in the encoding you have chosen.
> There is no way to meaningfully convert a generic Writer to an
> OutputStream, and it's an implementation detail whether there is an
> underlying OutputStream for a given Writer, so the Writer interface
> can't meaningfully expose a method for giving out an OutputStream.
Certainly there is. You can create an OutputStream decorator which wraps a
Writer (or an InputStream which wraps a Reader) in just the same way as an
OutputStreamWriter wraps an OutputStream. All you need is a CharsetDecoder
(or CharsetEncoder).
The java.io package lacks such a beast, but it's trivial enough to create your
own if you have a need for it (and -- as I said before -- if you can think of
an acceptable name for it).
I admit there's not an /awful/ lot of use for it though...
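For what it's worth, a minimal sketch of the Reader-wrapping direction (nothing
here is standard API; the buffer sizes and the REPLACE error action are arbitrary
choices, and it skips corner cases a real implementation would care about):

import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Sketch only: an InputStream that pulls chars from a Reader and encodes them.
public class ReaderInputStream extends InputStream {
    private final Reader reader;
    private final CharsetEncoder encoder;
    private final CharBuffer chars = CharBuffer.allocate(1024);
    private final ByteBuffer bytes;
    private boolean eof = false;

    public ReaderInputStream(Reader reader, Charset charset) {
        this.reader = reader;
        this.encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // Big enough that one full char buffer always fits after encoding.
        this.bytes = ByteBuffer.allocate(
                (int) Math.ceil(encoder.maxBytesPerChar() * chars.capacity()));
        chars.flip();   // start out with nothing waiting to be encoded
        bytes.flip();   // start out with nothing waiting to be read
    }

    public int read() throws IOException {
        while (!bytes.hasRemaining()) {
            if (eof && !chars.hasRemaining()) {
                return -1;
            }
            if (!eof) {
                chars.compact();                   // keep any not-yet-encoded chars
                if (reader.read(chars) < 0) {
                    eof = true;
                }
                chars.flip();
            }
            bytes.clear();
            encoder.encode(chars, bytes, eof);     // chars -> bytes
            if (eof) {
                encoder.flush(bytes);
            }
            bytes.flip();
        }
        return bytes.get() & 0xFF;
    }

    public void close() throws IOException {
        reader.close();
    }

    // Tiny demonstration.
    public static void main(String[] args) throws IOException {
        InputStream in = new ReaderInputStream(
                new StringReader("hello"), Charset.forName("US-ASCII"));
        for (int b; (b = in.read()) != -1; ) {
            System.out.print((char) b);
        }
        System.out.println();
    }
}

The symmetric WriterOutputStream would buffer in the other direction, using a
CharsetDecoder.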
-- chris
Not quite so much luck.
import java.io.*;
public class BadAscii
{
public static void main(String[] args) throws Exception
{
byte arr[] = { (byte)0x40, (byte)0x80};
ByteArrayInputStream bais = new ByteArrayInputStream(arr);
InputStreamReader isr = new InputStreamReader(bais, "US-ASCII");
while (true)
{
int r = isr.read();
if (r < 0)
break;
System.out.println(
(char)r + "(" + Integer.toHexString(r) + ")");
}
}
}
results in
% java -cp . BadAscii
@(40)
?(fffd)
So, no exception, but the FFFD is a clear indication that there's been a
decoding error.
WriterOutputStream and ReaderInputStream will do.
> WriterOutputStream and ReaderInputStream will do.
Ugh!
;-)
-- chris
> So, no exception, but the FFFD is a clear indication that there's been a
> decoding error.
Thanks for checking. A pity that it doesn't throw an exception. "Suppressing"
the error is probably the best behaviour for most purposes, but it would be
nice (now I come to think of it) if we could tell a decoding/encoding stream
that we want it to be strict (just as we can tell an actual CharsetDecoder how
it should treat "wrong" inputs).
-- chris
> A pity that it doesn't throw an exception. "Suppressing"
> the error is probably the best behaviour for most purposes, but it would be
> nice (now I come to think of it) if we could tell a decoding/encoding stream
> that we want it to be strict (just as we can tell an actual CharsetDecoder how
> it should treat "wrong" inputs).
You can easily achieve that by setting "strictness" on the CharsetDecoder
produced by your desired Charset.
You can express the strictness this way:
CharsetDecoder dec = Charset.forName("US-ASCII").newDecoder();
dec.onMalformedInput(CodingErrorAction.REPORT);
dec.onUnmappableCharacter(CodingErrorAction.REPORT);
Applying it to Mike's example as:
InputStreamReader isr = new InputStreamReader(bais, dec);
should give you the expected result.
piotr
Thank you; I've changed my example to use it, resulting in
import java.io.*;
import java.nio.charset.*;
public class BadAscii
{
public static void main(String[] args) throws Exception
{
byte arr[] = { (byte)0x40, (byte)0x80};
CharsetDecoder dec = Charset.forName("US-ASCII").newDecoder();
dec.onMalformedInput(CodingErrorAction.REPORT);
dec.onUnmappableCharacter(CodingErrorAction.REPORT);
ByteArrayInputStream bais = new ByteArrayInputStream(arr);
InputStreamReader isr = new InputStreamReader(bais, dec);
while (true)
{
int r = isr.read();
if (r < 0)
break;
System.out.println(
(char)r + "(" + Integer.toHexString(r) + ")");
}
}
}
and it now produces
Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
        at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:463)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:182)
        at sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:131)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:117)
        at java.io.InputStreamReader.read(InputStreamReader.java:151)
        at BadAscii.main(BadAscii.java:18)
By the way, you can observe that the decoder tries to process the entire
array at once, since the first character, which is legitimate ASCII, is
never returned.
Those are worse than InputStreamReader and OutputStreamWriter because ... ?
:-)
The essential issue here is that Java (rightly so in my opinion)
associates direction with crossing the boundary between characters and
bytes. Going from character to bytes is only supported in the output or
writing direction. The implication being that conversion between the two
is associated with an external entity (a file, a server). The assumption
with Java is that once you have it as a character the program itself
should only deal with it as characters. It should only be converted to
bytes in order to send it outside of Java.
That is a fairly reasonable way to do things in my opinion. In almost all
cases it is correct. There are some cases where you do want to go the
other way, but they are rare. And if they did support it, it would
probably be encouraging abuse by those who try to handle text data as
bytes, which is just wrong.
This assumption also simplifies the whole character encoder/decoder
system. Consider the fact that one character can map to multiple bytes,
but the reverse is not true. One byte doesn't map to multiple characters
in any known encoding. If I am reading a byte from a "ReaderInputStream"
and one character from the String can map to multiple bytes you end up
having to have some form of buffer in between because the byte you read
may only be one of several for that first character. You would also end
up buffering in the reverse direction for a "WriterOutputStream" class
because the byte you write may not be enough to write the character. But
in the directionality supported by Java there is no need for such a
buffer. The idea of converting from characters to bytes is not as
straightforward as it seems on the surface.
In this case the reason Patricia needs it is a poorly designed class.
Since that class expects to read textual data it should support Readers.
It could in addition support InputStream (although I would mark that
support as deprecated because users should not be using it).
--
Dale King
>
> This assumption also simplifies the whole character encoder/decoder
> system. Consider the fact that one character can map to multiple bytes,
> but the reverse is not true. One byte doesn't map to multiple characters
> in any known encoding. If I am reading a byte from a "ReaderInputStream"
> and one character from the String can map to multiple bytes you end up
> having to have some form of buffer in between because the byte you read
> may only be one of several for that first character. You would also end up
> buffering in the reverse direction for a "WriterOutputStream" class
> because the byte you write may not be enough to write the character. But
> in the directionality supported by Java there is no need for such a
> buffer. The idea of converting from characters to bytes is not as
> straightforward as it seems on the surface.
In UTF-8, a group of 4 bytes can map to two characters (when the code point
is above U+FFFF and is represented in Java by a surrogate pair of 16-bit chars). At
any rate, decoders don't decode a character at a time, as you'll note if you
check the Javadoc for CharsetDecoder; they decode an array or stream of
bytes into the appropriate characters, and there's always (at least
potentially) a buffer involved.
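For instance, U+1D11E (the musical G clef) is four bytes in UTF-8 but two Java chars:

import java.io.UnsupportedEncodingException;

public class FourBytesTwoChars {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // The UTF-8 encoding of U+1D11E ...
        byte[] utf8 = { (byte) 0xF0, (byte) 0x9D, (byte) 0x84, (byte) 0x9E };
        String s = new String(utf8, "UTF-8");
        // ... decodes to a surrogate pair.
        System.out.println(s.length());                        // 2
        System.out.println(Integer.toHexString(s.charAt(0)));  // d834
        System.out.println(Integer.toHexString(s.charAt(1)));  // dd1e
    }
}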
Well. There are cases where you can't do without bytes. Cryptography, digests,
signing etc. all operate on bytes. And people do want to encrypt, digest and/or
sign text (i.e. a string of characters) from time to time.
> This assumption also simplifies the whole character encoder/decoder
> system. Consider the fact that one character can map to multiple bytes,
> but the reverse is not true. One byte doesn't map to multiple characters
> in any known encoding. If I am reading a byte from a "ReaderInputStream"
> and one character from the String can map to multiple bytes you end up
> having to have some form of buffer in between because the byte you read
> may only be one of several for that first character. You would also end
> up buffering in the reverse direction for a "WriterOutputStream" class
> because the byte you write may not be enough to write the character. But
> in the directionality supported by Java there is no need for such a
> buffer. The idea of converting from characters to bytes is not as
> straightforward as it seems on the surface.
>
> In this case the reason Patricia needs it is a poorly designed class.
> Since that class expects to read textual data it should support Readers.
> It could in addition support InputStream (although I would mark that
> support as deprecated because users should not be using it).
Even if not all the readers/writers are wrapped around a stream, there could be
a public getInputStream(...) or getOutputStream(...). It would just throw an
UnsupportedOperationException or something.
[me:]
> > > WriterOutputStream and ReaderInputStream will do.
> >
> > Ugh!
> >
> > ;-)
>
> Those are worse than InputStreamReader and OutputStreamWriter because ...
> ? :-)
Well, since you ask...
They are confusingly similar to pre-existing classes.
They violate rules of English phrase construction, and class name construction
based on that.
(Incidentally, part of the problem is the non-symmetry in the existing names.
"Reader" and "InputStream" do not follow the same grammatical pattern.
While -- as it chances -- you can qualify "Reader" with "InputStream", the
reverse doesn't sit happily).
-- chris
[me:]
> > A pity that it doesn't throw an exception. "Suppressing"
> > the error is probably the best behaviour for most purposes, but it
> > would be nice (now I come to think of it) if we could tell a
> > decoding/encoding stream that we want it to be strict (just as we can
> > tell an actual CharserDecoder how it should treat "wrong" inputs).
>
> You can easily achieve that setting "strictness" on CharsetDecoder
> produced by your desired Charset.
[,,,]
> InputStreamReader isr = new InputStreamReader(bais, dec);
Ah! I hadn't realised you could do that. Thanks for the suggestion.
-- chris
There isn't really a particularly good reason to expose the class.
Instead of introducing a new public class, a method would do:
public static InputStream asInputStream(Reader reader, Charset cs)
As Reader is a class rather than an interface, you could even add it as an instance method on Reader itself.
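A naive sketch of that method (it buffers the whole Reader before encoding,
which is fine for small inputs such as test data but not for big streams):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.Charset;

public class Streams {
    // Read everything, then encode with the given charset.
    public static InputStream asInputStream(Reader reader, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        for (int n; (n = reader.read(buf)) != -1; ) {
            sb.append(buf, 0, n);
        }
        return new ByteArrayInputStream(sb.toString().getBytes(cs.name()));
    }
}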
Tom Hawtin
--
Unemployed English Java programmer
http://jroller.com/page/tackline/
Although it certainly COULD be done that way, it would be inconsistent
with the general design of java.io.
It might have been better if it were done that way from the start, with
a visible class if, and only if, the new class adds some functionality,
such as a readLine() method.
For example, FileInputStream could have been replaced by a series of get
methods equivalent to its constructors:
public static InputStream fileAsInputStream(File f)
etc.
However, I don't think it is worth breaking the pattern now. To me, an
InputStreamReader sounds like a Reader that reads an InputStream, so I
like the name.
Patricia
A FileInputStream is an input stream that gets bytes from a file, and a
ReaderInputStream would be an input stream that gets bytes from a Reader.
And I didn't say that people could do without bytes. Of course, they
want to do those things, the question is directionality (read vs.
write). It makes a lot of sense to encrypt a string to write it to some
external entity. It's hard to come up with a compelling case where you
need to read from a String as encrypted bytes.
>> In this case the reason Patricia needs it is a poorly designed class.
>> Since that class expects to read textual data it should support
>> Readers. It could in addition support InputStream (although I would
>> mark that support as deprecated because users should not be using it).
>
> Even if not all the readers/writers are wrapped around a stream, there
> could be a public getInputStream(...) or getOutputStream(...). It would
> just throw an UnsupportedOperationException or something.
Don't follow you here.
--
Dale King
I'm not so sure that classes such as Process should supply Reader/Writer
interfaces themselves. Process is providing access to byte stream data.
If it does it through Reader and Writer it would have to be told the
encoding (even if it also had an option for going with a default encoding).
Some applications might want direct, unconverted, byte stream access to
exactly what the job said. That might be important, for example, for debugging.
Given the general layering of I/O interfaces, I think it may be better
to just provide the Stream interfaces, and expect the caller to build
any Reader or Writer, passing the encoding information to the constructor.
ReaderInputStream and WriterOutputStream would have solved my problem
completely.
Patricia
I wasn't suggesting that Process should. If I understood your
explanation you are trying to test a class that process the output
streams of a Process. So I am imagining a class that has a constructor
that takes two InputStream objects, one for the output stream and one
for the error stream from the Process.
Since this class is really processing text information, it should
take two Reader objects. This makes the class independent of where the
actual data comes from (in your case from a String). If you wanted to
hook it up to the output of a Process you simply wrap the streams from
the Process in an InputStreamReader with the appropriate encoding.
This makes the class more general-purpose. Imagine for example if
instead of the data coming from another process on this machine you
sent off a SOAP message to another server and got a text
response back.
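Roughly like this (all the class and method names here are invented for
illustration, and "US-ASCII" is just an assumed encoding):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical class: takes Readers, so it never sees bytes or encodings.
class ProcessOutputScanner {
    private final BufferedReader stdout;
    private final BufferedReader stderr;

    ProcessOutputScanner(Reader stdout, Reader stderr) {
        this.stdout = new BufferedReader(stdout);
        this.stderr = new BufferedReader(stderr);
    }

    // Stand-ins for whatever the real class does with the text.
    String firstStdoutLine() throws IOException {
        return stdout.readLine();
    }

    String firstStderrLine() throws IOException {
        return stderr.readLine();
    }
}

class Wiring {
    // Production: wrap the Process streams, naming the encoding explicitly.
    static ProcessOutputScanner forProcess(Process p) throws IOException {
        return new ProcessOutputScanner(
                new InputStreamReader(p.getInputStream(), "US-ASCII"),
                new InputStreamReader(p.getErrorStream(), "US-ASCII"));
    }

    // Unit test: feed it Strings directly; no bytes and no encoding involved.
    static ProcessOutputScanner forTest(String out, String err) {
        return new ProcessOutputScanner(new StringReader(out), new StringReader(err));
    }
}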
> Given the general layering of I/O interfaces, I think it may be better
> to just provide the Stream interfaces, and expect the caller to build
> any Reader or Writer, passing the encoding information to the constructor.
If I understand what you meant, I think that is what I am suggesting.
> ReaderInputStream and WriterOutputStream would have solved my problem
> completely.
But that would unfortunately invite much abuse by those who do not understand
the differences between text and raw bytes. I could see many newbies
using a ReaderInputStream when in reality they should be converting to
Reader.
--
Dale King