No sample code, however, closes the streams. That seems to be either
wrong (and misleading to new users), or I'm missing something...
To confuse matters further, however, I found one instance where
JetS3t does close the stream! In the putObjectWithSignedUrl() call in
RestS3Service, the input stream of the S3Object is closed! That method
seems a bit odd in general, as it modifies the passed S3Object rather
than returning a new one as the method comments state.
Can someone explain when close() is supposed to be called by the
application using JetS3t, and when it gets called by JetS3t itself?
Getting close() right is important when working with files, since
otherwise you can run out of file descriptors, so this is an area
where the documentation could perhaps be improved.
Thanks!
Peppe
Thanks for bringing this to my attention, you're right that the data
input stream handling is incorrect.
The RestS3Service should indeed be responsible for closing the input
streams of S3Object instances provided to it. The fact that this wasn't
happening was an oversight that was hidden by the fact that the multi-
threaded service used by all JetS3t applications did close these
streams. I have committed updated code that closes streams in the
RestS3Service.putObjectImpl() method and removed the stream closing
from the multi-threaded service, which is a little risky but will at
least bring any other errors to light more quickly.
The general rules for applications using JetS3t are:
- applications are not responsible for closing streams of objects
uploaded to S3
- applications are responsible for closing streams of objects
downloaded from S3, presumably after first consuming the data
(failure to close these streams could cause nasty side-effects besides
open streams, such as network connections being held longer than
necessary)
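The download-side rule can be sketched in plain Java. The
openDownloadedStream() method below is just an in-memory stand-in for
the stream a downloaded S3Object would provide (via its data input
stream accessor), so the example runs without S3; the point is the
read-then-close-in-finally shape:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DownloadCloseExample {

    // Stand-in for the stream a downloaded S3Object would provide;
    // here it is an in-memory stream so the example runs without S3.
    static InputStream openDownloadedStream() {
        return new ByteArrayInputStream("object data".getBytes());
    }

    // Consume the data fully, then close the stream in a finally block
    // so it is released even if reading fails part-way through.
    static String readAndClose() throws IOException {
        InputStream in = openDownloadedStream();
        StringBuilder data = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1) {
                data.append((char) b);
            }
        } finally {
            in.close(); // the application's responsibility for downloads
        }
        return data.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAndClose()); // prints "object data"
    }
}
```

Closing in a finally block is what releases the underlying network
connection promptly, which is the "nasty side-effect" the rule above
is guarding against.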
As you point out, the DataInputStream/File distinction is a little
odd. The use of file objects instead of input streams was a late
addition to the toolkit that was necessary due to platform limits on
the number of open files that are permitted. Unfortunately Java
doesn't allow the creation of InputStream objects that aren't
automatically opened, so the necessary work-around is to ensure that
file input streams are only created at the moment they're needed. By
combining this with thread management (performed by the multi-
threaded service) you can ensure that only a limited number of files
are ever open at one time.
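The late-opening pattern can be sketched in plain Java. The upload()
method here is a hypothetical stand-in for the real transfer logic:
the File reference is held cheaply, while the FileInputStream, which
consumes a file descriptor from the moment it is constructed, is
created only when the transfer actually starts and closed as soon as
it ends:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

public class LazyOpenExample {

    // Hypothetical transfer step: the FileInputStream is constructed
    // only here, so the file descriptor is held just for the duration
    // of the "upload", not for as long as the object waits in a queue.
    static long upload(File file) throws IOException {
        InputStream in = new FileInputStream(file); // opened at the last moment
        try {
            long bytes = 0;
            while (in.read() != -1) {
                bytes++;
            }
            return bytes;
        } finally {
            in.close(); // descriptor released immediately after the transfer
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("jets3t-demo", ".dat");
        Files.write(f.toPath(), "payload".getBytes());
        System.out.println(upload(f)); // prints 7, the payload length
        f.delete();
    }
}
```

With a bounded thread pool driving upload(), at most pool-size
descriptors are ever open at once, which is the effect described
above.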
Good catch too with the comments on putObjectWithSignedUrl(); they
were plainly misleading. I have updated them.
Hopefully the most recent changes in CVS will address these issues.
It would be great, Peppe, if you could check out this code and confirm
the fixes for me with your "stream closing" monitor.
Thanks for the precise and well-researched feedback, more is always
welcome.
Cheers,
James
I think the ideal would be to have methods in S3Object to do both
storing and retrieval, with the user explicitly both opening and
closing the streams (an output stream for storing and an input stream
for retrieval) and being responsible for both reading and writing data
to and from these streams. The problem with that is all the other
stuff that goes on - metadata, content length, content type, ACL, etc.
In that model it would have to happen "behind the scenes", which means
you'd have to set it all up correctly before opening an output stream
to store an object... The nice thing is that you could always read data
from an S3Object in a consistent way and not have the "read only once"
behavior. And one could have some sort of data caching in there too,
which would be useful for smaller objects...
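That model could be sketched roughly as follows. Everything here (the
CachedObject class, openOutputStream(), openInputStream()) is invented
for illustration and is not part of JetS3t: the caller explicitly
opens and closes both streams, and a small in-memory cache makes the
data readable more than once:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ExplicitStreamSketch {

    // Hypothetical object: metadata, content type, ACL and so on would
    // be set on it before openOutputStream() is called.
    static class CachedObject {
        private byte[] cache; // small objects cached in memory

        OutputStream openOutputStream() {
            return new ByteArrayOutputStream() {
                @Override
                public void close() throws IOException {
                    super.close();
                    cache = toByteArray(); // the "store" happens on close()
                }
            };
        }

        InputStream openInputStream() {
            // A fresh stream over the cache each time, so there is no
            // "read only once" behavior.
            return new ByteArrayInputStream(cache);
        }
    }

    // Helper: read a stream fully and close it.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1) {
                sb.append((char) b);
            }
        } finally {
            in.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        CachedObject obj = new CachedObject();
        OutputStream out = obj.openOutputStream();
        out.write("hello".getBytes());
        out.close();
        // The cached data can be read consistently, more than once:
        System.out.println(readAll(obj.openInputStream())); // prints "hello"
        System.out.println(readAll(obj.openInputStream())); // prints "hello"
    }
}
```

A real implementation would of course stream large objects to and
from the network rather than buffering them, caching only below some
size threshold.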
Is there an automatically built daily binary from CVS, or should I
pull the source from CVS?
Peppe