Best way to handle streaming LOB fields


Craig B

Jan 3, 2009, 5:48:55 PM
to SimpleJPA
I am investigating the best way to support very large objects stored
in S3 using the @Lob annotation. The S3 support already in SimpleJPA
is great; however, the JPA specification does not adequately handle
objects too large to fit in memory.

Since the JPA specification (and thus SimpleJPA) expects to deal with
simple types for large objects, applications are required to load each
object fully into memory (in a String or comparable data-type) prior
to uploading the content to S3. Since S3 supports very large objects
that don't comfortably fit into memory, a process is needed to access
these resources (both writing and reading) through streams.

Since SimpleJPA uses InputStreams internally to push data to and from
S3, the basic functionality exists, and we would only need to get
access to these underlying streams. However, I wanted to get some
feedback as to the best way to support this non-standard feature.

There are a couple of ways to handle this. One would be to use the
existing @Lob annotation, and provide a publicly accessible API method
on the entity manager or lazy initialization wrapper to create and
return a stream directly to or from S3. This would allow maximum
reuse of existing infrastructure, and easy integration with client
applications (requiring only a cast to the appropriate SimpleJPA
interface, or no cast at all with a dynamic language such as Groovy).
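
A rough sketch of what such an accessor could look like. Everything here is hypothetical: the LobStreamAccess interface, its method names, and the in-memory store standing in for S3 are assumptions made for illustration, not part of JPA or SimpleJPA.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical provider-specific extension; not part of JPA or SimpleJPA. */
interface LobStreamAccess {
    /** Opens a stream that reads the named @Lob field of the entity with the given id. */
    InputStream openLobInput(Object entityId, String fieldName) throws IOException;

    /** Opens a stream whose bytes become the field's content once it is closed. */
    OutputStream openLobOutput(Object entityId, String fieldName) throws IOException;
}

/** In-memory stand-in for the S3-backed store, only to make the sketch runnable. */
class InMemoryLobStore implements LobStreamAccess {
    private final Map<String, byte[]> objects = new HashMap<>();

    private static String key(Object entityId, String fieldName) {
        return entityId + "/" + fieldName;
    }

    @Override
    public InputStream openLobInput(Object entityId, String fieldName) throws IOException {
        byte[] data = objects.get(key(entityId, fieldName));
        if (data == null) {
            throw new IOException("No LOB stored for " + key(entityId, fieldName));
        }
        return new ByteArrayInputStream(data);
    }

    @Override
    public OutputStream openLobOutput(Object entityId, String fieldName) {
        final String k = key(entityId, fieldName);
        // The content is committed to the store only when the caller closes the stream.
        return new ByteArrayOutputStream() {
            @Override
            public void close() throws IOException {
                super.close();
                objects.put(k, toByteArray());
            }
        };
    }
}
```

A client would cast the entity manager (or the lazy wrapper) to the extension interface and then read or write in chunks. A real implementation would hand back streams connected to S3 instead of buffering in memory as this stand-in does.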

Another possibility would be to use a proprietary annotation to mark
fields which should be accessible as streams. This is the approach
taken by other JPA providers such as OpenJPA, which (I think) uses a
@Persistent annotation to enable streaming access. While this would
align SimpleJPA with other JPA providers, this approach requires the
most intrusive changes both to SimpleJPA and client applications, and
to me is the least favorable.

I am planning on adding this feature to a private branch of the
codebase, but would be happy to share if it would be of use to the
wider community.

Thoughts?

Sanne Grinovero

Jan 4, 2009, 7:00:04 AM
to simp...@googlegroups.com
Hi,
You should mention this issue to the JPA2 expert group, as it would be useful for something like this to be included in the standard.
I doubt that it would make sense to have a @Lob on such a getter, since calling that getter would be self-defeating, but I agree there
should be some easy and standard way to retrieve a stream from a LOB in the database without having to be aware of the SQL specifics of
the current database.
What would your API look like?
I expect something like
Stream s = em.getStreamingLob( Serializable IdOfContainingEntity, String fieldName);
or something like OpenJPA; for reference I found this JIRA:
http://issues.apache.org/jira/browse/OPENJPA-130
I wonder why this cool solution was not included in JPA2.


Travis Reeder

Jan 4, 2009, 3:54:45 PM
to simp...@googlegroups.com
Yes, it seems like it should be part of the spec, but since it's not, I think either of Craig's suggestions would work: special streaming persist and get methods, or a proprietary annotation. There is also a third suggestion from that JIRA link that makes sense:

If you annotate an InputStream with @Lob then OpenJPA should be smart enough to figure it out.
@Entity
public class Employee {
    ...
    @Lob
    private InputStream photoStream;
    ...
}


That actually makes a lot of sense to me, but it's followed up with:

My only concern with Craig's suggestion is compatibility if the JPA spec team decides to go a different route in a future version. If we use our own annotation, then we can maintain compatibility for it even in the face of spec changes.

I think I like the @Lob-annotated InputStream or Reader for its ease of use and lack of a new annotation, but would it cause issues in the future if the spec changes?  If it would, then I would vote for a new annotation like @Persistent. The special method comes last for me, because it makes the programmer do more work and remember to use these methods.

Craig: Regardless, I would love to see your patches. Could you attach them to this ticket: 

Travis

Craig B

Jan 4, 2009, 4:21:01 PM
to SimpleJPA
I also like the approach of using an annotation on an InputStream
object, but that only deals with streaming reads and ignores streaming
writes (which are just as important for large datasets). I think that
instead of using an InputStream we would have to use a new data type
that could provide both an InputStream and an OutputStream.

I suppose another possibility would be using a
java.nio.channels.ByteChannel (an interface that extends both
ReadableByteChannel and WritableByteChannel) and providing the correct
concrete implementation depending on whether it's accessed by the
getter or setter. A channel can of course be converted to an
InputStream/OutputStream and vice versa in the client as needed. As
an ancillary benefit, if Jets3t were ever to support NIO then having
channel support end-to-end would enable more efficient IO with
in-kernel direct buffers.
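
For reference, the stream/channel conversion mentioned above is available through java.nio.channels.Channels in the JDK. The sketch below is a minimal illustration; the ChannelBridge class and its method names are made up for this example, and it assumes blocking channels.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

/** Hypothetical helper showing stream <-> channel bridging with the JDK's Channels class. */
final class ChannelBridge {
    private ChannelBridge() {}

    /** Copies everything from src to dst through one reusable 8 KB buffer. */
    static long copy(ReadableByteChannel src, WritableByteChannel dst) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(8192);
        long total = 0;
        while (src.read(buffer) != -1) {
            buffer.flip();                 // switch the buffer from filling to draining
            while (buffer.hasRemaining()) {
                total += dst.write(buffer);
            }
            buffer.clear();                // make the buffer ready for the next read
        }
        return total;
    }

    /** Wraps plain streams as channels, copies, and closes both when done. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        try (ReadableByteChannel src = Channels.newChannel(in);
             WritableByteChannel dst = Channels.newChannel(out)) {
            return copy(src, dst);
        }
    }
}
```

Going the other way, Channels.newInputStream(ReadableByteChannel) and Channels.newOutputStream(WritableByteChannel) turn a channel back into a stream, so a client could use whichever style it prefers.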

In the meantime, after I verify it works I will post what I have as an
example and we can discuss the API from there.

Craig

Travis Reeder

Jan 4, 2009, 7:29:32 PM
to simp...@googlegroups.com
Writes would still be an InputStream, eg:

 @Entity
public class Employee {
    
    private InputStream photoStream;

Travis Reeder

Jan 4, 2009, 7:31:59 PM
to simp...@googlegroups.com
Oops, sent the last one by accident, here we go again:

Writes would still be an InputStream, eg:

@Entity
public class Employee {

    private InputStream photoStream;

    @Lob
    public InputStream getPhotoStream() {
        return photoStream;
    }

    public void setPhotoStream(InputStream photoStream) {
        this.photoStream = photoStream;
    }
}

You would pass in an InputStream on set and get an InputStream on get.

Travis


Craig B

Jan 4, 2009, 8:48:17 PM
to SimpleJPA
Ah, I see. I think that might be easier than the workaround I've been
working on. I'll see how easily I can switch to using the
InputStream.


Craig B

Jan 4, 2009, 10:35:04 PM
to SimpleJPA
Patch attached to the issue.

Travis Reeder

Jan 4, 2009, 11:43:49 PM
to simp...@googlegroups.com
Great, thanks. Will check it out. 

Any chance you could throw in a simple unit test for it?

new object(), setField(InputStream), persist(), sleep(5000), get object from SimpleDB, call getter, compare original with new.
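
For the "compare original with new" step, a comparison that never holds either object fully in memory might look like the following. This is only a sketch; StreamCompare is a hypothetical helper, not something in SimpleJPA or the patch.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical test helper: compares two streams byte by byte without loading them fully. */
final class StreamCompare {
    private StreamCompare() {}

    /** Returns true if both streams yield exactly the same bytes; closes both streams. */
    static boolean sameContent(InputStream a, InputStream b) throws IOException {
        try (InputStream x = new BufferedInputStream(a);
             InputStream y = new BufferedInputStream(b)) {
            int bx;
            do {
                bx = x.read();
                // A mismatch also covers the case where one stream ends before the other.
                if (bx != y.read()) {
                    return false;
                }
            } while (bx != -1);
            return true;
        }
    }
}
```

In the test, the original could be re-opened from its source (a file, say) and compared against the stream returned by the getter on the reloaded entity.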

Travis

Craig B

Jan 5, 2009, 12:46:48 AM
to SimpleJPA
I would, but there is no build system in place to handle running the
tests. Unfortunately the pom.xml that someone submitted does not run
the tests.

Sanne Grinovero

Jan 5, 2009, 7:26:40 AM
to simp...@googlegroups.com
Hi, there are lots of good ideas here but when using:

"
public void setPhotoStream(InputStream photoStream){
         this.photoStream = photoStream;
    }
"
you don't have a way to say "ok, I'm done" (flush/close/...).
That looks scary in terms of resource management.

Hibernate is doing something even better IMHO:

@Lob
Clob getData();
void setData(Clob data);

For demo code (written for Hibernate 2), see:
http://www.hibernate.org/56.html

You get a reference to a controlling object, from which you can later open streams of different types and manage the resource lifecycle.




Craig B

Jan 5, 2009, 1:22:41 PM
to SimpleJPA
The patch is implemented to close the InputStream after the content is
persisted to S3, which happens on the em.merge() or em.persist()
call. The stream remains open until the object is persisted. The
final implementation should also handle closing the stream if the
object is discarded without being persisted.
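
The lifecycle described above could be sketched roughly as follows. This is not the code from the patch: PendingLob and its methods are hypothetical names, and the OutputStream sink merely stands in for the upload to S3.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical holder for a LOB stream that is set on an entity but not yet persisted. */
final class PendingLob implements Closeable {
    private InputStream source; // null once consumed or discarded

    PendingLob(InputStream source) {
        this.source = source;
    }

    /** Persist path: streams the content to the sink and closes the source, even on failure. */
    long persistTo(OutputStream sink) throws IOException {
        if (source == null) {
            throw new IllegalStateException("LOB stream already consumed or discarded");
        }
        try (InputStream in = source) {
            source = null;
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                sink.write(buf, 0, n);
                total += n;
            }
            return total;
        }
    }

    /** Discard path: closes the source if it was never persisted. Safe to call repeatedly. */
    @Override
    public void close() throws IOException {
        InputStream in = source;
        source = null;
        if (in != null) {
            in.close();
        }
    }
}
```

The provider would call persistTo() from persist() or merge(), and close() when the entity is detached or the entity manager is closed without the object having been persisted.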

Jets3t also supports setting the object content to a File, which is
only opened and read when the object is actually sent to S3. This is
nice since there is no file descriptor or stream open during the time
between setting the Lob field and finally persisting the object.
However, I don't see how to cleanly support that with the "@Lob
InputStream" approach.

The Hibernate Clob type approach is what I meant when discussing
creating a new data type that can supply streams. I also like that
approach, though it could require more invasive changes to SimpleJPA.


Travis Reeder

Jan 5, 2009, 2:37:50 PM
to simp...@googlegroups.com
Interesting.  Although looking at Hibernate.createBlob() and Hibernate.createClob(), it doesn't look like they take a File either. 

We could have a FileBlob and FileClob class that implement the java.sql.Blob and java.sql.Clob interfaces. I wonder if we could make use of any of the Blob/Clob methods though?  
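
As a rough idea of what a FileBlob could look like: the sketch below is hypothetical and read-only. Only length() and getBinaryStream() do real work, and the remaining java.sql.Blob methods are assumed to be unneeded for the S3 case, so they simply report that they are unsupported.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Blob;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;

/** Hypothetical read-only Blob backed by a file that is opened only on demand. */
class FileBlob implements Blob {
    private final File file;

    FileBlob(File file) {
        this.file = file;
    }

    @Override
    public long length() {
        return file.length();
    }

    /** Opens the file lazily; the caller is responsible for closing the returned stream. */
    @Override
    public InputStream getBinaryStream() throws SQLException {
        try {
            return new FileInputStream(file);
        } catch (IOException e) {
            throw new SQLException("Cannot open " + file, e);
        }
    }

    @Override
    public void free() {
        // Nothing to release: no descriptor is held between calls.
    }

    // The remaining Blob operations are left unsupported in this sketch.
    @Override
    public byte[] getBytes(long pos, int length) throws SQLException { throw unsupported(); }

    @Override
    public InputStream getBinaryStream(long pos, long length) throws SQLException { throw unsupported(); }

    @Override
    public long position(byte[] pattern, long start) throws SQLException { throw unsupported(); }

    @Override
    public long position(Blob pattern, long start) throws SQLException { throw unsupported(); }

    @Override
    public int setBytes(long pos, byte[] bytes) throws SQLException { throw unsupported(); }

    @Override
    public int setBytes(long pos, byte[] bytes, int offset, int len) throws SQLException { throw unsupported(); }

    @Override
    public OutputStream setBinaryStream(long pos) throws SQLException { throw unsupported(); }

    @Override
    public void truncate(long len) throws SQLException { throw unsupported(); }

    private static SQLException unsupported() {
        return new SQLFeatureNotSupportedException("FileBlob supports only length() and getBinaryStream()");
    }
}
```

With this shape, no stream or file descriptor stays open between setting the field and persisting the object, which is the property of the Jets3t File support that Craig described.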

Travis