How to export huge data collections (memory will be important)


Miguel Almeida

May 18, 2012, 11:16:46 AM
to jmesa...@googlegroups.com
Dear all,

We've recently started using JMesa to test it as a replacement for Displaytag (which seems to have been abandoned).

When there are too many items to retrieve at once, we need to paginate. This works as expected in any pagination system: you only fetch a subset of your records from the underlying database.

My question is: what happens when you need to export the same data (e.g., to PDF)?
As I looked around the code, every AbstractViewExporter implementation seems to render the complete data and convert it to a byte[] [1]. If the data is huge, this will cause memory problems. Sure enough, we've built a test case that blew up at 400k records [2].

I suppose the solution would be to write to a temporary file instead of memory. Does this already exist? Can we implement a new exporter for it? If not, how do you solve this problem?


Cheers,

Miguel Almeida


[1] - E.g., CsvViewExporter's export method:
    public void export()
            throws Exception {
        responseHeaders(getResponse());
        String viewData = (String) getView().render();
        byte[] contents = (viewData).getBytes();
        ServletOutputStream outputStream = getResponse().getOutputStream();
        outputStream.write(contents);
        outputStream.flush();
    }
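
The method in [1] holds the whole export twice, once as a String and once as a byte[]. A minimal sketch of the alternative raised in the question, writing rows to the output stream one at a time, could look like the following. StreamingCsvExport and its methods are hypothetical names for illustration and are not part of JMesa; the row source is assumed to be any Iterator that produces rows lazily.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper, not part of JMesa: writes rows one at a time so the
// full export never exists in memory as a single String or byte[].
final class StreamingCsvExport {

    // Quote a field only when it contains a comma, a quote, or a line break.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Pull rows lazily from the iterator and write each one straight to the stream.
    // Returns the number of rows written.
    static long export(Iterator<String[]> rows, OutputStream out) throws IOException {
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        long written = 0;
        while (rows.hasNext()) {
            String[] row = rows.next();
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    writer.write(',');
                }
                writer.write(escape(row[i]));
            }
            writer.write("\r\n");
            written++;
        }
        writer.flush(); // flush but do not close: the caller owns the stream
        return written;
    }

    public static void main(String[] args) throws IOException {
        Iterator<String[]> rows = Arrays.asList(
                new String[] {"id", "name"},
                new String[] {"1", "Almeida, Miguel"}).iterator();
        export(rows, System.out);
    }
}
```

In a servlet the OutputStream would be the response's output stream, so memory use depends on the writer's buffer and on how the rows are fetched, not on the total number of rows.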


[2]
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
    at java.lang.StringCoding.encode(StringCoding.java:272)
    at java.lang.StringCoding.encode(StringCoding.java:284)
    at java.lang.String.getBytes(String.java:986)
    at org.jmesa.view.csv.CsvViewExporter.export(CsvViewExporter.java:42)
    at org.jmesa.facade.TableFacade.renderExport(TableFacade.java:869)
    at org.jmesa.facade.TableFacade.render(TableFacade.java:853)
    at org.jmesa.model.TableModel.render(TableModel.java:325)
    at com.itclinical.clinicalmanagement.web.utilities.JMesaService.render(JMesaService.java:67)
    at com.itclinical.clinicalmanagement.web.actions.AuditAction.retrieveAuditPage(AuditAction.java:71)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.opensymphony.xwork2.DefaultActionInvocation.invokeAction(DefaultActionInvocation.java:453)
    at com.opensymphony.xwork2.DefaultActionInvocation.invokeActionOnly(DefaultActionInvocation.java:292)
    at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:255)
    at com.opensymphony.xwork2.interceptor.DefaultWorkflowInterceptor.doIntercept(DefaultWorkflowInterceptor.java:176)
    at com.opensymphony.xwork2.interceptor.MethodFilterInterceptor.intercept(MethodFilterInterceptor.java:98)
    at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249)
    at com.opensymphony.xwork2.validator.ValidationInterceptor.doIntercept(ValidationInterceptor.java:265)
    at org.apache.struts2.interceptor.validation.AnnotationValidationInterceptor.doIntercept(AnnotationValidationInterceptor.java:68)
    at com.opensymphony.xwork2.interceptor.MethodFilterInterceptor.intercept(MethodFilterInterceptor.java:98)
    at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249)
    at com.opensymphony.xwork2.interceptor.ConversionErrorInterceptor.intercept(ConversionErrorInterceptor.java:138)
    at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249)
    at com.opensymphony.xwork2.interceptor.ParametersInterceptor.doIntercept(ParametersInterceptor.java:211)
    at com.opensymphony.xwork2.interceptor.MethodFilterInterceptor.intercept(MethodFilterInterceptor.java:98)
    at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249)
    at com.opensymphony.xwork2.interceptor.StaticParametersInterceptor.intercept(StaticParametersInterceptor.java:190)
    at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:249)
    at org.apache.struts2.interceptor.CheckboxInterceptor.intercept(CheckboxInterceptor.java:90)


Miguel Almeida

May 18, 2012, 11:44:46 AM
to jmesa...@googlegroups.com
Just to complete my previous post: the memory exception occurs in the render() phase of the export (the line getView().render(); in CsvViewExporter), which might make things more difficult.

Jeff Johnston

May 21, 2012, 3:40:05 PM
to jmesa...@googlegroups.com
Sorry for the late reply...I have been on vacation for a few days!

For large result sets you would still use the Limit along with the PageItems to only get the results that you want.

http://code.google.com/p/jmesa/wiki/LimitTutorialV3

If you do not care about honoring the filter and sort for exports you could also use the AllItems interface.

Both PageItems and AllItems are callback interfaces on the TableModel getItems() method.


Can we implement a new exporter for it?

Yes, although this is much easier on the trunk code, as you can now plug in your own exporter on the TableModel.

Also, FYI, one change coming with the 4.0 release is that I am taking the dynfilter logic out and just using simple input fields for the filters. The goal of the 4.0 release is to simplify some things rather than add new features.

Hope that helps...let me know if you have more questions.

-Jeff Johnston

Miguel Almeida

May 22, 2012, 7:56:16 AM
to jmesa...@googlegroups.com
Hi Jeff, thanks for the reply.
My response is inline.


On Monday, May 21, 2012 8:40:05 PM UTC+1, Jeff Johnston wrote:
Sorry for the late reply...I have been on vacation for a few days!

For large result sets you would still use the Limit along with the PageItems to only get the results that you want.

http://code.google.com/p/jmesa/wiki/LimitTutorialV3

If you do not care about honoring the filter and sort for exports you could also use the AllItems interface.

Both PageItems and AllItems are callback interfaces on the TableModel getItems() method.

Just to be clear: this would only export the current subset, right? I.e., it is not for "displaying with Limit but exporting all items".

Can we implement a new exporter for it?

Yes, although this is much easier on the trunk code, as you can now plug in your own exporter on the TableModel.

I might have a look at it then. As we weren't able to solve the problem, in the meantime we started implementing a PDF exporter completely outside the scope of JMesa (where we'll write the objects to disk instead of storing them in memory, as that would totally blow things up for large datasets).

Cheers,

Miguel Almeida

Jeff Johnston

May 22, 2012, 9:55:14 AM
to jmesa...@googlegroups.com
With the PageItems it would export just what you want. But in general you do always have control over what gets pulled. If the results are too big to fit into memory then I would think you would always want to use the PageItems for that use case.

Another way that you could do it is pass the TableModel a Limit object in which the RowSelect object has the exact rows that you want to export. The idea behind the Limit was always to be able to throw the table into whatever state you wanted. I do not think this is what you really want though.

So try the PageItems and then I can work with you if things are not working.

Going on the trunk right now would be rough though as we are refactoring the JavaScript to simplify how it works. I'll let you know when things settle down!

-Jeff
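
The idea of asking for exactly the rows you want, one range at a time, can be sketched independently of JMesa as an iterator that fetches fixed-size chunks on demand. This is only an illustration under stated assumptions: ChunkedRows is a hypothetical class, and the (offset, limit) fetch function stands in for whatever query the application runs against its database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Hypothetical helper, not part of JMesa: iterates over a large result set
// while holding only one chunk of rows in memory at a time.
final class ChunkedRows<T> implements Iterator<T> {

    private final BiFunction<Integer, Integer, List<T>> fetchPage; // (offset, limit) -> rows
    private final int chunkSize;
    private int offset = 0;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean exhausted = false;

    ChunkedRows(BiFunction<Integer, Integer, List<T>> fetchPage, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.fetchPage = fetchPage;
        this.chunkSize = chunkSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next chunk only when the current one has been consumed.
        while (!current.hasNext() && !exhausted) {
            List<T> page = fetchPage.apply(offset, chunkSize);
            offset += page.size();
            exhausted = page.size() < chunkSize; // a short page means no more rows
            current = page.iterator();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    public static void main(String[] args) {
        // In-memory stand-in for a database table of 10 rows.
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            table.add(i);
        }
        ChunkedRows<Integer> rows = new ChunkedRows<>(
                (off, lim) -> table.subList(Math.min(off, table.size()),
                        Math.min(off + lim, table.size())),
                4);
        int count = 0;
        while (rows.hasNext()) {
            rows.next();
            count++;
        }
        System.out.println(count + " rows read in chunks of 4");
    }
}
```

The chunk size trades the number of queries against peak memory; the consumer of the iterator never sees more than one chunk at once.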

Miguel Almeida

May 22, 2012, 12:59:22 PM
to jmesa...@googlegroups.com
Hey Jeff. I re-read my email and I think I wasn't very clear.


On Tuesday, May 22, 2012 2:55:14 PM UTC+1, Jeff Johnston wrote:
With the PageItems it would export just what you want. But in general you do always have control over what gets pulled. If the results are too big to fit into memory then I would think you would always want to use the PageItems for that use case.
 
What I want is actually "all items to be pulled when exporting, but only the paginated result to be shown when displaying on the webpage". From your answers it seems that's not the case, is it?

Miguel
 

Jeff Johnston

May 23, 2012, 12:37:15 PM
to jmesa...@googlegroups.com
When you use the PageItems callback you do have control over what items you want to return. The PageItems basically tells you how the user filtered, sorted, and what page they are on.

-Jeff
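
To illustrate the point that the callback decides which items come back, here is a rough sketch of a callback that returns only the visible page when displaying and every row when exporting. The TableState and ItemsCallback types are simplified stand-ins declared inside the example; they are not JMesa's Limit or PageItems, whose actual method names should be checked against the JMesa API. The query method is likewise a hypothetical placeholder for a database call.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins declared for this sketch; these are NOT JMesa's classes.
final class ExportAwarePaging {

    // The table state handed to the callback: which rows are on screen and
    // whether the current request is an export.
    static final class TableState {
        final int rowStart;   // inclusive, zero-based
        final int rowEnd;     // exclusive
        final boolean export;

        TableState(int rowStart, int rowEnd, boolean export) {
            this.rowStart = rowStart;
            this.rowEnd = rowEnd;
            this.export = export;
        }
    }

    // Stand-in for a PageItems-like callback.
    interface ItemsCallback<T> {
        List<T> getItems(TableState state);
    }

    // Hypothetical data access: returns row ids in [from, to), clamped to the table size.
    static List<Integer> query(int from, int to, int totalRows) {
        List<Integer> rows = new ArrayList<>();
        for (int i = Math.max(0, from); i < Math.min(to, totalRows); i++) {
            rows.add(i);
        }
        return rows;
    }

    // The callback picks the row range based on the request type.
    static ItemsCallback<Integer> callback(final int totalRows) {
        return state -> state.export
                ? query(0, totalRows, totalRows)                  // export: all rows
                : query(state.rowStart, state.rowEnd, totalRows); // display: current page only
    }

    public static void main(String[] args) {
        ItemsCallback<Integer> items = callback(1000);
        System.out.println(items.getItems(new TableState(20, 40, false)).size());
        System.out.println(items.getItems(new TableState(20, 40, true)).size());
    }
}
```

Returning every row at once still holds the full result in memory, so for very large exports the export branch would need to hand rows back in smaller ranges instead.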