Performance issue with big files in hippo:resource


bcs...@gmail.com

Nov 20, 2014, 12:43:07 PM
to hippo-c...@googlegroups.com
Hello,
we want to attach files to some nodes so that users can download them. We use nodes of type hippo:resource for this.
Now we have tutorial videos of up to 200 MB, so I followed [1] to raise the maximum file size to 200 MB and gave Hippo plenty of RAM [2] via cargo.jvm.args.
But uploading a 130 MB file through the CMS causes serious performance problems: it consumes a lot of RAM, takes a long time (even when changing other properties), and often fails with timeout exceptions.
So my question is: What is the best way to store big files in nodes?

Thank you very much,
Björn

Hippo 7.9.3

[1] http://www.onehippo.org/library/concepts/editor-interface/image-and-asset-upload-validation.html
[2] <cargo.jvm.args>-Xmx8000m -XX:PermSize=1024m -XX:MaxPermSize=2048m -XX:-HeapDumpOnOutOfMemoryError</cargo.jvm.args>
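
PS: for completeness, the validation setting I changed per [1] looked roughly like this (quoted from memory, so the exact configuration node path and property name may differ per version):

    /hippo:configuration/hippo:frontend/cms/cms-services/assetValidationService
        max.file.size = 200mb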

Ard Schrijvers

Nov 21, 2014, 2:06:03 AM
to hippo-c...@googlegroups.com
On Thu, Nov 20, 2014 at 6:43 PM, <bcs...@gmail.com> wrote:
> Hello,
> we want to attach files to some nodes so that users can download them.
> We use nodes of type hippo:resource for this.
> Now we have tutorial videos of up to 200 MB, so I followed [1] to raise
> the maximum file size to 200 MB and gave Hippo plenty of RAM [2] via
> cargo.jvm.args.

We normally store these outside the repository in dedicated video storage; see for example [1].

Regards Ard

[1] http://external.forge.onehippo.org/

--
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com

Jeroen Reijn

Nov 21, 2014, 3:39:37 AM
to hippo-c...@googlegroups.com
Hi Björn,

see my comments inline.

On Thu, Nov 20, 2014 at 6:43 PM, <bcs...@gmail.com> wrote:
> Hello,
> we want to attach files to some nodes so that users can download them. We use nodes of type hippo:resource for this.
> Now we have tutorial videos of up to 200 MB, so I followed [1] to raise the maximum file size to 200 MB and gave Hippo plenty of RAM [2] via cargo.jvm.args.

Well, that seems fine. Do note that Cargo is usually only used for development purposes.
 
> But uploading a 130 MB file through the CMS causes serious performance problems: it consumes a lot of RAM, takes a long time (even when changing other properties), and often fails with timeout exceptions.

Can you describe what you are experiencing in a bit more detail? What exactly is slow: the upload itself, or does the whole CMS become unresponsive? I presume you are uploading your file as an asset?

The timeouts usually occur due to network-related issues; it could simply be that uploading the file to the server takes a long time. One more question: since you are running with Cargo, are you also still using the H2 database, or have you already switched to MySQL or something similar?
 



--
Jeroen Reijn
Hippo


Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 101 Main Street, Cambridge, MA 02142

bcs...@gmail.com

Nov 24, 2014, 4:10:57 AM
to hippo-c...@googlegroups.com
Hi Jeroen & Ard,

thank you for your replies.
We're still in the development stage and use Cargo with H2, but we will switch to another container and database (MySQL?) for production.
Will the behaviour be different with MySQL or another DB?
While debugging I saw a getStringRepresentation() call that loads the whole file into a single Java String, so no streaming seems to be used.
We have a document type with a "file" child node of type hippo:resource to attach relevant data to the document.
With small files the CMS works fine, but uploading the 130 MB test file only succeeds without an OutOfMemoryError if I give Hippo a lot of RAM. With 10 GB the upload works, but the CMS becomes so unresponsive and slow that it runs into its own one-minute timeouts when saving and publishing the document.
Network traffic shouldn't be the problem, because I'm testing with everything on one big local machine ;)
Perhaps we have to look into the external storage option.
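
For reference, the write path we would have expected is roughly the plain JCR 2.0 streaming API below (a minimal sketch, not Hippo's actual upload code; the class, node and mime type are just examples, following the usual hippo:resource property names):

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Calendar;
import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Session;

public class StreamedResourceWriter {

    // Sketch: assumes the child node already exists and is of type hippo:resource.
    // Writes the upload as a streamed JCR Binary, so the file should not have to
    // be held in memory as one String/byte[].
    public static void writeResource(Session session, Node resourceNode,
                                     File upload, String mimeType) throws Exception {
        try (InputStream in = new FileInputStream(upload)) {
            Binary binary = session.getValueFactory().createBinary(in);
            try {
                resourceNode.setProperty("jcr:data", binary);
                resourceNode.setProperty("jcr:mimeType", mimeType);
                resourceNode.setProperty("jcr:lastModified", Calendar.getInstance());
            } finally {
                binary.dispose();
            }
        }
        session.save();
    }
}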

Thank you for your help,
Björn

bcs...@gmail.com

Nov 24, 2014, 12:08:11 PM
to hippo-c...@googlegroups.com, bcs...@gmail.com
Hi,

Update:
I just tested my scenario with PostgreSQL instead of H2 as the database, and I can now upload and download big files :). Memory usage is still high, but the speed is OK and the timeouts are gone.
Just two "not so nice" points:
* The big file is always read into memory instead of being streamed, because the column type bytea (byte array, no streaming) is used instead of a blob (which can be streamed).
* The big file is always completely read/written by the CMS when I change an (independent) property of the same node. I think this is because nodes are always written as a whole, versioned, etc.
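
(If anyone wants to verify the column type on their own setup, the standard information_schema view works; the table name below is just a guess and depends on your schema/prefix:)

    -- check which type Jackrabbit created for the binary column
    -- ('datastore' is an example table name, adjust to your setup)
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = 'datastore';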

Greetings,
Björn

Woonsan Ko

Nov 24, 2014, 1:30:42 PM
to hippo-c...@googlegroups.com
Hi Björn,

Are you sure there is code in the Hippo CMS7 product that transforms a binary input stream into a string?
I searched for "getStringRepresentation" in the source directories, but couldn't find it.
Could you point out where it is?

Regards,

Woonsan

Boston - 1 Broadway, Cambridge, MA 02142
Amsterdam - Oosteinde 11, 1017 WT Amsterdam

bcs...@gmail.com

Nov 25, 2014, 4:57:42 AM
to hippo-c...@googlegroups.com
Hi Woonsan,

I saw the "getStringRepresentation" while debugging to the bottleneck through the different layers of Hippo CMS, underlying JCR and JDBC. I mean that this was H2 specific JDBC code, so it won't be visible in toplevel code. Changing from H2 to PostgreSQL changed the upload of one 30MB file in CMS from about 60 to 2 sec, so H2 seems not optimal for big blobs. But even using PostgreSQL there is the same effect concerning memory usage because the automatically created schema uses bytea (instead of blob) to store the binary data, which is read and written at one piece instead of being streamed. I see that this behaviour is more an DB and JCR issue. Do you know if there is a DB (MySQL?) where streaming is supported by the Hippo-JCR-DB chain?

Greetings,
Björn

Bartosz Oudekerk

Nov 25, 2014, 5:28:48 AM
to hippo-c...@googlegroups.com
On 25/11/14 10:57, bcs...@gmail.com wrote:
> Hi Woonsan,
>
> I saw the getStringRepresentation() call while debugging down to the
> bottleneck through the different layers of Hippo CMS, the underlying
> JCR and JDBC. It was H2-specific JDBC code, so it is not visible in
> the top-level code. Switching from H2 to PostgreSQL reduced the upload
> time of a 30 MB file in the CMS from about 60 seconds to 2 seconds, so
> H2 does not seem optimal for big blobs. But even with PostgreSQL the
> memory usage shows the same effect, because the automatically created
> schema stores the binary data in a bytea column (instead of a blob),
> which is read and written in one piece instead of being streamed. So
> this behaviour is more of a DB and JCR issue. Do you know of a DB
> (MySQL?) where streaming is supported by the Hippo-JCR-DB chain?

See below for how Jackrabbit creates the datastore table for different
DB implementations:

http://svn.apache.org/repos/asf/jackrabbit/tags/2.6.5/jackrabbit-core/src/main/resources/org/apache/jackrabbit/core/data/db
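
For example, the PostgreSQL variant there (postgresql.properties) boils down to roughly the following; paraphrased from memory, so check the linked file for the exact statement:

    create table DATASTORE (
      ID            varchar(255) not null primary key,
      LENGTH        bigint,
      LAST_MODIFIED bigint,
      DATA          bytea  -- byte array column, read/written in one piece
    );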

Kind regards,
Bartosz
--
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 745 Atlantic Ave, Third Floor, Boston MA 02111

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
http://www.onehippo.com/