Storage limits of sparsemapcontent
From: "Zach A. Thomas" <z...@aeroplanesoftware.com>
Date: Sun, 29 May 2011 14:33:43 -0500
Local: Sun, May 29 2011 3:33 pm
Subject: Storage limits of sparsemapcontent

Hi. When we migrated the pilot at NYU to sparsemapcontent, some pages were lost with "encoded string too long" errors. When I dug a little deeper, I found that the DataOutputStream writeUTF method used by StringType.java is limited to 65,535 bytes (2^16 - 1) of encoded data per call. You can write more than this by splitting the data into smaller chunks and making multiple calls to writeUTF.
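
The chunking idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: ChunkedUtf, writeLongUTF, readLongUTF and roundTrip are hypothetical names, and this is neither the StringType.java code nor the NetBeans change. It relies on the fact that writeUTF encodes each char as at most 3 bytes, so 21,845 chars always fit in one call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not the StringType.java or NetBeans code.
final class ChunkedUtf {
    // writeUTF encodes each char as 1 to 3 bytes and rejects more than 65535
    // encoded bytes, so 65535 / 3 = 21845 chars always fit in a single call.
    static final int MAX_CHARS_PER_CHUNK = 65535 / 3;

    // Writes a chunk count, then each chunk with its own writeUTF call.
    static void writeLongUTF(DataOutputStream out, String s) throws IOException {
        int chunks = (s.length() + MAX_CHARS_PER_CHUNK - 1) / MAX_CHARS_PER_CHUNK;
        out.writeInt(chunks);
        for (int i = 0; i < chunks; i++) {
            int start = i * MAX_CHARS_PER_CHUNK;
            int end = Math.min(s.length(), start + MAX_CHARS_PER_CHUNK);
            out.writeUTF(s.substring(start, end));
        }
    }

    // Reads the chunk count and concatenates the chunks back into one String.
    static String readLongUTF(DataInputStream in) throws IOException {
        int chunks = in.readInt();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chunks; i++) {
            sb.append(in.readUTF());
        }
        return sb.toString();
    }

    // Convenience for trying it out: encode to a byte array and decode again.
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                writeLongUTF(out, s);
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readLongUTF(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char[] chars = new char[200_000];
        java.util.Arrays.fill(chars, 'x');
        String page = new String(chars);
        // A single writeUTF call would reject this string; the chunked form round-trips.
        System.out.println(roundTrip(page).equals(page));
    }
}
```

Note that the chunk-count prefix used here changes the stored format, so a sketch like this would not read values written by a plain writeUTF call.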

I went looking online for discussion of this problem. Here's how netbeans.org solved it: http://hg.netbeans.org/main/rev/6d07994bc971

Locally, I have tried this same fix on StringType.java, and it seems to work fine, but then I found out that blob columns in MySQL are also limited to 65,535 bytes (2^16 - 1)! The combined storage for all the properties on a node must fit within this limit. So I modified the MySQL DDL to use mediumblob (up to 16 MB). This limitation doesn't surface on Oracle, where a blob can be up to 8 terabytes (wow).

The question for this list is whether we should take the netbeans approach and allow Strings over 64K bytes in the database, or somehow marshal/unmarshal these larger values to the filesystem?

In NYU's case, the properties which are this large are always sakai:pagecontent, which stores arbitrary HTML for pages. It's easy to imagine 64K byte and larger pages.

thanks,
Zach


 
From: Ian Boston <i...@tfd.co.uk>
Date: Tue, 31 May 2011 10:54:56 +0100
Local: Tues, May 31 2011 5:54 am
Subject: Re: [sakai-nakamura] Storage limits of sparsemapcontent
Zach,
The intention was that the properties of a Content Item would never be
greater than 64K, since that would mean streaming significant amounts
of data in and out of Java objects. If Content Items are becoming
greater than 64K, then we should address that by using file bodies
which stream correctly rather than allowing unlimited property sizes.

The Sparse ContentManagerImpl is not sophisticated enough to allow
arbitrary property sizes up to TB in size without any overhead. That
was a positive decision, made to avoid lots of complexity. I still
think that was the right decision.

Why are you getting more than 64K in a Content Item's properties?
That's a *big* object to be cached in memory; if there were millions
of them it would have a big impact on memory usage.
Ian

On 29 May 2011 20:33, Zach A. Thomas <z...@aeroplanesoftware.com> wrote:


 
From: Zach Thomas <zach.tho...@gmail.com>
Date: Tue, 31 May 2011 08:51:02 -0700 (PDT)
Local: Tues, May 31 2011 11:51 am
Subject: Re: Storage limits of sparsemapcontent
It's sakai:pagecontent, which contains the HTML for any given group
page. They can get quite large.

Zach

On May 31, 4:54 am, Ian Boston <i...@tfd.co.uk> wrote:


 
From: Ian Boston <i...@tfd.co.uk>
Date: Tue, 31 May 2011 17:03:20 +0100
Local: Tues, May 31 2011 12:03 pm
Subject: Re: [sakai-nakamura] Re: Storage limits of sparsemapcontent
Over 64K, they really should be a file.
Under 64K, they should be a property.

64K is a very large HTML page; I have a feeling you can fit Hamlet
into that, provided you don't go wild on markup.

Ian

On 31 May 2011 16:51, Zach Thomas <zach.tho...@gmail.com> wrote:


 
From: "D. Stuart Freeman" <stuart.free...@et.gatech.edu>
Date: Tue, 31 May 2011 12:06:14 -0400
Local: Tues, May 31 2011 12:06 pm
Subject: Re: [sakai-nakamura] Re: Storage limits of sparsemapcontent

On Tue, May 31, 2011 at 05:03:20PM +0100, Ian Boston wrote:
> Over 64K, they really should be a file.
> Under 64K, they should be a property.

> 64K is a very large HTML page; I have a feeling you can fit Hamlet
> into that, provided you don't go wild on markup.

I had to check: http://www.gutenberg.org/ebooks/1524
;)

--
D. Stuart Freeman
Georgia Institute of Technology

From: Chris Tweney <ch...@media.berkeley.edu>
Date: Tue, 31 May 2011 09:17:24 -0700
Local: Tues, May 31 2011 12:17 pm
Subject: Re: [sakai-nakamura] Re: Storage limits of sparsemapcontent
IMHO the ContentManager should be the one to decide whether it should
store something in a file or a property. If you put that logic into the
calling code, then the caller needs to know a lot about underlying
storage mechanisms, and we'll have duplicated size checks scattered all
over the app.
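
As a rough illustration of what centralizing that decision could look like, here is a minimal sketch in which the storage layer, not the caller, picks the representation. Everything in it is an assumption for illustration: SizeRoutingStore and its methods are hypothetical names, the two in-memory maps merely stand in for the property row and the streamed file body, and the 65,535-byte threshold is taken from the limit discussed in this thread. It is not how ContentManagerImpl works.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: callers use put/get and never see the size check.
final class SizeRoutingStore {
    // Assumed threshold, matching the 64K limit discussed in the thread.
    static final int MAX_INLINE_BYTES = 65535;

    // Stand-ins for the real backends (property storage and file bodies).
    private final Map<String, String> inlineProperties = new HashMap<>();
    private final Map<String, byte[]> fileBodies = new HashMap<>();

    // Stores the value inline if its UTF-8 form fits, otherwise as a body.
    void put(String name, String value) {
        byte[] encoded = value.getBytes(StandardCharsets.UTF_8);
        if (encoded.length <= MAX_INLINE_BYTES) {
            fileBodies.remove(name);
            inlineProperties.put(name, value);
        } else {
            inlineProperties.remove(name);
            fileBodies.put(name, encoded);
        }
    }

    // Returns the value regardless of where it was stored, or null if absent.
    String get(String name) {
        String inline = inlineProperties.get(name);
        if (inline != null) {
            return inline;
        }
        byte[] body = fileBodies.get(name);
        return body == null ? null : new String(body, StandardCharsets.UTF_8);
    }

    // Exposed only so the routing can be observed in this sketch.
    boolean isStoredAsFile(String name) {
        return fileBodies.containsKey(name);
    }
}
```

The sketch deliberately ignores the hard parts raised elsewhere in this thread, such as streaming large bodies instead of holding them in memory, and the combined size of all properties on a node.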

-chris

On 5/31/11 9:06 AM, D. Stuart Freeman wrote:


 
From: Ian Boston <i...@tfd.co.uk>
Date: Tue, 31 May 2011 17:30:36 +0100
Local: Tues, May 31 2011 12:30 pm
Subject: Re: [sakai-nakamura] Re: Storage limits of sparsemapcontent
That would be great; however, doing so would make the driver code
horribly complex, which is why the restriction is there. If you have a
look in the guts of Jackrabbit you get an idea just how expensive this
can be. I have to assume that the Jackrabbit team really do know what
they are doing, and have found the most elegant solution in this area.
They put it right at the bottom of their stack in the Bundle
Persistence manager that intelligently blocks up properties. Earlier
drivers in Jackrabbit imposed a similar 64K limit. One other thing to
note is that IIRC Jackrabbit used its schema to help it make those
decisions.

I don't think we have the resources to do this at the lower levels and
make it work... it's quite a large rewrite of the insert and get
methods in the drivers.

Ian

On 31 May 2011 17:17, Chris Tweney <ch...@media.berkeley.edu> wrote:


 
From: Chris Tweney <ch...@media.berkeley.edu>
Date: Tue, 31 May 2011 10:19:07 -0700
Local: Tues, May 31 2011 1:19 pm
Subject: Re: [sakai-nakamura] Re: Storage limits of sparsemapcontent
Call me crazy here, but I think it's better to have that expensive,
complicated logic centralized in one low-level place than to have it
duplicated, with various levels of skill and correctness, across several
dozen different client components. If we don't do it in the storage
engine, then we're going to do it over and over again at the application
level. Or, we won't do it, and we'll have a bunch of bug reports that
come in from the real world when properties get above 64K.

64K is actually quite small for a real-world web page. Consider that
many users will create pages by pasting in from MS Word, where just a
couple of text pages can easily reach that size and larger.

-chris

On 5/31/11 9:30 AM, Ian Boston wrote:


 
From: Alan Marks <alanma...@sakaifoundation.org>
Date: Tue, 31 May 2011 13:19:27 -0700
Local: Tues, May 31 2011 4:19 pm
Subject: Re: [sakai-nakamura] Re: Storage limits of sparsemapcontent

A brief aside from the implementation details:

There is no question that real-world use will run into this limit and that
long-term we need to find a way to let users save larger content. I created
a 14 page Word doc, then copied the text and pasted it into TinyMCE,
resulting in a 500 error. Thirteen pages worked. This is probably a pretty
common scenario.

That said, you could make a case that it would be supporting bad design to
allow such very long pages, but I could be accused of rationalizing.

At any rate, because we're past feature-freeze and in ship-mode, the leads
talked about this today and decided it would be too large and
destabilizing to fix now. We're going to provide better messaging to the
user, so that they can know when they've hit this limit. Sometimes you have
to make tradeoffs to ship. This is one of those times. I've created the
following Jiras:

https://jira.sakaiproject.org/browse/KERN-1919
https://jira.sakaiproject.org/browse/SAKIII-3162

https://jira.sakaiproject.org/browse/KERN-1920

On Tue, May 31, 2011 at 10:19 AM, Chris Tweney <ch...@media.berkeley.edu> wrote:

--

Alan Marks
Sakai OAE Project Director
skype: skramnala


 
From: Ian Boston <i...@tfd.co.uk>
Date: Wed, 1 Jun 2011 10:07:28 +0100
Local: Wed, Jun 1 2011 5:07 am
Subject: Re: [sakai-nakamura] Re: Storage limits of sparsemapcontent
After the leads meeting I thought about this, and I think it may be
possible to create a new data type that handles large values:
LongString.

This will have to be off by default, since it may have a bad impact all
over the place.
On write, if a String is over a limit it will be written as a
LongString, which will be a reference to a file on disk. Once that
happens, it will be ignored for all sparse indexing, although it might
still be OK for Solr indexing.
When it's read it will come out as a LongString and, provided it's not
referenced anywhere in the Nakamura code base, it will make it all the
way out to JSON.
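
A minimal sketch of the write and read path described above, assuming a hypothetical LongStringRef class (no patch existed at the time of this message, so none of these names come from the Nakamura code base). Values over the limit are spilled to a temporary file and replaced by a reference; smaller values stay plain Strings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the proposed LongString: a reference to a file on disk.
final class LongStringRef {
    private final Path location;

    private LongStringRef(Path location) {
        this.location = location;
    }

    // On write: keep short values as Strings, spill long ones to a file.
    // The temp-file location is an assumption made for this sketch only.
    static Object forStorage(String value, int maxInlineBytes) {
        byte[] encoded = value.getBytes(StandardCharsets.UTF_8);
        if (encoded.length <= maxInlineBytes) {
            return value;
        }
        try {
            Path file = Files.createTempFile("longstring", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, encoded);
            return new LongStringRef(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // On read: the content is loaded only when something asks for the text,
    // for example when the value is serialized on its way out to JSON.
    @Override
    public String toString() {
        try {
            return new String(Files.readAllBytes(location), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, code that only calls toString() on a property value keeps working, while code that assumes the value is a String does not.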

If it is referenced anywhere in the Nakamura code base it will cause a
ClassCastException (since a LongString can't be cast to a String and
String is final). That will randomly break random things, and it's quite
likely that those breakages will be masked by other error handling,
which is why, if I can get this to work at all, it will be off by
default, turned on at your peril.

Also, once a big string is in the DB, the only way to convert it back
to a String will be to delete the property and re-create it. I haven't
tried to write this patch yet and I may have missed something in my
thought process that makes it impossible. The ClassCastException is the
real blocker, caused in part by abandoning the original design that
used coercion of data types rather than direct class casts.
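
To make the cast-versus-coercion point concrete, here is a small sketch. CoercionDemo, LongValue, byCast and byCoercion are hypothetical names invented for this illustration; LongValue simply wraps its text in memory, whereas the proposal above would hold a file reference.

```java
import java.util.HashMap;
import java.util.Map;

final class CoercionDemo {
    // Hypothetical stand-in for LongString; it is not a subclass of String
    // (String is final), so it can never be cast to one.
    static final class LongValue {
        private final String content;

        LongValue(String content) {
            this.content = content;
        }

        @Override
        public String toString() {
            return content;
        }
    }

    // Direct class cast: throws ClassCastException when the value is a LongValue.
    static String byCast(Map<String, Object> props, String name) {
        return (String) props.get(name);
    }

    // Coercion: converts whatever type is stored into a String (null stays null).
    static String byCoercion(Map<String, Object> props, String name) {
        Object value = props.get(name);
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("sakai:pagecontent", new LongValue("<p>a very long page</p>"));

        System.out.println(byCoercion(props, "sakai:pagecontent"));
        try {
            byCast(props, "sakai:pagecontent");
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Every call site written in the byCast style would need to change before a type like this could be turned on safely, which is the breakage risk described above.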

Ian

On 31 May 2011 21:19, Alan Marks <alanma...@sakaifoundation.org> wrote:
