Performance on large object tables [500K rows]

57 views
Skip to first unread message

Neville

unread,
May 25, 2008, 4:03:27 AM5/25/08
to Persevere
Just wondering about how Persevere [via JSONPath] handles sorting and
filtering on large database tables, MS SQL Server in particular.

Our home grown JSON server, built on Ruby on Rails, provides very fast
access to a 500,000 row table for a limited sort and filter set,
because Rails essentially generates SQL which can be tuned to use
indexes, foreign keys.

In particular I am keen to have the same type of performance [because
Persevere is generating similar SQL] for:

1) JSONPath filters and sorting on simple database column expressions
2) Set Pagination, ie via HTTP Range Headers

Item 1) should be ok if Persevere attempts to construct an SQL which
includes as much of the filter/where expression as possible, and as
much of the sort expression as possible. For complex filters, this
might mean performing a second filter pass on the SQL result set
within Persevere I think.

Item 2) is harder, as some SQL databases, SQL Server included, dont
support LIMIT and OFFSET which are needed for pagination by the
database server. The Rails SQL Server Adapter emulates LIMIT and
OFFSET via some SQL "gymnastics" [which I can detail if desired] and
pagination performance on a large dataset is very good.

I'm a little worried that Persevere might process JSONPath filters,
sorting and Range pagination locally, as opposed to pushing as much as
possibe to the datasource.

So, are there any known performance concerns / gotchas in these
specific areas?
Am I being needlessly paranoid?

Neville

unread,
May 25, 2008, 9:52:35 PM5/25/08
to Persevere
I decided to do some testing with Persevere 0.9.8 and a 300,000 row
Customers table which exposes only 3 columns - UniqueID, CustomerCode,
CustomerTitle.

Notes:
Client: Firefox 3 RC1
JRE: 1.6.0.5
OS: Windows XP SP3

test 1: fetch by id
url: http://localhost:8080/Customers/1
expected: 1 row
result: success

test 2: CustomerCode < 'AB'
url: http://localhost:8080/Customers/[?(@.CustomerCode<'AB')]
expected: around 20 rows
result: failure
java.lang.StackOverflowError
at org.json.JSONTokener.nextValue(JSONTokener.java:310)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
at org.json.JSONTokener.nextValue(JSONTokener.java:322)
at org.json.JSONArray.(JSONArray.java:119)
<snip>

test 3: HTTP Range items=0-10
url: http://localhost:8080/Customers/
expected: 11 rows
result: failure
java.lang.OutOfMemoryError: Java heap space
at
com.microsoft.sqlserver.jdbc.DDC.convertReaderToString(Unknown Source)
at
com.microsoft.sqlserver.jdbc.DDC.convertNonUnicodeNonPLPToObject(Unknown
Source)
at com.microsoft.sqlserver.jdbc.ServerDTVImpl.getValue(Unknown
Source)
at com.microsoft.sqlserver.jdbc.DTV.getValue(Unknown Source)
at com.microsoft.sqlserver.jdbc.Column.getValue(Unknown
Source)
at com.microsoft.sqlserver.jdbc.Column.getValue(Unknown
Source)
at
com.microsoft.sqlserver.jdbc.SQLServerResultSet.getString(Unknown
Source)
at
org.persvr.datasource.DatabaseTableDataSource.mapQuery(DatabaseTableDataSource.java:
33)
at
org.persvr.data.PersistableObject.mapPersistent(PersistableObject.java:
847)
at org.persvr.data.ObjectId.resolveTarget(ObjectId.java:193)
at org.persvr.data.Query.getTarget(Query.java:23)
at org.persvr.data.Query.getTarget(Query.java:1)
at org.persvr.remote.ClientConnection
$IndividualRequest.requestData(ClientConnection.java:260)
at
org.persvr.remote.PersevereFilter.doFilter(PersevereFilter.java:309)
at org.mortbay.jetty.servlet.ServletHandler
$CachedChain.doFilter(ServletHandler.java:1084)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:
360)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:
216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:
181)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:
726)
at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:
206)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:
114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:
152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:
505)
at org.mortbay.jetty.HttpConnection
$RequestHandler.headerComplete(HttpConnection.java:828)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:514)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:
211)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:
380)
at
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:
395)
at org.mortbay.thread.BoundedThreadPool
$PoolThread.run(BoundedThreadPool.java:450)

Kris Zyp

unread,
May 25, 2008, 9:58:50 PM5/25/08
to Persevere
> test 2: CustomerCode < 'AB'
> url: http://localhost:8080/Customers/[?(@.CustomerCode<'AB')]

You need to escape the '?' as it is supposed to be part of the path.
Properly encoded the URL should be:
http://localhost:8080/Customers/%5B%3F(%40.CustomerCode%3C'AB')%5D
However, I make this mistake all the time as well, so I think I am going to
add some code to look for query strings and intrepret it as the user
intended. The error message isn't really that helpful either, but it is
basically trying to parse "[" (because the query string is treated
separately). However, even with right query, I would bet you would encounter
the same problem as test #3.

> test 3: HTTP Range items=0-10
> url: http://localhost:8080/Customers/
> expected: 11 rows
> result: failure
> java.lang.OutOfMemoryError: Java heap space

We definitely need SQL pagination!
Kris

Neville

unread,
May 25, 2008, 10:50:46 PM5/25/08
to Persevere
> You need to escape the '?' as it is supposed to be part of the path.

doh!

I knew that ... just testing the group <grin> !

But it does raise in my mind the idea of a human friendly syntax
[which you alluded to in the "sorting" discussion].

Would it be possible for Persevere to support an alternative
"JSONPath" syntax such as

http://host:port/Resource/id?filter=expr&sort=expr

Anyhow, on with the test!

test 2: CustomerCode < 'AB'
url: http://localhost:8080/Customers/%5B%3F(%40.CustomerCode%3C'AB')%5D
expected: around 20 rows
result: failure
2008-05-26 12:40:19.295::WARN: EXCEPTION
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.newKeyIterator(Unknown Source)
at java.util.HashMap$KeySet.iterator(Unknown Source)
at java.util.HashSet.iterator(Unknown Source)
at sun.nio.ch.SelectorImpl.processDeregisterQueue(Unknown
Source)
at sun.nio.ch.WindowsSelectorImpl.doSelect(Unknown Source)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at org.mortbay.io.nio.SelectorManager
$SelectSet.doSelect(SelectorManager.java:406)
at
org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:166)
at
org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:
124)
at org.mortbay.jetty.AbstractConnector
$Acceptor.run(AbstractConnector.java:537)
at org.mortbay.thread.BoundedThreadPool
$PoolThread.run(BoundedThreadPool.java:450)

> We definitely need SQL pagination!

To do this we need a JSONPath to SQL abstraction, a default
implementation, and the ability to specify a custom implementation in
the \config\

Then I can help populate the SQL Server custom implementation.

Neville

unread,
May 25, 2008, 10:57:24 PM5/25/08
to Persevere
> However, even with right query, I would bet you would encounter
> the same problem as test #3.

Of course you were right ... however this test is easy to pass for a
small result set if Persevere constructs a SQL "Where clause".

Kris Zyp

unread,
May 25, 2008, 11:06:08 PM5/25/08
to persevere...@googlegroups.com
> But it does raise in my mind the idea of a human friendly syntax
> [which you alluded to in the "sorting" discussion].

Yeah, it really should all work without escaping. There are some limits, %
and # simply must be properly escaped.

> Would it be possible for Persevere to support an alternative
> "JSONPath" syntax such as
>
> http://host:port/Resource/id?filter=expr&sort=expr

Sure, although that doesn't help with escaping does it?

>> We definitely need SQL pagination!
>
> To do this we need a JSONPath to SQL abstraction, a default
> implementation, and the ability to specify a custom implementation in
> the \config\
>
> Then I can help populate the SQL Server custom implementation.

You are going to write the SQL Server custom implementation? Great, I will
do the other parts!

>> However, even with right query, I would bet you would encounter
>> the same problem as test #3.

> Of course you were right ... however this test is easy to pass for a


> small result set if Persevere constructs a SQL "Where clause".

Yes, that will be included in the JSONPath to SQL conversion.

Thanks,
Kris

>
>
> >
>

Kris Zyp

unread,
May 26, 2008, 12:52:59 PM5/26/08
to persevere...@googlegroups.com

> 1) JSONPath filters and sorting on simple database column expressions
> 2) Set Pagination, ie via HTTP Range Headers
>
> Item 1) should be ok if Persevere attempts to construct an SQL which
> includes as much of the filter/where expression as possible, and as
> much of the sort expression as possible. For complex filters, this
> might mean performing a second filter pass on the SQL result set
> within Persevere I think.

Currently Persevere does process JSONPath expression in this manner, passing
expressions to the data source where data sources can create SQL from simple
expressions when possile. However, the relation DB data source plugin does
not support creating SQL from JSONPath expressions yet, only the object DB
data source can do it right now. However, since all the hard work of parsing
JSONPath, sending it to the data source in coherent form, and creating SQL
(for the object db) is already done, it should be quite easy for me to get
it in the relational DB data source, probably could do it next week.

Obviously passing sort expressions through to the database is not supported
yet since we are currently discussing the syntax for such ;), but that is
certainly the intent of sort expressions. There is also definitely room for
improvement as far as translating more complex JSONPath expressions into SQL
(although there will probably always be room for further optimizations in
this area). The nice thing is this can be done without surface changes, it
purely internal optimizations by the datasource implementations.

> Item 2) is harder, as some SQL databases, SQL Server included, dont
> support LIMIT and OFFSET which are needed for pagination by the
> database server. The Rails SQL Server Adapter emulates LIMIT and
> OFFSET via some SQL "gymnastics" [which I can detail if desired] and
> pagination performance on a large dataset is very good.

End-to-end paging is definitely a goal of Persevere, and has been
architected with that in mind, but the SQL side is not in place yet. I would
definitely appreciate any information you can provide on the SQL
"gymnastics" involved in getting cross-browser paging (LIMIT/OFFSET)
working, so that can be properly implemented.
On the client side we have all the pieces in place, Dojo's grid supports
pagination, as does the RestStore that I built which then translates to HTTP
Range/Partial Content transfers, which is supported by Persevere. This is
probably the most important part, since the client server connection is the
biggest bottleneck, but it would definitely be great to get pagination all
the way through to the database.
Thanks,
Kris

Neville

unread,
May 26, 2008, 9:34:51 PM5/26/08
to Persevere
> I would definitely appreciate any information you can provide on the SQL
> "gymnastics" involved in getting cross-browser paging (LIMIT/OFFSET)
> working, so that can be properly implemented.

Unfortunately I cant help out with anything other than MS SQL Server :
(

However for SQL Server, in my testing, for Range=RangeStart -
RangeEnd, the most performant approach seems to be:

t1) select the top RangeEnd id's matching the Filter and Sort
t2) reverse sort and select the last N (RangeEnd-RangeStart+1) id's
via TOP
t3) select the rows matching the N ids

SELECT t3.* FROM <table> t3 WHERE t3.id IN (
SELECT TOP <RangeEnd - RangeStart + 1> t2.id FROM <table> t2 WHERE
t2.id in (
SELECT TOP <RangeEnd> t1.id from <table> t1 WHERE <where> ORDER BY
<order>)
ORDER BY <order reversed>
ORDER BY <order>

Neville

unread,
May 26, 2008, 11:17:49 PM5/26/08
to Persevere
So a simple of paging 10 a time through a people table, in lastname,
firstname order, might be:

page 1:

SELECT t3.* FROM people t3 WHERE t3.id IN (
SELECT TOP 10 t2.id FROM people t2 WHERE t2.id in (
SELECT TOP 10 t1.id FROM people t1 ORDER BY lastname ASC,
firstname ASC)
ORDER BY lastname DESC, firstname DESC
ORDER BY lastname ASC, firstname ASC

page 2:

SELECT t3.* FROM people t3 WHERE t3.id IN (
SELECT TOP 10 t2.id FROM people t2 WHERE t2.id in (
SELECT TOP 20 t1.id FROM people t1 ORDER BY lastname ASC,
firstname ASC)
ORDER BY lastname DESC, firstname DESC
ORDER BY lastname ASC, firstname ASC

page 3:

SELECT t3.* FROM people t3 WHERE t3.id IN (
SELECT TOP 10 t2.id FROM people t2 WHERE t2.id in (
SELECT TOP 30 t1.id FROM people t1 ORDER BY lastname ASC,
firstname ASC)
ORDER BY lastname DESC, firstname DESC
ORDER BY lastname ASC, firstname ASC

Kris Zyp

unread,
May 27, 2008, 10:46:43 AM5/27/08
to persevere...@googlegroups.com
Wow, this is quite tricky. For a first pass improvement, I am just going to
see how it works with just lazy loading from the result set. Then we can go
from there.
Kris
Reply all
Reply to author
Forward
0 new messages