Store JSON data into Datastore: your opinion ?

2,602 views
Skip to first unread message

davidgm

unread,
Nov 10, 2009, 11:23:21 AM11/10/09
to Google App Engine
Hi again,

I just figured out that using JSON to store my main data in datastore
could be a good idea.
I would like to share this view and have your opinion about this:

My app usually returns data to client in JSON format.
This data comes from different db classes, created almost as a
relational db (maybe a newbie mistake).
Instead of having those multiple records, fetching them and building
the aggregate structure and converting it to JSON... I just wonder if
it would be better to store the JSON string into a single db class.

Updates will need JSON parsing back and forth, but they are few
updates compared to read queries.
And db lock contention will be limited since only a few dozens of
clients are interested in one data chunk.

I think this trick can cut down CPU usage and datastore API calls.

But here comes another point: JSON string will be into memcache also.
This cache already reduces CPU and datastore calls. Maybe the
difference between a classic approach (close to relational db) and
this JSON string storage will be unnoticeable.

The intermediate solution is to store a complex aggregate into one db
class, but I didn't yet figure out how to do it. Precision: I use
Python.

So, my question is:

In this context, what do you think of using JSON strings to store data
into datastore?

Thanks for any advice

Brandon Thomson

unread,
Nov 10, 2009, 2:40:10 PM11/10/09
to Google App Engine
If you don't need the data indexed and your access patterns are
suitable like you say, there is no harm in storing data in JSON format
in a blob, i have done similar things in many places. But remember
there is no cjson on app engine so make sure to benchmark if you are
doing a lot of parsing. Google's protocol buffer is much faster than
the json parsing.

Paul Kinlan

unread,
Nov 10, 2009, 3:14:02 PM11/10/09
to google-a...@googlegroups.com
Hi,

I am doing something similar at the moment on http://www.ahoyo.com.  We parse feeds and aggregate them into a canonical JSON form that can be read directly by our client applications.  Pre-aggregating the data-feed as soon as we poll it or receive a pubsubhubbub notification rather than compute it when the client requests the data allows us to have a very speedy http handler (it is important because this is the touch point for our users).  We aren't memcaching the data at the moment, but it is very simple to add in and should save us a lot of datastore time on popular client applications.

There is very little processing effort required to give the data to our clients so the cost should be predictable per user of our system, if we didn't precompute the data the performance of the client applications is datastore query, quantity of data and sort dependent (and for popular apps it would end up costing us a lot of money).

There are downsides, none of this data that is json formatted is searchable, but if you can live with that your solution is pretty much what we do.

Paul

2009/11/10 davidgm <david.gu...@gmail.com>

Robert Kluin

unread,
Nov 10, 2009, 6:06:23 PM11/10/09
to google-a...@googlegroups.com
I am using a "mixed" approach.  When I receive data I process it (by validating inputs and update related entity's statistics), then when I compute aggregates I create and store a JSON object containing individual records with them.

Compared to individual entities, I have seen only a negligible decrease in write performance (a few extra CPU cycles used and no change in API time).  I see a huge increase on read performance since 95% of the time I can directly use the entities with the JSON data.  I only use the individual records for "advanced" searching and an infrequent report.

Even when needing to display individual records I have found deserializing the JSON and writing fields out to be as quick (or faster!) than fetching the the individual records from the datastore.  I have about 5 individual records bundled into 1 JSON object.  I tested both methods performance using several hundred records of each format (identical data in the proper formats for each method).

Robert

davidgm

unread,
Nov 12, 2009, 10:38:05 AM11/12/09
to Google App Engine
Thank you all for your interesting feedbacks.

The "JSON in datastore" solution seems quite well suited to my needs.

I keep in mind this idea of mixed/combo approach in case searching
becomes difficult with JSON storage.
In my case I think that instead of having all data in both format,
JSON and datastore classes, it could be better to keep the main data
in JSON format and have a partial copy in a few individual records for
searching and reporting purpose.

Even if CPU usage is a little bit higher with longer JSON string to
convert, I find it useful to reduce the datastore classes' complexity
without extra redundancy. And I don't really need frequent search in
that data. So this is the way to go.

Thanks again,
David


On 11 nov, 00:06, Robert Kluin <robert.kl...@gmail.com> wrote:
> I am using a "mixed" approach.  When I receive data I process it (by
> validating inputs and update related entity's statistics), then when I
> compute aggregates I create and store a JSON object containing individual
> records with them.
>
> Compared to individual entities, I have seen only a negligible decrease in
> write performance (a few extra CPU cycles used and no change in API time).
> I see a huge increase on read performance since 95% of the time I can
> directly use the entities with the JSON data.  I only use the individual
> records for "advanced" searching and an infrequent report.
>
> Even when needing to display individual records I have found deserializing
> the JSON and writing fields out to be as quick (or faster!) than fetching
> the the individual records from the datastore.  I have about 5 individual
> records bundled into 1 JSON object.  I tested both methods performance using
> several hundred records of each format (identical data in the proper formats
> for each method).
>
> Robert
>
>
>
> On Tue, Nov 10, 2009 at 3:14 PM, Paul Kinlan <paul.kin...@gmail.com> wrote:
> > Hi,
>
> > I am doing something similar at the moment onhttp://www.ahoyo.com.  We
> > parse feeds and aggregate them into a canonical JSON form that can be read
> > directly by our client applications.  Pre-aggregating the data-feed as soon
> > as we poll it or receive a pubsubhubbub notification rather than compute it
> > when the client requests the data allows us to have a very speedy http
> > handler (it is important because this is the touch point for our users).  We
> > aren't memcaching the data at the moment, but it is very simple to add in
> > and should save us a lot of datastore time on popular client applications.
>
> > There is very little processing effort required to give the data to our
> > clients so the cost should be predictable per user of our system, if we
> > didn't precompute the data the performance of the client applications is
> > datastore query, quantity of data and sort dependent (and for popular apps
> > it would end up costing us a lot of money).
>
> > There are downsides, none of this data that is json formatted is
> > searchable, but if you can live with that your solution is pretty much what
> > we do.
>
> > Paul
>
> > 2009/11/10 davidgm <david.guyonmar...@gmail.com>

Siva P Thumma

unread,
Nov 13, 2009, 1:17:28 AM11/13/09
to Google App Engine
This idea would be the perfect for app-engine like datastores. ( In
the cloud. )
Because everything is a string ofcourse over the wire. ( there would
be no registry-bits-and-bytes over the wire ).
In fact, I doubt bigtable is kind of an implementation of this thing
in its core.

Any way, I "am" looking forward to see a perfect JSON datastore in the
cloud.
I hope it should take more time for the people to understand this
concept. Because I love oracle and fear that it may drown by this
massive and SIMPLE data notation architecture.
Reply all
Reply to author
Forward
0 new messages