Newbie Question - Support for Google Gears

harshal

Apr 1, 2009, 10:09:39 AM
to SIMILE Widgets
Hi,

Is it possible to increase scalability of Exhibit by using Google
Gears (or similar)?

If I understand Google Gears and Exhibit well enough, I think they can
fit together, at least to some extent, so that more items can be
processed this way. Most Exhibit users have a 'large-scale'
requirement to handle 10K-50K items.

I would appreciate any view/comment on this.

Thanks for your time and effort.

With regards,
./harshal

David Huynh

Apr 2, 2009, 2:12:45 AM
to simile-...@googlegroups.com
Harshal,

David Karger and his students have been pondering that option, I
believe. We just need someone to roll up their sleeves and try it out.
Do note that, of course, those 10K-50K items still need to make their
way over HTTP each time.

David

Adam Marcus

Apr 2, 2009, 7:43:43 AM
to simile-...@googlegroups.com
Harshal,

As David said, while working on Datapress, Edward Benson and I have
been thinking about this problem. We're working on building an Apache
module/Python script that will turn datasets into SQLite databases on
the server side, so that if a dataset is too large to send across the
network, all queries can hit the server. The benefit of using SQLite
is that it's also what is used by Gears and is available in
Firefox (all HTML5-compliant browsers will have it soon), so we can
also opt to send the dataset to the client for better client-side data
processing.
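
As a rough illustration of the "turn datasets into SQLite databases" step, here is a minimal sketch. It is not the script described above: the single `triples` table, the `load_items` helper, and the sample data are all assumptions made for the example. It only relies on Exhibit's JSON layout (a top-level `items` array whose entries carry a `label`) and Python's standard `sqlite3` module.

```python
import json
import sqlite3


def load_items(conn, exhibit_json):
    """Store every (item, property, value) triple from an Exhibit-style JSON document.

    The one-table layout is an assumption for this sketch, not a known schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples (item TEXT, property TEXT, value TEXT)"
    )
    rows = []
    for item in json.loads(exhibit_json)["items"]:
        label = item["label"]
        for prop, value in item.items():
            # Multi-valued properties arrive as JSON arrays; store one row per value.
            values = value if isinstance(value, list) else [value]
            rows.extend((label, prop, str(v)) for v in values)
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Hypothetical sample data, used only to exercise the function.
sample = '{"items": [{"label": "A", "type": "Paper", "tags": ["x", "y"]}]}'
conn = sqlite3.connect(":memory:")
count = load_items(conn, sample)
```

A generic triple table avoids having to know the property names in advance, at the cost of slower queries than a hand-designed schema would give.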

The step after that will be to allow exhibit to speak with this
data-aware webserver to decide whether to run the query remotely or
locally. David Huynh was our inspiration for this idea, since he
created a prototype of it before heading west. Hang in there, and
we'll give more information as it comes out!
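
The remote-or-local decision could be as simple as the sketch below. The function name, the `max_local_items` threshold, and its default value are hypothetical; nothing in this thread specifies how the real decision would be made.

```python
def choose_execution(item_count, client_has_sqlite, max_local_items=10_000):
    """Pick where a query should run.

    Returns "local" when the client has an embedded database and the dataset
    is small enough to ship over the network, otherwise "remote".
    The threshold is an arbitrary placeholder, not a measured limit.
    """
    if client_has_sqlite and item_count <= max_local_items:
        return "local"
    return "remote"


small = choose_execution(item_count=2_000, client_has_sqlite=True)
large = choose_execution(item_count=50_000, client_has_sqlite=True)
no_db = choose_execution(item_count=2_000, client_has_sqlite=False)
```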

-Adam

harshal

Apr 3, 2009, 2:37:16 AM
to SIMILE Widgets
Thanks David and Adam,

It would be fantastic to have these features in Exhibit.
Just can't wait for them.

We use Exhibit for data visualization, over the LAN most of the time.
So at least we are constrained.

Thanks for your time, efforts and interest.

With regards,

./harshal


Nicolas Chauvat

Apr 2, 2009, 3:36:45 PM
to simile-...@googlegroups.com
On Thu, Apr 02, 2009 at 07:43:43AM -0400, Adam Marcus wrote:
> module/python script that will turn datasets into sqllite databases on
> the server side, so that if a dataset is too large to send across the
> network, all queries can hit the server.

So you store information in a database on the server...

> The step after that will be to allow exhibit to speak with this
> data-aware webserver to decide whether to run the query remotely or

And provide a web service for your javascript client to fetch the
information it needs on the fly...

am I understanding correctly?

If that's what you are trying to do, I bet you will want your client
to send a query like "give me that part of the dataset", which means
you need some query language. Since you've probably heard about SQL
injection, you know sending SQL requests from the client to the server
is not your best option. Then SPARQL comes to mind...
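
To make the injection concern concrete, here is a small sketch using Python's standard `sqlite3` module. The `items` table, its columns, and the sample rows are made up for the example. The point is that the client only supplies values, which are bound through `?` placeholders, and never supplies SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, publication_date TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("alpha", "2008-02-15"), ("beta", "2008-09-01"), ("gamma", "2008-06-30")],
)


def items_between(conn, start, end):
    """Return item names published in [start, end], with both bounds bound as parameters."""
    cur = conn.execute(
        "SELECT name FROM items "
        "WHERE publication_date >= ? AND publication_date <= ? "
        "ORDER BY name",
        (start, end),
    )
    return [row[0] for row in cur]


normal = items_between(conn, "2008-01-01", "2008-06-30")

# A hostile value is compared as plain text; it cannot change the query's structure.
hostile = items_between(conn, "2008-01-01", "2008-06-30' OR '1'='1")
```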

... but wait, did I mention http://www.cubicweb.org ?

We usually run it on top of larger SQL databases like PostgreSQL for
better performance and larger datasets, but all the automated tests
are run with sqlite.

An example with Timeline would be sending a request like:

Any X WHERE X publication_date >= "2008/01/01",
X publication_date <= "2008/06/30"

and getting the result as JSON. Drag the focus in Timeline and you can
send a new request with a different date. MVC in Timeline, I think I
read it is almost there.

Or maybe with Exhibit:

Any X,T,D WHERE X name LIKE "a%", T tags X, X pub_date D

and getting the result as JSON, to display the X with names starting
with letter 'a' and using tags and dates as facets.

These examples are with the RQL query language. SPARQL will be
available within a couple months.

Licence is LGPL, book being written at http://www.cubicweb.org/doc/en/

If I understood correctly and that's the kind of thing you were
thinking of, please be our guest :)

--
Nicolas Chauvat

logilab.fr - services en informatique scientifique et gestion de connaissances

Edward Benson

Apr 3, 2009, 4:02:55 PM
to SIMILE Widgets
Nicolas,

Thanks for your reply. You bring up a good point: on the one hand, the
relational model is king when it comes to the tools actually being
used: odds are you've got an RDBMS on the back-end and SQLite embedded
in your browser. But on the other hand, there are many people who
don't necessarily want to think about, or interact with, their data as
a typical RDBMS would normally require.

So we're trying to address this in two ways:

- On the server-side, allow people to maintain their files as
  "ordinary" raw data files (CSV, JSON, MS Excel, etc.) but have a
  process that bakes these into an optimized database for you, so that
  you get queries across all the files and optimized access.

- On the client-side, create the ability to smartly know when to pull
  down and cache this data (to fuel an Exhibit, or whatever else you're
  running), but also to allow client-side developers to interact with
  the data in a comfortable way programmatically. For this we will
  definitely need to support queries, and at the moment we've been
  toying with the idea of trying to stay as neutral on the issue as
  possible, possibly passing along an extra parameter specifying the
  language the query is in. This comes with some pros and cons, of
  course.
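
One way to picture the "extra parameter specifying the language" idea is a small dispatcher keyed on that parameter. Everything here is hypothetical: the `HANDLERS` registry, the `run_query` function, and the sample table are invented for the sketch, and only an SQL handler is provided. Accepting raw SQL from untrusted clients would still raise the injection concern mentioned earlier in this thread, so a real service would need to restrict what it accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("alpha",), ("beta",)])


def run_sql(query, params):
    """Run a parameterized SQL query against the in-memory database."""
    return [row[0] for row in conn.execute(query, params)]


# Registry mapping a language name to the function that executes it.
# Other languages (e.g. SPARQL or RQL) would be registered here.
HANDLERS = {"sql": run_sql}


def run_query(query, lang, params=()):
    """Dispatch a query to the handler named by the `lang` parameter."""
    try:
        handler = HANDLERS[lang.lower()]
    except KeyError:
        raise ValueError(f"unsupported query language: {lang}") from None
    return handler(query, params)


names = run_query(
    "SELECT name FROM items WHERE name LIKE ? ORDER BY name", "SQL", ("a%",)
)
```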

Thanks for the link to Cubicweb -- we will definitely check it out!
Best of luck finishing the SPARQL additions.

Regards,
Ted

Nicolas Chauvat

Apr 3, 2009, 6:22:42 PM
to simile-...@googlegroups.com
Hello Edward,

On Fri, Apr 03, 2009 at 01:02:55PM -0700, Edward Benson wrote:

> - On the server-side, allow people to maintain their files as
> "ordinary" raw data files (CSV, JSON, MS Excel, etc) but have a
> process that is baking these into an optimized database for you so
> that you get query across all the files & optimized access.

Sounds to me like a makefile that updates a database. Am I old fashioned? :)
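
The makefile analogy boils down to a timestamp comparison: rebuild the database only when it is missing or older than one of its source files. A minimal sketch of that check follows; the `needs_rebuild` name and the temporary files are assumptions for the example.

```python
import os
import tempfile


def needs_rebuild(source_paths, db_path):
    """Return True when the database is missing or older than any source file."""
    if not os.path.exists(db_path):
        return True
    db_mtime = os.path.getmtime(db_path)
    return any(os.path.getmtime(path) > db_mtime for path in source_paths)


# Demonstration with temporary files and explicitly set modification times.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "items.csv")
database = os.path.join(workdir, "items.db")
with open(source, "w") as f:
    f.write("label,type\nA,Paper\n")

missing = needs_rebuild([source], database)  # no database file yet

open(database, "w").close()
os.utime(source, (1000, 1000))
os.utime(database, (2000, 2000))
fresh = needs_rebuild([source], database)  # database newer than source

os.utime(source, (3000, 3000))
stale = needs_rebuild([source], database)  # source edited after the build
```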

> - On the client-side, create the ability to smartly know when to

> ...


> we've been toying with the idea of trying to stay as neutral on the
> issue as possible, possibly passing along an extra parameter
> specifying the language the query is in. This comes with some pros
> and cons, of course.

I would say go for the new standard and get started with SPARQL
without losing time by trying to be too generic, but that's MHO.

Good luck with your development as it sounds like a very useful thing
for SIMILE Widgets users.

harshal

May 2, 2009, 4:48:11 AM
to SIMILE Widgets
Hi,

Any updates or further thoughts on this?

Thanks for your time and interest.

./harshal

David Huynh

May 4, 2009, 2:12:37 PM
to simile-...@googlegroups.com, Adam Marcus, Edward Benson
No update on my part... But CC'ing Adam and Ted in case they have made
progress.

David