Internals and Embedding


Adam Retter

Jul 10, 2011, 2:50:35 PM
to terrastore-discussions
I would be interested to know if it is possible to embed a Terrastore
server into a Java application.

I want to build something atop Terrastore, but I would rather not take
the hit of having to contact a server via HTTP, if I can embed that
server into my application and talk to it directly via an internal API
(as it's all in the same JVM instance)? Also, in this manner, when my
application starts I would first start the Terrastore server, and when
my application stops I would first stop the Terrastore server. My
application is a server itself, so I don't see it stopping very often!

Also, how tightly coupled to JSON is Terrastore internally?
I guess that the JSON layer is higher up than the storage layer, and
that it extracts the JSON values from the JSON documents internally
and stores them via some sort of internal API, which then serializes
them to some sort of byte representation?
What I wish to build is document based, but not JSON based, and I don't
want to have to pay a performance hit for going into and out of JSON,
hopefully there is an internal API I could use instead?

Sergio Bossa

Jul 11, 2011, 2:08:38 AM
to terrastore-...@googlegroups.com
Hi Adam!

> I want to build something atop Terrastore, but I would rather not take
> the hit of having to contact a server via HTTP, if I can embed that
> server into my application and talk to it directly via an internal API
> (as it's all in the same JVM instance)?

There is currently no way to embed the Terrastore server. It would
certainly be possible, but I would discourage you from doing that.
The reasons why I don't recommend it are mainly:
1) Scalability: tightly coupling Terrastore with your application
would prevent the two from scaling independently, i.e. it would force
you to add application nodes just for the sake of scaling Terrastore
(or vice-versa).
2) Performance: Terrastore servers hold their data in memory, hence
they put lots of pressure on the garbage collector; so, your
application would start suffering from the "unrelated" GC activity,
probably preventing your application's own "business" from performing
correctly.

BTW, any details about your application? Do you really *need* to embed
Terrastore into it?

> Also, how tightly coupled to JSON is Terrastore internally?
> I guess that the JSON layer is higher up than the storage layer, and
> that it extracts the JSON values from the JSON documents internally
> and stores them via some sort of internal API, which then serializes
> them to some sort of byte representation?

Yep, Terrastore stores and moves around simple byte streams, so
there's no JSON on the lower layers.
But many Terrastore features are based on JSON: supporting other
formats would require some refactoring and, obviously, some new
code and changes.

> What I wish to build is document based, but not JSON based, and I don't
> want to have to pay a performance hit for going into and out of JSON,
> hopefully there is an internal API I could use instead?

There's no such internal API, unfortunately: if you give me more
details about your application and your target formats, maybe I could
give you some ideas.

Cheers,

Sergio B.

--
Sergio Bossa
http://www.linkedin.com/in/sergiob

Adam Retter

Jul 11, 2011, 4:29:53 AM
to terrastore-...@googlegroups.com
>> I want to build something atop Terrastore, but I would rather not take
>> the hit of having to contact a server via HTTP, if I can embed that
>> server into my application and talk to it directly via an internal API
>> (as it's all in the same JVM instance)?
>
> There is currently no way to embed the Terrastore server. It would
> certainly be possible, but I would discourage you from doing that.
> The reasons why I don't recommend it are mainly:
> 1) Scalability: tightly coupling Terrastore with your application
> would prevent the two from scaling independently, i.e. it would force
> you to add application nodes just for the sake of scaling Terrastore
> (or vice-versa).
> 2) Performance: Terrastore servers hold their data in memory, hence
> they put lots of pressure on the garbage collector; so, your
> application would start suffering from the "unrelated" GC activity,
> probably preventing your application's own "business" from performing
> correctly.
>
> BTW, any details about your application? Do you really *need* to embed
> Terrastore into it?

Ah okay, so this sounds tricky. I envisaged that our application
would be a layer on top of Terrastore which our application clients
and users talk to, and it provides services with Terrastore forming
the new storage layer of our application. They never talk to
Terrastore directly. Our application already provides quite a few
services, such as XQuery, REST, WebDAV, XML-RPC, XForms, full-text
indexing and binary indexing. I really just wanted to try and replace
our bespoke storage layer with calls to a Terrastore API.

I hadn't considered the impact that Terrastore might have on GC in the
same JVM; that is interesting and we would certainly want to avoid
that. So it sounds like I would need two JVM instances. If only Java
had pipes!
Perhaps there is a more native-like remote API which I could create,
which would be an alternative to the JSON/REST API you currently have?

>> Also, how tightly coupled to JSON is Terrastore internally?
>> I guess that the JSON layer is higher up than the storage layer, and
>> that it extracts the JSON values from the JSON documents internally
>> and stores them via some sort of internal API, which then serializes
>> them to some sort of byte representation?
>
> Yep, Terrastore stores and moves around simple byte streams, so
> there's no JSON on the lower layers.

Great :-)

> But many Terrastore features are based on JSON: supporting other
> formats would require some refactoring and, obviously, some new
> code and changes.

I am not sure what these features are yet, but it may be that I don't need them?

>> What I wish to build is document based, but not JSON based, and I don't
>> want to have to pay a performance hit for going into and out of JSON,
>> hopefully there is an internal API I could use instead?
>
> There's no such internal API, unfortunately: if you give me more
> details about your application and your target formats, maybe I could
> give you some ideas.

Well, our target features are XML (and associated indexes) and binary
blobs initially; however, the binaries just live on the filesystem at
the moment. I want to keep in mind that in the future we may also
want something RDF-like or JSON-based (right now, though, those are not
a major concern!).

--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk

Sergio Bossa

Jul 11, 2011, 8:16:58 AM
to terrastore-...@googlegroups.com
On Mon, Jul 11, 2011 at 10:29 AM, Adam Retter
<adam....@googlemail.com> wrote:

> Ah okay, so this sounds tricky. I envisaged that our application
> would be a layer on top of Terrastore which our application clients
> and users talk to, and it provides services with Terrastore forming
> the new storage layer of our application. They never talk to
> Terrastore directly.

Yep, that's the idea.

> Perhaps there is a more native-like remote API which I could create,
> which would be an alternative to the JSON/REST API you currently have?

There were some discussions about adding a binary protocol, maybe based
on Google Protocol Buffers, but there is no actual implementation: any
contribution would obviously be greatly appreciated :)
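
For anyone picking this up, here is a minimal sketch of what a binary request frame could look like, in Java. This is not Protocol Buffers and not an existing Terrastore API: the class name BinaryPutFrame, the OP_PUT opcode and the field layout (one opcode byte, then length-prefixed bucket, key and payload) are all assumptions made for illustration. It only shows that a document can travel as raw bytes without going through JSON.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical frame layout, NOT a Terrastore protocol:
// [opcode:1][bucketLen:4][bucket][keyLen:4][key][payloadLen:4][payload]
final class BinaryPutFrame {

    static final byte OP_PUT = 1; // assumed opcode value

    // Simple holder for the decoded fields.
    static final class Decoded {
        final String bucket;
        final String key;
        final byte[] payload;

        Decoded(String bucket, String key, byte[] payload) {
            this.bucket = bucket;
            this.key = key;
            this.payload = payload;
        }
    }

    // Serializes a "put" request; the payload bytes are copied as-is.
    static byte[] encode(String bucket, String key, byte[] payload) {
        byte[] b = bucket.getBytes(StandardCharsets.UTF_8);
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + b.length + 4 + k.length + 4 + payload.length);
        buf.put(OP_PUT);
        putField(buf, b);
        putField(buf, k);
        putField(buf, payload);
        return buf.array();
    }

    // Parses a frame produced by encode(), rejecting unknown opcodes
    // and field lengths that run past the end of the frame.
    static Decoded decode(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        byte op = buf.get();
        if (op != OP_PUT) {
            throw new IllegalArgumentException("unexpected opcode: " + op);
        }
        String bucket = new String(readField(buf), StandardCharsets.UTF_8);
        String key = new String(readField(buf), StandardCharsets.UTF_8);
        byte[] payload = readField(buf);
        return new Decoded(bucket, key, payload);
    }

    private static void putField(ByteBuffer buf, byte[] field) {
        buf.putInt(field.length).put(field);
    }

    private static byte[] readField(ByteBuffer buf) {
        int len = buf.getInt();
        if (len < 0 || len > buf.remaining()) {
            throw new IllegalArgumentException("bad field length: " + len);
        }
        byte[] field = new byte[len];
        buf.get(field);
        return field;
    }

    public static void main(String[] args) {
        byte[] xml = "<doc/>".getBytes(StandardCharsets.UTF_8);
        byte[] frame = encode("books", "k1", xml);
        Decoded d = decode(frame);
        System.out.println(d.bucket + "/" + d.key + " carries " + d.payload.length + " bytes");
    }
}
```

A real protocol would also need responses, error codes and versioning, which is where something like Protocol Buffers would earn its keep.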

> I am not sure what these features are yet, but it may be that I don't need them?

Mostly, all features requiring document manipulation (predicates, updates...).

> Well, our target features are XML (and associated indexes) and binary
> blobs initially; however, the binaries just live on the filesystem at
> the moment. I want to keep in mind that in the future we may also
> want something RDF-like or JSON-based (right now, though, those are not
> a major concern!).

Got it: well, you'd really need either some kind of pluggable format,
or else to embed your format into JSON. The former would require deep
changes to the server core (but nothing impossible); the latter would
just require some work on the client side, with an (IMHO low) impact
on performance.
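
The client-side option could be sketched roughly as below, in Java. The envelope layout (the "format" and "content" field names) and the class name XmlJsonEnvelope are hypothetical, not a Terrastore convention. XML goes in as an escaped JSON string, and binary content goes in as Base64 via java.util.Base64.

```java
import java.util.Base64;

// Hypothetical client-side helper: wraps non-JSON content in a small
// JSON envelope so it can be stored through a JSON-only API.
final class XmlJsonEnvelope {

    // Escapes a string for use inside a double-quoted JSON string:
    // quotes, backslashes and control characters below U+0020.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Wraps an XML document as a JSON string value.
    static String wrapXml(String xml) {
        return "{\"format\":\"xml\",\"content\":\"" + escapeJson(xml) + "\"}";
    }

    // Wraps arbitrary bytes as Base64 text, which needs no JSON escaping.
    static String wrapBinary(byte[] data) {
        String encoded = Base64.getEncoder().encodeToString(data);
        return "{\"format\":\"base64\",\"content\":\"" + encoded + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(wrapXml("<doc id=\"1\">hello</doc>"));
        System.out.println(wrapBinary(new byte[] {1, 2, 3}));
    }
}
```

The cost is the escaping pass on write and the matching unescape on read, plus roughly a third more bytes for Base64 content. The server would also see the XML only as an opaque string, so JSON-based features such as predicates could not look inside it.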

Adam Retter

Jul 11, 2011, 9:33:53 AM
to terrastore-...@googlegroups.com
>> Perhaps there is a more native-like remote API which I could create,
>> which would be an alternative to the JSON/REST API you currently have?
>
> There were some discussions about adding a binary protocol, maybe based
> on Google Protocol Buffers, but there is no actual implementation: any
> contribution would obviously be greatly appreciated :)

If I decide or am persuaded that Terrastore is the right fit for our
application, then I would be more than happy to contribute to
Terrastore on a long-term basis.

>> I am not sure what these features are yet, but it may be that I don't need them?
>
> Mostly, all features requiring document manipulation (predicates, updates...).

Ah okay, so I would have to either (1) de-JSON these through
abstraction, or (2) re-implement them in my application layer.
Preferably (1), with a JSON implementation.

>> Well, our target features are XML (and associated indexes) and binary
>> blobs initially; however, the binaries just live on the filesystem at
>> the moment. I want to keep in mind that in the future we may also
>> want something RDF-like or JSON-based (right now, though, those are not
>> a major concern!).
>
> Got it: well, you'd really need either some kind of pluggable format,
> or else to embed your format into JSON. The former would require deep
> changes to the server core (but nothing impossible);

To modify the core, for a reasonable developer who is as yet
unfamiliar with the code-base, how much work do you think this might
be?

> the latter would
> just require some work on the client side, with an (IMHO low) impact
> on performance.

Sergio Bossa

Jul 12, 2011, 7:00:01 AM
to terrastore-...@googlegroups.com
On Mon, Jul 11, 2011 at 3:33 PM, Adam Retter <adam....@googlemail.com> wrote:

> If I decide or am persuaded that Terrastore is the right fit for our
> application, then I would be more than happy to contribute to
> Terrastore on a long-term basis.

Cool :)

> To modify the core, for a reasonable developer who is as yet
> unfamiliar with the code-base, how much work do you think this might
> be?

Maybe a few weeks, but I'm just pulling numbers off the top of my head.
