Eager property / Loading just one property?


Cres

Apr 19, 2012, 3:56:13 AM
to Neo4j
Hi,
I was wondering if there's any way to load just one property off a
node, without loading the rest (or alternatively, for that property to
be eager)?
I have relatively heavy nodes and sometimes when I need some light
data from a large number of nodes the performance is just not good
enough.

Thank you.

James Thornton

Apr 19, 2012, 4:16:02 AM
to ne...@googlegroups.com
With Gremlin, you simply return the property instead of the node:

gremlin> g.v(1).out                                
==>v[2]
==>v[4]
==>v[3]

...vs...

gremlin> g.v(1).out.name
==>vadas
==>josh
==>lop

- James

Mattias Persson

Apr 19, 2012, 5:03:58 AM
to ne...@googlegroups.com
That will load all the properties from the store, if they aren't already cached... and that's the problem you'd like to avoid, right?

No, you cannot do that, but it would be cool to have a small number of properties loaded together with each node: the ones you configure (or that are discovered) to be the most used properties, so that only they are loaded first. If you requested any property other than those, all the others would then be loaded.
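
A minimal sketch of that idea in Java, for illustration only. `HotPropertyNode`, `hotKeys`, and `fullLoads` are made-up names, the `store` map merely stands in for the on-disk property records, and nothing here is an existing Neo4j API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: this class does not exist in Neo4j.
class HotPropertyNode {
    private final Map<String, Object> store;   // stands in for the on-disk property records
    private final Set<String> hotKeys;         // the configured "most used" property keys
    private final Map<String, Object> loaded = new HashMap<>();
    private boolean fullyLoaded = false;
    int fullLoads = 0;                         // counts expensive full loads, for illustration

    HotPropertyNode(Map<String, Object> store, Set<String> hotKeys) {
        this.store = store;
        this.hotKeys = hotKeys;
        // Hot properties are loaded together with the node itself.
        for (String key : hotKeys) {
            if (store.containsKey(key)) {
                loaded.put(key, store.get(key));
            }
        }
    }

    Object getProperty(String key) {
        if (!fullyLoaded && !hotKeys.contains(key)) {
            // Asking for any other property loads all the remaining ones.
            loaded.putAll(store);
            fullyLoaded = true;
            fullLoads++;
        }
        return loaded.get(key);
    }
}
```

Reading only hot keys never triggers the full load; the first request for any other key does, once.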

2012/4/19 James Thornton <james.t...@gmail.com>



--
Mattias Persson, [mat...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com

Peter Neubauer

Apr 19, 2012, 5:07:46 AM
to ne...@googlegroups.com

Maybe catch it in a feature request on github!

Rickard Öberg

Apr 19, 2012, 5:43:30 AM
to ne...@googlegroups.com
On 4/19/12 17:07 , Peter Neubauer wrote:
> Maybe catch it in a feature request on github!
>
> On Apr 19, 2012 11:04 AM, "Mattias Persson" <mat...@neotechnology.com
> <mailto:mat...@neotechnology.com>> wrote:
>
> That will load all the properties from the store, if they aren't
> already cached... and that's the problem you'd like to avoid, right?
>
> No you cannot do that, but it would be cool to have a small number
> of properties loaded together with each node which you configure (or
> are discovered) to be the most used properties and only they will be
> loaded first. If you requested another property than those then all
> the others would be loaded.

The correct way to do this is to support use cases explicitly, i.e. you
should be able to set a name on a transaction, and that name will be
used to determine things like this, since "most used properties" is
entirely dependent on use case.

I have implemented this scheme in a previous product I worked on, and
it really really helps performance.

/Rickard

--
Rickard Öberg
Developer
Neo Technology
Twitter: @rickardoberg, Skype: rickardoberg

Mattias Persson

Apr 19, 2012, 5:52:47 AM
to ne...@googlegroups.com


2012/4/19 Rickard Öberg <rickar...@neotechnology.com>
> On 4/19/12 17:07 , Peter Neubauer wrote:
> > Maybe catch it in a feature request on github!
> >
> > On Apr 19, 2012 11:04 AM, "Mattias Persson" <mat...@neotechnology.com
> > <mailto:mattias@neotechnology.com>> wrote:
> >
> > > That will load all the properties from the store, if they aren't
> > > already cached... and that's the problem you'd like to avoid, right?
> > >
> > > No you cannot do that, but it would be cool to have a small number
> > > of properties loaded together with each node which you configure (or
> > > are discovered) to be the most used properties and only they will be
> > > loaded first. If you requested another property than those then all
> > > the others would be loaded.
>
> The correct way to do this is to support use cases explicitly, i.e. you
> should be able to set a name on a transaction, and that name will be
> used to determine things like this, since "most used properties" is
> entirely dependent on use case.
>
> I have implemented this scheme in a previous product I worked on, and
> it really really helps performance.

The mismatch between use cases and my idea is that the "most used property" would be persisted as such in the store, which is what would have made it fast.


Rickard Öberg

Apr 19, 2012, 6:09:07 AM
to ne...@googlegroups.com
On 4/19/12 17:52 , Mattias Persson wrote:
> The correct way to do this is to support use cases explicitly, i.e. you
> should be able to set a name on a transaction, and that name will be
> used to determine things like this, since "most used properties" is
> entirely dependent on use case.
>
> I have implemented this scheme in a previous product I worked on, and
> it really really helps performance.
>
> The mismatch between use cases and my idea is that the "most used
> property" would be persisted as such in the store, which is what would
> have made it fast.

If you can persist "most used property" in the store, that is great, iff
it is associated with a particular use case, rather than a general
statement. Per use case it is reasonably easy to define exactly what
properties are used, and that then becomes the perfect eager loading scheme.

Mattias Persson

Apr 19, 2012, 6:20:59 AM
to ne...@googlegroups.com


2012/4/19 Rickard Öberg <rickar...@neotechnology.com>
> On 4/19/12 17:52 , Mattias Persson wrote:
> > > The correct way to do this is to support use cases explicitly, i.e. you
> > > should be able to set a name on a transaction, and that name will be
> > > used to determine things like this, since "most used properties" is
> > > entirely dependent on use case.
> > >
> > > I have implemented this scheme in a previous product I worked on, and
> > > it really really helps performance.
> >
> > The mismatch between use cases and my idea is that the "most used
> > property" would be persisted as such in the store, which is what would
> > have made it fast.
>
> If you can persist "most used property" in the store, that is great, iff it is associated with a particular use case, rather than a general statement. Per use case it is reasonably easy to define exactly what properties are used, and that then becomes the perfect eager loading scheme.

Yup, and to support different use cases we'd need a separate store, other than e.g. the node store, to put these "fast" properties in.



Rickard Öberg

Apr 19, 2012, 8:47:51 PM
to ne...@googlegroups.com
On 4/19/12 18:20 , Mattias Persson wrote:
> On 4/19/12 17:52 , Mattias Persson wrote:
>
> The correct way to do this is to support use cases explicitly, i.e.
> you should be able to set a name on a transaction, and that name will
> be used to determine things like this, since "most used properties" is
> entirely dependent on use case.
>
> I have implemented this scheme in a previous product I worked on, and
> it really really helps performance.
>
> The mismatch between use cases and my idea is that the "most used
> property" would be persisted as such in the store, which is what would
> have made it fast.
>
> If you can persist "most used property" in the store, that is great,
> iff it is associated with a particular use case, rather than a
> general statement. Per use case it is reasonably easy to define
> exactly what properties are used, and that then becomes the perfect
> eager loading scheme.
>
> Yup, and to support different use cases we'd need a separate store,
> other than e.g. the node store, to put these "fast" properties in.

Just to make sure we mean the same thing, here's an example:
A node has two properties foo and bar. Use case A uses foo only, and use
case B uses bar only. If I do a transaction for A, then I want foo to be
eagerloaded, and if I do case B I want bar to be eagerloaded. I.e. all
properties on the node are "fast" properties, depending on use case. So,
considering this, what would you gain with "separate store"?
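
The foo/bar example above can be sketched roughly as follows. This only illustrates eager loading keyed by a use case name; `UseCaseProfiles`, `define`, and `eagerLoad` are hypothetical names, not Neo4j API, and a plain map stands in for the stored properties:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: maps a use case name to the property keys it needs.
class UseCaseProfiles {
    private final Map<String, Set<String>> eagerKeysByUseCase = new HashMap<>();

    // Declares which properties a named use case reads.
    void define(String useCase, String... keys) {
        Set<String> set = new HashSet<>();
        for (String key : keys) {
            set.add(key);
        }
        eagerKeysByUseCase.put(useCase, set);
    }

    // Returns only the properties declared for the use case; the rest stay unloaded.
    Map<String, Object> eagerLoad(String useCase, Map<String, Object> storedProperties) {
        Map<String, Object> result = new HashMap<>();
        Set<String> keys = eagerKeysByUseCase.getOrDefault(useCase, new HashSet<>());
        for (String key : keys) {
            if (storedProperties.containsKey(key)) {
                result.put(key, storedProperties.get(key));
            }
        }
        return result;
    }
}
```

With `define("A", "foo")` and `define("B", "bar")`, a transaction named A would eager-load only foo and one named B only bar, so neither property is "more important" in general.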

Mattias Persson

Apr 20, 2012, 9:31:01 AM
to ne...@googlegroups.com


2012/4/20 Rickard Öberg <rickar...@neotechnology.com>
As I see it, it's not about eager loading, it's about the store format, which makes it necessary to load all properties for an entity if any is requested. One solution to that is to fix the store format so that it can load one at a time (with a tiny index of some sort per entity), or, as was my initial thought, to select a small number of properties to live closer to the entity records on disk. I don't think we're exactly talking about the same thing here :)
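
To illustrate the first option (a tiny per-entity index so that one property can be read at a time), here is a rough sketch. The layout is an assumption made up for the example and is not Neo4j's actual store format; `IndexedPropertyRecord` is a hypothetical name:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: all values live in one byte array, plus a small key -> slot index.
class IndexedPropertyRecord {
    private final byte[] data;                                  // concatenated raw values
    private final Map<String, int[]> index = new HashMap<>();   // key -> {offset, length}

    IndexedPropertyRecord(Map<String, String> properties) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            byte[] bytes = entry.getValue().getBytes(StandardCharsets.UTF_8);
            index.put(entry.getKey(), new int[] {out.size(), bytes.length});
            out.write(bytes, 0, bytes.length);
        }
        data = out.toByteArray();
    }

    // Decodes only the requested value; all other values stay untouched as raw bytes.
    String read(String key) {
        int[] slot = index.get(key);
        if (slot == null) {
            return null;
        }
        return new String(data, slot[0], slot[1], StandardCharsets.UTF_8);
    }
}
```

The point of the index is that a light property can be read without decoding the heavy ones stored next to it.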


Pablo Pareja

Apr 20, 2012, 10:10:23 AM
to ne...@googlegroups.com
+1 For that fix in the store format so that it can load just one at a time.

Rickard Öberg

Apr 21, 2012, 12:52:26 AM
to ne...@googlegroups.com
On 4/20/12 21:31 , Mattias Persson wrote:
> As I see it, it's not about eager loading, it's about the store format
> which makes it necessary to load all properties for an entity if any is
> requested. One solution to that is to fix the store format so that it
> can load one at a time (with a tiny index of some sort per entity),

This is good.

> or
> as my initial thought, to select a small number of properties to live
> closer to the entity records on disk.

This is bad, because it assumes that there are properties that are "more
important". See previous email on use cases.

If it is possible to load the raw state from disk in full, but only
hydrate into Java objects on access, that could help. To achieve this
it might be a good idea to have a 2-level cache, where the first level
mirrors what is on disk, in raw format, and the second is what is
associated with a particular transaction, which is Java objects.
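
A rough sketch of such a 2-level cache, purely as an illustration. `RawCache` and `TxView` are hypothetical names, strings stand in for property values, and the `hydrations` counter exists only to make the behavior visible:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Level 1 (hypothetical): mirrors what is on disk, in raw format, shared by everyone.
class RawCache {
    private final Map<String, byte[]> raw = new HashMap<>();

    void put(long nodeId, String key, String value) {
        raw.put(nodeId + ":" + key, value.getBytes(StandardCharsets.UTF_8));
    }

    byte[] get(long nodeId, String key) {
        return raw.get(nodeId + ":" + key);
    }
}

// Level 2 (hypothetical): Java objects, associated with one particular transaction.
class TxView {
    private final RawCache rawCache;
    private final Map<String, String> hydrated = new HashMap<>();
    int hydrations = 0;   // how many values this transaction turned into objects

    TxView(RawCache rawCache) {
        this.rawCache = rawCache;
    }

    String getProperty(long nodeId, String key) {
        String cacheKey = nodeId + ":" + key;
        String value = hydrated.get(cacheKey);
        if (value == null) {
            byte[] bytes = rawCache.get(nodeId, key);
            if (bytes == null) {
                return null;
            }
            // The object is created only on first access within this transaction.
            value = new String(bytes, StandardCharsets.UTF_8);
            hydrated.put(cacheKey, value);
            hydrations++;
        }
        return value;
    }
}
```

Each transaction pays the object-creation cost only for the properties it actually touches, while the raw level is loaded once and shared.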

> I don't think we're exactly
> talking about the same thing here :)

Nope, hence the clarification.

Mattias Persson

Apr 24, 2012, 8:58:51 AM
to ne...@googlegroups.com


2012/4/21 Rickard Öberg <rickar...@neotechnology.com>
> On 4/20/12 21:31 , Mattias Persson wrote:
> > As I see it, it's not about eager loading, it's about the store format
> > which makes it necessary to load all properties for an entity if any is
> > requested. One solution to that is to fix the store format so that it
> > can load one at a time (with a tiny index of some sort per entity),
>
> This is good.
>
> > or
> > as my initial thought, to select a small number of properties to live
> > closer to the entity records on disk.
>
> This is bad, because it assumes that there are properties that are "more important". See previous email on use cases.

Yup, as it assumes only a single use case.

> If it is possible to load the raw state from disk in full, but only hydrate into Java objects on access, that could help. To achieve this it might be a good idea to have a 2-level cache, where the first level mirrors what is on disk, in raw format, and the second is what is associated with a particular transaction, which is Java objects.

That is what Neo4j has today. The low level cache is memory mapping of the raw store format, the high level cache is the Java objects.

> > I don't think we're exactly
> > talking about the same thing here :)
>
> Nope, hence the clarification.



Rickard Öberg

Apr 25, 2012, 12:15:11 AM
to ne...@googlegroups.com
On 4/24/12 20:58 , Mattias Persson wrote:
> If it is possible to load the raw state from disk in full, but only
> hydrate into Java objects on access, that could help. To achieve
> this it might be a good idea to have a 2-level cache, where the
> first level mirrors what is on disk, in raw format, and the second
> is what is associated with a particular transaction, which is Java
> objects.
>
> That is what Neo4j has today. The low level cache is memory mapping of
> the raw store format, the high level cache is the Java objects.

That's not entirely true, since the high level cache is not associated
with a *particular tx*, which is what I suggested. The way it is handled
today there is a common cache of hydrated objects, and then a list per
tx of changes for particular nodes. That's different.
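
As a rough illustration of the arrangement described above (one common cache of hydrated objects, plus a per-transaction list of changes), here is a sketch. It is based only on that description and is not Neo4j's actual implementation; `SharedNodeCache` and `OverlayTx` are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: a single cache of hydrated values, common to all transactions.
class SharedNodeCache {
    final Map<String, Object> hydrated = new HashMap<>();
}

// Hypothetical: a transaction keeps only its own uncommitted changes.
class OverlayTx {
    private final SharedNodeCache shared;
    private final Map<String, Object> changes = new HashMap<>();

    OverlayTx(SharedNodeCache shared) {
        this.shared = shared;
    }

    void setProperty(long nodeId, String key, Object value) {
        changes.put(nodeId + ":" + key, value);
    }

    // Reads see this transaction's changes first, then fall back to the common cache.
    Object getProperty(long nodeId, String key) {
        String cacheKey = nodeId + ":" + key;
        return changes.containsKey(cacheKey) ? changes.get(cacheKey) : shared.hydrated.get(cacheKey);
    }

    // Committing publishes the changes to the common cache.
    void commit() {
        shared.hydrated.putAll(changes);
        changes.clear();
    }
}
```

The difference from a per-transaction cache is that hydrated objects here are shared; only the writes are transaction-local.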

Mattias Persson

Apr 25, 2012, 4:02:52 AM
to ne...@googlegroups.com


2012/4/25 Rickard Öberg <rickar...@neotechnology.com>
> On 4/24/12 20:58 , Mattias Persson wrote:
> > > If it is possible to load the raw state from disk in full, but only
> > > hydrate into Java objects on access, that could help. To achieve
> > > this it might be a good idea to have a 2-level cache, where the
> > > first level mirrors what is on disk, in raw format, and the second
> > > is what is associated with a particular transaction, which is Java
> > > objects.
> >
> > That is what Neo4j has today. The low level cache is memory mapping of
> > the raw store format, the high level cache is the Java objects.
>
> That's not entirely true, since the high level cache is not associated with a *particular tx*, which is what I suggested. The way it is handled today there is a common cache of hydrated objects, and then a list per tx of changes for particular nodes. That's different.

Yeah, that is different. I'm wondering what benefits it would have though... with a cache for a particular transaction only, I mean.
