Darren, any chance you can profile this? Would be interesting to see what exactly is happening, or you could give us the dataset and a sample CURL query?
On Sun, Apr 8, 2012 at 5:24 PM, Darren Govoni <darreng5...@gmail.com> wrote:
> Hi Mike,
> Interestingly enough, I'm doing that already, but under this circumstance the exception occurs before the response even comes back, presumably because there's too much data for the heap on the server side (i.e. inside neo4j server). I did raise the maxmemory setting for neo4j, but am worried this is only a temporary fix and that the problem could keep reappearing as my data grows. I suppose I shouldn't be returning 1000 nodes anyway, so I might be able to solve this at query time.
> With the adjusted maxmemory setting, I'm running normally again. It defaulted to 64MB.
> On Sunday, April 8, 2012 7:37:16 AM UTC-4, Michael Hunger wrote:
>> You would extract the node properties into a map and then return just this collection of maps.
>> Michael
>> On 08.04.2012 at 13:31, Darren Govoni wrote:
>> In some cases, it looks like it can return 1000 nodes. I'm reusing vertices but not edges. So there can be many duplicate edges (for counting).
>> How does one return just node properties? That's all I really need, but I see a lot of other stuff in nodes like URLs.
>> Perhaps it's a question for James as well since I'm using bulbs neo4jserver graph.
>> On Sunday, April 8, 2012 2:34:13 AM UTC-4, Peter Neubauer wrote:
>>> How many nodes is your script returning? I guess it is the JSON serialization taking that memory. Returning smaller chunks of less data (e.g. node properties instead of full node representation) would be better...
>>>> Hi,
>>>> I am running neo4j community SNAPSHOT 1.7 on an 8GB RAM quad core intel i7 laptop.
>>>> My database has about 5k nodes and 10k relationships. Suddenly I am getting this exception.
>>>> Sorry it's not pretty printed, but it's what comes back. I have plenty of available memory.
On Sunday, April 8, 2012 6:31:42 AM UTC-5, project2501 wrote:
> In some cases, it looks like it can return 1000 nodes. I'm reusing vertices but not edges. So there can be many duplicate edges (for counting).
> How does one return just node properties? That's all I really need, but I see a lot of other stuff in nodes like URLs.
> Perhaps it's a question for James as well since I'm using bulbs neo4jserver graph.
Darren, you are returning all the paths emanating from a node -- you're not returning the nodes so returning the node properties is not relevant in this case.
To get an idea of the size of the data you are returning, return a count of the paths instead:
>> On Apr 7, 2012 2:38 PM, "Darren Govoni" <darreng5...@gmail.com> wrote:
>> Hi,
>> I am running neo4j community SNAPSHOT 1.7 on an 8GB RAM quad core intel i7 laptop.
>> My database has about 5k nodes and 10k relationships. Suddenly I am getting this exception.
>> Sorry it's not pretty printed, but it's what comes back. I have plenty of available memory.
I will. I bumped the maxmemory to 2GB and things are working, but when I have a chance I will run experiments with lower memory, because I hope not to get hit with this in a production setting. So it's on my TODO list.
On Sunday, April 8, 2012 11:46:21 AM UTC-4, Peter Neubauer wrote:
> Darren, any chance you can profile this? Would be interesting to see what exactly is happening, or you could give us the dataset and a sample CURL query?
Yes, looks like it. What is the reason for this query, and is there any way you can prune it down by e.g. including directions in the relationship traversals?
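To illustrate the effect of including directions, here is a minimal Python sketch (not Gremlin, and not the poster's data). It counts walks of a fixed number of hops from a start vertex, once following edges in both directions and once following only outgoing edges. The edge list and the helper names `build_adjacency` and `count_walks` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy edge list: (source, label, target).
EDGES = [
    ("index", "has", "doc1"),
    ("index", "has", "doc2"),
    ("doc1", "contains", "text"),
    ("doc2", "contains", "text"),
]


def build_adjacency(edges, directed):
    """Build neighbour lists; when not directed, every edge is usable both ways."""
    adj = defaultdict(list)
    for src, _label, dst in edges:
        adj[src].append(dst)
        if not directed:
            adj[dst].append(src)
    return dict(adj)


def count_walks(adj, start, depth):
    """Count walks of exactly `depth` hops starting at `start`."""
    frontier = {start: 1}  # vertex -> number of walks ending there
    for _ in range(depth):
        nxt = defaultdict(int)
        for vertex, ways in frontier.items():
            for neighbour in adj.get(vertex, ()):
                nxt[neighbour] += ways
        frontier = nxt
    return sum(frontier.values())


both_ways = count_walks(build_adjacency(EDGES, directed=False), "index", 3)
out_only = count_walks(build_adjacency(EDGES, directed=True), "index", 2)
print(both_ways, out_only)
```

In this toy graph, the undirected traversal can bounce back and forth between vertices, so the number of walks keeps growing with depth, while the outgoing-only traversal stops once it reaches a vertex with no outgoing edges.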
On Wed, Apr 11, 2012 at 10:53 PM, Darren Govoni <darreng5...@gmail.com> wrote:
> What I see happening here is a combinatorial explosion. As you can see, my data set is not that rich..... :(
I want to display a graph centered on a vertex and fanning out some number of levels, most likely 3. I was told to use paths for this and it seems to make sense.
My graph is composed of sentence triples like:
document contains text
index has documents
But for repeat occurrences, I reuse the same vertex so path finding will work. I don't know how to reuse an edge (is it even possible?), so if "document contains text" appears 20 times, then this adds a factorial of combinations to the path finding.
If I had 100 million nodes in my graph (which is supported by neo4j?), do these Gremlin queries scale? I'm thinking maybe not: if any vertex has a handful of unique edges, the combinations will be off the charts when you add even half a dozen vertices with their own edges.
Is there a best practice or known limit to neo4j's ability to find paths with more than a few dozen vertices?
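As a rough way to see how duplicated edges inflate the path count, the following Python sketch builds a two-hop chain (index, document, text) in which each hop is duplicated a given number of times, then counts the distinct edge sequences. The graph, the edge IDs, and the function names are assumptions for illustration only. In this toy model the growth is multiplicative (copies raised to the number of hops) rather than factorial.

```python
from collections import Counter


def chain_with_parallel_edges(copies):
    """Hypothetical graph: index -> document -> text, each hop duplicated `copies` times."""
    edges = []
    for i in range(copies):
        edges.append((f"has-{i}", "index", "document"))
        edges.append((f"contains-{i}", "document", "text"))
    return edges


def count_edge_paths(edges, start, depth):
    """Count directed edge sequences of exactly `depth` hops leaving `start`."""
    out = {}
    for _edge_id, src, dst in edges:
        out.setdefault(src, []).append(dst)  # one entry per parallel edge
    frontier = Counter({start: 1})
    for _ in range(depth):
        nxt = Counter()
        for vertex, ways in frontier.items():
            for dst in out.get(vertex, ()):
                nxt[dst] += ways
        frontier = nxt
    return sum(frontier.values())


for copies in (1, 5, 20):
    print(copies, count_edge_paths(chain_with_parallel_edges(copies), "index", 2))
```

With 20 copies per hop, the two-hop count is 20 * 20, even though there is only one distinct vertex sequence.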
Gremlin is a lazy language so you can next() results all day long. If you are trying to save all these results in memory, then you will run into problems.
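The difference between pulling results one at a time and saving them all can be sketched with a Python generator. This is only an analogy to Gremlin's lazy evaluation, not Gremlin itself, and the toy graph `ADJ` and the function `paths_from` are hypothetical.

```python
from itertools import islice


def paths_from(adj, start, max_depth):
    """Lazily yield every walk of 1..max_depth hops from `start`, depth-first."""
    stack = [(start,)]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            yield path  # handed to the caller one at a time
        if len(path) - 1 < max_depth:
            for neighbour in adj.get(path[-1], ()):
                stack.append(path + (neighbour,))


# Hypothetical toy graph: a triangle where every vertex links to the other two.
ADJ = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}

# Taking a few results touches only a small part of the search space.
first_three = list(islice(paths_from(ADJ, "a", 10), 3))

# Counting consumes every result but never holds them all at once.
total = sum(1 for _ in paths_from(ADJ, "a", 10))
print(first_three, total)
```

Calling `list(paths_from(ADJ, "a", 10))` instead would materialize every walk in memory, which is the situation the message above warns about.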
> On Wednesday, April 11, 2012 5:03:39 PM UTC-4, Peter Neubauer wrote:
>> Yes, looks like it. What is the reason for this query, and is there any way you can prune it down by e.g. including directions in the relationship traversals?
>> On Wed, Apr 11, 2012 at 10:53 PM, Darren Govoni <darreng5...@gmail.com> wrote:
>> > What I see happening here is a combinatorial explosion. As you can see, my data set is not that rich..... :(
>> > On Wednesday, April 11, 2012 4:50:53 PM UTC-4, Darren Govoni wrote:
>> >> On Sunday, April 8, 2012 12:58:18 PM UTC-4, James Thornton wrote:
>> >>> On Sunday, April 8, 2012 6:31:42 AM UTC-5, project2501 wrote:
>> >>>> In some cases, it looks like it can return 1000 nodes. I'm reusing vertices but not edges. So there can be many duplicate edges (for counting).
>> >>>> How does one return just node properties? That's all I really need, but I see a lot of other stuff in nodes like URLs.
>> >>>> Perhaps it's a question for James as well since I'm using bulbs neo4jserver graph.
>> >>> Darren, you are returning all the paths emanating from a node -- you're not returning the nodes so returning the node properties is not relevant in this case.
>> >>> To get an idea of the size of the data you are returning, return a count of the paths instead:
>> for it.loops<2, count is 288
>> for it.loops<3, count is 3278
>> for it.loops<4, count is 46,960
Yes, this Gremlin "paths" expression will return all possible path combinations emanating from all vertices named "document" so it will be quite large -- is this really what you want?
Please describe your context and end goal, and maybe we can help you craft a better solution.
Thanks James. Well, I guess I want all minimal paths emanating (both ins and outs) from a vertex. For example, let's say I have the following vertex-edge-vertex triples (directed).
where e1-e3 and e4-e6 have the same "name" property (but different IDs). So reducing the above list to:
v1-e1-v2
v2-e4-v3
or perhaps
v1-"buy"-v2
v2-"likes"-v3
such that any one of e1, e2, e3 would suffice because they have the same name (e.g. "buy"). I duplicate the edges because I'm not sure how to reuse one or if that concept exists, but also because I want to count the number of edges with "buy" for PageRank purposes. And I use properties on edges that trace back to other objects elsewhere.
The above example is only 2 edges away, but it can vary (e.g. it.loops<X).
I'm still wrestling with Gremlin syntax/semantics to understand how to achieve this. Maybe something with simplePath. Still trying. Slow but sure.
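One way to picture the reduction described above is to group parallel edges by (source, name, target) and keep a count per group, so the count stays available for ranking while path finding sees a single edge. The sketch below is plain Python, not Gremlin or Bulbs. The original triple list did not survive in this thread, so the `EDGES` data is an assumed reconstruction from the description of e1-e3 and e4-e6, and `collapse_by_name` is a hypothetical helper.

```python
from collections import Counter

# Assumed reconstruction: (source, edge_id, name, target).
EDGES = [
    ("v1", "e1", "buy", "v2"),
    ("v1", "e2", "buy", "v2"),
    ("v1", "e3", "buy", "v2"),
    ("v2", "e4", "likes", "v3"),
    ("v2", "e5", "likes", "v3"),
    ("v2", "e6", "likes", "v3"),
]


def collapse_by_name(edges):
    """Map each (source, name, target) to the number of parallel edges carrying it."""
    return Counter((src, name, dst) for src, _edge_id, name, dst in edges)


collapsed = collapse_by_name(EDGES)
for (src, name, dst), count in collapsed.items():
    print(f'{src}-"{name}"-{dst} (x{count})')
```

The same idea could be stored in the graph itself as a single edge with a count property, though whether that fits the edge properties that trace back to other objects is a design decision for the poster.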
That is odd. Here are some notes:
1. Use 'both' instead of bothE.bothV (and then loop(1))
2. Try without paths and simply do count() (just for testing)
That is such a tiny graph that I don't know why you are having a GC overhead limit exceeded. Are you doing this from the Gremlin REPL, a Groovy class, or from the Web Admin?
Oh, reading lower, I see you are doing some Bulbs-related stuff. Can you speak more to that, as James is the Bulbs guy?
On Wednesday, April 11, 2012 3:57:36 PM UTC-5, Marko Rodriguez wrote:
> Hi,
> That is odd. Here are some notes:
> 1. Use 'both' instead of bothE.bothV (and then loop(1))
> 2. Try without paths and simply do count() (just for testing)
> That is such a tiny graph that I don't know why you are having a GC overhead limit exceeded. Are you doing this from the Gremlin REPL, a Groovy class, or from the Web Admin?
> Oh reading lower, I see you are doing some Bulbs related stuff----can you speak more to that as James is the Bulbs guy.....
Marko, he's returning all possible paths over REST -- all 47,000 of them -- but he doesn't really want or need this, he just wants the shortest path.
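For contrast with enumerating every path, a shortest path can be found with a breadth-first search that visits each vertex at most once, so parallel edges do not multiply the work. This is a generic Python sketch under the assumption of an unweighted graph, not the Gremlin or Neo4j API. The adjacency data and the `shortest_path` function are hypothetical.

```python
from collections import deque


def shortest_path(adj, start, goal):
    """Return one shortest path from `start` to `goal` as a list, or None."""
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parents[vertex]
            return path[::-1]
        for neighbour in adj.get(vertex, ()):
            if neighbour not in parents:  # duplicates are skipped here
                parents[neighbour] = vertex
                queue.append(neighbour)
    return None


# Hypothetical graph with three parallel edges per hop, traversable both ways.
ADJ = {
    "v1": ["v2", "v2", "v2"],
    "v2": ["v1", "v1", "v1", "v3", "v3", "v3"],
    "v3": ["v2", "v2", "v2"],
}

print(shortest_path(ADJ, "v1", "v3"))
```

The result is a single vertex sequence regardless of how many duplicate edges connect each pair.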