Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
GeoPipeline hasNext() / next() functions take a long time for the first time
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  16 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Abhijeet Deshpande  
View profile  
 More options Oct 15 2012, 8:10 am
From: Abhijeet Deshpande <avdeshpa...@gmail.com>
Date: Mon, 15 Oct 2012 05:10:11 -0700 (PDT)
Local: Mon, Oct 15 2012 8:10 am
Subject: GeoPipeline hasNext() / next() functions take a long time for the first time

Hi
I am using neo4j spatial to find out locations near by a lat/long. The
results returned are sorted on OrthodromicDistance and are further
paginated using the range function (100 results per page). The code for
this is

GeoPipeline flowList =
(GeoPipeline)GeoPipeline.startNearestNeighborLatLonSearch(layer, loc,
dist).sort("OrthodromicDistance").copyDatabaseRecordProperties(keys).range( low,
high);

Now to iterate over this GeoPipline, I am using hasNext() / next()
functions but for some reason the first call for either of these functions
takes long time to execute.

When the application was run, first call to next() or hasNext() took
approximately 400 sec to execute, subsequent 99 calls took 0ms.

Can someone please point out where the mistake is and if there is a faster
way to find the nearest places. The layer has 10 million entries.

Regards
Abhijeet


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Neubauer  
View profile  
 More options Oct 15 2012, 9:05 am
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Mon, 15 Oct 2012 15:04:42 +0200
Local: Mon, Oct 15 2012 9:04 am
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time
Hi there,
the first run is probably with cold caches. In real world scenarios,
you are running with warm caches, so you should try to warm up the
database by doing a few searches before the real work, or maybe loop
through your interesting nodes or so?

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j 1.8 GA - http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html

On Mon, Oct 15, 2012 at 2:10 PM, Abhijeet Deshpande


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Abhijeet Deshpande  
View profile  
 More options Oct 19 2012, 8:13 am
From: Abhijeet Deshpande <avdeshpa...@gmail.com>
Date: Fri, 19 Oct 2012 05:13:45 -0700 (PDT)
Local: Fri, Oct 19 2012 8:13 am
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time

Hi Peter,
Thank you for the response.

Actually this is not related to the very first run of query where the cache
is empty.

The problem is that when ever I try to search places and request for 100
places per page in response, first call to Pipeline.hasNext() /
Pipeline.next() always takes about 400 sec. time and this happens for every
request.

Hope I am able to convey the problem that I am facing.

Regards
Abhijeet


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Craig Taverner  
View profile  
 More options Oct 19 2012, 9:34 am
From: Craig Taverner <cr...@amanzi.com>
Date: Fri, 19 Oct 2012 15:34:53 +0200
Local: Fri, Oct 19 2012 9:34 am
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time

I can suggest that since you have a sorting component in the pipeline, this
will cause the pipeline to internally cache the entire resultset on the
first hasNext or next call, sort it and return the first sorted result.
Subsequent calls will access the internal, sorted, cache. This is a
necessary consequence of the sorting algorithm, and is mostly unavoidable.

So avoid this, you would need to take away the sort. Is it possible to get
the same results you want without the sort? Or at least by moving the sort
into your own code, or perhaps after a filter or some other action that
reduces the resultset size?

On Fri, Oct 19, 2012 at 2:13 PM, Abhijeet Deshpande
<avdeshpa...@gmail.com>wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Neubauer  
View profile  
 More options Oct 21 2012, 7:59 am
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Sun, 21 Oct 2012 13:59:31 +0200
Local: Sun, Oct 21 2012 7:59 am
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time
Mmh,
I think the whole query is re-issued when you request the next page.
in order to avoid that, could you hold the result in cache on the
server side for the next request and page it on your own?

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j 1.8 GA - http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html

On Fri, Oct 19, 2012 at 2:13 PM, Abhijeet Deshpande


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Abhijeet Deshpande  
View profile  
 More options Oct 22 2012, 3:41 pm
From: Abhijeet Deshpande <avdeshpa...@gmail.com>
Date: Mon, 22 Oct 2012 12:41:02 -0700 (PDT)
Local: Mon, Oct 22 2012 3:41 pm
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time

Hi Craig, Peter

Thank you for the inputs.

Removing sorting helped quite a lot and I also reduced the result size from
100 to 10. Now the response times have come down to a few milliseconds but
the downside is that the results are not sorted upon the distance.

The objective is to get 10 places per page starting with the closest one.
To implement this I tried following GeoPipeline functions

1. startNearestNeighborSearch(layer, point, distance) and
2, startNearestNeighborLatLonSearch(layer, point, distance)

but none of them returned the results sorted on distance even if the
function names seem to suggest so. As per the function definition, no
additional sort filters are needed so please let me know if I am missing
something.

I also observed that the results of startNearestNeighborLatLonSearch are
better sorted than startNearestNeighborSearch and think it has got
something to do with the propertyFilter that these functions use. I would
like to know more on  "OrthodromicDistance" and "Distance" properties and
the difference between their respecitive property filters.

Regards
Abhijeet


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Abhijeet Deshpande  
View profile  
 More options Oct 25 2012, 6:10 am
From: Abhijeet Deshpande <avdeshpa...@gmail.com>
Date: Thu, 25 Oct 2012 03:10:49 -0700 (PDT)
Local: Thurs, Oct 25 2012 6:10 am
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time

Hi
As specified below, can someone please help me understand why
startNearestNeighborSearch / startNearestNeighborLatLonSearch functions
don't return the results sorted on the distance. I have read some
documentation which suggests that these functions return the results sorted
on distance starting with the nearest one.

Thanks
Abhijeet


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Neubauer  
View profile  
 More options Oct 25 2012, 7:23 am
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Thu, 25 Oct 2012 13:23:17 +0200
Local: Thurs, Oct 25 2012 7:23 am
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time
Abhijeet,
from the docs and implementation, there is no sorting going on here.
Instead, all of the returned points satisfy the bonuding box you are
requesting

/**
         * Extracts Layer items with a distance from the given point that is
less than or equal the given distance.
         *
     * @param layer with latitude, longitude coordinates
         * @param point
         * @param maxDistanceInKm
         * @return geoPipeline
         */
        public static GeoPipeline startNearestNeighborLatLonSearch(Layer
layer, Coordinate point, double maxDistanceInKm) {
                Envelope searchWindow =
OrthodromicDistance.suggestSearchWindow(point, maxDistanceInKm);
                GeoPipeline pipeline = start(layer, new SearchIntersectWindow(layer,
searchWindow))
                        .calculateOrthodromicDistance(point);

                if (layer.getGeometryType() == Constants.GTYPE_POINT) {
                        pipeline = pipeline.propertyFilter("OrthodromicDistance",
maxDistanceInKm, FilterPipe.Filter.LESS_THAN_EQUAL);
                }

                return pipeline;
        }

        /**
         * Calculates the distance between Layer items nearest to the given
point and the given point.
         * The search window created is based on Layer items density and it
could lead to no results.
         *
         * @param layer
         * @param point
     * @param numberOfItemsToFind tries to find this number of items
for comparison
         * @return geoPipeline
         */
        public static GeoPipeline startNearestNeighborSearch(Layer layer,
Coordinate point, int numberOfItemsToFind) {
                Envelope searchWindow =
SpatialTopologyUtils.createEnvelopeForGeometryDensityEstimate(layer,
point, numberOfItemsToFind);
                return startNearestNeighborSearch(layer, point, searchWindow);
        }

For this to happen, you could either contribute a
SortingNearestNeighborSearch (that does the sorting for you) or have a
second sort step?

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j 1.8 GA - http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html

On Thu, Oct 25, 2012 at 12:10 PM, Abhijeet Deshpande


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Craig Taverner  
View profile  
 More options Oct 25 2012, 7:45 am
From: Craig Taverner <cr...@amanzi.com>
Date: Thu, 25 Oct 2012 13:45:53 +0200
Local: Thurs, Oct 25 2012 7:45 am
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time

The best solution is to perform the query with a sufficiently large
bounding box to give at least the number of results you expect, and then
soft and limit in the client code afterwards. This works very well, if you
guess the bounding box correctly. That guess is best done with domain
knowledge, something the client code is more likely to have.

The fundamental problem here, and the reason why the sorting is not done
internally, is that the spatial index is based on location, not distance.
While it is possible to make an index based on distance, the origin of the
search would be specific to the index. This means the index would only work
for searches of distance from a particular point always, not generalized to
any point. So, to support searches around any point (the point you pass in
the search query), we need to build a bounding box, query the index on
that, and then filter to points at the right distance.

On Thu, Oct 25, 2012 at 1:23 PM, Peter Neubauer <


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Abhijeet Deshpande  
View profile  
 More options Nov 7 2012, 1:32 pm
From: Abhijeet Deshpande <avdeshpa...@gmail.com>
Date: Wed, 7 Nov 2012 10:32:50 -0800 (PST)
Local: Wed, Nov 7 2012 1:32 pm
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time

Hi Craig

Thank you for the response. As suggested, I have removed the sorting
component from the request and now simply fetching paginated results, 100
per page, using the following call.

String[] keys = {"id","name","address","city","state","zip"};
GeoPipeline flowList =
((GeoPipeline)GeoPipeline.startNearestNeighborLatLonSearch(layer, loc,
dist).range(low, high)).copyDatabaseRecordProperties(keys);

Further I traverse the result using this loop

while(flowList.hasNext()){
                    geoPipeFlow = flowList.next();
                    ------
                    -------
                }

I have observed that the graph search time is negligible but when I
traverse the result in while loop, for lower page ranges (Page 1: 1-100,
Page 2: 101-200, Page 3: 201-300) it takes about 800-900 milliseconds to
execute the first hasNext() call and for later pages like 8,9,10 it takes
about 4000 to 5000 ms. This slows down the overall performance.

Please let me know if it is possible to significantly reduce this result
traversal time irrespective of the page being requested.

I suspect it may be related to the way I am creating the layer in graph
database and hence I have also attached the java code file that populates
graph database. Code steps at a high level are

1. Create graphDatabaseService
2. Create spatialDatabaseService
3. Create SimplePointLayer called places
4. Read a place record that contains lat and long along with other details
from file and input file add it to the layer to create SpatialDatabaseRecord
5. To this newly created node add other properties like place name,
address, zip code etc

Can the access time be reduced if we create some index which can be used
while traversal?

Please let me know your thoughts on this.

Regards
Abhijeet

...

read more »

  PlaceImporter_V2.java
4K Download

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Neubauer  
View profile  
 More options Nov 7 2012, 6:04 pm
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Wed, 7 Nov 2012 15:03:46 -0800
Local: Wed, Nov 7 2012 6:03 pm
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time

So,
this is with only a couple of hundred layer entries? Sounds very slow. Do
you have a sample data file so I could test it?

/peter

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j 1.8 GA -
http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html

On Wed, Nov 7, 2012 at 10:32 AM, Abhijeet Deshpande
<avdeshpa...@gmail.com>wrote:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Abhijeet Deshpande  
View profile  
 More options Nov 8 2012, 5:34 am
From: Abhijeet Deshpande <avdeshpa...@gmail.com>
Date: Thu, 8 Nov 2012 02:34:27 -0800 (PST)
Local: Thurs, Nov 8 2012 5:34 am
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time

Hi Peter,
The graph database on which I am running these tests contains 10 million
place records.
Attached here is a sample file that contains about 10,000 records. Each
line contains a record and is seperated by "|". The columns are as
mentioned below.

id | latitude | longitude | name | address | city | state | zip

-Abhijeet

...

read more »

  places1.txt
621K Download

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Abhijeet Deshpande  
View profile  
 More options Nov 8 2012, 5:35 am
From: Abhijeet Deshpande <avdeshpa...@gmail.com>
Date: Thu, 8 Nov 2012 02:35:23 -0800 (PST)
Local: Thurs, Nov 8 2012 5:35 am
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time

Attached is the second chunk of 5000 records.

...

read more »

  places2.txt
618K Download

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Craig Taverner  
View profile  
 More options Nov 9 2012, 8:14 am
From: Craig Taverner <cr...@amanzi.com>
Date: Fri, 9 Nov 2012 05:14:35 -0800
Local: Fri, Nov 9 2012 8:14 am
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time

Hi,

I've not tried your data yet, but I did have a chance to write some more
junit test code for the GeoPipes, and I tested two cases, both involving
creating a random sample of 10000 points in a point layer. The first test
times how long each chunk of 1000 points takes to read on the stream, just
iterating over the entire stream. The second uses the range() operation you
mentioned before, to select the page of data. So both tests have the same
final result, but the second one will always re-start the iterator and only
return the 1000 results for the relevant page.

Not surprisingly, the plain iterator runs at linear speed, with each 1000
points taking roughly as long to return, while the paging iterator slows
down, because it keeps restarting at the beginning.

So, this leads me to believe you are also restarting at the beginning. The
trick with using a plain iterator is to maintain state between calls. If
you are running in embedded mode, this is obviously trivial. But if you are
running a web app, I think it gets more subtle.

If you want to see my test code to verify if I am in fact modeling the same
process as you, I can commit it to neo4j-spatial. I plan to do that anyway,
after a little more double-checking.

Regards, Craig

On Thu, Nov 8, 2012 at 2:35 AM, Abhijeet Deshpande <avdeshpa...@gmail.com>wrote:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Neubauer  
View profile  
 More options Nov 9 2012, 11:58 am
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Fri, 9 Nov 2012 08:58:27 -0800
Local: Fri, Nov 9 2012 11:58 am
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time

Thank you Craig for chiming in!

/peter

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j 1.8 GA -
http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Craig Taverner  
View profile  
 More options Nov 9 2012, 1:04 pm
From: Craig Taverner <cr...@amanzi.com>
Date: Fri, 9 Nov 2012 10:04:46 -0800
Local: Fri, Nov 9 2012 1:04 pm
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time for the first time

No problem.

I realize also my entire discussion could be summarized with '*don't use
range()*'....
;-)

On Fri, Nov 9, 2012 at 8:58 AM, Peter Neubauer <

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »