Message from discussion GeoPipeline hasNext() / next() functions take a long time for the first time

MIME-Version: 1.0
Sender: neubauer.pe...@gmail.com
Received: by 10.114.24.100 with HTTP; Wed, 7 Nov 2012 15:03:46 -0800 (PST)
In-Reply-To: <aba12a0a-ac2f-439f-bece-095b486a3540@googlegroups.com>
References: <f9e1e309-9dbd-40be-8076-57889b21aaea@googlegroups.com>
 <CAF59RW5fyC12N7q4SUdzj=ZwJ+GEmwc2M4Vt80on-fW_QV=...@mail.gmail.com>
 <5cab6a2c-d765-4d26-a11b-5272ca53e405@googlegroups.com> <CAE2kSFeykXwUj_RX9v7P2z8-O+pgpdKwQgmif-Vx+VyL8AW...@mail.gmail.com>
 <4d5ed4d4-d601-4b17-b9c5-95786b4bb70a@googlegroups.com> <3df0f296-718e-4027-be08-416ba745359f@googlegroups.com>
 <CAF59RW6vw+Cjfn+ZB5oJ2e_O04Zt65F5UHBu=XULHW8aejw...@mail.gmail.com>
 <CAE2kSFfwF6dKWVUL+MhCdREWLfpNOi5xQdTBMagST0u3Kc=...@mail.gmail.com> <aba12a0a-ac2f-439f-bece-095b486a3540@googlegroups.com>
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Wed, 7 Nov 2012 15:03:46 -0800
Message-ID: <CAF59RW5jcv_v86U6y1h7nsyCC9cZ4s7_sciS=EqtPUSG-ZS...@mail.gmail.com>
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time
 for the first time
To: Neo4j User <neo4j@googlegroups.com>
Content-Type: multipart/alternative; boundary=bcaec555561454f64f04cdefbc92

--bcaec555561454f64f04cdefbc92
Content-Type: text/plain; charset=ISO-8859-1

So,
this is with only a couple of hundred layer entries? Sounds very slow. Do
you have a sample data file so I could test it?

/peter


Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j 1.8 GA -
http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html


On Wed, Nov 7, 2012 at 10:32 AM, Abhijeet Deshpande
<avdeshpa...@gmail.com> wrote:

> Hi Craig
>
> Thank you for the response. As suggested, I have removed the sorting
> component from the request and am now simply fetching paginated results,
> 100 per page, using the following call.
>
> String[] keys = {"id","name","address","city","state","zip"};
> GeoPipeline flowList =
> ((GeoPipeline)GeoPipeline.startNearestNeighborLatLonSearch(layer, loc,
> dist).range(low, high)).copyDatabaseRecordProperties(keys);
>
> Further I traverse the result using this loop
>
> while(flowList.hasNext()){
>                     geoPipeFlow = flowList.next();
>                     ------
>                     -------
>                 }
>
> I have observed that the graph search time is negligible, but when I
> traverse the result in the while loop, the first hasNext() call takes about
> 800-900 ms for the lower page ranges (Page 1: 1-100, Page 2: 101-200,
> Page 3: 201-300) and about 4000 to 5000 ms for later pages such as 8, 9
> and 10. This slows down the overall performance.
>
> Please let me know if it is possible to significantly reduce this result
> traversal time irrespective of the page being requested.
>
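One possible reading of these timings: if the range step is applied to a lazily evaluated result stream, the first `low` results still have to be produced and discarded before the first hasNext() can answer, so the cost of that first call grows with the page number. The sketch below is plain Java and does not use Neo4j Spatial; `RangePagingSketch`, `CountingSource` and `RangeIterator` are hypothetical names, and whether GeoPipeline's range() behaves this way is an assumption, not something confirmed in this thread.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a range step over a lazy result stream. */
public class RangePagingSketch {

    /** Lazy source that counts how many items it has been asked to produce. */
    static final class CountingSource implements Iterator<Integer> {
        private final int size;
        private int produced = 0;

        CountingSource(int size) { this.size = size; }

        @Override public boolean hasNext() { return produced < size; }

        @Override public Integer next() {
            if (!hasNext()) throw new NoSuchElementException();
            return produced++;
        }

        int produced() { return produced; }
    }

    /** Yields the items whose zero-based index lies in [low, high]. */
    static final class RangeIterator<T> implements Iterator<T> {
        private final Iterator<T> source;
        private final int low;
        private final int high;
        private int index = 0;

        RangeIterator(Iterator<T> source, int low, int high) {
            this.source = source;
            this.low = low;
            this.high = high;
        }

        @Override public boolean hasNext() {
            // The skip runs on the first call, which is why that call is the slow one.
            while (index < low && source.hasNext()) {
                source.next();
                index++;
            }
            return index >= low && index <= high && source.hasNext();
        }

        @Override public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            index++;
            return source.next();
        }
    }

    /** Number of source items consumed before the first hasNext() returns. */
    public static int itemsPulledBeforeFirstResult(int low, int high, int sourceSize) {
        CountingSource source = new CountingSource(sourceSize);
        new RangeIterator<>(source, low, high).hasNext();
        return source.produced();
    }

    public static void main(String[] args) {
        System.out.println(itemsPulledBeforeFirstResult(0, 99, 10_000));    // page 1
        System.out.println(itemsPulledBeforeFirstResult(900, 999, 10_000)); // page 10
    }
}
```

In this sketch, page 1 pulls nothing before answering while page 10 pulls 900 items first. If the real pipeline behaves similarly, remembering where the previous page ended (for example the last distance seen) would avoid re-skipping, but that is a design suggestion rather than a tested fix.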
> I suspect it may be related to the way I am creating the layer in the graph
> database, so I have also attached the Java code file that populates the
> graph database. The code steps at a high level are:
>
> 1. Create graphDatabaseService
> 2. Create spatialDatabaseService
> 3. Create SimplePointLayer called places
> 4. Read a place record (lat and long along with other details) from the
> input file and add it to the layer to create a SpatialDatabaseRecord
> 5. To this newly created node add other properties like place name,
> address, zip code etc
>
> Can the access time be reduced if we create some index which can be used
> during traversal?
>
> Please let me know your thoughts on this.
>
> Regards
> Abhijeet
>
>
>
>
> On Thursday, 25 October 2012 17:15:55 UTC+5:30, Craig Taverner wrote:
>
>> The best solution is to perform the query with a sufficiently large
>> bounding box to give at least the number of results you expect, and then
>> sort and limit in the client code afterwards. This works very well if you
>> guess the bounding box correctly. That guess is best done with domain
>> knowledge, something the client code is more likely to have.
>>
>> The fundamental problem here, and the reason why the sorting is not done
>> internally, is that the spatial index is based on location, not distance.
>> While it is possible to make an index based on distance, the origin of the
>> search would be specific to the index. This means the index would only work
>> for searches of distance from a particular point always, not generalized to
>> any point. So, to support searches around any point (the point you pass in
>> the search query), we need to build a bounding box, query the index on
>> that, and then filter to points at the right distance.
>>
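The "sort and limit in the client code" step can be sketched without any Neo4j classes. The example below assumes each result has already been reduced to an id and a distance; `NearestFirstSketch` and `Place` are hypothetical names introduced for illustration. It keeps only the k nearest items in a bounded heap, so the full result set is never sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class NearestFirstSketch {

    /** Hypothetical result record: an id plus its precomputed distance in km. */
    public static final class Place {
        public final String id;
        public final double distanceKm;

        public Place(String id, double distanceKm) {
            this.id = id;
            this.distanceKm = distanceKm;
        }
    }

    /** Keeps only the k nearest places from an unsorted stream, nearest first. */
    public static List<Place> nearest(Iterator<Place> unsorted, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Max-heap on distance: the head is the farthest of the current best k.
        PriorityQueue<Place> heap = new PriorityQueue<>(k,
                Comparator.comparingDouble((Place p) -> p.distanceKm).reversed());
        while (unsorted.hasNext()) {
            Place candidate = unsorted.next();
            if (heap.size() < k) {
                heap.add(candidate);
            } else if (candidate.distanceKm < heap.peek().distanceKm) {
                heap.poll();          // drop the current farthest
                heap.add(candidate);  // keep the closer candidate
            }
        }
        List<Place> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Place p) -> p.distanceKm));
        return result;
    }

    public static void main(String[] args) {
        List<Place> raw = Arrays.asList(
                new Place("c", 5.0), new Place("a", 1.0),
                new Place("d", 7.5), new Place("b", 2.5));
        for (Place p : nearest(raw.iterator(), 2)) {
            System.out.println(p.id + " " + p.distanceKm);
        }
    }
}
```

Memory use is bounded by k rather than by the number of points in the bounding box, which matters when the box guess is generous. For page n of size s, one option is to ask for the nearest n * s items and keep the last s.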
>> On Thu, Oct 25, 2012 at 1:23 PM, Peter Neubauer
>> <peter.n...@neotechnology.com> wrote:
>>
>>> Abhijeet,
>>> from the docs and implementation, there is no sorting going on here.
>>> Instead, all of the returned points satisfy the bounding box you are
>>> requesting:
>>>
>>> /**
>>>  * Extracts Layer items with a distance from the given point that is
>>>  * less than or equal the given distance.
>>>  *
>>>  * @param layer with latitude, longitude coordinates
>>>  * @param point
>>>  * @param maxDistanceInKm
>>>  * @return geoPipeline
>>>  */
>>> public static GeoPipeline startNearestNeighborLatLonSearch(Layer layer,
>>>         Coordinate point, double maxDistanceInKm) {
>>>     Envelope searchWindow =
>>>             OrthodromicDistance.suggestSearchWindow(point, maxDistanceInKm);
>>>     GeoPipeline pipeline =
>>>             start(layer, new SearchIntersectWindow(layer, searchWindow))
>>>                     .calculateOrthodromicDistance(point);
>>>
>>>     if (layer.getGeometryType() == Constants.GTYPE_POINT) {
>>>         pipeline = pipeline.propertyFilter("OrthodromicDistance",
>>>                 maxDistanceInKm, FilterPipe.Filter.LESS_THAN_EQUAL);
>>>     }
>>>
>>>     return pipeline;
>>> }
>>>
>>> /**
>>>  * Calculates the distance between Layer items nearest to the given
>>>  * point and the given point.
>>>  * The search window created is based on Layer items density and it
>>>  * could lead to no results.
>>>  *
>>>  * @param layer
>>>  * @param point
>>>  * @param numberOfItemsToFind tries to find this number of items for comparison
>>>  * @return geoPipeline
>>>  */
>>> public static GeoPipeline startNearestNeighborSearch(Layer layer,
>>>         Coordinate point, int numberOfItemsToFind) {
>>>     Envelope searchWindow =
>>>             SpatialTopologyUtils.createEnvelopeForGeometryDensityEstimate(layer,
>>>                     point, numberOfItemsToFind);
>>>     return startNearestNeighborSearch(layer, point, searchWindow);
>>> }
>>>
>>>
>>> To get sorted results, you could either contribute a
>>> SortingNearestNeighborSearch (that does the sorting for you) or add a
>>> second sort step?
>>>
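The pattern in the quoted method (build a search window, query it, compute the orthodromic distance, then filter on it) can be illustrated in plain Java. This is only a sketch under stated assumptions: the haversine formula and a mean Earth radius of 6371 km are used here, and `OrthodromicSketch`, `searchWindow` and `withinDistance` are hypothetical names, not the library's `OrthodromicDistance.suggestSearchWindow` implementation.

```java
public class OrthodromicSketch {

    /** Mean Earth radius in km; an assumed constant, not taken from the library. */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle (orthodromic) distance in km, using the haversine formula. */
    public static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinPhi = Math.sin((phi2 - phi1) / 2);
        double sinLambda = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinPhi * sinPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinLambda * sinLambda;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Bounding box {minLat, maxLat, minLon, maxLon} enclosing every point within
     * maxKm of the centre. Ignores the poles and the antimeridian for brevity.
     */
    public static double[] searchWindow(double lat, double lon, double maxKm) {
        double angular = maxKm / EARTH_RADIUS_KM;
        double dLat = Math.toDegrees(angular);
        double ratio = Math.sin(angular) / Math.cos(Math.toRadians(lat));
        double dLon = Math.toDegrees(Math.asin(Math.min(1.0, ratio)));
        return new double[] { lat - dLat, lat + dLat, lon - dLon, lon + dLon };
    }

    /** True if the point lies inside the window returned by searchWindow. */
    public static boolean inWindow(double[] w, double lat, double lon) {
        return lat >= w[0] && lat <= w[1] && lon >= w[2] && lon <= w[3];
    }

    /** Cheap window test first, exact distance test second. */
    public static boolean withinDistance(double centreLat, double centreLon,
                                         double maxKm, double lat, double lon) {
        return inWindow(searchWindow(centreLat, centreLon, maxKm), lat, lon)
                && distanceKm(centreLat, centreLon, lat, lon) <= maxKm;
    }

    public static void main(String[] args) {
        double maxKm = 112.0;
        // Inside the window and inside the circle.
        System.out.println(withinDistance(0, 0, maxKm, 0.5, 0.5));
        // Inside the window (a corner) but farther than maxKm.
        System.out.println(withinDistance(0, 0, maxKm, 0.9, 0.9));
    }
}
```

The corner case in `main` shows why the distance filter is still needed after the window query: a rectangle always contains points that lie outside the circle it encloses. Nothing in either step imposes an order on the results.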
>>> Cheers,
>>>
>>> /peter neubauer
>>>
>>>
>>> On Thu, Oct 25, 2012 at 12:10 PM, Abhijeet Deshpande
>>> <avdes...@gmail.com> wrote:
>>> > Hi
>>> > As specified below, can someone please help me understand why the
>>> > startNearestNeighborSearch / startNearestNeighborLatLonSearch functions
>>> > don't return the results sorted on the distance. I have read some
>>> > documentation which suggests that these functions return the results
>>> > sorted on distance starting with the nearest one.
>>> >
>>> > Thanks
>>> > Abhijeet
>>> >
>>> > On Tuesday, 23 October 2012 01:11:02 UTC+5:30, Abhijeet Deshpande wrote:
>>> >>
>>> >>
>>> >> Hi Craig, Peter
>>> >>
>>> >> Thank you for the inputs.
>>> >>
>>> >> Removing sorting helped quite a lot and I also reduced the result size
>>> >> from 100 to 10. Now the response times have come down to a few
>>> >> milliseconds, but the downside is that the results are not sorted by
>>> >> distance.
>>> >>
>>> >> The objective is to get 10 places per page starting with the closest
>>> >> one. To implement this I tried the following GeoPipeline functions:
>>> >>
>>> >> 1. startNearestNeighborSearch(layer, point, distance)
>>> >> 2. startNearestNeighborLatLonSearch(layer, point, distance)
>>> >>
>>> >> but neither of them returned the results sorted on distance, even though
>>> >> the function names seem to suggest so. As per the function definition, no
>>> >> additional sort filters are needed, so please let me know if I am
>>> >> missing something.
>>> >>
>>> >> I also observed that the results of startNearestNeighborLatLonSearch are
>>> >> better sorted than those of startNearestNeighborSearch, and I think it has
>>> >> something to do with the propertyFilter that these functions use. I would
>>> >> like to know more about the "OrthodromicDistance" and "Distance" properties
>>> >> and the difference between their respective property filters.
>>> >>
>>> >> Regards
>>> >> Abhijeet
>>> >>
>>> >>
>>> >> On Friday, 19 October 2012 19:04:56 UTC+5:30, Craig Taverner wrote:
>>> >>>
>>> >>> I can suggest that since you have a sorting component in the pipeline,
>>> >>> this will cause the pipeline to internally cache the entire result set
>>> >>> on the first hasNext or next call, sort it and return the first sorted
>>> >>> result. Subsequent calls will access the internal, sorted cache. This is
>>> >>> a necessary consequence of the sorting algorithm, and is mostly
>>> >>> unavoidable.
>>> >>>
>>> >>> To avoid this, you would need to take away the sort. Is it possible to
>>> >>> get the same results you want without the sort? Or at least by moving
>>> >>> the sort into your own code, or perhaps after a filter or some other
>>> >>> action that reduces the result set size?
>>> >>>
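The behaviour described above (everything is pulled and sorted on the first hasNext or next call, and later calls read from the sorted cache) can be reproduced with a small stand-alone iterator. This is a minimal sketch in plain Java, not the GeoPipeline sort step itself; `SortingStepSketch` and `SortingIterator` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class SortingStepSketch {

    /** Iterator that must drain its source before it can return anything. */
    static final class SortingIterator<T> implements Iterator<T> {
        private final Iterator<T> source;
        private final Comparator<? super T> order;
        private Iterator<T> sorted; // null until the first hasNext()/next()

        SortingIterator(Iterator<T> source, Comparator<? super T> order) {
            this.source = source;
            this.order = order;
        }

        @Override public boolean hasNext() {
            materialise();
            return sorted.hasNext();
        }

        @Override public T next() {
            materialise();
            return sorted.next();
        }

        /** Drains and sorts the whole source once; later calls are no-ops. */
        private void materialise() {
            if (sorted != null) {
                return;
            }
            List<T> buffer = new ArrayList<>();
            while (source.hasNext()) {
                buffer.add(source.next());
            }
            buffer.sort(order);
            sorted = buffer.iterator();
        }
    }

    /** Number of source items consumed by the first hasNext() alone. */
    public static int pulledByFirstHasNext(List<Integer> data) {
        final int[] pulled = {0};
        final Iterator<Integer> raw = data.iterator();
        Iterator<Integer> counting = new Iterator<Integer>() {
            @Override public boolean hasNext() { return raw.hasNext(); }
            @Override public Integer next() { pulled[0]++; return raw.next(); }
        };
        new SortingIterator<>(counting, Comparator.<Integer>naturalOrder()).hasNext();
        return pulled[0];
    }

    /** Runs the data through the sorting iterator and collects the output. */
    public static List<Integer> sortedCopy(List<Integer> data) {
        Iterator<Integer> it =
                new SortingIterator<>(data.iterator(), Comparator.<Integer>naturalOrder());
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(3, 1, 2, 5, 4);
        System.out.println(pulledByFirstHasNext(data)); // all five items
        System.out.println(sortedCopy(data));
    }
}
```

With a large source, that first call carries the whole cost of fetching and sorting, which matches a slow first call followed by near-instant ones.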
>>> >>> On Fri, Oct 19, 2012 at 2:13 PM, Abhijeet Deshpande
>>> >>> <avdes...@gmail.com> wrote:
>>> >>>>
>>> >>>> Hi Peter,
>>> >>>> Thank you for the response.
>>> >>>>
>>> >>>> Actually this is not related to the very first run of the query, where
>>> >>>> the cache is empty.
>>> >>>>
>>> >>>> The problem is that whenever I search for places and request 100
>>> >>>> places per page in the response, the first call to Pipeline.hasNext() /
>>> >>>> Pipeline.next() always takes about 400 sec, and this happens for every
>>> >>>> request.
>>> >>>>
>>> >>>> Hope I am able to convey the problem that I am facing.
>>> >>>>
>>> >>>> Regards
>>> >>>> Abhijeet
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Monday, 15 October 2012 18:35:04 UTC+5:30, Peter Neubauer wrote:
>>> >>>>>
>>> >>>>> Hi there,
>>> >>>>> the first run is probably with cold caches. In real-world scenarios,
>>> >>>>> you are running with warm caches, so you should try to warm up the
>>> >>>>> database by doing a few searches before the real work, or maybe loop
>>> >>>>> through your interesting nodes or so?
>>> >>>>>
>>> >>>>> Cheers,
>>> >>>>>
>>> >>>>> /peter neubauer
>>> >>>>>
>>> >>>>>
>>> >>>>> On Mon, Oct 15, 2012 at 2:10 PM, Abhijeet Deshpande
>>> >>>>> <avdes...@gmail.com> wrote:
>>> >>>>> > Hi
>>> >>>>> > I am using neo4j spatial to find locations near a lat/long. The
>>> >>>>> > results returned are sorted on OrthodromicDistance and are further
>>> >>>>> > paginated using the range function (100 results per page). The code
>>> >>>>> > for this is
>>> >>>>> >
>>> >>>>> > GeoPipeline flowList =
>>> >>>>> >     (GeoPipeline) GeoPipeline.startNearestNeighborLatLonSearch(layer, loc, dist)
>>> >>>>> >         .sort("OrthodromicDistance")
>>> >>>>> >         .copyDatabaseRecordProperties(keys)
>>> >>>>> >         .range(low, high);
>>> >>>>> >
>>> >>>>> > Now to iterate over this GeoPipeline, I am using the hasNext() / next()
>>> >>>>> > functions, but for some reason the first call to either of these
>>> >>>>> > functions takes a long time to execute.
>>> >>>>> >
>>> >>>>> > When the application was run, the first call to next() or hasNext() took
>>> >>>>> > approximately 400 sec to execute; the subsequent 99 calls took 0 ms.
>>> >>>>> >
>>> >>>>> > Can someone please point out where the mistake is and whether there is
>>> >>>>> > a faster way to find the nearest places? The layer has 10 million entries.
>>> >>>>> >
>>> >>>>> > Regards
>>> >>>>> > Abhijeet
>>> >>>>> >

--bcaec555561454f64f04cdefbc92--