GeoPipeline hasNext() / next() functions take a long time for the first time
MIME-Version: 1.0
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Wed, 7 Nov 2012 15:03:46 -0800
Message-ID: <CAF59RW5jcv_v86U6y1h7nsyCC9cZ4s7_sciS=EqtPUSG-ZS...@mail.gmail.com>
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long time
for the first time
To: Neo4j User <neo4j@googlegroups.com>
Content-Type: multipart/alternative; boundary=bcaec555561454f64f04cdefbc92
--bcaec555561454f64f04cdefbc92
Content-Type: text/plain; charset=ISO-8859-1
So,
this is with only a couple of hundred layer entries? Sounds very slow. Do
you have a sample data file so I could test it?
/peter
Cheers,
/peter neubauer
G: neubauer.peter
S: peter.neubauer
P: +46 704 106975
L: http://www.linkedin.com/in/neubauer
T: @peterneubauer
Neo4j 1.8 GA -
http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html
On Wed, Nov 7, 2012 at 10:32 AM, Abhijeet Deshpande
<avdeshpa...@gmail.com> wrote:
> Hi Craig
>
> Thank you for the response. As suggested, I have removed the sorting
> component from the request and am now simply fetching paginated results,
> 100 per page, using the following call.
>
> String[] keys = {"id","name","address","city","state","zip"};
> GeoPipeline flowList =
> ((GeoPipeline)GeoPipeline.startNearestNeighborLatLonSearch(layer, loc,
> dist).range(low, high)).copyDatabaseRecordProperties(keys);
>
> Further I traverse the result using this loop
>
> while(flowList.hasNext()){
> geoPipeFlow = flowList.next();
> ------
> -------
> }
>
> I have observed that the graph search time is negligible, but when I
> traverse the result in the while loop, the first hasNext() call takes
> about 800-900 milliseconds for lower page ranges (Page 1: 1-100,
> Page 2: 101-200, Page 3: 201-300) and about 4000 to 5000 ms for later
> pages like 8, 9, 10. This slows down the overall performance.
>
> Please let me know if it is possible to significantly reduce this result
> traversal time irrespective of the page being requested.
>
> I suspect it may be related to the way I am creating the layer in graph
> database and hence I have also attached the java code file that populates
> graph database. Code steps at a high level are
>
> 1. Create graphDatabaseService
> 2. Create spatialDatabaseService
> 3. Create SimplePointLayer called places
> 4. Read a place record that contains lat and long, along with other details,
> from the input file and add it to the layer to create a SpatialDatabaseRecord
> 5. To this newly created node add other properties like place name,
> address, zip code etc
>
> Can the access time be reduced if we create some index which can be used
> during traversal?
>
> Please let me know your thoughts on this.
>
> Regards
> Abhijeet
>
>
>
>
> On Thursday, 25 October 2012 17:15:55 UTC+5:30, Craig Taverner wrote:
>
>> The best solution is to perform the query with a sufficiently large
>> bounding box to give at least the number of results you expect, and then
>> sort and limit in the client code afterwards. This works very well if you
>> guess the bounding box correctly. That guess is best done with domain
>> knowledge, something the client code is more likely to have.
>>
>> The fundamental problem here, and the reason why the sorting is not done
>> internally, is that the spatial index is based on location, not distance.
>> While it is possible to make an index based on distance, the origin of the
>> search would be specific to the index. This means the index would only work
>> for searches of distance from a particular point always, not generalized to
>> any point. So, to support searches around any point (the point you pass in
>> the search query), we need to build a bounding box, query the index on
>> that, and then filter to points at the right distance.
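
[Illustrative note] The "sort and limit in the client code" step described above could look roughly like the sketch below. It uses only the Java standard library and does not touch the Neo4j Spatial API: the Place class, the nearestPage helper and the Earth-radius constant are hypothetical names introduced for this example, and the candidate list is assumed to have already been copied out of an unsorted bounding-box query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Client-side "sort and limit" over candidates already returned by a bounding-box query. */
final class ClientSideNearest {

    /** Hypothetical stand-in for a record copied out of the spatial layer. */
    static final class Place {
        final String name;
        final double lat;
        final double lon;

        Place(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Mean Earth radius in km; an approximation chosen for this sketch. */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle (orthodromic) distance in km, computed with the haversine formula. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 so rounding error can never push asin() out of its domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Sorts the candidates by distance from (lat, lon) and returns one page of them. */
    static List<Place> nearestPage(List<Place> candidates, final double lat, final double lon,
                                   int offset, int pageSize) {
        List<Place> sorted = new ArrayList<Place>(candidates);
        sorted.sort(Comparator.comparingDouble((Place p) -> distanceKm(lat, lon, p.lat, p.lon)));
        int from = Math.min(offset, sorted.size());
        int to = Math.min(from + pageSize, sorted.size());
        return new ArrayList<Place>(sorted.subList(from, to));
    }

    public static void main(String[] args) {
        List<Place> candidates = Arrays.asList(
                new Place("far", 10.0, 10.0),
                new Place("near", 0.1, 0.1),
                new Place("mid", 1.0, 1.0));
        // First page of two results around (0, 0): prints "near" and then "mid".
        for (Place p : nearestPage(candidates, 0.0, 0.0, 0, 2)) {
            System.out.println(p.name);
        }
    }
}
```

The sort here only ever sees the candidates inside the bounding box, so its cost depends on how well the box was guessed rather than on the size of the whole layer.
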
>>
>> On Thu, Oct 25, 2012 at 1:23 PM, Peter Neubauer <
>> peter.n...@neotechnology.com> wrote:
>>
>>> Abhijeet,
>>> from the docs and implementation, there is no sorting going on here.
>>> Instead, all of the returned points satisfy the bounding box you are
>>> requesting:
>>>
>>> /**
>>>  * Extracts Layer items with a distance from the given point that is
>>>  * less than or equal the given distance.
>>>  *
>>>  * @param layer with latitude, longitude coordinates
>>>  * @param point
>>>  * @param maxDistanceInKm
>>>  * @return geoPipeline
>>>  */
>>> public static GeoPipeline startNearestNeighborLatLonSearch(Layer layer,
>>>         Coordinate point, double maxDistanceInKm) {
>>>     Envelope searchWindow =
>>>             OrthodromicDistance.suggestSearchWindow(point, maxDistanceInKm);
>>>     GeoPipeline pipeline = start(layer,
>>>             new SearchIntersectWindow(layer, searchWindow))
>>>             .calculateOrthodromicDistance(point);
>>>
>>>     if (layer.getGeometryType() == Constants.GTYPE_POINT) {
>>>         pipeline = pipeline.propertyFilter("OrthodromicDistance",
>>>                 maxDistanceInKm, FilterPipe.Filter.LESS_THAN_EQUAL);
>>>     }
>>>
>>>     return pipeline;
>>> }
>>>
>>> /**
>>>  * Calculates the distance between Layer items nearest to the given
>>>  * point and the given point.
>>>  * The search window created is based on Layer items density and it
>>>  * could lead to no results.
>>>  *
>>>  * @param layer
>>>  * @param point
>>>  * @param numberOfItemsToFind tries to find this number of items
>>>  *        for comparison
>>>  * @return geoPipeline
>>>  */
>>> public static GeoPipeline startNearestNeighborSearch(Layer layer,
>>>         Coordinate point, int numberOfItemsToFind) {
>>>     Envelope searchWindow =
>>>             SpatialTopologyUtils.createEnvelopeForGeometryDensityEstimate(
>>>                     layer, point, numberOfItemsToFind);
>>>     return startNearestNeighborSearch(layer, point, searchWindow);
>>> }
>>>
>>>
>>> To get sorted results, you could either contribute a
>>> SortingNearestNeighborSearch (that does the sorting for you) or add a
>>> second sort step.
>>>
>>> Cheers,
>>>
>>> /peter neubauer
>>>
>>> G: neubauer.peter
>>> S: peter.neubauer
>>> P: +46 704 106975
>>> L: http://www.linkedin.com/in/neubauer
>>> T: @peterneubauer
>>>
>>> Neo4j 1.8 GA -
>>> http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html
>>>
>>>
>>> On Thu, Oct 25, 2012 at 12:10 PM, Abhijeet Deshpande
>>> <avdes...@gmail.com> wrote:
>>> > Hi
>>> > As specified below, can someone please help me understand why
>>> > startNearestNeighborSearch / startNearestNeighborLatLonSearch functions
>>> > don't return the results sorted on the distance. I have read some
>>> > documentation which suggests that these functions return the results
>>> > sorted on distance starting with the nearest one.
>>> >
>>> > Thanks
>>> > Abhijeet
>>> >
>>> > On Tuesday, 23 October 2012 01:11:02 UTC+5:30, Abhijeet Deshpande wrote:
>>> >>
>>> >>
>>> >> Hi Craig, Peter
>>> >>
>>> >> Thank you for the inputs.
>>> >>
>>> >> Removing sorting helped quite a lot and I also reduced the result size
>>> >> from 100 to 10. Now the response times have come down to a few
>>> >> milliseconds but the downside is that the results are not sorted by
>>> >> distance.
>>> >>
>>> >> The objective is to get 10 places per page starting with the closest
>>> >> one. To implement this I tried the following GeoPipeline functions:
>>> >>
>>> >> 1. startNearestNeighborSearch(layer, point, distance) and
>>> >> 2. startNearestNeighborLatLonSearch(layer, point, distance)
>>> >>
>>> >> but neither of them returned the results sorted on distance, even though
>>> >> the function names seem to suggest so. As per the function definition, no
>>> >> additional sort filters are needed, so please let me know if I am
>>> >> missing something.
>>> >>
>>> >> I also observed that the results of startNearestNeighborLatLonSearch
>>> >> are better sorted than those of startNearestNeighborSearch, and I think
>>> >> it has something to do with the propertyFilter that these functions use.
>>> >> I would like to know more about the "OrthodromicDistance" and "Distance"
>>> >> properties and the difference between their respective property filters.
>>> >>
>>> >> Regards
>>> >> Abhijeet
>>> >>
>>> >>
>>> >> On Friday, 19 October 2012 19:04:56 UTC+5:30, Craig Taverner wrote:
>>> >>>
>>> >>> I can suggest that since you have a sorting component in the
>>> >>> pipeline, this will cause the pipeline to internally cache the entire
>>> >>> resultset on the first hasNext or next call, sort it and return the
>>> >>> first sorted result. Subsequent calls will access the internal, sorted,
>>> >>> cache. This is a necessary consequence of the sorting algorithm, and is
>>> >>> mostly unavoidable.
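
[Illustrative note] The behaviour described above, where the first hasNext()/next() call pays for reading and sorting everything and later calls are nearly free, can be reproduced with a small standard-library sketch. This is not the GeoPipeline implementation: SortingStage and its upstreamReads counter are hypothetical names used only to make the cost visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

/** Illustrative sorting stage: buffers its whole upstream on the first hasNext()/next(). */
final class SortingStage<T> implements Iterator<T> {
    private final Iterator<T> upstream;
    private final Comparator<? super T> order;
    private Iterator<T> sorted;   // stays null until the first call forces the full read
    int upstreamReads = 0;        // exposed only so the cost can be observed

    SortingStage(Iterator<T> upstream, Comparator<? super T> order) {
        this.upstream = upstream;
        this.order = order;
    }

    /** Reads every upstream element and sorts the buffer; runs at most once. */
    private void drainAndSortOnce() {
        if (sorted != null) {
            return;
        }
        List<T> buffer = new ArrayList<T>();
        while (upstream.hasNext()) {
            buffer.add(upstream.next());
            upstreamReads++;
        }
        buffer.sort(order);
        sorted = buffer.iterator();
    }

    @Override
    public boolean hasNext() {
        drainAndSortOnce();
        return sorted.hasNext();
    }

    @Override
    public T next() {
        drainAndSortOnce();
        return sorted.next();
    }

    public static void main(String[] args) {
        SortingStage<Integer> stage = new SortingStage<Integer>(
                Arrays.asList(5, 3, 9, 1).iterator(), Comparator.<Integer>naturalOrder());
        System.out.println("reads before first call: " + stage.upstreamReads);
        stage.hasNext(); // this single call consumes the entire upstream
        System.out.println("reads after first call: " + stage.upstreamReads);
        while (stage.hasNext()) {
            System.out.println(stage.next()); // served from the sorted buffer
        }
    }
}
```

A range or limit placed after such a stage cannot help, because the smallest element is only known once every upstream element has been seen.
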
>>> >>>
>>> >>> To avoid this, you would need to take away the sort. Is it possible
>>> >>> to get the same results you want without the sort? Or at least by
>>> >>> moving the sort into your own code, or perhaps after a filter or some
>>> >>> other action that reduces the resultset size?
>>> >>>
>>> >>> On Fri, Oct 19, 2012 at 2:13 PM, Abhijeet Deshpande <
>>> avdes...@gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> Hi Peter,
>>> >>>> Thank you for the response.
>>> >>>>
>>> >>>> Actually this is not related to the very first run of the query, where
>>> >>>> the cache is empty.
>>> >>>>
>>> >>>> The problem is that whenever I search for places and request 100
>>> >>>> places per page in the response, the first call to Pipeline.hasNext() /
>>> >>>> Pipeline.next() always takes about 400 sec., and this happens for every
>>> >>>> request.
>>> >>>>
>>> >>>> Hope I am able to convey the problem that I am facing.
>>> >>>>
>>> >>>> Regards
>>> >>>> Abhijeet
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Monday, 15 October 2012 18:35:04 UTC+5:30, Peter Neubauer wrote:
>>> >>>>>
>>> >>>>> Hi there,
>>> >>>>> the first run is probably with cold caches. In real world
>>> >>>>> scenarios, you are running with warm caches, so you should try to
>>> >>>>> warm up the database by doing a few searches before the real work,
>>> >>>>> or maybe loop through your interesting nodes or so?
>>> >>>>>
>>> >>>>> Cheers,
>>> >>>>>
>>> >>>>> /peter neubauer
>>> >>>>>
>>> >>>>> G: neubauer.peter
>>> >>>>> S: peter.neubauer
>>> >>>>> P: +46 704 106975
>>> >>>>> L: http://www.linkedin.com/in/neubauer
>>> >>>>> T: @peterneubauer
>>> >>>>>
>>> >>>>> Neo4j 1.8 GA -
>>> >>>>> http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html
>>> >>>>>
>>> >>>>>
>>> >>>>> On Mon, Oct 15, 2012 at 2:10 PM, Abhijeet Deshpande
>>> >>>>> <avdes...@gmail.com> wrote:
>>> >>>>> > Hi
>>> >>>>> > I am using neo4j spatial to find locations near a lat/long. The
>>> >>>>> > results returned are sorted on OrthodromicDistance and are further
>>> >>>>> > paginated using the range function (100 results per page). The code
>>> >>>>> > for this is
>>> >>>>> >
>>> >>>>> > GeoPipeline flowList = (GeoPipeline) GeoPipeline
>>> >>>>> >     .startNearestNeighborLatLonSearch(layer, loc, dist)
>>> >>>>> >     .sort("OrthodromicDistance")
>>> >>>>> >     .copyDatabaseRecordProperties(keys)
>>> >>>>> >     .range(low, high);
>>> >>>>> >
>>> >>>>> > Now to iterate over this GeoPipeline, I am using the hasNext() /
>>> >>>>> > next() functions, but for some reason the first call to either of
>>> >>>>> > these functions takes a long time to execute.
>>> >>>>> >
>>> >>>>> > When the application was run, the first call to next() or hasNext()
>>> >>>>> > took approximately 400 sec to execute; the subsequent 99 calls took
>>> >>>>> > 0 ms.
>>> >>>>> >
>>> >>>>> > Can someone please point out where the mistake is and whether there
>>> >>>>> > is a faster way to find the nearest places? The layer has 10 million
>>> >>>>> > entries.
>>> >>>>> >
>>> >>>>> > Regards
>>> >>>>> > Abhijeet
>>> >>>>> >
--bcaec555561454f64f04cdefbc92--