Message from discussion GeoPipeline hasNext() / next() functions take a long time for the first time

Date: Wed, 7 Nov 2012 10:32:50 -0800 (PST)
From: Abhijeet Deshpande <avdeshpa...@gmail.com>
To: neo4j@googlegroups.com
Message-Id: <aba12a0a-ac2f-439f-bece-095b486a3540@googlegroups.com>
In-Reply-To: <CAE2kSFfwF6dKWVUL+MhCdREWLfpNOi5xQdTBMagST0u3Kc=CwA@mail.gmail.com>
References: <f9e1e309-9dbd-40be-8076-57889b21aaea@googlegroups.com>
 <CAF59RW5fyC12N7q4SUdzj=ZwJ+GEmwc2M4Vt80on-fW_QV=f3w@mail.gmail.com>
 <5cab6a2c-d765-4d26-a11b-5272ca53e405@googlegroups.com>
 <CAE2kSFeykXwUj_RX9v7P2z8-O+pgpdKwQgmif-Vx+VyL8AW1aA@mail.gmail.com>
 <4d5ed4d4-d601-4b17-b9c5-95786b4bb70a@googlegroups.com>
 <3df0f296-718e-4027-be08-416ba745359f@googlegroups.com>
 <CAF59RW6vw+Cjfn+ZB5oJ2e_O04Zt65F5UHBu=XULHW8aejwqMA@mail.gmail.com>
 <CAE2kSFfwF6dKWVUL+MhCdREWLfpNOi5xQdTBMagST0u3Kc=CwA@mail.gmail.com>
Subject: Re: [Neo4j] GeoPipeline hasNext() / next() functions take a long
 time for the first time
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_223_17950529.1352313171009"

------=_Part_223_17950529.1352313171009
Content-Type: multipart/alternative; 
	boundary="----=_Part_224_17366687.1352313171011"

------=_Part_224_17366687.1352313171011
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

Hi Craig

Thank you for the response. As suggested, I have removed the sorting 
component from the request and am now simply fetching paginated results, 
100 per page, using the following call.

String[] keys = {"id", "name", "address", "city", "state", "zip"};
GeoPipeline flowList =
    ((GeoPipeline) GeoPipeline.startNearestNeighborLatLonSearch(layer, loc, dist)
        .range(low, high))
        .copyDatabaseRecordProperties(keys);

I then traverse the result using this loop:

while (flowList.hasNext()) {
    geoPipeFlow = flowList.next();
    ------
    -------
}

I have observed that the graph search time is negligible, but when I 
traverse the result in the while loop, the first hasNext() call takes about 
800-900 milliseconds for lower page ranges (Page 1: 1-100, Page 2: 101-200, 
Page 3: 201-300) and about 4000 to 5000 ms for later pages such as 8, 9 
and 10. This slows down the overall performance.

Please let me know if it is possible to significantly reduce this result 
traversal time irrespective of the page being requested. 
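
One possible reading of these timings, assuming range(low, high) behaves 
like a lazy filter that pulls and discards every upstream item before 
`low`, is that the cost of the first hasNext() grows with the page number. 
The sketch below uses plain Java iterators only. CountingSource and 
LazyRange are hypothetical names, not GeoPipeline classes, and the 
inclusive upper bound is a choice of the sketch, not a statement about the 
library.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Produces 0, 1, 2, ... up to limit-1 and counts how many values were requested.
final class CountingSource implements Iterator<Integer> {
    private final int limit;
    int produced = 0;

    CountingSource(int limit) {
        this.limit = limit;
    }

    @Override
    public boolean hasNext() {
        return produced < limit;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return produced++;
    }
}

// Emits upstream elements at zero-based positions low..high (inclusive).
// Elements before `low` are still pulled from upstream, then thrown away.
final class LazyRange<T> implements Iterator<T> {
    private final Iterator<T> upstream;
    private final int low;
    private final int high;
    private int position = 0; // index of the next upstream element

    LazyRange(Iterator<T> upstream, int low, int high) {
        this.upstream = upstream;
        this.low = low;
        this.high = high;
    }

    @Override
    public boolean hasNext() {
        // The skipping happens here, on the first call.
        while (position < low && upstream.hasNext()) {
            upstream.next();
            position++;
        }
        return position <= high && upstream.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        position++;
        return upstream.next();
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource(10000);
        LazyRange<Integer> page10 = new LazyRange<>(source, 900, 999);
        page10.hasNext();
        System.out.println("Upstream items pulled before first result: " + source.produced);
    }
}
```

If the real pipeline works this way, every skipped item still pays for the 
index lookup and the distance calculation, which would fit the slower 
later pages.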

I suspect it may be related to the way I am creating the layer in the graph 
database, so I have also attached the Java file that populates the graph 
database. At a high level, the code steps are:

1. Create graphDatabaseService
2. Create spatialDatabaseService
3. Create SimplePointLayer called places
4. Read a place record that contains lat and long along with other details 
from the input file and add it to the layer to create a SpatialDatabaseRecord
5. To this newly created node add other properties like place name, 
address, zip code etc.

Can the access time be reduced if we create some index which can be used 
during traversal?

Please let me know your thoughts on this. 

Regards
Abhijeet



On Thursday, 25 October 2012 17:15:55 UTC+5:30, Craig Taverner wrote:
>
> The best solution is to perform the query with a sufficiently large 
> bounding box to give at least the number of results you expect, and then 
> sort and limit in the client code afterwards. This works very well if you 
> guess the bounding box correctly. That guess is best done with domain 
> knowledge, something the client code is more likely to have.
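
The "sort and limit in the client code" step could be sketched roughly as 
follows, assuming the hits have already been copied out of the pipeline 
together with their distance. PlaceHit and ClientSidePaging are 
hypothetical names used only for illustration; they are not Neo4j Spatial 
types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical holder for one search hit; not a Neo4j Spatial type.
final class PlaceHit {
    final String id;
    final double distanceKm;

    PlaceHit(String id, double distanceKm) {
        this.id = id;
        this.distanceKm = distanceKm;
    }
}

final class ClientSidePaging {
    // Sorts a copy of the hits by distance and returns one page.
    // pageNumber starts at 1; an out-of-range page yields an empty list.
    static List<PlaceHit> page(List<PlaceHit> hits, int pageNumber, int pageSize) {
        List<PlaceHit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((PlaceHit h) -> h.distanceKm));
        int from = Math.min(Math.max(0, (pageNumber - 1) * pageSize), sorted.size());
        int to = Math.min(from + pageSize, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }

    public static void main(String[] args) {
        List<PlaceHit> hits = new ArrayList<>();
        hits.add(new PlaceHit("c", 3.2));
        hits.add(new PlaceHit("a", 0.4));
        hits.add(new PlaceHit("b", 1.7));
        for (PlaceHit h : page(hits, 1, 2)) {
            System.out.println(h.id + " " + h.distanceKm);
        }
    }
}
```

The sort only touches what the bounding box returned, so the cost depends 
on how tight that box is rather than on the size of the whole layer.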
>
> The fundamental problem here, and the reason why the sorting is not done 
> internally, is that the spatial index is based on location, not distance. 
> While it is possible to make an index based on distance, the origin of the 
> search would be specific to the index. This means the index would only work 
> for searches of distance from a particular point always, not generalized to 
> any point. So, to support searches around any point (the point you pass in 
> the search query), we need to build a bounding box, query the index on 
> that, and then filter to points at the right distance.
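
As a rough illustration of "build a bounding box, query on that, then 
filter to points at the right distance", here is a plain Java sketch. It 
is not the Neo4j Spatial implementation: BoxThenFilter, searchWindow and 
within are hypothetical names, the Earth is treated as a sphere of radius 
6371 km, and the box calculation assumes a search point away from the 
poles and the antimeridian. In the real library the box test is answered 
by the spatial index instead of a loop.

```java
import java.util.ArrayList;
import java.util.List;

final class BoxThenFilter {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, an approximation

    // Great-circle (orthodromic) distance using the haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Returns {minLat, maxLat, minLon, maxLon} of a box enclosing the search circle.
    static double[] searchWindow(double lat, double lon, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // radius as an angle in radians
        double dLat = Math.toDegrees(angular);
        double dLon = Math.toDegrees(
                Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        return new double[] {lat - dLat, lat + dLat, lon - dLon, lon + dLon};
    }

    // points are {lat, lon} pairs. Step 1: cheap box test. Step 2: exact distance test.
    static List<double[]> within(List<double[]> points, double lat, double lon,
                                 double radiusKm) {
        double[] w = searchWindow(lat, lon, radiusKm);
        List<double[]> result = new ArrayList<>();
        for (double[] p : points) {
            boolean inBox = p[0] >= w[0] && p[0] <= w[1] && p[1] >= w[2] && p[1] <= w[3];
            if (inBox && haversineKm(lat, lon, p[0], p[1]) <= radiusKm) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.5}); // inside the circle
        points.add(new double[] {0.8, 0.8}); // inside the box, outside the circle
        points.add(new double[] {2.0, 2.0}); // outside the box
        System.out.println(within(points, 0.0, 0.0, 100.0).size() + " point(s) within 100 km");
    }
}
```

The corner of the box is farther from the centre than the radius, which is 
why the distance filter is still needed after the box query.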
>
> On Thu, Oct 25, 2012 at 1:23 PM, Peter Neubauer <
> peter.n...@neotechnology.com> wrote:
>
>> Abhijeet,
>> from the docs and implementation, there is no sorting going on here.
>> Instead, all of the returned points fall inside the bounding box you are
>> requesting:
>>
>> /**
>>  * Extracts Layer items with a distance from the given point that is
>>  * less than or equal the given distance.
>>  *
>>  * @param layer with latitude, longitude coordinates
>>  * @param point
>>  * @param maxDistanceInKm
>>  * @return geoPipeline
>>  */
>> public static GeoPipeline startNearestNeighborLatLonSearch(Layer layer,
>>         Coordinate point, double maxDistanceInKm) {
>>     Envelope searchWindow =
>>             OrthodromicDistance.suggestSearchWindow(point, maxDistanceInKm);
>>     GeoPipeline pipeline = start(layer, new SearchIntersectWindow(layer, searchWindow))
>>             .calculateOrthodromicDistance(point);
>>
>>     if (layer.getGeometryType() == Constants.GTYPE_POINT) {
>>         pipeline = pipeline.propertyFilter("OrthodromicDistance",
>>                 maxDistanceInKm, FilterPipe.Filter.LESS_THAN_EQUAL);
>>     }
>>
>>     return pipeline;
>> }
>>
>> /**
>>  * Calculates the distance between Layer items nearest to the given
>>  * point and the given point.
>>  * The search window created is based on Layer items density and it
>>  * could lead to no results.
>>  *
>>  * @param layer
>>  * @param point
>>  * @param numberOfItemsToFind tries to find this number of items
>>  *        for comparison
>>  * @return geoPipeline
>>  */
>> public static GeoPipeline startNearestNeighborSearch(Layer layer,
>>         Coordinate point, int numberOfItemsToFind) {
>>     Envelope searchWindow =
>>             SpatialTopologyUtils.createEnvelopeForGeometryDensityEstimate(layer,
>>                     point, numberOfItemsToFind);
>>     return startNearestNeighborSearch(layer, point, searchWindow);
>> }
>>
>>
>> To get sorted results, you could either contribute a
>> SortingNearestNeighborSearch (that does the sorting for you) or add a
>> second sort step.
>>
>> Cheers,
>>
>> /peter neubauer
>>
>> G:  neubauer.peter
>> S:  peter.neubauer
>> P:  +46 704 106975
>> L:   http://www.linkedin.com/in/neubauer
>> T:   @peterneubauer
>>
>> Neo4j 1.8 GA - 
>> http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html
>>
>>
>> On Thu, Oct 25, 2012 at 12:10 PM, Abhijeet Deshpande
>> <avdes...@gmail.com> wrote:
>> > Hi
>> > As specified below, can someone please help me understand why the
>> > startNearestNeighborSearch / startNearestNeighborLatLonSearch functions
>> > don't return the results sorted on the distance. I have read some
>> > documentation which suggests that these functions return the results
>> > sorted on distance starting with the nearest one.
>> >
>> > Thanks
>> > Abhijeet
>> >
>> > On Tuesday, 23 October 2012 01:11:02 UTC+5:30, Abhijeet Deshpande wrote:
>> >>
>> >>
>> >> Hi Craig, Peter
>> >>
>> >> Thank you for the inputs.
>> >>
>> >> Removing sorting helped quite a lot and I also reduced the result size
>> >> from 100 to 10. Now the response times have come down to a few
>> >> milliseconds, but the downside is that the results are not sorted by
>> >> distance.
>> >>
>> >> The objective is to get 10 places per page starting with the closest
>> >> one. To implement this I tried the following GeoPipeline functions:
>> >>
>> >> 1. startNearestNeighborSearch(layer, point, distance) and
>> >> 2. startNearestNeighborLatLonSearch(layer, point, distance)
>> >>
>> >> but neither of them returned the results sorted on distance, even though
>> >> the function names seem to suggest so. As per the function definition, no
>> >> additional sort filters are needed, so please let me know if I am missing
>> >> something.
>> >>
>> >> I also observed that the results of startNearestNeighborLatLonSearch are
>> >> better sorted than those of startNearestNeighborSearch, and I think it has
>> >> something to do with the propertyFilter that these functions use. I would
>> >> like to know more about the "OrthodromicDistance" and "Distance" properties
>> >> and the difference between their respective property filters.
>> >>
>> >> Regards
>> >> Abhijeet
>> >>
>> >>
>> >> On Friday, 19 October 2012 19:04:56 UTC+5:30, Craig Taverner wrote:
>> >>>
>> >>> I can suggest that since you have a sorting component in the pipeline,
>> >>> this will cause the pipeline to internally cache the entire result set
>> >>> on the first hasNext or next call, sort it and return the first sorted
>> >>> result. Subsequent calls will access the internal, sorted cache. This is
>> >>> a necessary consequence of the sorting algorithm, and is mostly
>> >>> unavoidable.
>> >>>
>> >>> To avoid this, you would need to take away the sort. Is it possible to
>> >>> get the same results you want without the sort? Or at least by moving
>> >>> the sort into your own code, or perhaps after a filter or some other
>> >>> action that reduces the result set size?
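
The caching behaviour described here can be reproduced with a small 
stand-alone iterator. This is only a sketch of the general technique, not 
the pipeline's actual sort step; CountingIterator and SortingIterator are 
hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Wraps an iterator and counts how many elements have been pulled from it.
final class CountingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    int pulled = 0;

    CountingIterator(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        T value = source.next();
        pulled++;
        return value;
    }
}

// A sorting step cannot emit its first element until it has seen all of them,
// so the first hasNext()/next() call drains and sorts the whole upstream.
final class SortingIterator<T> implements Iterator<T> {
    private final Iterator<T> upstream;
    private final Comparator<? super T> order;
    private Iterator<T> sorted; // null until the first call

    SortingIterator(Iterator<T> upstream, Comparator<? super T> order) {
        this.upstream = upstream;
        this.order = order;
    }

    private void drainAndSort() {
        if (sorted == null) {
            List<T> buffer = new ArrayList<>();
            while (upstream.hasNext()) {
                buffer.add(upstream.next());
            }
            buffer.sort(order);
            sorted = buffer.iterator();
        }
    }

    @Override
    public boolean hasNext() {
        drainAndSort();
        return sorted.hasNext();
    }

    @Override
    public T next() {
        drainAndSort();
        return sorted.next();
    }

    public static void main(String[] args) {
        CountingIterator<Integer> source =
                new CountingIterator<>(Arrays.asList(5, 3, 9, 1).iterator());
        SortingIterator<Integer> it =
                new SortingIterator<>(source, Comparator.<Integer>naturalOrder());
        it.hasNext();
        System.out.println("Pulled on first hasNext(): " + source.pulled);
    }
}
```

With 10 million entries upstream, the same pattern would explain one very 
slow first call followed by near-instant calls from the in-memory buffer.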
>> >>>
>> >>> On Fri, Oct 19, 2012 at 2:13 PM, Abhijeet Deshpande <avdes...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Hi Peter,
>> >>>> Thank you for the response.
>> >>>>
>> >>>> Actually this is not related to the very first run of the query, where
>> >>>> the cache is empty.
>> >>>>
>> >>>> The problem is that whenever I search for places and request 100 places
>> >>>> per page in the response, the first call to Pipeline.hasNext() /
>> >>>> Pipeline.next() always takes about 400 sec, and this happens for every
>> >>>> request.
>> >>>>
>> >>>> Hope I am able to convey the problem that I am facing.
>> >>>>
>> >>>> Regards
>> >>>> Abhijeet
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Monday, 15 October 2012 18:35:04 UTC+5:30, Peter Neubauer wrote:
>> >>>>>
>> >>>>> Hi there,
>> >>>>> the first run is probably with cold caches. In real world scenarios,
>> >>>>> you are running with warm caches, so you should try to warm up the
>> >>>>> database by doing a few searches before the real work, or maybe loop
>> >>>>> through your interesting nodes or so?
>> >>>>>
>> >>>>> Cheers,
>> >>>>>
>> >>>>> /peter neubauer
>> >>>>>
>> >>>>> On Mon, Oct 15, 2012 at 2:10 PM, Abhijeet Deshpande
>> >>>>> <avdes...@gmail.com> wrote:
>> >>>>> > Hi
>> >>>>> > I am using neo4j spatial to find locations near a lat/long. The
>> >>>>> > results returned are sorted on OrthodromicDistance and are further
>> >>>>> > paginated using the range function (100 results per page). The code
>> >>>>> > for this is
>> >>>>> >
>> >>>>> > GeoPipeline flowList =
>> >>>>> >     (GeoPipeline) GeoPipeline.startNearestNeighborLatLonSearch(layer, loc, dist)
>> >>>>> >         .sort("OrthodromicDistance")
>> >>>>> >         .copyDatabaseRecordProperties(keys)
>> >>>>> >         .range(low, high);
>> >>>>> >
>> >>>>> > Now to iterate over this GeoPipeline, I am using the hasNext() / next()
>> >>>>> > functions, but for some reason the first call to either of these
>> >>>>> > functions takes a long time to execute.
>> >>>>> >
>> >>>>> > When the application was run, the first call to next() or hasNext()
>> >>>>> > took approximately 400 sec to execute; the subsequent 99 calls took 0 ms.
>> >>>>> >
>> >>>>> > Can someone please point out where the mistake is and whether there is
>> >>>>> > a faster way to find the nearest places? The layer has 10 million entries.
>> >>>>> >
>> >>>>> > Regards
>> >>>>> > Abhijeet
>> >>>>> >
------=_Part_224_17366687.1352313171011--

------=_Part_223_17950529.1352313171009
Content-Type: text/x-java; charset=US-ASCII; name=PlaceImporter_V2.java
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=PlaceImporter_V2.java
X-Attachment-Id: c8b138fd-448b-4409-85b3-cf88500a86b9

package com.raved.neoraved.util;

import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

import org.neo4j.gis.spatial.SimplePointLayer;
import org.neo4j.gis.spatial.SpatialDatabaseRecord;
import org.neo4j.gis.spatial.SpatialDatabaseService;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.index.Index;

import com.vividsolutions.jts.geom.Coordinate;

public class PlaceImporter_V2 {

	public void batchImportPlacesFromFile(String storeDir, String filePath) {
		long totalTime = System.currentTimeMillis();
		GraphDatabaseService graphDatabaseService = new GraphDatabaseFactory().newEmbeddedDatabase(storeDir);
		
		if (graphDatabaseService == null) {
			System.out.println("graphDatabaseService is null");
			return;
		}
		Index<Node> placeIndex = graphDatabaseService.index().forNodes("PLACE");
		
		SpatialDatabaseService spatialDatabaseService = new SpatialDatabaseService(graphDatabaseService);
		
		if(spatialDatabaseService == null){
			System.out.println("spatialDatabaseService is null");
			return;
		}
		int lineNumber = 0;
		long t;
		String id = null;
		String latitude = null;
		String longitude = null;
		String name = null;
		String address = null;
		String city = null;
		String state = null;
		String zip = null;
		SimplePointLayer layer = null;
		Transaction tx = graphDatabaseService.beginTx();
		try{
			layer = spatialDatabaseService.createSimplePointLayer("places");
		
			FileInputStream fstream = new FileInputStream(filePath);
			DataInputStream in = new DataInputStream(fstream);
			BufferedReader br = new BufferedReader(new InputStreamReader(in));
			String strLine;
			
			// Read File Line By Line
			while ((strLine = br.readLine()) != null) {

				try {
					t = System.currentTimeMillis();
					lineNumber++;
					StringTokenizer record = new StringTokenizer(strLine, "|");
					id = getNextToken(record);
					latitude = getNextToken(record);
					longitude = getNextToken(record);
					name = getNextToken(record);
					address = getNextToken(record);
					city = getNextToken(record);
					state = getNextToken(record);
					zip = getNextToken(record);
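					double lat = Double.parseDouble(latitude);
					double lon = Double.parseDouble(longitude);
					// JTS Coordinate takes (x, y), i.e. (longitude, latitude);
					// the search point must be built in the same order.
					Coordinate c = new Coordinate(lon, lat);
					SpatialDatabaseRecord place = layer.add(c);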

					if (id != null){
						place.setProperty("id", id);
						placeIndex.add(place.getGeomNode(), "id", id);
					}

					if (name != null)
						place.setProperty("name", name);

					if (address != null)
						place.setProperty("address", address);

					if (city != null)
						place.setProperty("city", city);

					if (state != null)
						place.setProperty("state", state);

					if (zip != null)
						place.setProperty("zip", zip);

					tx.success();
					if(lineNumber%5000 == 0 ){
						tx.finish();
						tx = graphDatabaseService.beginTx();
					}
					System.out.println("Added Record: "+lineNumber+", "+name+", Time: "+(System.currentTimeMillis()-t)+"ms");
				} catch (NumberFormatException nfe) {
					System.out.println("************* Error lat: "+latitude+", lon: "+longitude);
				} catch (Exception e) {
					e.printStackTrace();
				}
			}
			
		}catch(Exception e){
			e.printStackTrace();
		}
		tx.finish();
		graphDatabaseService.shutdown();
		totalTime = System.currentTimeMillis() - totalTime;
		System.out.println("Time taken for "+lineNumber+" records: "+totalTime+"ms");
	}

	public static void main(String[] args) {
		if(args.length != 2){
			System.out.println("Usage: PlaceImporter_V2 <storeDir> <inputFile>");
			return;
		}
		PlaceImporter_V2 importer = new PlaceImporter_V2();
		importer.batchImportPlacesFromFile(args[0], args[1]);
	}
	
	private static String getNextToken(StringTokenizer tokenizer){
		String ret = null;
		if(tokenizer != null && tokenizer.hasMoreTokens())
			ret = tokenizer.nextToken().trim();
		return ret;
	}
}

------=_Part_223_17950529.1352313171009--