solrpy and "deep paging" ?


ronce...@comcast.net

Nov 24, 2015, 1:12:30 PM
to solrpy
Hi all.  I'm relatively new to Solr.

To date, my Python scripting has made use of the urllib2 library to call into Solr.  Most of the time, I'm looking for a single result, so this is okay.  Occasionally, though, I need to fetch a lot of IDs.  When I know the number is relatively small (< a million), memory is plentiful, and CPU load is low, I'll go ahead and use said Python script to get what I need.

But I know this is bad...

We use a Java-based "solr dumper" tool that makes use of the Lucene libraries when we need to retrieve a huge # of records.  In fact, I'm supposed to use it exclusively and stop using my Python scripts to fetch anything more than a few thousand results in size.

So that's when I started searching for a Python lib/module to handle Solr stuff (why didn't I think of this before...?) and came across these pages and "solrpy".

TL;DR: Does "solrpy" make direct use of the Lucene libs, allowing me to get around the "deep paging" problem of making an HTTP request?

If not, do you know of one that does...?

Thanks.

Andre Hagenbruch

Nov 25, 2015, 1:20:52 AM
to sol...@googlegroups.com
On 23.11.15 at 23:19, ronce...@comcast.net wrote:

Hi,

> TL;DR: Does "solrpy" make direct use of the Lucene libs, allowing me to
> get around the "deep paging" problem of making an HTTP request?
>
> If not, do you know of one that does...?

though I don't think this is implemented in solrpy (yet) you might want
to have a look at
<https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results#PaginationofResults-FetchingALargeNumberofSortedResults:Cursors>.
A Python library that handles this is e.g.
<http://solrclient.readthedocs.org/en/latest/>.
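
For illustration, the cursor approach described on that wiki page can be sketched over plain HTTP roughly as below. This is only a minimal sketch, not solrpy's or SolrClient's API: the helper names (make_http_fetch, iter_ids) are made up for this example, the unique key field is assumed to be called "id", and it assumes a Solr version with cursorMark support (4.7 or later). The request passes cursorMark=* the first time, sorts on the unique key, and then feeds each response's nextCursorMark back in until it stops changing.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def make_http_fetch(select_url):
    """Return a function that sends params to a Solr /select handler.

    select_url is a placeholder for your own core's /select URL.
    """
    def fetch(params):
        with urlopen(select_url + "?" + urlencode(params)) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return fetch


def iter_ids(fetch, query="*:*", id_field="id", rows=1000):
    """Yield every matching ID, one cursor page at a time.

    fetch is any callable that takes a dict of Solr parameters and
    returns the parsed JSON response (hypothetical helper interface).
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        data = fetch({
            "q": query,
            "fl": id_field,
            # Cursors require a sort that includes the unique key field.
            "sort": id_field + " asc",
            "rows": rows,
            "wt": "json",
            "cursorMark": cursor,
        })
        for doc in data["response"]["docs"]:
            yield doc[id_field]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means there are no more results.
            break
        cursor = next_cursor
```

With a real server this would be used along the lines of iter_ids(make_http_fetch("http://localhost:8983/solr/mycore/select")), where the URL is only a placeholder. Because the function is a generator, only one page of IDs is held in memory at a time, but every page is still an HTTP round trip.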

Hope that helps,

Andre

Fred Drake

Nov 25, 2015, 7:07:43 PM
to sol...@googlegroups.com
On Wed, Nov 25, 2015 at 1:20 AM, Andre Hagenbruch <ahage...@gmail.com> wrote:
> though I don't think this is implemented in solrpy (yet) you might want
> to have a look at

It's worth noting that this probably warrants a much different API,
and doesn't seem like it needs to be part of solrpy.

Paging with solrpy is available, but it still relies on the paging
support available over HTTP. The massive responses can be avoided, but
not the use of HTTP. Scaling would still be better using the Lucene
libraries directly.


-Fred

--
Fred L. Drake, Jr. <fred at fdrake.net>
"A storm broke loose in my mind." --Albert Einstein