Reuters Example Crashing w/ Large Dataset


John Rusnak

Jul 11, 2014, 4:10:26 PM
to ajax...@googlegroups.com
I am working with Solr 4.8.1 and a slightly modified version of the Reuters tutorial.
My data set is 15,000 files of varying formats (.txt, .doc, .ppt, .pdf, etc.) and varying sizes (from 1 KB to more than 500,000 KB per file).

I have modified the tutorial only so much as to take out widgets that I am not using, leaving me with only the text search, results, and pager widgets - I do not believe these edits are the root of my issue.
I am successfully able to index documents and display them on the web interface using the evolvingweb/reuters tutorial to a certain point - this is my issue.

When my Solr index grows above a certain size and I try to load the index.html page of my AJAX Solr, the web browser I am using crashes after trying to load the page for some time (this happens in Mozilla & Chrome, and the server keeps running).
I have read that these tutorial widgets are simply designed as examples, not production code - as such, my question is what direction do I need to go in to turn these examples into widgets and/or an interface capable of handling a high volume of data.

Any ideas are appreciated!

James McKinney

Jul 11, 2014, 5:05:07 PM
to ajax...@googlegroups.com
You may want to set an “fl” parameter to limit the fields returned, in particular to exclude whatever field is likely to have the 500MB file. http://wiki.apache.org/solr/CommonQueryParameters#fl

J

John Rusnak

Jul 23, 2014, 3:07:43 PM
to ajax...@googlegroups.com
The fl parameter seems like a plausible solution, but I'm wary of messing with the schema of my index more than I already have. I'm currently looking into ways to either lazily load some of the widgets to take strain off the browser, or possibly only load the search results once a request has been sent to the server (as opposed to loading them all on page load as in the tutorial).

Still working out which files I'll actually have to edit to manage this.
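
A minimal sketch of the "only load results once a request has been sent" idea, assuming the page skips its initial request and wires the search box to a handler like the one below. As far as I understand AJAX Solr, `store.addByValue` and `doRequest` are the calls the tutorial's Manager exposes; the `fakeManager` object here is only a stand-in so the snippet runs outside the browser, and `makeDeferredSearch` is a hypothetical helper name.

```javascript
// Returns a submit handler that sends a Solr request only for a
// non-empty query, instead of requesting everything on page load.
function makeDeferredSearch(manager) {
  return function onSubmit(queryText) {
    const q = queryText.trim();
    if (q === "") {
      return false; // nothing typed yet: send no request at all
    }
    manager.store.addByValue("q", q);
    manager.store.addByValue("rows", "10"); // keep each page of results small
    manager.doRequest(0); // start from the first result
    return true;
  };
}

// Stand-in for the real Manager (assumption, for illustration only).
const sent = [];
const fakeParams = {};
const fakeManager = {
  store: { addByValue(name, value) { fakeParams[name] = value; } },
  doRequest(start) { sent.push({ start: start, q: fakeParams.q }); },
};

const onSubmit = makeDeferredSearch(fakeManager);
console.log(onSubmit("   "));          // false, no request sent
console.log(onSubmit("annual report")); // true, one request sent
```

In the tutorial page the handler would be given the real Manager and called from the text widget's submit event.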

James McKinney

Jul 23, 2014, 3:18:19 PM
to ajax...@googlegroups.com
Setting the fl parameter doesn’t require changing the schema. Just do:

Manager.store.addByValue('fl', 'list of fields to return');
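
To make the effect concrete, here is a rough sketch of the request that ends up being sent once fl is set. The Solr URL, the core name, and the field names are assumptions for illustration, and `buildSelectUrl` is a hypothetical helper, not part of AJAX Solr.

```javascript
// Build a Solr select URL by hand to show what `fl` adds to the request.
function buildSelectUrl(solrUrl, params) {
  const query = new URLSearchParams(params);
  return solrUrl + "select?" + query.toString();
}

const url = buildSelectUrl("http://localhost:8983/solr/collection1/", {
  q: "*:*",
  // Only these fields come back for each document; anything not
  // listed (for example a large body field) is left out of the response.
  fl: "id,resourcename,author,last_modified",
  wt: "json",
});

console.log(url);
```

The schema is untouched; fl only changes which stored fields Solr includes in each response.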

John Rusnak

Jul 31, 2014, 12:10:02 PM
to ajax...@googlegroups.com
I see it - I've managed to change the parameters so it's only passing the data I need but it's still causing the browser to crash.

Manager.store.addByValue('fl', 'resourcename,content,author,last_modified');

I think the large amount of text within the documents (doc.content) that I'm passing is what's causing this. However, the only way I know to get summary snippets is to pass the entire document content.

I may look into highlighting parameters to see if there is another way to pass a summary snippet with less data.
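
A sketch of what the highlighting route might look like, assuming the index has a unique key field named `id` and that `content` is a stored field. `hl`, `hl.fl`, `hl.snippets` and `hl.fragsize` are standard Solr highlighting parameters; `addSnippetParams`, `snippetFor` and the `fakeStore` stand-in are hypothetical names so the snippet runs on its own.

```javascript
// Ask Solr for short highlighted fragments instead of the whole field.
function addSnippetParams(store) {
  store.addByValue("fl", "id,resourcename,author,last_modified"); // no `content`
  store.addByValue("hl", "true");
  store.addByValue("hl.fl", "content");   // field to take fragments from
  store.addByValue("hl.snippets", "1");   // one fragment per document
  store.addByValue("hl.fragsize", "200"); // roughly 200 characters each
}

// Solr returns fragments in a top-level `highlighting` object keyed by
// each document's unique key, separate from the docs themselves.
function snippetFor(response, doc) {
  const entry = (response.highlighting || {})[doc.id] || {};
  const fragments = entry.content || [];
  return fragments.length > 0 ? fragments[0] : "";
}

// Stand-in store and a hand-written sample response (assumptions).
const recorded = {};
const fakeStore = { addByValue(name, value) { recorded[name] = value; } };
addSnippetParams(fakeStore);

const sampleResponse = {
  response: { docs: [{ id: "doc1", resourcename: "report.pdf" }] },
  highlighting: { doc1: { content: ["the <em>quarterly</em> results were"] } },
};
console.log(snippetFor(sampleResponse, sampleResponse.response.docs[0]));
```

In the tutorial the same function would be called with Manager.store, and the result widget would read the fragment instead of doc.content. A match-all query such as `*:*` has no terms to highlight, so the widget would need a fallback for an empty snippet.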

Or I may try to find a way to give a data overload error that will stop the processes before the browser crashes.
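
A rough sketch of such a guard, assuming `content` may arrive either as a string or as an array of strings (multi-valued field). All function names and limits here are hypothetical. Note that this only protects the rendering step: the browser still has to download and parse the full response first, so trimming the response on the Solr side is likely the more effective fix.

```javascript
// Normalize a Solr field value to a single string.
function fieldToString(value) {
  if (Array.isArray(value)) return value.join(" ");
  return typeof value === "string" ? value : "";
}

// Cut a field down to at most maxChars characters before it is rendered.
function safeSnippet(value, maxChars) {
  const text = fieldToString(value);
  return text.length <= maxChars ? text : text.slice(0, maxChars) + "...";
}

// Throw before rendering if the page of results carries too much text.
function assertResponseSize(docs, maxTotalChars) {
  const total = docs.reduce(
    (sum, doc) => sum + fieldToString(doc.content).length,
    0
  );
  if (total > maxTotalChars) {
    throw new Error("Response too large to render: " + total + " characters");
  }
  return total;
}

const docs = [{ content: "a".repeat(1000) }, { content: ["short", "text"] }];
console.log(safeSnippet(docs[0].content, 300).length); // 303
console.log(assertResponseSize(docs, 5000));           // 1010
```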