Advanced Search Tool: Suggestions for the Implementation Phase


Ujitha Perera

Jun 4, 2016, 7:38:06 AM
to plots-gsoc, Jeffrey Warren, David Days, Bryan
Hi all,

After moving the database 'like' queries to a separate service class, we can move on to optimising the search queries without having to worry about the particular requests and responses that reach the search controller. My question is about the next step. We have to choose a suitable mechanism for retrieving search results, and it should support all the required search options. So I would like to propose a few methods that we could proceed with.

1. Treat search as a Rails resource
In this approach we create another resource called 'search' and handle all search-based operations through it. The search model can call the search service, where all the queries are located, so we can manage search requests and retrieve results more efficiently. There is then no need to append new search methods to the search controller, and all validation and filtering can be done in the search model.

This approach writes each user's search request to the database, so we can keep a search history. To keep storage under control, we can run a rake task that removes records older than a month or two.

For testing we can use the usual controller tests and model tests, and verify the search results against the user experience with integration tests. There is nothing new here; it's just the basic Rails structure. We can provide a better solution than the current advanced search by adding more options and filters for the user.
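To make option 1 a little more concrete, here is a minimal sketch in plain Ruby, not actual plots2 code. The names `SearchRecord`, `SearchHistory`, `prune!` and the 100-character limit are all assumptions made for illustration; in the app the record would presumably be an ActiveRecord model and the pruning would be called from a rake task.

```ruby
# Hypothetical stand-in for a row in a `searches` table. In the app this would
# be an ActiveRecord model; plain Ruby keeps the sketch self-contained.
class SearchRecord
  MAX_QUERY_LENGTH = 100 # assumed limit, not taken from plots2

  attr_reader :query, :created_at

  def initialize(query, created_at: Time.now)
    @query = query.to_s.strip
    @created_at = created_at
  end

  # Validation lives on the model, so the controller stays thin.
  def valid?
    !query.empty? && query.length <= MAX_QUERY_LENGTH
  end
end

# In-memory stand-in for the search history table.
class SearchHistory
  SECONDS_PER_DAY = 24 * 60 * 60

  def initialize
    @records = []
  end

  # Stores the search only if it passes validation; returns true or false.
  def record(search)
    return false unless search.valid?

    @records << search
    true
  end

  def size
    @records.size
  end

  # What the proposed cleanup task would do: drop records older than
  # `max_age_days` and return how many were removed.
  def prune!(max_age_days: 30, now: Time.now)
    cutoff = now - max_age_days * SECONDS_PER_DAY
    before = @records.size
    @records.reject! { |r| r.created_at < cutoff }
    before - @records.size
  end
end
```

Keeping validation and pruning on the model side is what lets the controller tests stay small, while the model tests cover the rules themselves.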

2. Use sunspot (with solr)

Solr is a highly reliable, scalable service. Since we deal with a large amount of text-based information (research notes, wiki edits, Q&A, etc.), this library has rich tools for the publiclab search. When we execute all the search queries on top of the Rails platform, search takes a significant amount of time to respond to the user. We had a long conversation about this about two months ago, and my suggestions and proposed implementation plan can be found in my GSoC proposal.

If we use this library, search queries are not performed on top of the Rails platform, and we can execute a wide range of search queries using this gem. It also takes care of sorting.

With the current state of the system this may not be a huge requirement, but if we think about the long run, solr can help the system in various ways.


These are my suggestions for the implementation phase of my project. I would like to hear the pros and cons from all of you. I haven't gone into a long description here; if there are any doubts about these two approaches, I can give more details. It would also be great to hear about any other approaches that could be applied to my requirements.

Thanks,
Ujitha.

 
 
 
 


Jeffrey Warren

Jun 4, 2016, 9:08:43 AM
to Ujitha Perera, Bryan, David Days, plots-gsoc, plot...@googlegroups.com

Great, this is a fantastic comparison of options. I like the testing plan for the first one -- we could also unit test the search model really thoroughly. And I also like the idea of solr because it seems very performant.

But is there any option to hybridize, by making a search resource/model that wraps Solr? Just wondering.

Jeff


David Days

Jun 4, 2016, 9:51:41 AM
to Jeffrey Warren, Ujitha Perera, Bryan, plots-gsoc, plots-dev
I think that we can hybridize (wrap solr) in a fairly straightforward manner.  The search classes and functions would be the front end, and solr would be the internal component to produce the results desired. 

I think that Ujitha is talking about using sunspot/solr directly throughout the project (option #2), or using a rails-based set of classes.  Is this correct, Ujitha?

If it is, then one look at the pros and cons for the approaches can be as follows:

  • Use rails search (either all custom, "wrapper" around sunspot/solr, etc)
    • Pros:  
      • Clear functionality for the development end (only have functions you need and want exposed).  
      • Custom functionality can be built as required.  
      • Can integrate with incremental changes from other projects (the location data discussion for Rich Profiles, for example)
      • Using a wrapper around solr can speed up some of the development (it can take care of deep-down functions like sorting and such)
      • Replacing the underlying search system (add or remove solr, or switch to something else) is easier because the public functionality is untouched--just the supporting code changes.
    • Cons:
      • Every new function has some technical debt, wherein the developer has to understand what the software is doing
      • Custom functionality can be a bit slower to develop, if there's a lot going on under the hood
      • If you use a wrapper to solr, then you need to understand two or three systems (search requirement, plots2, and solr) instead of one.
  • Use a search engine (sunspot/solr) directly throughout the project
    • Pros:
      • All the annoying little details (sorting algorithms, etc) are generally taken care of for you.
      • The search engine guys are usually pretty good at their job--their code is probably going to be faster than something you can come up with quickly.
      • Most common use-cases are taken care of--saves a lot of time and effort.
    • Cons:
      • Need to start any modifications with knowledge of how solr/sunspot works--an additional technical burden.
      • Hard to account for domain-specific requirements (for example, adding a geo location search from Rich Profiles, if solr/sunspot isn't built to handle geo data)
      • At times, using a third party search gets painful, so it's tempting to skip around solr--now you have a mixed system.
I emphasized the points that, to me, have the longest term effects, which may or may not be the most important.

Personally, I'm for using sunspot/solr (or custom code) under the hood (option 1, but as a wrapper around solr).  For me, the emphasis is on stabilizing the public facing endpoints--the RESTful calls and the wrapper search classes.  For long term maintenance, these are the things that will make the biggest difference, because follow-on developers "know" they can rely on a fixed, available function.  If a new set of requirements comes along, then adding a particular function  or capability is straightforward:  Figure out the endpoint or class function(s) that need(s) to be created, then start working under the hood to get the data to fulfill that.
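As a rough illustration of that wrapper idea, here is a minimal sketch in plain Ruby. Every name in it (`SearchService`, `LikeBackend`, `SolrBackend`, `notes`) is hypothetical and not taken from plots2 or sunspot, and the solr side is deliberately left as an unimplemented placeholder rather than a guess at how the integration would look.

```ruby
# Backend 1: stand-in for the existing database "LIKE" queries, run here
# against an in-memory array so the sketch needs no database.
class LikeBackend
  def initialize(documents)
    @documents = documents
  end

  def search(query)
    needle = query.downcase
    @documents.select { |doc| doc[:title].downcase.include?(needle) }
  end
end

# Backend 2: placeholder for a sunspot/solr-backed implementation. It only
# has to answer the same `search(query)` call; the body is left unwritten
# because it depends on how the models end up being indexed.
class SolrBackend
  def search(_query)
    raise NotImplementedError, "solr integration not written yet"
  end
end

# Public-facing wrapper: controllers and other contributors call only this.
class SearchService
  def initialize(backend)
    @backend = backend
  end

  # Always returns an array of hashes with :id and :title, sorted by title,
  # regardless of which backend produced the hits.
  def notes(query)
    cleaned = query.to_s.strip
    return [] if cleaned.empty?

    @backend.search(cleaned)
            .map { |doc| { id: doc[:id], title: doc[:title] } }
            .sort_by { |doc| doc[:title] }
  end
end
```

Callers would only ever see `SearchService#notes`, so swapping `LikeBackend` for a solr-backed class later would not change the public endpoints.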

I'm going to stop the rant right here, in case I'm completely misunderstanding what Ujitha meant.  Ujitha?


Jeffrey Warren

Jun 4, 2016, 9:54:34 AM
to David Days, plot...@googlegroups.com, Bryan, Ujitha Perera, plots-gsoc

If it is what Ujitha meant, then I agree with David: #1 wrapping solr sounds like the best option!

Ujitha Perera

Jun 4, 2016, 11:32:14 AM
to plots-gsoc, je...@publiclab.org, david....@gmail.com, btbo...@gmail.com
Yes, I was talking about two different approaches. After reading David's explanation, I realise that there is an opportunity to hybridise the two methods, so that we get the pros of both.

And thank you, David, for your explanation; I learned something new by reading it. If both David and Jeff are okay with this implementation, I can start developing according to this plan.

Here is the TODO list (correct me if I'm doing any unnecessary tasks).

  • Create the search resource
  • Generate the required model, views and db migrations
  • Modify the current system to run on top of the 'search resource' MVC architecture
  • Add the sunspot gem and set up solr in the system
  • Generate the required indexing fields
  • Develop search queries using sunspot's methods
  • Test using unit and integration tests

Thanks,
Ujitha.


David Days

Jun 4, 2016, 12:08:52 PM
to Ujitha Perera, Jeffrey Warren, plots-gsoc, btbo...@gmail.com

I think that your plan is good, Ujitha.  The search classes give us a chance to get the best of both worlds, and give us some flexibility in the future.

Just to explain a little further, I presented the options as I did based on my experience.  Some organizations choose to go with a full third party search provider, and others like to develop their own.  Each approach has strengths, but the full-third-party search approach is usually better for large, mature businesses that are typically slow to change--banks, large manufacturers, and government orgs typically fall into this category.

Public Lab is a more dynamic organization with some fast-changing systems and a good developer community, so we have the resources to do some of the dirty work ourselves.

Ujitha Perera

Jun 4, 2016, 1:00:39 PM
to David Days, Jeffrey Warren, plots-gsoc, btbo...@gmail.com
Yes, this is a good chance to experiment with this kind of implementation plan. Let's do some coding and then we can assess our progress.

So with this confirmation I'm going to complete the above tasks one by one. If there are any modifications to this plan, we can post them here. Then the community, and Jeff in particular, can stay up to date by reading this thread.

Thanks,
Ujitha.

Jeffrey Warren

Jun 4, 2016, 1:04:15 PM
to Ujitha Perera, David Days, Bryan, plots-gsoc

Super. Did you happen to see the discussion David and I had in your recent pull request on writing some search tests to merge into publiclab/plots2 master beforehand, so that there's a clear, agreed upon API that both you and other contributors can stick to as your branch diverges?

I liked the idea that this would make it easier to merge your changes back in down the road, and help other students remain compliant with the API you're building for.
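One way such an agreed-upon API could be pinned down is a small contract check that any search implementation has to pass. This is only a sketch in plain Ruby: the `notes(query)` method, the `:id`/`:title` result shape, and the `FakeSearch` class are assumptions for illustration, not the API discussed in the pull request.

```ruby
# Hypothetical contract: any search implementation must respond to
# `notes(query)`, return [] for a blank query, and otherwise return an
# array of hashes that each carry :id and :title keys.
# Returns a list of human-readable violations (empty when compliant).
def search_contract_violations(service)
  return ["does not respond to #notes"] unless service.respond_to?(:notes)

  problems = []
  problems << "blank query should return []" unless service.notes("") == []

  results = service.notes("mapping")
  if results.is_a?(Array)
    results.each do |hit|
      unless hit.is_a?(Hash) && hit.key?(:id) && hit.key?(:title)
        problems << "malformed hit: #{hit.inspect}"
      end
    end
  else
    problems << "results should be an Array"
  end
  problems
end

# A trivial fake implementation used only to exercise the contract.
class FakeSearch
  def notes(query)
    return [] if query.strip.empty?

    [{ id: 1, title: "Balloon mapping" }]
  end
end
```

Running the same check against the current implementation on master and against the diverging branch would flag any drift in the result shape before merge time.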

Ujitha Perera

Jun 4, 2016, 1:34:49 PM
to Jeffrey Warren, David Days, Bryan, plots-gsoc
Yes. We need to write enough test cases here. If we are going to split the current search controller into a new, simple controller and a rich search model, I think we can start by writing tests for the new controller and model. Then I can submit a PR directly to publiclab/plots2 with David's help. After that we can move on to the test scripts that David mentioned in his comment on my PR.


Thanks,
Ujitha.

Bryan

Jun 4, 2016, 1:35:11 PM
to Jeffrey Warren, Ujitha Perera, David Days, plots-gsoc
David,

"Public Lab is a more dynamic organization with some fast-changing systems and a good developer community, so we have the resources to do some of the dirty work ourselves." For about 3 months a year, assuming GSoC is in session and Public Lab has a project for the year, then this is true. Otherwise there's usually just one constant developer to build and maintain everything with few and infrequent contributions from outsiders.

I don't think it negates what you are suggesting, I just don't want you to assume the current developer community will exist, say, 4 or 5 months from now.
-Bryan

Jeffrey Warren

Jun 4, 2016, 1:36:51 PM
to Bryan, David Days, Ujitha Perera, plots-gsoc

This is definitely true. Let's also keep in mind that writing good tests and maintaining clean and readable code and github issues can do a lot to grow a coding community, so let's invest in the future!

Thanks all!
