com_search vs. com_finder: Proposal for 3.x

Hannes Papenberg

Jun 22, 2018, 4:04:40 PM
to Joomla! CMS Development
Hi folks,
as some of you might have noticed, I've been busy with com_finder in the
last few days. I've solved quite a few issues in Smart Search (language
stemming, artificial sharding, performance), and I plan to do quite a bit
more.

Since Joomla 2.5 we've shipped with both com_search and com_finder,
basically two "complete" site search implementations. "Complete" because
com_search is actually an awful implementation with pretty bad search
results and performance problems, while com_finder has great search
results but has had several issues (stemmer, performance, usability) ever
since it was added to Joomla.

The changes that I have contributed so far were all against the 4.0
branch, and in the last 2 days one PR stirred up some controversy.
Considering that com_finder is a very good site search implementation, I
proposed dropping com_search for 4.0. That was met with quite some
resistance, mainly with the argument that com_finder isn't widely adopted,
that most people still use com_search, and that we thus cannot drop that
component. I would like to start at that point and see how we can change
that.

I would like to propose backporting at least some of the changes that I
made to Joomla 3.x and promoting com_finder more. Proposed changes:

- Activate the finder plugin by default in new installations
- In sampledata use Smart Search as first/primary search solution
- Add a permanent notice in the com_search backend and in the menu items
suggesting that people have a look at com_finder
- Backport the anti-sharding PR (database changes)
- Add the tuplecount parameter
- Backport the stemmer changes

The first 3 would promote com_finder, the next 2 are for performance,
and the last one is for broader language support.

The downside would be that the last three changes could be considered
breaks in backwards compatibility. The anti-sharding PR changes the DB
schema and rewrites the model entirely; the tuplecount PR would be fairly
okay, but the stemmer change would replace the whole stemming feature in
com_finder. On the other hand, this wouldn't impact any existing finder
plugins or template overrides. I also can't think of a reason why anybody
would override any of the classes that this would touch, or why someone
would rebuild com_finder to the point of using the DB tables that would be
dropped. I'm not sure whether this would be a BC break or not. I would be
interested to hear your opinions on this.

If we backported these changes and promoted Smart
Search/Finder/com_finder a bit more, I'd say that we could drop com_search
in 4.0...

I'm also working on a tool to help develop finder plugins. More on that
when I have something to show.

Regards,
Hannes

Ole Ottosen (ot2sen)

Jun 22, 2018, 4:43:43 PM
to joomla-...@googlegroups.com
Hi Hannes,

First of all, a huge thanks for the great effort you've put into making Finder/Smart Search even better. It certainly moves up a level with the improvements you've added.

Secondly, this controversy was stirred up by only two people, with a third not in favour. In comparison, at least ten times as many active people actually support the improvements.
So it is a good call to ask the wider community for their input.

Perhaps it would be even better to put more power into the 4.0 improvements than to try to backport. Let marketing do their thing when 4.0 is nearly ready, to sell Smart Search as the clever choice.
You could promote the idea of a single strong and powerful search engine in an article for the upcoming July magazine, to let people understand what they could have and why they should have it. Just an idea :)
If finder/smart search is good enough for joomla.org, it sure must be for its users. Rock on mate :)




--
You received this message because you are subscribed to the Google Groups "Joomla! CMS Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to joomla-dev-cms+unsubscribe@googlegroups.com.
To post to this group, send an email to joomla-...@googlegroups.com.
Visit this group at https://groups.google.com/group/joomla-dev-cms.
For more options, visit https://groups.google.com/d/optout.

leo lammerink

Jun 22, 2018, 10:51:01 PM
to 'Hannes Papenberg' via Joomla! CMS Development
A huge thanks from me as well, @Hannes. I like this suggestion very much:
"If we would backport these changes and promote Smart
Search/Finder/com_finder a bit more, I'd say that we can drop com_search
in 4.0", and I understand Ole's point of view. I am in favor of avoiding
backwards compatibility issues where possible... Even in 4.0 we will
find dozens of search modules/components based on com_search or similar,
so if users want to use extensions similar to com_search they will still
find their toy on the JED. I am very much in favor of one search platform
(Finder).

Keep on going Hannes!

Leo

Anibal

Jun 23, 2018, 5:14:02 AM
to Joomla! CMS Development
Hi,

Thank you Hannes for your effort to improve the current search technology of Joomla.

I have been working extensively with com_search and com_finder. In my opinion, both approaches have their benefits. com_search is simple and it just works. com_finder is more complex: it supports full-text search, and it requires a sysadmin aware of the benefits and able to manage an indexation strategy (take into account that the index puts an extra load on the database that must be supported by the host). Since Joomla remains oriented towards satisfying both use cases in J4, it is better to keep both alternatives.

Best Regards,

brian teeman

Jun 23, 2018, 7:11:38 AM
to Joomla! CMS Development


On Saturday, 23 June 2018 10:14:02 UTC+1, Anibal wrote:
and it requires a sysadmin aware of the benefits and able to manage an indexation strategy (take into account that the index puts an extra load on the database that must be supported by the host). 

On what basis do you make this statement? It sounds like typical FUD to me.

Bakual

Jun 23, 2018, 7:43:00 AM
to Joomla! CMS Development
As I am one of those who are opposed to removing com_search in 4.0, I want to add my arguments here as well, so that people discussing here can actually form their opinion based on some facts.
I'm with Anibal here: both search engines have their use cases. They seem to do the same thing at first glance, but if you look closer they perform the search in fundamentally different ways. Both approaches have their own pros and cons. Anibal already listed some; I'm going to add a few more:

Classic search is a "real time" search. Once you press Enter, the plugins perform several SQL queries against various tables, plus whatever else they want to do, to get the results matching the search term. The plugins can do whatever they want with that search term. This makes the search process highly flexible, but also less performant. It also has no impact on the performance of saving an item, since nothing is done at that point.
Smart Search, on the other hand, is an "indexed search", which means the plugins do their work when an item is saved, not when the search is performed. This means they impact the performance of saving an item a bit, but the search process is faster since it has only a few queries to run. It can also support search suggestions, and the search results tend to be better because it searches over all extensions at once and can weight the results, rather than going through them sequentially like classic search.
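To make the "real time" vs. "indexed" distinction concrete, here is a toy model in Python. It is not Joomla code (Joomla is PHP, and neither com_search nor com_finder works on in-memory dicts); the names `realtime_search`, `InvertedIndex` and the sample sources are hypothetical. It only shows where the work happens: the function scans every source when the user searches, while the class pays its cost when an item is saved and then answers with a single lookup.

```python
from collections import defaultdict

# Hypothetical content sources, standing in for what search plugins would query.
SOURCES = {
    "articles": {1: "Joomla smart search indexing", 2: "Classic search runs at query time"},
    "contacts": {10: "Support team for search questions"},
}


def realtime_search(sources, term):
    """Classic-style: every source is scanned at the moment the user searches."""
    term = term.lower()
    results = []
    for name, items in sources.items():  # roughly one query per plugin
        for item_id, text in items.items():
            if term in text.lower():  # nothing was prepared in advance
                results.append((name, item_id))
    return results


class InvertedIndex:
    """Smart Search-style (toy): the work happens on save, the search is a lookup."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {(source, item_id), ...}

    def index_item(self, source, item_id, text):
        # Called when an item is saved; this is the extra cost paid at save time.
        for token in text.lower().split():
            self.postings[token].add((source, item_id))

    def search(self, term):
        # One dictionary lookup covering all sources at once.
        return sorted(self.postings.get(term.lower(), set()))


index = InvertedIndex()
for name, items in SOURCES.items():
    for item_id, text in items.items():
        index.index_item(name, item_id, text)

print(realtime_search(SOURCES, "search"))  # [('articles', 1), ('articles', 2), ('contacts', 10)]
print(index.search("search"))              # [('articles', 1), ('articles', 2), ('contacts', 10)]
```

Result weighting, stemming and search suggestions, which Smart Search layers on top of its index, are deliberately left out. Note also that the substring scan would match "research" for "search", while the token index would not.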

Because of its nature, classic search allows some things which aren't possible with Smart Search. When you perform a "real time" search, you can manipulate the search term before processing it. As an example, in my extension I look up the search term against translated values of an item property (translated Bible book names). There is no way to do that kind of thing with Smart Search.
Also, with com_search you could write a plugin which itself performs a search against an external search API and passes that result back to the Joomla search. E.g. if you integrate an external shop using some sort of bridge, you could have a plugin which searches that external shop and returns the results it finds to the Joomla search.
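A rough sketch of these two query-time tricks, again in Python with made-up names (`BOOK_ALIASES`, `search_with_rewrite`, the `external` callback); it is not the API of Bakual's extension or of com_search. The point is that a real-time plugin sees the raw term at search time, so it can rewrite it through a translation table or forward it to a remote service before anything is matched.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical translation table: localised book names -> canonical key.
BOOK_ALIASES: Dict[str, str] = {
    "genesis": "gen",
    "1 mose": "gen",   # German
    "genèse": "gen",   # French
    "exodus": "exo",
    "2 mose": "exo",   # German
}

# Hypothetical local items, stored only under the canonical key.
VERSES: List[Tuple[str, int, str]] = [
    ("gen", 1, "In the beginning..."),
    ("exo", 3, "The burning bush..."),
]


def search_with_rewrite(
    term: str,
    external: Callable[[str], List[str]] = lambda t: [],
) -> List[str]:
    """Real-time plugin: rewrite the term, search locally, then ask an outside source."""
    normalised = term.strip().lower()
    key = BOOK_ALIASES.get(normalised, normalised)  # rewrite before matching
    local = [f"{book} {chapter}: {text}" for book, chapter, text in VERSES if book == key]
    # The raw term can also be forwarded to a remote API, e.g. a bridged shop.
    return local + external(term)


print(search_with_rewrite("1 Mose"))  # ['gen 1: In the beginning...']
print(search_with_rewrite("Exodus", external=lambda t: [f"shop: {t} poster"]))
# ['exo 3: The burning bush...', 'shop: Exodus poster']
```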

Personally, I can say that Smart Search doesn't even work on my site. The indexing of the Kunena Forum postings never finishes, probably due to the number of items or whatever. Classic Search, on the other hand, works without issues. I'll take the "worse" search results over "no results due to broken indexing" in that case :)

Imho the approach of promoting com_finder is a good idea. Smart Search will be the better choice in a lot of use cases, and it's a shame it's not used more.
However, in my opinion we're already too late in the J3 cycle to reasonably drop com_search with 4.0. Even with backporting stuff to 3.10, the plan is that 3.10 will be released at the same time 4.0 comes out, so the transition period is not that long. There will also be more issues with com_finder to come (e.g. indexing performance on large sites).

Imho a better plan is to promote it now, write better guides, improve the documentation, improve the module descriptions (those are very bad), and whatnot. Then during 4.0 the adoption rate will rise for sure (since Finder IS better than Search in most areas). And if we see that com_search is superfluous (which it isn't as of today), then we can drop it in 5.0.
Keep also in mind that maintaining com_search creates basically no work at all. It didn't get touched a lot the last years, and will likely not be touched the years to come.

So now go and form an educated opinion on the matter :)

Anibal

Jun 23, 2018, 10:28:32 AM
to Joomla! CMS Development
Hi Brian,

As Bakual pointed out for the com_finder case, the number of items is the key factor in when indexing becomes a heavy task. The index structure doesn't grow linearly, since it stores the results for each term before they are searched for, so it takes time to generate it.

As an example, I have been integrating SobiPro search for a long time. SP comes with a native search, and I have developed com_search and com_finder plugins to integrate it with the site's global search. Since the finder plugin indexes a complex data structure, even after many optimizations a medium-sized catalogue of 2,000 items takes 5 minutes to index into 46 MB of finder tables. To keep the index up to date, the integration has to maintain the index incrementally for each entry change, and also run a nightly cron job (a process during which the index is offline while being regenerated).
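The two maintenance paths described here (incremental updates per entry change plus a periodic full regeneration) could look roughly like this in a toy Python index. Class and function names are hypothetical, and this is not the SobiPro or com_finder implementation. Keeping a reverse map from each item to its terms is what lets an edit remove stale postings without a full rebuild.

```python
from collections import defaultdict
from typing import Dict, Set


class IncrementalIndex:
    """Toy term index kept current per item, plus a full rebuild (hypothetical names)."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # term -> item ids
        self.terms_of: Dict[int, Set[str]] = {}  # item id -> terms, to undo old postings

    def upsert(self, item_id: int, text: str) -> None:
        # Incremental path: run on each entry change.
        self.delete(item_id)
        terms = set(text.lower().split())
        for term in terms:
            self.postings[term].add(item_id)
        self.terms_of[item_id] = terms

    def delete(self, item_id: int) -> None:
        for term in self.terms_of.pop(item_id, set()):
            self.postings[term].discard(item_id)
            if not self.postings[term]:
                del self.postings[term]  # drop empty terms so the index does not bloat

    def search(self, term: str) -> Set[int]:
        return set(self.postings.get(term.lower(), set()))


def rebuild(items: Dict[int, str]) -> IncrementalIndex:
    # Periodic path: build a fresh index from all items; the caller swaps it in.
    fresh = IncrementalIndex()
    for item_id, text in items.items():
        fresh.upsert(item_id, text)
    return fresh


idx = IncrementalIndex()
idx.upsert(1, "red bicycle")
idx.upsert(1, "blue bicycle")  # an edit replaces the item's old terms
print(idx.search("red"), idx.search("blue"))  # set() {1}
```

Here `rebuild` builds a separate index that the caller swaps in when it is done; that swap is an assumption of this sketch, whereas the cron job described above takes the index offline while it is regenerated.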

To sum up, com_finder is a great technology implementing full-text search on a SQL database, but it is not for everyone. In a complex scenario, with several extensions and many items to be indexed, optimizing it requires a good knowledge of how it works internally. On the other hand, com_search runs several independent SQL queries to generate the search results, so it returns whatever the database finds and doesn't get the user into an immediate problem.

Talking about the future, I have become convinced that full-text search can't be implemented on a SQL database, and the use case is better implemented on a NoSQL database. On this line of work, Algolia or ElasticSearch are better solutions for the case.


Best Regards,

Michael Babker

Jun 23, 2018, 11:15:58 AM
to joomla-...@googlegroups.com
On Sat, Jun 23, 2018 at 9:28 AM Anibal <anibal....@gmail.com> wrote:
Talking about the future, I have become convinced that full-text search can't be implemented on a SQL database, and the use case is better implemented on a NoSQL database. On this line of work, Algolia or ElasticSearch are better solutions for the case.

Well imagine that.  A platform built to a specific use case is a better option than a general use platform which can be manipulated to support a use case.  Who woulda thought that the built-to-use platform would be better than a general purpose platform?

Honestly, this discussion is going to go nowhere.  Same as the existing discussion on GitHub.  It's not much different than sitting users of SobiPro and Fabrik down to decide which one to use going forward, or K2 versus EasyBlog, or any "competing" extensions which fill the same general use case with different implementations.  People are going to argue their preferred stance until they're blue in the face and no progress is going to be made in figuring out an answer.  We screwed up in 2011 by adding Smart Search to core without an immediate (executable) plan on removing "old" Search, and because this project strongly favors the status quo unless there is a good reason to change the fact of the matter is it's going to be damn near impossible to get a consensus on a plan to do anything but keep both search platforms in the core package, regardless of however many pros or cons or edge cases people can come up with to demonstrate why their preferred platform is better.

A solution of removing one of the search packages from core was proposed on GitHub.  It too has essentially been shot down by the argument that a package not in the core distribution won't be supported by the extension ecosystem, meaning extension developers won't build Search (or Smart Search) integrations simply because it's an opt-in extension.  Personally, I call that FUD and a terrible argument; developers already don't support one (or both) of the search components, and developers already commonly build integrations for opt-in extensions (how many developers have created integrations for Akeeba Backup when it had its restore point feature, or sh404SEF with its plugins, etc.?).

Personally, I think com_search should be moved out of core and kept as a core supported package as install from web and weblinks have been in the past (minus the part where the project teams charged with their maintenance abandon said maintenance task; and for those reading this the download numbers clearly indicate end users are still consuming those extensions so we as a project have failed miserably in keeping up with our code here).  I think com_finder should be the preferred search platform and the only one bundled in 4.0.  I think the "database grows really big" argument is FUD, even if using a proper search platform with an indexed data structure it too is going to have its own large database (truth is any indexed search has an exponentially bigger database than com_search which we have established has a 0 byte requirement beyond what is already in your database).

brian teeman

Jun 23, 2018, 1:13:00 PM
to Joomla! CMS Development
OT, but the weblinks download stats don't show usage. They just show the number of sites that upgraded from 2.x and have religiously installed the update when notified, perhaps without even considering whether it is used and could therefore be uninstalled.

Michael Babker

Jun 23, 2018, 1:24:08 PM
to joomla-...@googlegroups.com
94% of stats are made up and 100% of stats can be manipulated to tell whatever story you like.  Unfortunately I don’t have the time (or accurate enough data most likely) to work out how many downloads are by way of sites being updated and how many are new installations after decoupling.

--
- Michael Please pardon any errors, this message was sent from my iPhone.

Beat

Jun 26, 2018, 7:52:57 PM
to Joomla! CMS Development
First of all, a BIG Thank You Hannes for your hard work to improve com_finder and for trying to move Joomla search to something better.

Would there be a way to offer the best of both worlds to websites?

Most websites have only one searchbox.

In Joomla's case, the site admin must make a tough choice, depending on his use case, website and extensions used.

Both methods offer advantages and disadvantages, depending on extension.

E.g. for content, the new finder advanced search method is probably almost always better. For some other extensions, the old simple search method makes just more sense.

I even have a marketable name suggestion: "Unified Search".

That could comprise simple search (for activated plugins only), advanced search (for activated plugins only) and, thinking about extensibility and a general API, possibly third-party search methods.

By default, in new 4.0 installations, we could activate advanced search for the core extensions that are fully ready, and use unified search by default in the frontend.
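One possible shape for this "Unified Search" idea, sketched in Python under loud assumptions: the `Provider` record, the "simple"/"advanced" labels and `unified_search` are all hypothetical, and the sketch assumes every provider returns scores on a comparable 0..1 scale, which real-time and indexed searches would not give you for free. It queries only enabled providers, keeps the best score per title (deduplicating by title is another assumption) and ranks the merged list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Result = Tuple[float, str]  # (score, assumed normalised to 0..1; title)


@dataclass
class Provider:
    name: str
    method: str  # "simple" (real-time) or "advanced" (indexed): hypothetical labels
    enabled: bool
    search: Callable[[str], List[Result]]


def unified_search(providers: List[Provider], term: str, limit: int = 10) -> List[Result]:
    """Ask every enabled provider, keep the best score per title, rank by score."""
    best: Dict[str, float] = {}
    for provider in providers:
        if not provider.enabled:
            continue  # only activated plugins take part
        for score, title in provider.search(term):
            if score > best.get(title, -1.0):
                best[title] = score
    ranked = sorted(((s, t) for t, s in best.items()), key=lambda r: (-r[0], r[1]))
    return ranked[:limit]


providers = [
    Provider("content", "advanced", True, lambda t: [(0.9, f"Article about {t}")]),
    Provider("forum", "simple", True,
             lambda t: [(0.5, f"Forum thread on {t}"), (0.4, f"Article about {t}")]),
    Provider("shop", "simple", False, lambda t: [(1.0, f"Shop item {t}")]),
]
print(unified_search(providers, "search"))
# [(0.9, 'Article about search'), (0.5, 'Forum thread on search')]
```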

That's just an idea. I have not done any research on it, nor checked whether someone has already proposed it, so sorry if that's the case.

Best regards to all and kudos for all the cool advances and great work on all fronts!
Beat

Wilko Rietveld

Jun 30, 2018, 3:55:40 AM
to Joomla! CMS Development
+1