Yes, the trac search facilities are good, but sometimes not good
enough. Sometimes one likes to search "the whole thing", e.g.
including PDFs in the SVN trunk etc. I'm not sure whether whoosh
addresses this problem.
Searching content in the repository is addressed by the RepoSearch
plugin on trac-hacks, if I'm right.
http://trac-hacks.org/wiki/RepoSearchPlugin
Looking for content inside a non-text file like a .pdf would require an
additional extraction/analysis step.
Also, I don't know if the plugin allows for searching the path names,
which is useful for locating a source file when you have no idea which
subproject or branch it is in ;-)
-- Christian
I could have a go at reviving it.
-- Christian
Sure, fire away, I've given you SVN write. I'd suggest switching to Whoosh.
It would be nice if somebody with write privileges could also provide
some love for http://trac-hacks.org/wiki/SearchAttachmentsPlugin.
Issue http://trac.edgewall.org/ticket/7978 has been fixed, so
custom-patching Trac is no longer necessary, and there are some tickets
in the tracker awaiting review.
--
--anatoly t.
If so, another thing I would suggest working on is a better way to
work on indexing in the background. I tried using this plugin some
time ago, but ran into this problem. If I started the indexing
process on an existing project with a huge repository, it would take
hours upon hours upon hours to do the initial indexing. If I stopped
the indexer, it wouldn't continue where it left off, and would instead
only index new revisions.
I've thought of having a go at making that situation easier to deal
with, but it just hasn't been a high enough priority.
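
A minimal sketch of what "continue where it left off" could look like,
assuming revisions are increasing integers and that the indexer may keep a
small state file next to the index. This is not the plugin's actual code:
index_repository, load_checkpoint, save_checkpoint, the state file and its
"last_indexed_rev" key are all hypothetical names.

```python
import json
import os
import tempfile


def load_checkpoint(state_path):
    """Return the last revision recorded as indexed, or 0 if none."""
    try:
        with open(state_path) as f:
            return json.load(f)["last_indexed_rev"]
    except (OSError, ValueError, KeyError):
        return 0


def save_checkpoint(state_path, rev):
    """Record progress; write to a temp file and rename so a crash
    cannot leave a half-written state file behind."""
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"last_indexed_rev": rev}, f)
    os.replace(tmp, state_path)


def index_repository(revisions, index_one, state_path, batch_size=100):
    """Index revisions in order, skipping those already checkpointed.

    index_one is a hypothetical callable that indexes one revision.
    Returns the number of revisions processed in this run.
    """
    last_done = load_checkpoint(state_path)
    pending = [rev for rev in revisions if rev > last_done]
    for i, rev in enumerate(pending, 1):
        index_one(rev)
        if i % batch_size == 0 or i == len(pending):
            save_checkpoint(state_path, rev)
    return len(pending)
```

If the process is stopped between checkpoints, the revisions since the last
checkpoint are indexed again on the next run, so index_one would need to be
safe to repeat for the same revision.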
Chris Mulligan wrote:
> I thought I should follow up on this ticket.
>
> I've implemented something fairly similar to Alec's Advanced
> Search suggestion - each component listens to its own ChangeListeners
> and indexes the documents at that time. I've only implemented the wiki
> and ticket systems so far, but that's because I just recently became
> comfortable with the approach I'm using.
Any visible code somewhere?
Also, for the changeset and source indexing, I'd suggest that you "wait"
for the changeset notification system that will be implemented on the
multirepos branch, instead of hacking the commit hook or the current
sync method directly (see http://trac.edgewall.org/changeset/7961 and
#7723).
> Right now it (re)creates the full index when you run "trac-admin env
> upgrade," or during the initial setup.
A dedicated trac-admin command would be better - search resync / sync
(similar to repository resync / sync, in the multirepos branch, see
above changeset and http://trac.edgewall.org/changeset/7965).
> I'm doing my testing on a copy of a real internal trac install of ~600
> wiki pages and 1700 tickets. I indexed it on a VMware workstation
> running on my desktop in about 4 minutes, but I haven't considered
> performance at all yet and it was CPU limited. The index is 5.1MB
> (sqlite db is 69MB), so disk space is basically not an issue.
Great!
> I think using straight SQL for that, instead of moving around
> Ticket and Wiki objects, will yield significantly superior performance.
> I haven't noticed a change in commit times when changing pages/tickets.
That might be different for repository source file indexing, though.
We'll see.
-- Christian
Looks like a good start, once I get the full code I'll try it!
However, there are already some remarks I want to make based on this
first patch:
- I don't think it's a good idea to make whoosh a mandatory
requirement, it should stay optional
- Therefore the current db select/like based search code should be kept
somewhere, probably refactored as the fallback search backend
Of course, when whoosh is available, the WhooshSearchSystem should
certainly become the default ISearchBackend. An ISearchBackend interface
could have the methods you gave to the public API of SearchSystem in
your patch, plus a search(query) method and probably also a
search_syntax_help() method for describing the search query syntax.
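
To make the idea concrete, here is a rough sketch of such an interface with
a fallback backend, written as plain Python classes rather than with
trac.core's component machinery. Only the names ISearchBackend, search() and
search_syntax_help() come from the suggestion above; LikeSearchBackend,
whoosh_available(), choose_backend() and the whoosh_factory argument are
hypothetical, and the substring match only imitates what a SQL LIKE query
would do.

```python
class ISearchBackend(object):
    """Methods every search backend would provide (sketch only)."""

    def search(self, query):
        raise NotImplementedError

    def search_syntax_help(self):
        raise NotImplementedError


class LikeSearchBackend(ISearchBackend):
    """Hypothetical fallback: case-insensitive substring matching,
    roughly what LIKE '%term%' does in the database."""

    def __init__(self, documents):
        self.documents = documents  # maps a document id to its text

    def search(self, query):
        terms = query.lower().split()
        return sorted(doc_id for doc_id, text in self.documents.items()
                      if all(term in text.lower() for term in terms))

    def search_syntax_help(self):
        return "Space-separated terms; every term must occur in the text."


def whoosh_available():
    """True if the optional whoosh package can be imported."""
    try:
        import whoosh  # noqa: F401
    except ImportError:
        return False
    return True


def choose_backend(documents, whoosh_factory=None):
    """Prefer the whoosh-based backend when one is supplied and whoosh
    is installed; otherwise fall back to the LIKE-style backend."""
    if whoosh_factory is not None and whoosh_available():
        return whoosh_factory(documents)
    return LikeSearchBackend(documents)
```

With this shape, whoosh stays optional: an installation without it still
gets a working search, only a less capable one.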
Besides, I don't know if you had already taken a look at the
SearchRefactoring page, but at least one idea which I think is worth
reusing from there is the relative ranking of fields (e.g. a match in
the "keywords" fields is worth n times a match in the description, for
example). Is it possible to do this with whoosh?
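
As an illustration of what relative field ranking means, here is a small
sketch that does not use whoosh at all: each field gets a weight, and a
document's score is the weighted count of term occurrences. The weights, the
field names other than "keywords" and "description", and the score() and
rank() helpers are assumptions made up for this example.

```python
# Hypothetical weights: one keyword match counts as much as five
# matches in the description.
FIELD_WEIGHTS = {"keywords": 5.0, "summary": 3.0, "description": 1.0}


def score(doc, terms, weights=FIELD_WEIGHTS):
    """Sum, over all fields, of weight * number of whole-word matches."""
    total = 0.0
    for field, weight in weights.items():
        words = doc.get(field, "").lower().split()
        for term in terms:
            total += weight * words.count(term.lower())
    return total


def rank(docs, query, weights=FIELD_WEIGHTS):
    """Return ids of matching documents, best score first
    (ties broken by id so the order is stable)."""
    terms = query.split()
    scored = [(score(doc, terms, weights), doc["id"]) for doc in docs]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in scored if s > 0]
```

A ticket that only mentions a term in its keywords would then rank above one
that mentions it a couple of times in a long description.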
-- Christian
> Looks like a good start, once I get the full code I'll try it!
Thanks. Here's the full diff I should have had earlier:
http://trac.edgewall.org/attachment/wiki/AdvancedSearch/trac_whoosh_integration_20090323.diff
Someone can delete the earlier one (20090321c.diff).
> However, there are already some remarks I want to make based on this
> first patch:
> - I don't think it's a good idea to make whoosh a mandatory
> requirement, it should stay optional
> - Therefore the current db select/like based search code should be
> kept somewhere, probably refactored as the fallback search backend
>
> Of course, when whoosh is available, the WhooshSearchSystem should
> certainly become the default ISearchBackend. An ISearchBackend
> interface could have the methods you gave to the public API of
> SearchSystem in your patch, plus a search(query) method and probably
> also a search_syntax_help() method for describing the search query
> syntax.
I understand your concerns here. I, personally, disagree. I think that
the concerns about dependencies are valid, but that the wins outweigh
the losses. I also think that a reasonable Python-only dependency
that's worked fine for me in a number of odd environments isn't a big
deal. The cost (in increased lines of code, complexity, decreased
performance and difficulty in future work) is real, and every
individual decision to add, rather than replace, is actually hurting
the project in the long run. That's a bit of a tangent though, sorry!
Suffice it to say that I want better search in Trac, and I don't (yet)
see a strong reason to not switch.
> Besides, I don't know if you had already taken a look at the
> SearchRefactoring page, but at least one idea which I think is worth
> reusing from there is the relative ranking of fields (e.g. a match in
> the "keywords" fields is worth n times a match in the description, for
> example). Is it possible to do this with whoosh?
This is possible, and it already does some of that by default. I
haven't tried to tweak this yet, just been focused on getting it
working.
Thanks again,
Chris
Ok, got it working. It searches ;-)
>> However, there are already some remarks I want to make based on this
>> first patch:
>> - I don't think it's a good idea to make whoosh a mandatory
>> requirement, it should stay optional
>> - Therefore the current db select/like based search code should be
>> kept somewhere, probably refactored as the fallback search backend
>>
>> Of course, when whoosh is available, the WhooshSearchSystem should
>> certainly become the default ISearchBackend. An ISearchBackend
>> interface could have the methods you gave to the public API of
>> SearchSystem in your patch, plus a search(query) method and probably
>> also a search_syntax_help() method for describing the search query
>> syntax.
>>
>
> I understand your concerns here. I, personally, disagree. I think that
> the concerns about dependencies are valid, but that the wins outweigh
> the losses. I also think that a reasonable Python-only dependency
> that's worked fine for me in a number of odd environments isn't a big
> deal. The cost (in increased lines of code, complexity, decreased
> performance and difficulty in future work) is real, and every
> individual decision to add, rather than replace, is actually hurting
> the project in the long run.
I think this doesn't need to be all or nothing, here. One scenario could
be that we introduce the whoosh search support in say 0.12, and if all
goes well, meaning everybody is happy with the new dependency, we drop
the db fallback search in 0.13. So we replace, but without disruption.
Starting the way you did by replacing the db search with the new whoosh
search is fine by me. Once it has stabilized, we can make the search
system modular the way I suggested above and re-add the db based search
as a fallback, before integrating in trunk. Er, by the way, are you
interested in a sandbox branch?
> That's a bit of a tangent though, sorry!
> Suffice it to say that I want better search in Trac, and I don't (yet)
> see a strong reason to not switch.
>
Did those "odd environments" include Windows? ;-)
I saw some issues with the index locking:
WindowsError: [Error 183] Cannot create a file when that file already exists:
...n-search\\trac\\search_index\\_MAIN_LOCK'
That was during my trials to get the tests running, but nevertheless
this could be indicative of more fundamental issues. Trac is a somewhat
complex application with both multithreading and multiprocessing going
on, so things like locking and deadlocks might be sensitive. Something
to keep an eye on...
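
For reference, one common way to guard an index directory across threads
and processes is an exclusively created lock file. The sketch below is
neither whoosh's actual locking code nor a diagnosis of the error above;
acquire_lock(), release_lock() and LockTimeout are hypothetical names, and
it assumes that waiting and retrying is acceptable for the caller.

```python
import errno
import os
import time


class LockTimeout(Exception):
    """Raised when the lock could not be obtained in time."""


def acquire_lock(path, timeout=5.0, poll=0.05):
    """Create the lock file exclusively, retrying until timeout.

    O_CREAT | O_EXCL makes creation fail if the file already exists,
    so only one thread or process can succeed at a time.
    """
    deadline = time.time() + timeout
    while True:
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise  # a real error, not "someone else holds the lock"
            if time.time() >= deadline:
                raise LockTimeout(path)
            time.sleep(poll)
        else:
            os.close(fd)
            return path


def release_lock(path):
    """Remove the lock file so the next waiter can proceed."""
    os.remove(path)
```

A lock file like this is left behind if the holder dies without releasing
it, so a real implementation would also need some way to detect and clear
stale locks.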
>> Besides, I don't know if you had already taken a look at the
>> SearchRefactoring page, but at least one idea which I think is worth
>> reusing from there is the relative ranking of fields (e.g. a match in
>> the "keywords" fields is worth n times a match in the description, for
>> example). Is it possible to do this with whoosh?
>>
>
> This is possible, and it already does some of that by default. I
> haven't tried to tweak this yet, just been focused on getting it
> working.
>
Another thing is that whoosh presents the search results by order of
relevance (I suppose). That's fine, but it would be nice to also support
ordering by most recent first, again a kind of "backward compatible" mode.
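
Something along these lines, as a sketch only: it assumes each result
carries a relevance score and a last-changed timestamp, and the "score" and
"changed" keys as well as order_results() are hypothetical names, not part
of whoosh or Trac.

```python
from datetime import datetime


def order_results(results, by="relevance"):
    """Return results sorted by relevance (default) or most recent first."""
    if by == "relevance":
        return sorted(results, key=lambda r: r["score"], reverse=True)
    if by == "date":
        return sorted(results, key=lambda r: r["changed"], reverse=True)
    raise ValueError("unknown ordering: %r" % (by,))


results = [
    {"id": "ticket:1", "score": 0.9, "changed": datetime(2009, 1, 5)},
    {"id": "wiki:Faq", "score": 0.4, "changed": datetime(2009, 3, 20)},
]
most_relevant = order_results(results)[0]["id"]
most_recent = order_results(results, by="date")[0]["id"]
```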
-- Christian
So what about starting a whoosh-search branch, is anyone else interested
besides me?
-- Christian
/me makes his usual argument that this should be done as a plugin first and
only talk about doing it in core if people find the plugin useful, but let's
face it, does anyone listen to me when I say that anymore?
--Noah
Listen? Yes, certainly.
Agree? Not necessarily. :-)
-- Remy