> To conclude it isn't montezuma's bug but cl-ppcre.
We should report it to the cl-ppcre guys then.
> issue 3, broken :must-not-occur or phrase query
>
> I found query "html-template !\"edi weitz\"" is OK in my test corpus,
> but if I tried query "html-template !edi !test", it tell me:
> "Invalid initialization argument: SCORER in call for class #<STANDARD-
> CLASS DISJUNCTION-SUM-SCORER>.".
>
> It is obvious that there is not slot named scorer in disjunction-sum-
> scorer class, it maybe a typo, scorer should be substituted by sub-
> scorers. I modified it at line 199 of boolean-scorer.lisp, It is OK
> and all unit test are passed too.
Excellent, can you send a patch in diff format?
Leslie
Thanks, but there are still minor problems:
* you have named the test case and files "typo". Why?
That does not seem to be an accurate description of what it does.
* the new file needs to be added to the list of test files
in the ASD.
* please send me a full Git-style patch so I can just say
patch < fix-subscorer.diff
I think "svn diff" will generate a proper patch.
> It seems I can not new post in cl-ppcre google group and I post my
> report into cl-ppcre mail list and looking forward to they response.
I have also answered there.
Independently from that, how about adding a custom tokenizer
that uses a state machine to do the work?
Leslie
> I use typo for I suppose I fix a typo problem.
A typo problem would be a missing character or two,
most frequently a wrong or missing paren.
How about boolean-subscorer.diff for the patch name?
As for the test name I suppose it should be integrated
into some other existing part of the test suite, but
I don't have time to look for this part right now.
Maybe you will find one. If not decide on a good name
that takes into accout the larger structure (e.g.
scorer.lisp)
> but I truly use svn diff to produce the diff file. I have used svn
> checkout a read only version for I'm not a developer of it.
Ah, perhaps you forgot to "svn add" the new file before generating
the diff?
>> Independently from that, how about adding a custom tokenizer
>> that uses a state machine to do the work?
I think you forgot to answer that question...
Leslie
> I think I'm beginning to understand you mean.
> 1. modify boolean-scorer.lisp.
> 2. add a test case named tc-boolean-subscorer.lisp into tests/unit/ search/
> 3. add tc-boolean-subscorer.lisp into montezuma.asd
> 4. svn diff > boolean-subscorer.diff
> Is it correct? :)
Yes, very good! But don't forget "svn add" before diffing.
> Do you mean we should divide the complex regular expression
> into many parts, and one part is a state, then conceive a
> state machine algorithm to do tokenizer work?
Yes. But maybe it will be enough to rewrite the tokenizer
regexp from scratch with proper documentation.
> The _xxx@xxx regular expression should be reserved?
I think we should get rid of it entirely.
> I have uploaded the diff:
> http://montezuma-dev.googlegroups.com/web/boolean-subscorer.diff?gda=-BIZKUgAAAAHcebzgL8O7LT9-wIq0tDqRi_4IpZrQFloDlBWJ2R4QVHkLHU_ZGJOFr8BwKYhqN1ldlWJbCLdb1KzwG_kFsLMGjVgdwNi-BwrUzBGT2hOzg
>
> Please check it.
Almost: the fixture name still says 'typo'.
+(deftestfixture typo
> Besides, I have found the cause of issue1. the bug occur when scorer
> score result documents.To go further, it is the phrase scorer respond
> to phrase term of second boolean clause. Its weight is not normalized.
> We regard the second boolean clause is prohibited so not to normalize
> it.
So why exactly do we get a NIL instead of a proper weight (which
should be an integer I guess)? 1 seems to be a sensible default
assuming the weights are multiplied with each other.
In any case we either need to assign a default weight or make
the phrase scorer handle NIL weights properly.
>> Applied to latest trunk and issue closed. Thank you very much!
> But I have mentioned before it is not a fix for issue1, although at
> the beginning I suppose it was.
Sorry, I got that wrong. Issue re-opened.
> I have studied the scorer algorithm and think there are one solution
> for this issue:
>
> It regards the second clause as prohibited for it is must-not-occur,
> so it does not
> compute its weight for efficient, however later it try to score result
> doc with it for
> interim predicate. We can prevent this issue by compute its weight
> anyway.
>
> How do you think about it?
It seems to me that prohibited clauses shouldn't be weightless
as they're just a negation of a weighted clause.
Is there a good way to get a sensible weight for a prohibited
clause?
Leslie
> (defmethod normalize-weight ((self boolean-weight) norm)
> (let ((query (query self)))
> (setf norm (* norm (boost query)))
> (dosequence (weight (weights self) :index i)
> (let ((clause (elt (clauses query) i)))
> (unless (prohibited? clause)
> (normalize-weight weight norm))))))
>
> It doesn't compute the weight when it find the clause is prohibited,
> so we can invalid this condition.
Do the tests still pass then? Does the weight make sense just by
removing that conditional?
>> Do the tests still pass then? Does the weight make sense just by
>> removing that conditional?
>
> Yes, none of 2358 tests failed.
>
> The weight must be used, it is why we have this issue and I believe
> this trick is only used for efficient.
Okay great, thanks a lot for checking this out. :)
Let's add a proper test case for this then and commit it.
Leslie
> I have upload it and for I can not check in code so this patch
> contains the previous one.
You need to 'svn revert' your old working directory changes, then
do 'svn update' to get them from the repository. After that you
can work from a clean slate.
> I only open a remark test case for issue1, but modify it expectation
> result.
Can you rephrase that sentence?
> There is a test case for issue1 already but it was disable by someone.
> however I suppose its expectation result is wrong, for its query
> require word should occur and must not occur, so the result should be
> nil.
Ah. Do you want to include that in your patch, or do you want me
to take a look at it? In the latter case you need to give
me the name and file of the test.
Thanks a lot!
Leslie