Keyword Index Maintenance


paulh

Oct 19, 2008, 11:52:19 PM
to ResourceSpace
We have been getting some odd search results. After a bit of digging
I found that the resources in question had some unexpected keywords
associated with them. From what I can see in the code, even if I
update the fields in question, these keywords will not be deleted.
This looks like a bug.

My example:

I have a resource with an 'original filename' value of:
genovania(_daughter_of_amelia).jpg

This causes the following keywords to be written to the database:
'genovania', 'daughter', 'amelia', 'jpg' ('of' is skipped)
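A minimal sketch of how a value like this might be split into keywords, assuming any character that is not a letter or digit acts as a separator and that 'of' is on a stopword list. The function name split_keywords and the stopword set are hypothetical; ResourceSpace's actual splitting rules may differ.

```python
import re

# Hypothetical stopword list; the real list in ResourceSpace may differ.
STOPWORDS = {"of", "the", "and", "a", "an", "to", "in"}


def split_keywords(value: str) -> list[str]:
    """Split a field value into lowercase keywords, dropping stopwords."""
    # Any run of characters that is not a letter or digit (including '_',
    # '(' and '.') is treated as a separator.
    tokens = re.split(r"[^0-9a-z]+", value.lower())
    return [t for t in tokens if t and t not in STOPWORDS]


print(split_keywords("genovania(_daughter_of_amelia).jpg"))
# ['genovania', 'daughter', 'amelia', 'jpg']
```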

But if I query the keywords table with:

SELECT r.resource, r.keyword, r.position, r.resource_type_field,
k.keyword
FROM resource_keyword r
JOIN keyword k ON r.keyword = k.ref
WHERE r.resource = 15078
AND r.resource_type_field = 51
ORDER BY r.resource_type_field, r.position;

I find the following additional keywords: 'gspshared', 'photos',
'simpson', 'wajir', 'selection'
(these are the keywords that are spoiling the search results)

How they got there I do not know; presumably a bug somewhere else in
the code.

Now let's say I edit the 'original filename' field to have a value of:
genovania(_daughter_of_amelia).jpgx

The code knows what the value was before and after the edit, and it
uses the keywords of the "before" value to decide which keywords to
delete. It therefore misses the extra keywords, which were never part
of that value. Would it not be better to simply DELETE ALL KEYWORDS
associated with the 'original filename' field?
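To make the difference concrete, here is a small in-memory SQLite simulation of the two strategies. The two tables are a simplified, hypothetical version of resource_keyword and keyword from the query above (not the real ResourceSpace schema), and the keyword lists are taken directly from this post. Deleting by the old value leaves the stray keywords behind; deleting everything for the resource and field does not.

```python
import sqlite3

# Simplified, hypothetical version of the two tables used in the query above.
SCHEMA = """
CREATE TABLE keyword (ref INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
CREATE TABLE resource_keyword (
    resource INTEGER, keyword INTEGER,
    position INTEGER, resource_type_field INTEGER
);
"""

RESOURCE, FIELD = 15078, 51
OLD = ["genovania", "daughter", "amelia", "jpg"]   # keywords of the old value
NEW = ["genovania", "daughter", "amelia", "jpgx"]  # keywords of the new value
STRAYS = ["gspshared", "photos", "simpson", "wajir", "selection"]


def keyword_ref(db, word):
    """Return the ref for a keyword, inserting the keyword if it is new."""
    db.execute("INSERT OR IGNORE INTO keyword (keyword) VALUES (?)", (word,))
    return db.execute("SELECT ref FROM keyword WHERE keyword = ?", (word,)).fetchone()[0]


def add_keywords(db, words):
    """Index words for RESOURCE/FIELD, recording each word's position."""
    for pos, word in enumerate(words):
        db.execute("INSERT INTO resource_keyword VALUES (?, ?, ?, ?)",
                   (RESOURCE, keyword_ref(db, word), pos, FIELD))


def remove_old_value_keywords(db):
    # Behaviour described in the post: delete only the old value's keywords.
    for word in OLD:
        db.execute(
            "DELETE FROM resource_keyword WHERE resource = ? AND resource_type_field = ?"
            " AND keyword = (SELECT ref FROM keyword WHERE keyword = ?)",
            (RESOURCE, FIELD, word))


def remove_all_field_keywords(db):
    # Proposed behaviour: delete every keyword row for this resource and field.
    db.execute("DELETE FROM resource_keyword WHERE resource = ? AND resource_type_field = ?",
               (RESOURCE, FIELD))


def simulate_edit(remover):
    """Index OLD plus strays, apply remover, index NEW, return sorted keywords."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    add_keywords(db, OLD + STRAYS)
    remover(db)
    add_keywords(db, NEW)
    rows = db.execute(
        "SELECT k.keyword FROM resource_keyword r JOIN keyword k ON r.keyword = k.ref"
        " WHERE r.resource = ? AND r.resource_type_field = ?", (RESOURCE, FIELD))
    return sorted(row[0] for row in rows)


print(simulate_edit(remove_old_value_keywords))
# ['amelia', 'daughter', 'genovania', 'gspshared', 'jpgx', 'photos', 'selection', 'simpson', 'wajir']
print(simulate_edit(remove_all_field_keywords))
# ['amelia', 'daughter', 'genovania', 'jpgx']
```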

I hope that is a clear enough description of the problem.

Thanks
Paul

PS. This is related to another post I made about "rebuilding
keyword indexes". Given this drift between the real field values and
the keywords table, wouldn't it be great if we could run a "rebuild
keyword indexes" routine to ensure that the data is correct and clean?
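A rebuild of this kind could be sketched as: throw the whole keyword index away and regenerate it from the stored field values, so stray or orphaned entries cannot survive. This is a minimal in-memory sketch using plain dicts; rebuild_index and the data layout are hypothetical and are not how ResourceSpace stores its index.

```python
import re

STOPWORDS = {"of"}  # hypothetical, deliberately tiny stopword list


def split_keywords(value):
    """Lowercase a value and split it on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t and t not in STOPWORDS]


def rebuild_index(index, field_values):
    """Discard the whole keyword index and regenerate it from stored field values.

    index:        dict mapping (resource, field) -> list of keywords (mutated in place)
    field_values: dict mapping (resource, field) -> current field value
    """
    index.clear()
    for key, value in field_values.items():
        index[key] = split_keywords(value)


values = {(15078, 51): "genovania(_daughter_of_amelia).jpg"}
index = {
    (15078, 51): ["genovania", "daughter", "amelia", "jpg", "gspshared", "photos"],
    (99999, 51): ["orphaned"],  # entry for a value that no longer exists
}
rebuild_index(index, values)
print(index)  # {(15078, 51): ['genovania', 'daughter', 'amelia', 'jpg']}
```

Rebuilding from the source values trades speed for certainty: it does not need to know how the bad entries got there.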

Dan Huby

Oct 29, 2008, 6:42:43 PM
to ResourceSpace

I've found a bug that accounts for this. The function update_field(),
which is used by the exiftool integration and some other areas,
indexed fields regardless of the field's index setting. This is fixed
in r536.
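The fix described here amounts to checking the field's index flag before writing keywords. A minimal sketch of that idea follows; the field refs 3 and 12, the FIELD_INDEXED table, and this update_field signature are all hypothetical and only mirror the behaviour described, not the actual r536 code. The tokenizer is simplified to a whitespace split for brevity.

```python
# Hypothetical field definitions: field ref -> whether keyword indexing is enabled.
FIELD_INDEXED = {3: True, 12: False}


def split_keywords(value):
    """Simplified tokenizer: lowercase and split on whitespace."""
    return value.lower().split()


def update_field(data, index, resource, field, value):
    """Store a field value, indexing it only when the field's index flag is set.

    data:  dict mapping (resource, field) -> value
    index: dict mapping (resource, field) -> list of keywords
    """
    data[(resource, field)] = value
    # Clear anything previously indexed for this field, whatever its origin.
    index.pop((resource, field), None)
    if FIELD_INDEXED.get(field, False):
        index[(resource, field)] = split_keywords(value)


data, index = {}, {}
update_field(data, index, 1, 3, "Market Day")
update_field(data, index, 1, 12, "internal scan notes")
print(index)  # {(1, 3): ['market', 'day']}
```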

Regarding rebuilding keyword indexes, see /pages/tools/reindex.php.

paulh

Nov 9, 2008, 8:11:12 PM
to ResourceSpace
Pedantic note:

The log trace from this page ran from (0/15039) to (15038/15039).

That should presumably run from (1/15039) to (15039/15039).

:)

Dan Huby

Nov 10, 2008, 3:44:17 AM
to ResourceSpace


On 10 Nov, 01:11, paulh <paul.c.hun...@googlemail.com> wrote:
> Pedantic note:
>
> The log trace from this page ran from (0/15039) to (15038/15039).
>
> That should presumably be from 1/ to 15039/

Sorry - log trace? page?

Dan

Dan Huby

Nov 10, 2008, 3:47:46 AM
to ResourceSpace


On 10 Nov, 01:11, paulh <paul.c.hun...@googlemail.com> wrote:
> Pedantic note:
>
> The log trace from this page ran from (0/15039) to (15038/15039).
>
> That should presumably be from 1/ to 15039/

Ah. Sorry. Bit slow this morning! I didn't realise you were talking
about /pages/tools/reindex.php - I thought it was in reference to the
keywords issue.

Yes, it should show $n+1 instead of $n.
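reindex.php itself is PHP, but the off-by-one is easy to illustrate in any language. A minimal Python sketch, assuming the loop counter starts at 0 (as the (0/15039) output implies); progress_labels is a hypothetical name:

```python
def progress_labels(total):
    """Progress labels for a loop whose counter n runs from 0 to total - 1.

    Showing n + 1 rather than n makes the labels run from 1/total to total/total.
    """
    return [f"({n + 1}/{total})" for n in range(total)]


labels = progress_labels(15039)
print(labels[0], labels[-1])  # (1/15039) (15039/15039)
```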

Dan