Multiple Caches and replace document

Bruno Rezende

Feb 17, 2011, 8:16:44 AM
to xappy-discuss
Hi,

suppose I have an index and multiple caches that can be applied to
it. The cache to use will be chosen at search time. In this
scenario, how would incremental indexing be affected? I'm looking at the
IndexerConnection.replace method and have seen this:


if self._index.get_metadata('_xappy_hascache'):
    if store_only:
        # Remove any cached items from the cache - the document is no
        # longer wanted in search results.
        self._remove_cached_items(id, xapid)
    else:
        # Copy any cached query items over to the new document.
        olddoc, olddocid = self._get_xapdoc(id, xapid)
        if olddoc is not None:
            for value in olddoc.values():
                if value.num < self._cache_manager_slot_start:
                    continue
                xapdoc.add_value(value.num, value.value)


We can remove documents from our multiple caches, but I don't know
what I should do when a document is modified.

Richard Boulton

Feb 17, 2011, 8:35:51 AM
to xappy-...@googlegroups.com, Bruno Rezende
On 17 February 2011 13:16, Bruno Rezende <brunovia...@gmail.com> wrote:
> suppose I have an index and multiple caches that can be applied to
> it. The cache to use will be chosen at search time. In this
> scenario, how would incremental indexing be affected? I'm looking at the
> IndexerConnection.replace method and have seen this:

Currently, you can't apply multiple caches to an index; so I assume
you're thinking of patching xappy to do this. I think it should be
quite possible to do; you'll need to cause xappy to allocate a
separate set of value slots for each cache (currently, it's just
hardcoded to allocate slots for the cache based on the query id +
IndexerConnection._cache_manager_slot_start)
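
To make the slot arithmetic concrete, here is a toy sketch of how a patched scheme might give each cache its own block of value slots. It is not xappy code: CACHE_SLOT_START, SLOTS_PER_CACHE, and slot_for are hypothetical names, and the numbers are made up for illustration.

```python
# Toy model only: none of these names exist in xappy.
CACHE_SLOT_START = 10000   # hypothetical stand-in for _cache_manager_slot_start
SLOTS_PER_CACHE = 1000     # hypothetical cap on query ids per cache


def slot_for(cache_index, query_id,
             start=CACHE_SLOT_START, per_cache=SLOTS_PER_CACHE):
    """Return a value slot for (cache, query) so that caches never overlap.

    With a single cache (cache_index 0) this reduces to
    start + query_id, which is the scheme described above.
    """
    if cache_index < 0:
        raise ValueError("cache_index must be non-negative")
    if not 0 <= query_id < per_cache:
        raise ValueError("query_id does not fit in one cache's slot block")
    return start + cache_index * per_cache + query_id


single_cache_slot = slot_for(0, 5)   # same as start + query id
third_cache_slot = slot_for(2, 5)    # same query id, different block
```

The fixed block size is only one possible design; it keeps the mapping trivial at the cost of limiting how many cached queries each cache can hold.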

The current xappy policy is not to update caches when a document is
modified, since there's no way to know where the newly modified
document should be placed in the cached order. Xappy only updates the
cache when a document is deleted (or changed to be marked as
store_only). I don't see why this should be different with multiple
caches - you'll just need to remove the document from each cache.

Perhaps I'm missing what the question is here?

--
Celestial Navigation Limited, incorporated in England & Wales
(registration number 06978117), registered office address: 58
Kingsway, Duxford, Cambridgeshire, CB224QN, UK.

Bruno Rezende

Feb 17, 2011, 10:24:21 AM
to Richard Boulton, xappy-...@googlegroups.com
Hi,

On Thu, Feb 17, 2011 at 11:35 AM, Richard Boulton <ric...@tartarus.org> wrote:
> On 17 February 2011 13:16, Bruno Rezende <brunovia...@gmail.com> wrote:
>> suppose I have an index and multiple caches that can be applied to
>> it. The cache to use will be chosen at search time. In this
>> scenario, how would incremental indexing be affected? I'm looking at the
>> IndexerConnection.replace method and have seen this:
>
> Currently, you can't apply multiple caches to an index; so I assume
> you're thinking of patching xappy to do this.  I think it should be
> quite possible to do; you'll need to cause xappy to allocate a
> separate set of value slots for each cache (currently, it's just
> hardcoded to allocate slots for the cache based on the query id +
> IndexerConnection._cache_manager_slot_start)
>

Hmm... I was thinking of doing something like:

sconn = xappy.SearchConnection(path)
cache_id = _get_cache_id(request)
cache_path = _get_cache_path(cache_id)
cachemanager = xappy.cachemanager.XapianCacheManager(cache_path)
sconn.set_cache_manager(cachemanager)
... do the search and get results from cache ...

I think this should work, right? If I have 10 different cache ids,
would it work? Or would I need to apply the 10 caches to the index if
I want to use them for search?

> The current xappy policy is not to update caches when a document is
> modified, since there's no way to know where the newly modified
> document should be placed in the cached order.  Xappy only updates the
> cache when a document is deleted (or changed to be marked as
> store_only).  I don't see why this should be different with multiple
> caches - you'll just need to remove the document from each cache.
>

the policy wouldn't be different. My question is if I need to do
anything special on each cache.

> Perhaps I'm missing what the question is here?

Nope, probably I'm missing how the cache works :-). I thought that
applying a cache to an index would mean that the cache would be copied
to the index and applying would be just a convenience, since caches
wouldn't be required to be applied to be used. But then I read that
code in IndexerConnection.replace and I don't know what it is supposed
to do. More specifically, I don't know what this code does:

# Copy any cached query items over to the new document.
olddoc, olddocid = self._get_xapdoc(id, xapid)
if olddoc is not None:
    for value in olddoc.values():
        if value.num < self._cache_manager_slot_start:
            continue
        xapdoc.add_value(value.num, value.value)

--
Bruno

Richard Boulton

Feb 17, 2011, 11:11:03 AM
to xappy-...@googlegroups.com, Bruno Rezende
On 17 February 2011 15:24, Bruno Rezende <brunovia...@gmail.com> wrote:
> sconn = xappy.SearchConnection(path)
> cache_id = _get_cache_id(request)
> cache_path = _get_cache_path(cache_id)
> cachemanager = xappy.cachemanager.XapianCacheManager(cache_path)
> sconn.set_cache_manager(cachemanager)
> ... do the search and get results from cache ...
>
> I think this should work, right? If I have 10 different cache ids,
> would it work? Or would I need to apply the 10 caches to the index if
> I want to use them for search?

Yes, it should work. If you don't apply the cache, it will be stored
in a separate index (so there are issues about keeping the cache in
sync with the main index when updating or replicating to worry about),
but the

> the policy wouldn't be different. My question is if I need to do
> anything special on each cache.

If the cache isn't "applied", it won't be notified of document
removals, so you'll have to handle those yourself. Other than that,
nothing special needs to be done.
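
As a rough illustration of "handle those yourself", the sketch below models several unapplied caches as plain dictionaries and drops a deleted document from every one of them. It is a toy model, not the xappy cache manager API: remove_from_caches and the dictionary layout are assumptions made up for this example.

```python
# Toy model only: real caches live in XapianCacheManager databases,
# and this helper is hypothetical.
def remove_from_caches(caches, docid):
    """Remove docid from every cached result list in every cache.

    caches maps cache id -> {query id -> list of doc ids in cached order}.
    The relative order of the remaining documents is preserved.
    """
    for queries in caches.values():
        for query_id, hits in list(queries.items()):
            queries[query_id] = [d for d in hits if d != docid]


caches = {
    "cache_a": {1: ["doc3", "doc7", "doc9"], 2: ["doc7"]},
    "cache_b": {1: ["doc9", "doc3"]},
}

# Call this whenever the document is deleted from the main index
# (or marked store_only), since unapplied caches are not notified.
remove_from_caches(caches, "doc7")
```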

> applying a cache to an index would mean that the cache would be copied
> to the index and applying would be just a convenience, since caches
> wouldn't be required to be applied to be used.

Correct.

> But then I read that
> code in IndexerConnection.replace and I don't know what it is supposed
> to do. More specifically, I don't know what this code does:

This is only needed when the cache has been applied. When a cache is
applied to an index, all the documents which are mentioned in a cache
have values added to them. Each cached query id corresponds to a
slot, and the value stored is the position at which that document
should be returned for the query. These values are used to return the
appropriate cached documents in the result of a search.

If a document is reindexed, the incoming document won't have those
values stored in it, so the code you quote copies the old cache values
into it. If it didn't do this, the document would no longer appear in
cached search results in the appropriate position.
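
The effect of that copy can be shown with a toy model in which a document's values are just a dictionary from slot number to value. This is a sketch under that assumption, not the xappy implementation: copy_cache_values and CACHE_SLOT_START are hypothetical names, and the slot numbers are invented.

```python
# Toy model only: a "document" here is a dict of slot number -> value.
CACHE_SLOT_START = 10000  # hypothetical stand-in for _cache_manager_slot_start


def copy_cache_values(old_values, new_values,
                      cache_slot_start=CACHE_SLOT_START):
    """Return new_values plus any cache slots carried over from old_values.

    Slots below cache_slot_start are ordinary field values and are taken
    from the new document; slots at or above it hold cached positions
    and are copied from the old document.
    """
    merged = dict(new_values)
    for slot, value in old_values.items():
        if slot < cache_slot_start:
            continue
        merged[slot] = value
    return merged


# Old document: one ordinary value plus positions for two cached queries.
old_doc = {0: "2011-02-17", 10003: "5", 10007: "1"}
# Incoming replacement: ordinary values only, no cache slots yet.
new_doc = {0: "2011-02-18"}

merged = copy_cache_values(old_doc, new_doc)
```

After the merge, the replacement keeps its own ordinary value in slot 0 and regains the cached positions in slots 10003 and 10007.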

If the cache isn't applied, the cache values aren't stored in the main
index, so replacing the document in the main index will not cause any
problem.
