Unable to figure out ProcessedDocument's add_term method

Dominic LoBue

Jul 28, 2009, 1:05:28 AM
to xappy-discuss
Hello,

I'm trying to update a ProcessedDocument with new or changed information,
but so far I haven't been able to make any of my changes searchable.

To elaborate, I'm indexing my email, then iterating over the index
records, putting the documents through an email threader and
assigning each thread a UUID. Up to this point everything works and
I've got it all under control.

Where I keep getting tripped up is when I then attempt to add the
thread UUID as a stored and searchable term to all of the members of
that particular thread.
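The per-thread assignment described above (one random UUID per thread, shared by every member) can be sketched in plain Python, independent of xappy. This is a minimal sketch, assuming each thread is a list of messages and reusing the `muuid` lookup rules from the threading code in this post (try `msg['muuid']`, fall back to `msg[1]['muuid']`, unwrap a list). The helper names `message_id` and `assign_thread_ids` are hypothetical.

```python
from uuid import uuid4


def message_id(msg):
    """Return a message's 'muuid' (hypothetical helper).

    Tries msg['muuid'] first, then msg[1]['muuid'] (e.g. for a
    (header, fields) tuple), and unwraps a single-item list.
    """
    try:
        mid = msg['muuid']
    except (KeyError, TypeError):
        mid = msg[1]['muuid']
    if isinstance(mid, list):
        mid = mid[0]
    return mid


def assign_thread_ids(threads):
    """Map every message id to its thread's id (hypothetical helper).

    `threads` is an iterable of lists of messages. One uuid4().hex is
    generated per thread and shared by all of that thread's members.
    """
    assignments = {}
    for thread in threads:
        thread_id = uuid4().hex  # generated once per thread, not per message
        for msg in thread:
            assignments[message_id(msg)] = thread_id
    return assignments
```

For example, `assign_thread_ids([[{'muuid': 'a'}, {'muuid': ['b']}], [('hdr', {'muuid': 'c'})]])` gives `'a'` and `'b'` the same 32-character hex id and `'c'` a different one. Note that every call generates fresh ids, so two runs over the same threads will not agree.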

When I use the add_term method, nothing happens. When I do
processeddoc.data['thread'] = [threadid] and then use the replace
method to update the record, the new field shows up in the visible
records, but isn't searchable.

Here's the code I'm using to thread and then assign the thread UUID to
members:
******************************
import xappy
from uuid import uuid4

class threadIndexer(object):
    def __init__(self):
        xconn = xappy.IndexerConnection('xap.idx')
        xconn.add_field_action('thread', xappy.FieldActions.INDEX_EXACT)
        xconn.add_field_action('thread', xappy.FieldActions.STORE_CONTENT)
        xconn.add_field_action('thread', xappy.FieldActions.SORTABLE,
                               type='string')
        self.writer = xconn
        self.writer.set_max_mem_use(max_mem=256*1024*1024)

        import lazythread
        self.thread = lazythread.lazy_thread()

    def test_ids(self):
        for id in self.writer.iterids():
            print id

    def start(self):
        __r = (self.writer.get_document(x).data for x in
               self.writer.iterids())
        self.thread.thread(__r)

        for thread in self.thread:
            threadid = uuid4().hex
            for msg in thread.messages:
                try: __id = msg['muuid']
                except: __id = msg[1]['muuid']
                if type(__id) is list: __id = __id[0]
                try: __doc = self.writer.get_document(__id)
                except KeyError:
                    print "wtf, couldn't find msg %s... try later" % __id
                    continue
                __doc.add_term('thread', threadid)
                #__doc.data['thread'] = [threadid]
                #__doc.fields.append(xappy.Field('thread', threadid))
                __doc.prepare()
                #print __doc.data
                #self.writer.replace(__doc)

        self.writer.flush()
        self.writer.close()
***********

Can anybody tell me what I'm doing wrong?

Thanks
Dominic

Richard Boulton

Jul 28, 2009, 3:18:30 AM
to xappy-discuss
On Jul 28, 6:05 am, Dominic LoBue <dom.lo...@gmail.com> wrote:
>
> Can anybody tell me what I'm doing wrong?
>

self.writer.replace(__doc) is commented out.

Other than that, it looks about right - if that's not the problem,
perhaps you're not forming the searches for the threadid properly -
post an example of how you're doing a search, and I'll take a look at
that.

--
Richard

Dominic LoBue

Jul 28, 2009, 4:43:39 AM
to xappy-...@googlegroups.com
Ah, there we go.

So the problem was that when I used just add_term, that's all it did: it
added the term to the db. It didn't store the string and show it in the
results like I expected it to. When I tried setting the data value, the
value was stored as a string but wasn't searchable. And since the UUID
is completely random (a new one is generated on every run), what was
searchable and what was showing up in the results were two different
values.
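The mismatch can be reproduced with a toy model in plain Python. This is a hypothetical sketch, not xappy's API: `ToyDoc`, `search`, `tag_separately` and `tag_together` are made-up names, assuming (roughly as with a ProcessedDocument) that the displayed data and the searchable terms live in separate places. Tagging with a fresh random UUID in each of two separate steps leaves the stored value and the indexed term out of step; generating the UUID once and using it for both keeps them matched.

```python
from uuid import uuid4


class ToyDoc(object):
    """Hypothetical stand-in for an indexed document (not xappy's class).

    `data` is what a search result displays; `terms` is what a search matches.
    """

    def __init__(self):
        self.data = {}
        self.terms = set()


def search(docs, field, value):
    """Return the docs whose *terms* match; stored data is never consulted."""
    return [d for d in docs if (field, value) in d.terms]


def tag_separately(doc):
    # Two separate attempts, each generating its own random UUID.
    doc.terms.add(('thread', uuid4().hex))  # like add_term in one run
    doc.data['thread'] = [uuid4().hex]      # like editing .data in another run


def tag_together(doc):
    # One UUID, used for both the searchable term and the stored value.
    thread_id = uuid4().hex
    doc.terms.add(('thread', thread_id))
    doc.data['thread'] = [thread_id]
```

Searching a `tag_separately` document for the UUID shown in its `data` finds nothing, while the same search on a `tag_together` document finds it.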

In short, here's the solution:
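********************************
# inside the per-message loop, after the existing
# __doc.add_term('thread', threadid) call:
__doc.data['thread'] = [threadid]
__doc.prepare()
self.writer.replace(__doc)

# after the loops:
self.writer.flush()
self.writer.close()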
**************************

The add_field_action calls apparently have no effect on already-processed documents :(

Dominic
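The observation that field actions don't help with already-processed documents can be pictured with another toy model. This is a hypothetical sketch, not xappy's implementation: `process`, `Processed`, `INDEX_EXACT` and `STORE_CONTENT` are stand-in names, assuming (as the thread suggests) that actions are applied once, when a document's fields are processed. Anything written into the processed document's `data` afterwards is stored but never turned into a term.

```python
INDEX_EXACT = 'index_exact'      # hypothetical action names for this sketch
STORE_CONTENT = 'store_content'


class Processed(object):
    """Hypothetical processed document: terms to match, data to display."""

    def __init__(self):
        self.terms = set()
        self.data = {}


def process(fields, actions):
    """Apply field actions to (name, value) pairs, once, at processing time.

    `actions` maps a field name to a list of actions for that field.
    """
    doc = Processed()
    for name, value in fields:
        for action in actions.get(name, ()):
            if action == INDEX_EXACT:
                doc.terms.add((name, value))
            elif action == STORE_CONTENT:
                doc.data.setdefault(name, []).append(value)
    return doc
```

With `actions = {'thread': [INDEX_EXACT, STORE_CONTENT]}`, `process([('thread', 'abc')], actions)` yields both a term and stored data, whereas setting `.data['thread'] = ['abc']` on an already-processed document yields stored data with no term. Under this model, a value is both searchable and displayed only if it is present before processing, or if the term is added by hand alongside the data, which matches the combination of `add_term` and `data` used in this thread.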