have just done a quick commit that includes a crude search to let you
start doing some analysis of the likely results. Have a look at:
https://github.com/mediastandardstrust/superfastmatch/blob/master/TODO
to see what is coming next! The key missing bits are the association
task and the associations page, which will mean that you don't have to
do any manual searches and can review all the results in aggregate
form.
Let me know how you get on!
Cheers,
Donny.
Also, this TODO list looks great. I was actually going to ask about
JSON responses, as I have a rudimentary Python client in the works that
I'm finding useful for test purposes, so those will be nice to have.
-James
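For anyone wanting a similar test client, here is a minimal sketch of
how a document load request could be built in Python. The only part
taken from this thread is the PUT /document/<doctype>/<docid>/ path,
which shows up in the server log further down. The base URL, the
form-encoded "title" and "text" fields, and the function names are
assumptions for illustration and may not match the real API.

```python
import urllib.parse
import urllib.request

# Assumption: host and port of the local server; adjust to your setup.
BASE_URL = "http://127.0.0.1:8080"


def build_put_document(doctype, docid, text, title="", base_url=BASE_URL):
    """Build (but do not send) a PUT request for /document/<doctype>/<docid>/."""
    # Assumption: the body is form-encoded with "title" and "text" fields.
    body = urllib.parse.urlencode({"title": title, "text": text}).encode("utf-8")
    url = "{}/document/{}/{}/".format(base_url, doctype, docid)
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def send(req, timeout=30):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Building the request separately from sending it makes it easy to
inspect the URL and body in tests without a running server.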
I'm loading all of the AZ bills that I have and then running a search
with a model bill that I know exists. The results I'm getting are
short strings like "pursuant to the subsection", but I'm not getting
the entire paragraphs that clearly appear in both documents. I've
attached AZ SB 1070 and the relevant model bill for you to look at. As
you can see, entire paragraphs of the model bill (7K5..) appear in
1070. If you could look at why the match algorithm might be missing
these large overlaps, that'd be helpful so we can tune our approach
accordingly.
-James
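As an independent sanity check on what a search ought to return, the
sketch below lists long word runs shared by two texts using Python's
standard difflib. This is not the superfastmatch algorithm, just a
slow reference comparison for a single pair of documents. The function
name and the min_words threshold are made up for illustration.

```python
from difflib import SequenceMatcher


def shared_blocks(a, b, min_words=8):
    """Return runs of at least min_words consecutive words found in both texts."""
    words_a = a.split()
    words_b = b.split()
    # autojunk=False stops difflib from ignoring very common words in long inputs.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    blocks = []
    for m in matcher.get_matching_blocks():
        if m.size >= min_words:
            blocks.append(" ".join(words_a[m.a:m.a + m.size]))
    # Longest shared passages first.
    return sorted(blocks, key=len, reverse=True)
```

Note that difflib only reports blocks that appear in the same order in
both texts, so passages that were reordered between the two documents
can be missed. It is also too slow to run across a whole corpus.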
I pulled the changes and did a make clean and then tried rebuilding
and rerunning with the same test data I had been using.
Unfortunately I'm now getting segfaults during document load,
generally 500-1000 documents in. I hadn't seen this happen before, even
after loading tens of thousands of documents. I'm pretty reliably
getting a segfault now, but not on any particular document.
I'm afraid there isn't that much info I can give besides that right
now. Let me know if there is a way I can help debug if you can't
reproduce it.
2011-08-03T09:18:46.545504-05:00: [INFO]: Queued document:
Document(1,744) for indexing queue id:744 Response Time: 0.0000 secs
2011-08-03T09:18:46.545521-05:00: [INFO]: (127.0.0.1:41350): PUT
/document/1/744/ HTTP/1.1: 202
Segmentation fault
make: *** [run] Error 139
-James
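One way to gather more data across runs is to record the last document
queued before each crash. The sketch below pulls that out of log
output in the format pasted above. The regular expression is based
only on the two log lines shown here, so treat it as an assumption
about the general log format. The function name is hypothetical.

```python
import re

# Matches "Queued document: Document(<doctype>,<docid>)", allowing the
# line wrap that appears between "document:" and "Document(" above.
QUEUED = re.compile(r"Queued document:\s*Document\((\d+),(\d+)\)")


def last_queued(log_text):
    """Return (doctype, docid) of the last queued document, or None if absent."""
    matches = QUEUED.findall(log_text)
    if not matches:
        return None
    doctype, docid = matches[-1]
    return int(doctype), int(docid)
```

Running this over the saved output of several failed loads would show
whether the crash point really is spread across different documents.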