Fwd: [ica-atom-users] Re: Index and search after restoring backup


peterVG

Nov 10, 2009, 6:35:23 PM
to Qubit Toolkit Developers
Hi Vincent,

This sounds like interesting research. Please post any follow-up to
the list, I'd love to learn more about your findings.

Please note that we are using Zend Lucene as the search engine for
ICA-AtoM. This is a PHP port of the Apache Lucene engine, and the index
files it creates can be used by either engine.

Please note that nearly all the ISAD(G) fields are being indexed so
you should be getting search hits on those. Here is a link to the
SearchIndex class which creates the Lucene index document.
http://code.google.com/p/qubit-toolkit/source/browse/trunk/apps/qubit/lib/SearchIndex.class.php

You can see all the fields that are included, starting on line 197.
Please note also that we are adding search boosts to Identifier (line
230), Title (line 236), Creator (line 257), Subject (line 303),
Places (line 309) and Names (line 315).

Cheers,

--peterVG



-------- Original Message --------
Subject: [ica-atom-users] Re: Index and search after restoring backup
Date: Tue, 10 Nov 2009 15:20:05 -0800 (PST)
From: Vincent <jan...@xs4all.nl>
Reply-To: ica-ato...@googlegroups.com
To: ICA-AtoM Users <ica-ato...@googlegroups.com>
References: <168ed09f-cf5c-4a08-9bbd-
bc107c...@j19g2000yqk.googlegroups.com>
<b66358430911061308w4dc...@mail.gmail.com>


Hi Jesús

On 6 nov, 22:08, Jesús García Crespo <cor...@sevein.com> wrote:
>
> I think the "Options not allowed in .htaccess files" message has nothing to
> do with this problem, although it is a known issue and Jack is working on
> it for the 1.1 release.
>

I had already been searching for some information about htaccess (and
it looks interesting), but I will forget that for now. Thanks!

> > When I create a backup from the first application with a database ica1
> > using a command like
> > "$MySQL/bin/mysqldump -Q -uroot -pmysqlpass -hlocalhost ica1 > backup.sql",
> > transfer the data to a second application with a database ica2 using a
> > command like
> > "$MySQL/bin/mysql -uroot -pmysqlpass -hlocalhost ica2 < backup.sql"
> > and execute "php symfony search:populate QubitSearch" in the directory
> > of the latter application... I don't find any results when searching with
> > terms that generate results in the first application.
>
> I followed all the steps you described but I couldn't reproduce this problem
> here.
>
> However, when I ran "php symfony search:populate QubitSearch" in the second
> installation, after the SQL import, I got this error message:
>
> File "/www/icaatom_B/data/index/segments_6" is not readable.
>
> I checked and this file did not exist. I removed all the files in
> /www/icaatom_B/data/index, ran "php symfony search:populate
> QubitSearch" again, and it worked. Now the icaatom_B installation is
> returning search results as expected.
>
> Then I rebuilt the index in the icaatom_A installation and it worked the
> first time.
>
> What output do you get when you run the index rebuild task? You can tell it
> is working because you see something like this:
>
> QubitSearch >> Populating index...
> QubitSearch >> Index erased.
> QubitSearch >> admin inserted.
> QubitSearch >> City of Vancouver. Office of the City Clerk inserted.
> QubitSearch >> City of Vancouver Archives inserted.
> ...
>
> Please remember that the files in data/index are created by the web server,
> so when you run the PHP CLI to rebuild the index you may not be able to
> modify those files or create new ones inside the data directory. I solved
> this by running the PHP CLI through sudo, as the user that runs the httpd
> server on my machine.
>
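
The fix described in the quoted message (empty data/index, then re-run the
populate task as the web server user) could be sketched roughly as below.
This is only a sketch under assumptions: the function names, the DRY_RUN
switch, and the default account www-data are hypothetical and not part of
ICA-AtoM; only the data/index path and the "php symfony search:populate
QubitSearch" command come from the thread. It also assumes a find that
supports -mindepth and -delete (GNU or BSD).

```shell
#!/bin/sh
# Sketch: clear a stale search index and rebuild it for one installation.
# rebuild_index, run and DRY_RUN are hypothetical helpers for illustration.

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_index() {
    app_dir=$1
    web_user=${2:-www-data}   # assumption: the account that runs httpd

    # Remove leftover index files so the populate task starts from an empty index.
    run sudo -u "$web_user" find "$app_dir/data/index" -mindepth 1 -delete

    # Rebuild from the application directory as the web server user,
    # so the new index files stay writable by httpd.
    run sudo -u "$web_user" sh -c "cd '$app_dir' && php symfony search:populate QubitSearch"
}
```

Calling "rebuild_index /www/icaatom_B" only prints the two commands it would
run; setting DRY_RUN=0 first executes them for real. Running both steps as
the web server account is meant to avoid the permission problem mentioned
above.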

I don't get any errors when I run "php symfony search:populate
QubitSearch". The problem is that I can browse archival descriptions,
but I cannot find them when I search for terms that occur inside
them. Searching only seems to work for a few documents, and only
before running any mysql command.

However, I figured out that I can specify terms as access points. When
I search documents and use these terms, I find the archival
descriptions that I have associated with them. It is also possible to
make a backup, transfer it and use these terms in another application,
finding the archival descriptions associated with them. In short, the
search functionality seems to work very well with the access points.

I had expected that the search functionality would unlock the archival
descriptions using all terms inside them (or at least most of them,
maybe excluding some terms in specific sections). At first sight, it
seems to be working this way, for a few documents and before running
any mysql-command. Now I wonder if it is meant to be this way.

My question now: what is searched by the search functionality? Is it
meant to unlock access points and the content of the authority record?
Or does it use all content of the indexed archival descriptions? I
think it is the former; if it is the latter, it does not work very
well yet...

>
> What kind of data are you working with? When I did the fresh installations I
> imported the sample data from http://www.ica-atom.org/demo.xml.
>

I work with a collection from the National Archives of the
Netherlands. For a thesis, I wanted to compare the search
functionality of ICA AtoM and Nutch (from the Apache Foundation).

Nutch is a quite straightforward search engine, meant to index
everything that comes its way. If the emphasis of the search
functionality in ICA AtoM lies on working with access points (which
is a reasonable choice), comparing the two might not make much
sense...

Thanks in advance
Regards,

Vincent Jansen