multiple indexes

Paul Stubbe

Oct 12, 2012, 7:49:12 AM
to dezi-...@googlegroups.com
Hi,

I'm trying to get the Dezi::Tutorial to work with multiple indexes
(because I had multiple indexes with swish2).

In the engine_config (as I read in Dezi::Config) it is possible to define multiple indexes.

  engine_config => {
      default_response_format => 'JSON',
      index                   => [qw( dezi1.index dezi2.index )],
  ...

What else needs to be configured in order to view the results with Dezi::UI?

Paul Stubbe

Oct 12, 2012, 8:38:30 AM
to dezi-...@googlegroups.com
The search works on the two indexes, but the user cannot tell which index a result came from.

I think it would be useful to offer "index" as a piece of meta information, with the descriptions of the indexes as the possible choices.

Peter Karman

Oct 12, 2012, 9:38:27 AM
to dezi-...@googlegroups.com
I know that was a feature of Swish-e. It does not seem to be a feature
of Lucy (on which Dezi depends). In fact, I think it would be difficult
to achieve given how results are retrieved in Lucy.

I faced the same problem just yesterday (distinguishing results from
different indexes) and solved it by adding a new MetaName called
'origin' which I populated at indexing time.


--
Peter Karman . http://peknet.com/ . pe...@peknet.com

Paul Stubbe

Oct 12, 2012, 9:53:33 AM
to dezi-...@googlegroups.com, pe...@peknet.com
Thanks,
 
I will work with your solution.

But could you explain a little more about how best to populate the index with the "origin" MetaName?



Peter Karman

Oct 12, 2012, 10:51:57 AM
to dezi-...@googlegroups.com
What I did was add this to my swish config (swish.conf):

  MetaNames origin

and then, since I was spidering a website, I added a doc-filter like this:

  % cat doc-filter.pl
  sub {
      my $doc = shift;

      my $buf = $doc->content;

      # add origin meta value just before the closing head tag
      $buf =~ s,</head>,<meta name="origin" value="mysite"/></head>,;

      # reset the content
      $doc->content($buf);
  }

and then invoked that filter from the swish3 command line:

  % swish3 -S spider -F lucy \
      -f dezi.index -i http://mysite.foo \
      -c swish.conf --doc_filter doc-filter.pl
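
The filter above can be tried out on its own before running a crawl. This is only a minimal sketch under assumptions: MockDoc is a hypothetical stand-in that models nothing but the content accessor the filter uses, and make_origin_filter is a helper name invented here so the origin label can vary per site. Neither is part of swish3 or Dezi.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the document object handed to the filter.
# It only models the content() getter/setter used by the filter.
package MockDoc;

sub new {
    my ( $class, %args ) = @_;
    my $self = {%args};
    return bless $self, $class;
}

sub content {
    my $self = shift;
    $self->{content} = shift if @_;
    return $self->{content};
}

package main;

# Hypothetical helper: returns a filter sub that performs the same
# substitution as doc-filter.pl, with the origin label as a parameter.
sub make_origin_filter {
    my ($origin) = @_;
    return sub {
        my $doc = shift;
        my $buf = $doc->content;

        # insert the origin meta tag just before the first literal </head>
        $buf =~ s,</head>,<meta name="origin" value="$origin"/></head>,;

        $doc->content($buf);
    };
}

my $doc = MockDoc->new(
    content => '<html><head><title>t</title></head><body>hi</body></html>' );

make_origin_filter('mysite')->($doc);

print $doc->content, "\n";
```

Note that the substitution is case-sensitive and only touches the first literal </head>, so a page with an uppercase closing tag or no head element at all passes through unchanged and gets no origin value.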


If you were indexing static files, you could add that meta content some
other way.

If you wanted Dezi::Client to do it:

  # assumes $dezi_client is an existing Dezi::Client object
  my $dezi_doc = Dezi::Doc->new( uri => 'foobar' );
  $dezi_doc->set_field( 'origin' => 'mysite' );
  $dezi_client->index($dezi_doc);


HTH

Paul Stubbe

Mar 6, 2013, 7:03:16 AM
to dezi-...@googlegroups.com, pe...@peknet.com


Peter,

Can or should I use your new "dezibot" to do the same thing (add meta info)?

What do you propose?

Greetings,

Paul

Peter Karman

Mar 6, 2013, 12:55:53 PM
to Paul Stubbe, dezi-...@googlegroups.com
Paul,

Thanks for the question.

The dezibot implementation is just a wrapper around the swish3 spider,
providing persistent caching and storage using DBI to allow for scaling
to multiple simultaneous crawls.

For your purposes, I still suggest the swish3 spider with doc_filter.
That keeps all your collections in a single index. You could also create
multiple indexes, one for each data store, which is similar to the
technique I'm using now at $work here:

https://www.publicinsightnetwork.org/?s=test

In those results, the Post results are indexed with the
dezi-for-wordpress plugin and the Query results are getting indexed
outside of wordpress. Dezi then serves up 2 indexes (one for each type,
each with same schema) and provides integrated results via the wordpress
plugin on the site.

My Dezi config looks like:

  {   engine_config => {
          index         => [qw( pin.org.index queries.index )],
          parser_config => { query_dialect => 'Lucy', },
          facets        => {
              names => [qw( categories tags author type )],
          },
          do_not_hilite => {
              map { $_ => 1 } qw( permalink type categories tags author )
          },
          cache_ttl => 60,    # only cache facets a short time

          # result attributes in response
          fields => [
              qw( id permalink numcomments categories categoriessrch
                  tags tagssrch author author_s type date modified
                  displaydate displaymodified )
          ],
      },
  }


HTH,
pek