On 3/11/15 7:23 AM, Paul Stubbe wrote:
> - If a page did not get re-indexed after say three weeks I would like Dezi to forget about it.
> ( I can probably add an "indexed date" as meta information and run a process after three
> weeks to delete those links that Dezi finds.)
>
I think you could achieve that pretty well with the Lucy lib directly:
https://metacpan.org/pod/distribution/Lucy/lib/Lucy/Index/Indexer.pod#delete_by_query-query
The "swishlastmodified" field stores the indexing time as an epoch value.
You might try something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use SWISH::Prog::Lucy::Indexer;
use SWISH::Prog::Lucy::Searcher;

my $index = 'dezi.index';

# open the index for writing and get the underlying Lucy indexer
my $indexer      = SWISH::Prog::Lucy::Indexer->new( invindex => $index );
my $lucy_indexer = $indexer->get_lucy;

# cutoff epoch: first command-line argument, or three weeks ago by default
my $three_weeks_ago = shift(@ARGV) || time() - ( 86400 * 21 );

# lower bound of all zeros, same width as the cutoff
my $padded_zeros = '0' x length($three_weeks_ago);
my $query = qq/swishlastmodified=($padded_zeros .. $three_weeks_ago)/;
print "$query\n";

# run the query first to report how many documents match
my $searcher = SWISH::Prog::Lucy::Searcher->new( invindex => $index );
my $results  = $searcher->search($query);
printf( "will delete %s documents\n", $results->hits );

# delete the matching documents and commit the change
$lucy_indexer->delete_by_query( $results->query->as_lucy_query );
$lucy_indexer->commit();
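
The lower bound of zeros is the same width as the cutoff. Here is a rough sketch of why that would matter, assuming the range bounds are compared as strings rather than as numbers (an assumption on my part, the script above does not confirm it). pad_epoch() is a hypothetical helper for illustration only, not part of SWISH::Prog or Lucy, and the epoch values are made-up examples:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# pad_epoch() is a hypothetical helper, not part of SWISH::Prog or Lucy.
# It left-pads an epoch value with zeros to a fixed width.
sub pad_epoch {
    my ( $epoch, $width ) = @_;
    return sprintf( '%0*d', $width, $epoch );
}

my $cutoff = 1424442180;    # example 10-digit epoch value
my $old    = 999999999;     # example 9-digit epoch value

# Compared as strings, the unpadded 9-digit value sorts after the cutoff.
printf "unpadded: %s\n", ( "$old" le "$cutoff" ) ? 'in range' : 'out of range';

# Padded to the same width, string order agrees with numeric order.
my $width = length($cutoff);
printf "padded:   %s\n",
    ( pad_epoch( $old, $width ) le pad_epoch( $cutoff, $width ) )
    ? 'in range'
    : 'out of range';
```

Current epoch values all have ten digits, so this only matters if you pass a cutoff of a different width on the command line.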
--
Peter Karman .
www.peknet.com . @peterkarman