Deziapp usage


raja raja

May 2, 2018, 5:47:42 PM
to dezi-...@googlegroups.com
Hi,

I successfully indexed columnar (tab-separated) data using deziapp, such that each line in the file represents a document:

% deziapp -S MyAggregator -i doc/data.tsv
4384 documents in 00:00:02

Then to search the index, I ran the following query:

% deziapp -q polyester
# deziapp version 0.016
# Format: lucy
# Query: polyester
# Hits: 2
# Search time: 0.0048
294 18 ""
294 19 ""
.

The search output is correct: 18 and 19 are line (document) numbers where the hits match.

Questions:

1. What do 294 and "" represent in the search output above?

2. How to make search case sensitive/insensitive?

3. How to get complete line (i.e. document) in the output where there is a hit matched in the input file?

4. How to view the schema of the index (e.g. column headers)?

5. How to specify AND/OR/NOT/Proximity operators in the query?

6. How to specify a particular column for search? I tried to run the above-mentioned query using the following command (product is the header name of column where 'polyester' is present):
% deziapp -q product:polyester
# deziapp version 0.016
# Format: lucy
# Query: product:polyester
# Hits: 0
# Search time: 0.0042

Thank you!

Peter Karman

May 2, 2018, 6:20:04 PM
to dezi-...@googlegroups.com
raja raja wrote on 5/2/18 4:47 PM:
> The search output is correct: 18 and 19 are line (document) numbers where the hits match.
>


Excellent! Progress.


> Questions:
>
> 1. What do 294 and "" represent in the search output above?


Each result line shows the score, the URI, and the document title, so 294 is the
score and "" is the title. Since you are not setting an explicit "title" or
"swishtitle" field, the title is empty.

You can see the default result format here:
https://metacpan.org/source/KARMAN/Dezi-App-0.016/lib/Dezi/CLI.pm#L360

You can alter the result output format with the -x option. You can read all
about it here:

https://dezi.org/swish-e-docs/SWISH-RUN.pod.html#x-formatstring-extended-output-format
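
To make the default layout concrete, here is a minimal sketch that splits each hit line into score, URI, and title. It assumes the space-separated layout seen in the output above and a URI without spaces; parse_hits is a hypothetical helper name, not part of deziapp.

```shell
# Sample lines copied from the search output earlier in this thread.
sample_output='# deziapp version 0.016
# Format: lucy
# Query: polyester
# Hits: 2
# Search time: 0.0048
294 18 ""
294 19 ""
.'

# parse_hits (hypothetical helper): read default-format deziapp output on
# stdin and print one labeled line per hit.
parse_hits() {
    awk '
        /^#/   { next }    # skip the header comment lines
        /^\.$/ { next }    # skip the end-of-results marker
        NF >= 3 {
            score = $1
            uri = $2
            # Everything after the second field is the quoted title.
            title = $0
            sub(/^[^ ]+ [^ ]+ /, "", title)
            gsub(/^"|"$/, "", title)
            printf "score=%s uri=%s title=%s\n", score, uri, title
        }
    '
}

printf '%s\n' "$sample_output" | parse_hits
# score=294 uri=18 title=
# score=294 uri=19 title=
```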


>
> 2. How to make search case sensitive/insensitive?

You can control the case sensitivity with the config you use. See
https://dezi.org/swish-e-docs/SWISH-CONFIG.pod.html#PropertyNames-list-of-meta-names

You can read up on MetaNames and PropertyNames config in
https://dezi.org/2014/07/18/metanames-and-propertynames/

I highly recommend familiarizing yourself with all the Swish3 configuration
options. Start with

https://dezi.org/swish-e-docs/SWISH-CONFIG.pod.html

You can see some examples here:

https://metacpan.org/source/KARMAN/Dezi-App-0.016/t/config2


>
> 3. How to get complete line (i.e. document) in the output where there is a hit matched in the input file?


Since all the contents of your documents are stored under separate fields
(PropertyNames) you have to ask for each one. The default property is
'swishdescription' so depending on your document layout and config, you might
have everything you want in there.

Otherwise, you'll want to use the -x option to ask for each field:

% deziapp -q yourquery -x '<field1>,<field2>,<field3>'
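
As a rough illustration, the format string can be assembled from a list of property names. This is only a sketch: build_format is a hypothetical helper, the comma separator and the trailing \n escape follow the format string used later in this thread, and the property names are assumed to exist in your index.

```shell
# build_format (hypothetical helper): turn property names into a -x format
# string such as '<id>, <product>\n'.
build_format() {
    fmt=''
    for name in "$@"; do
        if [ -n "$fmt" ]; then
            fmt="$fmt, "
        fi
        fmt="$fmt<$name>"
    done
    # Emit a literal backslash followed by n; the format string itself
    # carries that escape for a newline.
    printf '%s\\n' "$fmt"
}

format=$(build_format swishrank swishtitle product)
printf '%s\n' "$format"

# Run the search only where deziapp is actually installed.
if command -v deziapp >/dev/null 2>&1; then
    deziapp -q polyester -x "$format"
fi
```

Each name passed to the helper must be defined as a PropertyName in the config, otherwise deziapp reports an invalid PropertyName.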

If you want a more programmatic way of getting the field values, you'll want to
write your own searcher. But I would start with deziapp as a development tool.

Alternatively, you can run a Dezi server and serve your index from that. That's
basically an HTTP interface to the same index that deziapp searches.

>
> 4. How to view the schema of the index (e.g. column headers)?


You define those in your config file.
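
For a tab-separated file whose first line holds the column headers, one way to draft those config lines is to derive them from the header row. This is a sketch under assumptions: header_to_config is a hypothetical helper, the sample file mirrors the two-column layout shown in this thread, and it relies on the directives accepting a space-separated list of names.

```shell
# Tiny sample in the layout used in this thread (tab-separated, header first).
tsv=$(mktemp)
printf 'id\tproduct\n001\tsilk\n' > "$tsv"

# header_to_config (hypothetical helper): print MetaNames and PropertyNames
# lines that list every column named in the header row of a TSV file.
header_to_config() {
    names=$(head -n 1 "$1" | tr '\t' ' ')
    printf 'MetaNames %s\n' "$names"
    printf 'PropertyNames %s\n' "$names"
}

header_to_config "$tsv"
# MetaNames id product
# PropertyNames id product

rm -f "$tsv"
```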


>
> 5. How to specify AND/OR/NOT/Proximity operators in the query?


They are what you think they are (AND, OR, NOT). See
https://metacpan.org/pod/Search::Query::Parser#Boolean-connectors


>
> 6. How to specify a particular column for search? I tried to run the above-mentioned query using the following command (product is the header name of column where 'polyester' is present):
> % deziapp -q product:polyester
> # deziapp version 0.016
> # Format: lucy
> # Query: product:polyester
> # Hits: 0
> # Search time: 0.0042
>

You must define 'product' as a MetaName and a PropertyName in order to index and
search on it as a "field".

hth,
pek


--
Peter Karman . he/him/his . 785.337.0405 . https://karpet.github.io/

raja raja

May 3, 2018, 12:22:57 PM
to dezi-...@googlegroups.com
Great, thank you very much, Peter. I appreciate your help. This will keep me busy for a few days. :)
Thanks again!



raja raja

May 4, 2018, 10:01:08 AM
to dezi-...@googlegroups.com
Hi Peter,

One quick question: after indexing with the indexer shown below, I am able to search the index using this command:
% deziapp -q polyester
# deziapp version 0.016
# Format: lucy
# Query: polyester
# Hits: 2
# Search time: 0.0044
588 19 "19"
470 18 "18"
.

But when I ask for the column (PropertyName) in the output format with -x, I get the following error:

% deziapp -q polyester -x '<product>'
# deziapp version 0.016
# Format: lucy
# Query: polyester
# Hits: 2
# Search time: 0.0049
Invalid PropertyName: product
% deziapp -q polyester -x 'text<product>'
# deziapp version 0.016
# Format: lucy
# Query: polyester
# Hits: 2
# Search time: 0.0043
Invalid PropertyName: product

1/ I am guessing I missed something in my indexer code? How and where do I define 'product' as a MetaName and a PropertyName in order to both index and search on it as a "field"? My impression was that 'set_field' does that automatically, but maybe not.

2/ One other side question: how do I pass a config text file to the deziapp command?

--------
# my indexer
package MyAggregator;
use Moose;
extends 'Dezi::Aggregator';

use Dezi::Doc;

# e.g. datasample.tsv format (tab-separated, header line first)
#id product
#001 silk

sub crawl {
    my ( $self, $inputfile ) = @_;

    # Lexical filehandle with three-argument open
    open( my $fh, '<', $inputfile ) or die "Can't open < $inputfile: $!";
    my $header = <$fh>;    # read and discard the header line

    my $count = 0;
    while ( my $line = <$fh> ) {
        chomp $line;
        $count++;
        my @array = split( /\t/, $line );

        # One document per line; the line number serves as URI and title
        my $dezi_doc = Dezi::Doc->new( uri => $count, );
        $dezi_doc->set_field( 'title'   => $count );
        $dezi_doc->set_field( 'id'      => $array[0] );
        $dezi_doc->set_field( 'product' => $array[1] );

        # Serialize the fields as XML and hand them to the indexer
        my $xml = $dezi_doc->as_string_ref;

        my $doc = $self->doc_class->new(
            content => $$xml,
            url     => $count,
            modtime => time(),
            parser  => 'XML*',
            type    => 'application/xml',
            size    => length $$xml,
        );

        $self->indexer->process($doc);
    }
    close($fh);
    return $count;
}

1;

-------

Thank you!




Peter Karman

May 4, 2018, 10:21:09 AM
to dezi-...@googlegroups.com
raja raja wrote on 5/3/18 1:56 PM:

>
> 1/ I am guessing I missed something in my indexer code? How and where do I define 'product' as a MetaName and a PropertyName in order to both index and search on it as a "field"? My impression was that 'set_field' does that automatically, but maybe not.

You are correct: 'set_field' does not do that automatically.

You need a config file like:

MetaName product
PropertyName product

>
> 2/ One other side question: how do I pass a config text file to the deziapp command?
>

If you type:

% deziapp -h

you'll see the full usage statement and options. You want -c or --config.
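
Putting the two answers together, a minimal sketch might look like the following. The config path is a hypothetical choice, the directive names are written in the plural form that is reported to work later in this thread, and combining -c with -S and -i at index time is an assumption rather than something confirmed here.

```shell
# Hypothetical location for the config file.
config="${TMPDIR:-/tmp}/dezi-example.conf"

# Declare 'product' as both a searchable field and a stored property.
cat > "$config" <<'EOF'
MetaNames product
PropertyNames product
EOF

cat "$config"

# Re-index and search only where deziapp is actually installed.
if command -v deziapp >/dev/null 2>&1; then
    # Assumption: -c is passed alongside the indexing options.
    deziapp -c "$config" -S MyAggregator -i doc/data.tsv
    deziapp -q 'product:polyester'
fi
```

The index has to be rebuilt after the config changes, since field definitions are applied when documents are indexed.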

raja raja

May 4, 2018, 1:26:43 PM
to dezi-...@googlegroups.com
Sure, thanks Peter!


raja raja

May 4, 2018, 2:54:57 PM
to dezi-...@googlegroups.com
Hi Peter,

Everything seems to be working great at this stage; I was able to get the desired search output. One small correction to the config you posted (the directive names are plural):

MetaNames product
PropertyNames product

One question: how do I print the URI? The command below gives me an error, and I could not find any mention of printing the URI in the reference:

deziapp -q polyester -x '<uri>, <swishrank>, <swishtitle>, <product>\n'

Thanks and have a great weekend ahead!




Peter Karman

May 4, 2018, 3:03:59 PM
to dezi-...@googlegroups.com
raja raja wrote on 5/4/18 1:54 PM:
> One question, how do I print uri; when I run this command, it gives me an error (in the reference also I could not find a mention on printing uri):
>
> deziapp -q polyester -x '<uri>, <swishrank>, <swishtitle>, <product>\n'
>


Glad the docs helped get you sorted.

I think you want <swishdocpath> rather than <uri>.

See https://dezi.org/2013/02/13/reserved-fields/