raja raja wrote on 5/2/18 4:47 PM:
> Hi,
>
> I successfully indexed columnar data (tab separated) using deziapp, such that each line in the file represents a document:
>
> % deziapp -S MyAggregator -i doc/data.tsv
> 4384 documents in 00:00:02
>
> Then to search the index, I ran the following query:
>
> % deziapp -q polyester
> # deziapp version 0.016
> # Format: lucy
> # Query: polyester
> # Hits: 2
> # Search time: 0.0048
> 294 18 ""
> 294 19 ""
> .
>
> The search output is correct: 18 and 19 are line (document) numbers where the hits match.
>
Excellent! Progress.
> Questions:
>
> 1. What do 294 and "" represent in the search output above?
Each result line shows the score, the URI, and the document title. So 294 is
the score, 18 and 19 are the URIs, and "" is the title. Since you are not
setting an explicit "title" or "swishtitle" field, the title is empty.
You can see the default result format here:
https://metacpan.org/source/KARMAN/Dezi-App-0.016/lib/Dezi/CLI.pm#L360
You can alter the result output format with the -x option. You can read all
about it here:
https://dezi.org/swish-e-docs/SWISH-RUN.pod.html#x-formatstring-extended-output-format
>
> 2. How to make search case sensitive/insensitive?
You can control the case sensitivity with the config you use. See
https://dezi.org/swish-e-docs/SWISH-CONFIG.pod.html#PropertyNames-list-of-meta-names
You can read up on MetaNames and PropertyNames config in
https://dezi.org/2014/07/18/metanames-and-propertynames/
I highly recommend familiarizing yourself with all the Swish3 configuration
options. Start with
https://dezi.org/swish-e-docs/SWISH-CONFIG.pod.html
You can see some examples here:
https://metacpan.org/source/KARMAN/Dezi-App-0.016/t/config2
>
> 3. How to get complete line (i.e. document) in the output where there is a hit matched in the input file?
Since all the contents of your documents are stored under separate fields
(PropertyNames), you have to ask for each one. The default property is
'swishdescription', so depending on your document layout and config, you might
have everything you want in there.
Otherwise, you'll want to use the -x option to ask for each field:
% deziapp -q yourquery -x '<field1>,<field2>,<field3>'
If you want a more programmatic way of getting the field values, you'll want to
write your own searcher. But I would start with deziapp as a development tool.
Alternatively, you can run a Dezi server and serve your index from that. That's
basically an HTTP interface to the same index that deziapp searches.
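
If all you need is the original line for each hit, a rough sketch is to look
the URIs up in the input file yourself. This assumes the URI (second column of
the default output) is a 1-based line number in your TSV file, as it appears to
be in your example. The function name lines_for_hits and the file names are
made up for illustration; they are not part of deziapp.

```shell
# Hypothetical helper: print the full input line for each hit.
# Assumption: the URI in column 2 is a 1-based line number in the TSV file.
lines_for_hits() {
    # $1: file holding saved deziapp output, $2: the original TSV file
    awk '
        NR == FNR {
            # First file: keep only result lines (numeric score and URI);
            # this skips the "#" header lines and the closing "."
            if ($1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/) wanted[$2] = 1
            next
        }
        FNR in wanted   # second file: print lines whose number was a hit
    ' "$1" "$2"
}

# Stand-in data so the sketch runs without an index.
tmp=$(mktemp -d)
printf '# Hits: 2\n294 18 ""\n294 19 ""\n.\n' > "$tmp/hits.txt"
awk 'BEGIN { for (i = 1; i <= 20; i++) print "row" i }' > "$tmp/data.tsv"

lines_for_hits "$tmp/hits.txt" "$tmp/data.tsv"
```

With a real index you would save the search output first, e.g.
"deziapp -q polyester > hits.txt", and pass doc/data.tsv as the second argument.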
>
> 4. How to view the schema of the index (e.g. column headers)?
You define those in your config file, as MetaNames and PropertyNames.
>
> 5. How to specify AND/OR/NOT/Proximity operators in the query?
The boolean operators are what you think they are (AND, OR, NOT). See
https://metacpan.org/pod/Search::Query::Parser#Boolean-connectors
>
> 6. How to specify a particular column for search? I tried to run the above-mentioned query using the following command (product is the header name of column where 'polyester' is present):
> % deziapp -q product:polyester
> # deziapp version 0.016
> # Format: lucy
> # Query: product:polyester
> # Hits: 0
> # Search time: 0.0042
>
You must define 'product' as a MetaName and a PropertyName in order to index and
search on it as a "field".
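
As a minimal sketch, a Swish-e 2 style config for that could look like the one
below. The file name product.conf is made up, and I am assuming your
aggregator emits a field actually named 'product' and that you pass the config
with -c as you would with swish-e. Check the config docs linked above before
relying on it.

```shell
# Write a minimal config (hypothetical file name: product.conf).
cat > product.conf <<'EOF'
# Index "product" as a searchable field, e.g. product:polyester
MetaNames product
# Store the "product" value so it can be returned in results
PropertyNames product
EOF

# Then re-index with the config and retry the fielded query
# (assumption: -c names the config file, as in swish-e):
#   deziapp -c product.conf -S MyAggregator -i doc/data.tsv
#   deziapp -q product:polyester
```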
hth,
pek
--
Peter Karman . he/him/his .
785.337.0405 .
https://karpet.github.io/