Documentation for all properties in sensei.properties


Jayadev Jayaraman

Aug 30, 2013, 6:18:49 PM
to sensei...@googlegroups.com
Is there a reference manual that documents all the properties one can set in sensei.properties? I looked at http://senseidb.github.io/sensei/configuration.html, but I couldn't find such a thing.

Baoqiu Cui

Aug 31, 2013, 9:39:08 AM
to sensei...@googlegroups.com
That page is the right place to find the documentation for sensei.properties. Look for the subsection titled "System Configuration", and you should find the docs you need.

Baoqiu 


Jayadev Jayaraman

Aug 31, 2013, 6:04:16 PM
to sensei...@googlegroups.com
Thanks. I'm trying to figure out how to configure Zoie. When I launch a 2-node cluster (4 shards per node) with no data in it initially, the data directory set in sensei.properties (sensei.index.directory) contains a "node0" or "node1" directory and a "zoieone" directory at the start.

When I access the index using the Sensei web UI, I get this in the logs:

2013/08/31 17:59:32.943 INFO [DefaultDirectoryManager] [] Starting with empty search index: version information not found at /mnt/sensei-index-aug12-2013/node0/shard3/index.directory
2013/08/31 17:59:32.943 INFO [DefaultDirectoryManager] [] Starting with empty search index: version information not found at /mnt/sensei-index-aug12-2013/node0/shard1/index.directory
2013/08/31 17:59:32.943 INFO [DefaultDirectoryManager] [] Starting with empty search index: version information not found at /mnt/sensei-index-aug12-2013/node0/shard2/index.directory
2013/08/31 17:59:32.943 INFO [DefaultDirectoryManager] [] Starting with empty search index: version information not found at /mnt/sensei-index-aug12-2013/node0/shard0/index.directory

This is understandable, as there is initially no data.

When I load data using the "bin/load-index" Python script that comes with Sensei, the "zoieone" directory starts filling up on each machine, while the "node0" and "node1" directories remain empty. When I search for data in the web UI, I get the results I expect, but the logs still complain that the data directories are empty.

Can someone help me figure this out?

Thanks.

Jayadev Jayaraman

Aug 31, 2013, 6:21:00 PM
to sensei...@googlegroups.com
While loading my index, I also get this sort of log message:

2013/08/31 18:18:24.834 ERROR [MultiValueFacetDataCache] [] Maximum value per document: 1024 exceeded, fieldName=<multi-value-field-name>

Is there any way to increase the maximum number of values in a multi-value field past 1024?

Thanks.

Yonghui Zhao

Sep 1, 2013, 5:20:41 AM
to sensei...@googlegroups.com
The index.directory file contains data watermark information, which is defined by the user and passed in by the data provider. It is a short string, for example a time, a Kafka data set, or a sequence ID.
Sensei can read the watermark after a restart, so you can use it for failover work. index.directory is used in data provider mode.

If the index is batch generated, for example by MapReduce, the index won't have this file, so you will see that log message.

Are the "node0" and "node1" directories empty? Maybe the script just reads the data remotely; I am not sure.

You can just copy the data to the local directory, and you can also create an index.directory file containing the watermark if you want to use it.
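
A minimal sketch, not part of the original reply, for listing which shard folders lack an index.directory file (the condition behind the "version information not found" log lines). The function name `missing_watermarks` is hypothetical, and the node0/shardN layout is assumed from the paths shown in the logs.

```shell
#!/usr/bin/env bash
# Sketch only: print every shard folder under a node directory that has no
# index.directory file. The function name is hypothetical.
missing_watermarks() {
  local node_dir=$1 shard_dir
  for shard_dir in "$node_dir"/shard*/; do
    # Skip the literal pattern when no shard folder matches the glob.
    [ -d "$shard_dir" ] || continue
    # Report the shard (without the trailing slash) if the file is absent.
    [ -f "${shard_dir}index.directory" ] || printf '%s\n' "${shard_dir%/}"
  done
}
```

For example, `missing_watermarks /mnt/sensei-index-aug12-2013/node0` would print one line per shard folder that has no watermark file.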




Jayadev Jayaraman

Sep 1, 2013, 3:17:24 PM
to sensei...@googlegroups.com
Hi, thanks for answering.

I did use MapReduce to build a Sensei index, and I uploaded it to Amazon S3. I invoke load-index like this:

bin/load-index s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/$SENSEI_INDEX_LOCATION

In sensei.properties, sensei.index.directory=/mnt/sensei-index-aug12-2013

To illustrate, this is what my Sensei data directory looks like on each cluster node:

# All data in the "zoieone" folder, but no data in "node1".
$ du -sh /mnt/sensei-index-aug12-2013/*
20K     /mnt/sensei-index-aug12-2013/node1
32G     /mnt/sensei-index-aug12-2013/zoieone

# The folders for the 4 shards are created when Sensei starts with no data.
$ du -sh /mnt/sensei-index-aug12-2013/node1/*
4.0K    /mnt/sensei-index-aug12-2013/node1/shard4
4.0K    /mnt/sensei-index-aug12-2013/node1/shard5
4.0K    /mnt/sensei-index-aug12-2013/node1/shard6
4.0K    /mnt/sensei-index-aug12-2013/node1/shard7

# But no data is added to the shard folders when I load the index. All data goes to "zoieone".
# There is no index.directory file in this folder.
$ ls /mnt/sensei-index-aug12-2013/node1/shard4

# The "zoieone" folder contains the data for "node1"
$ du -sh /mnt/sensei-index-aug12-2013/zoieone/*
32G     /mnt/sensei-index-aug12-2013/zoieone/node1

# Data for all 4 shards in "zoieone"
$ du -sh /mnt/sensei-index-aug12-2013/zoieone/node1/*
8.6G    /mnt/sensei-index-aug12-2013/zoieone/node1/shard4
7.1G    /mnt/sensei-index-aug12-2013/zoieone/node1/shard5
8.6G    /mnt/sensei-index-aug12-2013/zoieone/node1/shard6
7.1G    /mnt/sensei-index-aug12-2013/zoieone/node1/shard7

# "Timestamp folder" inside the shard folder in "zoieone"
$ du -sh /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/*
8.6G    /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382

# An "index.directory" file and lucene index files are present inside this "timestamp" folder.
$ du -sh /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/*
6.8G    /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/_25p.fdt
225M    /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/_25p.fdx
4.0K    /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/_25p.fnm
764M    /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/_25p.frq
281M    /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/_25p.prx
5.2M    /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/_25p.tii
545M    /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/_25p.tis
0       /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/committed
4.0K    /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/index.directory
4.0K    /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/segments_1
4.0K    /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/segments.gen

There are timestamp folders inside the shard folders in "zoieone", and these in turn contain the Lucene index files. Each one also contains an "index.directory" watermark file like the one you mentioned, while there is none in the shards of the main index folder.

To get the data into the main index and make it work, do I copy the whole timestamp folder into the main index directory, like this:

cp -R /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382 /mnt/sensei-index-aug12-2013/node1/shard4/

Or do I copy the *contents* of the timestamp folder into the main index directory?

cp -R /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/* /mnt/sensei-index-aug12-2013/node1/shard4/

Or do I need to do something else?

Thanks again.
Jayadev

Yonghui Zhao

Sep 2, 2013, 1:54:42 AM
to sensei...@googlegroups.com
Line 600:

// A copier is configured, so the Zoie factory is wrapped in a pair factory.
if (copier != null) {
  zoieSystemFactory = new SenseiPairFactory(idxDir, dirMode, copier, interpreter, decorator,
      zoieConfig, zoieSystemFactory);
} else if (SENSEI_INDEXER_COPIER_HDFS.equals(indexerCopier)) {
  zoieSystemFactory = new SenseiPairFactory(idxDir, dirMode, new HDFSIndexCopier(),
      interpreter, decorator, zoieConfig, zoieSystemFactory);
}

You defined the copier in sensei.properties, so Sensei uses Zoie pair mode.

As you can see, zoieone holds the batch index that you load from the remote location, while zoietwo is for the delta index from the data provider.

If you only have a batch index, you can remove the index copier config and copy the index to the local directory.

I guess this could work:

cp -R /mnt/sensei-index-aug12-2013/zoieone/node1/shard4/1377986882382/* /mnt/sensei-index-aug12-2013/node1/shard4/
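
A minimal sketch, not part of the original reply, of how the same copy could be repeated for every shard on a node. It assumes that each shard folder under zoieone holds one or more all-digit timestamp folders and that the numerically largest one is the snapshot to copy. The function names `latest_snapshot` and `promote_shards` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: copy the newest snapshot of each shard from the zoieone tree
# into the main index tree. All function names here are hypothetical.

# Print the numerically largest all-digit subdirectory name of $1, if any.
latest_snapshot() {
  local shard_dir=$1 entry name best=""
  for entry in "$shard_dir"/*/; do
    [ -d "$entry" ] || continue
    name=$(basename "$entry")
    case $name in
      ''|*[!0-9]*) continue ;;   # skip names that are not all digits
    esac
    if [ -z "$best" ] || [ "$name" -gt "$best" ]; then
      best=$name
    fi
  done
  if [ -n "$best" ]; then
    printf '%s\n' "$best"
  fi
}

# Copy the contents of each shard's newest snapshot from $1
# (e.g. .../zoieone/node1) into the matching shard folder under $2
# (e.g. .../node1).
promote_shards() {
  local src_node=$1 dst_node=$2 shard_dir shard snap
  for shard_dir in "$src_node"/shard*/; do
    [ -d "$shard_dir" ] || continue
    shard=$(basename "$shard_dir")
    snap=$(latest_snapshot "$shard_dir")
    if [ -z "$snap" ]; then
      echo "no snapshot found in $shard_dir, skipping" >&2
      continue
    fi
    mkdir -p "$dst_node/$shard"
    # "src/." copies the folder contents, not the timestamp folder itself.
    cp -R "$shard_dir$snap"/. "$dst_node/$shard"/
  done
}
```

With the paths from this thread, the call would be `promote_shards /mnt/sensei-index-aug12-2013/zoieone/node1 /mnt/sensei-index-aug12-2013/node1`. Running it while Sensei is stopped is probably the safer choice, though the thread does not say whether that is required.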


Jayadev Jayaraman

Sep 3, 2013, 9:23:16 AM
to sensei...@googlegroups.com
Thank you very much! I tried that and it's working fine now. I am no longer getting that annoying log message, and the gateway is adding data to the index directory.

I have one last question.

While loading my index, I also get this sort of log message:

2013/08/31 18:18:24.834 ERROR [MultiValueFacetDataCache] [] Maximum value per document: 1024 exceeded, fieldName=<multi-value-field-name>

Is there any way to increase the maximum number of values in a multi-value field past 1024?

Thanks.

Jayadev Jayaraman

Sep 3, 2013, 9:52:45 AM
to sensei...@googlegroups.com

John Wang

Sep 3, 2013, 5:16:44 PM
to sensei...@googlegroups.com
Hi Jayadev:

    This limit is encoded deep in how we handle multi-value facets, so it is difficult to change.

    Can you tell me your use case? Maybe we can suggest a workaround.

-John