Why are there 2 config files?


Vineeth Mohan

Oct 23, 2012, 8:29:32 AM
to cleo-ty...@googlegroups.com
Hi,

I am trying to make sense of the config files used in cleo-primer:

vineeth@vineeth-XPS-L501X:~/cleo-primer/cleo-primer$ ls src/main/resources/config/generic-typeahead
i001.config  i002.config
vineeth@vineeth-XPS-L501X:~/cleo-primer/cleo-primer$ du -sch generic-typeahead/*
25M    generic-typeahead/i001
84K    generic-typeahead/i002
26M    total


Here there are 2 config files, and Cleo loads both of them.
Also, all the data given to Cleo is stored in i001, and nothing goes into i002.

Why does Cleo need 2 config files here, and why is a separate instance created for each configuration?

Thanks
           Vineeth

Andrew O'Brien

Oct 23, 2012, 11:01:40 AM
to cleo-ty...@googlegroups.com
Each index essentially claims a partition of the roughly 4 billion (2^32) available IDs. The main (only?) difference between those two config files is the cleo.search.generic.typeahead.config.partition.start value. When you load in companies, they all go into index i001 because their IDs fall between 0 and 1,000,000 (I think the ones in the example top out at around 2,800).
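To make the partitioning idea concrete, here is a minimal Python sketch of routing element IDs to partitions by range. It assumes each partition owns the half-open range [start, start + count). The `Partition` class, the `partition_for` function, and the 1,000,000-ID size given to each partition are hypothetical illustrations, not Cleo's API; the real start value for i002 should be read from i002.config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    start: int  # first element ID this partition owns (cf. ...partition.start)
    count: int  # number of IDs it owns; hypothetical value for illustration

    def owns(self, element_id: int) -> bool:
        # Half-open range: [start, start + count)
        return self.start <= element_id < self.start + self.count


# Hypothetical layout: i001 covers IDs 0..999,999 and i002 the next million.
PARTITIONS = [
    Partition("i001", 0, 1_000_000),
    Partition("i002", 1_000_000, 1_000_000),
]


def partition_for(element_id: int) -> str:
    """Return the name of the partition whose ID range contains element_id."""
    for p in PARTITIONS:
        if p.owns(element_id):
            return p.name
    raise ValueError(f"no partition owns element id {element_id}")


if __name__ == "__main__":
    # Small IDs like the example companies all land in i001.
    for eid in (1, 2800, 1_500_000):
        print(eid, "->", partition_for(eid))
```

With IDs that never exceed a few thousand, every element routes to the first partition, which would explain why i002 stays nearly empty.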

I'd assume they were included in the primer as examples of how to partition larger element sets. From what I can tell, the main benefit is that each partition gets its own writer thread (but I'm not the authority there).

As an aside: it's a good exercise to list the kinds of elements you expect to store and how many of each there are relative to one another. (I found that only a couple of mine reached into the millions and required special partitioning, but YMMV.) For smaller element sets (thousands to tens of thousands), ScannerTypeaheads are probably a better fit (but benchmark before deciding).
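The sizing exercise above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1,000,000-ID partition size and the 100,000-element cutoff for a ScannerTypeahead are hypothetical numbers chosen for the example (benchmark before relying on any cutoff), and `plan` is a made-up helper, not part of Cleo.

```python
import math
from typing import Dict

PARTITION_SIZE = 1_000_000   # hypothetical number of IDs per partition
SCANNER_THRESHOLD = 100_000  # hypothetical cutoff; benchmark before trusting it


def plan(expected_counts: Dict[str, int]) -> Dict[str, str]:
    """Suggest a typeahead setup per element kind, largest kinds first."""
    result: Dict[str, str] = {}
    for kind, n in sorted(expected_counts.items(), key=lambda kv: kv[1], reverse=True):
        if n <= SCANNER_THRESHOLD:
            # Small sets: a ScannerTypeahead is probably good enough.
            result[kind] = "scanner candidate"
        else:
            # Large sets: round up to whole partitions of PARTITION_SIZE IDs.
            result[kind] = f"{math.ceil(n / PARTITION_SIZE)} partition(s)"
    return result


if __name__ == "__main__":
    # Hypothetical element counts, for illustration only.
    print(plan({"company": 2_800, "member": 3_500_000, "skill": 40_000}))
```

Listing the kinds largest-first makes it easy to see which one or two actually need partitioning.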