Hi Chris,
Interesting. I recently put together a weekend hack for storing and
retrieving JSON objects using Krati. Take a look at
https://github.com/jingwei/jsonstore
You may find it helpful for setting up a Krati-based web service or for
understanding how to configure IndexedDataStore.
Besides this, please see my replies inline.
On Apr 23, 1:39 pm, ChrisLamprecht <
clampre...@gmail.com> wrote:
> Hi Jingwei & Krati users,
>
> I'm working on integrating Krati into the IndexTank search engine project
> for document storage. Currently all documents are stored compressed in
> RAM. It would be much more efficient to use a datastore such as BDB or
> Krati to reduce the RAM requirements. I've read the discussions on this
> message group, and I'm not sure on the best way to configure Krati for
> these requirements:
>
> * A hosted search service
> * Each search index (and Krati store) runs in its own JVM
> * Number of documents (key/value pairs) can range from very small (less
> than 25,000) to 50 million or more
> * Key size is always under 1024 bytes, and almost always much smaller
> (under 64 bytes). We can assume keys will fit in main memory for now.
> * Value size can range from around 100 bytes up to 100KB, but is most often
> in the range 300 - 10000 bytes.
> * Read and write frequency can range from almost none to tens per second
>
> The goal is to minimize RAM requirements, while keeping read/write
> performance at a reasonable level.
>
> So I think my questions are:
> - which DataStore to use (IndexedDataStore, DynamicDataStore?)
IndexedDataStore is the better choice here. It is much more efficient at
handling updates.
It holds all the keys in main memory, which matches your assumption that
keys fit in memory.
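
Since all keys stay in main memory, it may help to estimate the raw key
footprint for the index sizes you listed. The following is only a rough
sketch: the class and method names are made up for illustration, and it
counts key bytes only, ignoring whatever per-entry overhead the store and
the JVM add on top. With your numbers, 50 million keys at 64 bytes each
come to about 3.2 GB of raw key data.

```java
public class KeyMemoryEstimate {
    // Raw bytes needed to hold every key in memory.
    // Assumption: per-entry overhead (object headers, index structures)
    // is NOT included, so the real footprint will be larger.
    public static long keyBytes(long numKeys, int avgKeyBytes) {
        return numKeys * avgKeyBytes;
    }

    public static void main(String[] args) {
        // Figures taken from the requirements in the original message.
        long small = keyBytes(25_000L, 64);
        long large = keyBytes(50_000_000L, 64);
        long worst = keyBytes(50_000_000L, 1024);

        double mb = 1024.0 * 1024.0;
        System.out.printf("25K keys x 64 B:   %.1f MB%n", small / mb);
        System.out.printf("50M keys x 64 B:   %.1f MB%n", large / mb);
        System.out.printf("50M keys x 1024 B: %.1f MB%n", worst / mb);
    }
}
```

If the large indexes are dominated by short keys, the 64-byte row is the
one to plan around; the 1024-byte row is only the upper bound.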
> - which Segment type to use (MappedSegment, ChannelSegment,
> WriteBufferSegment?)
WriteBufferSegment is a good choice. As its name suggests, it buffers
writes in memory and has good write throughput.
> - what configuration parameters to use (initLevel, segment size, etc) for
> the DataStore
I prefer to use segments of size 64MB or 128MB.
The initLevel is a bit complicated; instead, you can use StoreConfig with a
specified initialCapacity, which is much clearer. The initialCapacity
cannot be changed once the underlying store is created, so please choose
this parameter based on the estimated size of your data set.
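
Because initialCapacity is fixed at creation time, one way to pick it is to
start from the expected document count and add some headroom. The sketch
below is plain Java and does not call Krati; the class name, the method
names, and the headroom factor are my own assumptions for illustration, not
something Krati prescribes. It also gives a rough count of how many segment
files a data set would fill, counting value bytes only and ignoring any
per-entry or per-segment overhead.

```java
public class StoreSizing {
    // Expected document count padded by a headroom factor (for example 1.5),
    // clamped to the int range. The headroom factor is an assumption.
    public static int initialCapacity(long expectedDocs, double headroom) {
        double padded = Math.ceil(expectedDocs * headroom);
        return (int) Math.min(padded, (double) Integer.MAX_VALUE);
    }

    // Number of segment files needed to hold the values, rounded up.
    // Assumption: only value bytes are counted.
    public static long segmentsNeeded(long docs, long avgValueBytes,
                                      int segmentSizeMB) {
        long totalBytes = docs * avgValueBytes;
        long segmentBytes = segmentSizeMB * 1024L * 1024L;
        return (totalBytes + segmentBytes - 1) / segmentBytes;
    }

    public static void main(String[] args) {
        // 50 million documents with 50% headroom.
        System.out.println("initialCapacity = "
                + initialCapacity(50_000_000L, 1.5));
        // 50 million values of about 5,000 bytes each in 64MB segments.
        System.out.println("64MB segments   = "
                + segmentsNeeded(50_000_000L, 5_000L, 64));
    }
}
```

For the small indexes (under 25,000 documents) the same calculation gives a
much smaller capacity, so it is worth sizing each store from its own
estimate rather than using one value everywhere.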