davidthings wrote:
> Hi,
>
> I've been doing a little investigation into Persevere performance and
> behavior with big tables. Mostly I've been finding that there are a
> few places where there are assumptions that might not be optimal:
>
> At least in version r388, it seems like all objects in all Classes
> are loaded into memory at startup. I have my Java VM set to max out
> at around 400MB, and with that it looks like somewhere around 400,000
> single integer attribute objects is about as many as I can get.
>
No, that shouldn't be true. I have created tables with 1,000,000+ objects
with multiple properties, and it starts in just a couple seconds with
about 23MB of memory (the normal usage). Persevere should only load
objects into memory as needed. Are you sure you don't have anything in
your startup that is somehow iterating through and loading every object?
> This means
> a) that start up can be very slow - O( n x m ) where n is number of
> records, m is number of attributes. (>20mins!)
> b) that start up can fail when there are a lot of objects in memory
> (crashing with heap errors)
>
> It seems like possibly this load all objects thing might happen in
> other places as well (index updates?). Around the same limit (400,000
> single integer objects) creation of new objects gets a bit tricky and
> at least once I've seen a failure with memory problems.
>
Rebuilding the indexes requires that all the objects be processed, but
they don't need to all be in memory at one time for this to take place.
They should be freed after they are processed.
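
As a rough illustration of processing every object without holding them
all in memory, here is a minimal Python sketch. It is not Persevere's
actual indexing code: iter_objects and rebuild_index are hypothetical
names, and the object source is simulated with a generator.

```python
from collections import defaultdict

def iter_objects(count):
    """Hypothetical object source: yields one record at a time, never a full list."""
    for object_id in range(count):
        yield {"id": object_id, "value": object_id % 10}

def rebuild_index(objects, prop):
    """Build a value -> [ids] index, retaining only ids and not the objects."""
    index = defaultdict(list)
    for obj in objects:
        index[obj[prop]].append(obj["id"])
        # No reference to obj is kept, so it can be garbage collected
        # once the loop moves on to the next record.
    return index

index = rebuild_index(iter_objects(1000), "value")
print(len(index), len(index[3]))
```

Only the index grows with the table here. Each object is reachable for a
single loop iteration.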
> It seems like indexing all attributes by default is maybe not quite
> the right thing. If I have a lot of objects that are simply { value:
> [integer value] }, I might prefer that the DB not index them until I
> issue a request that might indicate that I'm ever going to access
> 'value' in any search-like way. I wonder about assuming no index on
> an attribute until someone does a query that might use it, unless the
> 'index' meta-value is set in which case it is respected.
>
It actually is adaptive. Indexes that aren't used will become inactive,
and won't be updated.
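
One way such adaptive behavior could work is sketched below in Python.
This is only an assumption about the general idea, not how Persevere
implements it: the AdaptiveIndex class, the idle_limit threshold, and the
rebuild-on-query step are all hypothetical.

```python
class AdaptiveIndex:
    """Toy index that stops being maintained after too many writes without a query."""

    def __init__(self, prop, idle_limit=3):
        self.prop = prop
        self.idle_limit = idle_limit  # hypothetical threshold, not a Persevere setting
        self.writes_since_query = 0
        self.active = True
        self.entries = {}  # value -> set of object ids

    def on_write(self, obj):
        if not self.active:
            return  # inactive indexes are skipped, so writes stay cheap
        self.writes_since_query += 1
        if self.writes_since_query > self.idle_limit:
            # Nobody has queried this property recently: stop maintaining it.
            self.active = False
            self.entries.clear()
            return
        self.entries.setdefault(obj[self.prop], set()).add(obj["id"])

    def query(self, value, all_objects):
        if not self.active:
            # Rebuild on demand from the stored objects, then resume maintenance.
            self.entries = {}
            for obj in all_objects:
                self.entries.setdefault(obj[self.prop], set()).add(obj["id"])
            self.active = True
        self.writes_since_query = 0
        return self.entries.get(value, set())

store = [{"id": i, "value": i % 2} for i in range(6)]
index = AdaptiveIndex("value", idle_limit=3)
for obj in store:
    index.on_write(obj)
became_inactive = not index.active
result = index.query(1, store)
```

In this sketch the first query after a quiet period pays the rebuild
cost, and writes to a property that is never queried cost almost nothing.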
> From a philosophical point of view, should we take databases that can
> fit in memory as the upper bound on database size for the foreseeable
> future? The application we're evaluating Persevere for and, I submit,
> many others will want a DB that can size up without hard limits.
>
Absolutely, Persevere databases should be able to far exceed memory, and
in the load tests I have performed I have done extensive queries,
updates, and more with tables that far exceed the amount of memory
allocated to Persevere without any problem.
It is certainly possible to get into situations where all available
memory is consumed, especially with queries and application code that
might load all the objects into memory to carry out an action.
Kris
davidthings wrote:
> Seeing also that the transaction table grows with each access (approx
> 140B per single integer change) makes a few more things come to mind:
>
> 1) Is the reason for the long start up time that object states are
> built up from replaying all the transactions that have occurred since
> the database was created?
>
No, transactions are only replayed if the index needs to be rebuilt. If
the server crashes before indexes are finished updating, there is also
incremental replay, but that should obviously be very minimal.
> 2) How would this work in a system where lots of data changes all the
> time? In our system much of the user's data changes second to
> second. Would there be a transaction compactor, or would there be a
> separate state table?
>
A database compactor would be a nice feature. However, for databases
where the number of updates on objects is not significantly greater than
the number of objects, compaction would have limited value. On the other
hand, in many situations where data changes extremely rapidly, it may
not even be necessary to persist the data (you can use the InMemorySource),
and it may be more efficient to simply rebuild the table when the server
is restarted (if the data is no longer relevant after a few minutes). Of
course, there is certainly an area in between where the transaction
compactor would be useful. In the meantime, the old internal storage
system (the DynaObjectDBSource) might be useful if the JavaScriptDB's
transaction file is growing too fast for the storage limits.
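
To make the compaction trade-off concrete, here is a minimal Python
sketch of compacting an append-only log down to the latest state per
object. It is not the JavaScriptDB file format: the (object_id, state)
record layout and the compact function are hypothetical, and the 140
bytes per change is only the approximate figure quoted above.

```python
BYTES_PER_CHANGE = 140  # approximate figure from this thread, not a measured constant

def compact(log):
    """Keep only the most recent record for each object id, preserving log order."""
    latest = {}
    for position, (object_id, state) in enumerate(log):
        latest[object_id] = (position, state)
    survivors = sorted(latest.items(), key=lambda item: item[1][0])
    return [(object_id, state) for object_id, (_, state) in survivors]

log = [
    (1, {"value": 10}),
    (2, {"value": 5}),
    (1, {"value": 11}),
    (1, {"value": 12}),
]
compacted = compact(log)
bytes_before = len(log) * BYTES_PER_CHANGE
bytes_after = len(compacted) * BYTES_PER_CHANGE
print(compacted, bytes_before, bytes_after)
```

The saving is the number of superseded records, so a log with roughly
one update per object shrinks very little, while a table whose objects
change every second shrinks a great deal.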
Kris