On 3 Aug, 19:14, "nchauvat (Logilab)" <nicolas.chau...@logilab.fr> wrote:
> On 3 Aug, 19:01, blep <baptiste.lepill...@gmail.com> wrote:
>
> > I'm running into a performance issue with the datastore stub
> > provided with the SDK: ...
>
> This was reported before and the only answer was "SDK is for making
> development easier, not for simulating the performances of the actual
> production environment".
I know that. I'm not trying to simulate performance, just to load
enough data into the local dev environment to be able to do basic
testing (checking/debugging algorithms...). I've already seen that the
performance of the production environment is vastly different from my
local one.
> > The original dataset is stored in a python bsddb (250MB) and I have no such issue when querying and feeding it.
>
> You have the source code so you could look for the bottleneck and
> fix it, but I would suggest running performance tests on the actual
> server instead, since the local db will never have the performance of
> the actual servers anyway. If it does not block at 1e3, it will block
> at 1e6 or 1e9.
I don't expect the local test system to be able to play around with
terabytes of data like the production environment can. But currently
my local datastore is barely 2MB and has a few thousand rows. Being
able to handle at least 100,000 locally seems like a reasonable
target to me. This of course implies being able to insert them in a
reasonable time.
I took a quick look at the code in datastore_file_stub.py, and it
seems that _Dynamic_Put(), which if I understand correctly is
ultimately called by model.put(), calls __WriteDatastore(), which
appears to simply pickle all entities into a new file each time. So
unless I'm mistaken, the stub implementation walks through all the
entities and pickles them into a new file every time a put/commit is
done, which explains the linear increase in the time taken to put an
entity...
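To make that concrete, here is a toy sketch of the pattern I think I'm
seeing (this is not the actual SDK code; ToyDatastoreStub and the
entity payloads are made up for illustration):

    import os
    import pickle
    import tempfile
    import time

    class ToyDatastoreStub(object):
        """Keeps every entity in memory and rewrites the whole pickle on each put."""

        def __init__(self, path):
            self.path = path
            self.entities = {}

        def put(self, key, entity):
            self.entities[key] = entity
            # Like __WriteDatastore(): serialize *all* entities back to disk.
            with open(self.path, 'wb') as f:
                pickle.dump(self.entities, f, pickle.HIGHEST_PROTOCOL)

    fd, path = tempfile.mkstemp()
    os.close(fd)
    stub = ToyDatastoreStub(path)
    for batch in range(5):
        start = time.time()
        for i in range(1000):
            stub.put(batch * 1000 + i, {'name': 'entity', 'value': i})
        print('batch %d: %.2fs for 1000 puts' % (batch, time.time() - start))
    # Each batch takes longer than the previous one because the file being
    # rewritten keeps growing: total cost is O(n^2) for n puts.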
I'm just saying that bsddb, which is available in the standard Python
distribution, does not have this issue (I'm using it with more than
250K rows without any performance problem). The SDK could use it.
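For comparison, a minimal bsddb sketch (the file name and record layout
here are just examples, assuming the Python 2 bsddb module):

    import bsddb  # Python 2 standard library module

    db = bsddb.hashopen('/tmp/example.db', 'c')  # 'c' = create the file if missing
    for i in range(100000):
        # Each write only touches the pages it needs; the file is never fully rewritten.
        db[str(i)] = 'some serialized entity payload'
    db.sync()   # flush to disk once at the end, instead of once per put
    db.close()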
Do you know if there is already an open issue for this? I could not
find one, and you said this has already been reported...