Loading scikit model in Django


Chih How Bong

Aug 25, 2015, 8:51:53 PM
to Django users
Hi guys, I have searched for an answer for days to no avail, so I would like to ask here.

I am running an SVM scikit-learn model in Django. Currently, every request has to load the ~20 MB model, which is really slow.

I wonder if I should load the model in settings.py and store it in the session? However, the model can be updated by a certain user.

Any comments? I am rather new to Django; thanks in advance.

James Schneider

Aug 25, 2015, 10:14:04 PM
to django...@googlegroups.com

Have you thought at all about using a cache system such as memcached? The initial load would probably be slow, but it's possible that subsequent access to that same object would see a speed improvement. You would need to establish a workflow that ensures the object stays up to date whenever changes are made, but repeated reads would likely see a substantial improvement.

Storing that amount of data in the session would not have the desired effect; if anything, things would probably get worse, depending on the backend used to store sessions. Sessions are reloaded during every request (and therefore probably wouldn't address your issue), whereas a cache system is meant to hold things in an easily and quickly accessible format and location, independent of the request/response cycle.
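
Just as a rough sketch of what that could look like with Django's cache framework (the key name, model path, and load function below are made up, and note that memcached's default item size limit is 1 MB, so a ~20 MB object needs the server started with a larger limit, e.g. -I 32m):

# settings.py -- memcached backend; Django pickles cached values for you
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

# somewhere importable by your views -- fetch from cache, load from disk on a miss
from django.core.cache import cache
from sklearn.externals import joblib  # joblib ships inside scikit-learn

SVM_CACHE_KEY = 'svm_model'            # made-up cache key
SVM_MODEL_PATH = '/path/to/model.pkl'  # made-up path to your persisted model

def get_svm_model():
    model = cache.get(SVM_CACHE_KEY)
    if model is None:                          # cache miss: pay the ~20 MB load once
        model = joblib.load(SVM_MODEL_PATH)
        cache.set(SVM_CACHE_KEY, model, None)  # timeout=None -> cache indefinitely
    return model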

-James

https://docs.djangoproject.com/en/1.8/topics/cache/#memcached

Chih How Bong

Aug 26, 2015, 1:14:10 AM
to Django users
Thank you, James, for your suggestion. I am going to look into memcached now.

I was thinking that the ideal solution would be to load the model into a global variable, so that each request can refer to it. I don't know if that is possible.
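
Something like this is what I had in mind (just a sketch, the path is made up):

# predictor.py -- load once at import time, reuse across requests in the same process
from sklearn.externals import joblib

SVM_MODEL = joblib.load('/path/to/model.pkl')  # loaded once per worker process

def classify(features):
    return SVM_MODEL.predict(features)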

Thanks.

James Schneider

Aug 26, 2015, 2:46:29 AM
to django...@googlegroups.com


> Thank you, James, for your suggestion. I am going to look into memcached now.
>
> I was thinking that the ideal solution would be to load the model into a global variable, so that each request can refer to it. I don't know if that is possible.
>

Possible... yes. Good idea? Well, let's just say a cache is what you need in this case. I'm hoping that your objects can be serialized or pickled, since that will make them much easier to store in the cache. This will also make it easier to support concurrent users with different objects in the cache.

The tough part is going to be maintaining a consistent state for the object in the cache if the user is making changes to it (updating the instance and then updating the cache with the new object so that further requests are working with current data).
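
Roughly, the update side could look like this (the function, key name, and path are invented; the idea is just to persist first, then overwrite the cache entry):

# Hypothetical update path: persist the new model, then refresh the cached copy
from django.core.cache import cache
from sklearn.externals import joblib

def save_and_refresh(model, path='/path/to/model.pkl', key='svm_model'):
    joblib.dump(model, path)     # persist to disk (or wherever you keep it) first
    cache.set(key, model, None)  # then swap the cache entry so new requests see it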

You may also want to consider handling changes via a batch job processor such as Celery if the updates are lengthy (locking the object for updates, retrieving the object, updating it, saving it to the DB or a file, and then updating the cache with the new version). That way your users get an instant response saying that their changes have been submitted and the updated version should be available shortly. It really depends on your user base and the timing of operations.
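
If you do end up going the Celery route, very roughly (the task and the retrain_model() call are placeholders for your own update logic):

# tasks.py -- hypothetical Celery task: rebuild, persist, then refresh the cache
from celery import shared_task
from django.core.cache import cache
from sklearn.externals import joblib

@shared_task
def update_svm_model(training_data_id):
    model = retrain_model(training_data_id)   # placeholder for your own update logic
    joblib.dump(model, '/path/to/model.pkl')  # save the new version
    cache.set('svm_model', model, None)       # web workers pick up the fresh copy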

Start with implementing the cache; once that is working, look at a batch processor like Celery if the response is still less than desirable, since there is a fair amount of setup required to get a batch processor online.

I've never worked with custom memcached entries, so I've reached the limit of my knowledge here.

-James
