ThreadedListSerializer - will this cause me any problems?


Kyle Edwards

May 17, 2019, 1:29:06 PM
to Django REST framework

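from multiprocessing.pool import ThreadPool

from django.conf import settings
from django.db import connection, models
from rest_framework import serializers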


class ThreadedListSerializer(serializers.ListSerializer):
    """
    Use `multiprocessing.pool.ThreadPool` to run the child serializer's `to_representation` calls in parallel.

    Control the thread pool size with the `SERIALIZER_THREADS` setting.

    Most effective with `ModelSerializer`s that have I/O-bound field-level serialization (i.e. computed properties that make DB calls):

    ```
        class UserSerializer(ModelSerializer):
            ...

            class Meta:
                list_serializer_class = ThreadedListSerializer
    ```
    """
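    def worker(self, item):
        # Runs in a pool thread. Django keeps one DB connection per thread,
        # so close this thread's connection once the item is serialized.
        representation = self.child.to_representation(item)
        connection.close()
        return representation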

    def to_representation(self, data):
        """
        List of object instances -> List of dicts of primitive datatypes.
        """
        # Dealing with nested relationships, data can be a Manager,
        # so, first get a queryset from the Manager if needed
        iterable = data.all() if isinstance(data, models.Manager) else data

        pool = ThreadPool(processes=int(settings.SERIALIZER_THREADS))
        try:
            list_representation = pool.map(self.worker, list(iterable))
        finally:
            # Shut the pool down even if a worker raises, so threads are not leaked.
            pool.close()
            pool.join()
        return list_representation


I'm seeing considerable response-time improvements when using it with a `ModelSerializer`.
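The speed-up is consistent with how threads behave in CPython: a thread blocked on I/O (a database round trip, or `time.sleep` in this sketch) releases the GIL, so other threads can run in the meantime. The following is a minimal, self-contained sketch of that effect, not the serializer itself; `slow_field`, `serialize_sequential` and `serialize_threaded` are hypothetical names, and the 0.1 s delay is an arbitrary stand-in for a query.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_field(item):
    """Hypothetical stand-in for a computed property that waits on the database."""
    time.sleep(0.1)  # blocking I/O releases the GIL, so other threads keep running
    return {"id": item, "computed": item * 2}


def serialize_sequential(items):
    # One item at a time: total time is roughly len(items) * 0.1 s.
    return [slow_field(item) for item in items]


def serialize_threaded(items, threads=8):
    # Same work spread over a thread pool; map() preserves input order.
    pool = ThreadPool(processes=threads)
    try:
        return pool.map(slow_field, items)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    items = list(range(8))
    for fn in (serialize_sequential, serialize_threaded):
        start = time.perf_counter()
        fn(items)
        print(fn.__name__, f"{time.perf_counter() - start:.2f}s")
```

CPU-bound field code would not see the same gain, since only one thread executes Python bytecode at a time.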

Kyle Edwards

May 17, 2019, 6:47:34 PM
to Django REST framework
I get the feeling this isn't a great idea, especially if nested serializers are involved.

Kyle Edwards

May 21, 2019, 8:57:16 AM
to Django REST framework
While performance improves with list serializers for models that include additional computed properties, curiously this implementation only works when the server receives a single request and finishes serializing it before the next request arrives.

If, however, the server receives concurrent requests, some of the child item representations come back with all null values. Both requests return 200 and no errors are logged...
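The thread never confirms the root cause. As a general illustration (an assumption, not a diagnosis of this serializer) of how sharing one object across threads can return wrong values with no error raised, here is a plain-Python sketch. `StatefulRenderer`, `StatelessRenderer` and `render_all` are hypothetical; the `threading.Barrier` exists only to force the unlucky interleaving deterministically, whereas in real code it happens at random.

```python
import threading
from multiprocessing.pool import ThreadPool


class StatefulRenderer:
    """Stores per-call state on self, so concurrent calls overwrite each other."""

    def __init__(self, barrier=None):
        self.current = None      # shared by every thread using this instance
        self.barrier = barrier   # demo only: forces all calls to overlap

    def render(self, item):
        self.current = item
        if self.barrier is not None:
            self.barrier.wait()  # by now every worker has overwritten self.current
        return {"value": self.current}


class StatelessRenderer:
    """Keeps per-call state in a local variable, so calls cannot interfere."""

    def __init__(self, barrier=None):
        self.barrier = barrier

    def render(self, item):
        current = item           # local to this call and this thread
        if self.barrier is not None:
            self.barrier.wait()
        return {"value": current}


def render_all(renderer, items):
    # One thread per item and chunksize=1, so every call runs on its own thread.
    with ThreadPool(processes=len(items)) as pool:
        return pool.map(renderer.render, items, chunksize=1)


if __name__ == "__main__":
    items = ["a", "b"]
    # Both dicts hold the same value, either "a" or "b", depending on which write landed last.
    print(render_all(StatefulRenderer(threading.Barrier(2, timeout=5)), items))
    # [{'value': 'a'}, {'value': 'b'}]
    print(render_all(StatelessRenderer(threading.Barrier(2, timeout=5)), items))
```

Both versions "succeed" from the caller's point of view, which mirrors the 200 responses with nothing in the logs.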

Carl Nobile

May 27, 2019, 6:09:27 PM
to Django REST framework
Hi Kyle,

I'm thinking you will have reentrancy issues; in other words, a single serializer instance is not thread- or process-safe. If you call dir() on an instance, you will see a whole lot of member objects holding instance-specific data. Django is pretty good at handling multiple requests at once, so don't try to redo what Django is already doing. The best thing you can do to speed up your requests and responses is backend caching with Redis: write code that intercepts the data coming from your models (or wherever the data comes from) and caches it.

~Carl
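Carl's caching suggestion is the cache-aside pattern: look the value up first, and only run the expensive query on a miss. Below is a minimal sketch in which a hypothetical in-memory `TTLCache` stands in for Redis; in a Django project this role would more likely be played by the cache framework (`django.core.cache`, which also offers a `get_or_set` method) configured with a Redis backend. `load_user_stats` is likewise hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for a Redis-backed cache (cache-aside)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_set(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                   # fresh entry: skip the expensive work
        value = compute()                   # miss or expired: run the real query
        self._store[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    calls = []

    def load_user_stats(user_id):
        """Hypothetical expensive query behind a computed serializer field."""
        calls.append(user_id)
        return {"user_id": user_id, "order_count": 3}

    cache = TTLCache(ttl_seconds=60)
    for _ in range(3):
        cache.get_or_set(("user_stats", 1), lambda: load_user_stats(1))
    print(calls)  # the "query" ran once: [1]
```

The TTL bounds how stale a cached field can get; choosing it, and deciding when to invalidate entries after writes, is the part that needs real thought for each field.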
