Relationships exposed over RESTful interface


Conrad Rowlands

Apr 10, 2014, 1:29:39 PM
to django...@googlegroups.com
Hi,

I have an issue with exposing a RESTful interface using FilterSets and HyperlinkedModelSerializer classes. I have two model classes defined thus:

class Manufacturer(CommonExtendedIcon):
    WebSite=models.CharField(max_length=1024)
    StandardList=models.BooleanField(default=False)
    SourceURL=models.CharField(max_length=1024,null=True)
    LastChecked=models.DateTimeField(null=True)
    def __str__(self):
        return self.Name

class ManufacturerModel(CommonExtendedIcon):
    Manufacturer=models.ForeignKey(Manufacturer, related_name='models')
    WebSite=models.CharField(max_length=1024)
    SeriesStartDate=models.IntegerField(null=True)
    SeriesEndDate=models.IntegerField(null=True)
    SourceUrl=models.CharField(max_length=1024,null=True, default='')
    LastChecked=models.DateTimeField(null=True)
    def __str__(self):
        return self.Name

Now, in my REST interface I make a query to retrieve, say, all of the ManufacturerModel records with a Manufacturer beginning with 'A'. Looking at the SQL that is subsequently run, I can see that over 4000 queries are issued: 1 to retrieve all of the ManufacturerModel records, and then 1 for EACH individual linked Manufacturer. Why is this, and how can I stop this behaviour but still bring back the Manufacturer details? I expected the first query to load all of the data.
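To make the pattern described above concrete, here is a minimal sketch using only the standard-library sqlite3 module. It is not the code from this project: the table names, the `run` helper and the two listing functions are hypothetical stand-ins, and the JOIN is only roughly what a `select_related()` call asks the database for.

```python
import sqlite3

# Hypothetical, simplified tables standing in for the two Django models.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE manufacturer_model ("
             "id INTEGER PRIMARY KEY, name TEXT, manufacturer_id INTEGER)")
conn.executemany("INSERT INTO manufacturer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Apex")])
conn.executemany("INSERT INTO manufacturer_model VALUES (?, ?, ?)",
                 [(i, "Model %d" % i, 1 + i % 2) for i in range(1, 11)])

query_log = []


def run(sql, params=()):
    """Execute one statement and record it, so queries can be counted."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


def lazy_listing():
    """One query for the models, then one more per row for its manufacturer."""
    rows = run("SELECT name, manufacturer_id FROM manufacturer_model ORDER BY id")
    return [(name, run("SELECT name FROM manufacturer WHERE id = ?", (mid,))[0][0])
            for name, mid in rows]


def joined_listing():
    """A single JOIN that fetches each model together with its manufacturer."""
    return run("SELECT mm.name, m.name FROM manufacturer_model mm "
               "JOIN manufacturer m ON m.id = mm.manufacturer_id "
               "ORDER BY mm.id")


def count_queries(listing):
    """Run a listing function and return (rows, number_of_queries)."""
    del query_log[:]
    rows = listing()
    return rows, len(query_log)
```

Calling `count_queries(lazy_listing)` and `count_queries(joined_listing)` returns the same rows, but the first issues one statement per model plus one, while the second issues a single statement.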

Here are my serialiser classes:-

class ManufacturerSerialiser(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = Manufacturer
        fields = ('WebSite','LogoLocation','Name','id','StandardList','SourceURL')

class ManufacturerModelSerialiser(serializers.HyperlinkedModelSerializer):
    manufacturer = serializers.HyperlinkedIdentityField(view_name='manufacturer')
    class Meta:
        model = ManufacturerModel
        fields = ('WebSite','LogoLocation','Name','id','Manufacturer','SeriesStartDate','SeriesEndDate','SourceUrl','LastChecked')

And finally, my FilterSets:

class ManufacturerFilter(django_filters.FilterSet):
        Name = django_filters.CharFilter(lookup_type='icontains')
        Website = django_filters.CharFilter(lookup_type='icontains')
        class Meta:
            model=Manufacturer
            fields = ('WebSite','LogoLocation','Name','id','StandardList')
           
           
class ManufacturerModelFilter(django_filters.FilterSet):   
        Manufacturer=django_filters.CharFilter(name='Manufacturer__Name', lookup_type='icontains')
        Name = django_filters.CharFilter(lookup_type='icontains')
        Website = django_filters.CharFilter(lookup_type='icontains')
        class Meta:
            model=ManufacturerModel
            fields = ('WebSite','LogoLocation','Name','id','Manufacturer','LastChecked')


I know that if I remove the 'Manufacturer' field from the ManufacturerModel serialiser class it works as expected, which means it is the relationship that is causing this behaviour (the extra 4000 SQL queries).

Javier Guerra Giraldez

Apr 10, 2014, 1:58:36 PM
to django...@googlegroups.com
On Thu, Apr 10, 2014 at 8:29 AM, Conrad Rowlands
<conradj...@googlemail.com> wrote:
> Why is this and how can I stop this behaviour but still bring back the
> Manufacturer Details. I expected instead the first query to load all of the
> data.


override the get_queryset() method in your View (or ViewSet) to add
a select_related().

for example, something like this (untested!): add this mixin to your
View and optionally set the related_fields:

class SelectRelatedMixin(object):
    related_fields = ()

    def get_queryset(self):
        return super(SelectRelatedMixin,
                     self).get_queryset().select_related(*self.related_fields)


class ManufacturerModelViewSet(SelectRelatedMixin, viewsets.ModelViewSet):
    # the name must match the ForeignKey attribute on the model
    related_fields = ('Manufacturer',)
    # ... the rest of your viewset
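
The mixin relies on Python's method resolution order, so it has to be listed before the framework base class. Here is a small sketch of that mechanism with no Django involved: `FakeQuerySet` and `BaseView` are hypothetical stand-ins (not REST framework classes) that only record which fields were passed to `select_related()`, and the field name is assumed to be spelled `Manufacturer`, as on the model.

```python
class FakeQuerySet(object):
    """Hypothetical stand-in for a Django QuerySet; it only records calls."""

    def __init__(self, related=()):
        self.related = tuple(related)

    def select_related(self, *fields):
        # Like a real QuerySet, return a new object instead of mutating self.
        return FakeQuerySet(self.related + fields)


class BaseView(object):
    """Hypothetical stand-in for the framework's generic view."""

    def get_queryset(self):
        return FakeQuerySet()


class SelectRelatedMixin(object):
    related_fields = ()

    def get_queryset(self):
        # super() resolves to the next class in the MRO, here BaseView.
        queryset = super(SelectRelatedMixin, self).get_queryset()
        return queryset.select_related(*self.related_fields)


class ManufacturerModelView(SelectRelatedMixin, BaseView):
    related_fields = ('Manufacturer',)
```

With this ordering, `ManufacturerModelView().get_queryset().related` holds the fields named in `related_fields`. If the bases were swapped, `BaseView.get_queryset()` would be found first and the mixin would never run.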

--
Javier

Conrad Rowlands

Apr 11, 2014, 8:51:37 AM
to django...@googlegroups.com
Hi Javier,

Thank you for your quick response. Sadly, I could not get your solution to work; however, you pointed me in the right direction to a workable, though not entirely satisfactory, solution. What I did was refactor my ViewSet class thus:

class ManufacturerModelViewSet(viewsets.ModelViewSet):
    model=ManufacturerModel
    serializer_class=ManufacturerModelSerialiser
    filter_class=ManufacturerModelFilter
    def get_queryset(self):
        queryset = ManufacturerModel.objects.all().select_related()
        return queryset

Now this works, after a fashion! Instead of 4000 queries I now get 2 queries (the second query brings in ALL of the manufacturers, ignoring the filter that I specified on the Manufacturer table). This is still far more performant and at least usable, if not quite right!

I would still be keen to know if there is any known method that would allow me to load this data using only 1 query, bringing in all of the related fields in the original query.

Thanks again for your help.

Kind regards

Conrad

Javier Guerra Giraldez

Apr 11, 2014, 1:17:11 PM
to django...@googlegroups.com
On Fri, Apr 11, 2014 at 3:51 AM, Conrad Rowlands
<conradj...@googlemail.com> wrote:
> I would still be keen to know if there is any known method that would allow
> me to load this data using only 1 query, bringing in all of the related
> fields in the original query.


what's the first query about? probably it's part of the authorization
process, so it might not be appropriate to reduce to just one query.

in any case, the exact number of queries is irrelevant. the important
thing is to keep it constant for any data size.

for example, a slightly complex task could easily need 10 queries but
the final throughput you get from the system would be almost the same
if it's 10 queries or just 1, as long as it's not (n+1) as you had
before.

of course, you still have to be sure the indexes are optimal to keep
each query nice and tight!
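
To illustrate the "constant for any data size" point, here is a rough sketch using the standard-library sqlite3 module. The schema and the `build_db` and `batched_listing` helpers are hypothetical. The lookup uses two statements (one for the models, one batched `IN (...)` lookup for their manufacturers), and that count does not change as the table grows.

```python
import sqlite3


def build_db(n_models):
    """Create an in-memory database with n_models rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE manufacturer_model ("
                 "id INTEGER PRIMARY KEY, name TEXT, manufacturer_id INTEGER)")
    n_makers = max(1, n_models // 10)
    conn.executemany("INSERT INTO manufacturer VALUES (?, ?)",
                     [(i, "Maker %d" % i) for i in range(1, n_makers + 1)])
    conn.executemany("INSERT INTO manufacturer_model VALUES (?, ?, ?)",
                     [(i, "Model %d" % i, 1 + i % n_makers)
                      for i in range(1, n_models + 1)])
    return conn


def batched_listing(conn):
    """Return (rows, query_count) using two queries, whatever the table size."""
    queries = 0
    models = conn.execute(
        "SELECT name, manufacturer_id FROM manufacturer_model ORDER BY id"
    ).fetchall()
    queries += 1

    # Collect the distinct foreign keys and fetch them all in one statement.
    maker_ids = sorted({mid for _name, mid in models})
    placeholders = ",".join("?" * len(maker_ids))
    makers = dict(conn.execute(
        "SELECT id, name FROM manufacturer WHERE id IN (%s)" % placeholders,
        maker_ids).fetchall())
    queries += 1

    return [(name, makers[mid]) for name, mid in models], queries
```

Running `batched_listing(build_db(10))` and `batched_listing(build_db(1000))` both report two queries; only the amount of data per query differs.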

--
Javier

Conrad Rowlands

Apr 11, 2014, 1:31:34 PM
to django...@googlegroups.com
Hi Javier,

Sorry for not being clearer. Basically, I am making a URL call to return all of the models for a given manufacturer. The first SQL statement generated returns all of the ManufacturerModel records AND (because of the select_related) all of their Manufacturers in one query. The second query requests ALL of the manufacturers, regardless of the filter I described. The reason this second call is being made is my ManufacturerViewSet. Code details below:

       
class ManufacturerViewSet(viewsets.ModelViewSet):
    queryset=Manufacturer.objects.all()
    serializer_class = ManufacturerSerialiser
    filter_class = ManufacturerFilter

class ManufacturerModelViewSet( viewsets.ModelViewSet):

    model=ManufacturerModel
    serializer_class=ManufacturerModelSerialiser
    filter_class=ManufacturerModelFilter
    def get_queryset(self):
        queryset = ManufacturerModel.objects.all().select_related()
        return queryset

As you can see, the queryset loads everything from the database. What I would prefer is to share the first queryset from the ManufacturerModelViewSet (which should have all of the correct fields) with the ManufacturerViewSet. I'm guessing this could be achieved by overriding get_queryset in my ManufacturerViewSet, though I don't know what to do from there?

Javier Guerra Giraldez

Apr 11, 2014, 2:01:22 PM
to django...@googlegroups.com
On Fri, Apr 11, 2014 at 8:31 AM, Conrad Rowlands
<conradj...@googlemail.com> wrote:
> As you can see the queryset is loading all from the database.... What I
> would prefer to be able to do is to share the first queryset from the
> ManufacturerModelViewSet (which should have all of the correct fields) with
> the ManufacturerViewSet. I'm guessing this could be achieved by overriding
> get_queryset in my ManufacturerViewSet though I don't know what to do from
> there?


first, you shouldn't use ManufacturerModel.objects.all() in your
get_queryset(); it should call the superclass' definition and add the
select_related() to it:

def get_queryset(self):
    return super(ManufacturerModelViewSet,
                 self).get_queryset().select_related()


but in this case i guess the result is the same. It's just more
flexible this way. It also helps to factor it out into a mixin
class as I suggested initially.

the queryset returned by this method isn't used as-is; it's first
passed to the filter_queryset() method.


second, if you're talking only about one specific URI that is handled
by ManufacturerModelViewSet, then it doesn't matter what you have or
not in other viewset (ManufacturerViewSet); you could even delete it
and it should still work the same.

to see if it's possible to avoid the second query, first you have to
know which part of the view requests it. a big help on this is the
Django Debug Toolbar. it works perfectly with the REST framework
browsable API.

Of course, if you're using the browser, remember that it can do extra
queries to build the user-friendly HTML. those extra queries don't
pass through your viewset methods, so it's unlikely they would respect
the filters.

to see what queries you do in production mode, set the logging config
to report all queries to console or logfiles.
(https://docs.djangoproject.com/en/1.6/topics/logging/#django-db-backends).
or make the DB server itself log all queries.
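
As a rough sketch of the logging setup mentioned above, a settings fragment along these lines sends the `django.db.backends` logger to the console. The handler name `console` is an arbitrary choice. As far as I know, Django only emits SQL on this logger while `DEBUG` is `True`, so for a real production run the database server's own query log is the fallback.

```python
import logging
import logging.config

# Goes in settings.py; Django applies LOGGING itself at startup.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        # Write log records to stderr.
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        # Django reports each executed SQL statement on this logger.
        'django.db.backends': {
            'handlers': ['console'],
            'level': 'DEBUG',
            'propagate': False,
        },
    },
}

# Outside Django, the same dict can be applied directly to check that it is
# well-formed. Omit this line in settings.py.
logging.config.dictConfig(LOGGING)
```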



--
Javier

Conrad Rowlands

Apr 11, 2014, 3:05:19 PM
to django...@googlegroups.com
OK. I take point 1 on board.

As for point two: I commented out the ManufacturerViewSet and the references to it in urls.py. Then, when I run the URL that I believe only looks at ManufacturerModelViewSet, I get the following error:

Could not resolve URL for hyperlinked relationship using view name "manufacturer-detail". You may have failed to include the related model in your API, or incorrectly configured the `lookup_field` attribute on this field.

This message seems quite clear, actually.
I can only guess Django is working by convention here (behind the scenes): because the ViewSet exposes a Manufacturer field that is a ForeignKey field in the model, it is using the relevant ViewSet to try to obtain the data it requires to set up the hyperlinked Manufacturer field. It seems a bit 'magical', but not beyond the realms of possibility.

I am using a browser, and I am using the Debug Toolbar, which I would be totally lost without!

Conrad Rowlands

Apr 11, 2014, 3:08:47 PM
to django...@googlegroups.com
It seems that defining the ManufacturerModelSerialiser thus has solved the problem (which backs up my last mail, I think). It's not entirely what I wanted, but it at least gets me down to just 1 query as I originally intended; 1 query is always better than 4000-odd!

class ManufacturerModelSerialiser(serializers.HyperlinkedModelSerializer):
    Manufacturer = serializers.Field(source='Manufacturer.Name')


    class Meta:
        model = ManufacturerModel
        fields = ('WebSite','LogoLocation','Name','id','Manufacturer','url','SeriesStartDate','SeriesEndDate','SourceUrl','LastChecked')



On Friday, 11 April 2014 15:01:22 UTC+1, Javier Guerra wrote: