Any interest in zlib pre-compressed HTML generator responses?


Bobby Mozumder

Aug 15, 2016, 7:10:33 PM
to django-d...@googlegroups.com
Hi,

I’ve been using zlib compression on my custom view responses on my site (https://www.futureclaw.com).  I compress before I even cache, and I store the compressed responses in my Redis cache.  This effectively increases my Redis cache capacity by about 10x.  It also shaves tens of milliseconds off response times, since my web server no longer has to compress cached responses on the fly; the compressed view fragments are served straight out of the Redis cache.
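
Roughly, the caching step looks like this (a simplified sketch rather than my actual code; the helper name and timeout are just for illustration):

import zlib

from django.core.cache import cache

def cache_compressed_fragment(key, html, timeout=3600):
    # Compress the rendered fragment as a gzip stream (wbits=MAX_WBITS|16)
    # and cache the compressed bytes, so gzip-capable clients can be served
    # without recompressing on every request.
    compressor = zlib.compressobj(level=9, wbits=zlib.MAX_WBITS | 16)
    data = compressor.compress(html.encode('utf-8'))
    data += compressor.flush()
    cache.set(key, data, timeout)
    return data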

I also build my views as generators and use Django’s streaming responses to stream their output, so that the web browser receives the first chunks and starts rendering before the view has finished processing.  The browser receives the first HTML fragment before I even hit the database.
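
In skeleton form (again a sketch, not my real code; the view, the fetch helper, and the markup are made up):

from django.http import StreamingHttpResponse

def article_view(request, slug):
    def stream():
        # The opening markup goes out before any query runs, so the browser
        # can start fetching CSS and fonts immediately.
        yield '<html><head>...</head><body>'
        yield fetch_article_body(slug)  # hypothetical helper; first database hit
        yield '</body></html>'
    return StreamingHttpResponse(stream())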

Ultimately my page load times are ridiculously fast, especially for such a graphics-heavy site.  On a full cache miss it takes about 15ms to generate a page; on a cache hit, about 0.3ms.  For an end user's full page load, including all fonts and graphics, I'm down to 381ms, faster than 98% of all sites tested (it was several seconds before).  See: https://tools.pingdom.com/#!/cHCL3d/https://www.futureclaw.com

(BTW I plan on reducing the full-page load times down to the 100-200ms range using HTTP/2 server push)

Anyway, is this feature of interest to the Django community, perhaps for the next version?

Here is the relevant portion directly lifted from my code as an example.  I made this earlier this year, and I’m not sure how much this needs to be changed for use by Django itself, especially considering I don’t use the Django template system or ORM:

import logging
import time
import zlib

from django.conf import settings
from django.core.cache import cache
from django.db import connection
from django.http import HttpResponse, StreamingHttpResponse
from django.views.generic import View

logger = logging.getLogger(__name__)


class ZipView(View):

    level = 9
    wbits = zlib.MAX_WBITS | 16

    decompressor = zlib.decompressobj(wbits=wbits)

    

    @staticmethod
    def store(compressor, key, data, time=9, end=False):
        # Compress a fragment and cache the compressed bytes.  A full flush
        # resets the compressor state so cached fragments can be concatenated
        # freely; Z_FINISH closes the gzip stream for the final fragment.
        zData = compressor.compress(data)
        if not end:
            zData += compressor.flush(zlib.Z_FULL_FLUSH)
        else:
            zData += compressor.flush(zlib.Z_FINISH)
        cache.set(key, zData, time)
        return zData

    def inflate(self, decompressor, data):
        # Decompress a cached fragment with the caller-supplied decompressor.
        return decompressor.decompress(data)

    pageData = []

    def getPageData(self, slug=None):
        pass

    def generator(self, request, slug=None, compress=True):
        compressor = None
        decompressor = None

        # pStart holds the page opening; it also contains the gzip/zlib
        # header, which the client needs before any other fragment.
        zData = cache.get('pStart')
        if zData is None:
            html = bytes(self.get_page_start_stream(), 'utf-8')
            compressor = zlib.compressobj(level=self.level, wbits=self.wbits)
            zData = self.store(compressor, 'pStart', html, time=None)
        elif not compress:
            decompressor = zlib.decompressobj(wbits=self.wbits)
            html = decompressor.decompress(zData)
        if not compress:
            zData = html
        yield zData

        pageKey = self.getPageKey(slug)
        zPageData = cache.get(pageKey)
        if zPageData is None:
            expire = 3600
            zPage = b''
            self.getPageData(slug)

            # Fragment tuples contain the HTML generator and its parameters,
            # as well as the cache timeout for each fragment.
            for frag in self.fragments(request):
                if frag[2] and frag[2] < expire:
                    expire = frag[2]
                k = frag[0] + str(frag[1])
                zData = cache.get(k)
                if zData is None:
                    func = frag[3]
                    html = bytes(func(*frag[4:]), 'utf-8')
                    if compressor is None:
                        compressor = zlib.compressobj(level=self.level, wbits=self.wbits)
                        self.store(compressor, 'zHeader', b"")
                    zData = self.store(compressor, k, html, frag[2])
                elif not compress:
                    if decompressor is None:
                        decompressor = zlib.decompressobj(wbits=self.wbits)
                    html = decompressor.decompress(zData)
                zPage += zData
                if not compress:
                    zData = html
                yield zData
            zData = cache.get('pEnd')
            if zData is None:
                html = bytes(self.get_closing_stream(), 'utf-8')
                if compressor is None:
                    compressor = zlib.compressobj(level=self.level, wbits=self.wbits)
                    self.store(compressor, 'zHeader', b"")
                zData = self.store(compressor, 'pEnd', html, time=None, end=True)
            elif not compress:
                if decompressor is None:
                    decompressor = zlib.decompressobj(wbits=self.wbits)
                html = decompressor.decompress(zData)
            zPage += zData
            cache.set(pageKey, zPage, expire)
            self.newtime = time.perf_counter()
            if not compress:
                zData = html
            yield zData
        else:
            # The whole page was already cached; serve it in one chunk.
            if not compress:
                if decompressor is None:
                    decompressor = zlib.decompressobj(wbits=self.wbits)
                zPageData = decompressor.decompress(zPageData)
            yield zPageData

class FastView(ZipView):

    c = connection.cursor()

    def get(self, request):
        return self.html(request)

    def html(self, request, slug=None):
        encoding = request.META.get('HTTP_ACCEPT_ENCODING', '')
        compress = 'gzip' in encoding
        if settings.BENCHMARK:
            count = 1
            logger.info('//Starting Benchmark')
            logger.info(' Class name: %s' % self.__class__.__name__)
            start_time = timing = time.perf_counter()
            page = b''
            for html in self.generator(request, slug, compress):
                page += html
                newtime = time.perf_counter()
                logger.info('  Step %i: %fms' % (count, (newtime - timing) * 1000))
                count += 1
                timing = newtime
            logger.info(' Total Time: %fms' % ((newtime - start_time) * 1000))
            logger.info('//End Benchmark')
            response = HttpResponse(page)
            if compress:
                response['content-encoding'] = 'gzip'
            return response
        else:
            response = StreamingHttpResponse(self.generator(request, slug, compress))
            if compress:
                response['content-encoding'] = 'gzip'
            return response

class IndexView(FastView,factory):

    title = 'FutureClaw'

    @staticmethod
    def getPageKey(slug=None):
        return 'pIndex'

    def fragments(self,request=None,slug=None):
        return [
            ['pMetaIndex','',3600,self.get_page_meta_stream,self.title, None, None],
            ['pCover','',3600*24,self.get_cover_page_stream,self.c],
            ['pStartCategory','',None,self.get_start_category_page_stream],
            ['pHeadlines','',3600,self.get_headlines_stream,self.c],
            ['pLatest','',3600,self.get_latest_collection_stream,self.c],
            ['pFullSeasonStart','',None,self.get_full_season_start_stream],
            ['pFullSeason',None,3600,self.get_full_season_stream,None,None],
            ['pAllSeasons',0,3600*72,self.get_all_seasons_stream,self.c,0,''],
            ['pGallery',0,None,self.get_gallery_stream,None,None,None,None],
            ['pArticle',0,None,self.get_article_stream,None],
        ]

Jani Tiainen

Aug 16, 2016, 2:14:51 AM
to django-d...@googlegroups.com

Hi,

To me it looks like this could live just fine outside Django core as a separate package, since it doesn't appear to require any changes to Django itself. It doesn't need to be in core for the community to benefit: you can already ship it for all existing and future Django versions, which would also be a good way to prove that your solution is useful for others as well.

--

Jani Tiainen

Curtis Maloney

Aug 16, 2016, 2:16:51 AM
to django-d...@googlegroups.com
On 16/08/16 09:10, Bobby Mozumder wrote:
> Hi,
>
> I also build my views as generators and use Django’s streaming responses
> to stream their output, so that the web browser receives the first chunks
> and starts rendering before the view has finished processing. The browser
> receives the first HTML fragment before I even hit the database.

The hazard here is that if anything raises an exception, you've
already sent headers saying everything was fine... typically this
results in pages terminating part way through.
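
Roughly this failure mode (a contrived sketch, names made up):

from django.http import StreamingHttpResponse

def risky_view(request):
    def stream():
        # Once this first chunk is sent, the 200 status and headers are
        # committed and can't be swapped for a 500 any more.
        yield '<html><body>'
        yield load_expensive_data()  # hypothetical; if this raises, the
        yield '</body></html>'       # client just sees a truncated page
    return StreamingHttpResponse(stream())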

Other than this, it is certainly a great way to overlap some transport
latency with work. I've done the same with a generator-based JSON
serialiser.

--
Curtis

Bobby Mozumder

Aug 16, 2016, 10:09:29 PM
to django-d...@googlegroups.com
I’ll look into what it would take to make it a separate package, to make it more generic and usable by all.

I was pushing for this to be part of core to overcome Django’s “slow-by-default” impression.  The first version of this site had access times in the several-second range, and it took a while to get that down using things like select_related and other optimizations.

This was especially a problem since I came to Django as a replacement to speed up our old PHP-based site, which went down during a high-traffic moment.  I finally had to toss out the Django template system and ORM for something custom.

Django really should be “high-speed-out-of-the-box” by default.

-bobby

Bobby Mozumder

Aug 16, 2016, 10:15:35 PM
to django-d...@googlegroups.com
I do have the option to query the database once to check if a URL is valid, before returning a response. This would also read most of the page’s data in the same query. But right now I’m just letting the URL resolver figure out if the page is valid before it hits the database, which can lead to incorrect responses given a mangled URL.

All other queries after the primary query are for secondary data - the “additional links”, the “categories”, the “further headlines” type of data, and those queries would only work if the primary query works as well.
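
That option would look something like this (a sketch only; get_primary_row and the render helpers are hypothetical names, not lifted from my code):

from django.http import Http404, StreamingHttpResponse

def page_view(request, slug):
    # One up-front query both validates the slug and fetches most of the
    # page's data, so a mangled URL can still get a proper 404 because
    # nothing has been streamed yet.
    row = get_primary_row(slug)
    if row is None:
        raise Http404(slug)

    def stream():
        yield render_page_start()
        yield render_primary(row)             # reuses the validated data
        yield render_secondary_sections(row)  # "additional links", "categories", etc.
        yield render_page_end()

    return StreamingHttpResponse(stream())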

-bobby