[Django] #25072: GDALRaster Garbage Collection not working

54 views
Skip to first unread message

Django

unread,
Jul 6, 2015, 2:14:56 PM7/6/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
----------------------------+-------------------------------
Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Keywords: memory gis raster
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------+-------------------------------
I noticed that when creating many `GDALRasters` in a single context, the
memory allocated to GDALRasters does not get released properly. For
instance, the following loop as an ever growing memory consumption:


{{{
from django.contrib.gis.gdal import GDALRaster
n = 1000
for x in range(n):
rast = GDALRaster({
'driver': 'MEM',
'width': 500,
'height': 500,
'nr_of_bands': 1,
'srid': 3086,
'origin': (500000, 400000),
'scale': (1, -1),
'skew': (0, 0),
'bands': [{
'nodata_value': 10,
'data': range(500**2)
}],
})

}}}

Adding `rast.__del__()` to the loop solves the problem, but I guess that
should be automatic.

--
Ticket URL: <https://code.djangoproject.com/ticket/25072>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jul 6, 2015, 2:19:04 PM7/6/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+--------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:

Keywords: memory gis raster | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+--------------------------------------
Changes (by yellowcap):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

Unfortunately this goes beyond my knowledge of how python works. I am
happy to work on this if its productive, but would need help.

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:1>

Django

unread,
Jul 7, 2015, 3:03:25 PM7/7/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+--------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:

Keywords: memory gis raster | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+--------------------------------------

Comment (by timgraham):

I don't have much experience with garbage collection, but after doing some
reading I think this is expected as there is no guarantee that the garbage
collector will run in code like this. If you put that code in a function,
is `__del__()` called when the function exits (i.e. local variables should
be garbage collected at that point)?

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:2>

Django

unread,
Jul 7, 2015, 4:08:27 PM7/7/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+------------------------------------
Changes (by claudep):

* stage: Unreviewed => Accepted


Comment:

It might be related to bands. I made some tests, and as soon as I access
`self.bands`, the raster in not garbage collected. Perhaps because there
is a circular reference between raster (bands cached property) and bands
(band.source). That's a guess for now...

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:3>

Django

unread,
Jul 7, 2015, 4:29:39 PM7/7/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+------------------------------------
Changes (by claudep):

* Attachment "25072.diff" added.

Allow GDALRaster garbage collection

Django

unread,
Jul 7, 2015, 4:30:30 PM7/7/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+------------------------------------

Comment (by claudep):

Daniel, could you test the attached patch and see if that changes anything
in your experiments?

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:4>

Django

unread,
Jul 8, 2015, 3:49:39 AM7/8/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+------------------------------------

Comment (by yellowcap):

Thanks Claude, I can confirm that your patch solves the problem for my
case. I just tried the patch on both the loop posted above and the use
case where I spotted the problem in the first place. The memory usage
stayed flat in both runs.

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:5>

Django

unread,
Jul 8, 2015, 8:56:01 AM7/8/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+------------------------------------

Comment (by yellowcap):

I just noticed that I started getting segmentation faults when accessing
the GDALBand `data` method after applying the patch. I will have a closer
look at what is going on there.

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:6>

Django

unread,
Jul 8, 2015, 10:25:44 AM7/8/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+------------------------------------

Comment (by claudep):

Yes, you'll get segmentation faults if you use a band while its source has
been garbage collected. The patch need improvements to safeguard against
this, if possible.

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:7>

Django

unread,
Jul 8, 2015, 12:24:49 PM7/8/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+------------------------------------
Changes (by yellowcap):

* Attachment "25072_segfault.diff" added.

This test leads to a segfault on my system.

Django

unread,
Jul 8, 2015, 12:28:35 PM7/8/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+------------------------------------

Comment (by yellowcap):

I tracked down the segmentation fault I was getting, see the diff I
attached. Somehow accessing the data directly from a queryset without
creating intermediate objects gives me a segmentation fault. With an
intermediate object it does not.

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:8>

Django

unread,
Jul 9, 2015, 5:23:31 AM7/9/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+------------------------------------

Comment (by claudep):

Here's another possible solution:
https://github.com/django/django/pull/4971

That is stop to cache `GDALRaster.bands`, which is the cause of the other
side of the circular reference.

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:9>

Django

unread,
Jul 9, 2015, 10:06:30 AM7/9/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+------------------------------------

Comment (by yellowcap):

That looks fine to me, I think most rasters have only a few bands.

Maybe we can add a method to get only one specific band by index? Like
this single bands can be accessed efficiently. Something along the lines
of:

{{{
def get_band(self, index):
if index < 0 or index >= capi.get_ds_raster_count(self._ptr):
raise GDALException('Band index out of range.')
return GDALBand(self, index)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:10>

Django

unread,
Jul 9, 2015, 2:28:07 PM7/9/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+------------------------------------
Changes (by claudep):

* has_patch: 0 => 1


Comment:

Your wishes are my orders :-) Please check the latest PR version (while
still checking the initial memory issue).

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:11>

Django

unread,
Jul 10, 2015, 4:38:37 AM7/10/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-----------------------------------+------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-----------------------------------+------------------------------------

Comment (by yellowcap):

Thanks Claude!

I just ran your patch through the loop and applied use case and it worked
perfectly, including low memory use.

The issues that came up through this ticket has made me realize that it
could be important to only instantiate specific bands (some time series
rasters might have lots of bands, its is not common, but happens). That's
why I asked for this...

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:12>

Django

unread,
Jul 10, 2015, 9:02:31 AM7/10/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-------------------------------------+-------------------------------------

Reporter: yellowcap | Owner: nobody
Type: Bug | Status: new
Component: GIS | Version: master
Severity: Normal | Resolution:
Keywords: memory gis raster | Triage Stage: Ready for
| checkin

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:13>

Django

unread,
Jul 10, 2015, 2:14:35 PM7/10/15
to django-...@googlegroups.com
#25072: GDALRaster Garbage Collection not working
-------------------------------------+-------------------------------------
Reporter: yellowcap | Owner: nobody
Type: Bug | Status: closed
Component: GIS | Version: master
Severity: Normal | Resolution: fixed

Keywords: memory gis raster | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Claude Paroz <claude@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"d72f8862cb1a39934952e708c3c869be1399846e" d72f8862]:
{{{
#!CommitTicketReference repository=""
revision="d72f8862cb1a39934952e708c3c869be1399846e"
Fixed #25072 -- Prevented GDALRaster memory to be uncollectable

Setting GDALRaster.bands as a cached property was creating a circular
reference with objects having __del__ methods, which means the memory
could never be freed.
Thanks Daniel Wiesmann for the report and test, and Tim Graham for the
review.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/25072#comment:14>

Reply all
Reply to author
Forward
0 new messages