[postgis-users] Incorrect Results for ST_ValueCount with large polygons

29 views
Skip to first unread message

Michael Treglia

unread,
Jun 21, 2018, 4:23:27 PM6/21/18
to PostGIS Users Discussion
Hi all,

I'm basically trying to compute # of pixels of land cover type by polygon, for lots of polygons.

I'm working with a query a query like this:
SELECT  bbl, (pvc).VALUE, SUM((pvc).COUNT) AS tot_pix
 FROM base_rasters.landcover6in
  INNER join  results_scratch.polys_test
  ON ST_Intersects(landcover6in.rast, polys_test.geom_2263),
    ST_ValueCount(ST_Clip(landcover6in.rast,polys_test.geom_2263),1) AS pvc
  GROUP BY (pvc).VALUE, bbl
 ORDER BY (pvc).VALUE;

For smaller polygons, spot-checking, it looks like I'm generally getting the right results. However, for larger polygons, not always. For example, though a polygon might not have roads, I'm getting non-0 counts for road pixels within a polygon (polygons have unique identifiers as 'bbl').  The total area of pixels counted ends up being larger than the polygon (larger than would be attributable to differences between polygons edges & pixels)

I'm thinking it's an issue when there are multiple raster tiles included... but not entirely sure. Any suggestions on whether there's something off in this query that might be causing an issue like I describe?

Thanks in advance for any suggestions!
Mike

Michael Treglia

unread,
Jun 22, 2018, 5:12:13 PM6/22/18
to PostGIS Users Discussion
Hi again all,

Sorry for the multiple messages on this - realizing the query I posted was somewhat off from what I intended - below are examples that are getting at what I'm going for but exhibiting the issue I described (sometimes including pixels in counts aren't actually be within polygons [i.e., the categorical value isn't actually within the polygon based on visual inspection & this same analysis with other tools]).

Simpler/quicker:

  SELECT  bbl, (pvc).VALUE, SUM((pvc).COUNT) AS tot_pix
 FROM base_rasters.landcover6in
  INNER join  results_scratch.polys_test
  ON ST_Intersects(landcover6in.rast, polys_test.geom_2263),
    ST_ValueCount(ST_Clip(landcover6in.rast,polys_test.geom_2263),1, TRUE, ARRAY[1, 2, 3, 4, 5, 6, 7]) AS pvc
  GROUP BY (pvc).VALUE, bbl
 ORDER BY bbl, (pvc).VALUE;
 


A longer, less efficient version that gives the same results (but unions the clipped raster tiles):


 SELECT
   bbl, (value_count).value, SUM((value_count).count) AS count
  FROM
    (
    select
    bbl,
      rid,
      ST_ValueCount(
        ST_Union(ST_Clip(rast, geom_2263, TRUE)), 1, TRUE, ARRAY[1, 2, 3, 4, 5, 6, 7]
      ) value_count
    FROM
      (SELECT bbl, geom_2263 FROM results_scratch.polys_test) v,
      (SELECT rid, rast FROM base_rasters.landcover6in) r
    WHERE ST_Intersects(rast, geom_2263)
    GROUP BY bbl, rid, geom_2263
    ) i
  GROUP BY bbl, value
  ORDER BY bbl, value

Michael Treglia

unread,
Jul 11, 2018, 4:59:48 PM7/11/18
to PostGIS Users Discussion
Hi everyone,

Sorry for multiple emails on this, but wanted to ping this list again to see if there is anything obvious I'm missing or should check regarding this issue. I've reiterated context/the problem/sample sql code below, and if there's other info that would be helpful, please let me know.

Thanks! Sincerely,
Mike

# To reiterate:
## The goal/issue:
I'm trying to calculate # of pixels from a categorical raster, by value, within distinct polygons of a vector layer. I have code, below, that seems to work accurately for small polygons, though I'm getting inaccuracies in large polygons - the addition of some pixels, including some from a value that aren't actually contained within a focal polygon. (I'm comparing to available calculations for the same overlay, along with results from QGIS Zonal Histogram tool). I'd like to get this working for this specific overlay, but then be able to apply the SQL code to comparable situations.

##Data:
The polygons are parcels, ranging quite widely in size;
The raster dataset is a 6" land cover dataset loaded as an in-db raster via raster2pgsql;
raster2pgsql -s 2263 -d -C -t 128x128 -M -I -l 4,16 "landcover_2010_nyc_05ft.img" base_rasters.landcover6in | psql -h localhost -U postgres -d dbname -v ON_ERROR_STOP=1

## System Info:
I'm running  PostGIS 2.4.4 with PostgreSQL v 10.3 on Windows 7 x64

## Two sets of sample code that give the same result:
1)
SELECT  bbl, (pvc).VALUE, SUM((pvc).COUNT) AS tot_pix
 FROM base_rasters.landcover6in
  INNER join  results_scratch.polys_test
  ON ST_Intersects(landcover6in.rast, polys_test.geom_2263),
    ST_ValueCount(ST_Clip(landcover6in.rast,polys_test.geom_2263),1, TRUE, ARRAY[1, 2, 3, 4, 5, 6, 7]) AS pvc
  GROUP BY (pvc).VALUE, bbl
 ORDER BY bbl, (pvc).VALUE;

2)
 SELECT
   bbl, (value_count).value, SUM((value_count).count) AS count
  FROM
    (
    select
    bbl,
      rid,
      ST_ValueCount(
        ST_Union(ST_Clip(rast, geom_2263, TRUE)), 1, TRUE, ARRAY[1, 2, 3, 4, 5, 6, 7]
      ) value_count
    FROM
      (SELECT bbl, geom_2263 FROM results_scratch.polys_test) v,
      (SELECT rid, rast FROM base_rasters.landcover6in) r
    WHERE ST_Intersects(rast, geom_2263)
    GROUP BY bbl, rid, geom_2263
    ) i
  GROUP BY bbl, value
  ORDER BY bbl, value


Michael Treglia

unread,
Jul 19, 2018, 5:04:34 PM7/19/18
to PostGIS Users Discussion
Hi All,

As partial resolution to this, below are a couple of working examples of SQL, thanks to an example from Regina Obe's book.

These work on smaller subsets of the data I've used for testing, though unfortunately in running on the entire datasets, after running for hours, I receive an error: Invalid memory allocation request size 1109458944.  Thoughts/suggestions welcome, but otherwise I hope these examples are useful to tothers.

Mike


--SQL CODE
with
    cte as (
        select bbl,
            st_valuecount(
                st_clip(st_union(p.rast), geom_2263),1, true, ARRAY[0,1,2,3,4,5,6,7]
                ) as pv
            FROM
                base_rasters.landcover6in as p
                inner join
                results_scratch.polys_test
            on st_intersects(p.rast, geom_2263)
            group by bbl, geom_2263
        )
SELECT bbl, 
    (pv).value, sum((pv).count)
from cte
group by bbl, (pv).value
ORDER by bbl, (pv).value;
--End SQL

And to get the results with land cover by polygon in a 'wide' format (e.g., amounts of different land cover in columns):

--SQL CODE
with
    cte as (
        select bbl,geom_2263,
            st_valuecount(
                st_clip(st_union(p.rast), geom_2263),1, true, ARRAY[0,1,2,3,4,5,6,7]
                ) as pv
            FROM
                base_rasters.landcover6in as p
                inner join
                results_scratch.polys_test
            on st_intersects(p.rast, geom_2263)
            group by bbl, geom_2263
        )
SELECT bbl, geom_2263,
    (sum((pv).count) filter (where (pv).value = 0)::numeric/sum((pv).count))::numeric(5,4) as prop_na,
    (sum((pv).count) filter (where (pv).value = 1)::numeric/sum((pv).count))::numeric(5,4) as prop_canopy,
    (sum((pv).count) filter (where (pv).value = 2)::numeric/sum((pv).count))::numeric(5,4)  as prop_grass,
    (sum((pv).count) filter (where (pv).value = 3)::numeric/sum((pv).count))::numeric(5,4)  as prop_bare,
    (sum((pv).count) filter (where (pv).value = 4)::numeric/sum((pv).count))::numeric(5,4)  as prop_water,
    (sum((pv).count) filter (where (pv).value = 5)::numeric/sum((pv).count))::numeric(5,4)  as prop_bldgs,
    (sum((pv).count) filter (where (pv).value = 6)::numeric/sum((pv).count))::numeric(5,4)  as prop_roads,
    (sum((pv).count) filter (where (pv).value = 7)::numeric/sum((pv).count))::numeric(5,4)  as prop_paved,
    ((sum((pv).count) filter (where (pv).value = 0)) * 0.25)::numeric(12,2) as area_na,
    ((sum((pv).count) filter (where (pv).value = 1)) * 0.25)::numeric(12,2) as area_canopy,
    ((sum((pv).count) filter (where (pv).value = 2)) * 0.25)::numeric(12,2) as area_grass,
    ((sum((pv).count) filter (where (pv).value = 3)) * 0.25)::numeric(12,2) as area_bare,
    ((sum((pv).count) filter (where (pv).value = 4)) * 0.25)::numeric(12,2) as area_water,
    ((sum((pv).count) filter (where (pv).value = 5)) * 0.25)::numeric(12,2) as area_bldgs,
    ((sum((pv).count) filter (where (pv).value = 6)) * 0.25)::numeric(12,2) as area_roads,
    ((sum((pv).count) filter (where (pv).value = 7)) * 0.25)::numeric(12,2) as area_paved
from cte
group by bbl, geom_2263
ORDER by bbl;
--End SQL

Michael Treglia

unread,
Sep 5, 2018, 2:15:41 PM9/5/18
to PostGIS Users Discussion
Hi All,

As one more follow-up on this thread... it sounds like the reason for the memory error I received is likely the large size of unioned rasters. For st_valuecount, it sounds like the st_union operation is not actually necessary, but without it, the st_clip operation seems to falter when a polygon overlaps multiple raster tiles (which is why I previously was seeing incorrect pixel counts for focal polygons without the st_union).

Some of this discovery on my part is documented in a StackExchange thread here - https://gis.stackexchange.com/questions/294482/error-quantifying-land-cover-classes-by-polygon-in-postgis (and thanks to Pierre Racine for the helpful comments that led me to figure out what was going on).

And I believe the issue with st_clip is already documented here: https://trac.osgeo.org/postgis/ticket/3457   (If I'm missing something, or if there's a way around this in PostGIS as-is, suggestions are welcome).

Cheers,
Mike

Reply all
Reply to author
Forward
0 new messages