Hi everyone.
I have question about the order of rows in the result of a query.
So, I have this table:
CREATE FOREIGN TABLE public.censo2011
(cload_id bigint ,
departamento character varying(14) ,
genero character varying(9) ,
anoymesdenacimiento pg_catalog.date ,
anodenacimiento character varying(80) ,
mesdenacimiento character varying(80) ,
edad integer ,
ascendencia character varying(80) ,
sabeleeryescribir character varying(80) ,
maximoniveleducativo character varying(80) ,
situacionlaboral character varying(80) ,
hijosnacidosvivos integer ,
poblacion integer )
SERVER cstore_server
OPTIONS (compression 'pglz');
And I COPY the data in descendant order by the "departamento" column. If I run a query I can see that the data is in fact sorted (Basically a SELECT * FROM censo2011 [yes I know I should not be running that type of query in a columnar dbms]).
But when I run the query SELECT departamento, COUNT(*) FROM censo2011 GROUP BY departamento this is what I get:
"Artigas";73377
"Lavalleja";58815
"San José";108304
"Canelones";520173
"Rocha";68088
"Soriano";82594
"Maldonado";164298
"Río Negro";54765
"Salto";124861
"Cerro Largo";84698
"Florida";67047
"Durazno";57084
"Tacuarembó";90051
"Montevideo";1318755
"Treinta y Tres";48134
"Rivera";103473
"Paysandú";113107
"Colonia";123203
"Flores";25050
As you cant see, the data is not sorted. If I add a ORDER BY to the query it works just fine.
So the question is, the order benefits compression but then, when we get the results that order is lost maybe as a side effect of decompression?
We are using cstore_fdw as the analytical database for a BI on the cloud platform so I really want to know how does it work.
Thanks!