Writeable CTE Not Working?

Kong Man

unread,

Jan 28, 2013, 9:32:51 PM1/28/13

to

Can someone explain how this writable CTE works? Or does it not?

What I tried to do was to make those non-null/non-empty values of suppliers.suppliercode unique by (1) nullifying any blank, but non-null, suppliercode, then (2) appending the supplierid values to the suppliercode values for those duplicates. The writeable CTE, upd_code, did not appear to work, allowing the final UPDATE statement to, unexpectedly, fill what used to be empty values with '-'||suppliercode.

WITH upd_code AS (
UPDATE suppliers SET suppliercode = NULL
WHERE suppliercode IS NOT NULL
AND length(trim(suppliercode)) = 0
)
, ranked_on_code AS (
SELECT supplierid
, trim(suppliercode)||'-'||supplierid AS new_code
, rank() OVER (PARTITION BY upper(trim(suppliercode)) ORDER BY supplierid)
FROM suppliers
WHERE suppliercode IS NOT NULL
AND NOT inactive AND type != 'car'
)
UPDATE suppliers
SET suppliercode = new_code
FROM ranked_on_code
WHERE suppliers.supplierid = ranked_on_code.supplierid
AND rank > 1;

I have seen similar behavior in the past and could not explain it. Any explanation is much appreciated.
Thanks,
-Kong

Виктор Егоров

unread,

Jan 29, 2013, 2:40:20 AM1/29/13

to

2013/1/29 Kong Man <kong_man...@hotmail.com>:

> Can someone explain how this writable CTE works? Or does it not?

They surely do, I use this feature a lot.
Take a look at the description in the docs:
http://www.postgresql.org/docs/current/interactive/queries-with.html#QUERIES-WITH-MODIFYING

> WITH upd_code AS (
> UPDATE suppliers SET suppliercode = NULL
> WHERE suppliercode IS NOT NULL
> AND length(trim(suppliercode)) = 0
> )
> , ranked_on_code AS (
> SELECT supplierid
> , trim(suppliercode)||'-'||supplierid AS new_code
> , rank() OVER (PARTITION BY upper(trim(suppliercode)) ORDER BY supplierid)
> FROM suppliers
> WHERE suppliercode IS NOT NULL
> AND NOT inactive AND type != 'car'
> )
> UPDATE suppliers
> SET suppliercode = new_code
> FROM ranked_on_code
> WHERE suppliers.supplierid = ranked_on_code.supplierid
> AND rank > 1;

I see 2 problems with this query:
1) CTE is just a named subquery, in your query I see no reference to
the “upd_code” CTE.
Therefore it is never gets called;
2) In order to get data-modifying CTE to return anything, you should
use RETURNING clause,
simplest form would be just RETURNING *

Hope this helps.

--
Victor Y. Yegorov

--
Sent via pgsql-sql mailing list (pgsq...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Kong Man

unread,

Jan 29, 2013, 2:29:40 PM1/29/13

to

Hi Victor,

> I see 2 problems with this query:
> 1) CTE is just a named subquery, in your query I see no reference to
> the “upd_code” CTE.
> Therefore it is never gets called;

So, in conclusion, my misconception about CTE in general was that all CTE get called without being referenced.

Thank you much for the explanation.
-Kong

Tom Lane

unread,

Jan 29, 2013, 8:16:40 PM1/29/13

to

Kong Man <kong_man...@hotmail.com> writes:
> Hi Victor,
>> I see 2 problems with this query:
>> 1) CTE is just a named subquery, in your query I see no reference to
>> the “upd_code” CTE.
>> Therefore it is never gets called;

> So, in conclusion, my misconception about CTE in general was that all CTE get called without being referenced.

I think this explanation is wrong --- if you run the query with EXPLAIN
ANALYZE, you can see from the rowcounts that the writable CTE *does* get
run to completion, as indeed is stated to be the behavior in the fine
manual.

However, for a case like this where the main query isn't reading from
the CTE, the CTE will get cycled to completion after the main query is
done. I think what is happening is that the main query is updating all
the rows in the table, and then when the CTE comes along it thinks the
rows are already updated in the current command, so it doesn't replace
'em a second time. This is a consequence of the fact that the same
command-counter ID is used throughout the query. My recollection is
that that choice was intentional and that doing it differently would
break use-cases that are less outlandish than this one. I don't recall
specific examples though.

Why are you trying to update the same table in two different parts of
this query, anyway? The best you can really hope for with that is
unspecified behavior --- we will surely not promise that one of them
completes before the other starts, so in general there's no way to be
sure which one would process a particular row first.

regards, tom lane

Kong Man

unread,

Jan 29, 2013, 9:45:56 PM1/29/13

to

> I think this explanation is wrong --- if you run the query with EXPLAIN
> ANALYZE, you can see from the rowcounts that the writable CTE *does* get
> run to completion, as indeed is stated to be the behavior in the fine
> manual.
>
> However, for a case like this where the main query isn't reading from
> the CTE, the CTE will get cycled to completion after the main query is
> done. I think what is happening is that the main query is updating all
> the rows in the table, and then when the CTE comes along it thinks the
> rows are already updated in the current command, so it doesn't replace
> 'em a second time. This is a consequence of the fact that the same
> command-counter ID is used throughout the query. My recollection is
> that that choice was intentional and that doing it differently would
> break use-cases that are less outlandish than this one. I don't recall
> specific examples though.

Cool. Now I understand it much better.

> Why are you trying to update the same table in two different parts of
> this query, anyway? The best you can really hope for with that is
> unspecified behavior --- we will surely not promise that one of them
> completes before the other starts, so in general there's no way to be
> sure which one would process a particular row first.

It was just my misuse of writable CTE thinking it would be more efficient than separate statements.

Best regards,
-Kong

Kong Man

unread,

Apr 5, 2013, 3:56:16 PM4/5/13

to

I am troubled to find out that a SELECT statement produces fewer rows than the actual row count and have not been able to answer myself as to why. I hope someone could help shedding some light to this.

I attempted to generate a set of INSERT statements, using a the following SELECT statement, against my translations data to reuse elsewhere, but then realized the row count was 8 rows fewer than the source of 2,178. COPY and pg_dump don't seem to lose any data. So, I compare the results to identify the missing data as follows. I don't even see any strange encoding in those missing data.

What scenario could have caused my SELECT query to dump out the 8 blank rows, instead of the expected data?

Here is how I find the discrepancy:
===============================================================================
$ psql -c "CREATE TABLE new_translation AS
SELECT display_name, name, type, translation
FROM translations t JOIN lang l USING (langid)
WHERE display_name = 'SPANISH_CORP'
ORDER BY display_name, name"
SELECT 2178

$ psql -tAc "SELECT
'INSERT INTO new_translation VALUES ('
||quote_literal(display_name)||
', '||quote_literal(name)||
', '||quote_literal(type)||
', '||quote_literal(translation)||');'
FROM new_translation
ORDER BY display_name, name" >/tmp/new_translation-select.sql

$ pg_dump --data-only --inserts --table=new_translation clubpremier |
sed -n '/^INSERT/,/^$/p' >/tmp/new_translation-pg_dump.sql

$ grep ^INSERT /tmp/new_translation-pg_dump.sql | wc -l
2178

$ grep ^INSERT /tmp/new_translation-select.sql | wc -l
2170

$ diff /tmp/new_translation-select.sql /tmp/new_translation-pg_dump.sql
27c27
<
---
> INSERT INTO new_translation VALUES ('SPANISH_CORP', 'AGENCY_IN_USE_BY_COBRAND', NULL, 'La cuenta no puede ser eliminada porque está siendo utilizada actualmente por la co-marca #cobrand#');
506c506
<
---
> INSERT INTO new_translation VALUES ('SPANISH_CORP', 'CAR_DISTANCE_UNIT', NULL, 'MILLAS');
1115c1115
<
---
> INSERT INTO new_translation VALUES ('SPANISH_CORP', 'HOTEL_PROMO_TEXT', 'label', NULL);
1131,1134c1131,1134
<
<
<
<
---
> INSERT INTO new_translation VALUES ('SPANISH_CORP', 'INSURANCE_SEARCH_ADVERTISEMENT_SECTION_ONE', 'checkout', NULL);
> INSERT INTO new_translation VALUES ('SPANISH_CORP', 'INSURANCE_SEARCH_ADVERTISEMENT_SECTION_THREE', 'checkout', NULL);
> INSERT INTO new_translation VALUES ('SPANISH_CORP', 'INSURANCE_SEARCH_ADVERTISEMENT_SECTION_TWO', 'checkout', NULL);
> INSERT INTO new_translation VALUES ('SPANISH_CORP', 'INSURANCE_SEARCH_FOOTER', 'checkout', NULL);
1615c1615
<
---
> INSERT INTO new_translation VALUES ('SPANISH_CORP', 'PAGE_FORGOT_PASSWORD', 'page_titles', NULL);
2215a2216
>
===============================================================================

Thank you in advance for your help,
-Kong

Kong Man

unread,

Apr 5, 2013, 6:51:38 PM4/5/13

to

This seems to answer my question. I completely forgot about the behavior of NULL value in the text concatenation.

http://www.postgresql.org/docs/9.1/static/plpgsql-statements.html#PLPGSQL-QUOTE-LITERAL-EXAMPLE

Because quote_literal is labelled STRICT, it will always return null when called with a null argument. In the above example, if newvalue or keyvalue were null, the entire dynamic query string would become null, leading to an error from EXECUTE. You can avoid this problem by using the quote_nullable function, which works the same as quote_literal except that when called with a null argument it returns the string NULL. For example,