
7.4 - FK constraint performance


ow

Feb 11, 2004, 7:06:10 PM
PostgreSQL 7.4 on i386-linux-gnu, compiled by GCC 2.96

-- about 10 records
CREATE TABLE my.Small
(
id my.dint NOT NULL,
code my.dvalue NOT NULL,
CONSTRAINT pk_1 PRIMARY KEY (id)
) WITHOUT OIDS;

-- about 80M rows
CREATE TABLE my.Large
(
id my.dlong NOT NULL,
small_id my.dint NOT NULL,
value my.value,
CONSTRAINT pk_2 PRIMARY KEY (id),
CONSTRAINT fk_2 FOREIGN KEY (small_id) REFERENCES my.small (id)
  ON UPDATE RESTRICT ON DELETE RESTRICT
) WITHOUT OIDS;

CREATE INDEX small_fk ON my.Large USING btree (small_id);

---------------------------------------------
The following queries run in less than 40 ms.
1) select 1 from Large where small_id = 239
2) SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 239 FOR UPDATE OF x

3) delete from Small where id = 239
Feb 11 18:07:36 srv07 postgres[2091]: [91-1] LOG: statement: SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = $1 FOR UPDATE OF x
Feb 11 18:10:41 srv07 postgres[2091]: [92-1] LOG: duration: 185273.262 ms

When I try to delete a record, it takes > 3 min. Why is it taking so long
when practically the same select query (see (2)) runs very quickly? Is there
anything that can be done to fix it?

Thanks



Stephan Szabo

Feb 11, 2004, 8:06:36 PM

Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
case? As a random question, does increasing the statistics target on
Large.small_id and re-analyzing change its behavior?


Tom Lane

Feb 11, 2004, 10:12:15 PM
ow <onewa...@yahoo.com> writes:
> When I try to delete a record, it takes > 3 min. Why is it taking so long
> when practically the same select query (see (2)) runs very quickly? Is there
> anything that can be done to fix it?

What are those column datatypes?

regards, tom lane


ow

Feb 12, 2004, 9:05:16 AM

--- Stephan Szabo <ssz...@megazone.bigpanda.com> wrote:
> Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
> case? As a random question, does increasing the statistics target on
> Large.small_id and re-analyzing change its behavior?

Ran analyze, the result is the same. Here's more info:

1) There's 1 row in "Large" for "small_id" = 239

SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 239 FOR UPDATE OF x

Quick query. Explain shows index scan.

2) There are many rows in "Large" for "small_id" = 1
SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 1 FOR UPDATE OF x

Runs for about 3 min. Explain shows table scan.

3) delete from Small where id = 239

Runs for about 3 min. It does appear that a table scan is used for FK
verification. But why? I am deleting "id = 239", not "id = 1", and the query in
(1) runs very quickly. I suspected that the wrong id was being passed during FK
verification, but the FK constraint DOES work.

4) Domain types used in the example above
my.dint = int
my.dlong = int8
my.dvalue = varchar(15)
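
(The domain definitions aren't shown in the thread; presumably something
along these lines:)

CREATE DOMAIN my.dint AS int;
CREATE DOMAIN my.dlong AS int8;
CREATE DOMAIN my.dvalue AS varchar(15);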

Thanks




Stephan Szabo

Feb 12, 2004, 10:37:32 AM
On Thu, 12 Feb 2004, ow wrote:

>
> --- Stephan Szabo <ssz...@megazone.bigpanda.com> wrote:
> > Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
> > case? As a random question, does increasing the statistics target on
> > Large.small_id and re-analyzing change its behavior?
>
> Ran analyze, the result is the same. Here's more info:

You also did the alter table to up the statistics target on the column,
right?

> 3) delete from Small where id = 239
> Runs for about 3 min. It does appear that table scan is used for FK
> verification. But why? Am deleting "id = 239" not "id = 1" and the query in
> (1) runs very quickly. Had suspicion that wrong id is passed during FK
> verification but FK constraint DOES work.

It doesn't plan it as id=239 but as id=$1 and then executes it with
$1=239. The plan data gets reused for other id values if it needs the
same fk action again later in the session.

I'd hoped that upping the statistics target and re-analyzing would make it
choose an index scan for the case where it doesn't know what constant is
going to be used. Hmm, what is the estimated cost difference and real time
difference on id=1 between seqscan and index scan? (EXPLAIN ANALYZE output
with and without enable_seqscan=off should show you; a sketch follows.)
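
For example, something like this (a sketch, using the table and column names
from this thread):

-- Let the planner choose (expected here: seqscan for small_id = 1):
EXPLAIN ANALYZE
SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 1 FOR UPDATE OF x;

-- Force it away from the seqscan, then compare estimates and timings:
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 1 FOR UPDATE OF x;
SET enable_seqscan = on;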


Stephan Szabo

Feb 12, 2004, 11:04:53 AM
On Thu, 12 Feb 2004, ow wrote:

> --- Stephan Szabo <ssz...@megazone.bigpanda.com> wrote:
> > You also did the alter table to up the statistics target on the column,
> > right?
>

> Not really. I did not change the default stats settings in
> postgresql.conf. Not sure what needs to be changed, can you clarify?

Basically, run something like:
-- 1000 is just an arbitrary choice, but we can lower it later if this
-- works.
ALTER TABLE my.Large ALTER COLUMN small_id SET STATISTICS 1000;
ANALYZE my.Large;

This increases the statistics target for just the column in question, which
means the planner should have a better idea of the distribution and may make
a better guess if you've got a somewhat uneven distribution.
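
To see what the planner now believes about the column, you can inspect its
pg_stats row (a sketch, using the names above):

SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'my' AND tablename = 'large' AND attname = 'small_id';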

Tom Lane

Feb 12, 2004, 10:54:20 AM
ow <onewa...@yahoo.com> writes:
> When I try to delete a record, it takes > 3 min.

I think it must be using a seqscan for the foreign key check query.
Could you try this and show the results?

prepare foo(my.dint) as
SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;

explain analyze execute foo(239);

When I try it I see an indexscan plan, but maybe there's some aspect of
your setup that's causing problems.

regards, tom lane


ow

Feb 12, 2004, 10:53:51 AM

--- Stephan Szabo <ssz...@megazone.bigpanda.com> wrote:
> You also did the alter table to up the statistics target on the column,
> right?

Not really. I did not change the default stats settings in postgresql.conf.
Not sure what needs to be changed, can you clarify?

Thanks




ow

Feb 12, 2004, 4:49:23 PM
--- Tom Lane <t...@sss.pgh.pa.us> wrote:
> ow <onewa...@yahoo.com> writes:
> > When I try to delete a record, it takes > 3 min.
>
> I think it must be using a seqscan for the foreign key check query.
> Could you try this and show the results?

1) SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = 201 FOR UPDATE OF x;

QUERY PLAN
Index Scan using small_fk on large x  (cost=0.00..6.01 rows=1 width=6) (actual time=0.251..0.251 rows=0 loops=1)
  Index Cond: ((small_id)::integer = 201)
Total runtime: 0.338 ms

2) prepare foo(my.dint) as
SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;

explain analyze execute foo(201);

QUERY PLAN
Seq Scan on large x  (cost=0.00..1787052.30 rows=7893843 width=6) (actual time=210566.301..210566.301 rows=0 loops=1)
  Filter: ((small_id)::integer = ($1)::integer)
Total runtime: 210566.411 ms

Thanks



ow

Feb 12, 2004, 7:25:48 PM

--- Tom Lane <t...@sss.pgh.pa.us> wrote:
> Well, there's the smoking gun all right. Why does it think there are
> going to be 7893843 matching rows!? Could we see the pg_stats row for
> the large.small_id column?
>
> regards, tom lane

schemaname        | my
tablename         | large
attname           | small_id
null_frac         | 0
avg_width         | 4
n_distinct        | 10
most_common_vals  | {7,3,5,1,4,2,8,10,6,9}
most_common_freqs | {0.108667,0.105,0.104333,0.101333,0.100667,0.0983333,0.0983333,0.0983333,0.094,0.091}
histogram_bounds  |
correlation       | 0.0597573




Tom Lane

Feb 12, 2004, 7:42:58 PM
ow <onewa...@yahoo.com> writes:
> schemaname        | my
> tablename         | large
> attname           | small_id
> null_frac         | 0
> avg_width         | 4
> n_distinct        | 10
> most_common_vals  | {7,3,5,1,4,2,8,10,6,9}
> most_common_freqs | {0.108667,0.105,0.104333,0.101333,0.100667,0.0983333,0.0983333,0.0983333,0.094,0.091}
> histogram_bounds  |
> correlation       | 0.0597573

According to this entry, your small_id column only contains the ten
values 1..10, roughly evenly distributed. So why are you probing for
239??

The planner is certainly going to estimate a probe for an unspecified
value as retrieving 10% of the table (with ~80M rows, that is right around
the ~7.9M-row estimate in your EXPLAIN output), and under that assumption
it's quite right to use a seqscan.

If this estimate is not right, perhaps you could give us a more accurate
view of the column statistics?

regards, tom lane


Tom Lane

Feb 12, 2004, 7:00:25 PM
ow <onewa...@yahoo.com> writes:
> --- Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I think it must be using a seqscan for the foreign key check query.

> 2) prepare foo(my.dint) as
> SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;

> explain analyze execute foo(201);

> QUERY PLAN
> Seq Scan on large x  (cost=0.00..1787052.30 rows=7893843 width=6) (actual time=210566.301..210566.301 rows=0 loops=1)
>   Filter: ((small_id)::integer = ($1)::integer)
> Total runtime: 210566.411 ms

Well, there's the smoking gun all right. Why does it think there are
going to be 7893843 matching rows!? Could we see the pg_stats row for
the large.small_id column?

regards, tom lane


ow

Feb 12, 2004, 8:22:53 PM

--- Tom Lane <t...@sss.pgh.pa.us> wrote:
> Can't see how we optimize your case without pessimizing more-common cases.

My case appears to be pretty common, i.e. 1 small and 1 large table with RI
constraint between them. In order to delete a record from the small table, the
large table must not have records that are dependent on the deleted row.

I think other RDBMSs simply use a preset value instead of a partial table
scan when there's not enough stat info. Might be a better way.

Thanks



Tom Lane

Feb 12, 2004, 8:10:36 PM
ow <onewa...@yahoo.com> writes:
> Sounds pretty bad for my case. Any way to avoid the 10% scan?

Can't see how we optimize your case without pessimizing more-common cases.

Sorry.

regards, tom lane


ow

Feb 12, 2004, 8:03:41 PM
--- Tom Lane <t...@sss.pgh.pa.us> wrote:
> According to this entry, your small_id column only contains the ten
> values 1..10, roughly evenly distributed. So why are you probing for
> 239??

Let's say we have City (small) and Person (large) tables. A new city was added
(mistakenly) with id=239; it does not have any "persons" assigned yet. Hence,
we want to remove the wrong "city" record.

In any case, one can't remove a record from "small" unless there are NO
dependent records in "large"; RI will not allow it. The initial problem was
that I tried to delete a record from "small" and it was taking about 3 min to
do that.

> The planner is certainly going to estimate a probe for an unspecified
> value as retrieving 10% of the table, and under that assumption it's
> quite right to use a seqscan.

Sounds pretty bad for my case. Any way to avoid the 10% scan?

Thanks




Tom Lane

Feb 12, 2004, 11:25:36 PM
Rod Taylor <r...@rbt.ca> writes:
> Statistics say there are 10 values. Statistics list the 10 most common
> values (all of them). Given this, would it not be reasonable to assume
> that 239 is a recent addition (if there at all) to the table and not
> very common?

We don't know that it's 239 when we make the plan. In order to know
that, we'd have to abandon caching of RI check query plans and re-plan
for each row. That strikes me as inevitably a losing proposition.

regards, tom lane


Tom Lane

Feb 12, 2004, 11:49:56 PM
ow <onewa...@yahoo.com> writes:
> --- Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Can't see how we optimize your case without pessimizing more-common cases.

> I think other RDBMSs simply use a preset value instead of a partial table
> scan when there's not enough stat info. Might be a better way.

The problem here cannot be described as "not enough stat info". The
available stats are complete and they point very strongly to the
conclusion that searches in the large table should be seqscans.
To do otherwise would be folly in general, even if it happens to
be the correct thing in this particular example.

regards, tom lane


Richard Huxton

Feb 13, 2004, 2:22:17 AM
On Friday 13 February 2004 04:25, Tom Lane wrote:
> Rod Taylor <r...@rbt.ca> writes:
> > Statistics say there are 10 values. Statistics list the 10 most common
> > values (all of them). Given this, would it not be reasonable to assume
> > that 239 is a recent addition (if there at all) to the table and not
> > very common?
>
> We don't know that it's 239 when we make the plan. In order to know
> that, we'd have to abandon caching of RI check query plans and re-plan
> for each row. That strikes me as inevitably a losing proposition.

In this precise example, could you not:
1. Check index for value
2. If found, seq-scan

Of course that's only going to be a sensible thing to do if you're expecting
one of two results:
1. Value not there
2. Lengthy seq-scan if it is there
--
Richard Huxton
Archonet Ltd

Stephan Szabo

Feb 13, 2004, 10:02:51 AM
On Thu, 12 Feb 2004, Tom Lane wrote:

> Rod Taylor <r...@rbt.ca> writes:
> > Statistics say there are 10 values. Statistics list the 10 most common
> > values (all of them). Given this, would it not be reasonable to assume
> > that 239 is a recent addition (if there at all) to the table and not
> > very common?
>
> We don't know that it's 239 when we make the plan. In order to know
> that, we'd have to abandon caching of RI check query plans and re-plan
> for each row. That strikes me as inevitably a losing proposition.

One thing is that IIRC we're going to ask for only one row when we do the
SPI_execp_current. However, unless I misremember, the behavior of FOR
UPDATE and LIMIT means that saying LIMIT 1 is potentially unsafe (if you
block on a row that goes away). Is there any way for us to let the planner
know this?

Tom Lane

Feb 13, 2004, 10:27:27 AM
Stephan Szabo <ssz...@megazone.bigpanda.com> writes:
> One thing is that IIRC we're going to ask for only one row when we do the
> SPI_execp_current. However, unless I misremember, the behavior of FOR
> UPDATE and LIMIT means that saying LIMIT 1 is potentially unsafe (if you
> block on a row that goes away). Is there any way for us to let the planner
> know this?

I was looking at that last night. It seems like we could add a LIMIT at
least in some contexts. In the case at hand, we're just going to error
out immediately if we find a matching row, and so there's no need for
FOR UPDATE, is there?

However, I'm not sure it would help the OP anyway. With the stats he
had, the planner would still take a seqscan, because it's going to
expect that it can find a match by probing the first ten or so rows of
the first page. With anything close to the normal cost parameters,
that's going to look more expensive than an index probe. Possibly if
the table had a few more values it would work.

But in general it would be a good idea if the planner knew that plan
evaluation would stop after the first row. We could look at passing
that info down out-of-band instead of using LIMIT. There's already
support for this to allow EXISTS() subqueries to be planned properly;
see the tuple_fraction stuff in planner.c. We just can't get at it
via SPI ...

regards, tom lane


Stephan Szabo

Feb 13, 2004, 10:49:59 AM
On Fri, 13 Feb 2004, Tom Lane wrote:

> Stephan Szabo <ssz...@megazone.bigpanda.com> writes:
> > One thing is that IIRC we're going to ask for only one row when we do the
> > SPI_execp_current. However, unless I misremember, the behavior of FOR
> > UPDATE and LIMIT means that saying LIMIT 1 is potentially unsafe (if you
> > block on a row that goes away). Is there any way for us to let the planner
> > know this?
>
> I was looking at that last night. It seems like we could add a LIMIT at
> least in some contexts. In the case at hand, we're just going to error
> out immediately if we find a matching row, and so there's no need for
> FOR UPDATE, is there?

I think there still is, because a not-yet-committed transaction could have
deleted them all, in which case I think the correct behavior is to wait and,
if that transaction commits, allow the action, and if it rolls back, to
error.

Really we'd want a different behavior where we're only blocking in these
cases if all the matching rows are locked by other transactions.

> However, I'm not sure it would help the OP anyway. With the stats he
> had, the planner would still take a seqscan, because it's going to
> expect that it can find a match by probing the first ten or so rows of
> the first page. With anything close to the normal cost parameters,
> that's going to look more expensive than an index probe. Possibly if
> the table had a few more values it would work.

Hmm, that's true. It also doesn't help the real actions (cascade, set *)
since those really do need to get at all the rows, but it probably helps
in a reasonable number of cases.

Tom Lane

Feb 13, 2004, 10:56:44 AM
Stephan Szabo <ssz...@megazone.bigpanda.com> writes:
> On Fri, 13 Feb 2004, Tom Lane wrote:
>> I was looking at that last night. It seems like we could add a LIMIT at
>> least in some contexts. In the case at hand, we're just going to error
>> out immediately if we find a matching row, and so there's no need for
>> FOR UPDATE, is there?

> I think there still is, because a not-yet-committed transaction could have
> deleted them all, in which case I think the correct behavior is to wait and,
> if that transaction commits, allow the action, and if it rolls back, to
> error.

Good point. Okay, we can't put in a LIMIT. But we could still hack the
planner to prefer a fast-start plan by passing an out-of-band tuple
fraction, for those RI plans where it's appropriate. That would not
affect correctness.

>> However, I'm not sure it would help the OP anyway. With the stats he
>> had, the planner would still take a seqscan, because it's going to
>> expect that it can find a match by probing the first ten or so rows of
>> the first page. With anything close to the normal cost parameters,
>> that's going to look more expensive than an index probe.

s/more expensive/less expensive/ ... need more caffeine obviously ...

Stephan Szabo

Feb 13, 2004, 11:08:41 AM

On Fri, 13 Feb 2004, Tom Lane wrote:

> Stephan Szabo <ssz...@megazone.bigpanda.com> writes:
> > On Fri, 13 Feb 2004, Tom Lane wrote:
> >> I was looking at that last night. It seems like we could add a LIMIT at
> >> least in some contexts. In the case at hand, we're just going to error
> >> out immediately if we find a matching row, and so there's no need for
> >> FOR UPDATE, is there?
>
> > I think there still is, because a not-yet-committed transaction could have
> > deleted them all, in which case I think the correct behavior is to wait and,
> > if that transaction commits, allow the action, and if it rolls back, to
> > error.
>
> Good point. Okay, we can't put in a LIMIT. But we could still hack the
> planner to prefer a fast-start plan by passing an out-of-band tuple
> fraction, for those RI plans where it's appropriate. That would not
> affect correctness.

Right, I can try to look through the stuff you pointed at in the previous
message over the weekend.

> >> However, I'm not sure it would help the OP anyway. With the stats he
> >> had, the planner would still take a seqscan, because it's going to
> >> expect that it can find a match by probing the first ten or so rows of
> >> the first page. With anything close to the normal cost parameters,
> >> that's going to look more expensive than an index probe.
>
> s/more expensive/less expensive/ ... need more caffeine obviously ...

Me too apparently, since I knew what you were saying and agreed despite
the wording.

Rod Taylor

Feb 13, 2004, 8:57:20 AM
> In this precise example, could you not:
> 1. Check index for value
> 2. If found, seq-scan
>
> Of course that's only going to be a sensible thing to do if you're expecting
> one of two results:
> 1. Value not there
> 2. Lengthy seq-scan if it is there

Most of the queries are going to be for the other values (in which case
you've wasted an index scan), which is minor, but in the event there is a
single 239 you're still taking a big hit.

That is an awful lot of work to handle the non-existent case only.

--
Rod Taylor <rbt [at] rbt [dot] ca>

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc


Rod Taylor

Feb 13, 2004, 8:57:50 AM
On Thu, 2004-02-12 at 23:25, Tom Lane wrote:
> Rod Taylor <r...@rbt.ca> writes:
> > Statistics say there are 10 values. Statistics list the 10 most common
> > values (all of them). Given this, would it not be reasonable to assume
> > that 239 is a recent addition (if there at all) to the table and not
> > very common?
>
> We don't know that it's 239 when we make the plan. In order to know
> that, we'd have to abandon caching of RI check query plans and re-plan
> for each row. That strikes me as inevitably a losing proposition.

Right, sorry, I forgot this was out of a fk trigger.


Rod Taylor

Feb 12, 2004, 9:53:45 PM
On Thu, 2004-02-12 at 20:10, Tom Lane wrote:
> ow <onewa...@yahoo.com> writes:
> > Sounds pretty bad for my case. Any way to avoid the 10% scan?
>
> Can't see how we optimize your case without pessimizing more-common cases.
> Sorry.

Statistics say there are 10 values. Statistics list the 10 most common
values (all of them). Given this, would it not be reasonable to assume
that 239 is a recent addition (if there at all) to the table and not
very common?


news.postgresql.org

Feb 15, 2004, 10:29:42 PM
Hello,

I just discovered the following change to CHAR(n) (taken from varlena.com,
general bits, issue 62).

This will cause me serious heart-ache, back-ache and bug-ache if we upgrade
to 7.4. Is there any compatibility flag available to use pre-7.4 behaviour
for CHAR(n)?

Regards, John

==================

In 7.4, one of the changes was that the char(n) type now truncates trailing
spaces. This causes a problem for those of us using ::char(n) to create
fixed-length fields. Fixed-length-field batch files are usually required
to interface with legacy systems.

In the example below, psql is called from a shell with tuples only (t) and
unaligned (A). The result of the selection creates a fixed width file.

One of the techniques used to create fixed-length fields in 7.3 and earlier
was to cast the value to char(n). This, along with the various to_char()
functions, could be used to create fixed-length records. For example:

% psql -At > batch.out << END
select accountid::char(30),
       to_char( transaction_amount, 'FM000.00'),
       (lastname || ',' || firstname )::char(40),
       bankid::char(15),
       to_char( now(), 'YYYYMMDD');
END
In 7.4 this no longer works. The fields created with the cast are no longer
fixed length. Instead of using the cast to make fixed-length fields, use
rpad() and lpad() to do the same thing:

rpad(string text, length integer [, fill text])
lpad(string text, length integer [, fill text])

The previous selection should now be written as follows.

% psql -At > batch.out << END
select rpad(accountid, 30),
       to_char( transaction_amount, 'FM000.00'),
       rpad( (lastname || ',' || firstname ), 40),
       rpad(bankid, 15),
       to_char( now(), 'YYYYMMDD');
END


Tom Lane

Feb 17, 2004, 6:10:56 PM
"news.postgresql.org" <jl...@natsoft.com.my> writes:
> I just discovered the following change to CHAR(n) (taken from varlena.com,
> general bits, issue 62).

The description you quote doesn't appear to have much of anything to do
with the actual behavior of 7.4.

7.4 will trim trailing spaces when converting char(n) to varchar or
text, but the example query does not do that. It just coerces query
output columns to char(n), and that works the same as it did before.
For instance

regression=# select 'zit'::char(77);
bpchar
-------------------------------------------------------------------------------
zit
(1 row)


regards, tom lane


elein

Feb 17, 2004, 6:17:28 PM
This is an example of the problem. It used to expand
the middle thing to 15.

elein=# select 'x' || ' '::char(15) || 'x';
?column?
----------
xx
(1 row)


Tom Lane

Feb 17, 2004, 6:40:49 PM
elein <el...@varlena.com> writes:
> This is an example of the problem. It used to expand
> the middle thing to 15.

> elein=# select 'x' || ' '::char(15) || 'x';
> ?column?
> ----------
> xx
> (1 row)

Still does, but then the spaces go away again when the value goes into
the concatenation, because concatenation is a text operator.

regards, tom lane


elein

Feb 17, 2004, 6:56:11 PM
So the problem is there. But blaming it on char was wrong.
It should be blamed on the varchar change. Hey, I thought
the truncation was for varchar and not text? It was for both?

It would be semantically tricky to change the operator.
The precedence is to convert to text. Now the implicit
conversion of the char(n) to text for the operator
"corrupts" the char() value.

elein


scott.marlowe

Feb 17, 2004, 7:07:24 PM
On Tue, 17 Feb 2004, Tom Lane wrote:

> elein <el...@varlena.com> writes:
> > This is an example of the problem. It used to expand
> > the middle thing to 15.
>
> > elein=# select 'x' || ' '::char(15) || 'x';
> > ?column?
> > ----------
> > xx
> > (1 row)
>
> Still does, but then the spaces go away again when the value goes into
> the concatenation, because concatenation is a text operator.

But then this:

select 'x'||' '||'x'

should produce xx, but it produces x x.

Tom Lane

Feb 17, 2004, 7:25:47 PM
"scott.marlowe" <scott....@ihs.com> writes:
> But then this:
> select 'x'||' '||'x'
> should produce xx, but it produces x x.

No, because the imputed type of those literals is text. You'd have to
cast the middle guy to char(n) explicitly to make its trailing spaces go
away when it's reconverted to text.

The real issue here is that trailing spaces in char(n) are semantically
insignificant according to the SQL spec. The spec is pretty vague about
which operations should actually honor that insignificance --- it's
clear that comparisons should, less clear about other things. I think
the 7.4 behavior is more consistent than what we had before, but I'm
willing to be persuaded to change it again if someone can give an
alternate definition that's more workable than this one.

regards, tom lane


te...@ashtonwoodshomes.com

Feb 17, 2004, 7:23:44 PM
> select 'x'||' '||'x'
>
> should produce xx, but it produces x x.
>

Incorrect.

This:
select 'x' || ' '::char || 'x'
should produce xx.

This:
select 'x' || ' ' || 'x'
is restateable as:
select 'x' || ' '::text || 'x'

and the || operator for text does not drop the extra spaces, hence, correctly,
x x.

Terry Fielder
Manager Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
te...@greatgulfhomes.com
Fax: (416) 441-9085

elein

Feb 17, 2004, 7:27:13 PM
Apparently the ::char is cast to varchar and then text?
That explains x || ' ' || x


Tom Lane

Feb 17, 2004, 10:53:17 PM
elein <el...@varlena.com> writes:
> Apparently the ::char is cast to varchar and then text?

No, directly to text, because the || operator is defined as taking text
inputs. But there's no practical difference between text and varchar on
this point.

Richard Huxton

Feb 18, 2004, 4:40:16 AM
On Wednesday 18 February 2004 00:25, Tom Lane wrote:
> "scott.marlowe" <scott....@ihs.com> writes:
> > But then this:
> > select 'x'||' '||'x'
> > should produce xx, but it produces x x.
>
> No, because the imputed type of those literals is text. You'd have to
> cast the middle guy to char(n) explicitly to make its trailing spaces go
> away when it's reconverted to text.
>
> The real issue here is that trailing spaces in char(n) are semantically
> insignificant according to the SQL spec. The spec is pretty vague about
> which operations should actually honor that insignificance --- it's
> clear that comparisons should, less clear about other things. I think
> the 7.4 behavior is more consistent than what we had before, but I'm
> willing to be persuaded to change it again if someone can give an
> alternate definition that's more workable than this one.

[rant on]

I've never really understood the rationale behind char(n) in SQL databases
(other than as backward compatibility with some old mainframe DB).
Insignificant spaces? If it's not significant, why is it there? You could
have a formatting rule that specifies left-aligned strings space-padded (as
printf) but that's not the same as mucking about appending and trimming
spaces.

The only sensible definition of char(n) that I can see would be:
A text value of type char(n) is always "n" characters in length. If you assign
less than "n" characters, it is right-padded with spaces. In all other
respects it behaves as any other text type of length "n" with right-trailing
spaces.

[rant off - ah, feel better for that :-]

--
Richard Huxton
Archonet Ltd


Tom Lane

Feb 18, 2004, 10:56:00 AM
Richard Huxton <d...@archonet.com> writes:
> I've never really understood the rationale behind char(n) in SQL databases
> (other than as backward compatibility with some old mainframe DB).

There are (or were) systems in which the benefit of using fixed-width
columns is a lot higher than it is in Postgres. The spec is evidently
trying to cater to them. Too bad the writers whacked the semantics
around so cruelly to do it :-(

> The only sensible definition of char(n) that I can see would be:
> A text value of type char(n) is always "n" characters in length.

If the SQL spec were willing to leave it at that, I'd be happy. But
we've got this problem that the trailing spaces are supposed to be
insignificant in at least some contexts. I find the pre-7.4 behavior
to be pretty inconsistent. For example, 7.3 and 7.4 agree on this:

regression=# select ('foo '::char(6)) = ('foo');
?column?
----------
t
(1 row)

Now given the above, wouldn't it stand to reason that

regression=# select ('foo '::char(6) || 'bar') = ('foo' || 'bar');
?column?
----------
f
(1 row)

or how about

regression=# select ('bar' || 'foo '::char(6)) = ('bar' || 'foo');
?column?
----------
f
(1 row)

In 7.4 both of these do yield true. A closely related example is

regression=# select ('foo '::char(6)) = ('foo'::text);

which yields false in 7.3 and true in 7.4.

I don't object to revisiting the behavior again, but 7.3 was not so
ideal that I want to just go back to it.

regards, tom lane


Jeremy Smith

Feb 18, 2004, 1:06:32 PM
Also, to make char(n) even more annoying, I had the one-character value "K"
stored in a column that was char(2). When I pulled it from the database and
tried to compare it to a variable with a value of "K", it came out unequal.
Of course in mysql, that was not a problem.

Jeremy

-----Original Message-----
From: pgsql-s...@postgresql.org
[mailto:pgsql-s...@postgresql.org]On Behalf Of Richard Huxton
Sent: Wednesday, February 18, 2004 4:40 AM
To: Tom Lane; scott.marlowe
Cc: elein; news.postgresql.org; pgsq...@postgresql.org
Subject: Re: [SQL] CHAR(n) always trims trailing spaces in 7.4


Jeremy Smith

Feb 18, 2004, 2:25:19 PM
Hi,

in mysql I was able to make an insert such as:

INSERT INTO TABLE (integervariable) VALUES ('')

and have it either insert that value, or insert the default if one had
been assigned. In postgresql it gives an error every time this is
attempted. Since I have so many queries that do this on my site already, is
there any way to set up a table so that it just accepts this sort of query?

Thanks,
Jeremy

scott.marlowe

Feb 18, 2004, 2:43:34 PM
On Wed, 18 Feb 2004, Jeremy Smith wrote:

> Hi,
>
> in mysql I was able to make an insert such as:
>
> INSERT INTO TABLE (integervariable) VALUES ('')
>
> and have it either insert that value, or insert the default if one had
> been assigned. In postgresql it gives an error every time this is
> attempted. Since I have so many queries that do this on my site already, is
> there any way to set up a table so that it just accepts this sort of query?

First off, the reason for this problem is that Postgresql adheres to the
SQL standard while MySQL heads off on their own, making it up as they go
along. This causes many problems for people migrating from MySQL to
almost ANY database.

Phew, now that that's out of the way, here's the standard ways of doing
it.

Use DEFAULT: if no default is set, a NULL will be inserted; otherwise the
default will be inserted:
insert into table (integervar) values (DEFAULT);

OR

Leave it out of the list of vars to be inserted
insert into table (othervars, othervars2) values ('abc',123);

OR

Insert a NULL if that's what you want:

insert into table (integervar) values (NULL);

Note that NULL and DEFAULT are not quoted.
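
For example, a minimal sketch (hypothetical table and default):

create table t (othervar text, integervar integer default 42);

insert into t (integervar) values (DEFAULT); -- integervar gets 42
insert into t (othervar) values ('abc');     -- integervar omitted, also gets 42
insert into t (integervar) values (NULL);    -- explicitly NULL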

elein

Feb 18, 2004, 3:05:47 PM
So exactly what is the order of casts that produces
different results with:

'x' || ' ' || 'x' and 'x' || ' '::char(15) || 'x'

Are the operators in both cases being invoked as (text,text)?

I'm trying to understand the precedence that causes
the different results.

elein


Jeremy Smith

Feb 18, 2004, 3:15:43 PM
Scott,

I understand that MySQL's adherence to the standards must be lax, as I am
running into frequent issues as I transfer my site. Unfortunately I have
over 2500 queries, and many more of them needed to be rewritten than I ever
would have imagined. I guess MySQL is the IE of open source DB, and
PostgreSQL is Netscape / Mozilla, in more ways than one.

I guess in some sense, since I relied on MySQL's laziness, my code also
became a bit lazy. There are many locations where I accept user input from
a form, and then have a process page. And on that process page I might have
hundreds of variables that look like:

$input = $_POST['input'];

and in the old days, if that was an empty value and inserted into a mysql
query, it would just revert to the default. Now it looks like I need to:

$input = $_POST['input'];
if (!$input) {
    $input = 'DEFAULT';
}

over and over and over and over.... :) I guess I am just looking for a
shortcut since the site conversion has already taken a week and counting,
when I originally was misguided enough to think it would take hours.

Anyway, the help on this list is much appreciated..

Jeremy


scott.marlowe

Feb 18, 2004, 3:19:48 PM
On Wed, 18 Feb 2004, Jeremy Smith wrote:

> Scott,
>
> I understand that MySQL's adherence to the standards must be lax, as I am
> running into frequent issues as I transfer my site. Unfortunately I have
> over 2500 queries, and many more of them needed to be rewritten than I ever
> would have imagined. I guess MySQL is the IE of open source DB, and
> PostgreSQL is Netscape / Mozilla, in more ways than one.

Good comparison.

> I guess in some sense, since I relied on MySQL's laziness, my code also
> became a bit lazy. There are many locations where I accept user input from
> a form, and then have a process page. And on that process page I might have
> hundreds of variables that look like:
>
> $input = $_POST['input'];
>
> and in the old days, if that was an empty value and inserted into a mysql
> query, it would just revert to the default. Now it looks like I need to:
>
> $input = $_POST['input'];
> if (!$input) {
>     $input = 'DEFAULT';
> }

I've run into this kind of thing before. It helps if you have an array of
all your fields like:

$fields = array("field1","field3","last_name");

and then you can foreach across the input:

foreach($fields as $f){
    if (!$_POST[$f]){
        $_POST[$f] = 'DEFAULT';
    } else {
        $_POST[$f] = "'" . $_POST[$f] . "'";
    }
}

> over and over and over and over.... :) I guess I am just looking for a
> shortcut since the site conversion has already taken a week and counting,
> when I originally was misguided enough to think it would take hours.

Well, you might find yourself rewriting fair portions of your site, but
usually you wind up with better code and better checking, so it's a bit of
a trade off.



Jeremy Smith

Feb 18, 2004, 3:43:13 PM

> I've run into this kind of thing before. It helps if you have an array of
> all your fields like:
>
> $fields = array("field1","field3","last_name");
>
> and then you can foreach across the input:
>
> foreach($fields as $f){
>     if (!$_POST[$f]){
>         $_POST[$f] = 'DEFAULT';
>     } else {
>         $_POST[$f] = "'" . $_POST[$f] . "'";
>     }
> }

Wow, great idea. I will definitely do this, thanks a lot.

> Well, you might find yourself rewriting fair portions of your site, but
> usually you wind up with better code and better checking, so it's a bit of
> a trade off.

No doubt that this is true. Of course even without the better code and
error checking, the extra features like stored procedures and automatic row
locking were more than enough to make the switch worth it.

Thanks again!

Jeremy



scott.marlowe

Feb 18, 2004, 3:44:47 PM
On Wed, 18 Feb 2004, Jeremy Smith wrote:

>
>
> I've run into this kind of thing before. It helps if you have an array of
> all your fields like:
>
> $fields = array("field1","field3","last_name");
>
> and then you can foreach across the input:
>
> foreach($fields as $f){
>     if (!$_POST[$f]){
>         $_POST[$f] = 'DEFAULT';
>     } else {
>         $_POST[$f] = "'" . $_POST[$f] . "'";
>     }
> }
>
> Wow, great idea. I will definitely do this, thanks a lot.
>
>
>
> Well, you might find yourself rewriting fair portions of your site, but
> usually you wind up with better code and better checking, so it's a bit of
> a trade off.
>
>
> No doubt that this is true. Of course even without the better code and
> error checking, the extra features like stored procedures and automatic row
> locking were more than enough to make the switch worth it.
>
> Thanks again!

You're welcome! Enjoy getting to know Postgresql and all the great folks
on the lists, I know I have.

Rod Taylor

Feb 18, 2004, 3:53:55 PM
> and then you can foreach across the input:
>
> foreach($fields as $f){
>     if (!$_POST[$f]){
>         $_POST[$f] = 'DEFAULT';
>     } else {
>         $_POST[$f] = "'" . $_POST[$f] . "'";
>     }
> }

Default in quotes isn't going to work, and please tell me you escape
those things with pg_escape_string() at some point.

scott.marlowe

Feb 18, 2004, 3:58:31 PM
On Wed, 18 Feb 2004, Rod Taylor wrote:

> > and then you can foreach across the input:
> >
> > foreach($fields as $f){
> >     if (!$_POST[$f]){
> >         $_POST[$f] = 'DEFAULT';
> >     } else {
> >         $_POST[$f] = "'" . $_POST[$f] . "'";
> >     }
> > }
>
> Default in quotes isn't going to work, and please tell me you escape
> those things with pg_escape_string() at some point.

Note that the ' marks aren't part of the string, they are the delimiter of
the string, and I always run every server with magic_quotes_gpc on.

anything else? :-)

Rod Taylor

Feb 18, 2004, 4:29:56 PM
> Note that the ' marks aren't part of the string, they are the delimiter of
> the string, and I always run every server with magic_quotes_gpc on.
>
> anything else? :-)

Good point. I looked at the single quotes of the second line and somehow
the DEFAULT got quoted as well ;)

scott.marlowe

Feb 18, 2004, 4:50:12 PM
On Wed, 18 Feb 2004, Rod Taylor wrote:

> > Note that the ' marks aren't part of the string, they are the delimiter of
> > the string, and I always run every server with magic_quotes_gpc on.
> >
> > anything else? :-)
>
> Good point. I looked at the single quotes of the second line and somehow
> the DEFAULT got quoted as well ;)

Oh, and I'm stuck using addslashes() (or the magic_quotes_gpc thingie)
'cause I'm on a server that's being EOLed in favor of .net, and it's
running PHP 4.0.6... ugh. We really gotta get it upgraded soon.

Tom Lane

Feb 18, 2004, 11:58:37 PM
elein <el...@varlena.com> writes:
> So exactly what is the order of casts that produces
> different results with:

> 'x' || ' ' || 'x' and 'x' || ' '::char(15) || 'x'

> Are the operators in both cases being invoked as (text,text)?

The only relevant operator is "text || text" (there are also some ||
operators for arrays, bytea, and BIT, but these will all be discarded
as not the most plausible match). Therefore, in your first example the
unspecified literals will all be presumed to be text, so the space does
not get trimmed.
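
Concretely, with the two expressions from earlier in the thread (the 7.4
results shown in comments):

select 'x' || ' ' || 'x';           -- 'x x': all the literals are text, nothing is trimmed
select 'x' || ' '::char(15) || 'x'; -- 'xx': the char-to-text coercion rtrims the padding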

One of the things we could think about as a way to tweak the behavior is
creating "||" variants that are declared to accept char(n) on one or
both sides. These could actually use the same C implementation function
(textcat) of course. But declaring them that way would suppress the
invocation of rtrim() as char-to-text conversion.

However, if we did that then "||" would behave differently from other
operators on character strings, so it doesn't seem like a very
attractive option to me.

regards, tom lane


elein

Feb 19, 2004, 1:55:05 PM
I guess I am asking about the cast sequence from
char(n) to text.
(' '::char(n))::text trims spaces. This is wrong, imo.
' '::text does not trim spaces.
' '::char(n) does not trim spaces and pads.

char(n) should not trim spaces, right? And
it doesn't on an insert. Text does not trim spaces.
Somewhere the space trimming occurs.

If it is in the operator || then the operator is wrong.
If char(n) is properly defined to not trim spaces then
there should be a separate cat for char(n). It is correct
for it to behave differently than cat for text and varchar
because of the different trimming behaviour.

I can do this patch if there is agreement. But
I may not be able to do it immediately.

elein


Tom Lane

Feb 19, 2004, 3:06:21 PM
elein <el...@varlena.com> writes:
> Somewhere the space trimming occurs.

The cast from char(n) to text (or varchar) is what's doing the trimming
in 7.4. I think you can mostly revert the change by changing that
pg_cast entry to specify no conversion function instead of rtrim().
However that would probably result in squirrely, non-spec behavior for
comparisons.
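
(For reference, a sketch of how to look at that pg_cast entry in the 7.4
catalogs:)

SELECT castsource::regtype, casttarget::regtype, castfunc::regprocedure
FROM pg_cast
WHERE castsource = 'bpchar'::regtype
  AND casttarget = 'text'::regtype;
-- castfunc is the rtrim-based conversion mentioned above; an entry with
-- no conversion function would leave the trailing spaces alone.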

> If char(n) is properly defined to not trim spaces then
> there should be a separate cat for char(n).

Possibly, but I think that is considering the issue much too narrowly.
Concatenation is not the only textual operator.

0 new messages