PostgreSQL database behaving unpredictably

shariqtariq

Jan 14, 2021, 8:08:34 PM

Hello - I am running an application on PostgreSQL 9.5 that features a 515-million-row table. The following query used to run in under 10 seconds:

select f.patient_num
from i2b2demodata.observation_fact f
where
f.concept_cd IN (select concept_cd from i2b2demodata.concept_dimension where concept_path LIKE '\\ACT\\Diagnosis\\ICD10\\V2_2018AA\\A20098492\\A18916316\\A27150507\\A17864418\\%')
group by f.patient_num

The DDL for the large table is at the bottom of this message.

Overnight, the query started taking more than 4 minutes.

------------

So I migrated the database to PostgreSQL 12, and now the query runs in a shade over 2 minutes.

Is there something that may have happened to the database? Is there anything that I can do to tweak the database?
------------------------------
-- Table: i2b2demodata.observation_fact

-- DROP TABLE i2b2demodata.observation_fact;

CREATE TABLE i2b2demodata.observation_fact
(
encounter_num integer NOT NULL,
patient_num integer NOT NULL,
concept_cd character varying(50) NOT NULL,
provider_id character varying(50) NOT NULL,
start_date timestamp without time zone NOT NULL,
modifier_cd character varying(100) NOT NULL DEFAULT '@'::character varying,
instance_num integer NOT NULL DEFAULT 1,
valtype_cd character varying(50),
tval_char character varying(255),
nval_num numeric(18,5),
valueflag_cd character varying(50),
quantity_num numeric(18,5),
units_cd character varying(50),
end_date timestamp without time zone,
location_cd character varying(50),
observation_blob text,
confidence_num numeric(18,5),
update_date timestamp without time zone,
download_date timestamp without time zone,
import_date timestamp without time zone,
sourcesystem_cd character varying(50),
upload_id integer,
text_search_index serial NOT NULL,
CONSTRAINT observation_fact_pk PRIMARY KEY (patient_num, concept_cd, modifier_cd, start_date, encounter_num, instance_num, provider_id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE i2b2demodata.observation_fact
OWNER TO i2b2demodata;

-- Index: i2b2demodata.of_idx_allobservation_fact

-- DROP INDEX i2b2demodata.of_idx_allobservation_fact;

CREATE INDEX of_idx_allobservation_fact
ON i2b2demodata.observation_fact
USING btree
(patient_num, encounter_num, concept_cd COLLATE pg_catalog."default", start_date, provider_id COLLATE pg_catalog."default", modifier_cd COLLATE pg_catalog."default", instance_num, valtype_cd COLLATE pg_catalog."default", tval_char COLLATE pg_catalog."default", nval_num, valueflag_cd COLLATE pg_catalog."default", quantity_num, units_cd COLLATE pg_catalog."default", end_date, location_cd COLLATE pg_catalog."default", confidence_num);

-- Index: i2b2demodata.of_idx_clusteredconcept

-- DROP INDEX i2b2demodata.of_idx_clusteredconcept;

CREATE INDEX of_idx_clusteredconcept
ON i2b2demodata.observation_fact
USING btree
(concept_cd COLLATE pg_catalog."default");

-- Index: i2b2demodata.of_idx_encounter_patient

-- DROP INDEX i2b2demodata.of_idx_encounter_patient;

CREATE INDEX of_idx_encounter_patient
ON i2b2demodata.observation_fact
USING btree
(encounter_num, patient_num, instance_num);

-- Index: i2b2demodata.of_idx_modifier

-- DROP INDEX i2b2demodata.of_idx_modifier;

CREATE INDEX of_idx_modifier
ON i2b2demodata.observation_fact
USING btree
(modifier_cd COLLATE pg_catalog."default");

-- Index: i2b2demodata.of_idx_sourcesystem_cd

-- DROP INDEX i2b2demodata.of_idx_sourcesystem_cd;

CREATE INDEX of_idx_sourcesystem_cd
ON i2b2demodata.observation_fact
USING btree
(sourcesystem_cd COLLATE pg_catalog."default");

-- Index: i2b2demodata.of_idx_start_date

-- DROP INDEX i2b2demodata.of_idx_start_date;

CREATE INDEX of_idx_start_date
ON i2b2demodata.observation_fact
USING btree
(start_date, patient_num);

-- Index: i2b2demodata.of_idx_uploadid

-- DROP INDEX i2b2demodata.of_idx_uploadid;

CREATE INDEX of_idx_uploadid
ON i2b2demodata.observation_fact
USING btree
(upload_id);

-- Index: i2b2demodata.of_text_search_unique

-- DROP INDEX i2b2demodata.of_text_search_unique;

CREATE UNIQUE INDEX of_text_search_unique
ON i2b2demodata.observation_fact
USING btree
(text_search_index);

George Neuner

Jan 15, 2021, 4:53:15 PM

On Thu, 14 Jan 2021 17:08:33 -0800 (PST), shariqtariq
<shariq...@gmail.com> wrote:

>Hello - I am running an application on PostgreSQL 9.5 that features a
>515-million-row table. The following query used to run in under 10
>seconds:
>
> SNIP QUERY
>
>Overnight, the query started taking more than 4 minutes.

One or more of the indexes could be corrupted - you could try
rebuilding them, and also check the size of your work buffers
(work_mem), because the grouping requires a sort.
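
Something along these lines, as a rough sketch - the index names come
from your DDL, and the work_mem value is only an illustration:

-- rebuild the index the concept_cd filter relies on
REINDEX INDEX i2b2demodata.of_idx_clusteredconcept;
-- or rebuild every index on the fact table in one go
REINDEX TABLE i2b2demodata.observation_fact;

-- memory each sort/hash step may use before spilling to disk
SHOW work_mem;
-- raise it for this session only, then re-run the query to compare
SET work_mem = '256MB';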

However, a sudden, drastic performance drop smells a lot like an
imminent disk failure. If you haven't already, I would move the
database to new storage and make sure your backups are good.


>So I migrated the database to PostgreSQL 12, and now the query runs
>in a shade over 2 minutes.

12 has parallel query turned on by default, so that may be the reason
it ran faster.
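
You can confirm that from the plan. A rough sketch, assuming the
default settings on 12:

-- 2 by default on 12; 0 would disable parallelism entirely
SHOW max_parallel_workers_per_gather;

EXPLAIN (ANALYZE, BUFFERS)
select f.patient_num
from i2b2demodata.observation_fact f
where f.concept_cd IN (select concept_cd
                       from i2b2demodata.concept_dimension
                       where concept_path LIKE '\\ACT\\...')  -- full path as in your query
group by f.patient_num;
-- look for Gather / Parallel Seq Scan nodes in the output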


Hope this helps,
George

Mladen Gogala

Jan 20, 2021, 11:15:05 PM

On Fri, 15 Jan 2021 16:53:12 -0500, George Neuner wrote:

> However, a sudden, drastic performance drop smells a lot like an
> imminent disk failure.

Huh? I am an Oracle DBA with some experience in Postgres, and I have
seen many performance drops after a version upgrade; precisely zero of
them were caused by an "imminent disk failure".
I would check whether the plans have changed, compare the configuration
parameters, see where the time is spent using perf or strace, and check
the logs. If the plans have changed, try installing the pg_hint_plan
extension and forcing the desired plans. Then compare.
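
For example, after comparing EXPLAIN (ANALYZE, BUFFERS) output from 9.5
and 12, something like this pins a plan (a sketch only - the hint below
is illustrative, not a recommendation for this particular query, and
pg_hint_plan must be installed and loaded first):

CREATE EXTENSION pg_hint_plan;  -- plus LOAD 'pg_hint_plan' or shared_preload_libraries

/*+ IndexScan(f of_idx_clusteredconcept) */
select f.patient_num
from i2b2demodata.observation_fact f
where f.concept_cd IN (select concept_cd
                       from i2b2demodata.concept_dimension
                       where concept_path LIKE '\\ACT\\...')
group by f.patient_num;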



--
Mladen Gogala
Database Consultant
https://dbwhisperer.wordpress.com

George Neuner

Jan 21, 2021, 2:06:04 PM

On Thu, 21 Jan 2021 04:15:03 +0000 (UTC), Mladen Gogala
<mgo...@yahoo.com> wrote:

>On Fri, 15 Jan 2021 16:53:12 -0500, George Neuner wrote:
>
>> However, a sudden, drastic performance drop smells a lot like an
>> imminent disk failure.
>
>Huh? I am an Oracle DBA with some experience in Postgres, and I have
>seen many performance drops after a version upgrade; precisely zero of
>them were caused by an "imminent disk failure".

The OP complained of major performance loss *BEFORE* upgrading, and
claimed better performance following the upgrade (which likely was due
to the new PG version having parallel query turned on by default).

It is (fairly) well known that PG does not carry existing index data
over in an upgrade: the index definitions are recreated but left to be
rebuilt incrementally, so for best performance immediately following
the upgrade, the DBA must rebuild the indexes manually.
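
On 12 that can be done without blocking writes - a minimal sketch,
using the OP's table and index names:

-- rebuild every index on the fact table (REINDEX ... CONCURRENTLY
-- is available from PostgreSQL 12 on)
REINDEX TABLE CONCURRENTLY i2b2demodata.observation_fact;
-- or one index at a time
REINDEX INDEX CONCURRENTLY i2b2demodata.of_idx_clusteredconcept;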


>I would check whether the plans have changed, compare the configuration
>parameters, see where the time is spent using perf or strace, and check
>the logs. If the plans have changed, try installing the pg_hint_plan
>extension and forcing the desired plans. Then compare.

Assuming no query or configuration changes, "under 10 seconds" to "more
than 4 minutes" is a huge difference for a plan change alone to explain.
Yes, it can happen ... but it would have to be the result of either
corrupted indexes or large changes to the table data.

The OP did not mention large changes to the table, and I did mention
that the indexes might have been corrupted.
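
If you want to verify that rather than guess, contrib's amcheck can
walk a btree index - a sketch, runnable on the upgraded 12 instance
(amcheck only exists from PG 10 on, so not on the old 9.5 cluster):

CREATE EXTENSION IF NOT EXISTS amcheck;

-- structural check of the btree
SELECT bt_index_check('i2b2demodata.of_idx_clusteredconcept'::regclass);
-- passing true as the second (heapallindexed) argument also verifies
-- that every heap row is present in the index
SELECT bt_index_check('i2b2demodata.of_idx_clusteredconcept'::regclass, true);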

George