Consistent "Segmentation fault" error in logs when running a query

Eugene Pirogov

Mar 29, 2021, 10:23:04 AM
to Google Cloud SQL discuss
We're seeing a "Segmentation fault" error in logs for our staging environment, caused by a single specific DB query.

I am unable to replicate the problem consistently by running the same query manually, but we see the issue in logs basically every day now.

Our environment is: PostgreSQL 13.1, the DB tier is "db-custom-1-3840".

In the 1st half of February, we were seeing "Segmentation fault" in relation to another DB query & different DB instance. That issue somehow got self-resolved around Feb 15.

I'd love to provide more info to diagnose this – is there any other info I could provide that would help solve this case? Otherwise, would it be possible for a Google Cloud SQL engineer to look into this for us?

Andrew K.

Mar 30, 2021, 1:10:32 PM
to Google Cloud SQL discuss
Does this query involve either a LEFT JOIN or aggregate functions? We have a very similar issue with Cloud SQL Postgres 12.5, with four different queries causing segmentation faults, and these are the only similarities between them. So far we have not been able to reproduce the segfaults on a local PG instance running the same DB and queries, and unfortunately it's not possible to attach a debugger to Cloud SQL (as far as we know).

Jan Libera

Apr 12, 2021, 5:24:55 AM
to Google Cloud SQL discuss
Hello,

Would you be able to confirm whether you are still seeing the segmentation fault errors? If so, could you please provide the timestamp of the most recent occurrence?

Also, would you be able to provide a few more details regarding the query? As per Andrew's question, does the query involve a LEFT JOIN? According to the post presented here [1], this may be due to a PostgreSQL bug. Would you be able to verify this and get back to us with this information?

Better yet, I would suggest you raise this issue via the public issue tracker [2], as this will allow for a more in-depth investigation into the segmentation fault and, at the same time, bring the issue closer to the Cloud SQL team for further inspection.

Andrew K.

Apr 12, 2021, 5:50:40 AM
to Google Cloud SQL discuss
From our side, we can definitely confirm they are still occurring. The most recent occurrence for us is less than 20 minutes ago, 2021-04-12 09:25:38.469 UTC and we're seeing about 20 segfaults a day on our production instance.

Unfortunately, due to the inability to collect stack traces via Cloud SQL, there is not much we can do to debug further. The PG bug you have linked dates back to PG 9.6 and is unlikely to have gone unnoticed and unfixed by PG 13 (we've verified that it happens for us on both PG 12 and 13). Considering that we have not been able to reproduce it locally so far, it could very well be isolated to the Cloud SQL implementation of Postgres. Reporting it to PG won't help, since they will request stack traces we cannot collect.

There are a few threads on the Google issue tracker that suggest Query Insights may be responsible for segmentation faults. Since we have it enabled, I will try disabling it (if I can find out how to do that) and report back.

Andrew K.

Apr 13, 2021, 1:44:17 AM
to Google Cloud SQL discuss
Good news: I can confirm that disabling Query Insights on our production instance has made segfaults stop completely. What's more interesting is that we've re-enabled it with application tags disabled, and so far no segfaults either.

Eugene Pirogov

Apr 14, 2021, 10:39:40 AM
to Google Cloud SQL discuss
Posted a comment to an issue with a similar name here: https://issuetracker.google.com/issues/184283279

Eugene Pirogov

Apr 14, 2021, 10:39:51 AM
to Google Cloud SQL discuss
Hi

> Would you be able to confirm if you are still able to see the segmentation fault errors? 

Yes, I confirm this. The last one per our logs happened today at 2021-04-13 08:11:32.469 UTC. The segmentation errors are not extremely common (due to low traffic on the staging environment), but they happen every day, once or twice. This may become a deal breaker should we plan to move our production DB to Google Cloud though.

> does the query involve a LEFT JOIN?

The query definition itself has two JOINs, although not LEFT JOINs. However, one of the joined relations is a view that is defined to have LEFT JOINs in it.

Noted on the SO question. I am not sure what to take from it though...

Noted on the issue tracker too. I'll gather up the info I have so far and will publish there. Thank you!

Andrew K.

Apr 15, 2021, 1:11:30 AM
to Google Cloud SQL discuss
After prolonged testing, I have an update: to completely stop segfaults, Query Insights must be disabled after all. Merely disabling application tags makes them less frequent, but doesn't prevent them entirely.
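
For anyone looking for how to apply the workaround described in this thread, here is a minimal sketch in Python that only builds the `gcloud` command line rather than running it. It assumes the `gcloud sql instances patch` flags `--no-insights-config-query-insights-enabled` and `--no-insights-config-record-application-tags` as they appear in the Cloud SQL documentation to the best of my knowledge; please verify them against your gcloud version. The function name and the instance name `my-staging-instance` are hypothetical.

```python
import shlex
from typing import List


def build_patch_command(instance: str, disable_insights: bool = True) -> List[str]:
    """Build (but do not run) a gcloud command that changes Query Insights settings.

    Assumption: the flag names below match the installed gcloud release.
    """
    cmd = ["gcloud", "sql", "instances", "patch", instance]
    if disable_insights:
        # Full workaround from the thread: turn Query Insights off entirely.
        cmd.append("--no-insights-config-query-insights-enabled")
    else:
        # Partial variant from the thread: keep Query Insights on,
        # but stop recording application tags.
        cmd.append("--insights-config-query-insights-enabled")
        cmd.append("--no-insights-config-record-application-tags")
    return cmd


if __name__ == "__main__":
    # "my-staging-instance" is a placeholder, not a real instance.
    print(shlex.join(build_patch_command("my-staging-instance")))
```

Printing the command instead of executing it lets you review it before applying it to an instance. Per the report above, only the first variant stopped the segfaults completely.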

Mike Berman

Jun 30, 2021, 4:40:56 PM
to Google Cloud SQL discuss
Quick bump. @google what is the status of this? We are seeing similar behavior where the only fix was disabling insights in production. Any movement here?

wushawn

Jul 5, 2021, 12:04:54 PM
to Google Cloud SQL discuss

Currently, the fix for this issue has been rolled out to >99% of instances. The team will provide further updates at this link once it has been rolled out to 100% of instances: https://b.corp.google.com/issues/183108383