AWX postgres container stuck in Restarting status

senilio

Oct 4, 2017, 4:59:02 AM
to AWX Project
Hi!

I installed AWX a couple of weeks back and have been using it without issues since then. Yesterday I got an "Internal Server Error" from the GUI, and then noticed that the postgres container had crashed and was stuck in "Restarting" status.

The only output from "docker logs postgres" is a repeating:

initdb: directory "/var/lib/postgresql/data" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/var/lib/postgresql/data" or run initdb
with an argument other than "/var/lib/postgresql/data".
initdb: directory "/var/lib/postgresql/data" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/var/lib/postgresql/data" or run initdb
with an argument other than "/var/lib/postgresql/data".
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.

While troubleshooting, I updated AWX from GitHub and re-ran the installer, but the issue persists.

I also tried moving /tmp/pgdocker out of the way and rerunning the installer to see whether it would generate an empty DB and run from that. That works, but it means I'm starting from scratch. And there's nothing to say this won't happen again, so I'd rather not start over before pinpointing the issue.

Any ideas on how I could troubleshoot this further?

Thanks! 

Matthew Jones

Oct 4, 2017, 8:10:08 AM
to senilio, AWX Project
We've had this reported a couple of times and I have yet to reproduce it locally (though I have no doubt of its severity). I need to figure out why the postgres container is insisting on re-running its init utility and then failing to start up when data already exists there.
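
One way to narrow this down is to look at the data directory on the host. As far as I know, the official postgres image's entrypoint decides whether to run initdb by checking that PG_VERSION exists and is non-empty in the data directory; treat that as an assumption and verify it against the entrypoint script in your image. The sketch below classifies a directory on that basis. The function name `pgdata_state` is made up for illustration.

```shell
#!/bin/sh
# Classify a PostgreSQL data directory.
# Assumption: the container treats the directory as initialized only when
# PG_VERSION exists and is non-empty; confirm against your image's entrypoint.
pgdata_state() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing"          # path does not exist at all
    elif [ -s "$dir/PG_VERSION" ]; then
        echo "initialized"      # initdb should not be triggered
    elif [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "empty"            # initdb would run and succeed
    else
        echo "partial"          # files remain but PG_VERSION is gone:
                                # initdb is attempted and refuses to run
    fi
}
```

Run as root (the directory is normally readable only by the postgres user), `pgdata_state /tmp/pgdocker` printing `partial` would match the "exists but is not empty" loop in the logs above. In that case something removed part of the directory; periodic cleanup of /tmp is one candidate worth ruling out, though nothing in this thread confirms it.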

--
You received this message because you are subscribed to the Google Groups "AWX Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to awx-project+unsubscribe@googlegroups.com.
To post to this group, send email to awx-p...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/awx-project/f1c2d2ff-9a8d-499f-8c1e-ba1cb31b4283%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Matt Jones
Principal Software Engineer
Ansible Tower

Giordano Bianchi

Oct 5, 2017, 3:14:42 AM
to AWX Project
Hi,

FYI, I had the same issue when I updated my environment, and I had to delete the /tmp/pgdocker folder to get it to work.
Since it was a lab environment, I didn't look into the issue further...

Stephen Dorman

Oct 30, 2017, 4:22:29 PM
to AWX Project
This is repeatedly happening to our dev instances. They will be working fine, and then all of a sudden you will see "A server error has occurred" when trying to access the UI. Every time, the cause has been that the postgres container is stuck in "restarting" status.

Any troubleshooting advice? I have been unsuccessful in recovering the container once it gets in this state. As mentioned above, deleting /tmp/pgdocker means AWX re-initializes and you start from scratch. We've been having to roll back to stable snapshots each time this happens.

Thanks,
Stephen

Tony Coffman

Oct 30, 2017, 4:27:17 PM
to AWX Project
+1

I ended up rebuilding my DEV environment on OpenShift rather than Docker to resolve this, because I couldn't figure out the issue on Docker.

--Tony

Matthew Jones

Oct 31, 2017, 9:46:34 AM
to Tony Coffman, AWX Project
I'm going to see about wrapping this up today. Here's the issue: https://github.com/ansible/awx/issues/438

Stephen Dorman

Nov 20, 2017, 3:59:30 PM
to AWX Project
I did a full install after issue 438 was closed and merged, but I am still seeing this issue.

[root@awx-staging /]# docker ps |grep postgres
eadae4f62e7a postgres:9.6 "docker-entrypoint.sh"  2 weeks ago  Restarting (1)  4 minutes ago  5432/tcp  postgres

[root@awx-staging /]# docker logs postgres
initdb: directory "/var/lib/postgresql/data/pgdata" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/var/lib/postgresql/data/pgdata" or run initdb
with an argument other than "/var/lib/postgresql/data/pgdata".
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.

Any suggestions? Thanks!
-Stephen

Matthew Jones

Nov 21, 2017, 6:48:08 AM
to Stephen Dorman, AWX Project
You'll want to make sure you clear the pg container, the images, and probably clear the postgres data directory.
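
The steps above can be sketched as a small dry-run helper. The container name `postgres` and image tag `postgres:9.6` come from the `docker ps` output earlier in this thread; the default data path and the function name `reset_pg` are assumptions for illustration. The sketch moves the data directory aside instead of deleting it, which is a more conservative choice than "clear" and leaves something to inspect later.

```shell
#!/bin/sh
# Print the cleanup steps, or execute them when RUN=1 is set.
# Assumptions: container "postgres", image "postgres:9.6" (from the
# `docker ps` output in this thread), and a data path without spaces.
reset_pg() {
    data_dir="${1:-/tmp/pgdocker}"
    backup="${data_dir}.bak.$(date +%Y%m%d%H%M%S)"
    for cmd in \
        "docker stop postgres" \
        "docker rm postgres" \
        "docker rmi postgres:9.6" \
        "mv $data_dir $backup"
    do
        if [ "${RUN:-0}" = "1" ]; then
            sh -c "$cmd" || return 1   # stop at the first failing step
        else
            echo "$cmd"                # dry run: show what would happen
        fi
    done
}
```

Calling `reset_pg` with no arguments only prints the four commands. After running them for real, the installer has to be re-run so that it recreates the container and an empty database, which means existing data and settings are not carried over.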

Stephen Dorman

Nov 21, 2017, 1:15:38 PM
to AWX Project
Sorry, my last message probably wasn't very clear.

I started with a clean CentOS 7 install and deployed AWX after issue 438 was closed. Previously, this issue would surface within a week: the UI would show "A server error has occurred" and, upon investigation, the postgres container would be stuck restarting. This most recent install lasted over 2 weeks, but the same issue is back. Clearing the pg container, data, etc. obviously wipes all data and settings, which is quite obnoxious.
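
Until the root cause is pinned down, a periodic logical dump taken while the container is still healthy would at least make a rebuild less painful. The following is a minimal sketch, not something proposed in this thread. It assumes the container is named `postgres`, that the database superuser role is `awx` (an assumption about the installer defaults, so check your inventory), and the function name `pg_backup` is hypothetical.

```shell
#!/bin/sh
# Dump all databases from the running container to a file on the host.
# Assumptions: container "postgres", superuser role "awx" (override via $2).
# DOCKER can be overridden, e.g. to point at a wrapper or another client.
pg_backup() {
    out="$1"
    user="${2:-awx}"
    tmp="$out.partial"
    # Write to a temporary file first so a failed dump never replaces
    # the previous good backup.
    if ${DOCKER:-docker} exec postgres pg_dumpall -U "$user" > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For example, `pg_backup /var/backups/awx-$(date +%F).sql` from a daily cron job, writing somewhere outside /tmp. The dump only works while the container is running, so it does not help recover an instance that is already stuck restarting.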

Anyone else having this issue still?

Thanks,
Stephen

Matthew Jones

Nov 21, 2017, 1:45:08 PM
to Stephen Dorman, AWX Project
You'll continue to have the problem if you are using your existing pg container and data. I left the outline for a migration path in the PR linked for that issue closure.
