Hello,
This is a somewhat odd situation, but perhaps someone can provide some insight.
First, our broad setup:
- We have our website on an external server host.
- We are on Reason 4.6.
- The SQL databases are on this external server.
- We have our LDAP server on campus.
- We have our authoritative DNS servers on campus.
- The external server resolves DNS through Google and our ISP rather than on its own.
The issue arises when our campus network is inaccessible, such as when it is brought down for maintenance. Given the above setup, we expect users to be able to access the website regardless of the status of the campus network, and that appears to be the case. However, when they try to load a page, they often receive the "Oops" page, which indicates a server error.
As far as I can tell, in this situation Reason loses the ability to reliably access the MySQL server because of "user already has more than max_user_connections" errors, and it then fails to obtain the data it needs to render a page. One thought was that the server does something odd with localhost when connecting to the SQL databases, such as looping out to external DNS rather than connecting directly back to itself. This seems somewhat unlikely, given that users are still able to reach the server through DNS.
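One quick way to test the DNS theory is to ask the external server's own resolver what "localhost" turns into. The Python sketch below is not part of Reason, and the port number is only used to satisfy the lookup call. If every returned address is a loopback address, the name is being answered locally (on most Unix systems from /etc/hosts, depending on resolver configuration) rather than by Google or the ISP. Note that, per the MySQL documentation quoted further down, a MySQL client given "localhost" may not perform a name lookup at all.

```python
import ipaddress
import socket
from typing import List


def resolve_addresses(host: str, port: int = 3306) -> List[str]:
    """Distinct IP addresses the system resolver returns for host.

    getaddrinfo uses the machine's normal resolver configuration, which on
    most Unix systems checks /etc/hosts before asking any DNS server.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def all_loopback(addresses: List[str]) -> bool:
    """True if every address is a loopback address (127.0.0.0/8 or ::1)."""
    return all(ipaddress.ip_address(a).is_loopback for a in addresses)


if __name__ == "__main__":
    try:
        addrs = resolve_addresses("localhost")
        print(addrs, "all loopback:", all_loopback(addrs))
    except socket.gaierror as exc:
        print("could not resolve localhost:", exc)
```

If this prints only loopback addresses while the campus network is down, the "looping out to external DNS" explanation can most likely be ruled out.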
Also occurring at the same time are bind attempts to the LDAP server on campus, which may be happening on every page load. We expect these to fail given the circumstances, but our server admin suggested that the failed LDAP binds might be using up the connections needed to reach the database. The error appears to trigger within the ldap_bind function itself, so I'm not sure whether the surrounding code in ds_ldap (4.6) handles the error or leaves the connection open. This seems the most likely cause at the moment, especially since the LDAP errors were the first to appear in the logs, before any SQL errors.
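To make the admin's theory concrete, here is a toy model in Python. It is not Reason's or ds_ldap's code, and every class and function name in it is hypothetical. A request opens a database connection, then an LDAP-bind-like step fails. If the error path returns without closing the connection, each failed page load strands one connection until a cap modelled on max_user_connections starts rejecting every new connection, including requests that would otherwise succeed. The version that closes in `finally` releases the connection on every path.

```python
# Toy model: all names here are hypothetical, not Reason or ds_ldap code.


class TooManyConnections(Exception):
    """Stands in for MySQL error 1203 (max_user_connections exceeded)."""


class ToyServer:
    """Toy per-user connection cap; a model, not how MySQL is implemented."""

    def __init__(self, max_user_connections: int) -> None:
        self.max_user_connections = max_user_connections
        self.active = 0  # connections currently held open by this user

    def connect(self) -> "ToyConnection":
        if self.active >= self.max_user_connections:
            raise TooManyConnections("Error #1203")
        self.active += 1
        return ToyConnection(self)


class ToyConnection:
    def __init__(self, server: ToyServer) -> None:
        self.server = server
        self.is_open = True

    def close(self) -> None:
        # Idempotent: closing twice releases the slot only once.
        if self.is_open:
            self.is_open = False
            self.server.active -= 1


def ldap_step(campus_down: bool) -> None:
    """Hypothetical stand-in for the LDAP bind done during a page load."""
    if campus_down:
        raise RuntimeError("Can't contact LDAP server")


def request_leaky(server: ToyServer, campus_down: bool) -> str:
    try:
        conn = server.connect()
    except TooManyConnections:
        return "oops"  # the page load fails before doing anything else
    try:
        ldap_step(campus_down)
    except RuntimeError:
        return "error"  # error is handled, but conn is never closed
    conn.close()
    return "ok"


def request_safe(server: ToyServer, campus_down: bool) -> str:
    try:
        conn = server.connect()
    except TooManyConnections:
        return "oops"
    try:
        ldap_step(campus_down)
        return "ok"
    except RuntimeError:
        return "error"
    finally:
        conn.close()  # runs on every path, so the slot is always released
```

With a cap of 3, three failing calls to `request_leaky` leave all three slots occupied, and the next healthy request gets "oops"; `request_safe` survives any number of failures. Whether ds_ldap or the database layer actually has a leaky path like this is what would need checking in the 4.6 source. One caveat: PHP normally closes non-persistent MySQL connections when a script finishes, so a leak of this kind would only accumulate across page loads if connections are persistent or requests stay alive for a long time (for example, while waiting on an LDAP timeout).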
Thoughts? Insights? Suggestions? Below is a copy of an email I sent to our server admin which has more specifics on the error and some of my initial interpretations.
Thanks,
Nick
------------------------------
There are two issues evident in the logs: LDAP and MySQL.
LDAP
This one is perhaps expected and unresolvable, given that the needed resource is on campus:
"ldap_bind(): Unable to bind to server: Can't contact LDAP server"
MySQL
The real issue to look into. First, there is mention of having too many connections:
"WARNING","Wed, 22 Nov 2017 21:54:59 -0600","Unable to connect to database, sleeping and trying again (Reconnect attempt #5; Error #1203:User user already has more than \'max_user_connections\' active connections)",44,"folders/reason_package/carl_util/db/connectDB.php","url","54.162.166.214","CCBot/2.0 (http://commoncrawl.org/faq/)",1024,""
"FATAL","Wed, 22 Nov 2017 21:57:09 -0600","Unable to connect to database using connection "name" (Error #1203:User name already has more than 'max_user_connections' active connections) (called in db_query at …/carl_util/db/db_query.php:42)",630,"folders/reason_package/carl_util/error_handler/error_handler.php","url","40.77.167.85","Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",256,""
Likewise, there also appears to be a half-success failure, where a query is attempted but the server has "gone away":
"FATAL","Wed, 22 Nov 2017 23:19:44 -0600",": run_one error<br />Query: \"query"<br />Error: \"MySQL server has gone away\" (errno: \"2006\")",110,"folders/reason_package/carl_util/db/db_query.php","url","157.55.39.124","Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",256,""
The Reason files mentioned in these errors refer to a settings file in which the database connections are defined. In those entries, both the "reason" and "thor" databases used by Reason are set to connect via the hostname "localhost". Both fatal errors above trigger their error messages and should redirect to the "Oops" page as a 50x internal server error.
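To see how these errors line up over time, including whether the LDAP failures really do come first, a small summarizer can group log lines by level and error type and record how often each occurs and when it was first seen. This is a Python sketch under stated assumptions: it assumes every line follows the layout of the excerpts above ("LEVEL","date","message",...), that the LDAP failures are logged in the same format, and that matching on the substrings shown is good enough to categorize a message.

```python
import re
import sys
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

# Assumption: every line follows the layout of the excerpts in this post:
#   "LEVEL","Wed, 22 Nov 2017 21:54:59 -0600","message",line,"file",...
LINE_PATTERN = re.compile(r'^"([A-Z]+)","([^"]+)",(.*)$')
DATE_FORMAT = "%a, %d %b %Y %H:%M:%S %z"

# MySQL error numbers appear in two shapes in the excerpts:
#   Error #1203:User ...          and          (errno: \"2006\")
MYSQL_PATTERNS = [
    re.compile(r"Error #(\d+):"),
    re.compile(r'errno: \\?"(\d+)\\?"'),
]


def categorize(rest: str) -> Optional[str]:
    """Rough bucket for a message, e.g. 'LDAP' or 'MySQL #1203'."""
    if "ldap_bind" in rest:
        return "LDAP"
    for pattern in MYSQL_PATTERNS:
        match = pattern.search(rest)
        if match:
            return f"MySQL #{match.group(1)}"
    return None


def summarize(lines: Iterable[str]) -> Dict[Tuple[str, str], Tuple[int, datetime]]:
    """Map (level, category) -> (count, earliest timestamp)."""
    summary: Dict[Tuple[str, str], Tuple[int, datetime]] = {}
    for line in lines:
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # not in the expected layout; skip it
        level, stamp, rest = match.groups()
        category = categorize(rest)
        if category is None:
            continue
        when = datetime.strptime(stamp, DATE_FORMAT)
        key = (level, category)
        count, first = summary.get(key, (0, when))
        summary[key] = (count + 1, min(first, when))
    return summary


if __name__ == "__main__":
    # Usage: pass one or more log file paths on the command line.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            rows = sorted(summarize(log).items(), key=lambda kv: kv[1][1])
        for (level, category), (count, first) in rows:
            print(f"{first.isoformat()}  {level:<8} {category:<12} x{count}")
```

Sorting by first-seen time puts the earliest category at the top, which makes it easy to check whether the LDAP failures precede the #1203 and #2006 errors on each outage.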
I'm going to guess that the "too many connections" errors are likely the result of connections being established, failing, and not being properly closed because of the error. This may be a flaw in how Reason handles the error, or in some other process I'm not aware of. Either way, it's likely a symptom of the real issue.
Thus, my best guess is that "localhost" may somehow not behave as desired.
After a little digging, it appears that MySQL treats "localhost" differently:
On Unix, MySQL programs treat the host name localhost specially, in a way that is likely different from what you expect compared to other network-based programs. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file. This occurs even if a --port or -P option is given to specify a port number. To ensure that the client makes a TCP/IP connection to the local server, use --host or -h to specify a host name value of 127.0.0.1, or the IP address or name of the local server.
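The rule quoted above can be written down as a small Python model, together with a quick reachability probe for both transports. This is a sketch under assumptions: the socket path is a guess (a common Debian/Ubuntu default; the server's own "socket" setting has the real value), the probe only checks that something is listening rather than logging in to MySQL, and it assumes the PHP MySQL driver Reason uses follows the same localhost convention as the command-line clients the documentation describes.

```python
import socket
from dataclasses import dataclass

# Assumption: a common Debian/Ubuntu socket path; other systems often use
# /tmp/mysql.sock. Check the server's "socket" setting for the real value.
DEFAULT_SOCKET = "/var/run/mysqld/mysqld.sock"


@dataclass(frozen=True)
class Transport:
    kind: str    # "unix_socket" or "tcp"
    target: str  # socket file path, or "host:port"


def choose_transport(host: str, port: int = 3306,
                     socket_path: str = DEFAULT_SOCKET) -> Transport:
    """Model of the documented rule for MySQL client programs on Unix:
    the literal host name "localhost" selects the socket file and the port
    is ignored; anything else (including 127.0.0.1) is a TCP/IP connection."""
    if host == "localhost":
        return Transport("unix_socket", socket_path)
    return Transport("tcp", f"{host}:{port}")


def probe(t: Transport, timeout: float = 2.0) -> bool:
    """True if something accepts a connection on this transport.
    This only shows that a listener exists; it does not log in to MySQL."""
    try:
        if t.kind == "unix_socket":
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
                s.settimeout(timeout)
                s.connect(t.target)
        else:
            host, port = t.target.rsplit(":", 1)
            with socket.create_connection((host, int(port)), timeout=timeout):
                pass
        return True
    except OSError:  # missing socket file, refused connection, timeout, ...
        return False


if __name__ == "__main__":
    for host in ("localhost", "127.0.0.1"):
        t = choose_transport(host)
        print(f"{host!r} -> {t.kind} {t.target}: reachable={probe(t)}")
```

The point of the model is that switching to 127.0.0.1 changes the transport (socket file versus TCP on port 3306), not how the name is resolved. If both probes succeed, the server is listening on both paths. MySQL accounts are matched by client host as well as user name, so whether the same user is allowed in over TCP from 127.0.0.1 is a separate thing worth verifying before switching.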
Given the above, does it seem like a good idea to switch either of these connections from localhost to 127.0.0.1? I'm not sure what the practical difference is between the two (Unix socket vs. TCP), but that's the best lead I have.