Apache httpd config for php-fpm with Atom 2.5+


Tim Mooney

Mar 3, 2021, 6:29:36 PM
to AtoM Users
Hi!

TL;DR:

I'm hoping an existing AtoM 2.5.x or AtoM 2.6.x site that's using Apache httpd (rather than nginx) can share what their Apache config looks like, specifically for proxying to the PHP-FPM pool for AtoM.

More info:

I'm an experienced Linux/UNIX sysadmin in my university's central IT.  I'm working with IT staff in our library and archives to pilot an AtoM install.  Central IT here uses RHEL whenever possible (we have lots of it), and to this point we use Apache httpd exclusively (we have lots of experience there too).  I understand based upon reading lots of AtoM documentation and posts here that the AtoM developers only develop or test for nginx on Ubuntu, so I'm hoping for some help from other sites that have used Apache httpd instead of nginx.

I initially tried AtoM 2.5.x on RHEL 7.x with Apache httpd 2.4.9 + SCL php 7.2.

I've started over with AtoM 2.6.2 on RHEL 8.3 with Apache httpd 2.4.37 + PHP 7.2.24.

In both cases, all of the common recipes for proxying from httpd to php-fpm have not worked with AtoM.  The problem seems to be the way AtoM appends /sfInstallPlugin or other arguments to the .php part of the URL.  I'm not a PHP programmer, but I believe this is called PATH_INFO.

There are multiple common ways to proxy requests from httpd to a PHP-FPM worker pool.  We've tried the FilesMatch + Proxy method:

  # Enable keepalives for the PHP-FPM proxy and make certain that the
  # timeout to the PHP-FPM pool isn't shorter than the httpd timeout.
  <Proxy fcgi://127.0.0.1:9002>
    ProxySet keepalive=On connectiontimeout=5 timeout=300
  </Proxy>

  # rather than using the older method of rewriting php, the recommended
  # method is to use FilesMatch and SetHandler:
  <FilesMatch \.php$>
    SetHandler "proxy:fcgi://127.0.0.1:9002"
  </FilesMatch>

We've tried the ProxyPassMatch method, both without the (/.*)? part of the regex:

    ProxyPassMatch ^/(.*.php)$ fcgi://127.0.0.1:9002/var/www/sites/atom keepalive=On connectiontimeout=5 timeout=300

And with the additional part to catch the param:

    ProxyPassMatch "^/(.*\.php(/.*)?)$" fcgi://127.0.0.1:9002/var/www/sites/atom keepalive=On connectiontimeout=5 timeout=300

We've even tried several variations of an older URL rewrite method with mod_rewrite:

    RewriteRule ^(.*\.php(/.*)?)$ fcgi://127.0.0.1:9002/var/www/sites/atom/$1 [proxy,last]


Hopefully someone that has this working reliably on RHEL/CentOS/ScientificLinux/OEL/etc. with Apache httpd can share what's worked for them.

Thanks much!

Tim

Dan Gillean

Mar 5, 2021, 9:46:27 AM
to ICA-AtoM Users
Hi Tim, 

In case you haven't tried looking at them for ideas yet, we have a number of community created RHEL/CentOS installation guides linked on the wiki, here: 
You might check out some of these to see how they have configured the webserver. Also, all posts in the last couple of years relating to RHEL or CentOS should be tagged as such - you can browse them in the forum here: 
Hopefully someone might have suggestions for you! 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



Jim Adamson

Apr 2, 2021, 4:28:03 AM
to ica-ato...@googlegroups.com
Hello Tim,

I was interested to read your post, as using Apache is something I've been wondering about for a while now, but never got round to testing. I work in a small team with limited support capacity. All of our other web apps are fronted by Apache, so my thinking was standardizing on Apache would be a good thing for a team of our size.

Anyway, to the problem. You didn't describe beyond "it's not working" how the problem manifests itself, but I'm going to assume you faced the same problem that I faced in my test Vagrant Ubuntu 18.04, Apache 2.4.29 box:

When visiting an AtoM URL that has PATH_INFO data, e.g. the location /index.php/informationobject/browse you get:
"403, Access denied" corresponding to /var/log/apache2/error.log: "AH01071: Got error 'Access to the script '/var/www/html/atom/index.php/informationobject/browse' has been denied (see security.limit_extensions)"

or, if 'security.limit_extensions = ' is added to /etc/php/7.2/fpm/pool.d/atom.conf (not that it's advisable), you get:
"404, No input file specified" corresponding to /var/log/apache2/error.log: "AH01071: Got error 'Unable to open primary script: /var/www/html/atom/index.php/informationobject/browse (No such file or directory)"

The home page, /index.php is returned normally.

I'm guessing you're at the stage of getting the web installer to run. My Vagrant box is pre-configured, so I'm not confronted by the web installer, and haven't unconfigured AtoM in order to do so. However, hopefully this won't matter.

I think you were on the right lines with it being a PATH_INFO related problem. If you look at the vanilla AtoM-Nginx config you will see a block like this:
  location ~ ^/(index|qubit_dev)\.php(/|$) {
    include /etc/nginx/fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_split_path_info ^(.+\.php)(/.*)$;
    fastcgi_pass atom;
  }
A key line here is fastcgi_split_path_info which populates SCRIPT_NAME and PATH_INFO with the respective values from the regex capture groups. The way I approached this problem was to try to mimic what the Nginx config block is doing, in Apache, to set all the CGI variables appropriately.
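
To make the split concrete, here is a minimal Python sketch that applies the same regex as the fastcgi_split_path_info line to two request URIs. The function name and the default document root are assumptions for illustration only (the root is the one from my Vagrant box); it only mimics the resulting variables and is not how nginx implements the directive.

```python
import re

# Same pattern as the fastcgi_split_path_info line in the nginx block.
SPLIT_PATH_INFO = re.compile(r"^(.+\.php)(/.*)$")


def split_path_info(uri, document_root="/var/www/html/atom"):
    """Return the CGI variables a FastCGI backend would receive for `uri`.

    `split_path_info` and `document_root` are hypothetical names used
    only for this illustration.
    """
    match = SPLIT_PATH_INFO.match(uri)
    if match:
        # Capture 1 is the script, capture 2 is the trailing path.
        script_name, path_info = match.group(1), match.group(2)
    else:
        # Nothing follows ".php", so there is nothing to split off.
        script_name, path_info = uri, ""
    return {
        "SCRIPT_NAME": script_name,
        "PATH_INFO": path_info,
        # Mirrors: SCRIPT_FILENAME $document_root$fastcgi_script_name
        "SCRIPT_FILENAME": document_root + script_name,
    }


if __name__ == "__main__":
    for uri in ("/index.php", "/index.php/informationobject/browse"):
        print(uri, split_path_info(uri))
```

The point is that PHP-FPM is always handed a SCRIPT_FILENAME that ends in .php, with the rest of the URL carried separately in PATH_INFO.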

A couple of things I did to set myself up for testing this are as follows:
  • Type a2enmod proxy_fcgi (enable the proxy_fcgi module in Apache)
  • increase logging for this module:
    • Add LogLevel proxy_fcgi:trace6 to the bottom of /etc/apache2/apache2.conf
    • Traces then logged to /var/log/apache2/error.log  
After much experimentation and searching, I came across the ProxyFCGISetEnvIf directive. This allowed me to apply the CGI variable manipulations required to at least allow me to return the page /index.php/informationobject/browse successfully. Here is a barebones VirtualHost I came up with:

<VirtualHost *:80>
    DocumentRoot /var/www/html/atom
    KeepAliveTimeout 300
    Timeout 300
    ProxyFCGISetEnvIf "true" SCRIPT_NAME "/index.php"
    ProxyFCGISetEnvIf "%{REQUEST_URI} =~ m|^/(index)\.php(/.*)$|" PATH_INFO "$2"
    ProxyFCGISetEnvIf "true" SCRIPT_FILENAME "%{DOCUMENT_ROOT}/index.php"
    ProxyPassMatch ^/(index|qubit_dev)\.php(/|$) "unix:/run/php7.2-fpm.atom.sock|fcgi://localhost:9002/var/www/html/atom"
    <Directory "/var/www/html/atom">
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>


To see the manipulations' Before & After values, which is helpful to understand what Apache is or isn't sending to PHP, the following can be done:

Visit /index.php to generate the following log lines:
#tail -f /var/log/apache2/error.log | stdbuf -o0 grep 'fix_cgivars: override'
[proxy_fcgi:trace4] fix_cgivars: override SCRIPT_NAME from '/index.php' to '/index.php'
[proxy_fcgi:trace4] fix_cgivars: override SCRIPT_FILENAME from 'proxy:fcgi://localhost:9002/var/www/html/atom/index.php' to '/var/www/html/atom/index.php'

Visit /index.php/informationobject/browse to generate the following log lines:
#tail -f /var/log/apache2/error.log | stdbuf -o0 grep 'fix_cgivars: override'
[proxy_fcgi:trace4] fix_cgivars: override SCRIPT_NAME from '/index.php/informationobject/browse' to '/index.php'
[proxy_fcgi:trace4] fix_cgivars: override PATH_INFO from '(null)' to '/informationobject/browse'
[proxy_fcgi:trace4] fix_cgivars: override SCRIPT_FILENAME from 'proxy:fcgi://localhost:9002/var/www/html/atom/index.php/informationobject/browse' to '/var/www/html/atom/index.php'

I think from the above it's self-evident what's going on. The PATH_INFO characters were being left on the end of SCRIPT_NAME, and PHP rightly couldn't locate a script file at that location. PATH_INFO itself wasn't set when it should have been. And SCRIPT_FILENAME suffered the same fate as SCRIPT_NAME, while also oddly being prepended with proxy:fcgi://localhost:9002.
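
For anyone who wants to replay this without an Apache box, the following Python sketch applies the same three overrides to the "before" values from the trace lines. It is only a model of what the three ProxyFCGISetEnvIf lines in the VirtualHost do; the function name and the dictionary layout are assumptions for illustration, not anything mod_proxy_fcgi exposes.

```python
import re

# Same expression as the PATH_INFO ProxyFCGISetEnvIf line in the VirtualHost.
PATH_INFO_RE = re.compile(r"^/(index)\.php(/.*)$")


def apply_overrides(request_uri, env, document_root="/var/www/html/atom"):
    """Return a copy of `env` with the three overrides applied.

    `apply_overrides` is a hypothetical helper; `env` holds the values
    Apache would have sent to PHP-FPM without the overrides.
    """
    fixed = dict(env)
    # ProxyFCGISetEnvIf "true" SCRIPT_NAME "/index.php"
    fixed["SCRIPT_NAME"] = "/index.php"
    # ProxyFCGISetEnvIf "%{REQUEST_URI} =~ m|...|" PATH_INFO "$2"
    match = PATH_INFO_RE.match(request_uri)
    if match:
        fixed["PATH_INFO"] = match.group(2)
    # ProxyFCGISetEnvIf "true" SCRIPT_FILENAME "%{DOCUMENT_ROOT}/index.php"
    fixed["SCRIPT_FILENAME"] = document_root + "/index.php"
    return fixed


if __name__ == "__main__":
    uri = "/index.php/informationobject/browse"
    before = {
        "SCRIPT_NAME": uri,
        "PATH_INFO": None,
        "SCRIPT_FILENAME": "proxy:fcgi://localhost:9002/var/www/html/atom" + uri,
    }
    print(apply_overrides(uri, before))
```

Note that SCRIPT_NAME and SCRIPT_FILENAME are hard-coded to index.php here, exactly as in the VirtualHost, so this would need extending before qubit_dev.php could be served the same way.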

Clearly the above isn't a complete solution, but hopefully it's helpful in forging support for Apache.

I didn't note any of these kinds of CGI variable manipulations in the J Grant Forrest guide, so I'm unclear on how he got it working. However, his guide links through to an Apache configuration where the more modern Proxy by Handler method is used; I'm inferring from what I've read about this method that the above manipulations may not be necessary:

You can also force a request to be handled as a reverse-proxy request, by creating a suitable Handler pass-through. The example configuration below will pass all requests for PHP scripts to the specified FastCGI server using reverse proxy. This feature is available in Apache HTTP Server 2.4.10 and later. For performance reasons, you will want to define a worker representing the same fcgi:// backend. The benefit of this form is that it allows the normal mapping of URI to filename to occur in the server, and the local filesystem result is passed to the backend. When FastCGI is configured this way, the server can calculate the most accurate PATH_INFO.

However, my testing with the Proxy by Handler method wasn't successful.

I hope you find my reply useful, and do keep us updated if you get to the point of having a production-ready configuration available; I'm sure the AtoM community will appreciate it.

Thanks, Jim

On Tue, 9 Mar 2021 at 22:20, 'Tim Mooney' via AtoM Users <ica-ato...@googlegroups.com> wrote:

> Thanks Dan!
>
> I had previously found the blog post from J Grant Forrest about Atom 2.5 on CentOS 8 and carefully reviewed that, though my searches hadn't previously discovered your wiki.  Thanks for that link.  I believe I have the same Apache httpd config as described in J Grant's excellent post, but I am not getting the same results.
>
> I did go through the last couple years worth of posts in this group tagged CentOS before posting.  Many of them are CentOS but using nginx, rather than Apache httpd, and the ones that were httpd were generally about an unrelated issue and didn't provide the web server config.
>
> I also looked at some of the earlier guides (Like the one Stefano did for RHEL 7.1), but that too does not cover the Apache httpd config.
>
> It seems like at least a few of your clients have been able to get older versions of AtoM working with Apache httpd without much issue, which makes it all the more puzzling why we're seeing an issue.  We have lots of httpd proxying to PHP-FPM for other PHP applications we run (both in-house and opensource), so it's not like this is completely new to my site.  I can't remember the last time I've been this stymied by a web application.
>
> Thanks,
>
> Tim


--
Jim Adamson
Systems Administrator/Developer
Facilities Management Systems
IT Services
LFA/237 | Harry Fairhurst building | University of York | Heslington | York | YO10 5DD

Tim Mooney

Apr 14, 2021, 5:49:54 PM
to AtoM Users
On Friday, April 2, 2021 at 3:28:03 AM UTC-5 Jim Adamson wrote:
> Hello Tim,

Hey Jim!  I greatly appreciate your response and the detective work you've clearly put into this.

> I was interested to read your post, as using Apache is something I've been wondering about for a while now, but never got round to testing. I work in a small team with limited support capacity. All of our other web apps are fronted by Apache, so my thinking was standardizing on Apache would be a good thing for a team of our size.

That's exactly the same for my site.  We have the technical capability to install and configure Ubuntu and nginx, but we don't have the support capacity.  We've standardized on RHEL, with Apache httpd for web content, and we have built out configuration management that makes it pretty easy for us to roll out a PHP application fronted by Apache httpd.  We're just not in a position from a person-hours standpoint to be able to support other flavors.  Being a university environment, we already have far too many "snowflake" systems, so adding something so different from our standard install probably isn't something we can take on.

> Anyway, to the problem. You didn't describe beyond "it's not working" how the problem manifests itself, but I'm going to assume you faced the same problem that I faced in my test Vagrant Ubuntu 18.04, Apache 2.4.29 box:
>
> When visiting an AtoM URL that has PATH_INFO data, e.g. the location /index.php/informationobject/browse you get:
> "403, Access denied" corresponding to /var/log/apache2/error.log: "AH01071: Got error 'Access to the script '/var/www/html/atom/index.php/informationobject/browse' has been denied (see security.limit_extensions)"
>
> or, if 'security.limit_extensions = ' is added to /etc/php/7.2/fpm/pool.d/atom.conf (not that it's advisable), you get:
> "404, No input file specified" corresponding to /var/log/apache2/error.log: "AH01071: Got error 'Unable to open primary script: /var/www/html/atom/index.php/informationobject/browse (No such file or directory)"
>
> The home page, /index.php is returned normally.

Sorry I didn't provide better info about the actual error messages, but you've mostly intuited them correctly, including the differences between having 'security.limit_extensions' set (which it thankfully is, by default) and trying the dangerous config of disabling it.  Just like you, I tried it with and without having that set, but I obviously don't want to run it without it being set.

However, because we're attempting a brand new install, not even 'index.php' alone works.  Connecting to index.php tries to start '/sfInstallPlugin', which triggers one of the two cases you've identified above.

> I'm guessing you're at the stage of getting the web installer to run. My Vagrant box is pre-configured, so I'm not confronted by the web installer, and haven't unconfigured AtoM in order to do so. However, hopefully this won't matter.

I suspect overall it wouldn't; getting the PATH_INFO issue resolved will likely allow the installer to work too.  It's just that right now we have no URLs we can access that don't result in a PATH_INFO "call".

> I think you were on the right lines with it being a PATH_INFO related problem. If you look at the vanilla AtoM-Nginx config you will see a block like this:
>   location ~ ^/(index|qubit_dev)\.php(/|$) {
>     include /etc/nginx/fastcgi_params;
>     fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
>     fastcgi_split_path_info ^(.+\.php)(/.*)$;
>     fastcgi_pass atom;
>   }
> A key line here is fastcgi_split_path_info which populates SCRIPT_NAME and PATH_INFO with the respective values from the regex capture groups. The way I approached this problem was to try to mimic what the Nginx config block is doing, in Apache, to set all the CGI variables appropriately.

I did a bit of reading about that particular nginx directive, which was indeed what got me focused mainly on this being a PATH_INFO issue.

We have dozens of PHP applications deployed, but none of the current ones use PATH_INFO; instead they pass parameters in other ways.  Having relatively little experience with PATH_INFO, I definitely wasn't certain if I was even using the correct terminology.  I appreciate the confirmation that you believe I was probably on the right track.

> A couple of things I did to set myself up for testing this are as follows:
>   • Type a2enmod proxy_fcgi (enable the proxy_fcgi module in Apache)
>   • increase logging for this module:
>     • Add LogLevel proxy_fcgi:trace6 to the bottom of /etc/apache2/apache2.conf
>     • Traces then logged to /var/log/apache2/error.log

RHEL doesn't use a2enmod, but we have configuration management rules to enable the necessary modules in our generated httpd config.  I also did increase logging for several of the modules:

      LogLevel info proxy_module:debug proxy_fcgi_module:debug rewrite_module:debug

I probably should have used one of the trace levels, as you did, rather than just debug.

I found that exact section of the mod_proxy_fcgi documentation too, as well as the section right after it about "Environment Variables" and the "proxy-fcgi-pathinfo" variable and the possible settings for it.  I tried various settings ("first-dot", "last-dot", and "full") but none of them seemed to have any effect on the problem.  Your idea to use trace in the logging may have helped me there too.
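
For reference, here is a rough Python sketch of the "first-dot" and "last-dot" rules as the mod_proxy_fcgi documentation describes them: PATH_INFO is split from the slash following the first (or last) "." in the URL. The function name is made up for illustration, "full" is left out because it depends on mapping the URL to the filesystem, and this is a reading of the docs rather than a reimplementation of the module.

```python
def guess_path_info(url_path, mode):
    """Split `url_path` into (script part, PATH_INFO) per the documented rule.

    `guess_path_info` is a hypothetical helper. It returns PATH_INFO as
    None when the rule finds no place to split.
    """
    if mode == "first-dot":
        dot = url_path.find(".")
    elif mode == "last-dot":
        dot = url_path.rfind(".")
    else:
        raise ValueError("only first-dot and last-dot are sketched here")
    if dot == -1:
        return url_path, None
    # PATH_INFO starts at the first slash after the chosen dot.
    slash = url_path.find("/", dot)
    if slash == -1:
        return url_path, None
    return url_path[:slash], url_path[slash:]


if __name__ == "__main__":
    for mode in ("first-dot", "last-dot"):
        print(mode, guess_path_info("/index.php/informationobject/browse", mode))
```

For a typical AtoM URL both rules give the same split, since the only dot in the path is the one in index.php; they only differ when an earlier path segment also contains a dot.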

I'm left with the vague feeling that the (Apache httpd) docs are suggesting PATH_INFO handling should "just work" in the SetHandler/FilesMatch case, but it doesn't.  I'm not sure if that's a problem with the docs not being clear, the docs just being wrong, or the actual behavior being broken.

> However, my testing with the Proxy by Handler method wasn't successful.

Same here, and the SetHandler method is the one we ideally would have used, if we could get it working reliably.  My interpretation of posts I've seen from various PHP-related projects is that the Apache community has settled on the SetHandler/FilesMatch method as the most elegant way to handle PHP with PHP-FPM, and the other methods are not as popular (but still useful, in some situations).  I know that the older mod_rewrite method that we used with some PHP apps when we were first starting with PHP-FPM caused bogus log entries when scripts or bots would try invalid URLs ending in .php, as we weren't verifying the URL was valid before rewriting it to pass it to the PHP-FPM worker.

> I hope you find my reply useful, and do keep us updated if you get to the point of having a production-ready configuration available; I'm sure the AtoM community will appreciate it.

Unfortunately, my organization has decided that we don't have the support cycles to take this on, so I don't think we're going to proceed with AtoM.  Our Library staff may continue to investigate it, but central IT won't be helping with hosting. 

We were very concerned that even if we got things working for the initial install, every upgrade could potentially break our off-the-beaten-path configuration.  If we can avoid it, we don't want to run any software that we're afraid to upgrade.

Thanks again for the time you put into this and for sharing!   Even though we're not going to proceed with AtoM, I've found the information you've shared very useful.

Thanks,

Tim
