OAI server could not be reached (server is behind proxy)


euler

Jun 13, 2019, 4:40:56 AM
to DSpace Technical Support
Dear All,

The repository I'm working on recently switched from HTTP to HTTPS because of a new network security policy: all requests must pass through the proxy server, and connections must use HTTPS. Since the switch, harvesting from this repository has stopped working. Originally, the repository was set up with Tomcat only, and all redirects to HTTPS were handled by the proxy server. After this change, I installed Apache 2.4 as a front end for Tomcat (following this guide: https://wiki.duraspace.org/display/DSPACE/ModJk) to handle the SSL connection. I also changed the protocol of dspace.oai.url and bitstream.baseUrl in oai.cfg from http to https.

My problem is that, despite all these changes, testing the harvest from the command line with dspace harvest -g -a https://repository/oai/request -i all gives me the "OAI server could not be reached" error. Also, when I validate the OAI baseURL at http://re.cs.uct.ac.za/, it reports "Can't connect" (certificate verify failed). I was told that the proxy they're using is HAProxy, so I asked them to let Apache on the repository server handle the SSL connection. I suspect the proxy server is still terminating SSL, because SSL Labs reports certificate chain issues for the repository URL even though I have installed the correct certificates in Apache. Could this be why harvesting fails?
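One way to check the "certificate verify failed" symptom independently of DSpace is to send the OAI-PMH `Identify` request with a client that verifies the full certificate chain and hostname, then run it from both outside and inside the network. The sketch below assumes Python's standard `ssl` and `urllib` modules; `identify_url`, `classify` and `probe` are hypothetical helper names, and `https://repository/oai/request` is the placeholder URL from this message. Note that a Java harvester checks against the JRE's own `cacerts` store rather than the system store, so results can differ.

```python
import ssl
import urllib.error
import urllib.parse
import urllib.request


def identify_url(base_url: str) -> str:
    """Append the OAI-PMH Identify verb to a base URL such as .../oai/request."""
    return base_url + "?" + urllib.parse.urlencode({"verb": "Identify"})


def classify(exc: BaseException) -> str:
    """Map a connection failure to a short cause."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP {exc.code}"
    # urlopen wraps low-level failures in URLError; unwrap to inspect the real cause.
    cause = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(cause, ssl.SSLCertVerificationError):
        return "certificate-verify-failed"  # incomplete chain, untrusted CA, or wrong hostname
    if isinstance(cause, OSError):
        return "unreachable"  # refused, timed out, DNS failure, ...
    return "other"


def probe(base_url: str, timeout: float = 15.0) -> str:
    # create_default_context() verifies the chain and hostname against the
    # system trust store (hypothetical stand-in for what a harvester checks).
    context = ssl.create_default_context()
    try:
        with urllib.request.urlopen(identify_url(base_url), timeout=timeout, context=context) as resp:
            return f"ok (HTTP {resp.status})"
    except OSError as exc:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return classify(exc)
```

Usage: `print(probe("https://repository/oai/request"))`. A result of `certificate-verify-failed` from outside but `ok` from inside would point at the certificate presented by the proxy rather than at DSpace.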

Also, while searching for possible solutions, I came across this post: http://dspace.2283337.n4.nabble.com/OAI-server-could-not-be-reached-in-DSpace-5-2-tp4677057p4677085.html. Since I am using Apache as the front end for Tomcat, am I right to assume that the properties:

-Dhttps.proxySet=true
-Dhttps.proxyHost=proxy.server
-Dhttps.proxyPort=443

in Tomcat, and http.proxy.host = ip_proxy and http.proxy.port = port_proxy in dspace.cfg, are not applicable in this scenario?

I have previously set up repositories that use HTTPS in their OAI baseURL, and harvesting from them works fine, but I have no prior experience setting up a repository behind a proxy server.

I would greatly appreciate any possible solutions, and any pointers to configuration I may have missed. I would also appreciate it if anyone on this list who has experience setting up a repository behind a proxy server, particularly HAProxy, could share their thoughts.

OS: Windows Server 2008 R2
Java: 1.8.0_45
DSpace version: 5.4
Tomcat: 7.0
Apache: Apache/2.4.25 (Win64) mod_jk/1.2.42 OpenSSL/1.0.2k

Thanks in advance!

euler

Jun 28, 2019, 5:54:09 AM
to DSpace Technical Support
Dear All,

It's been a while since I posted this question, but unfortunately I have not received any response. I would greatly appreciate any suggestions, solutions, or comments regarding the problem described in my previous message. I should also add that I have no control over the HAProxy server, so I am waiting for their network admin to act on my request.

Hoping for a positive response and thanks in advance!

Best regards,
euler

Tim Donohue

Jul 1, 2019, 4:50:37 PM
to euler, DSpace Technical Support
Hi euler,

Have you verified that other DSpace webapps work OK behind the proxy and Apache?   Is this problem *only* with OAI-PMH, or are you having a larger issue with getting DSpace running behind a proxy server?   The reason I ask is that it's difficult to tell whether you need help with configuring DSpace to run behind a proxy (in general), or if you have everything else working great, and it's just OAI-PMH that is causing issues.

Tim

--
All messages to this mailing list should adhere to the DuraSpace Code of Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/f39074d3-13f1-45bc-b2fd-4909f83e211b%40googlegroups.com.


--
Tim Donohue
Technical Lead for DSpace & DSpaceDirect
DuraSpace.org | DSpace.org | DSpaceDirect.org

euler

Jul 1, 2019, 10:19:44 PM
to DSpace Technical Support
Hi Tim,

Thanks for your response. The other DSpace webapps are working perfectly. The odd thing about this OAI-PMH server is that it passes some OAI-PMH validators, such as http://validator.oaipmh.com/ and http://oval.base-search.net/, but not https://www.openarchives.org/Register/ValidateSite or http://re.cs.uct.ac.za/ (unfortunately, the last validator is no longer working). Running the command dspace harvest -g -a https://base-url-of-oai-pmh/oai/request -i all on my local instance gave me this response:

Testing basic PMH access:  invalidAddress: OAI server could not be reached.
Testing ORE support:  invalidAddress: OAI server could not be reached.
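The "Testing basic PMH access" check appears to start from an OAI-PMH `Identify` request, so a useful manual test is to fetch `?verb=Identify` yourself and inspect the reply. The sketch below parses an `Identify` response with Python's `xml.etree.ElementTree` and extracts the advertised `baseURL`, which is worth comparing with the URL the harvester was given (for example, an `http://` baseURL advertised behind an `https://` proxy). `parse_identify` is a hypothetical helper, and `SAMPLE` is a trimmed illustration, not this repository's actual response.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_identify(xml_text: str) -> dict:
    """Extract a few Identify fields; raise ValueError on an OAI-PMH error reply."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise ValueError(f"OAI-PMH error {error.get('code')}: {error.text}")
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("response has no Identify element")
    return {
        field: identify.findtext(f"oai:{field}", default="", namespaces=OAI_NS)
        for field in ("repositoryName", "baseURL", "protocolVersion")
    }


# Hypothetical, trimmed example of an Identify response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2019-07-01T00:00:00Z</responseDate>
  <request verb="Identify">https://repository/oai/request</request>
  <Identify>
    <repositoryName>Example repository</repositoryName>
    <baseURL>https://repository/oai/request</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""

if __name__ == "__main__":
    print(parse_identify(SAMPLE)["baseURL"])
```

If the request never returns XML at all (for example, the TLS handshake fails), the problem lies below OAI-PMH, which is consistent with the "could not be reached" wording.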

When I run the same command from within their internal network, it works, so I suspect the proxy server intercepting external requests is the culprit. To rule out the certificate chain issues of this repository, I asked the network admin to install the appropriate chain certificates on their proxy server. Unfortunately, they are hesitant to do that because it might affect other subdomains managed by that proxy. I also asked whether they could configure HAProxy with SSL pass-through so that the Apache server on the repository would handle the SSL connection, but they declined for the same reason. So now I am stuck with this harvesting error, since I have no control over the proxy server.

Lastly, is it possible to make harvesting work even if the client doesn't update the OAI-PMH server address it uses? In other words, since all requests are now redirected to HTTPS, could the client keep harvesting successfully without changing the OAI-PMH address of every collection that was set up to harvest from this server? I ask because this repository has at least 80 sets that are being harvested, so updating all of them would be tedious for the harvesting client. Is there a workaround for this?
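Whether existing `http://` harvest sources keep working depends on two things: the proxy must answer plain-HTTP requests with a redirect that keeps the OAI-PMH query string intact, and the harvesting client must follow that redirect. One known pitfall: Java's `HttpURLConnection` does not follow redirects that switch protocol (HTTP to HTTPS), so a Java harvester built on it may stop at the redirect. The sketch below is a hypothetical offline check (`redirect_problems` is not part of DSpace) comparing the original request URL with the `Location` header the proxy returns; obtaining that header, e.g. with `curl -I`, is left out.

```python
from urllib.parse import parse_qs, urlsplit


def redirect_problems(request_url: str, location: str) -> list:
    """List issues with an HTTP->HTTPS redirect; an empty list means it looks fine."""
    src, dst = urlsplit(request_url), urlsplit(location)
    problems = []
    # A relative Location (no scheme) would keep the client on plain HTTP.
    if dst.scheme != "https":
        problems.append(f"redirect target uses {dst.scheme or 'no'} scheme, not https")
    if dst.path != src.path:
        problems.append(f"path changed: {src.path!r} -> {dst.path!r}")
    # OAI-PMH arguments (verb, metadataPrefix, set, resumptionToken) travel in the query string.
    if parse_qs(dst.query) != parse_qs(src.query):
        problems.append("query string (OAI-PMH arguments) not preserved")
    return problems
```

Even when this check passes, a client that refuses cross-protocol redirects would still need its source URLs updated.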

Thanks again in advance!
euler

Tim Donohue

Jul 2, 2019, 4:06:50 PM
to euler, DSpace Technical Support
Hi euler,

Is there a chance this is a simple HTTPS / SSL issue? In the past, I've noticed that some older OAI-PMH clients (and even some validators) fail to support HTTPS; instead, they expect OAI-PMH to be served only over HTTP. You seem to be saying that your OAI-PMH server works fine with some clients/validators but *doesn't* work with others. It might simply be that some clients/validators still expect HTTP. (It could also be that your SSL certificate is "recognized" by some clients while others have trouble validating it... but I'm not sure.)

I have to admit that for our hosted service at DuraSpace (DSpaceDirect.org), we made a decision early on to require HTTPS for all traffic *EXCEPT* OAI-PMH (in other words, we allow for OAI-PMH requests via plain HTTP).  We had customers run into similar issues with client harvesters that expected HTTP only (I don't recall which ones, to be honest).

So, you might want to take a step back and try out some simple HTTP vs HTTPS tests.  It might not be a proxy issue, but it might be an issue with what some harvesting clients are expecting from the OAI-PMH server.
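A simple HTTP vs HTTPS test could look like the sketch below: it sends the same `Identify` request over both schemes and reports what each returns. The fetch function is injectable so the comparison can be tried without a network; `compare_schemes` and `default_fetch` are hypothetical names, and `repository/oai/request` is a placeholder host and path.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

Fetch = Callable[[str], str]


def default_fetch(url: str) -> str:
    """Return a short status string for one GET request."""
    try:
        # urlopen follows redirects, so the http result may reflect a redirected https response.
        with urllib.request.urlopen(url, timeout=15) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        return f"HTTP {exc.code}"
    except OSError as exc:  # URLError (incl. TLS failures) and timeouts
        return f"error: {getattr(exc, 'reason', exc)}"


def compare_schemes(host_and_path: str, fetch: Fetch = default_fetch) -> Dict[str, str]:
    """Run the same Identify request over plain HTTP and over HTTPS."""
    return {
        scheme: fetch(f"{scheme}://{host_and_path}?verb=Identify")
        for scheme in ("http", "https")
    }
```

Usage: `print(compare_schemes("repository/oai/request"))`, run once from outside and once from inside the network.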

Tim
