How to run Dataverse (Glassfish) behind a Squid proxy?

Bruno Lavoie

May 16, 2018, 2:27:35 PM
to Dataverse Users Community
Hello, 

For security reasons, our servers have no direct internet connectivity and must use a Squid proxy to access internet resources.

The problem is that when I start Glassfish, it gets stuck on this line until it times out:

[2018-05-16T11:52:51.300-0400] [glassfish 4.1] [INFO] [AS-EJB-00054] [javax.enterprise.ejb.container] [tid: _ThreadID=14 _ThreadName=RunLevelControllerThread-1526485945486] [timeMillis: 1526485971300] [levelValue: 800] [[
  Portable JNDI names for EJB DOIEZIdServiceBean: [java:global/dataverse/DOIEZIdServiceBean!edu.harvard.iq.dataverse.DOIEZIdServiceBean, java:global/dataverse/DOIEZIdServiceBean]]]

This makes Glassfish take so long to start that even the systemd service fails (because of its 90-second timeout).

So I tried configuring the proxy settings with JVM options in the domain.xml file:

        <jvm-options>-Djava.http.proxyHost=proxy.domain.com</jvm-options>
        <jvm-options>-Djava.http.proxyPort=8080</jvm-options>
        <jvm-options>-Djava.https.proxyHost=proxy.domain.com</jvm-options>
        <jvm-options>-Djava.https.proxyPort=8080</jvm-options>
        <jvm-options>-Dhttp.proxyHost=proxy.domain.com</jvm-options>
        <jvm-options>-Dhttp.proxyPort=8080</jvm-options>
        <jvm-options>-Dhttps.proxyHost=proxy.domain.com</jvm-options>
        <jvm-options>-Dhttps.proxyPort=8080</jvm-options> 

I added them both with and without the "java." prefix to be sure, and it does not work at all.
I tried with both the OpenJDK and Oracle JVMs, with the same result.

This causes me a lot of problems, because adding datasets or data files freezes.
Running tcpdump on the server shows that it tries to contact ids-ezid-prd.cdlib.org directly.

So I hope that someone here knows enough about this to help me.

Thanks
Bruno Lavoie
Université Laval

Don Sizemore

May 16, 2018, 2:38:46 PM
to dataverse...@googlegroups.com
Hello,

Have you tried setting http_proxy and https_proxy in /etc/environment (or similar)?

Don

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/7753693d-be12-4b03-83a0-f8f642e17b35%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bruno Lavoie

May 16, 2018, 2:52:14 PM
to Dataverse Users Community
Hello, 

I tried setting them in the systemd service file; it doesn't work...
And in my experience, the JVM doesn't take these environment variables into account...
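The JVM reads proxy settings from system properties such as http.proxyHost and http.proxyPort; it does not look at the http_proxy/https_proxy environment variables. A minimal sketch, assuming one wanted to bridge the two at application startup (the class EnvProxyBridge and the method applyProxyFromEnv are hypothetical, not part of Dataverse or Glassfish). This only helps HTTP clients that actually consult these properties.

```java
import java.net.URI;

final class EnvProxyBridge {

    /**
     * Hypothetical helper: copies a proxy URL taken from an environment variable
     * such as http_proxy (for example "http://proxy.domain.com:8080") into the
     * PREFIX.proxyHost and PREFIX.proxyPort system properties read by the JDK.
     * Returns false, changing nothing, if the value is missing or has no explicit port.
     */
    static boolean applyProxyFromEnv(String envValue, String propertyPrefix) {
        if (envValue == null || envValue.trim().isEmpty()) {
            return false; // variable not set
        }
        URI uri = URI.create(envValue.trim());
        if (uri.getHost() == null || uri.getPort() == -1) {
            return false; // not of the form scheme://host:port
        }
        System.setProperty(propertyPrefix + ".proxyHost", uri.getHost());
        System.setProperty(propertyPrefix + ".proxyPort", Integer.toString(uri.getPort()));
        return true;
    }

    public static void main(String[] args) {
        // Must run before any HTTP client is created, and only affects clients
        // that read these system properties.
        applyProxyFromEnv(System.getenv("http_proxy"), "http");
        applyProxyFromEnv(System.getenv("https_proxy"), "https");
        System.out.println("http.proxyHost=" + System.getProperty("http.proxyHost"));
        System.out.println("https.proxyHost=" + System.getProperty("https.proxyHost"));
    }
}
```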

Bruno

Bruno Lavoie

May 18, 2018, 2:31:04 PM
to Dataverse Users Community
Hello all,

I had some time to investigate this problem further.

While running tcpdump, I quickly saw that our datacenter network rules silently drop outbound SYN packets rather than rejecting them. As a result, each ezlib call waits a long time before timing out, then retries a few times. This is what causes the very long startup times and dataset additions. To work around this, I added an iptables REJECT rule for the destination IP of the EZID endpoint used by ezlib. With this rule in place, the connection attempt fails fast with an explicit connection rejection.
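The difference between a DROP and a REJECT rule shows up in Java as a fast failure versus a full timeout. A minimal sketch using only java.net.Socket (ConnectProbe and probe are hypothetical names; the 10-second timeout and port 443 are assumptions): with REJECT, connect() typically fails at once with ConnectException, while with DROP it blocks until the connect timeout expires.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ConnectProbe {

    /** Possible outcomes of a single TCP connection attempt. */
    enum Outcome { CONNECTED, REFUSED, TIMED_OUT, OTHER_ERROR }

    /**
     * Attempts one TCP connection with an explicit connect timeout.
     * A closed port or a REJECT rule answers immediately, so connect() throws
     * ConnectException; a DROP rule answers nothing, so connect() waits for the timeout.
     */
    static Outcome probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Outcome.CONNECTED;
        } catch (SocketTimeoutException e) {
            return Outcome.TIMED_OUT;
        } catch (ConnectException e) {
            return Outcome.REFUSED;
        } catch (IOException e) {
            return Outcome.OTHER_ERROR; // e.g. unknown host or no route to host
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "ids-ezid-prd.cdlib.org";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 443;
        long start = System.nanoTime();
        Outcome outcome = probe(host, port, 10_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(host + ":" + port + " -> " + outcome + " after " + elapsedMs + " ms");
    }
}
```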

But that doesn't solve the underlying issue: I still need Dataverse to be able to communicate externally through a Squid proxy.

As noted in my previous post, JVM options should normally be enough to make Java use a proxy server; I have done this in the past for other applications.
We also tried adding external Dataverses to harvest from, and the same thing happens: those requests are not proxied either.

So, where is the problem?

I cloned the Dataverse and ezlib Git repositories and analyzed a bit of the code:
At first glance, the problem seems to be how HttpClient works by default (or historically): it does not take system properties into consideration. To make it use the defined system properties, SystemDefaultHttpClient must be used rather than DefaultHttpClient. Switching to SystemDefaultHttpClient should not otherwise change the client's behavior, because it is a subclass of DefaultHttpClient. However, both classes have been deprecated since 4.3, and the recommended replacement is the HttpClientBuilder class, which has an explicit useSystemProperties() method for exactly this need.
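As far as I can tell, useSystemProperties() makes HttpClient 4.3+ plan routes with SystemDefaultRoutePlanner, which asks the JVM-wide ProxySelector.getDefault() for a proxy. A stdlib-only sketch of that lookup (ProxyCheck and proxyFor are hypothetical names) can confirm that the domain.xml options reached the JVM: the default selector reads http.proxyHost/http.proxyPort for http URIs and https.proxyHost/https.proxyPort for https URIs, while a "java."-prefixed variant such as java.http.proxyHost is not a property it reads.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

final class ProxyCheck {

    /**
     * Asks the JVM's default ProxySelector which proxy it would use for a URI.
     * Returns "DIRECT" when no proxy applies, otherwise "host:port".
     */
    static String proxyFor(String uri) {
        List<Proxy> proxies = ProxySelector.getDefault().select(URI.create(uri));
        Proxy first = proxies.get(0);
        if (first.type() == Proxy.Type.DIRECT) {
            return "DIRECT";
        }
        InetSocketAddress address = (InetSocketAddress) first.address();
        return address.getHostString() + ":" + address.getPort();
    }

    public static void main(String[] args) {
        // Run with the same options as domain.xml, e.g.
        //   java -Dhttps.proxyHost=proxy.domain.com -Dhttps.proxyPort=8080 ProxyCheck.java
        System.out.println("http  -> " + proxyFor("http://ids-ezid-prd.cdlib.org/"));
        System.out.println("https -> " + proxyFor("https://ids-ezid-prd.cdlib.org/"));
    }
}
```

With the options from domain.xml, the https line should show proxy.domain.com:8080; DIRECT means the JVM did not pick up a proxy for that scheme.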

The bottom line is that the HttpClient library ignores system properties unless we tell it to use them. I think those system properties should always prevail when they are set. As things stand, we cannot host the service in corporate data centers without direct internet access.

Outside ezlib, a text search of the Dataverse sources shows that HttpClient is used in 5 files:
DataCiteRESTfullClient.java
DatasetPage.java
EditDatafilesPage.java
StorageIO.java
HttpSendReceiveClientStep.java

I think it shouldn't be too hard to make it better...

Am I missing something about that?

Thanks
Bruno Lavoie
Université Laval

Philip Durbin

May 18, 2018, 3:02:29 PM
to dataverse...@googlegroups.com
Hi Bruno,

Thanks for doing all this investigation. I don't think anyone else has tried running Dataverse behind a proxy before. Or maybe they gave up. I appreciate that you've taken such a deep look into the problem!

The one thing that comes to mind is that in the code we also use an HTTP client called Unirest. The import statement is for com.mashape.unirest.http.Unirest. I don't know if it respects proxy-related JVM options or not.

Do you mind opening an issue at https://github.com/IQSS/dataverse/issues about all this? Something about how Dataverse doesn't work behind a proxy?

Thanks!

Phil

Jim Myers

May 19, 2018, 10:11:05 AM
to Dataverse Users Community
Bruno,
There is an update to the ezlib master branch that might be worth looking at. Back in 2015 I submitted patches to use the newer client builder mechanism and to update Apache HttpClient from 4.2.6 to 4.5.1 (the patches were accepted, but no new release was made). Our issue at the time, on the SEAD Datanet project, was connecting to the Purdue EZID server: the 1.0.0 release wasn't handling the proxy setup on their end (I see my note from then on the commit mentions SNI support). I don't know enough about what you're doing to know whether it will help, but it should be easy to try: either build against master, or I think you could just swap the ezid jar for mine (posted at https://sead2.ncsa.illinois.edu/files/5b002ff1e4b09da60d3bf446; I just recompiled it to update to HttpClient 4.5.3).

-- Jim

Bruno Lavoie

May 22, 2018, 2:12:22 PM
to Dataverse Users Community
Hello Philip,

Thanks for the quick response; yes, I'll file a ticket about it.

I wasn't aware of the Unirest library; it looks nice at first glance.

On the project's homepage, I spotted these lines in the feature list:
  • Customizable timeout, concurrency levels and proxy settings
  • Customizable HttpClient and HttpAsyncClient implementation.
Further down the Java page, we can see that it accepts a custom HttpClient:

Unirest.setHttpClient(httpClient);

With this, we could instantiate an HTTP client that properly handles system settings (HttpClientBuilder + useSystemProperties()), then pass it to the Unirest class.

Unirest also supports proxy settings by itself:

Unirest.setProxy(new HttpHost("127.0.0.1", 8000));

I don't know which would be the best approach, but I think using HttpClientBuilder + useSystemProperties() would be the more automatic and portable one. Using the Unirest.setProxy method would need a bit more code to figure out whether a proxy is configured somewhere.
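The "bit more code" needed for Unirest.setProxy could look roughly like this: read the proxy from the same JVM options and hand it to Unirest only if one is set. A minimal sketch (ExplicitProxySettings and fromSystemProperties are hypothetical names; preferring https.* over http.* and requiring an explicit port are assumptions), kept to the standard library so the Unirest call appears only as a comment:

```java
import java.net.InetSocketAddress;
import java.util.Optional;

final class ExplicitProxySettings {

    /**
     * Hypothetical helper: finds a proxy configured through JVM options, checking
     * https.proxyHost/https.proxyPort first and then http.proxyHost/http.proxyPort.
     * In this sketch the port must be set explicitly; otherwise that prefix is skipped.
     */
    static Optional<InetSocketAddress> fromSystemProperties() {
        for (String prefix : new String[] {"https", "http"}) {
            String host = System.getProperty(prefix + ".proxyHost");
            String port = System.getProperty(prefix + ".proxyPort");
            if (host == null || host.trim().isEmpty() || port == null) {
                continue;
            }
            try {
                // createUnresolved skips DNS here and leaves name resolution to the client.
                return Optional.of(InetSocketAddress.createUnresolved(
                        host.trim(), Integer.parseInt(port.trim())));
            } catch (IllegalArgumentException e) {
                // Non-numeric or out-of-range port: ignore this prefix.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<InetSocketAddress> proxy = fromSystemProperties();
        // With Unirest 1.x and Apache HttpCore on the classpath, the result could be passed on as:
        //   proxy.ifPresent(p -> Unirest.setProxy(new HttpHost(p.getHostString(), p.getPort())));
        System.out.println(proxy.map(p -> p.getHostString() + ":" + p.getPort())
                                .orElse("no proxy configured"));
    }
}
```

Compared with useSystemProperties(), this approach ignores http.nonProxyHosts, which is one reason the builder route looks more portable.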

Would this be easily feasible for Dataverse?

For the EZID part, did you see Jim's comment?
I will try swapping the jar files to see what happens...

The EZID library doesn't look very actively maintained; do you know the owners of this project? Maybe we'll need to fork it and patch it properly...

Thanks
Bruno Lavoie


Bruno Lavoie

May 22, 2018, 2:50:18 PM
to Dataverse Users Community
Hello Jim,

I just replaced the EZID jar with yours.

From my tcpdump, EZID requests are using the proxy. :)

You said that you recompiled it against HttpClient 4.5.3, but I did not replace the one in the current Dataverse install (httpclient-4.4.1.jar)...
I'm no Java expert, so could that be a problem?

Thanks

Bruno Lavoie

May 22, 2018, 2:59:45 PM
to Dataverse Users Community
Sorry Jim, 

It does not work as expected after all...
What I saw in my tcpdump was just noise... :(

Philip Durbin

May 22, 2018, 3:00:56 PM
to dataverse...@googlegroups.com
Hi Bruno,

Yes, I saw Jim's reply. (Thanks, Jim.)

The first thing to know about EZID is that last August they announced "Over the course of the next two years, EZID DOI services will be phased out for users outside of the University of California" at https://www.cdlib.org/cdlinfo/2017/08/04/ezid-doi-service-is-evolving/

All Dataverse installations outside of California (most of them) will need to move to another supported persistent ID provider such as DataCite or Handle: http://guides.dataverse.org/en/4.8.6/installation/config.html#persistent-identifiers-and-publishing-datasets

Of course, you shouldn't concern yourself too much with persistent ID providers at the moment, because you are at the stage where you are evaluating Dataverse. I see from your other reply that you just upgraded your EZID jar. This is a fine solution for now while you're evaluating Dataverse. As a project, we at Dataverse need to decide what the out-of-the-box persistent ID provider should be. Right now it's EZID with a test namespace of "doi:10.5072/FK2" as explained at https://ezid.cdlib.org/doc/apidoc.html#testing-the-api . When people are evaluating an installation of Dataverse, we want them to continue to be able to publish using a test namespace like this.

Given how little Unirest is used so far (only for rsync and geoconnect features), I doubt that you'll even notice if it doesn't work behind a proxy during your evaluation, but you are still welcome to create a GitHub issue about this if you want.

Please let us know if you're blocked at all.

I just saw a :( in your very latest reply, so please update us with your latest status.

Thanks!

Phil

Bruno Lavoie

May 22, 2018, 3:53:13 PM
to Dataverse Users Community
Hi Phil, 

I was able to make ezid work with a quick fork that calls useSystemProperties(); this part is a success.

But, as you said, maybe it's not so important for evaluation purposes, and I wasn't aware that this service is being phased out.

For now, the most important part of our evaluation is working with multiple Dataverses.
Because of this proxy issue, we're still unable to test the harvesting features with other Dataverses.

You'll probably hear from us again... :)

Thanks
Bruno

Philip Durbin

May 22, 2018, 3:58:37 PM
to dataverse...@googlegroups.com
Sounds good. Please keep the questions coming. Thanks!

Jim Myers

May 23, 2018, 7:20:54 AM
to Dataverse Users Community
Bruno,
Glad to hear you were able to get it running. I'm sorry I picked the wrong HttpClient version; it looks like we (QDR) are running 4.5.3 due to some other dependencies, even though the Dataverse pom.xml file specifies 4.4.1. I don't know what's going to happen to the ezid library after the transition Phil mentions, but you could send your change back as a pull request; it might help someone else...

Cheers,
   Jim

Bruno Lavoie

May 23, 2018, 10:05:23 AM
to Dataverse Users Community
Hello,

Last updates:

Thanks
Bruno Lavoie