Apache httpd as a potential requirement for running Dataverse 4.0


Philip Durbin

Apr 4, 2014, 8:23:27 AM
to dataverse...@googlegroups.com
As announced* previously on this list, Dataverse developers such as
myself are working away on a major new version of the software, which
we're calling Dataverse 4.0.

Do people who run Dataverse installations have any concerns about
introducing a dependency on Apache httpd?

The idea is that Glassfish would run as a non-privileged user on a
high port and Apache would run on ports 80 and 443, acting as a proxy
for Glassfish. If everyone is OK with running Apache, we could use the
popular "mod_shib" module for Shibboleth support. We could also
potentially enforce HTTPS over HTTP in Apache.

I already threw this idea out to the dvn-auth list (below**) but I
thought I'd bring the question to the larger community.

Any opinions or thoughts on this are welcome!

Phil

p.s. It's quite likely that we will be introducing a dependency on
Solr in Dataverse 4.0: http://lucene.apache.org/solr/

* https://groups.google.com/d/msg/dataverse-community/9j6M4s4qi68/HohRwnGwwZYJ

** permalink: https://lists.iq.harvard.edu/pipermail/dvn-auth/2014-April/000006.html

---------- Forwarded message ----------
From: Philip Durbin <philip...@harvard.edu>
Date: Wed, Apr 2, 2014 at 1:50 PM
Subject: Re: [dvn-auth] Reintroducing the Dutch Dataverse Network
To: Ben Companjen <ben.co...@dans.knaw.nl>
Cc: "dvn-...@lists.iq.harvard.edu" <dvn-...@lists.iq.harvard.edu>

Hi Ben (and all)!

It was great talking to you, Arnoud, and Eko this morning! I really
appreciate you taking the time to summarize our conversation for the
benefit of everyone on this list and anyone who may stumble upon the
archived version. (Speaking of the archived version, it seems to be cut
off, but I'm hoping that this reply will capture your whole message.)

We don't have a lot of experience with OIOSAML, but we're happy to help
you through any problems with Glassfish. You're welcome to keep pinging
us in #dvn on freenode IRC[1] or to open a ticket by emailing
sup...@thedata.org.

DVN has always run on PostgreSQL, and I suspect there are PostgreSQL
requirements baked into the DVN 3.x code, but as we work on the
Dataverse 4.0 code base (which is entirely new), we are trying to avoid
tying ourselves to a particular database (see
https://redmine.hmdc.harvard.edu/issues/3729 ). No promises though...
I'm sure we'll continue to run PostgreSQL in development and
production. :)

I doubt that Dataverse will run in Tomcat. It's a Java Enterprise
Edition (Java EE) application so it requires a full container such as
Glassfish. It *should* run on JBoss/WildFly, and I hope we aren't
introducing any hard dependencies on Glassfish but it's certainly
possible we accidentally are since we use Glassfish exclusively in
development. I should mention that TomEE only supports the "web
profile" of Java EE (last I checked) but I'm pretty sure we require
the "full profile". Oh, and I'm not sure if SELinux is supported or
not... definitely worth trying.

With regard to Apache httpd, we have never required it to run DVN 3.x
but it's under consideration as a requirement for Dataverse 4.0. As
you've noted, using Apache as a proxy like this allows the Shibboleth
attributes to be exposed as environment variables. Given the trouble
I've had using OIOSAML (especially with ADFS[2]), I'm interested in
looking into the "Fronting Glassfish with Apache to run Shibboleth SP"
option[3] some more. I have some notes on the Apache option at
https://github.com/dvn/shibpoc/tree/master/java/shibsppoc
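With Apache and mod_shib in front, the Java EE application never parses SAML. It only reads the attribute values that the Shibboleth SP passes along, which is the step this minimal sketch shows. The sketch rests on three assumptions. First, the attributes arrive as a name-to-value map, standing in for environment variables or servlet request attributes. Second, multi-valued attributes are joined with semicolons, which is the Shibboleth SP's default delimiter. Third, `eppn` and `mail` are attribute ids from a hypothetical attribute map. The class and method names are illustrative only, not Dataverse code, and Java 9+ is assumed for `Map.of`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: reads Shibboleth attributes that Apache/mod_shib has
// already extracted, so the application itself never handles SAML.
final class ShibAttributes {

    private ShibAttributes() {}

    // Returns all values of a (possibly multi-valued) attribute.
    // Assumes values are joined with ';'; a missing or empty attribute yields an empty list.
    static List<String> values(Map<String, String> attrs, String name) {
        String raw = attrs.get(name);
        if (raw == null || raw.isEmpty()) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        for (String part : raw.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Returns the first value, e.g. for a single-valued identifier such as eppn.
    static Optional<String> first(Map<String, String> attrs, String name) {
        List<String> vals = values(attrs, name);
        return vals.isEmpty() ? Optional.empty() : Optional.of(vals.get(0));
    }

    public static void main(String[] args) {
        // In a real deployment this map might be filled from environment variables
        // or request attributes; here it is hard-coded sample data for a made-up user.
        Map<String, String> attrs = Map.of(
            "eppn", "jdoe@example.edu",
            "mail", "jdoe@example.edu;john.doe@example.edu");
        System.out.println(first(attrs, "eppn").orElse("(none)"));
        System.out.println(values(attrs, "mail"));
    }
}
```

Keeping the parsing behind a plain map means the same code works whether the attributes come from the environment or from the request.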

Perhaps I should send this to the larger dataverse-community list (
https://groups.google.com/group/dataverse-community ), but can I get a
quick pulse from people who run (or may run) Dataverse on whether they
have any objection to running Glassfish behind Apache?

Phil

1. Lots of good Shibboleth chatter this week starting at
http://irclog.iq.harvard.edu/dvn/2014-03-31

2. https://github.com/IQSS/dvn/wiki/configuring-ADFS-relying-party

3. Other options include OIOSAML and OpenAM per this "Dataverse
Shibboleth/SAML Design Document" I continue to update:
https://docs.google.com/document/d/1y2axfd_ScmXVICFlV8AuPDdp5xHwTag54pUpVefzs5g/edit?usp=sharing

On Wed, Apr 2, 2014 at 12:49 PM, Ben Companjen
<ben.co...@dans.knaw.nl> wrote:
> Hi all,
>
> As the future network admin of the Dutch Dataverse Network (DDN), I [0] am
> currently involved in transferring responsibility for this service from
> the library of the University of Utrecht to Data Archiving and Networked
> Services (an institute of the Royal Netherlands Academy of Arts and
> Sciences and the Netherlands Organisation for Scientific Research). My
> colleagues Eko Indarto and Arnoud Jippes are working with me on this, as
> the developer and sysadmin respectively.
>
> As Philip Durbin wrote in his email of October 3, 2013 [1], the DVN (v3.3)
> was patched with OIOSAML to support federated login using personal
> accounts supplied by Dutch higher educational institutions via SURFconext.
> The patch is available and supports not only login, but account creation
> with basic role assignment on first login based on SAML attributes as
> well. It has worked for the current participating (and paying)
> institutions, although the patch doesn't handle session management very well.
> After logging in for the first time, an account is created, but the user
> needs to quit and restart the browser to be able to log in for the first
> time. This may also be why logging in while browsing studies takes the user
> back to the homepage instead of the study that the user was looking at.
>
> In this transition, we're upgrading the DVN software to 3.6.2 on a new
> RHEL 6 server with SELinux. Next is to reconnect to SURFconext, the
> federated login provider for Dutch higher education institutions. By May
> 1st the transition must be complete, and it looks like we'll make it.
>
> However, the "getting OIOSAML to work with 3.6.2" part has not been easy,
> partly due to our lack of experience with Glassfish and OIOSAML.
> From a system administration point of view, consolidation of deployment
> environments continues to be important to us. Our Java applications are
> deployed in Tomcat and use MySQL as their DBMS, with an Apache proxy in front of
> Tomcat. This has also allowed us to use Shibboleth for federated login for
> one of our other services, the long-term preservation archive EASY [2]. I
> personally don't know the details, but setting up Shibboleth and the
> connection to SURFconext has been harder than building software support
> for Shibboleth, which I'm told boils down to getting attribute values from
> environment variables. (We did need Shibboleth's lazy session mode
> enabled.)
>
> This environment and knowledge led us to try to patch the patch with Shibboleth
> support. That includes fronting Glassfish with Apache. By the end of this
> week I hope to know whether we succeeded :)
>
>
> We (Eko, Arnoud and I) had a Skype call with Philip today to exchange
> some of our experiences with SAML, and learned that DVN v4 will focus on
> Shibboleth (because the demand was highest for Shibboleth). Support for
> more generic authentication frameworks had crossed all our minds before,
> but implementing such support is beyond any current plan (as I understood).
>
> Although more a thought than a design, we suggested a plugin framework for
> DVN to allow e.g. account creation/management as part of the login
> procedure. With such a framework in place, we could create a plugin
> instead of a patch that chooses an authentication provider, redirects to
> the login page and performs the logic of assigning roles to new accounts
> (i.e. authorisation) in between authentication and session start. One
> plugin could be created for the Dutch environment, another for the US
> environment (with InCommon) and perhaps yet another for Facebook
> authentication.
>
> We further asked about known DVN production environments in which DVN is
> deployed in Tomcat and/or uses MySQL, but it appears that DVN relies on
> some Java EE features that Tomcat does not support. Perhaps TomEE might
> help here, but Philip has no experience with this product. Removal of the
> PostgreSQL dependencies in DVN 4 has been requested.
>
> It was great to discuss DVN via Skype today, but we understand that
> keeping the discussion open generally helps the wider community. We're
> learning too, and would love to hear about experiences with environments
> similar to ours, or different.
>
> Regards,
>
> Ben
>
> [0]: http://dans.knaw.nl/en/content/ben-companjen (bencomp on Freenode,
> @bencomp on Twitter)
> [1]:
> https://lists.iq.harvard.edu/pipermail/dvn-auth/2013-October/000001.html
> [2]: https://easy.dans.knaw.nl/ui/home
>
>
>
> Ben Companjen
> Information scientist
> ben.co...@dans.knaw.nl
> +31 6 1334 9717
>
> Data Archiving and Networked Services (DANS)
> DANS promotes sustained access to digital research data. See
> <http://www.dans.knaw.nl/> for more information and contact details. DANS
> is an institute of KNAW and NWO.
>
> DANS | Anna van Saksenlaan 51 | 2593 HW The Hague | P.O. Box 93067 | 2509
> AB The Hague | +31 70 349 44 50 | in...@dans.knaw.nl | www.dans.knaw.nl
> <http://www.dans.knaw.nl/>
>
> _______________________________________________
> dvn-auth mailing list
> dvn-...@lists.iq.harvard.edu
>
> To unsubscribe from this list or get other information:
>
> https://lists.iq.harvard.edu/mailman/listinfo/dvn-auth
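Ben's plugin idea above can be expressed as a small Java interface. In that idea, a plugin chooses a provider, redirects to its login page, and assigns roles to a new account between authentication and session start. The sketch below is purely hypothetical and assumes Java 10 or later. `AuthPlugin`, `SurfconextPlugin`, `LoginFlow`, the host name, the `unscoped-affiliation` attribute id and the role names are all illustrative assumptions, not part of DVN or any Dataverse API. The only real piece it leans on is the Shibboleth SP's `/Shibboleth.sso/Login` handler with its `target` parameter. A plugin could use that parameter to send the user back to the study they were viewing rather than the homepage.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical login flow: Apache/mod_shib has already authenticated the user;
// the plugin decides the roles of a brand-new account before the session starts.
final class LoginFlow {
    private final AuthPlugin plugin;
    // In-memory stand-in for the account store: user id -> roles.
    private final Map<String, Set<String>> accounts = new HashMap<>();

    LoginFlow(AuthPlugin plugin) {
        this.plugin = plugin;
    }

    // Called after authentication; returns the roles the new session starts with.
    // Only unknown users get roles from the plugin; existing accounts keep theirs.
    Set<String> onAuthenticated(String userId, Map<String, String> attributes) {
        return accounts.computeIfAbsent(userId, id -> plugin.initialRoles(attributes));
    }

    public static void main(String[] args) {
        AuthPlugin plugin = new SurfconextPlugin();
        LoginFlow flow = new LoginFlow(plugin);
        System.out.println(plugin.loginUrl("https://dataverse.example.nl/dvn/study/123"));
        Set<String> roles = flow.onAuthenticated(
            "jdoe@example.nl", Map.of("unscoped-affiliation", "staff;member"));
        System.out.println(roles); // [contributor, member]
    }
}

// Hypothetical plugin contract: one implementation per federation,
// e.g. SURFconext in the Netherlands or InCommon in the US.
interface AuthPlugin {
    // Where to send the browser to log in; returnTo is the page to come back to.
    String loginUrl(String returnTo);

    // Roles granted when an account is created on first login,
    // derived from the attributes released by the identity provider.
    Set<String> initialRoles(Map<String, String> attributes);
}

// Illustrative implementation; host name, attribute id and role names are made up.
final class SurfconextPlugin implements AuthPlugin {
    @Override
    public String loginUrl(String returnTo) {
        // /Shibboleth.sso/Login is the Shibboleth SP's session initiator;
        // "target" carries the URL to return to after login.
        return "https://dataverse.example.nl/Shibboleth.sso/Login?target="
            + URLEncoder.encode(returnTo, StandardCharsets.UTF_8);
    }

    @Override
    public Set<String> initialRoles(Map<String, String> attributes) {
        Set<String> roles = new TreeSet<>();
        roles.add("member"); // everyone who logs in
        String affiliation = attributes.getOrDefault("unscoped-affiliation", "");
        if (List.of(affiliation.split(";")).contains("staff")) {
            roles.add("contributor");
        }
        return roles;
    }
}
```

A Dutch plugin and a US (InCommon) plugin would differ only in their `AuthPlugin` implementation, which is the separation Ben describes.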



--
Philip Durbin
Software Developer for http://thedata.org
http://www.iq.harvard.edu/people/philip-durbin
