Secure remote access to Open Refine [configuration problem]


jred...@alumnos.unex.es

Sep 19, 2014, 7:55:38 AM
to openr...@googlegroups.com
Hello,


I would like to run the Google Refine webapp on my computer and reach it from anywhere I have an internet connection. To do so I start the refine service like this: "./refine -i 0.0.0.0".
So far so good, I can see my projects from anywhere and work with them without trouble.

The issue that concerns me is that anyone with the URL of the service (http://mymachine:3333) can enter and modify anything they want. How can I secure this access using a password?

Thank you,


Jesús Redondo

Thad Guidry

Sep 19, 2014, 11:44:28 AM
to openrefine
You would have to use a proxy (like mod_proxy on Apache) and then tunnel through that. That would give you some limited form of security, but not rock-solid security (there are still some risks).

Here is an email thread from our archive that can help get you started in the right direction:

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



jred...@alumnos.unex.es

Sep 22, 2014, 6:51:23 AM
to openr...@googlegroups.com

Thanks for pointing out this approach.

I understand that the first step towards a solution is to set up a proxy so that OpenRefine is served through Apache. I tried to do this with the following mod_proxy lines (mod_proxy is installed and running):

ProxyPass /refine http://127.0.0.1:3333/
ProxyPassReverse /refine/ http://127.0.0.1:3333/

Unfortunately this doesn't work properly. When I load the page http://___IP_of_My_Server____/refine this is what I see (almost nothing):




I made sure that mod_proxy is working by proxying other dynamic pages served by Tomcat, and they work fine, so I really don't know what is failing.

Thanks

Jesús Redondo

Thad Guidry

Sep 22, 2014, 3:11:36 PM
to openrefine
Are you sure you are running through the correct port? Or is it that you're still using the default one for OpenRefine? You can change this in the refine.ini file, by the way.

Other than that...
I would have to defer to Stefano or Tom for any more ideas. I am not a mod_proxy guru. You might want to ask on the Apache mailing list for more ideas and setup / config help.


Tom Morris

Sep 23, 2014, 11:25:57 PM
to openr...@googlegroups.com
Hi Jesús,

I'm buried with non-OpenRefine stuff for the next few days, but this
is absolutely a use case that we want to make point & click easy.

The simple case would probably be a non-desktop cloud virtual image
that you could get started with a couple of clicks and share with
whoever your collaborators are. A desktop-hosted scenario could be a
second step after that.

Tom

jred...@alumnos.unex.es

Sep 24, 2014, 5:20:19 AM
to openr...@googlegroups.com
Hello, thanks for the interest in this issue.

I can't wait to see those changes ;). Thank you

Here is a related question I posted about the same issue, which could be useful for future reference:

     http://stackoverflow.com/questions/26012265/mod-proxy-doesnt-show-open-refine-app-properly

Jesús

Kingdon Barrett

Sep 24, 2014, 3:21:10 PM
to openr...@googlegroups.com
Hey,

If I had to guess, I'd say your problem comes from changing the app's URL location to where it's no longer at the root of the host:port combo.

Since you are on localhost, I'm going to suggest that you add "refine" to the list of hosts in /etc/hosts that resolve to 127.0.0.1.

Then try setting up ProxyPass with a virtualhost for that "refine" name.

^ here's an example from a non-refine context

^ here's another approach, much more complicated, less guaranteed to work, but more likely to illustrate the problem and solution for you.  On top of reverse proxy config, try adding url rewriting.

In other words, something somewhere inside of refine is using an absolute URL path (a link starting with http:// or with /), and since you are only proxying requests that start with /refine, those links go nowhere.

The third approach would be to change refine so that it only uses relative URIs for the various linked resources.

The easiest approach is what I showed in the first link: set up a virtualhost and proxypass everything from <Location /> to 127.0.0.1:3333, which is almost exactly what you did.
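
For readers following along, here is a minimal sketch of what that easiest approach might look like. It is not the configuration from the linked example. It assumes a local-only hostname "refine" that you have added to /etc/hosts as "127.0.0.1 refine", that mod_proxy and mod_proxy_http are enabled, and that Apache listens on port 80 (on Apache 2.2 you would also need a matching NameVirtualHost line).

```apache
# Sketch only: "refine" is the made-up hostname from /etc/hosts, not a real DNS name.
<VirtualHost *:80>
    ServerName refine

    # Reverse proxy only; never act as an open forward proxy.
    ProxyRequests Off

    # Proxy the whole site root, so absolute links emitted by the app
    # (paths starting with /) still land on the proxied service.
    ProxyPass        / http://127.0.0.1:3333/
    ProxyPassReverse / http://127.0.0.1:3333/
</VirtualHost>
```

Because the app now sits at the root of its own host name, nothing needs to be rewritten, which is the difference from proxying only /refine.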

--Kingdon

jred...@alumnos.unex.es

Sep 25, 2014, 5:31:14 AM
to openr...@googlegroups.com
Hello,

The first approach works locally and even remotely, but the problem is that it uses the base URL for OpenRefine. For example, if the domain of the server were http://exampleserver.com, then the OpenRefine app would be accessible at that URL, and that doesn't suit my needs because I am already serving a website on port 80. The idea is that OpenRefine runs on an unused port (like http://exampleserver.com:3333) or under a different path (like http://exampleserver.com/refine).

I am trying the approach with the DNS, but the problem is that I am not an Apache guru. Can I create a virtual host with the rules used before, but only for OpenRefine, reachable as something like http://myopenrefineserver? And then modify the /etc/hosts of the clients to resolve myopenrefineserver to the IP of the server? I guess this would leave the original server (or virtual host) http://exampleserver.com untouched. I will give this approach a chance.

Probably the best solution is with mod_rewrite, but it certainly doesn't seem easy.

Thanks for all the advice!

Jesús

Kingdon Barrett

Sep 25, 2014, 9:05:47 AM
to openr...@googlegroups.com
No, sorry if I didn't make this clear, but you don't use your web server's primary hostname for your refine virtualhost. You make a new one, like refine.exampleserver.com, and set access controls on that host in the usual way, with AuthType and AuthUserFile directives placed inside the virtualhost definition you just created so that they apply only to that host.
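
As a rough sketch of such a password-protected virtualhost (an illustration under assumptions, not a tested config): the password file path /etc/apache2/refine.htpasswd and the realm name are made up here, the file would be created with something like "htpasswd -c /etc/apache2/refine.htpasswd someuser", and Apache's standard Basic auth modules plus mod_proxy_http are assumed to be enabled.

```apache
# Sketch only: hostname follows the example above; the htpasswd path is hypothetical.
<VirtualHost *:80>
    ServerName refine.exampleserver.com

    ProxyRequests Off
    ProxyPass        / http://127.0.0.1:3333/
    ProxyPassReverse / http://127.0.0.1:3333/

    # Ask for a username and password before anything is proxied.
    <Location />
        AuthType Basic
        AuthName "OpenRefine"
        AuthUserFile /etc/apache2/refine.htpasswd
        Require valid-user
    </Location>
</VirtualHost>
```

Two caveats: Basic auth sends credentials essentially in the clear unless the virtualhost is served over HTTPS, and the password only helps if refine itself listens on 127.0.0.1 rather than 0.0.0.0, since otherwise port 3333 can still be reached directly.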

You will need to set up a VirtualHost for your existing service as well as the new one.  My system also has a regular HTTP root in /var/www, and it has a separate VirtualHost directive that looks like this:

<VirtualHost *>
    ServerName downloads.nerdland.info

    DocumentRoot /var/www/html
</VirtualHost>

Be careful when defining virtual hosts: the order in which they are added determines which one gets picked for requests that don't match any of them. For example, add a line "ServerAlias www.exampleserver.com" so that your vhost for "exampleserver.com" also responds to that name, or make sure that the host you want to answer unmatched requests gets parsed first, since your root config's DocumentRoot will be ineffective after adding the VHost. On Debian, the convention is to add all sites to /etc/apache2/sites-available and symlink them into sites-enabled.

Make sure to number them so that, for example, 001-default is parsed before 999-refinevhost, and default is the one that answers for xyzname.undefinedhost.
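
To illustrate the ordering point, a sketch of how the first-parsed site file might look. The host names are the placeholders used earlier in this thread, and the DocumentRoot is only an assumption borrowed from the example above.

```apache
# sites-available/001-default (parsed first)
# The first virtualhost Apache reads for an address:port is the fallback for
# any request whose Host header matches no ServerName or ServerAlias.
<VirtualHost *:80>
    ServerName exampleserver.com
    ServerAlias www.exampleserver.com

    DocumentRoot /var/www/html
</VirtualHost>
```

The refine virtualhost would then live in its own, later-sorting file such as 999-refinevhost, so it only answers requests for its own ServerName.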