Tom Morris

Nov 15, 2020, 9:47:14 PM

to openref...@googlegroups.com

I've been thinking about how we authenticate HTTP requests, including how we handle both password-based basic auth and token-based authentication, and I'm interested in feedback on the design sketch below.

Currently we have two mechanisms:

- HTTP headers that have to be set up by hand and are sent with all Fetch URL calls

- https://username:pass...@example.com/ URL syntax, which is intended to create the appropriate HTTP Basic Auth headers for the target site (HTTPS only); see https://github.com/OpenRefine/OpenRefine/issues/217

Both have significant security issues and both are awkward to use. What we really want is a database of credentials which are stored per target host/URL. For basic auth, we should only send a specific set of credentials when we receive a 401 challenge for the correct realm. For token-based auth, we want to be able to have different tokens (and token names) for different endpoints.
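As a rough illustration of the "only on a 401 challenge for the correct realm" rule, here is a minimal Python sketch. The function name and the idea of storing an expected realm alongside each credential are assumptions made for the example (the entry format described later in this message has no realm field), and it only handles a single Basic challenge in the header.

```python
import re

# Matches realm="..." inside a WWW-Authenticate header value.
_REALM = re.compile(r'realm="([^"]*)"', re.IGNORECASE)


def should_send_basic(status, www_authenticate, expected_realm):
    """Return True only for a 401 Basic challenge whose realm matches.

    `expected_realm` is a hypothetical extra field stored with the credential.
    """
    if status != 401 or not www_authenticate:
        return False
    if not www_authenticate.strip().lower().startswith("basic"):
        return False
    match = _REALM.search(www_authenticate)
    return match is not None and match.group(1) == expected_realm
```

With this check, credentials are never sent pre-emptively: the first request goes out unauthenticated, and the stored username and password are only attached on a retry after a matching challenge.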

The authentication database could be separate, but to play with the concept, I've leveraged our existing preferences database, carving out a prefixed section of the namespace (all set up by hand). Matching is done based on longest prefix match to the target URL with reversed domain names. So, http://www.example.com/reconcile becomes http://com.example.www/reconcile and matches in increasingly specific order are:

http://com.example

http://com.example.www

http://com.example.www/reconcile

I'm not sure we need this level of flexibility, but it doesn't really add much complexity. One thing I'm unsure about is whether we should consider protocol (http vs https) in the matching. Opinions? Currently protocol is included.
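The matching rule above can be sketched in Python as follows. This is only an illustration under some assumptions: the function names are made up, keys are shown without the preference prefix, ports and query strings are ignored, and matching is done on whole host labels and path segments so that http://com.example does not accidentally match http://com.examples. The protocol is part of the key, as in the current prototype.

```python
from urllib.parse import urlsplit


def reversed_host_key(url):
    """Rewrite a URL so its host labels run from most general to most specific."""
    parts = urlsplit(url)
    host = ".".join(reversed(parts.hostname.split(".")))
    return f"{parts.scheme}://{host}{parts.path}"


def _segments(reversed_url):
    """Split a reversed-host URL into scheme, host labels, and path segments."""
    parts = urlsplit(reversed_url)
    segments = [parts.scheme] + parts.hostname.split(".")
    path = [p for p in parts.path.split("/") if p]
    if path:
        # The "/" marker keeps host labels and path segments from being confused.
        segments += ["/"] + path
    return segments


def longest_prefix_match(url, entries):
    """Return the (key, value) entry with the longest matching key, or None."""
    target = _segments(reversed_host_key(url))
    best, best_len = None, -1
    for key, value in entries.items():
        candidate = _segments(key)
        if target[:len(candidate)] == candidate and len(candidate) > best_len:
            best, best_len = (key, value), len(candidate)
    return best
```

Dropping the scheme from `_segments` would be the one-line change needed to make http and https share entries, which is the open question raised above.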

Each entry has a type ("basic" or "token"), followed by the token name or username, then the token value or password.

Lookup is done at runtime and the credentials are not included in the operation history or any logging. Credentials are not currently encrypted. We could do some simple encryption, but I'm not sure how much value it adds. Another possibility for passwords would be to prompt the user for them when first needed and only store them for the duration of the session.
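The "prompt when first needed, keep for the session" option could look roughly like the Python sketch below. The class name and the injected prompt callable are assumptions for illustration; nothing here is written to disk, so the password disappears when the process exits.

```python
class SessionCredentials:
    """In-memory password cache that asks the user at most once per key."""

    def __init__(self, prompt):
        # `prompt` is any callable taking a key and returning the password,
        # e.g. lambda key: getpass.getpass(f"Password for {key}: ")
        self._prompt = prompt
        self._cache = {}

    def password_for(self, key):
        """Return the cached password for `key`, prompting on first use."""
        if key not in self._cache:
            self._cache[key] = self._prompt(key)
        return self._cache[key]

    def clear(self):
        """Forget everything, for example when the session ends."""
        self._cache.clear()
```

Passing the prompt in as a callable keeps the cache independent of how the user is actually asked (a terminal prompt, a dialog in the UI, and so on).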

Below are concrete examples of what the two types of entries look like:

Preferences

Key                                          Value
scripting.expressions
http-auth-https://com.xyzzy.reconciliation   token api_key mysecrettoken42
http-auth-https://org.httpbin                basic testuser testpw
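To show how such a preference value might be turned into a request header, here is a small Python sketch. It assumes the three fields are whitespace-separated, and, for "token" entries, that the token is sent as an HTTP header whose name is the token name; the design above does not say whether tokens travel as headers or query parameters, so treat that part as a guess. The function names are hypothetical.

```python
import base64


def parse_entry(value):
    """Split a stored value into (type, name, secret)."""
    kind, name, secret = value.split(None, 2)
    if kind not in ("basic", "token"):
        raise ValueError(f"unknown credential type: {kind}")
    return kind, name, secret


def auth_header(value):
    """Build a (header name, header value) pair from a stored entry."""
    kind, name, secret = parse_entry(value)
    if kind == "basic":
        # Standard HTTP Basic scheme: base64 of "username:password".
        raw = f"{name}:{secret}".encode("utf-8")
        return ("Authorization", "Basic " + base64.b64encode(raw).decode("ascii"))
    # Assumption: a token is sent as a header named after the token.
    return (name, secret)
```

For the two rows above, `auth_header` would yield an `api_key` header carrying the token and an `Authorization: Basic ...` header for testuser.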

Thad Guidry

Nov 15, 2020, 10:43:42 PM

to openref...@googlegroups.com

> Another possibility for passwords would be to prompt the user for them when first needed and only store them for the duration of the session.

From a user perspective, I like this the best personally.

Are you considering pac4j implementations? https://github.com/pac4j/pac4j

Thad

https://www.linkedin.com/in/thadguidry/

--
You received this message because you are subscribed to the Google Groups "OpenRefine Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine-dev/CAE9vqEFOci5ACk5XR0XOZnYBU63-rbjjk_YqaJcAad7TAbi2SQ%40mail.gmail.com.

Antonin Delpeuch (lists)

Nov 20, 2020, 10:39:52 AM

to openref...@googlegroups.com

Hi Tom,

I agree with the idea that we ultimately need a new place to store these
credentials: not in the workflow JSON, nor in the preferences, but
somewhere more hidden (with some basic encryption, just for the sake of
not having them in plain text).

I like the idea of storing these centrally, such that a single
credential entry could be used in multiple contexts (fetching URLs,
reconciliation, …) but I am concerned about the added complexity.

It is pretty common to see different authentication methods used
within the same domain (API key, Cookie based, HTTP auth…) so I would
not expect this sharing to be useful that often. If we add support for
authentication in the reconciliation protocol, it is likely that it is
done in a way that is slightly incompatible with the current
authentication mechanisms in the web APIs offered alongside
reconciliation endpoints.

You have described how these credentials could be stored in the backend,
but for me it is not clear what the user experience looks like.
Say I register a new reconciliation service https://foo.com/reconcile
which requires authentication. I am prompted for my API key (for
instance) which gets stored in the credentials store for
"https://com.foo/reconcile". Later on, I want to fetch data from the
public API of the same service at https://foo.com/api. What happens
here? If I do not specify any authentication, the API key for the
reconciliation endpoint will be reused by default, right? Now I see a
few problems with this:
- how will the user understand that they do not need to enter
credentials again here? They will basically need to understand the
underlying matching algorithm, right?
- what if this API does not actually need authentication and returns an
error when this extra parameter is supplied? The user did not supply any
authentication and obtains an error that will be difficult to understand
and circumvent, no?
- what about the case where the credentials are the same for the
reconciliation endpoint and the web API, but are passed in different ways?
- say I supply different credentials in the "fetch URL" operation. Which
key is going to be used to store them, since the URLs generally vary for
each row and might not even have a common prefix?

So I would go for a simpler system. Perhaps just a list of secrets,
stored as a simple key-value store (just like the preference store),
which the user could pick credentials from, in various contexts
(reconciliation, fetch URLs, others…). The credentials would be referred
to by their keys in the workflow JSON, and the operation itself would
retrieve the secret value from the store at runtime.
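A minimal Python sketch of this simpler scheme follows, to make the separation concrete. The `credentialKey` field, the key `foo-api-key`, and the function name are all hypothetical; the point is only that the workflow JSON carries a reference and the secret itself is looked up when the operation runs.

```python
import json


def resolve_credentials(operation_json, secrets):
    """Parse an operation and look up the secret it refers to by key.

    Returns (operation, secret); secret is None when no key is referenced.
    """
    operation = json.loads(operation_json)
    key = operation.get("credentialKey")  # hypothetical field name
    if key is None:
        return operation, None
    if key not in secrets:
        raise KeyError(f"no stored credential named {key!r}")
    return operation, secrets[key]


# Hypothetical secret store: a plain key-value mapping, like the preference store.
secrets = {"foo-api-key": "mysecrettoken42"}

# The exported workflow only mentions the key, never the value.
workflow_step = json.dumps({
    "op": "fetch-urls",
    "credentialKey": "foo-api-key",
})

operation, secret = resolve_credentials(workflow_step, secrets)
```

Because the user picks the key explicitly, there is no matching algorithm to understand, and a shared workflow fails with a clear "no stored credential" error on a machine that lacks the secret.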

Even that is quite involved by itself - so if you are after
authentication support for reconciliation, perhaps it would be worth
adding support for that with our current naive credentials storage
(directly in the preferences, for instance), and improve it as a second
step?

Best,
Antonin

Thad Guidry

Nov 20, 2020, 11:44:04 AM

to openref...@googlegroups.com

> Say I register a new reconciliation service https://foo.com/reconcile
> which requires authentication. I am prompted for my API key (for
> instance) which gets stored in the credentials store for
> "https://com.foo/reconcile". Later on, I want to fetch data from the
> public API of the same service at https://foo.com/api. What happens
> here?

I would say those are not the same service. Same domain, but different services. Registering for a service should be done outside of OpenRefine; otherwise we take on too many roles.

Agree with a simpler system for users. Antonin's approach mimics what other systems do, no?

That is, allow storing a credential value (password, API key, etc.) and retrieving it through a scoped variable?

This is how GitHub, GitLab, and other systems work. For example: https://docs.gitlab.com/ee/ci/variables/#instance-level-cicd-environment-variables

Thad

https://www.linkedin.com/in/thadguidry/

HTTP authentication design sketch
