Looking up (and provisioning) an identity using just an email address

Karl Kornel

Apr 3, 2020, 8:12:31 PM
to Developer Discuss

A colleague and I are working on a workflow where we make data available for a user to download via Globus, and we are running into the problem where we are not sure how to properly identify the user, and deal with the possibility that the user has never used Globus before.

In the ideal scenario, all we have is the user's email address, and we do not know if the user has ever logged in to Globus.

Everything we are doing is entirely on the command line and via script (using the Globus CLI).  We are not using the Globus API.  We already have a shared endpoint created, and now we want to give a user access, using the `globus endpoint permission create` command.  The problem is, we don't necessarily have a user's Globus identity username; we only have the user's email address.

We were thinking of using `globus get-identities` to look up the identity from the email address, but that does not work.  For example, we created a Globus ID with username "xapol...@globusid.org" and email "xapol...@mailboxt.com".  Running `globus get-identities xapol...@mailboxt.com` tells us there is no such identity.

We tried using `globus endpoint permission create --provision-identity` to set up a new identity using the email address, but that didn't work.  I really should have realized that '--provision-identity' wants an identity username, not an email address!

So, I am kind of stuck.  Right now, our workaround is to have users browse to https://app.globus.org/account/identities, and tell us what their identity username is.  That has the side effect of ensuring a Globus identity is created, but it's annoying because it is an extra step.

Any advice would be appreciated!

~ Karl

Stephen Rosen

Apr 9, 2020, 1:04:42 PM
to Developer Discuss
Hi Karl,

Sorry for taking so long to reply.

The trouble in this case is that given a user's email address, if the user has never logged in we don't know what their identity will be.
I'll share the tricky case below, so you're aware of how things can go wrong, and then we can think about appropriate workarounds.

Here's a scenario -- I think the most problematic one -- which we see:
- We want to share with a user, call him Luke
- Luke's university ID is a non-friendly name generated by IT or HR, tk...@example.edu
- Example University lets Luke set up an email alias, lu...@example.edu

Luke might even think that his account is "lu...@example.edu", not realizing that it's just an alias.

If Luke does a login to any Globus-backed application, not just the webapp, he'll go through a flow with his Example University identity provider which will provide Globus with his real university ID (tk...@example.edu).
But even then, we have to know to share with "tk...@example.edu", not "lu...@example.edu", because only one of those two strings is his identity username.

Although we don't support it today, we could at least in theory support looking up "tk...@example.edu" from "lu...@example.edu" once Luke has logged in.
But we can't do it until then, so for many use cases it doesn't solve things.

IMO, this means that a good solution to your problem will have to force the user to log in somewhere.
That way we can get "tk...@example.edu" into our Globus ACLs without issue.

There's another nasty wrinkle lurking.
If "lu...@example.edu" is just an alias, it could be rebound to another identity in the future.
That means that it is technically possible, though exceedingly rare, for lu...@example.edu to be registered in Globus as belonging to two identities.
If tk...@example.edu and skyw...@example.edu are both identities, with lu...@example.edu as the email address listed for both of them, what happens when someone wants to share with lu...@example.edu?
In practice, this almost never happens, but it is important to note. I'll mention it again later on, below.

If you know that the username matches the email address, you could pre-provision the identity
You mentioned this in your mail, so I take it you know about it. But for the listhost, a quick rundown...

In some cases, we know that "lu...@example.edu" really is Luke's username.
In that case, you can use the `--provision-identity` flag (provision=True on AuthClient.get_identities for SDK users) to ensure that the identity exists, even if Luke has not yet logged in anywhere.
This is somewhat "risky", because if you're wrong, you end up with an ACL for "lu...@example.edu" which does nothing and may confuse people.
But it is supported for the case where you really know.
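
For SDK users, that pre-provisioning call might look roughly like the sketch below. It assumes `auth_client` is a `globus_sdk.AuthClient` (or anything with a compatible `get_identities` method) and that the response carries an `identities` list whose entries have an `id` field. The helper name `provision_identity_id` is made up for illustration.

```python
def provision_identity_id(auth_client, username):
    """Return the identity ID for `username`, provisioning it if needed.

    Only appropriate when you are sure `username` really is the identity
    username and not just an email alias.
    """
    # provision=True asks Globus Auth to create the identity if it does
    # not exist yet, so the lookup succeeds before the user's first login.
    response = auth_client.get_identities(usernames=username, provision=True)
    identities = response["identities"]
    if not identities:
        raise LookupError(f"no identity returned for {username!r}")
    return identities[0]["id"]
```

The returned ID is what you would then use as the principal when creating the ACL.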

Another thing you can do is to run your own web application where users log in and which handles "post login sharing".
(I know you said you're doing everything with the CLI, but I think it's just not going to cut it in this case.)

I don't want to understate how much work it is to run your own app for this integration, especially if you need it to be available 24x365.
I hesitate to even recommend it unless you plan to process a very large number of users through this flow.
But with the Globus SDK, it is quite doable and the logic is pretty simple, so I'd like to at least share how it's done.

Here's what I have in mind:
- Register a client with credentials
- Make the client identity an access manager on your endpoint so it can create ACLs using its own identity
- Put together a 3-legged OAuth login flow as part of an app, using the client credentials (We have a simple example in the SDK docs)
  - You will need to request scopes: "openid email profile urn:globus:auth:scope:auth.globus.org:view_identity_set"
- Set up a simple mapping of email addresses to desired ACLs, e.g. map "lu...@example.edu" to "rw, <endpoint-id>, /some/path", in config or a small DB
  (this SDK call is what we need to feed the data into)
- Whenever a user logs in, your application will do these steps
  1. Fetch identity information using the userinfo call
  2. Look for the email addresses not only in the effective identity, but in all identities in the "identity_set" field
      I wish I had a good example of this to point at, but I don't right now. The globus-cli code does this through our complex output helper here.
      It might be best for you to just try the call and look at the response payload -- make sure it has the "identity_set" field, which it only gets from the "view_identity_set" scope above.
  3. Map all of the user's email addresses to any "prepared ACLs" in your config store
  4. Have the *application act as itself* and create any "prepared ACLs"
      - For this you need to use client credentials to get a token. We want a Transfer token which can act as your app, not the user who is logging in.
      - I like this pattern of usage, with a Client Credentials Authorizer, since it has a bit of a "set it and forget it" feel.
  5. (Optional) Redirect the user to the Globus WebApp
      I'm not sure if this behavior will be stable and supported long-term, but today you can even specify an endpoint and path for the user, as in
      so it's possible to send the user to an endpoint+path where data was just shared.
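
Steps 2 through 4 above can be sketched roughly as follows. This is an untested outline under several assumptions: the userinfo payload is passed in as a plain dict; each entry in `identity_set` has `sub` and `email` fields; `prepared_acls` is a hypothetical dict from lowercase email to a list of `(permissions, endpoint_id, path)` tuples; and the ACL goes to the identity whose email matched. The function names are made up for illustration. Only `make_app_transfer_client` needs the Globus SDK, so it imports it lazily.

```python
TRANSFER_SCOPE = "urn:globus:auth:scope:transfer.api.globus.org:all"


def build_acl_rules(userinfo, prepared_acls):
    """Match the user's emails against prepared ACLs (steps 2 and 3).

    Returns a list of (endpoint_id, rule_document) pairs.
    """
    rules = []
    seen = set()
    # Check the effective identity first, then every linked identity.
    identities = [userinfo, *userinfo.get("identity_set", [])]
    for ident in identities:
        email = (ident.get("email") or "").lower()
        for permissions, endpoint_id, path in prepared_acls.get(email, []):
            key = (endpoint_id, path, ident["sub"])
            if key in seen:
                # The effective identity may also appear in identity_set.
                continue
            seen.add(key)
            rule = {
                "DATA_TYPE": "access",
                "principal_type": "identity",
                "principal": ident["sub"],
                "path": path,
                "permissions": permissions,
            }
            rules.append((endpoint_id, rule))
    return rules


def create_prepared_acls(transfer_client, userinfo, prepared_acls):
    """Create each matching ACL as the application itself (step 4)."""
    created = []
    for endpoint_id, rule in build_acl_rules(userinfo, prepared_acls):
        transfer_client.add_endpoint_acl_rule(endpoint_id, rule)
        created.append((endpoint_id, rule["principal"]))
    return created


def make_app_transfer_client(client_id, client_secret):
    """Build a TransferClient that acts as the app, not the logged-in user."""
    import globus_sdk  # third-party; only needed for real calls

    app = globus_sdk.ConfidentialAppAuthClient(client_id, client_secret)
    authorizer = globus_sdk.ClientCredentialsAuthorizer(app, TRANSFER_SCOPE)
    return globus_sdk.TransferClient(authorizer=authorizer)
```

In the login callback you would call `create_prepared_acls(make_app_transfer_client(...), userinfo, prepared_acls)` once the userinfo response is in hand. Passing the client in as a parameter keeps the matching logic testable without network access.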

Even without having built or tested such a solution, I'm 99.9% confident that it works and solves your use-case.
The only wrinkle is the case where two identities have the same email address -- either they both could get ACLs added, or it's first-come-first-served.
The downside, of course, is that someone has to go build it.
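
To make the shared-email wrinkle concrete, here is a minimal sketch of the two policies. `PreparedAclStore` is a hypothetical in-memory helper, not part of the SDK, and a real app would back it with the config file or small DB mentioned above.

```python
class PreparedAclStore:
    """In-memory store of prepared ACLs keyed by lowercase email."""

    def __init__(self, prepared, first_come_first_served=True):
        self._prepared = {
            email.lower(): list(acls) for email, acls in prepared.items()
        }
        self.first_come_first_served = first_come_first_served

    def claim(self, email):
        """Return the prepared ACLs for `email` under the chosen policy."""
        key = email.lower()
        if self.first_come_first_served:
            # The first identity to log in consumes the entry;
            # later identities with the same email get nothing.
            return self._prepared.pop(key, [])
        # Otherwise every identity presenting this email gets the same ACLs.
        return list(self._prepared.get(key, []))
```

Which policy is right depends on the use case, which is exactly why a global default is hard to pick.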

I think you should weigh it against your current workaround of having the users tell you their usernames, and decide whether or not it will save effort.
(Also, obviously once you write it, you could reuse the code for other things.)

I really wish I could present you with a simpler workaround or solution.
Ideally just "here's 60 lines of inscrutable bash which does what you want with the CLI" :-)

Sadly, the situation with email addresses is nefarious.
Properly handling them requires that you "store and defer" activities which relate to the user in question, and then trigger those activities only once the user has logged in.

In this particular case, it's reasonable to say "couldn't Globus just do this for us?"
However, in the general case, we're talking about storing and deferring an arbitrary action -- not just sharing -- and things get difficult quickly.
We also have to contend with the "two identities share an email" case I mentioned before.
While an application could easily make a decision for a specific use-case, deciding on a globally applicable behavior or even just a good default for a configurable behavior is hard.

It's likely that there's another workaround that I haven't thought of.
If you come up with one, or if you have questions about the proposed solution above, or even if you decide that it's not worth the time sink to build an app for this, please give us a shout.
I'd love to hear your feedback on this and whether or not it's viable for you.
