Should we seek to include a complete call chain context with the TI spec?

Andrew Jessup

Jun 25, 2020, 5:54:00 PM
to [WG] Transitive Identity
Hey everyone,

Hope you're all keeping safe and well.

Going over Russel's excellent presentation from a few weeks ago on transitive identity at Netflix, one important design decision he talked about was to provide only the context of the initial caller (typically the end user) as well as the "direct" (that is, the most proximal upstream caller of the receiving service) caller. This allows for some useful simplifications, at the cost of not supporting authorization based on the identity of intermediate callers.

I'd love to hear feedback from folks in this group as to whether you feel this is an appropriate trade-off for other settings. Do you have use cases where the identity of an intermediate caller (in addition to the initial and direct caller) is necessary?

Cheers,

AJ

Ed Warnicke

Jun 25, 2020, 5:56:07 PM
to Andrew Jessup, [WG] Transitive Identity
Would that work across administratively separate organizations (i.e., organizations that aren't subdivisions of some larger organization)? Does it protect against concerns with intermediate hops in the chain?

Ed

Ram Lakshminarayanan

Jun 25, 2020, 5:57:18 PM
to Andrew Jessup, [WG] Transitive Identity
I believe that is a good balance. Practically speaking, when designing authorization, people are usually aware of only the immediate caller, and having the end user context helps with auditing or with deriving additional security/authorization decisions.

Thanks
Ram

Ram Lakshminarayanan

Jun 25, 2020, 6:01:03 PM
to Ed Warnicke, Andrew Jessup, [WG] Transitive Identity
Great question. When you cross organizational boundaries, you also have to cross a service domain. In that context, the immediate caller/proxy service is still a good benchmark. When you cross business/division boundaries, the receiving side usually won't know all the intermediate services involved, so creating authorization policy based on that context would be very difficult and impractical.
If every hop logs the immediate caller identity and the end user, you can reconstruct the chain of events for auditing or investigation.
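
As a rough sketch of that reconstruction (not something described in the thread itself): the record fields and the shared `request_id` used to correlate hops are assumptions, and the chain is assumed to be linear, starting with a hop whose immediate caller is the end user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopRecord:
    request_id: str        # hypothetical correlation ID shared by all hops
    service: str           # the service that wrote this log record
    immediate_caller: str  # the authenticated direct caller of that service
    end_user: str          # the propagated end-user identity


def reconstruct_chain(records, request_id):
    """Return the call chain for one request, starting at the end user."""
    hops = [r for r in records if r.request_id == request_id]
    if not hops:
        return []
    # Index records by who called, so each step finds the next service.
    by_caller = {r.immediate_caller: r for r in hops}
    current = hops[0].end_user
    chain = [current]
    # Follow caller -> service links; the length bound guards against cycles.
    while current in by_caller and len(chain) <= len(hops):
        current = by_caller[current].service
        chain.append(current)
    return chain
```

The records can arrive in any order, since each hop only needs to know its own immediate caller. Fan-out (one service calling several others for the same request) would need a tree rather than a list.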

I used to think one would need the entire call chain, but given how systems are deployed (across different cloud services) and owned by different BUs/divisions, it's not practical. (My 2 cents/observation.)

Thanks
Ram

Ed Warnicke

Jun 25, 2020, 7:44:53 PM
to Gilman, Evan, Ram Lakshminarayanan, Andrew Jessup, [WG] Transitive Identity
I've heard a number of similar cases in my travels from finance... others quite like it in defense, supply chain, etc.

Ed

On Thu, Jun 25, 2020 at 6:36 PM Gilman, Evan <ev...@hpe.com> wrote:
> I was presented with a use case for the complete call chain about a year back. It was "I only want to authorize requests to my payment processing service if I can assert that it was requested by an authorized user AND that it has transited our fraud detection service."
>
> I do know that this was not a contrived use case, however I'm not sure how common such cases are, nor if this kind of authorization is better solved in a different way.
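
For illustration, the quoted rule could be written as a check like the one below. This is only a sketch: the `CallContext` shape, the SPIFFE ID, and the user list are hypothetical, and it assumes the chain has already been cryptographically verified before the policy runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallContext:
    initial_caller: str  # the identity that started the request
    chain: tuple         # verified workload identities, in call order


# Hypothetical identities, for illustration only.
FRAUD_DETECTION = "spiffe://example.org/fraud-detection"
AUTHORIZED_USERS = {"alice@example.org"}


def authorize_payment(ctx: CallContext) -> bool:
    """Allow only authorized users whose request transited fraud detection."""
    return (
        ctx.initial_caller in AUTHORIZED_USERS
        and FRAUD_DETECTION in ctx.chain
    )
```

With only the initial and direct caller available, the same rule can be evaluated only when the fraud detection service is itself the direct caller of the payment service.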

Frederick Kautz

Jun 25, 2020, 7:51:00 PM
to Ed Warnicke, Gilman, Evan, Ram Lakshminarayanan, Andrew Jessup, [WG] Transitive Identity

I think being able to reason over the entire chain across multiple organizations should be a capability. Historically, it wasn’t possible to do so due to compute and infrastructure capabilities. In today’s environment, we are very close to reaching a solution. We just need to close the gap.


I'll give you two examples:


Medical Use Case

From the healthcare/medical perspective, we want to be able to continuously audit that certain properties are true on the customer/partner side. E.g., if we require an HTTP-based Data Loss Prevention solution on the partner side of the connection, preserving intermediate identity would give us the capability to audit the path of the connection and make policy decisions based on the result. We have *no* control over the partner's logging, nor access to their logs to subsequently audit. In fact, logs are considered HIPAA data, so we can’t even request to review them except under very specific circumstances. HIPAA is just one among many examples of data security requirements that preclude such audits in many industries. We also have *no* control over what the partner considers a reasonable origin for the chain that leads to the request. We may decide that the intermediate connections are not important and accordingly reduce the chain, but I would prefer not to be forced into accepting this by design.


Edge Computing


Consider a non-HTTP use case for transitive identity in edge computing. There is a very good chance that 5G will replace WiFi for enterprise wireless connectivity. Suppose we have a hospital with a device connected on 5G. That device doesn't connect directly to the Internet but to the on-premise data center. The on-premise data center connects to an edge data center operated by a company like Equinix, with managed third-party services and data stores. Once the connection has traversed those managed services, it continues on to our AWS or GCP VPC.



Across this entire chain, there are multiple organizations in the path. At the same time, we want to enforce by policy that connections do indeed traverse each of the required components, and reject connections that fail to meet our requirements.
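
A minimal sketch of such a path policy, assuming the receiver already holds a verified, ordered list of identities for the connection. The SPIFFE IDs in the usage note are made up for illustration, one trust domain per organization.

```python
def traverses_in_order(chain, required):
    """True if every ID in `required` appears in `chain` in the same relative order.

    Other hops may sit between the required ones; only presence and order
    of the required components are enforced.
    """
    remaining = iter(chain)
    # `hop in remaining` consumes the iterator up to the match, so each
    # required hop must be found after the previous one.
    return all(hop in remaining for hop in required)
```

For example, a policy of `["spiffe://hospital.example/on-prem-dc", "spiffe://edge.example/managed-services"]` would accept a chain that passes through the on-premise data center and then the edge data center, and reject one that skips either or visits them in the opposite order.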


Also, take note that this is not authenticating every single request in HTTP but is strongly authenticating long-lived connection requests. The cost of verifying the chain when establishing these connections is low compared to how long the connection remains established.


If we terminate identity at the edge of each boundary, we significantly complicate this capability in edge use cases.


Net-net, denying the option to select when, where, and how to summarize chains of identity and trust, in favor of static one-size-fits-all requirements, puts a large number of transitive identity problems out of reach.


Cheers,
Frederick


Andrew Jessup

Jun 25, 2020, 8:03:55 PM
to [WG] Transitive Identity, fred...@kautz.dev, Gilman, Evan, rlakshminarayanan, Andrew Jessup, [WG] Transitive Identity, hag...@gmail.com
Thanks for the detailed use case breakdown Frederick, this is very helpful.

> Also, take note that this is not authenticating every single request in HTTP but is strongly authenticating long living connection requests. The cost of verifying the chain when establishing these connections is low cost compared to how long the connection remains established.

Is this true for both classes of use case you outline, i.e., that the identity of the channel should be asserted based on upstream callers rather than the requests that pass through it?

I wonder how this might be accomplished in practice if a service had multiple upstream callers. Would logic need to exist to create separate (e.g., mTLS) channels to a backend based on the identity of both the service and each upstream request it was receiving?

Frederick Kautz

Jun 25, 2020, 8:12:52 PM
to Andrew Jessup, [WG] Transitive Identity, Gilman, Evan, rlakshminarayanan, hag...@gmail.com
I realized after I sent this that I wasn't clear.

I think both paths are important. E.g., gRPC is designed to perform authentication with *every* request. The same is also true with HTTP, since a service may have a high fan-out across a variety of services. We cannot assume a 1:1 relationship between a connection and an authenticated chain.

In the L2/L3 use cases, connections are long-lived, and authentication generally happens only on the initial connection and on a connection refresh (equivalent to a ping across the control plane).

Andrew Jessup

Jun 30, 2020, 8:39:38 PM
to Frederick Kautz, [WG] Transitive Identity, Gilman, Evan, rlakshminarayanan, hag...@gmail.com
It feels like we're looking at two similar but meaningfully different use cases here.

The first is capturing the authentication context of the initial caller, to allow a callee to make an authorization decision based on both the initial caller (often a human who initiated an action) and the direct caller immediately upstream of the callee. For now I'll call this use case "end user context propagation" (pithier suggestions welcome ;).

The second use case is to verify the provenance of a request through some or all of its call chain. As per Frederick's email above, a motivation might be to prove a partner has processed a message according to a predetermined policy before acting on it. Another motivation might be to prove the provenance of data produced by one system but stored in another, such as that described by Paul Mundt of adaptant.io earlier this year. For now I'll call this use case "assertion of call chain provenance".

There are at least two important practical differences I can see between these use cases:

* Identity namespace - In the case of end user context propagation, the "end user identity" is a human identity and thus not represented by a SPIFFE namespace; instead it will likely come from OAuth, Kerberos, or a similar identity framework. Some form of foreign token exchange or embedding will likely be necessary. By contrast, it is probably reasonable to assume that each software component in a request call chain has a SPIFFE ID (though they may be in different trust domains).

* Message integrity - When asserting call chain provenance it is necessary, by definition, to be able to prove that a specific message has passed through one or more endpoints. This could be done either by signing individual messages as they pass through each endpoint, or by the callee asserting the identity of every channel the message was delivered over. For end user context propagation, though, in many cases it is sufficient simply to prove the identity of the initial user and the direct workload caller, and typically authorization will be agnostic to the identity of intermediate callers.
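
To make the per-message signing option concrete, here is a minimal sketch in which each endpoint signs the message digest together with the previous hop's signature, so the callee can check both membership and order. It is an illustration under loud assumptions: HMAC with per-hop shared keys is a stand-in chosen to keep the example self-contained, whereas a real design would more likely use asymmetric signatures bound to each workload's identity, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign_hop(message: bytes, prev_sig: bytes, hop_id: str, key: bytes) -> bytes:
    """Sign the message digest, the previous hop's signature, and this hop's ID."""
    payload = hashlib.sha256(message).digest() + prev_sig + hop_id.encode()
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_chain(message: bytes, hops, keys) -> bool:
    """Verify a list of (hop_id, signature) pairs given in call order.

    `keys` maps hop_id to that hop's key (stand-in for a public key lookup).
    """
    prev_sig = b""
    for hop_id, sig in hops:
        key = keys.get(hop_id)
        if key is None:
            return False  # unknown hop: cannot be verified
        expected = sign_hop(message, prev_sig, hop_id, key)
        if not hmac.compare_digest(expected, sig):
            return False  # message altered, or hops reordered or forged
        prev_sig = sig
    return True
```

Because each signature covers the one before it, dropping or reordering a hop invalidates every later signature. The alternative named above, asserting the identity of every channel, moves this work from the message to the transport.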

My suspicion here is that the ideal solutions to these two use cases will look quite different from each other, and that it would be prudent to treat the design of how we support them in SPIFFE as distinct, though complementary, efforts.

Would love to hear from others as to whether this approach makes sense.

Ed Warnicke

Jul 7, 2020, 8:32:06 PM
to Andrew Jessup, Frederick Kautz, [WG] Transitive Identity, Gilman, Evan, rlakshminarayanan
Andrew,

I like the direction you are taking here in terms of breaking down the problem. I think your breakdown into "end user context propagation" and "assertion of call chain provenance" is a constructive way to further the conversation. I also like your breakdown of aspects of the problem around identity namespace and message integrity; it provides a productive framework for discussion.

On the "identity namespace" aspect, does it seem to you that "end user context propagation" is "how do I handle crossing a single identity namespace boundary" (OAuth etc. to SPIFFE), while the "chain of provenance" case is more "how do I handle crossing n identity namespace boundaries"?

Ed

Josh Kline

Jul 7, 2020, 9:16:53 PM
to Andrew Jessup, [WG] Transitive Identity
> The first is capturing the authentication context of the initial caller, to allow a callee to make an authorization decision based on both the initial caller (often a human who initiated an action) and the direct caller immediately upstream of the callee. For now I'll call this use case "end user context propagation" (pithier suggestions welcome ;).

We use "initial actor identity" to describe this use case.


> provide only the context of the initial caller (typically the end user) as well as the "direct" (that is, the most proximal upstream caller of the receiving service) caller.
>  Do you have use cases where the identity of an intermediate caller (in addition to the initial and direct caller) is necessary?

From time to time we get someone who thinks they want full-call-chain identity for authorization based on intermediate actors.
However, practically speaking we've never gotten a concrete use case that requires it.
The use cases presented in this thread are more advanced than our current situation at Uber.
We're still implementing immediate actor verification and initial actor verification by including two signed JSON Web Tokens in each request.

>  the "end user identity" is a human identity and thus not represented by a SPIFFE namespace

We've assigned Uber personnel a SPIFFE ID in a separate trust domain from workloads.
JWTs representing personnel are signed by a private key held by our single-sign-on system, and contain a kid header corresponding to this key. This key is not a certificate and is not included in the SPIRE-distributed trust bundle. Verifiers fetch the public key from a JWKS at a well-known JKU.

JWTs representing workloads are signed by the workload's SPIRE-assigned private key and contain an x5c header with their SPIRE-assigned certificate chain.

We try very hard not to conflate actor position (immediate vs initial) with actor type (e.g., workload vs personnel vs customer). The initiator of a call chain may be a workload, or a human. The immediate actor may be a workload, or a human.
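
As an illustration of how a verifier might pick the key source from the token header alone, here is a small sketch. It is not Uber's implementation: the function names are hypothetical, it only reads the unverified header, and signature and certificate chain validation are deliberately left out.

```python
import base64
import json


def jwt_header(token: str) -> dict:
    """Decode the (unverified) header of a compact-serialized JWT."""
    header_b64 = token.split(".")[0]
    # Compact JWTs strip base64url padding; restore it before decoding.
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def key_source(token: str) -> str:
    """Report where the verification key should come from for this token."""
    header = jwt_header(token)
    if "x5c" in header:
        return "x5c"   # certificate chain carried in the token itself
    if "kid" in header:
        return "jwks"  # look the key ID up in a JWKS document
    raise ValueError("token has neither an x5c nor a kid header")
```

The same function applies to both the immediate-actor and the initial-actor token, which keeps actor position separate from actor type: the header decides how the key is found, not the slot the token arrived in.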


>  I  think your breakdown of "end user context propagation" and "assertion of call chain provenance" are a constructive way to further the conversation.

I agree. It's a clear way to highlight the capabilities of any given proposal.

-Josh Kline
Identity and Access Management team, Engineering Security, Uber.