Multi-tenancy support

Otis Gospodnetic

Sep 28, 2017, 1:25:54 PM

to Jaeger Tracing

Hi,

Over in https://github.com/jaegertracing/jaeger/issues/427 we are working on adding auth-z support to Jaeger. What we would really like to do is add multi-tenancy support. For us this approach in #427 actually provides everything we need for multi-tenancy - tenant info is passed to the Agent via JAEGER_TAGS, the Agent sends the tenant info to the Collector along with the spans, the Collector validates the tenant auth info against one of the provided data stores, and the tenant info is then stored right alongside the spans. Once there, the UI (our UI) can use the stored tenant info to retrieve only the spans for the logged-in user. Jaeger's UI, however, has not been touched and is not aware of the tenant data being stored with spans.

If something else is needed to support multi-tenancy in Jaeger, I thought it would be good to bring up multi-tenancy here so that interested parties can get involved and move this forward.

Thanks,

Otis

--

Sematext Cloud:

Transaction Tracing - Infrastructure Monitoring - Log Management

http://sematext.com/

Yuri Shkuro

Sep 28, 2017, 1:52:09 PM

to Jaeger Tracing

> For us this approach in #427 actually provides everything we need for multi-tenancy

This sounds surprising to me - all it's doing is protecting the backend from unauthorized span submissions, but anyone can still read anything from the query-service, so there is no tenant separation. That's what I meant by my comment on #427 - I think you need to build tenancy understanding in the backend first before worrying about auth-z. Given the limitations on search capabilities in Cassandra, I could see tenancy being a part of the primary key in all tables (in ES you can just add an extra tag filter to all queries).
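To make the "tenancy as part of the primary key" idea concrete, here is a minimal sketch. It is a toy in-memory store, not Jaeger's Cassandra schema; the class name `TenantSpanStore` and its methods are made up for illustration. The only point is that every read has to name a tenant, so one tenant cannot look up another tenant's trace even when trace IDs collide.

```python
from collections import defaultdict


class TenantSpanStore:
    """Toy in-memory store whose primary key is (tenant_id, trace_id).

    Hypothetical illustration only; this is not Jaeger's storage API.
    """

    def __init__(self):
        self._spans = defaultdict(list)

    def write(self, tenant_id, trace_id, span):
        # The tenant is the leading part of the key for every write.
        self._spans[(tenant_id, trace_id)].append(span)

    def read_trace(self, tenant_id, trace_id):
        # A lookup must name the tenant, so it cannot cross tenant boundaries.
        return list(self._spans.get((tenant_id, trace_id), []))


store = TenantSpanStore()
store.write("tenant1", "abc", {"operation": "GET /cart"})
store.write("tenant2", "abc", {"operation": "GET /orders"})
```

With a tag-based approach (the ES case) the equivalent would be adding a tenant filter to every query instead of changing the key.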

Juraci Paixão Kröhling

Sep 29, 2017, 2:21:54 AM

to jaeger-...@googlegroups.com

On 09/28/2017 07:25 PM, Otis Gospodnetic wrote:
> If something else is needed to support multi-tenancy in Jaeger, I
> thought it would be good to bring up multi-tenancy here so that
> interested parties can get involved and move this forward.

Besides Yuri's comments, I'd like to add that we don't know what else is
needed, as we don't have multiple use cases for multi-tenancy yet.

We have a valid one, but I'm a bit uneasy that it might not be enough to
move forward.

Questions we might need to answer:

- How to store the tenant information on the backing storage?

Tags would be one solution, but primary keys would be better. Or even
different key spaces? We have discussed this a bit already and I'm
convinced that using primary keys would work best in Cassandra, but
let's make this decision official :)

- How to extract the tenant information from the request? Is this going
to be part of the payload, or via HTTP header? How to prevent tenant A
from impersonating tenant B?

My take, as suggested before, is to use JWT: some server is responsible
for the auth and issues a JWT. This JWT is sent from the agent to the
collector (or tracer client to collector), and the collector looks into
a specific field on this JWT (the field to look for can be customized).
As JWTs are signed, the collector verifies the signature based on the
cert that is specified via configuration (CLI, file, env var, ...). The
advantage of JWTs is that they don't require a backend to store data, as
we can trust the data within the token, should it match the issuer's
signature.

- Juca.

Bruce Peterson

Jun 19, 2018, 6:54:18 PM

to Jaeger Tracing

I hope this discussion on multi-tenancy is still alive. At AT&T our main use case is a little different. For our primary use case, span authentication is not needed. For our larger tenants, we will not share the database. This should make a multi-tenant solution much easier. Let me outline our primary use case and then describe what we have done for a POC. I will call this case Shared Services for Large Tenants.

For this use case we have dozens of Tenants sharing many (100-1000) common Services. The Tenants (top-level service requestors and root span originators) are very large (high-volume) products/applications, with the tenancy determined via application code. Each Tenant will require its own collector/DB and may even require multiple collector/DBs based on the location of the serving application instance. The tenancy (for all downstream services) is attached to each service request and therefore must be implemented via distributed context.

The Services themselves must not need to be aware of the baggage tag names or to explicitly extract the tenancy information from the context. All spans, including those generated by instrumented 3rd-party libraries and frameworks, must automatically include the tenant information as span tags. Each Service will share its Jaeger Agent(s) between all of that service's Tenants. The Jaeger Agent will use the tenancy information in the spans to send those spans to the correct Tenant collector/DB. The Agent's config file will contain the data to associate tenant information with a collector's host IP and port.

In our POC we used specially named baggage tags that would automatically "self-promote" to become span tags. These tags travel with the context across all services and, after self-promoting, reach each service's Jaeger Agent as a tag in every span. For example, any baggage tag that has a key starting with a double underscore, as in “__tenant-id”, will be automatically promoted to be a span tag in each (sub) Service. In order to self-promote, the client libraries must be modified to create the span tag from the baggage tag when the opentracing-span is being converted to a thrift-span. We modified and tested the Go, Java, Node.js and Python client libraries. Security and an API to hide the implementation could easily be added.
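For readers who want to picture the self-promotion step, a minimal sketch follows. It is not the code from the modified client libraries; `promote_baggage` and `PROMOTE_PREFIX` are hypothetical names, and spans and baggage are modeled as plain dicts.

```python
PROMOTE_PREFIX = "__"  # assumption: the double-underscore convention described above


def promote_baggage(span_tags: dict, baggage: dict, prefix: str = PROMOTE_PREFIX) -> dict:
    """Return the span tags plus every baggage item whose key starts with the prefix.

    This mimics what would happen when a span is converted for reporting;
    baggage items without the prefix stay baggage only.
    """
    promoted = dict(span_tags)  # copy, so the caller's tags are not mutated
    for key, value in baggage.items():
        if key.startswith(prefix):
            promoted[key] = value
    return promoted


tags = promote_baggage(
    {"http.method": "GET"},
    {"__tenant-id": "tenant1", "session": "xyz"},
)
```

Here `tags` ends up with `http.method` and `__tenant-id`, while `session` is left out because it lacks the prefix.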

The Jaeger Agent was modified to hold a map of Uber TChannel objects using a tenant name as the key. The Jaeger Agent will "route" the spans to different collector/DBs based on the value of a specified span tag. The configuration was modified to support this functionality. Example:

"--collector.host-port=default@192.168.99.100:30418,tenant1@192.168.99.101:30021,tenant2@192.168.99.101:30063"

"--collector.route-tag=__tenant-id"

This instructs the Jaeger Agent as follows: when it sees a span tag named “__tenant-id” with a value of “tenant1”, it uses “192.168.99.101:30021” as the host:port address of the collector/DB.
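The routing rule can be sketched in a few lines. This is an illustration of the lookup only, not the modified Agent code: the helper names are hypothetical, the flag value is assumed to be a comma-separated list of `tenant@host:port` entries, and falling back to the `default` entry for unknown or missing tenants is an assumption.

```python
def parse_collector_routes(flag_value: str) -> dict:
    """Parse 'tenant@host:port,tenant@host:port,...' into {tenant: 'host:port'}."""
    routes = {}
    for entry in flag_value.split(","):
        tenant, _, host_port = entry.strip().partition("@")
        if not tenant or not host_port:
            raise ValueError(f"expected tenant@host:port, got {entry!r}")
        routes[tenant] = host_port
    return routes


def pick_collector(span_tags: dict, routes: dict,
                   route_tag: str = "__tenant-id", default: str = "default") -> str:
    """Choose the collector address from the value of the routing tag.

    Assumption: spans with no routing tag, or with an unknown tenant,
    go to the 'default' entry, which must therefore be present.
    """
    tenant = span_tags.get(route_tag, default)
    return routes.get(tenant, routes[default])


routes = parse_collector_routes(
    "default@192.168.99.100:30418,"
    "tenant1@192.168.99.101:30021,"
    "tenant2@192.168.99.101:30063"
)
```

In the real Agent the map values would be connections (the TChannel objects) rather than address strings.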

We also created test clients in Java, Go, Python and Node.js. HotROD was also modified to be multi-tenant. All with great success. At the end we concluded that perhaps the tenant tag, or what the agent called the collector.route-tag, should be a fixed name. It would get very confusing to have each tenant specify its own different tenant tag to common services.

From AT&T's perspective, we would need this base functionality to begin to use Jaeger. How can we best move this forward?

Thanks,

Bruce Peterson

SRE AT&T

Yuri Shkuro

Jun 19, 2018, 11:46:02 PM

to Jaeger Tracing

Overall it's a reasonable approach. We need to consider if it can be generalized to other forms of multi-tenancy, e.g. cases where it is ok to store data from different tenants in the same storage, either under different keyspaces or with the tenant ID as part of the PK. I have a feeling that partitioning tenants' data in a single storage / namespace via PK is the most general use case, from which all other deployments can be derived by forking the data flow further and further upstream from the storage. Having said that, we don't need to implement the most generic use case either. What you propose is the second most extreme case of forking the data flow at the earliest possible point. The most extreme case would be to fork traffic right in the clients, to multiple agents. The downside of forking tenants into separate backends is that you'd need to run different UIs for different tenants. The generic case of partitioning data in the DB by tenant allows a single UI cluster to serve all tenants. There will be a need for auth-n & auth-z layers to tell the UI or query service which tenant IDs it is allowed to show to the given user.

The idea of baggage keys self-promoting themselves to span tags is useful. I would go with having the tracers configured with a default list of keys that self-promote, with the possibility of overriding it. I don't think relying on a double-underscore prefix is a good choice. I am even surprised it worked for you, because underscores don't preserve well in HTTP header keys, which is how Jaeger encodes baggage by default (i.e. uberctx__tenant-id - bad choice of key).
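A small sketch of what "a default list of keys that self-promote, with the possibility of overriding it" could look like. Nothing here is an existing tracer option; `PromotionConfig`, `DEFAULT_PROMOTED_KEYS` and the key `tenant-id` are placeholders for illustration.

```python
DEFAULT_PROMOTED_KEYS = ("tenant-id",)  # hypothetical default list


class PromotionConfig:
    """Holds the set of baggage keys a tracer would copy into span tags."""

    def __init__(self, promoted_keys=None):
        # None means "use the defaults"; any explicit list overrides them.
        keys = DEFAULT_PROMOTED_KEYS if promoted_keys is None else promoted_keys
        self.promoted_keys = frozenset(keys)

    def tags_from_baggage(self, baggage: dict) -> dict:
        # Exact key match against the allow-list, no naming convention needed.
        return {k: v for k, v in baggage.items() if k in self.promoted_keys}


default_cfg = PromotionConfig()
custom_cfg = PromotionConfig(promoted_keys=["tenant-id", "region"])
baggage = {"tenant-id": "tenant1", "region": "us-east", "session": "xyz"}
```

An explicit list avoids depending on a key prefix, so the keys can be chosen to survive header encoding.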

It might be easier to start a google doc and jam on it first to figure out the exact form of the solution and the work items / build order.

Juraci Paixão Kröhling

Jun 20, 2018, 4:52:29 AM

to Jaeger Tracing

Bruce,

That's a great write up, thanks for sharing! See my comments inline.

On Tue, 2018-06-19 at 15:54 -0700, Bruce Peterson wrote:
> In our POC we used specially named baggage tags that would
> automatically "self promote" to become span tags. These tags will
> travel with context across to all services and ,after self promoting,
> to each service's jaeger agent as a tag in every span. For example,
> any baggage tag that has a key starting with a double underscore, as
> in “__tenant-id”, will be automatically promoted to be a span tag in
> each (sub) Service. In order to self-promote, the client libraries
> must be modified to create the span tag from the baggage tag when the
> opentracing-span is being converted to a thrift-span. We modified and
> tested the go, java, node.js and python client libraries. Security
> and an API to hide the implementation could easily be added.
>

I would split this into its own feature, abstract from the tenancy
discussion.

> The Jaeger Agent was modified to hold a map of uber tchannel objects
> using a tenant name as the key. The Jaeger Agent will "route" the
> spans to different collector/DB's based on the value of a specified
> span tag. The configuration was modified to support this
> functionality. Example:
>
> "--collector.host-port=default@192.168.99.100:30418,tenant1@192.168.99.101:30021,tenant2@192.168.99.101:30063"
> "--collector.route-tag=__tenant-id"
>

Have you tried building your own "span router" component? Something
like:

Client -> Agent -> Span Router -> Collector tenant 1

The Span Router would act just like you described above: get a list of
collectors and choose the appropriate one according to a tag within the
span.

Would you be willing to give it a try, if you haven't already?

- Juca.