Hey Jakub,
You can have a Druid schema that doesn't explicitly specify dimensions, and in that case any field not already specified as a timestamp or metric will be ingested as a dimension. This feature can be helpful in a case like yours.
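As a rough sketch of what that looks like: the `dimensionsSpec` in the ingestion spec leaves its `dimensions` list empty. The snippet below builds that fragment and mimics the rule described above on one sample row. The column names (`ts`, `tenant`, `country`, `bytes`) and both helper functions are made up for illustration; this is not Druid's own code, just the selection rule written out.

```python
import json


def schemaless_dimensions_spec(exclusions=()):
    """Build a dimensionsSpec fragment that lists no dimensions explicitly.

    An empty "dimensions" list asks Druid to discover dimensions from the data.
    "dimensionExclusions" names fields that should never become dimensions.
    """
    return {
        "dimensions": [],
        "dimensionExclusions": list(exclusions),
    }


def discovered_dimensions(row, timestamp_column, metric_inputs, exclusions=()):
    """Illustrate the rule: every field that is not the timestamp, not a
    metric input, and not excluded is treated as a dimension.
    (Hypothetical helper, only a model of the behaviour.)
    """
    skip = {timestamp_column, *metric_inputs, *exclusions}
    return sorted(name for name in row if name not in skip)


if __name__ == "__main__":
    # Hypothetical event: "bytes" feeds a metric, "ts" is the timestamp.
    row = {"ts": "2015-01-01T00:00:00Z", "tenant": "a", "country": "PL", "bytes": 10}
    print(json.dumps(schemaless_dimensions_spec(), indent=2))
    print(discovered_dimensions(row, "ts", {"bytes"}))
```

With this setup each tenant can send different fields without a schema change, which is why it pairs well with either datasource layout.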
When deciding whether to use shared datasources or a datasource per tenant, the considerations are usually:
Pros of datasources per tenant:
- Each datasource can have its own schema, its own backfills, its own partitioning rules, and its own data load rules
- Queries can be faster since there will be fewer segments to examine for a typical tenant's query
- You get the most flexibility
Pros of shared datasources (each point below is a per-datasource cost that sharing avoids):
- Each datasource requires its own JVMs for realtime indexing
- Each datasource requires its own YARN resources for Hadoop batch jobs
- Each datasource requires its own segment files on disk
- For these reasons it can be wasteful to have a very large number of small datasources