Nested datasources


АМИР З

Sep 21, 2022, 5:03:08 AM
to Druid User
Hi.
We are building a data repository where users will upload various CSV files. The drawback is that, over time, this creates a huge number of datasources that we will have to work with. Our idea is to give each user a single datasource to which new files are added, i.e. the datasource will be continually extended with new records and columns. We are also considering a variant with something like nested tables.
I would like to get advice from those who have implemented something like this.
Thanks in advance.

Mark Herrera

Sep 26, 2022, 4:03:55 PM
to Druid User
Hi,

This might be a good starting point:

https://druid.apache.org/docs/latest/operations/security-user-auth.html

By any chance, is this related to your solution to your datasource log question from earlier this year?

Best,

Mark

АМИР З

Sep 30, 2022, 4:19:07 AM
to Druid User
Hi. Thanks for the reply.

On Tuesday, September 27, 2022 at 01:03:55 UTC+5, mark.h...@imply.io wrote:
Unfortunately that's not exactly what I need. 
I need a way to store the data from the files uploaded by the users. So far my idea is to create one datasource per user, in which each column holds a JSON array with the data from one uploaded file. It looks something like this:

                file_name1  file_name2  file_name3
ingestion_date  json_array  json_array
ingestion_date  json_array  json_array
ingestion_date  json_array  json_array  json_array

However, this is inconvenient in terms of data selection.
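
For comparison, here is a minimal sketch of one possible alternative to the array-per-file layout: flattening each uploaded CSV into one record per cell before ingestion, so that selection becomes ordinary filtering on plain columns. This is only an illustration under assumptions, not something proposed in this thread. The function name `flatten_csv` and the field names `user_id`, `file_name`, `row_index`, `column_name`, and `value` are all hypothetical; only `__time` is Druid's own timestamp column name.

```python
import csv
import io


def flatten_csv(user_id, file_name, csv_text, ingested_at):
    """Turn one uploaded CSV into a list of flat records, one per cell.

    All field names except "__time" are hypothetical and chosen for
    illustration only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row_index, row in enumerate(reader):
        for column_name, value in row.items():
            records.append({
                "__time": ingested_at,       # ingestion timestamp
                "user_id": user_id,          # who uploaded the file
                "file_name": file_name,      # which file the cell came from
                "row_index": row_index,      # position of the row in the file
                "column_name": column_name,  # CSV header of the cell
                "value": value,              # cell content, kept as a string
            })
    return records


# Hypothetical sample upload.
sample = "city,temp\nOslo,4\nRome,18\n"
records = flatten_csv("user_42", "weather.csv", sample, "2022-09-30T00:00:00Z")
```

The trade-off is more rows and string-typed values in exchange for a fixed schema that does not change when a user uploads a file with new columns.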



By any chance, is this related to your solution to your datasource log question from earlier this year?
No, this question is not related to my previous one.

Mark Herrera

Oct 3, 2022, 6:02:39 PM
to Druid User
I took the liberty of posting your question to Slack. One of the founders asked for some clarification and also suggested the following:
  • I'm not sure what "inconvenient in terms of data selection" means here — it would be helpful to know what kind of selection you are wanting to do?
  • the desire here sounds similar-ish to the desires people have when implementing multi-tenant workloads, so this doc may be useful: https://druid.apache.org/docs/latest/querying/multitenancy.html
    • it has some info about how to think about whether to use one giant datasource, or split it up
    • the doc highlights how to consider what kind of data management and retrieval operations may be necessary, and how that influences choice of design
Let us know if any of this helps with your use case.
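
To make the two designs from that doc concrete, here is a rough sketch of how a query for one user's data might differ between them. It is an illustration under assumptions, not a recommendation: the function `build_sql_request`, the datasource names `uploads` and `uploads_<user_id>`, and the `user_id` column are hypothetical. The returned dictionary follows the request body shape of Druid's SQL HTTP API (a `query` string plus an optional `parameters` list); nothing is sent over the network here.

```python
def build_sql_request(user_id, mode, limit=100):
    """Build a Druid SQL request body for one user's data.

    mode="shared":     one big datasource, filtered by a user_id column.
    mode="per_tenant": one datasource per user, named uploads_<user_id>.
    Datasource and column names are hypothetical.
    """
    limit = int(limit)
    if mode == "shared":
        return {
            # The user id is passed as a bound parameter, not spliced in.
            "query": f'SELECT * FROM "uploads" WHERE "user_id" = ? LIMIT {limit}',
            "parameters": [{"type": "VARCHAR", "value": user_id}],
        }
    if mode == "per_tenant":
        datasource = f"uploads_{user_id}"
        # Identifiers cannot be bound as parameters, so validate strictly.
        if not datasource.replace("_", "").isalnum():
            raise ValueError(f"unsafe datasource name: {datasource!r}")
        return {
            "query": f'SELECT * FROM "{datasource}" LIMIT {limit}',
            "parameters": [],
        }
    raise ValueError(f"unknown mode: {mode!r}")


shared_request = build_sql_request("user_42", "shared")
tenant_request = build_sql_request("user_42", "per_tenant")
```

With a shared datasource every query must carry the user filter, while with a datasource per tenant the isolation comes from the datasource name, at the cost of having many datasources to manage.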

Best,

Mark

АМИР З

Oct 6, 2022, 5:08:32 AM
to Druid User
Hi.

On Tuesday, October 4, 2022 at 03:02:39 UTC+5, mark.h...@imply.io wrote:
I took the liberty of posting your question to Slack. One of the founders asked for some clarification and also suggested the following:
  • I'm not sure what "inconvenient in terms of data selection" means here — it would be helpful to know what kind of selection you are wanting to do?
I meant that SQL queries against datasources are inconvenient to write when nested columns are used.
Thank you. Yes, this article came in handy for describing to our customer the options we offer for creating datasources.
The customer needs to choose between shared datasources and datasource-per-tenant.