Datasources logs

АМИР З

Aug 23, 2022, 10:00:12 AM
to Druid User
Hi all.
I have a question.
Where can I find information about created datasources, beyond what sys.tasks provides? I need a more detailed record that shows which account created each datasource.

Thank you in advance!

Mark Herrera

Aug 23, 2022, 3:42:37 PM
to Druid User
Hi,

Something like request logging, but to see who created a specific datasource?

Best,

Mark

АМИР З

Aug 24, 2022, 2:13:56 AM
to Druid User
Hi, Mark. Thanks for your attention.
We have a web interface that uploads files to Druid, and we want to record who uploaded each file, when, and from which IP address.
I found some audit information in sys.tasks and in the logs, but it's not enough.

Wednesday, August 24, 2022 at 00:42:37 UTC+5, mark.h...@imply.io:

Sergio Ferragut

Aug 24, 2022, 9:33:56 PM
to Druid User
Hi amiros,

I'm not sure whether the contents of these tables cover your needs, but I just learned here: https://github.com/apache/druid/issues/5859
that the druid_audit and druid_tasklogs tables exist and might be helpful for you.
I don't think these tables are exposed under "sys" in the Druid UI; you would have to query them directly in the Metadata Store database.
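
A minimal sketch of looking at those tables directly, assuming you already have a Python DB-API connection to the Metadata Store database. The helper name `dump_table` is hypothetical, and it selects every column rather than assuming a particular schema. An in-memory SQLite table with made-up columns stands in for the real store here.

```python
import sqlite3

# Table names taken from the linked issue; anything else is rejected.
ALLOWED_TABLES = {"druid_audit", "druid_tasklogs"}

def dump_table(conn, table, limit=20):
    """Return up to `limit` rows of `table` as dicts keyed by column name."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    cur = conn.cursor()
    # The table name is checked against the allow-list and the limit is cast
    # to int, so nothing user-controlled is interpolated into the SQL.
    # Note: LIMIT works on MySQL, PostgreSQL and SQLite; Derby needs
    # FETCH FIRST n ROWS ONLY instead.
    cur.execute(f"SELECT * FROM {table} LIMIT {int(limit)}")
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]

# Stand-in database: these columns are illustrative, not Druid's real schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE druid_audit (id INTEGER, author TEXT, payload TEXT)")
demo.execute("INSERT INTO druid_audit VALUES (1, 'alice', '{}')")
rows = dump_table(demo, "druid_audit")
```

Against a real deployment you would pass a connection from the driver for your Metadata Store and inspect the returned keys to see which fields are actually recorded.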

Let us know if this helps. If not, let us know anyway: it would be useful to capture missing audit requirements for the Apache Druid project.

Thanks,
Sergio 

Mark Herrera

Aug 25, 2022, 10:19:03 AM
to Druid User
Thanks, Sergio!

Adding to what Sergio said, another colleague offered the following:
  • I wonder if they would need to actually look at the web logs for the API
  • Which for that matter makes me wonder if it’s logged in the Overlord log somewhere when someone submits an ingestion task?
If you have time and are able, please share your results. I reached out to several people about this question, and there's some interest in documenting this.

АМИР З

Aug 25, 2022, 2:13:01 PM
to Druid User
Thanks, Sergio and Mark!

We decided to do something different. We don't have time to dig into the Druid configuration without being sure we'll find what we need, so we'll create our own audit tables and write the data we need into them when a file is uploaded.
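
For anyone taking the same route, here is a minimal sketch of such an application-side audit table, assuming the upload handler already knows the user, the client IP, and the target datasource. The table name `upload_audit`, its columns, and the function `record_upload` are all hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per uploaded file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS upload_audit (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    username    TEXT NOT NULL,
    source_ip   TEXT NOT NULL,
    datasource  TEXT NOT NULL,
    filename    TEXT NOT NULL,
    uploaded_at TEXT NOT NULL
)
"""

def record_upload(conn, username, source_ip, datasource, filename):
    """Insert one audit row and return the UTC timestamp that was stored."""
    uploaded_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(SCHEMA)
        conn.execute(
            "INSERT INTO upload_audit "
            "(username, source_ip, datasource, filename, uploaded_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (username, source_ip, datasource, filename, uploaded_at),
        )
    return uploaded_at

# Example call from the upload handler (values are illustrative).
conn = sqlite3.connect(":memory:")
stamp = record_upload(conn, "alice", "203.0.113.7", "web_uploads", "events.csv")
```

Writing the row in the same handler that submits the file to Druid keeps the who, when, and where together, independent of what Druid itself logs.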

Thursday, August 25, 2022 at 19:19:03 UTC+5, mark.h...@imply.io:

Sergio Ferragut

Aug 25, 2022, 4:40:26 PM
to Druid User
That sounds good. It will be interesting for you to flesh out the requirements and implementation of that, and perhaps then look at contributing it back to the Apache Druid project.

One thought: the sys.tasks table covers some of your requirements but not all, so perhaps the schema of that table could be extended, and the code that feeds it adjusted, to add the missing columns.
It currently covers the "what" in the `datasource` field and the "when" in the `created_time` field.
It seems like you need `username` and `source_ipaddr` to complete the picture.
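
To see what sys.tasks already records for a datasource, a sketch along these lines could be used against the Druid SQL HTTP endpoint (`/druid/v2/sql`). The base URL and the function names are assumptions for illustration, and any authentication your cluster requires is left out.

```python
import json
import urllib.request

def build_tasks_request(base_url, datasource):
    """Build a POST request that asks Druid SQL for the tasks of one datasource."""
    body = {
        "query": (
            "SELECT task_id, datasource, created_time, status "
            "FROM sys.tasks WHERE datasource = ? "
            "ORDER BY created_time DESC"
        ),
        # The datasource is bound as a query parameter, not concatenated.
        "parameters": [{"type": "VARCHAR", "value": datasource}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/sql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def fetch_tasks(base_url, datasource):
    """Send the request and return the decoded JSON rows (needs a live cluster)."""
    with urllib.request.urlopen(build_tasks_request(base_url, datasource)) as resp:
        return json.load(resp)

# Assumed address of a Router or Broker; adjust for your deployment.
request = build_tasks_request("http://localhost:8888/", "web_uploads")
```

The result shows the "what" and "when" per task; the user and source IP would still have to come from somewhere else.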

I am no expert in the code, but it does seem like the Overlord is the one writing to that table, and it should be extendable if the missing info is readily available.