m3db deployment questions


Aimilios Tsouvelekakis

May 3, 2021, 5:09:50 AM
to M3

Good morning,

I am looking at m3db as a replacement for some high-cardinality InfluxDB databases, and I have a few questions after going through the documentation:

1) As far as I can see, the InfluxDB protocol is supported but not documented. I assume that writing with the InfluxDB protocol should be enough?
2) m3db supports one database per process. This means that there are no schemas, and the only things inside the database are namespaces (comparable to tables, to use a classic RDBMS analogy). Is that correct?
3) Does m3db support multiple user logins on the same database? I could not find any information on that. If yes, can they be managed through a CLI?
4) If I run a single-node m3db, do I need the rest of the components, like m3coordinator / m3aggregator / m3query? For the last one I think not, because I believe I can visualize the data directly in Grafana as a Prometheus data source. Am I right?
For the first two components, I believe they are useful only in cluster mode.
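
For question 1, a minimal sketch of what an InfluxDB line-protocol write to M3 could look like. The URL (coordinator on port 7201, path /api/v1/influxdb/write) is an assumption to verify against your M3 version, and the measurement, tag and field names are made up. The request is only built here, not sent.

```python
from urllib.request import Request

# Assumed endpoint: coordinator on port 7201 with the InfluxDB write path.
# Check this against your M3 version before relying on it.
M3_INFLUX_WRITE_URL = "http://localhost:7201/api/v1/influxdb/write"


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as InfluxDB line protocol (escaping is not handled)."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part} {timestamp_ns}"


def build_write_request(lines, url=M3_INFLUX_WRITE_URL):
    """Build (but do not send) a POST carrying newline-separated points."""
    body = "\n".join(lines).encode("utf-8")
    return Request(url, data=body, method="POST")


# Hypothetical sample point.
line = to_line_protocol("cpu", {"host": "bm01"}, {"usage_idle": 93.5}, 1620032990000000000)
request = build_write_request([line])
# Sending would be: urllib.request.urlopen(request)
```

If telegraf is already writing to InfluxDB, pointing its InfluxDB output at the coordinator instead would be the equivalent of this request.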

I may come back with a few more questions if I need to understand more.

Thank you,
Aimilios

Asaf Mesika

May 3, 2021, 6:12:06 AM
to Aimilios Tsouvelekakis, M3
Hi Aimilios,

I can only answer the questions I have answers to :)

2. M3DB has the notion of namespaces, which let you separate your data into different "logical" units, so to speak, BUT if you wish to work with aggregations, you can't really use them that way. A cluster works with a set of namespaces, each representing a different (resolution, retention) pair. Normally the namespace named "default" holds the raw data, and the rest are at your disposal, e.g. (1min, 7d), (10min, 5months), etc.
4. I'm not entirely sure it's possible to run a single node of m3db and still be able to write, since by default it requires a replication factor of 3. m3query contains the actual query engine; you can either run it or run the coordinator, which has m3query embedded in it. m3aggregator is the process doing the actual aggregation. You can also do without it, since it is embedded in the coordinator. For a production-grade setup, run them all as separate processes, since each one consumes memory, has its own resilience needs, etc.
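
To picture the (resolution, retention) pairing from point 2, here is a simplified sketch of how a query's lookback could map to a namespace. It is an illustration only, not M3's actual selection logic. The namespace names and the 48h retention for raw data are made up, and 5 months is approximated as 150 days.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Namespace:
    name: str
    resolution: timedelta  # zero means raw, unaggregated samples
    retention: timedelta


# Hypothetical cluster layout; names and the raw retention are invented.
NAMESPACES = [
    Namespace("default", timedelta(0), timedelta(hours=48)),
    Namespace("agg_1m_7d", timedelta(minutes=1), timedelta(days=7)),
    Namespace("agg_10m_150d", timedelta(minutes=10), timedelta(days=150)),
]


def pick_namespace(lookback, namespaces=NAMESPACES):
    """Return the finest-resolution namespace whose retention covers lookback."""
    covering = [ns for ns in namespaces if ns.retention >= lookback]
    if not covering:
        raise ValueError("no namespace retains data that far back")
    return min(covering, key=lambda ns: ns.resolution)
```

With this layout, a one-hour lookback lands on the raw "default" namespace, while a 30-day lookback can only be served by the 10-minute one. The namespaces differ by resolution and retention, not by who owns the data.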



Aimilios Tsouvelekakis

May 3, 2021, 8:24:10 AM
to M3
Hi Asaf,

First of all, thank you for your answers. Let me elaborate a bit more.

About point 2: It is reasonable (at least in my opinion) not to be able to aggregate across different namespaces. Let's say I have an InfluxDB database with CPU measurements and RAM measurements in different tables; this means that I need two different namespaces in m3db. I want to aggregate within the CPU measurements and within the RAM measurements, not across a mix of them, unless I am getting the idea wrong. Extending my point: if I want to merge two InfluxDB instances, where I have, let's say, CPU and RAM measurements for bare metal in one and for VMs in the other, I cannot put them in the same m3db database, because I cannot give different people access to different data. That is the reason why I asked about schemas. As you describe it, a namespace (like a table) mostly represents a logical unit: I have a cpu namespace, a ram namespace, a network namespace.

About point 4: I used the documentation to run m3db in single-node mode and made a couple of writes with the HTTP API, which should not be used in production. While I like the idea of m3db, I see that it needs quite a few resources. In my case:
- A node (either a VM or a k8s pod) where telegraf plus m3coordinator run
- 3 nodes (either VMs or k8s pods) for m3db in cluster mode

And I am not sure where exactly m3query and m3aggregator, if needed, should run. Consequently, to replace one InfluxDB database running on one node, I need four nodes. Am I missing something?

Thank you,
Aimilios

Asaf Mesika

May 5, 2021, 6:26:01 AM
to Aimilios Tsouvelekakis, M3
Let's go over this point by point, Aimilios.

Point 2 - If I understand your requirement correctly, you would like to control who can access the data, so you want to split it into logical units and be able to say that this team can access this data and the other team can access that data. This is perfectly valid, yet M3DB, AFAIK (I'm not a maintainer), doesn't yet have access control of any kind: no users, no roles/permissions, as opposed to InfluxDB. I believe this is something the community can add, but someone needs to take it up. Regarding the second requirement, having two logical data units aggregated separately: as I mentioned before, it's not currently supported.
I can suggest a workaround:
a) Add a label, like team=A, to distinguish between the logical data units, so that all metrics for team A carry the additional label team=A. M3DB can apply this label filter for you on any query if you specify the filter in an HTTP header when you query, which saves you from parsing the query to add it.
b) Add a service that sits in front of M3Query and is in charge of authentication and authorization. Its job would be to add the filter on the label you use, such as "team".
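
A rough sketch of what the service in (b) might do for each incoming query. It assumes the header from (a) is M3-Restrict-By-Tags-JSON with a "match" list, as I understand it from the M3 query API docs, and that the coordinator's query endpoint is at localhost:7201. Please verify both for your version. The token table is a hypothetical stand-in for real authentication.

```python
import json
from urllib.parse import urlencode

# Assumptions: coordinator query endpoint and restrict-by-tags header name.
QUERY_URL = "http://localhost:7201/api/v1/query"
RESTRICT_HEADER = "M3-Restrict-By-Tags-JSON"

# Hypothetical token-to-team table standing in for real authentication.
TEAM_BY_TOKEN = {"token-a": "A", "token-b": "B"}


def build_restricted_query(token, promql):
    """Return (url, headers) for a query limited to the caller's team label."""
    team = TEAM_BY_TOKEN.get(token)
    if team is None:
        raise PermissionError("unknown token")
    # The filter travels in a header, so the PromQL text is left untouched.
    restriction = {"match": [{"name": "team", "value": team, "type": "EQUAL"}]}
    url = f"{QUERY_URL}?{urlencode({'query': promql})}"
    return url, {RESTRICT_HEADER: json.dumps(restriction)}


url, headers = build_restricted_query("token-a", "cpu_usage_idle")
```

The service would then forward the request to M3 with these headers, so a caller for team A only ever sees series labelled team=A, whatever query they send.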

Point 4 - Essentially, you can run it all in one process; I think I read somewhere that it can be done. Alternatively, you can run all the Docker containers on the same machine. I think M3DB should be improved on that front so that it is easy to run as a single process and you can get started as easily as you do with InfluxDB. The good part comes when you want to scale out: M3DB can support it, while InfluxDB doesn't.



