[GSOC 2020] Few queries regarding 'Improving OpenWISP Monitoring'

Hardik jain

Mar 18, 2020, 5:04:40 PM
to OpenWISP
Hello, everyone.
My name is Hardik Jain. I have a few queries regarding 'Improving OpenWISP Monitoring towards its first release' in the GSoC 2020 ideas list.
Here are the details:

1) Time series database: The ideas page says it is up to us to research and propose which database to use, so I would like a review of my research. I am listing three time series databases: two were mentioned on the gsoc-ideas page, and the third is one I found quite interesting.

   a) Prometheus: This is an easy pick, as the data can be migrated from InfluxDB to it without much trouble. It also has the other qualities we need: it is an active open source project with friendly documentation and very few dependencies.
   b) OpenTSDB: This was the second database recommended in the ideas list. From its docs, it looks like a good database, but it requires an Apache server setup with Hadoop, HBase, Gnuplot and a Java runtime environment (many dependencies, leading to unnecessary maintenance), and it is best suited to a Linux distribution. Also, the other day I was reading about Apache vs. NGINX, and clearly many clients might not have Apache servers.
   c) VictoriaMetrics: There are good reasons to consider it: the benchmark tests (please check this out!) and the documentation say a lot. However, it is new, so it does not yet have a large user base, a very active community, or thorough documentation. According to a few reviews I read, the code might not be quite stable yet. I think it is a good option, but moving to it now would be premature.

My recommendation:
If a large majority of clients use an Apache server (which is unlikely), OpenTSDB would not be a bad choice (Prometheus says so). Otherwise, I would recommend Prometheus over VictoriaMetrics, because it is more stable, well documented and has no external dependencies (the docs describe it as standalone). This leaves time for the other tasks on the list, rather than spending a lot of effort exploring the database on our own, which is not the primary objective. Also, the VictoriaMetrics documentation (same link as above) says that it allows easy migration from Prometheus, so a move can still be made later on.

2) Query parameter of graph:
Right now the query parameter of a chart (now called Graph) is editable via the django admin. This poses several issues: from UX (it's hard to deal with it) to security (the query could be manipulated to get data to which the user does not have access to).

I agree with this point (honestly, I was going to open an issue when I noticed that anything in the database can be accessed :smiley:, until I read this). I also get that it is not very friendly for the end user.

We need to refactor this part so that the user can choose among a predefined set of queries, but the list should be customizable using a Django setting, so that users can implement their own queries if needed.

I understand that the predefined queries can be sorted out later, once the coding period starts. Just one small clarification: if users can add custom queries by modifying a Django setting, won't the same risk of giving users access to data they shouldn't see occur again? If so, I think adding the necessary restrictions and defining the boundaries of this customization would be essential.

3) Documentation:
This module is in great need of good documentation. For example, although I was able to build it correctly on my local PC, some unit tests fail when I run them, so now I will have to debug the hard way :(
I will post an update on this soon.

4) A basic query: can we propose a few modifications to what is described in the original idea? (I would surely discuss them here before sending a draft proposal.)

5) Mentors: I would be extremely grateful if someone could suggest probable mentors for this project. Most of my PRs were reviewed by @nemesisdesign and @atb00ker, but I am unsure which mentors will be participating in GSoC 2020 and for which project. Any help in this regard would be highly appreciated :grinning:

Extremely sorry for being so lengthy; I just wanted to be clear about small details that might matter a lot later on.
(Also, I might trouble you with a few more small queries on this thread as I explore the idea/project, so please bear with me for a while.)

Thanks in advance!

Best regards
Hardik Jain
:)

Hardik jain

Mar 19, 2020, 10:50:40 AM
to OpenWISP
Adding a few more queries, :sweat_smile:

6) Adding backward compatibility for renaming models:
Rename the model which is now called Threshold to AlertSettings, in a backward compatible way (eg: provide a data migration which creates the alert table first, copies the data from the threshold table into alert and then remove the old threshold table, or an equivalently working alternative).

My query is: newer versions of Django (>2) automatically detect the renaming of a model, without causing any data loss. I have verified this first-hand by simply running `makemigrations`:
$ python manage.py makemigrations
Did you rename the monitoring.Threshold model to AlertSettings? [y/N]
So, I wanted to know whether this alternative would be acceptable, since creating a new table and deleting the old one is not required if we simply rename the model without any data loss.
The same applies to renaming the model `Graph` to `Chart`.
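
To illustrate why a rename needs no copy step, here is a minimal sketch using Python's built-in sqlite3 module instead of Django. When the rename prompt is answered with "y", Django generates a `migrations.RenameModel` operation, which ends up renaming the underlying table. The table names below assume Django's default `<app_label>_<model>` naming, and the `value` column is hypothetical.

```python
import sqlite3

# In-memory database standing in for the project database (illustration only)
conn = sqlite3.connect(":memory:")

# Assumed table name: Django's default "<app_label>_<model>" for monitoring.Threshold
conn.execute(
    "CREATE TABLE monitoring_threshold (id INTEGER PRIMARY KEY, value REAL)"
)
conn.executemany(
    "INSERT INTO monitoring_threshold (value) VALUES (?)", [(1.5,), (90.0,)]
)

# A rename keeps the existing rows in place: no new table, no copy, no delete
conn.execute("ALTER TABLE monitoring_threshold RENAME TO monitoring_alertsettings")

rows = conn.execute(
    "SELECT id, value FROM monitoring_alertsettings ORDER BY id"
).fetchall()
tables = [
    r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
]
print(rows)
print(tables)
```

The old table name is gone and the rows are available under the new name, which is the behaviour the create/copy/delete data migration would otherwise have to reproduce by hand.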

7) Monitoring Checks section in Device Admin:
I just want to confirm: this should include the checks from the network monitoring checks, which currently have only one check type, `ping`, to check `rtt`? Also, after this change, should the `Network Monitoring Section` still be visible on the admin home page?


Hardik jain

Mar 20, 2020, 1:05:35 PM
to OpenWISP
8) Adding the possibility to collect memory (RAM), CPU and flash (disk space) usage of devices via the API:
I did a bit of research and found that currently only OpenWRT (Linux based) is supported by openwisp-controller as a backend (LEDE was removed in the latest release, if I am not wrong). I have a simple way of collecting this data, which I often use on my Ubuntu machine to check details when it hangs (sometimes) :smile:. Create a system_check.sh shell script.
system_check.sh file
#!/bin/sh
# memory in MB, taken from the second line of `free -m`
free -m | awk 'NR==2{printf "Memory Usage: %s/%sMB (%.2f%%)\n", $3,$2,$3*100/$2 }'
# disk usage of the root filesystem
df -h | awk '$NF=="/"{printf "Disk Usage: %d/%dGB (%s)\n", $3,$2,$5}'
# 1-minute load average from the `top` header, multiplied by 100
top -bn1 | grep load | awk '{printf "CPU Usage: %.0f%%\n", $(NF-2)*100}'

# make it executable
chmod +x system_check.sh

# This results in:

~$ ./system_check.sh
Memory Usage: 6156/7676MB (80.20%)
Disk Usage: 27/187GB (15%)
CPU Usage: 74%

My query is: would it be acceptable to use this simple shell script, read its output from a Python file in the api directory and convert it into a JSON object (keeping the NetJSON device monitoring spec in mind), or is an alternative approach required here?

Best Regards
Hardik Jain
:)

Federico Capoano

Mar 21, 2020, 1:27:39 PM
to OpenWISP
There's a better way to do it on OpenWRT, try this:

ubus call system info

I forgot to share some scripts I implemented to collect these metrics: https://github.com/openwisp/lua-monitoring
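
As a rough sketch of how the JSON printed by `ubus call system info` could be summarized on the Python side: the sample below is illustrative and was not captured from a real device, the field names (`uptime`, `load`, `memory`) are assumed from typical OpenWRT output, and the scaling of the load values by 65536 is also an assumption to verify against your firmware.

```python
import json

# Illustrative sample only, not captured from a real device
SAMPLE = """
{
  "uptime": 5400,
  "load": [32768, 16384, 49152],
  "memory": {"total": 128000000, "free": 32000000, "buffered": 8000000}
}
"""


def summarize(raw):
    """Turn the raw JSON text into a small dict of usage figures."""
    info = json.loads(raw)
    mem = info["memory"]
    # Treat buffered memory as reclaimable, so it does not count as used
    used = mem["total"] - mem["free"] - mem.get("buffered", 0)
    return {
        "uptime": info["uptime"],
        # Assumption: load averages are fixed-point values scaled by 65536
        "load": [round(value / 65536, 2) for value in info["load"]],
        "memory_used_percent": round(used * 100 / mem["total"], 2),
    }


summary = summarize(SAMPLE)
print(summary)
```

On a device, the raw text would come from running the command itself (for example through `subprocess.check_output(["ubus", "call", "system", "info"])`) rather than from a hard-coded sample.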

Federico

Federico Capoano

Mar 21, 2020, 1:28:45 PM
to OpenWISP
On Wed, Mar 18, 2020 at 4:04 PM Hardik jain <hardikas...@gmail.com> wrote:
Hello, everyone.
My name is Hardik Jain. I have a few queries regarding 'Improving OpenWISP Monitoring towards its first release' in the GSoC 2020 ideas list.
Here are the details:

1) Time series database: The ideas page says it is up to us to research and propose which database to use, so I would like a review of my research. I am listing three time series databases: two were mentioned on the gsoc-ideas page, and the third is one I found quite interesting.

   a) Prometheus: This is an easy pick, as the data can be migrated from InfluxDB to it without much trouble. It also has the other qualities we need: it is an active open source project with friendly documentation and very few dependencies.
   b) OpenTSDB: This was the second database recommended in the ideas list. From its docs, it looks like a good database, but it requires an Apache server setup with Hadoop, HBase, Gnuplot and a Java runtime environment (many dependencies, leading to unnecessary maintenance), and it is best suited to a Linux distribution. Also, the other day I was reading about Apache vs. NGINX, and clearly many clients might not have Apache servers.
   c) VictoriaMetrics: There are good reasons to consider it: the benchmark tests (please check this out!) and the documentation say a lot. However, it is new, so it does not yet have a large user base, a very active community, or thorough documentation. According to a few reviews I read, the code might not be quite stable yet. I think it is a good option, but moving to it now would be premature.

My recommendation:
If a large majority of clients use an Apache server (which is unlikely), OpenTSDB would not be a bad choice (Prometheus says so). Otherwise, I would recommend Prometheus over VictoriaMetrics, because it is more stable, well documented and has no external dependencies (the docs describe it as standalone). This leaves time for the other tasks on the list, rather than spending a lot of effort exploring the database on our own, which is not the primary objective. Also, the VictoriaMetrics documentation (same link as above) says that it allows easy migration from Prometheus, so a move can still be made later on.
 
Nice analysis.

The most important points to keep in mind are:

- we need to implement the current features and make them work for both Prometheus and InfluxDB
- the new timeseries DB that we will support needs to provide an open source solution for horizontal scaling, because InfluxDB does not provide it (it has a paid cloud service instead, which is not a viable option for many OpenWISP users)
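
One way to picture the first point is a small backend interface that each database implements, so the rest of the code never builds queries directly. This is only a sketch: the class names, the `mean_query` method and the `ping`/`rtt` names are hypothetical and not part of the current module, and the Prometheus variant assumes each measurement/field pair is exported as a single metric.

```python
from abc import ABC, abstractmethod


class TimeseriesBackend(ABC):
    """Hypothetical interface that every supported database would implement."""

    @abstractmethod
    def mean_query(self, measurement, field, window):
        """Return a query string averaging `field` over `window` intervals."""


class InfluxBackend(TimeseriesBackend):
    def mean_query(self, measurement, field, window):
        # InfluxQL: a measurement holds several fields, grouped by time buckets
        return (
            f'SELECT MEAN("{field}") FROM "{measurement}" '
            f"WHERE time >= now() - 1d GROUP BY time({window})"
        )


class PrometheusBackend(TimeseriesBackend):
    def mean_query(self, measurement, field, window):
        # PromQL has no fields; assume one metric named "<measurement>_<field>"
        return f"avg_over_time({measurement}_{field}[{window}])"


# Registry keyed by a name that could come from a setting (hypothetical)
BACKENDS = {"influxdb": InfluxBackend, "prometheus": PrometheusBackend}


def get_backend(name):
    """Instantiate the backend registered under `name`."""
    return BACKENDS[name]()


influx_query = get_backend("influxdb").mean_query("ping", "rtt", "10m")
prometheus_query = get_backend("prometheus").mean_query("ping", "rtt", "10m")
print(influx_query)
print(prometheus_query)
```

With a layout like this, each existing feature would be expressed once against the interface and then checked against both implementations.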

2) Query parameter of graph:
Right now the query parameter of a chart (now called Graph) is editable via the django admin. This poses several issues: from UX (it's hard to deal with it) to security (the query could be manipulated to get data to which the user does not have access to).

I agree with this point (honestly, I was going to open an issue when I noticed that anything in the database can be accessed :smiley:, until I read this). I also get that it is not very friendly for the end user.

We need to refactor this part so that the user can choose among a predefined set of queries, but the list should be customizable using a Django setting, so that users can implement their own queries if needed.

I understand that the predefined queries can be sorted out later, once the coding period starts. Just one small clarification: if users can add custom queries by modifying a Django setting, won't the same risk of giving users access to data they shouldn't see occur again? If so, I think adding the necessary restrictions and defining the boundaries of this customization would be essential.

Django settings can be manipulated only by system administrators, who are IT people and should know what they're doing.
We will only need to provide some basic documentation giving an example of how to write an additional chart query and add it to the settings, showing the resulting chart.

The rest of the users using the system (think about organizations that operate the network and use OpenWISP to perform basic management operations like updating the configuration or monitoring the statistics) will only be able to select from the predefined set of chart queries available.

In most cases, non-technical users won't have access to this part of the system; they'll only see the charts.
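
A minimal sketch of that split, in plain Python: the defaults ship with the module, a system administrator can add entries through a setting, and everyone else can only pick a name from the resulting list. The names `DEFAULT_CHART_QUERIES`, `get_chart_queries` and `get_query` are hypothetical, and so are the example queries; in a Django project the `custom` argument would typically be read with something like `getattr(settings, "<SETTING_NAME>", {})`.

```python
# Queries shipped with the module (illustrative InfluxQL, hypothetical names)
DEFAULT_CHART_QUERIES = {
    "rtt": 'SELECT MEAN("rtt") FROM "ping" WHERE time >= now() - 1d GROUP BY time(10m)',
    "uptime": 'SELECT MEAN("reachable") FROM "ping" WHERE time >= now() - 1d GROUP BY time(10m)',
}


def get_chart_queries(custom=None):
    """Merge the defaults with entries added by a system administrator."""
    queries = dict(DEFAULT_CHART_QUERIES)
    queries.update(custom or {})
    return queries


def get_query(name, custom=None):
    """Look a query up by name; free-form query text is never accepted."""
    queries = get_chart_queries(custom)
    if name not in queries:
        raise ValueError(f"unknown chart query: {name!r}")
    return queries[name]


# What an administrator might add through the setting (hypothetical)
custom_queries = {
    "signal": 'SELECT MEAN("signal") FROM "wifi" WHERE time >= now() - 1d GROUP BY time(10m)',
}
available = sorted(get_chart_queries(custom_queries))
print(available)
```

Because the admin form would only store the chosen name, an arbitrary query typed into the interface raises `ValueError` instead of reaching the database.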
 
3) Documentation:
This module is in great need of good documentation. For example, although I was able to build it correctly on my local PC, some unit tests fail when I run them, so now I will have to debug the hard way :(
I will post an update on this soon.
 
That's one of the purposes of this project.

4) A basic query: can we propose a few modifications to what is described in the original idea? (I would surely discuss them here before sending a draft proposal.)

Yes you can make improvement proposals as long as they're in scope with the project and help to achieve the goal. 

5) Mentors: I would be extremely grateful if someone could suggest probable mentors for this project. Most of my PRs were reviewed by @nemesisdesign and @atb00ker, but I am unsure which mentors will be participating in GSoC 2020 and for which project. Any help in this regard would be highly appreciated :grinning:
 
I developed the alpha of this module and I will be mentoring it.
There should be another backup mentor as well, and I will share the news when I have more info.

[cut]


On Thu, Mar 19, 2020 at 9:50 AM Hardik jain <hardikas...@gmail.com> wrote:
Adding a few more queries, :sweat_smile:

6) Adding backward compatibility for renaming models:
Rename the model which is now called Threshold to AlertSettings, in a backward compatible way (eg: provide a data migration which creates the alert table first, copies the data from the threshold table into alert and then remove the old threshold table, or an equivalently working alternative).

My query is: newer versions of Django (>2) automatically detect the renaming of a model, without causing any data loss. I have verified this first-hand by simply running `makemigrations`:
$ python manage.py makemigrations
Did you rename the monitoring.Threshold model to AlertSettings? [y/N]
So, I wanted to know whether this alternative would be acceptable, since creating a new table and deleting the old one is not required if we simply rename the model without any data loss.
The same applies to renaming the model `Graph` to `Chart`.

That's great, if the table can be renamed with minimum effort we should go for it.

7) Monitoring Checks section in Device Admin:
I just want to confirm: this should include the checks from the network monitoring checks, which currently have only one check type, `ping`, to check `rtt`? Also, after this change, should the `Network Monitoring Section` still be visible on the admin home page?
 
Maybe, maybe not, we can decide later in the project depending on the results we get.
This is an easy change to do if needed.


On Fri, Mar 20, 2020 at 12:05 PM Hardik jain <hardikas...@gmail.com> wrote:
8) Adding the possibility to collect memory (RAM), CPU and flash (disk space) usage of devices via the API:
I did a bit of research and found that currently only OpenWRT (Linux based) is supported by openwisp-controller as a backend (LEDE was removed in the latest release, if I am not wrong).

LEDE was a fork of OpenWRT; it was later merged back into OpenWRT, so the project avoided a split.

Federico

HARDIK ASHISH JAIN

Mar 23, 2020, 5:42:33 AM
to OpenWISP
Thanks for the response, Federico. I have replied below to most of the points you raised, and I hope to send a draft proposal explaining them in much more detail soon!

- we need to implement the current features and make them work for both Prometheus and InfluxDB

I have read the Prometheus documentation in more detail. It says that most of the data compression algorithms, the query parameters and the structure of Prometheus are similar to InfluxDB, which makes migration easy. Also, Prometheus supports many APIs from InfluxData (the company behind InfluxDB), which makes it easy to integrate with other tools that InfluxDB supports, such as Grafana (these are not currently used in the project, but including them in the future would not cause any difficulty). So, it seems that our features would be compatible with Prometheus too.

- the new timeseries DB that we will support needs to provide an open source solution for horizontal scaling, because InfluxDB does not provide it (it has a paid cloud service instead, which is not a viable option for many OpenWISP users)

I looked into this. Although Prometheus itself does not support horizontal scaling, there are many good open-source tools that organizations use to achieve it, namely Cortex, Thanos, VictoriaMetrics (I just found out that it is also built on top of Prometheus!), etc. Among these, Thanos (completely open source) is quite popular, while the VictoriaMetrics docs were not clear about whether it is available for free or only in a paid version. Since there are many options available, I think we can discuss later what would be best to use on top of Prometheus to enable horizontal scaling.

Django settings can be manipulated only by system administrators, who are IT people and should know what they're doing.
We will only need to provide some basic documentation giving an example of how to write an additional chart query and add it to the settings, showing the resulting chart.

The rest of the users using the system (think about organizations that operate the network and use OpenWISP to perform basic management operations like updating the configuration or monitoring the statistics) will only be able to select from the predefined set of chart queries available.

In most cases, non-technical users won't have access to this part of the system; they'll only see the charts.

I have prepared a workaround which seemed a bit better to me than having the end user select from a big list. Since it is a bit lengthy, I have included it in my draft proposal, which I will be sending very soon :)

Yes you can make improvement proposals as long as they're in scope with the project and help to achieve the goal.

Great! I just wanted to know: can we use FusionCharts in this module? Currently many charts have to be created/exported, and this would involve changing the dependency from Plotly (which is currently being used) to FusionCharts. Plotly is great and lightweight too, but FusionCharts lets us choose from a wide variety of charts and accepts data in JSON format. We could also export charts in PNG, PDF, SVG, etc. formats with a preview.

There's a better way to do it on OpenWRT, try this:
 
ubus call system info

I forgot to share some scripts I implemented to collect these metrics: https://github.com/openwisp/lua-monitoring

Thanks for sharing, this command is surely better. Also, I won't have to write a script to convert the received data into the NetJSON standard format, as the scripts you shared already take care of that :)

LEDE was a fork of OpenWRT; it was later merged back into OpenWRT, so the project avoided a split.

Thanks for sharing!

Hardik jain

Mar 23, 2020, 11:31:00 AM
to OpenWISP
Hi Federico, I have carefully gone through your suggestions and submitted a draft proposal that takes them into consideration. Although the proposal covers all the measurable outcomes on the ideas page, I still think it might miss a few points which I am not aware of (e.g. I wasn't aware earlier that horizontal scalability is a fundamental need). So, please review it when you are free.