notes about grafana, mqtt, influx, and weewx


mwall

Sep 7, 2018, 4:57:51 PM
to weewx-development
vince and i have been having a discussion about how to get data from weewx into grafana.  i figured there are others who could benefit from this discussion, and i'm sure there are some grafana/influx/mqtt users who could contribute.

vince's application is to compare data from three different types of weather station hardware.

my applications include tide monitoring (and prediction), tank level monitoring (freshwater supplies), weather station comparisons (cc3000, vantage, and others), and power monitoring (outback mate, victron vedirect, sunny webbox, and enphase envoy)

grafana is a wonderful tool for plotting data.  it will talk to many different database back-ends, but it is particularly nice to use with influx.  there is now a windrose plugin to grafana, so you can make weather-specific dashboards.  and of course there are all of the other graphic analysis tools.

influxdb is a nice way to store near-realtime data (e.g., loop data from weewx) as well as weewx archive data.  however, it is not obvious how one should structure the data within influxdb.  the general guideline is that an influx measurement is analogous to an SQL table, a tag is analogous to an indexed column, and a field is analogous to an unindexed column.  but that still leaves many ways to structure the data.
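
as a rough sketch of one possible layout (certainly not the only one), the snippet below puts every observation into a single measurement as fields and uses one tag to identify the station.  the names `to_line_protocol`, `weather`, and `vp2` are made up for illustration, and the helper assumes numeric field values and keys/tag values without spaces or commas, so it skips the escaping and quoting that influx line protocol otherwise requires.

```python
def to_line_protocol(measurement, tags, fields, timestamp_s):
    """Format one point as InfluxDB line protocol (hypothetical helper).

    measurement ~ SQL table, tags ~ indexed columns, fields ~ unindexed columns.
    Assumes numeric fields and no characters that would need escaping.
    """
    tag_part = "".join(",%s=%s" % (k, tags[k]) for k in sorted(tags))
    field_part = ",".join("%s=%s" % (k, float(fields[k])) for k in sorted(fields))
    # Line protocol timestamps default to nanoseconds; fractional seconds are dropped here.
    timestamp_ns = int(timestamp_s) * 1000000000
    return "%s%s %s %d" % (measurement, tag_part, field_part, timestamp_ns)


line = to_line_protocol(
    "weather",
    {"station": "vp2"},
    {"outTemp": 75.9, "outHumidity": 49.0},
    1536360900.0,
)
print(line)
```

an alternative would be one measurement per observation type, with the station as a tag; which is better probably depends on how the dashboards query the data.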

you can use the weewx-mqtt extension to easily feed loop and/or archive data into an mqtt broker.  then you can use a subscriber such as telegraf to put the data into a timeseries data store such as influxdb.

or you can use the weewx-influx extension to feed loop and/or archive data directly into influxdb.

but what is the best way to structure the mqtt messages?  one observation per message?  or a json object in each message with a set of observations?
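
to make the two options concrete, here is a small sketch that builds both message shapes from the same packet.  the function names and the `weather` base topic are hypothetical, and nothing is actually published; each function just returns (topic, payload) pairs that a publisher could send.

```python
import json


def per_observation_messages(base_topic, packet):
    """Option 1: one MQTT message per observation, named by the topic."""
    return [("%s/%s" % (base_topic, name), json.dumps(value))
            for name, value in sorted(packet.items())]


def single_json_message(base_topic, packet):
    """Option 2: one MQTT message carrying the whole packet as a JSON object."""
    return [("%s/loop" % base_topic, json.dumps(packet, sort_keys=True))]


packet = {"dateTime": 1536360900.0, "usUnits": 1.0, "outTemp": 75.9}
many = per_observation_messages("weather", packet)
one = single_json_message("weather", packet)

for topic, payload in many + one:
    print(topic, payload)
```

note that with option 1 the timestamp and unit system travel as separate messages, so a subscriber has to pair them back up with the observations they describe.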

and what is the best way to structure data in influxdb?

if vince does not mind, i'd like to post some of our conversation here then continue it on this thread

m

Thomas Keffer

Sep 7, 2018, 6:25:04 PM
to Matthew Wall, weewx-development
WeeRT uses InfluxDB to store both LOOP and archive data. 

The data model is discussed in the README. 

The measurement wxpackets is just the raw LOOP packets, one per row. Much like WeeWX, a timestamp and a unit system must be declared for each row, but, otherwise, there are no constraints. Partial packets cause no problems.

The measurement wxrecords is LOOP packets aggregated over a regular time interval, typically 5 minutes. At first, I used InfluxDB's "subsampling" ability to create these records. It sort of worked, but it timestamped the newly created subsampled records with the beginning of the time interval, which required all sorts of data gymnastics to fix. I finally gave up and had WeeRT do its own aggregations.
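
A minimal sketch of that kind of do-it-yourself aggregation, assuming plain averaging and the WeeWX convention of stamping a record with the end of its interval. This is not WeeRT's actual code; the function name `aggregate` is hypothetical, and real aggregation would treat quantities such as rain totals and wind direction differently from a simple mean.

```python
import math
from collections import defaultdict


def aggregate(packets, interval=300):
    """Average LOOP packets into records stamped with the END of each interval."""
    buckets = defaultdict(list)
    for p in packets:
        # A packet exactly on a boundary belongs to the interval ending there.
        end = int(math.ceil(p["dateTime"] / float(interval))) * interval
        buckets[end].append(p)

    records = []
    for end in sorted(buckets):
        sums, counts = defaultdict(float), defaultdict(int)
        for p in buckets[end]:
            for name, value in p.items():
                if name in ("dateTime", "usUnits") or value is None:
                    continue
                sums[name] += value
                counts[name] += 1
        # Assumes every packet in a bucket uses the same unit system.
        record = {"dateTime": end, "usUnits": buckets[end][0]["usUnits"]}
        for name in sums:
            # Partial packets are fine: each observation has its own count.
            record[name] = sums[name] / counts[name]
        records.append(record)
    return records


packets = [
    {"dateTime": 1536360602, "usUnits": 1, "outTemp": 70.0},
    {"dateTime": 1536360900, "usUnits": 1, "outTemp": 72.0, "windSpeed": 5.0},
    {"dateTime": 1536360901, "usUnits": 1, "outTemp": 80.0},
]
records = aggregate(packets)
```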

Regarding MQTT, I prefer a JSON object with a set of observations over single observations. Fewer packets, smaller InfluxDB storage requirements, and the packets are much more efficient (a timestamp and a unit system covers a set of observations, instead of just one observation). The only downside is that you can't precisely subscribe to a single observation, so in very narrowcast situations you transmit more data to subscribers than you might have to. OTOH, the reverse is also true for subscribers like the database, which want all the observations.

I am not a big fan of exposing any database on the internet. There should always be some trusted app between it and the wild west out there. That could be an MQTT broker. I chose not to do that with WeeRT because it would be yet another application that has to be running to make the whole thing work.

-tk

Vince Skahan

Sep 7, 2018, 6:33:42 PM
to weewx-development
On Friday, September 7, 2018 at 1:57:51 PM UTC-7, mwall wrote:
if vince does not mind, i'd like to post some of our conversation here then continue it on this thread


Absolutely !

Some thoughts:
  • generating JSON-formatted MQTT topics as input is to me the ideal format
    • it's easy to generate and subscribe to the one(s) you want
    • it's easy to use/ignore the fields that you choose
  • having one 'listener' process to subscribe to topics and get them to influxdb is to me the ideal architecture
    • that means only that listener needs to actually talk to influxdb
    • that makes it easier to set up and easier to secure
    • albeit at the risk of single point of failure of course
  • minimizing the number of moving parts to receive/store/display the data makes automation much simpler
    • you could even cook up a few Docker containers to deploy the server-side stuff !
  • having the dashboard front-end visible, yet the collector/listener/database back end hidden seems wisest security-wise

Re: formats and influxdb:
  • I like the idea of every source of data being 'able' to have its own database
    • that means you could have different retention periods as needed.   More on that in a bit.....
  • For a weather station, I'd do something ala the weewx/loop topic its MQTT extension can publish
    • for the VP2 weewx setup here, my experience is the MQTT extension publishing weather/loop is the ideal for me
    • for the WeatherFlow station here, I cooked up MQTT topics lining up with what the hardware sends and how often:
      • "obs/air" = temp/humidity/lightning - once every 60 secs
      • "obs/sky" = wind/rain - once every 60 secs
      • "status/hub" = inside hub status like RSSI values - once every 10 secs
      • "status/air" and "status/sky" - outside device status like RSSI and battery - once every 10 secs
      • "obs/rapid_wind" = more frequent wind measurements - once every 3 secs
  • For my here-and-there instrumentation I have a little nodeMCU setup that sends temperature every 5 secs currently
    • That is JSON-formatted ala { "sensorID": "nodeMCU_<chipid>", "degF": 72.4 }
    • the intent is to have the same code on multiple like setups around the house, with the sensor ID being the hardware id of the nodeMCU card
    • that lets me use the same arduino code on each nodeMCU as long as I know where I put which card around the house :-)
Re: how many databases, my experience is that grafana is good at tearing apart which db stuff came from.   That lets you do things like say "give me outTemp from station1's db" as opposed to "give me outTemp as measured by nodeMCU card-12345" and use common terminology everywhere.   I also like the idea of saving things with different retention periods, similar notionally to the difference in how weewx keeps archive_day_rain (etc.) summary tables vs. the bigger main archive table itself with the main source data.

For a WeatherFlow station, I probably only care about obs/air and obs/sky for weather, but I could see keeping summary data of status/air and status/sky and status/hub type data.   I personally haven't found a good reason to watch the every-3-second rapid_wind needle moving on a display, but that would be fun too.  In that case I'd just have a couple gauges with the current speed+direction and not even bother with a graph.  Again to me a separate short-retention-time db for that seems to make sense.

For a WeatherFlow station, I 'could' generate a weewx(ish) loop structure with all the observations, and just fill in the data I have every time a UDP message is received.  In other words, have one big honkin' JSON data structure and fill in the blanks as the data is available (it would take 2 messages minimum to get the weather data, 3 more for device status, 1 more if you wanted rapid_wind too).
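
A rough sketch of that fill-in-the-blanks idea, assuming each incoming message is a JSON object like the ones published on wf/obs/air and wf/obs/sky. The class name `LoopAccumulator` and the device-name prefixing are hypothetical choices; the prefix is there because fields such as battery appear in more than one device's message and would otherwise overwrite each other.

```python
import json


class LoopAccumulator(object):
    """Merge partial observations into one running loop-like structure."""

    def __init__(self):
        self.packet = {}

    def update(self, topic, message):
        obs = json.loads(message)
        timestamp = obs.pop("timestamp", None)
        device = topic.split("/")[-1]  # e.g. "air" or "sky"
        for name, value in obs.items():
            # Prefix with the device so air/sky battery values do not collide.
            self.packet["%s_%s" % (device, name)] = value
        if timestamp is not None:
            # Keep the newest source timestamp seen so far.
            self.packet["dateTime"] = max(timestamp, self.packet.get("dateTime", 0))
        return dict(self.packet)


acc = LoopAccumulator()
acc.update("wf/obs/air", '{"temperature": 23.33, "battery": 3.52, "timestamp": 1536360581}')
merged = acc.update("wf/obs/sky", '{"wind_avg": 0.44, "battery": 3.47, "timestamp": 1536360608}')
```

One consequence of this design is that values in the merged structure can have different ages, so a single dateTime only tells you when the newest piece arrived.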

There's no right way or wrong way, I'm thinking, but generally the fewer moving parts the better, and the fewer technologies to do job-xyz the better...usually...


Vince Skahan

Sep 7, 2018, 6:57:57 PM
to weewx-development
On Friday, September 7, 2018 at 3:25:04 PM UTC-7, Tom Keffer wrote:
WeeRT uses InfluxDB to store both LOOP and archive data. 

The data model is discussed in the README. 


I like the idea of a 'deep' packet, which to me means a packet with additional metadata (tags) attached to it, ignoring the difference of the timestamp being in ms due to influxdb speaking ms.

In my stuff, I try to remember to add a weewx(ish) dateTime field intended to say when the data was sent from wherever, independent of whatever timestamp field the receiving software (influxdb) might attach to the 'meat' with whatever formatting that software speaks.

Possibly inconsistent examples of my current MQTT topics:

#---- nodeMCU arduino publishing ds18b20 sensor values ---
# espID indicates the source card that sent the data
# oops, no timestamp.  Doh !
#
$ mosquitto_sub -h mqtt -t esp/test
{"espID":10469339,"degF":75.425}

#---- WeatherFlow station 'air' device observations ----
# (this uses WeatherFlow API terminology)
#
$ mosquitto_sub -h mqtt -t "wf/obs/air"
{"battery": 3.52, "firmware_revision": 20, "lightning_strike_avg_distance": 0, "lightning_strike_count": 0, "relative_humidity": 53, "report_interval": 1, "station_pressure": 1005.0, "temperature": 23.33, "timestamp": 1536360581}

#---- WeatherFlow station 'sky' device observations ----
# this uses WeatherFlow API terminology
$ mosquitto_sub -h mqtt -t "wf/obs/sky"
{"battery": 3.47, "firmware_revision": 43, "illuminance": 83071, "precipitation_type": 0, "rain_accumulated": 0.0, "report_interval": 1, "solar_radiation": 692, "timestamp": 1536360608, "uv": 9.07, "wind_avg": 0.44, "wind_direction": 284, "wind_gust": 1.12, "wind_lull": 0.0, "wind_sample_interval": 3}

#---- weewx mqtt extension published info ----
# this uses the current weewx-mqtt extension, lightly hacked so
# the values are floats, not strings
#
$ mosquitto_sub -h mqtt -t "weather/loop"
{"monthET": 0.0, "heatindex": 75.9, "outHumidity": 49.0, "dayET": 0.0, "maxSolarRad": 530.366762221, "consBatteryVoltage": 4.64, "monthRain": 0.0, "insideAlarm": 0.0, "barometer": 29.985, "dateTime": 1536360900.0, "stormRain": 0.0, "sunrise": 1536327360.0, "windchill": 75.9, "dewpoint": 55.3501548838, "lowOutTemp": 75.3, "outsideAlarm1": 0.0, "altimeter": 29.9832616326, "outsideAlarm2": 0.0, "forecastRule": 187.0, "rainAlarm": 0.0, "inTemp": 74.0, "inHumidity": 49.0, "windSpeed10": 3.02, "hourRain": 0.0, "yearRain": 34.02, "soilLeafAlarm2": 0.0, "windGustDir": 0.0, "extraAlarm1": 0.0, "extraAlarm2": 0.0, "extraAlarm3": 0.0, "extraAlarm4": 0.0, "extraAlarm5": 0.0, "extraAlarm6": 0.0, "extraAlarm7": 0.0, "extraAlarm8": 0.0, "windrun": 41.0833333333, "rain": 0.0, "humidex": 80.9056035875, "forecastIcon": 3.0, "pressure": 29.5987380092, "rxCheckPercent": 99.0833333333, "ET": 0.0, "soilLeafAlarm4": 0.0, "trendIcon": -20.0, "rainRate": 0.0, "soilLeafAlarm3": 0.0, "usUnits": 1.0, "soilLeafAlarm1": 0.0, "leafWet4": 0.0, "txBatteryStatus": 0.0, "yearET": 0.0, "appTemp": 74.7472795468, "inDewpoint": 53.6117725505, "interval": 5.0, "dayRain": 0.0, "windDir": 0.0, "outTemp": 75.9, "windSpeed": 5.0, "sunset": 1536374220.0, "rain24": 0.0, "windGust": 18.0, "highOutTemp": 75.9, "cloudbase": 5035.41934459}



Vince Skahan

Sep 9, 2018, 2:48:00 PM
to weewx-development
FWIW, one thing I noticed today is that you can have a common schema for multiple MQTT topics, so even if telegraf puts everything in a mqtt_consumer measurement within influxdb by default, you can easily get the value you want in grafana by specifying which topic to pull it from.

So theoretically if you had a bunch of sensors with a schema ala { "sensorID": 12345, "degF": 72.3 } publishing to MQTT, as long as each sensor wrote to a different topic (perhaps sensor/12345 for illustration) you could get a nice multi-sensor multi-topic graph in grafana all from the same telegraf-ingested influxdb.
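
A small sketch of the idea, assuming every sensor publishes the same schema and only the topic differs. The function name `series_by_topic` and the second sensor ID are made up for illustration; grouping by topic here stands in for what a grafana query filtered or grouped on the topic would give you.

```python
import json
from collections import defaultdict


def series_by_topic(messages, field="degF"):
    """Group the values of one field by the MQTT topic they arrived on."""
    series = defaultdict(list)
    for topic, payload in messages:
        obs = json.loads(payload)
        if field in obs:
            series[topic].append(obs[field])
    return dict(series)


messages = [
    ("sensor/12345", '{"sensorID": 12345, "degF": 72.3}'),
    ("sensor/67890", '{"sensorID": 67890, "degF": 68.1}'),
    ("sensor/12345", '{"sensorID": 12345, "degF": 72.5}'),
]
series = series_by_topic(messages)
print(series)
```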

