The original idea for this implementation came from the rt2x00 driver
(rt2x00link.c). I would like to use it in several places in the mac80211 and
ath5k code and I hope it can be useful in many other places in the kernel code.
Signed-off-by: Bruno Randolf <b...@einfach.org>
Reviewed-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
--
v7: Include bug.h. ewma_init() returns void. Use CONFIG_AVERAGE.
v6: Fixed ULONG_MAX in comment. Add reviewed by KOSAKI Motohiro.
v5: Use unsigned long instead of insigned int.
v4: Initialize internal variable to 0. Remove unneeded const qualifiers.
v3: Addressing Andrew Mortons comments: Implement in lib/average.c and make
access and initalization functions. Use unsigned int for values. Rename
functions to ewma_* since there might be other moving average
implementations which are not exponentially weighted.
v2: Renamed 'samples' to 'weight'. Added more documentation. Use avg_val
pointer. Add a WARN_ON_ONCE for invalid values of 'weight'. Divide
and round up/down.
---
include/linux/average.h | 32 ++++++++++++++++++++++++++
lib/Kconfig | 3 ++
lib/Makefile | 2 ++
lib/average.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 94 insertions(+), 0 deletions(-)
create mode 100644 include/linux/average.h
create mode 100644 lib/average.c
diff --git a/include/linux/average.h b/include/linux/average.h
new file mode 100644
index 0000000..ee3da51
--- /dev/null
+++ b/include/linux/average.h
@@ -0,0 +1,32 @@
+#ifndef _LINUX_AVERAGE_H
+#define _LINUX_AVERAGE_H
+
+#include <linux/kernel.h>
+
+/* Exponentially weighted moving average (EWMA) */
+
+/* For more documentation see lib/average.c */
+
+struct ewma {
+ unsigned long internal;
+ unsigned long factor;
+ unsigned long weight;
+};
+
+extern void ewma_init(struct ewma *avg, unsigned long factor,
+ unsigned long weight);
+
+extern struct ewma *ewma_add(struct ewma *avg, unsigned long val);
+
+/**
+ * ewma_get() - Get average value
+ * @avg: Average structure
+ *
+ * Returns the average value held in @avg.
+ */
+static inline unsigned long ewma_get(const struct ewma *avg)
+{
+ return DIV_ROUND_CLOSEST(avg->internal, avg->factor);
+}
+
+#endif /* _LINUX_AVERAGE_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index fa9bf2c..3116aa6 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -210,4 +210,7 @@ config GENERIC_ATOMIC64
config LRU_CACHE
tristate
+config AVERAGE
+ bool
+
endmenu
diff --git a/lib/Makefile b/lib/Makefile
index e6a3763..76d3b85 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ obj-$(CONFIG_GENERIC_ATOMIC64) += atomic64.o
obj-$(CONFIG_ATOMIC64_SELFTEST) += atomic64_test.o
+obj-$(CONFIG_AVERAGE) += average.o
+
hostprogs-y := gen_crc32table
clean-files := crc32table.h
diff --git a/lib/average.c b/lib/average.c
new file mode 100644
index 0000000..f1d1b46
--- /dev/null
+++ b/lib/average.c
@@ -0,0 +1,57 @@
+/*
+ * lib/average.c
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2. See the file COPYING for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/average.h>
+#include <linux/bug.h>
+
+/**
+ * DOC: Exponentially Weighted Moving Average (EWMA)
+ *
+ * These are generic functions for calculating Exponentially Weighted Moving
+ * Averages (EWMA). We keep a structure with the EWMA parameters and a scaled
+ * up internal representation of the average value to prevent rounding errors.
+ * The factor for scaling up and the exponential weight (or decay rate) have to
+ * be specified thru the init fuction. The structure should not be accessed
+ * directly but only thru the helper functions.
+ */
+
+/**
+ * ewma_init() - Initialize EWMA parameters
+ * @avg: Average structure
+ * @factor: Factor to use for the scaled up internal value. The maximum value
+ * of averages can be ULONG_MAX/(factor*weight).
+ * @weight: Exponential weight, or decay rate. This defines how fast the
+ * influence of older values decreases. Has to be bigger than 1.
+ *
+ * Initialize the EWMA parameters for a given struct ewma @avg.
+ */
+void ewma_init(struct ewma *avg, unsigned long factor, unsigned long weight)
+{
+ WARN_ON(weight <= 1 || factor == 0);
+ avg->internal = 0;
+ avg->weight = weight;
+ avg->factor = factor;
+}
+EXPORT_SYMBOL(ewma_init);
+
+/**
+ * ewma_add() - Exponentially weighted moving average (EWMA)
+ * @avg: Average structure
+ * @val: Current value
+ *
+ * Add a sample to the average.
+ */
+struct ewma *ewma_add(struct ewma *avg, unsigned long val)
+{
+ avg->internal = avg->internal ?
+ (((avg->internal * (avg->weight - 1)) +
+ (val * avg->factor)) / avg->weight) :
+ (val * avg->factor);
+ return avg;
+}
+EXPORT_SYMBOL(ewma_add);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Signed-off-by: Bruno Randolf <b...@einfach.org>
---
drivers/net/wireless/ath/ath5k/Kconfig | 1 +
drivers/net/wireless/ath/ath5k/ani.c | 4 ++--
drivers/net/wireless/ath/ath5k/ath5k.h | 26 ++------------------------
drivers/net/wireless/ath/ath5k/base.c | 4 ++--
drivers/net/wireless/ath/ath5k/debug.c | 2 +-
5 files changed, 8 insertions(+), 29 deletions(-)
diff --git a/drivers/net/wireless/ath/ath5k/Kconfig b/drivers/net/wireless/ath/ath5k/Kconfig
index eb83b7b..4784457 100644
--- a/drivers/net/wireless/ath/ath5k/Kconfig
+++ b/drivers/net/wireless/ath/ath5k/Kconfig
@@ -4,6 +4,7 @@ config ATH5K
select MAC80211_LEDS
select LEDS_CLASS
select NEW_LEDS
+ select AVERAGE
---help---
This module adds support for wireless adapters based on
Atheros 5xxx chipset.
diff --git a/drivers/net/wireless/ath/ath5k/ani.c b/drivers/net/wireless/ath/ath5k/ani.c
index f141919..545e489 100644
--- a/drivers/net/wireless/ath/ath5k/ani.c
+++ b/drivers/net/wireless/ath/ath5k/ani.c
@@ -216,7 +216,7 @@ static void
ath5k_ani_raise_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as,
bool ofdm_trigger)
{
- int rssi = ah->ah_beacon_rssi_avg.avg;
+ int rssi = ewma_get(&ah->ah_beacon_rssi_avg);
ATH5K_DBG_UNLIMIT(ah->ah_sc, ATH5K_DEBUG_ANI, "raise immunity (%s)",
ofdm_trigger ? "ODFM" : "CCK");
@@ -301,7 +301,7 @@ ath5k_ani_raise_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as,
static void
ath5k_ani_lower_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as)
{
- int rssi = ah->ah_beacon_rssi_avg.avg;
+ int rssi = ewma_get(&ah->ah_beacon_rssi_avg);
ATH5K_DBG_UNLIMIT(ah->ah_sc, ATH5K_DEBUG_ANI, "lower immunity");
diff --git a/drivers/net/wireless/ath/ath5k/ath5k.h b/drivers/net/wireless/ath/ath5k/ath5k.h
index 308b79e..2718136 100644
--- a/drivers/net/wireless/ath/ath5k/ath5k.h
+++ b/drivers/net/wireless/ath/ath5k/ath5k.h
@@ -25,6 +25,7 @@
#include <linux/io.h>
#include <linux/types.h>
+#include <linux/average.h>
#include <net/mac80211.h>
/* RX/TX descriptor hw structs
@@ -1102,7 +1103,7 @@ struct ath5k_hw {
struct ath5k_nfcal_hist ah_nfcal_hist;
/* average beacon RSSI in our BSS (used by ANI) */
- struct ath5k_avg_val ah_beacon_rssi_avg;
+ struct ewma ah_beacon_rssi_avg;
/* noise floor from last periodic calibration */
s32 ah_noise_floor;
@@ -1315,27 +1316,4 @@ static inline u32 ath5k_hw_bitswap(u32 val, unsigned int bits)
return retval;
}
-#define AVG_SAMPLES 8
-#define AVG_FACTOR 1000
-
-/**
- * ath5k_moving_average - Exponentially weighted moving average
- * @avg: average structure
- * @val: current value
- *
- * This implementation make use of a struct ath5k_avg_val to prevent rounding
- * errors.
- */
-static inline struct ath5k_avg_val
-ath5k_moving_average(const struct ath5k_avg_val avg, const int val)
-{
- struct ath5k_avg_val new;
- new.avg_weight = avg.avg_weight ?
- (((avg.avg_weight * ((AVG_SAMPLES) - 1)) +
- (val * (AVG_FACTOR))) / (AVG_SAMPLES)) :
- (val * (AVG_FACTOR));
- new.avg = new.avg_weight / (AVG_FACTOR);
- return new;
-}
-
#endif
diff --git a/drivers/net/wireless/ath/ath5k/base.c b/drivers/net/wireless/ath/ath5k/base.c
index 4d21202..ae069b7 100644
--- a/drivers/net/wireless/ath/ath5k/base.c
+++ b/drivers/net/wireless/ath/ath5k/base.c
@@ -1307,8 +1307,7 @@ ath5k_update_beacon_rssi(struct ath5k_softc *sc, struct sk_buff *skb, int rssi)
memcmp(mgmt->bssid, common->curbssid, ETH_ALEN) != 0)
return;
- ah->ah_beacon_rssi_avg = ath5k_moving_average(ah->ah_beacon_rssi_avg,
- rssi);
+ ewma_add(&ah->ah_beacon_rssi_avg, rssi);
/* in IBSS mode we should keep RSSI statistics per neighbour */
/* le16_to_cpu(mgmt->u.beacon.capab_info) & WLAN_CAPABILITY_IBSS */
@@ -2562,6 +2561,7 @@ ath5k_reset(struct ath5k_softc *sc, struct ieee80211_channel *chan)
ah->ah_cal_next_full = jiffies;
ah->ah_cal_next_ani = jiffies;
ah->ah_cal_next_nf = jiffies;
+ ewma_init(&ah->ah_beacon_rssi_avg, 1000, 8);
/*
* Change channels and update the h/w rate map if we're switching;
diff --git a/drivers/net/wireless/ath/ath5k/debug.c b/drivers/net/wireless/ath/ath5k/debug.c
index 54dcf77..043cb15 100644
--- a/drivers/net/wireless/ath/ath5k/debug.c
+++ b/drivers/net/wireless/ath/ath5k/debug.c
@@ -719,7 +719,7 @@ static ssize_t read_file_ani(struct file *file, char __user *user_buf,
st->mib_intr);
len += snprintf(buf+len, sizeof(buf)-len,
"beacon RSSI average:\t%d\n",
- sc->ah->ah_beacon_rssi_avg.avg);
+ (int)ewma_get(&sc->ah->ah_beacon_rssi_avg));
#define CC_PRINT(_struct, _field) \
_struct._field, \
This uses the recently added generic EWMA library function.
Signed-off-by: Bruno Randolf <b...@einfach.org>
---
include/linux/nl80211.h | 2 ++
include/net/cfg80211.h | 4 ++++
net/mac80211/Kconfig | 1 +
net/mac80211/cfg.c | 3 ++-
net/mac80211/rx.c | 1 +
net/mac80211/sta_info.c | 2 ++
net/mac80211/sta_info.h | 3 +++
net/wireless/nl80211.c | 3 +++
8 files changed, 18 insertions(+), 1 deletions(-)
diff --git a/include/linux/nl80211.h b/include/linux/nl80211.h
index fb877b5..0ceb552 100644
--- a/include/linux/nl80211.h
+++ b/include/linux/nl80211.h
@@ -1132,6 +1132,7 @@ enum nl80211_rate_info {
* @__NL80211_STA_INFO_AFTER_LAST: internal
* @NL80211_STA_INFO_MAX: highest possible station info attribute
* @NL80211_STA_INFO_SIGNAL: signal strength of last received PPDU (u8, dBm)
+ * @NL80211_STA_INFO_SIGNAL_AVG: signal strength average (u8, dBm)
* @NL80211_STA_INFO_TX_BITRATE: current unicast tx rate, nested attribute
* containing info as possible, see &enum nl80211_sta_info_txrate.
* @NL80211_STA_INFO_RX_PACKETS: total received packet (u32, from this station)
@@ -1149,6 +1150,7 @@ enum nl80211_sta_info {
NL80211_STA_INFO_PLID,
NL80211_STA_INFO_PLINK_STATE,
NL80211_STA_INFO_SIGNAL,
+ NL80211_STA_INFO_SIGNAL_AVG,
NL80211_STA_INFO_TX_BITRATE,
NL80211_STA_INFO_RX_PACKETS,
NL80211_STA_INFO_TX_PACKETS,
diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index e5702f5..f21f701 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -424,6 +424,7 @@ struct station_parameters {
* @STATION_INFO_TX_RETRIES: @tx_retries filled
* @STATION_INFO_TX_FAILED: @tx_failed filled
* @STATION_INFO_RX_DROP_MISC: @rx_dropped_misc filled
+ * @STATION_INFO_SIGNAL_AVG: @signal_avg filled
*/
enum station_info_flags {
STATION_INFO_INACTIVE_TIME = 1<<0,
@@ -439,6 +440,7 @@ enum station_info_flags {
STATION_INFO_TX_RETRIES = 1<<10,
STATION_INFO_TX_FAILED = 1<<11,
STATION_INFO_RX_DROP_MISC = 1<<12,
+ STATION_INFO_SIGNAL_AVG = 1<<13,
};
/**
@@ -485,6 +487,7 @@ struct rate_info {
* @plid: mesh peer link id
* @plink_state: mesh peer link state
* @signal: signal strength of last received packet in dBm
+ * @signal_avg: signal strength average in dBm
* @txrate: current unicast bitrate to this station
* @rx_packets: packets received from this station
* @tx_packets: packets transmitted to this station
@@ -505,6 +508,7 @@ struct station_info {
u16 plid;
u8 plink_state;
s8 signal;
+ s8 signal_avg;
struct rate_info txrate;
u32 rx_packets;
u32 tx_packets;
diff --git a/net/mac80211/Kconfig b/net/mac80211/Kconfig
index 4d6f865..798d9b9 100644
--- a/net/mac80211/Kconfig
+++ b/net/mac80211/Kconfig
@@ -6,6 +6,7 @@ config MAC80211
select CRYPTO_ARC4
select CRYPTO_AES
select CRC32
+ select AVERAGE
---help---
This option enables the hardware independent IEEE 802.11
networking stack.
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 18bd0e5..d327918 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -343,8 +343,9 @@ static void sta_set_sinfo(struct sta_info *sta, struct station_info *sinfo)
if ((sta->local->hw.flags & IEEE80211_HW_SIGNAL_DBM) ||
(sta->local->hw.flags & IEEE80211_HW_SIGNAL_UNSPEC)) {
- sinfo->filled |= STATION_INFO_SIGNAL;
+ sinfo->filled |= STATION_INFO_SIGNAL | STATION_INFO_SIGNAL_AVG;
sinfo->signal = (s8)sta->last_signal;
+ sinfo->signal_avg = (s8) -ewma_get(&sta->avg_signal);
}
sinfo->txrate.flags = 0;
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 902b03e..34bcc84 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1158,6 +1158,7 @@ ieee80211_rx_h_sta_process(struct ieee80211_rx_data *rx)
sta->rx_fragments++;
sta->rx_bytes += rx->skb->len;
sta->last_signal = status->signal;
+ ewma_add(&sta->avg_signal, -status->signal);
/*
* Change STA power saving mode only at the end of a frame
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 6d8f897..bed23e1 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -241,6 +241,8 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
sta->local = local;
sta->sdata = sdata;
+ ewma_init(&sta->avg_signal, 1000, 8);
+
if (sta_prepare_rate_control(local, sta, gfp)) {
kfree(sta);
return NULL;
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 9265aca..84062e2 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -13,6 +13,7 @@
#include <linux/types.h>
#include <linux/if_ether.h>
#include <linux/workqueue.h>
+#include <linux/average.h>
#include "key.h"
/**
@@ -224,6 +225,7 @@ enum plink_state {
* @rx_fragments: number of received MPDUs
* @rx_dropped: number of dropped MPDUs from this STA
* @last_signal: signal of last received frame from this STA
+ * @avg_signal: moving average of signal of received frames from this STA
* @last_seq_ctrl: last received seq/frag number from this STA (per RX queue)
* @tx_filtered_count: number of frames the hardware filtered for this STA
* @tx_retry_failed: number of frames that failed retry
@@ -291,6 +293,7 @@ struct sta_info {
unsigned long rx_fragments;
unsigned long rx_dropped;
int last_signal;
+ struct ewma avg_signal;
__le16 last_seq_ctrl[NUM_RX_DATA_QUEUES];
/* Updated from TX status path only, no locking requirements */
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 4e78e3f..660987d 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -1841,6 +1841,9 @@ static int nl80211_send_station(struct sk_buff *msg, u32 pid, u32 seq,
if (sinfo->filled & STATION_INFO_SIGNAL)
NLA_PUT_U8(msg, NL80211_STA_INFO_SIGNAL,
sinfo->signal);
+ if (sinfo->filled & STATION_INFO_SIGNAL_AVG)
+ NLA_PUT_U8(msg, NL80211_STA_INFO_SIGNAL_AVG,
+ sinfo->signal_avg);
if (sinfo->filled & STATION_INFO_TX_BITRATE) {
txrate = nla_nest_start(msg, NL80211_STA_INFO_TX_BITRATE);
if (!txrate)
Signed-off-by: Bruno Randolf <b...@einfach.org>
---
drivers/net/wireless/ath/ath5k/Kconfig | 1 +
drivers/net/wireless/ath/ath5k/ani.c | 4 ++--
drivers/net/wireless/ath/ath5k/ath5k.h | 26 ++------------------------
drivers/net/wireless/ath/ath5k/base.c | 4 ++--
drivers/net/wireless/ath/ath5k/debug.c | 2 +-
5 files changed, 8 insertions(+), 29 deletions(-)
diff --git a/drivers/net/wireless/ath/ath5k/Kconfig b/drivers/net/wireless/ath/ath5k/Kconfig
index eb83b7b..4784457 100644
--- a/drivers/net/wireless/ath/ath5k/Kconfig
+++ b/drivers/net/wireless/ath/ath5k/Kconfig
@@ -4,6 +4,7 @@ config ATH5K
select MAC80211_LEDS
select LEDS_CLASS
select NEW_LEDS
+ select AVERAGE
---help---
This module adds support for wireless adapters based on
Atheros 5xxx chipset.
diff --git a/drivers/net/wireless/ath/ath5k/ani.c b/drivers/net/wireless/ath/ath5k/ani.c
index f141919..db98a85 100644
--- a/drivers/net/wireless/ath/ath5k/ani.c
+++ b/drivers/net/wireless/ath/ath5k/ani.c
@@ -216,7 +216,7 @@ static void
ath5k_ani_raise_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as,
bool ofdm_trigger)
{
- int rssi = ah->ah_beacon_rssi_avg.avg;
+ int rssi = ewma_read(&ah->ah_beacon_rssi_avg);
ATH5K_DBG_UNLIMIT(ah->ah_sc, ATH5K_DEBUG_ANI, "raise immunity (%s)",
ofdm_trigger ? "ODFM" : "CCK");
@@ -301,7 +301,7 @@ ath5k_ani_raise_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as,
static void
ath5k_ani_lower_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as)
{
- int rssi = ah->ah_beacon_rssi_avg.avg;
+ int rssi = ewma_read(&ah->ah_beacon_rssi_avg);
index 54dcf77..7d785cb 100644
--- a/drivers/net/wireless/ath/ath5k/debug.c
+++ b/drivers/net/wireless/ath/ath5k/debug.c
@@ -719,7 +719,7 @@ static ssize_t read_file_ani(struct file *file, char __user *user_buf,
st->mib_intr);
len += snprintf(buf+len, sizeof(buf)-len,
"beacon RSSI average:\t%d\n",
- sc->ah->ah_beacon_rssi_avg.avg);
+ (int)ewma_read(&sc->ah->ah_beacon_rssi_avg));
#define CC_PRINT(_struct, _field) \
_struct._field, \
--
This uses the recently added generic EWMA library function.
Signed-off-by: Bruno Randolf <b...@einfach.org>
---
+ select AVERAGE
---help---
This option enables the hardware independent IEEE 802.11
networking stack.
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 18bd0e5..0b74c05 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -343,8 +343,9 @@ static void sta_set_sinfo(struct sta_info *sta, struct station_info *sinfo)
if ((sta->local->hw.flags & IEEE80211_HW_SIGNAL_DBM) ||
(sta->local->hw.flags & IEEE80211_HW_SIGNAL_UNSPEC)) {
- sinfo->filled |= STATION_INFO_SIGNAL;
+ sinfo->filled |= STATION_INFO_SIGNAL | STATION_INFO_SIGNAL_AVG;
sinfo->signal = (s8)sta->last_signal;
+ sinfo->signal_avg = (s8) -ewma_read(&sta->avg_signal);
--
Isn't that generic EWMA library function making this much more CPU
intensive than necessary? I would assume the compiler is not able to
optimize the unsigned long division in ewma_add() when the weight value
is "hidden" in this way through ewma_init() call. While generic library
functions can be nice, it could be useful to have an option for using an
inline version of ewma_add that takes in avg->weight as one of the
arguments to allow cases like weight=8 here to be implemented using
shift right instead of unsigned long division especially when done on
hot path..
> @@ -1158,6 +1158,7 @@ ieee80211_rx_h_sta_process(struct ieee80211_rx_data *rx)
> sta->last_signal = status->signal;
> + ewma_add(&sta->avg_signal, -status->signal);
This one here would be done for every frame received and by using a
function call that ends up doing a potentially expensive division.
> @@ -241,6 +241,8 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
> + ewma_init(&sta->avg_signal, 1000, 8);
The divisor (weight) is set to 8 here which should allow for quite a bit
more efficient ewma_add implementation if it were to be done in a way
that makes it easier for the compiler to see what value is being used
(e.g., passing this 8 to ewma_add_inline()) to allow shift right to be
used..
Or am I missing something here and we have a compiler that is actually
able to take care of this kind of optimizations by itself in the current
ewma library design? Or is the unsigned long division with the expected
weight values going to be fast enough to not justify caring about it
even on any of the embedded boards anyway?
--
Jouni Malinen PGP id EFC895FA
I understand that this could be more efficient, but if it matters or not -
honestly, I don't know. Can a more knowledgeable person than me comment on
this?
bruno
> > Or am I missing something here and we have a compiler that is actually
> > able to take care of this kind of optimizations by itself in the current
> > ewma library design? Or is the unsigned long division with the expected
> > weight values going to be fast enough to not justify caring about it
> > even on any of the embedded boards anyway?
>
> I understand that this could be more efficient, but if it matters or not -
> honestly, I don't know. Can a more knowledgeable person than me comment on
> this?
Yes, it does matter -- think of an embedded MIPS board running at 500MHz
and trying to push 11n speeds.
johannes
I assume the number of samples (weight) is the more
important tunable. One option is you can require factor
to be a power of two that is much larger than weight,
then at least you can store factor/weight precomputed
and multiply by it instead of doing a divide in ewma_add.
Then ewma_get can also just be a shift as well.
--
Bob Copeland %% www.bobcopeland.com
OK, I have done tests with a 266MHz x86 board, pushing 30Mbps UDP thru it. It
can handle maximum around 18Mbps thuput in both cases, with and without the
average. And that is using the ewma function two times (ath5k beacons and
signal average). Here are the results of iperf -u -c IP -b 30M -t 120 (2min of
30Mbps UDP):
With average:
0.0-120.4 sec 269 MBytes 18.7 Mbits/sec 0.370 ms 114323/306094 (37%)
0.0-120.4 sec 263 MBytes 18.3 Mbits/sec 0.336 ms 118668/306110 (39%)
0.0-120.4 sec 265 MBytes 18.5 Mbits/sec 0.229 ms 117036/306120 (38%)
Without:
0.0-120.4 sec 267 MBytes 18.6 Mbits/sec 0.385 ms 115457/306123 (38%)
0.0-120.4 sec 267 MBytes 18.6 Mbits/sec 0.374 ms 115868/306063 (38%)
0.0-120.4 sec 261 MBytes 18.2 Mbits/sec 0.566 ms 119603/306112 (39%)
So I conclude that compared to all the other things we do when we receive a
packet having one division more or less does not have a huge performance
impact.
> I assume the number of samples (weight) is the more
> important tunable. One option is you can require factor
> to be a power of two that is much larger than weight,
> then at least you can store factor/weight precomputed
> and multiply by it instead of doing a divide in ewma_add.
> Then ewma_get can also just be a shift as well.
That's a good idea! I will try to improve the code.
bruno
Hmm, maybe I suck in mathemathics, but I don't see a way to do that given the
formula:
(((internal * (weight - 1)) + (val * factor)) / weight
bruno
> Hmm, maybe I suck in mathemathics, but I don't see a way to do that given the
> formula:
>
> (((internal * (weight - 1)) + (val * factor)) / weight
If you assume weight == 2^n, you can write that as:
(((internal << n) - internal) + (val * factor)) >> n
You only showed that CPU utilization is not so high that an effect on
network throughput can be seen. You did not measure what the increase
in CPU utilization actually was. :-)
--
Stefan Richter
-=====-==-=- =-== =--==
http://arcgraph.de/sr/
> OK, I have done tests with a 266MHz x86 board, pushing 30Mbps UDP thru it. It
> can handle maximum around 18Mbps thuput in both cases, with and without the
That sounds like you're limited by 11g, not by the CPU ...
johannes
John, is there any reason you completely ignored Jouni's feedback here?
This wasn't addressed in v8.
johannes
Sure, x/y := x/z * z/y, and by picking z := 2^n, we can pre-compute z/y
and write x/z using a shift. The problem however is always range vs
granularity, you chose to first /z and then *z/y, this avoids some
overflow issues but truncates the lower n bits of x.
If you first *z/y and then /z you keep your low bits but risk loosing
the top bits to an overflow.
I guess the question is do we really need weights outside of 2^n? If
not, you can use the weight := 2^n version. If you do, you get to pick
either of the previously mentioned options.
Sadly gcc doesn't sanely support a u128 type, which would be very useful
to avoid some of these overflow issues (like we used to use u64 mults
for u32 fixed points mults).
Nope. I forgot to mention that I was testing in A mode and I expect 30Mbps
thruput here (I get that consistently on other boards). Also I see my CPU is
at 100% with top.
bruno
True. That's exactly the point: "CPU utilization is not so high that an effect
on network throughput can be seen", which was what people were worrying about.
So IMHO: No problem! :) But of course it makes sense to optimize and I'll try
to do that in subsequent patches.
bruno
No, people worried about CPU utilization that can be reduced.
> So IMHO: No problem! :) But of course it makes sense to optimize and I'll try
> to do that in subsequent patches.
Good.
--
Stefan Richter
-=====-==-=- =-== =-==-
http://arcgraph.de/sr/
Well, I confess that I didn't fully appreciate Jouni's point.
Is there any reason to presume that Bruno cannot (or will not) improve
the performance from here? Do you have any measurement of the actual
performance impact?
John
--
John W. Linville Someday the world will need a hero, and you
linv...@tuxdriver.com might be all we have. Be ready.
It's been pointed-out that this breaks kernel->userland ABI.
That could be fixed rather easily, but due to the performance concerns
I'm going to just revert this one for now. Please correct the ABI
issue when submitting a new version that addresses the performance
concerns raised by Jouni.
John
--
John W. Linville Someday the world will need a hero, and you
linv...@tuxdriver.com might be all we have. Be ready.
Thank you all for your help and sorry for following up so late!
I think we don't really need weights outside of 2^n and i'm going to post a
patch based on Peter Zijlstra's formula. Thanks again! Would it make sense to
have the factor 2^n too, so we can bitshift there too?
bruno