How to remove spikes from data?

548 views
Skip to first unread message

anc...@gmail.com

unread,
Sep 30, 2021, 6:35:47 AM9/30/21
to weewx-user
Hi there,
I need to remove some temperature spikes from my weewx.sdb database. I know this has been already asked many times, but my case is different. I need to delete spikes in which the values are climatically acceptable: so, it is not sufficient to give (temperature is intended in °C)

echo "delete from archive where outTemp < -6;" | sudo sqlite3 /home/weewx/archive/weewx.sdb

for solving the problem. In my case, the spike was about 16°C when temperature was about 20°C (see the graph below)
daytempdew.png
What I need to do is something like: DELETE if Δt = 300 AND |ΔT| > 5, where 
Δt is time difference (in seconds, I chose 300 because my archive interval is 5 minutes) and ΔT is the temperature difference in that time bin. 
In other words I am thinking about using a differential approach. It seems to me that this can be the only solution, is this possible with sqlite3?
Thank you,
Andrea

Tom Keffer

unread,
Sep 30, 2021, 8:52:41 AM9/30/21
to weewx-user
Yes, it's possible, but it's not a simple SELECT statement. For example, this query will return all rows where the temperature difference between adjacent rows is greater than 1 degree:

SELECT g.* FROM (SELECT dateTime, datetime(dateTime,'unixepoch','localtime'),  LAG(outTemp) OVER (ORDER BY dateTime) AS prev_outTemp, outTemp FROM archive) AS g WHERE ABS(g.outTemp-g.prev_outTemp) > 1;

For my own database, this returns

1192323300|2007-10-13 17:55:00|64.099998|63.0
1192382100|2007-10-14 10:15:00|51.900002|53.0
1192756800|2007-10-18 18:20:00|44.700001|46.700001
1192757100|2007-10-18 18:25:00|46.700001|49.5
1192757400|2007-10-18 18:30:00|49.5|47.400002
...

You would then need to arrange to delete those specific timestamps.

The select statement will only work with modern versions of sqlite, which have the LAG() function.




--
You received this message because you are subscribed to the Google Groups "weewx-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to weewx-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/weewx-user/32298c8c-6d93-4a80-8b9e-29fdcc181c8an%40googlegroups.com.

wfs...@gmail.com

unread,
Sep 30, 2021, 9:27:37 AM9/30/21
to weewx-user
Here's a query using the "join a table to itself" method if you don't have the LAG function.  It prints observations that are outside the average of the surrounding observations by 2+ degrees.  I get about 30 observations this year and I think it's legit.  Although I don't get spikes in my data, I do use a 15 minute interval and my station is closer to the ground than it should be.  A lot can happen in 15 minutes and my temp graph is always kinda wiggly.

Caveat Emptor on the attached query.  You could modify it to delete or update the offending rows instead of printing them out.

Walt


Query Sep30A.txt

anc...@gmail.com

unread,
Sep 30, 2021, 10:41:17 AM9/30/21
to weewx-user
Thank you Walt, this script works fine! in fact it highlighted that spike in my graph:

DT                   datetime    temp              temp_prior        temp_next         variance    d_back              d_forw
-------------------  ----------  ----------------  ----------------  ----------------  ----------  ------------------  ----------------
2021-09-30 04:15:00  1632968100  19.2970370370371  19.3363567649282  14.5003779289494  2.38        0.0393197278911508  4.79665910808766

(I added two columns d_back = |t1-t2| and t_forw = |t1-t3|)
Now, I need to delete the wrong record: in this case, it is t3 (d_forw = 4.8). How can I automatically find it and then remove? Unfortunately I don't know the sql language and I can't complete the script myself.

Thank you very much,
Andrea

wfs...@gmail.com

unread,
Sep 30, 2021, 2:11:05 PM9/30/21
to weewx-user
Well, make sure you back up your database before you try to delete anything in case this goes wrong.  I would

1. Stop weewx
2. Make a copy of the database
3. Execute the delete SQL
4. Run the original query again to see if things are OK
5. Rebuild weewx dailies
6. Start weewx

Hopefully I'm not forgetting anything.

I'm attaching SQL to delete the spiked records.  You will have to change the database name in the .open statement to the db you want to change.  You may need to run sqlite3 with sudo in order to delete records.

I thought a better solution would be to update the spiked temp with an average of the readings surrounding the spike, but I'm having trouble coming up with the SQL to do that.

Walt
Query Sep30D.txt

wfs...@gmail.com

unread,
Sep 30, 2021, 5:03:35 PM9/30/21
to weewx-user
Just a reminder, this delete query will delete all situations where the criteria is met.  Some of these situations are legit, temps can change rapidly.  You want to make sure it's going to delete only the records you want before running it.  There are probably some enhancements that can be made to the query to find more obvious spikes.  Something like +0, -10, +0 might be a spike, whereas +0, -6. -8 is not, the temp is just falling fast.

anc...@gmail.com

unread,
Oct 3, 2021, 7:48:23 AM10/3/21
to weewx-user
Exactly. Is there a way to consider only cases such as  +0, -10, +0 (i.e. spikes) and not those like  +0, -6. -8 ?
Meanwhile I thank you for your precious help.
Andrea

wfs...@gmail.com

unread,
Oct 3, 2021, 3:20:36 PM10/3/21
to weewx-user
Here's an updated query that does better checking and it can delete the rows or update the temperatures.  Test it out on a copy of your database first.  This works for sqlite.  Don't know about others.

Walt

Temperature Anomalies 1.txt

Tom Keffer

unread,
Oct 3, 2021, 9:39:09 PM10/3/21
to weewx-user
Wow! Some serious SQL fu!

Very nice.

wfs...@gmail.com

unread,
Oct 4, 2021, 9:30:16 AM10/4/21
to weewx-user
Thanks Tom!  I love Weewx and get a kick out of playing with the data.  My biggest variance here in Springfield Illinois was 4.4 degrees.
On 4/21/21 at 12:15pm i went from 49.1 to 56.3 then fell back to 54.7.  No rain that day, just clouds and sun I guess.  Breezy.

Walt

Alan Jackson

unread,
Oct 5, 2021, 9:56:53 AM10/5/21
to weewx-user
Here is a description of my adventures cleaning up the database

WindnFog

unread,
Oct 11, 2021, 8:55:23 AM10/11/21
to weewx-user
This would be out of my league to develop in Python. In my Fortran and Pascal programming days of data acquisition, we routinely used 5,7, or 9 point median filters to smooth and de-spike oceanographic data. I would think this might be a valuable addition to weewx.  

It's not a lot of code (bubble sort a small array of input data with the median value being the output). Still, where the weewx developers would put it and how users would configure which channels to filter, the width of the filter, etc., might be a bridge too far.  Just a thought.

- Paul VE1DX

Paul Dunphy

unread,
Oct 11, 2021, 9:14:07 AM10/11/21
to weewx...@googlegroups.com
FWIW, here's how I'm doing it in C on an Arduino to de-spike/smooth temperature and humidity:

#define buff_size 7  // Must be an odd number.  Should be greater than 5.  7 works well.


      >snip<

void bubble_sort(float sort_array[], int n)
  {
  int i, j;
  float  temp;
 
  for (i = 0 ; i < n - 1; i++)
    {
      for (j = 0 ; j < n - i - 1; j++)
        {
          if (sort_array[j] > sort_array[j+1])
            {
            // Swap values
            temp            = sort_array[j];
            sort_array[j]   = sort_array[j+1];
            sort_array[j+1] = temp;
            }
          }
      }
  } 


      >snip<


// Sort them. Use quick and dirty bubble sort because it's a small number of data points
bubble_sort(h_array_sort, buff_size);
bubble_sort(t_array_sort, buff_size);

// Use the median of the last "buff_size" readings for the display
median_index = buff_size / 2;

h = h_array_sort[median_index];
t = t_array_sort[median_index];


- Paul VE1DX
You received this message because you are subscribed to a topic in the Google Groups "weewx-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/weewx-user/nChGnMJLB2k/unsubscribe.
To unsubscribe from this group and all its topics, send an email to weewx-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/weewx-user/f379634b-50bc-49ec-b2a3-ae243f8c8bcfn%40googlegroups.com.

Richard Horobin

unread,
Oct 12, 2021, 6:54:45 PM10/12/21
to weewx-user
I accept that you want to remove spikes.

Do you think this process could be used to detect "warning" levels? eg tomato plants are not frost-hardy at all, so we need a warning when the temperature goes to 5C.

Many other processes require warnings at plus or minus 2.5 standard deviations, as used in Shewhart Statistical Process Control charts. I imagine this would be an optional add-on function, as it's non-core to weewx.

Stephen Hocking

unread,
Oct 12, 2021, 7:02:53 PM10/12/21
to weewx...@googlegroups.com
It does seem like an ideal use-case for a Kalman filter.




--
  "I and the public know
  what all schoolchildren learn
  Those to whom evil is done
  Do evil in return"		W.H. Auden, "September 1, 1939"

HRM Resident

unread,
Oct 12, 2021, 8:04:20 PM10/12/21
to weewx...@googlegroups.com
     Regardless of whether it’s a median filter, a Kalman filter or anything else, integrating it into weewx doesn’t seem trivial to me.  This is something Tom and the other developers would need to consider and decide the best way to proceed.  Also, how many users would take advantage of it.  There’s a time series based on the loop intervals, etc., to think of as well.

     I agree it probably would be best as an optional add-on, but again, I am not one of the coders.  Personally, I would only use such a feature to de-spike the signal strength.  I don’t have a need for an alarm, but others might.

    Lastly, I wonder if there’s already a way to trigger an alarm.  That’s a feature that I haven’t researched, if it does exist.  I threw my two cents in only because of the simplicity and robustness of a small median filter.  Getting one working with weewx and/or an alarm is out of my league.

     For what it’s worth, I too am growing tomatoes, but we’ve harvested the lot this week, escaping the frost due to an unusually mild autumn.  But gardening is off topic! :-)

- Paul VE1DX 

On Oct 12, 2021, at 8:02 PM, Stephen Hocking <stephen...@gmail.com> wrote:



vince

unread,
Oct 12, 2021, 8:10:32 PM10/12/21
to weewx-user
There's been an example alarm.py in weewx for over a decade.   Search for "weewx alarm" for some old threads.

gjr80

unread,
Oct 12, 2021, 9:17:42 PM10/12/21
to weewx-user
We've sort of been around this buoy before. Issue #3 and PR #228 refer, with the last activity seemingly closure of PR #228 back in 2018. I suspect that if someone wants to take on the work there's probably no reason a spike detector could not be added to the code base. It's just not a high priority for us at the moment. Probably some good guidance in the comments in PR #228. Though of course I am not the one you need to convince :)

Gary

Alan Jackson

unread,
Oct 12, 2021, 10:29:45 PM10/12/21
to weewx...@googlegroups.com
I have only cleaned the database in post-processing and then replaced it.

Reply all
Reply to author
Forward
0 new messages