Discussion about time-zone issues on get_data or get_processed_data (NC State & Strata Solar)

Yiyan Li

Sep 8, 2020, 6:34:07 PM

to pvlib-python

Hi everyone,

This is Yiyan Li, a postdoctoral researcher at NC State University working on power systems and data analytics. Happy to join this group!

I have recently been working with Strata Solar, using pvlib for short-term PV power forecasting. When I fetch weather forecasts (say, from RAP) with the get_processed_data function, the returned weather data does not match the time index; see the following figure:

Before calling get_processed_data, I specified our time zone (US/Eastern) and the latitude and longitude. You can see that the temperature is abnormally high late in the afternoon: the highest temperature usually appears at 5-6 pm every day, which seems unreasonable.

I then compared the returned temperature with the field-measured temperature downloaded from NOAA for the same location; see the following:

There is a 4-5 hour mismatch between the weather data returned from RAP and the field measurements from NOAA. It seems that the data returned from RAP is in UTC rather than the specified US/Eastern time, which causes the mismatch.
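A 4-5 hour gap is what one would expect if UTC data were read as US/Eastern. The following is a minimal pandas sketch, not pvlib code, that only illustrates the size of the offset; the dates are arbitrary examples.

```python
import pandas as pd

# The same instants expressed in UTC (arbitrary example dates).
summer_utc = pd.Timestamp("2020-09-08 18:00", tz="UTC")
winter_utc = pd.Timestamp("2020-01-08 18:00", tz="UTC")

# Convert to local time; the instant is unchanged, only the wall clock differs.
summer_local = summer_utc.tz_convert("US/Eastern")
winter_local = winter_utc.tz_convert("US/Eastern")

# UTC offset in hours: -4 during daylight saving time, -5 otherwise.
summer_offset_hours = summer_local.utcoffset().total_seconds() / 3600
winter_offset_hours = winter_local.utcoffset().total_seconds() / 3600

print(summer_local, summer_offset_hours)
print(winter_local, winter_offset_hours)
```

So a value stamped 18:00 that is really on the UTC clock corresponds to 14:00 local time in September, which would move a "6 pm" temperature peak back to mid-afternoon.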

Is my understanding wrong, or is this a bug in the get_data function? A more serious problem is that if the returned weather forecasts (especially cloud cover) are misaligned with the time index, the power forecasting result will be significantly affected.

Looking forward to your reply!

Yiyan

William Holmgren

Sep 8, 2020, 7:17:30 PM

to Yiyan Li, pvlib-python

The screenshot of the DataFrame looks pretty reasonable to me. Maximum forecast GHI is at midday (11:00 local, when cloud cover is 0). A temperature maximum around 5 pm is not unreasonable in summer. https://www.wrh.noaa.gov/mesowest/getobext.php?sid=KRDU&num=72&raw=0#

As for the time series plot, are you sure you're handling both the forecast and the observation time zones consistently?
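One way to check this is to put both series on a common time zone before plotting or joining them. The sketch below uses plain pandas with made-up temperatures; the series names and values are assumptions for illustration, not data from this thread.

```python
import pandas as pd

one_hour = pd.Timedelta(hours=1)

# Made-up forecast values, indexed in local time (US/Eastern).
forecast = pd.Series(
    [20.0, 22.0, 24.0],
    index=pd.date_range("2020-09-08 12:00", periods=3, freq=one_hour,
                        tz="US/Eastern"),
    name="forecast_temp",
)

# Made-up observations covering the same instants, indexed in UTC.
observed = pd.Series(
    [19.5, 21.5, 23.5],
    index=pd.date_range("2020-09-08 16:00", periods=3, freq=one_hour,
                        tz="UTC"),
    name="observed_temp",
)

# Convert both to UTC so rows are matched by instant, not by wall-clock label.
aligned = pd.concat(
    [forecast.tz_convert("UTC"), observed.tz_convert("UTC")], axis=1
).dropna()
print(aligned)
```

If either series is time-zone naive, it has to be localized with tz_localize to the zone it was actually recorded in before this comparison means anything.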

Will

Bobby Heyer

Sep 9, 2020, 1:47:14 AM

to William Holmgren, Yiyan Li, pvlib-python

Hi Yiyan

I actually had the same issue. My workaround was to fetch only the raw data, convert the time zone manually, and then run the process_data function.

import pytz
from pvlib.forecast import GFS

# Get GFS data (latitude, longitude, start and end are defined elsewhere)
model = GFS(resolution='Half')
raw_data = model.get_data(latitude, longitude, start, end)

# Adjust time zone: convert the index to local time
tz_fix = pytz.timezone('Australia/Queensland')
raw_data.index = raw_data.index.tz_convert(tz_fix)

# Process fields - pvlib method that cleans up GFS data into a standard usable format
data = model.process_data(raw_data)

Bobby Heyer

Sep 9, 2020, 1:57:41 AM

to William Holmgren, Yiyan Li, pvlib-python

Hi Yiyan

I had the same problem. My workaround was to work in UTC and convert the time zone manually afterwards.

from datetime import date

import pandas as pd
import pytz
from pvlib.forecast import GFS

# Time zone - GFS data is in UTC, therefore request it in UTC
tz = 'UTC'

# Forecast period
start = pd.Timestamp(date.today(), tz=tz)
end = start + pd.Timedelta(days=7)

# Get GFS data (latitude and longitude are defined elsewhere)
model = GFS(resolution='Half')
raw_data = model.get_data(latitude, longitude, start, end)

# Adjust time zone: convert the UTC index to local time
tz_fix = pytz.timezone('Australia/Queensland')
raw_data.index = raw_data.index.tz_convert(tz_fix)

# Process fields - pvlib method that cleans up GFS data into a standard usable format
data = model.process_data(raw_data)

Yiyan Li

Sep 9, 2020, 8:35:19 AM

to Bobby Heyer, William Holmgren, pvlib-...@googlegroups.com

Hi Bobby and William,

Thanks for the reply!

Hi Bobby, I'm currently doing the same thing as you: retrieving the raw data first, correcting the time zone manually, and then processing the corrected data.

I think this is a fairly big issue in pvlib that will significantly affect forecasting accuracy, especially when the cloud cover forecast is misaligned. It would be great if we could confirm that the issue exists and then find out where the problem is.

Another piece of evidence for this misalignment is that, no matter which time zone you specify in the code, RAP (or any other data source) always returns the same forecast; see the attached figure.

In the upper part of the figure I specified the time zone as US/Eastern, and in the lower part as UTC. You can see that they return exactly the same data. This shouldn't happen, because 6:00 Eastern and 6:00 UTC are not the same instant.

A possible explanation is that the RAP database (and others such as HRRR and GFS) does not recognize the time zone information and always returns forecasts on UTC time.
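If that explanation is right, the index carries a US/Eastern label while the clock values are actually UTC. A minimal pandas sketch of relabeling such an index follows; the mislabeled series is constructed by hand with made-up temperatures, and whether pvlib output really has this form is an assumption, not something confirmed here.

```python
import pandas as pd

# Hypothetical mislabeled data: UTC wall-clock times tagged as US/Eastern.
mislabeled_index = pd.date_range(
    "2020-09-08 00:00", periods=4, freq=pd.Timedelta(hours=6), tz="US/Eastern"
)
temps = pd.Series([22.0, 20.0, 27.0, 30.0], index=mislabeled_index)

# Drop the wrong label (keeps the wall-clock values), then tag them as UTC.
relabeled_utc = temps.tz_localize(None).tz_localize("UTC")

# Only now convert to local time; this shifts the wall clock by the UTC offset.
corrected = relabeled_utc.tz_convert("US/Eastern")
print(corrected)
```

Calling tz_convert directly on the mislabeled index would not help, because it preserves the (wrong) instants and only changes how they are displayed. Here the value labeled 18:00 ends up at 14:00 local time after relabeling.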

Thanks again for your quick reply!

Yiyan

--

Yiyan Li, Ph.D.

Postdoc Researcher
NSF FREEDM Systems Center
North Carolina State University
Raleigh, NC, 27607

Tel: 919.903.1199

William Holmgren

Sep 9, 2020, 11:50:28 AM

to Yiyan Li, Bobby Heyer, pvlib-python

Looks like the problem is reproduced in the documentation: https://pvlib-python.readthedocs.io/en/v0.8.0/forecasts.html
