discussion about time-zone issues on get_data or get_processed_data (NC state & Strata Solar)


Yiyan Li

Sep 8, 2020, 6:34:07 PM
to pvlib-python
Hi everyone,

This is Yiyan Li, a postdoc researcher at NC State University, specializing in power systems and data analytics. Happy to join this group!

I have recently been working with Strata Solar, using pvlib for short-term PV power forecasting. When I fetch weather forecasts (say, from RAP) with the get_processed_data function, the returned weather data does not match the time index, as shown in the following figure:

get_processed_data.png

Before calling get_processed_data, I specified our time zone (US/Eastern) along with the longitude and latitude. You can see that the temperature is abnormally high in the afternoon. In fact, the highest temperature usually appears at 5-6 pm every day, which is unreasonable.

Then I compared the returned temperature with the field-measured temperature downloaded from NOAA for the same location, as shown in the following figure:
durham.png
There is a 4-5 hour mismatch between the weather data returned from RAP and the field measurement from NOAA. It seems that the RAP data is returned in UTC instead of the specified US/Eastern time, which would explain the mismatch.
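
A minimal sketch of how a shift of this size can arise, using pandas only. The temperature values are hypothetical and are not taken from RAP or NOAA; the point is just that a peak at 14:00 US/Eastern carries an 18:00 label once the same instant is expressed in UTC.

```python
import pandas as pd

# Hypothetical hourly temperatures for one local day, peaking at 14:00 local time.
local_index = pd.date_range("2020-09-08 00:00", periods=24, freq="h", tz="US/Eastern")
temps = pd.Series([20 - abs(h - 14) * 0.5 for h in range(24)], index=local_index)

# The instant of the daily maximum, first as local wall-clock time, then as UTC.
peak_local = temps.idxmax()
peak_as_utc = peak_local.tz_convert("UTC")

print(peak_local)
print(peak_as_utc)
```

If UTC labels are read as if they were local time, the whole curve appears to move later by the UTC offset (4 hours in September for US/Eastern).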

Is there anything wrong with my understanding, or is this a bug in the get_data function? A more serious problem is that if the returned weather forecasts (especially the cloud coverage) are misaligned with the time index, the power forecasting result will be significantly affected.

Looking forward to your reply!

Yiyan


William Holmgren

Sep 8, 2020, 7:17:30 PM
to Yiyan Li, pvlib-python
The screenshot of the DataFrame looks pretty reasonable to me. Maximum forecast GHI is midday (11:00 local, when cloud cover is 0). Temperature maximum around 5 pm is not unreasonable in summer. https://www.wrh.noaa.gov/mesowest/getobext.php?sid=KRDU&num=72&raw=0#

As for the time series plot, are you sure you're handling both the forecast and the observation time zones consistently?
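
One way to make that check explicit, sketched with pandas and made-up data: give the observation index its time zone, convert it to the forecast's zone, and only then join the two. The series names and values here are hypothetical, and the assumption that the observations carry naive US/Eastern time stamps may not match the actual NOAA download.

```python
import pandas as pd

# Hypothetical forecast, indexed in UTC.
fcst_index = pd.date_range("2020-09-08 00:00", periods=24, freq="h", tz="UTC")
forecast = pd.Series(range(24), index=fcst_index, name="forecast")

# Hypothetical observations with naive time stamps, assumed to be US/Eastern.
obs_index = pd.date_range("2020-09-07 20:00", periods=24, freq="h")
observed = pd.Series(range(24), index=obs_index, name="observed")

# Attach the local zone to the naive stamps, then express them in UTC.
observed_utc = observed.tz_localize("US/Eastern").tz_convert("UTC")

# Both series now share one zone, so rows line up by absolute instant.
aligned = pd.concat([forecast, observed_utc], axis=1)
print(aligned.head())
```

Plotting the naive `observed` series directly against `forecast` would instead compare wall-clock labels from two different zones.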

Will



Bobby Heyer

Sep 9, 2020, 1:47:14 AM
to William Holmgren, Yiyan Li, pvlib-python
Hi Yiyan

I actually had the same issue. My workaround was to extract the raw data only, then manually do a time zone conversion before running the process_data function.
# Imports for this snippet; latitude, longitude, start and end are defined elsewhere
import pytz
from pvlib.forecast import GFS

# Get GFS Data
model = GFS(resolution='Half')
raw_data = model.get_data(latitude, longitude, start, end)

# Adjust Timezone
tz_fix = pytz.timezone('Australia/Queensland')
raw_data.index = raw_data.index.tz_convert(tz_fix)

# Process Fields - PVlib class which will clean up gfs data into a standard usable format
data = model.process_data(raw_data)

Bobby Heyer

Sep 9, 2020, 1:57:41 AM
to William Holmgren, Yiyan Li, pvlib-python
Hi Yiyan

I had the same problem. My workaround was to work in UTC, then do a manual time zone conversion afterwards.
# Imports for this snippet; latitude and longitude are defined elsewhere
from datetime import date

import pandas as pd
import pytz
from pvlib.forecast import GFS

# Timezone - GFS Data is in UTC, therefore use UTC
tz = 'UTC'

# Forecast Period
start = pd.Timestamp(date.today(), tz=tz)
end = start + pd.Timedelta(days=7)

# Get GFS Data
model = GFS(resolution='Half')
raw_data = model.get_data(latitude, longitude, start, end)

# Adjust Timezone
tz_fix = pytz.timezone('Australia/Queensland')
raw_data.index = raw_data.index.tz_convert(tz_fix)

# Process Fields - PVlib class which will clean up gfs data into a standard usable format
data = model.process_data(raw_data)
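
The index conversion in the workaround can be tried without a network call. Below is a minimal pandas-only sketch in which a small hand-made frame stands in for the output of get_data; the column name and values are hypothetical. It shows that tz_convert only relabels a tz-aware index: the instants and the data are unchanged.

```python
import pandas as pd

# Stand-in for the raw forecast frame (hypothetical values, UTC index).
index = pd.date_range("2020-09-09 00:00", periods=4, freq="3h", tz="UTC")
raw = pd.DataFrame({"temp_air": [18.0, 16.5, 15.0, 17.5]}, index=index)

# Relabel the index in local time, as in the workaround.
local = raw.copy()
local.index = local.index.tz_convert("Australia/Queensland")

# Comparing tz-aware indexes compares absolute instants, not wall-clock labels.
same_instants = bool((local.index == raw.index).all())

print(raw.index[0])
print(local.index[0])
print(same_instants)
```

Note that tz_convert requires an index that is already tz-aware; a naive index would need tz_localize first.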

Yiyan Li

Sep 9, 2020, 8:35:19 AM
to Bobby Heyer, William Holmgren, pvlib-...@googlegroups.com
Hi Bobby and William,

Thanks for the reply!

Hi Bobby, I'm currently doing the same thing as you: retrieve the raw data first, then correct the time zone manually, and then process the corrected data.

I think this is a fairly big issue in pvlib, one that can significantly affect forecasting accuracy, especially when the cloud coverage forecast is misaligned. So it would be great if we could confirm that the issue exists, and then try to find out where the problem is.

Another piece of evidence for this misalignment issue is that, no matter what time zone you specify in the code, RAP (or other data sources) will always return the same forecast, as shown in the attached figure.
图片1.png

In the upper part of this figure I specify the time zone as US/Eastern, and in the lower part as UTC. You can see that they return exactly the same data. This shouldn't happen, because 6:00 Eastern and 6:00 UTC are not the same hour.

A possible explanation for this issue is that the RAP database (and others such as HRRR and GFS) does not recognize the time zone information and always returns forecasts in UTC.
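
To put a number on that, here is a small pandas check that the same wall-clock time in the two zones refers to two different instants. The date is arbitrary and chosen only for illustration.

```python
import pandas as pd

# The same wall-clock time, 06:00, requested in two different zones.
eastern = pd.Timestamp("2020-09-09 06:00", tz="US/Eastern")
utc = pd.Timestamp("2020-09-09 06:00", tz="UTC")

# Subtracting tz-aware timestamps compares absolute instants.
gap = eastern - utc

print(gap)
print(eastern.tz_convert("UTC"))
```

On this date the gap is 4 hours, so two requests that differ only in time zone would be expected to cover windows offset by that amount.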

Thanks again for your quick reply!

Yiyan


--
Yiyan Li, Ph.D.
Postdoc Researcher
NSF FREEDM Systems Center
North Carolina State University
Raleigh, NC, 27607

William Holmgren

Sep 9, 2020, 11:50:28 AM
to Yiyan Li, Bobby Heyer, pvlib-python
Looks like the problem is reproduced in the documentation: https://pvlib-python.readthedocs.io/en/v0.8.0/forecasts.html

Pull request welcome.

Will