Inaccurate model with overproduction

David Lowrimore

Feb 9, 2026, 4:40:04 PM
to pvlib-python
Hello,

We are engineering seniors working with a local energy company on our capstone project, trying to model the energy output of their 10MWAC site. We are using a test site with generation data to validate our model. We are seeing extremely high average error and are having trouble diagnosing why.

System specs:
Modules: 41472
Strings: 1536
15MWDC / 10MWAC rating

We have done the following:
- Fixed timestamp issues.
- Ensured correct # of modules and strings.
- Filtered out low generation values under 0.5MW when computing MAPE and MSE.
- Turned off MPPT (it was shutting our inverter off during the summer months, because high cell temperatures lowered the string voltage).
- Clipped power above the nameplate P_rating.
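
A minimal sketch of the filtered error metrics described in the list above. It assumes hourly measured and modeled AC power in MW and applies the 0.5 MW cutoff to the measured series, since the thread does not say which series is filtered. The function name `filtered_errors` is hypothetical. MAPE divides by measured power, so dropping low-output hours keeps it from blowing up near sunrise and sunset.

```python
def filtered_errors(measured_mw, modeled_mw, min_mw=0.5):
    """Return (MAPE in %, MSE in MW^2) over hours where measured power >= min_mw.

    Applying the cutoff to the measured series (not the modeled one) is an
    assumption made for this sketch.
    """
    pairs = [(m, p) for m, p in zip(measured_mw, modeled_mw) if m >= min_mw]
    if not pairs:
        raise ValueError("no hours pass the generation filter")
    n = len(pairs)
    mape = 100.0 * sum(abs(p - m) / m for m, p in pairs) / n
    mse = sum((p - m) ** 2 for m, p in pairs) / n
    return mape, mse


# The first hour (0.2 MW measured) is excluded by the 0.5 MW filter.
print(filtered_errors([0.2, 2.0, 4.0], [1.0, 2.2, 3.6]))
```

Note that a one-hour timestamp offset between the two series can dominate both metrics on its own, so it is worth ruling out before tuning the physical model.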

Any help would be appreciated!

Oregon_Burns_Site.rtf

cwh...@sandia.gov

Feb 9, 2026, 4:54:05 PM
to pvlib-python
The plot generated by that script looks reasonable for the inputs, at first glance.

What is the error you are observing? Is there any way you could share a part of the measured energy output?

You may want to remove your NSRDB API key when sharing code.

Cliff

David Lowrimore

Feb 9, 2026, 5:26:56 PM
to pvlib-python
Here is a sample of 5 days throughout the year of generation data, with the predicted data next to it. Here is also a screenshot of the errors we got when comparing some of the pvlib tools. Thank you for the API key reminder.
Screenshot 2026-02-09 at 2.24.28 PM.png
Sample_8760_comparison.xlsx

cwh...@sandia.gov

Feb 9, 2026, 5:54:47 PM
to pvlib-python
The data suggest a system with ~70MWdc capacity. The description above says 15MWdc. Maybe the description is of one block of a larger system.

The PVWatts inverter model caps AC output at a calculated level, pac0 = pdc0 * eta_inv_nom. pvlib's docstring doesn't tell you that; it's in the reference. So if you are trying to apply clipping at some AC limit, you could calculate the pdc0 parameter accordingly. But doing so at a high DC:AC ratio, like 1.5 in your case, affects the conversion efficiency below the clipping level. If it were me, with a DC:AC ratio on the order of 1.5, I'd leave pdc0 at the DC rated capacity and handle the clipping myself.
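
To make the two options concrete, here is a sketch that re-implements the PVWatts inverter efficiency curve in plain Python. It uses the same equation and default coefficients as `pvlib.inverter.pvwatts(pdc, pdc0, eta_inv_nom=0.96, eta_inv_ref=0.9637)`. It then compares (a) keeping pdc0 at the DC rating and clipping at the AC limit separately with (b) back-calculating pdc0 = 10 MW / eta_inv_nom so that pac0 lands on the 10 MW limit. The 370 W module rating comes from later in the thread, and the helper names `option_a` and `option_b` are hypothetical.

```python
ETA_INV_NOM = 0.96      # pvlib default nominal inverter efficiency
ETA_INV_REF = 0.9637    # pvlib default reference efficiency


def pvwatts_ac(pdc, pdc0, eta_inv_nom=ETA_INV_NOM, eta_inv_ref=ETA_INV_REF):
    """AC power from the PVWatts inverter efficiency curve (scalar version).

    Output is capped at pac0 = eta_inv_nom * pdc0 and floored at zero,
    which is the built-in clipping described above.
    """
    if pdc <= 0:
        return 0.0
    zeta = pdc / pdc0
    eta = eta_inv_nom / eta_inv_ref * (-0.0162 * zeta - 0.0059 / zeta + 0.9858)
    pac0 = eta_inv_nom * pdc0
    return max(0.0, min(pac0, eta * pdc))


DC_RATED_MW = 41_472 * 370 / 1e6   # ~15.34 MWdc, from the module count and rating
AC_LIMIT_MW = 10.0                 # AC nameplate


def option_a(pdc_mw):
    """pdc0 left at the DC rating; clip at the AC nameplate ourselves."""
    return min(pvwatts_ac(pdc_mw, DC_RATED_MW), AC_LIMIT_MW)


def option_b(pdc_mw):
    """pdc0 back-calculated so that pac0 equals the AC nameplate."""
    return pvwatts_ac(pdc_mw, AC_LIMIT_MW / ETA_INV_NOM)


for pdc in (2.0, 5.0, 14.0):
    print(f"pdc={pdc:5.1f} MW  a={option_a(pdc):.3f}  b={option_b(pdc):.3f}")
```

Both options clip at 10 MW when DC power is high. Below clipping, however, they give slightly different AC output, because the efficiency curve is evaluated at a different zeta = pdc / pdc0. That is the below-clipping effect on conversion efficiency mentioned above.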

FYI: the VMPPT_LOW, VMPPT_HIGH values are not MPPT limits (SAM describes those incorrectly). They are the low and high voltages at which the inverter was tested for CEC listing. Inverters can (and do) operate outside of that voltage range. This is why pvlib doesn't use those values.

Will Hobbs

Feb 9, 2026, 6:06:47 PM
to pvlib-python
Plotting the data in your spreadsheet, it looks to me like there may still be some timestamp issues. Your actual generation looks to be shifted a full hour earlier than the modeled output for the January day. And then the actual generation looks about an hour later in the July and August days. Consider double-checking the actual data timezone, e.g., to make sure it doesn't observe daylight saving time. 
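
The summer symptom, with measured output about an hour later than modeled in July and August, is what you would expect if the measured timestamps follow local clock time with daylight saving time while the model runs in local standard time. The sketch below uses the standard-library `zoneinfo` module to show how far apart the same wall-clock label is under the two conventions. It assumes Pacific time and needs the IANA time zone database (the `tzdata` package on systems that lack it). The function name is hypothetical.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

CLOCK_TIME = ZoneInfo("America/Los_Angeles")  # observes daylight saving time
STANDARD_TIME = ZoneInfo("Etc/GMT+8")          # fixed UTC-8 (POSIX sign convention is inverted)


def label_shift_hours(wall_clock: datetime) -> float:
    """Hours between the instants that one naive wall-clock label denotes
    in DST-observing clock time versus fixed standard time."""
    as_clock = wall_clock.replace(tzinfo=CLOCK_TIME)
    as_standard = wall_clock.replace(tzinfo=STANDARD_TIME)
    # Aware datetimes with different tzinfo objects are subtracted via their UTC offsets.
    return (as_standard - as_clock).total_seconds() / 3600


for day in (datetime(2025, 1, 15, 12), datetime(2025, 7, 15, 12)):
    print(day.date(), label_shift_hours(day))
```

A shift of 1.0 in July means a value labeled 12:00 in clock time was recorded one hour before a value labeled 12:00 in standard time. In pandas, localizing each series with its own time zone and converting both to a common zone (or to UTC) before comparing should remove this shift.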

Also, I *think* this step will mess up solar position calculations:
> data.index = data.index - pd.Timedelta(minutes=30) #shift from interval ending to interval start

My last thought: if I understand correctly, you are saying that the array max power point voltage drops below the inverter's minimum MPPT voltage on hot days. If that's the case, it could make modeling more difficult overall. You could start by validating your model on time ranges where you don't think off-MPP operation is happening, before complicating things too much.
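
One way to pick such time ranges is to keep only whole days whose modeled cell temperature stays below a cutoff. This assumes, as described at the top of the thread, that the voltage problem is driven by high cell temperature. The 45 °C cutoff and the function name `cool_days` are hypothetical choices for illustration, not values from the thread. Keeping whole days, rather than individual hours, preserves complete daily profiles, which also makes any remaining timestamp shift easy to spot.

```python
from collections import defaultdict
from datetime import datetime


def cool_days(timestamps, cell_temp_c, max_cell_temp_c=45.0):
    """Return the calendar dates whose hottest modeled cell temperature
    stays below max_cell_temp_c (a hypothetical cutoff)."""
    daily_max = defaultdict(lambda: float("-inf"))
    for ts, temp in zip(timestamps, cell_temp_c):
        day = ts.date()
        daily_max[day] = max(daily_max[day], temp)
    return {day for day, hottest in daily_max.items() if hottest < max_cell_temp_c}


stamps = [datetime(2025, 1, 15, 12), datetime(2025, 1, 15, 13),
          datetime(2025, 7, 15, 12), datetime(2025, 7, 15, 13)]
temps = [20.0, 25.0, 40.0, 60.0]
print(cool_days(stamps, temps))  # only the January day survives
```

Error metrics can then be computed only over hours that fall on the returned dates.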

--
You received this message because you are subscribed to the Google Groups "pvlib-python" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pvlib-python...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/pvlib-python/329a1a26-799a-4ff7-bc14-874e3c59f5f4n%40googlegroups.com.

David Lowrimore

Feb 9, 2026, 8:18:17 PM
to pvlib-python
Thank you for the feedback, Cliff. I will try your suggestion about clipping. May I ask why the data suggest 70MWdc? The generation data are only for the 15MWdc site, and we have modeled the 41,472 modules at 370W per module.
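
A quick check using only the figures given in the thread (41,472 modules, 1,536 strings, 370 W modules, 10 MWAC) confirms the ~15 MWdc capacity and the DC:AC ratio of about 1.5:

```python
MODULES = 41_472
STRINGS = 1_536
MODULE_W = 370        # per-module rating, from the thread
AC_RATING_MW = 10.0

modules_per_string = MODULES / STRINGS
dc_capacity_mw = MODULES * MODULE_W / 1e6
dc_ac_ratio = dc_capacity_mw / AC_RATING_MW

print(f"{modules_per_string:.0f} modules per string")  # 27 modules per string
print(f"{dc_capacity_mw:.2f} MWdc")                    # 15.34 MWdc
print(f"DC:AC ratio {dc_ac_ratio:.2f}")                # DC:AC ratio 1.53
```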

David Lowrimore

Feb 9, 2026, 8:45:30 PM
to pvlib-python
We do observe daylight saving time here. I thought specifying 'America/Los_Angeles' would take that into account? We are struggling with the timestamp issue because it seems to be behind at the beginning of the year, then in sync, and then running early for part of the year. I don't fully understand what the issue is. Attached is a boolean 8760 that is True when the site is producing.

Correct: with MPPT enabled, we observed several days with 5-6 hours of high DC power and 0 AC power.

Thank you for your feedback, Will. We will look into time ranges when this doesn't happen and see how MPPT is affecting the predictions.
Full8760_boolean_production_Starvation_Ridge.xlsx

Will Hobbs

Feb 9, 2026, 11:24:54 PM
to pvlib-python
I haven’t tested it, but 

data.index = data.index.tz_convert(site.tz)

right after getting the NSRDB data *might* do what you want. Otherwise I think your ‘data’ DataFrame is in local standard time, which will carry through the rest of the analysis. 

This is assuming that your measured data observe daylight saving time, but I haven’t had a chance to look at your spreadsheet. 

Also note that pvlib calculations are instantaneous, while timestamps in your measured data might represent either the beginning or end of the interval (e.g., 11:00 may represent the average or sum for 10:00-11:00 or 11:00-12:00). With pvlib models 11:00 would be a reasonable approximation for 10:30-11:30. Running pvlib with 5min data and then carefully resampling to hourly could be a little better. 
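
Here is a plain-Python sketch of the "run at 5 min, then resample carefully" approach. It averages instantaneous sub-hourly values into hourly bins and lets you choose whether each bin is labeled by its start or its end. That choice should match whatever convention the measured data use, which the thread does not establish. The function name and the half-open interval choice are assumptions. In pandas, the same bins correspond to `resample('1h', closed='right', label='right')` for interval-ending labels and `closed='left', label='left'` for interval-beginning labels.

```python
from datetime import datetime, timedelta


def hourly_means(samples, label="end"):
    """Average instantaneous (timestamp, value) samples into hourly bins.

    label="end":   a sample at t falls in the interval (H-1h, H] and is labeled H.
    label="start": a sample at t falls in the interval [H, H+1h) and is labeled H.
    """
    sums, counts = {}, {}
    for ts, value in samples:
        floor = ts.replace(minute=0, second=0, microsecond=0)
        if label == "start":
            key = floor
        elif label == "end":
            key = floor if ts == floor else floor + timedelta(hours=1)
        else:
            raise ValueError("label must be 'start' or 'end'")
        sums[key] = sums.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Two hours of synthetic 5-minute values, 10:00 through 12:00 inclusive.
start = datetime(2025, 7, 1, 10, 0)
five_min = [(start + timedelta(minutes=5 * i), float(i)) for i in range(25)]
print(hourly_means(five_min, label="end"))
print(hourly_means(five_min, label="start"))
```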


cwh...@sandia.gov

Feb 10, 2026, 10:04:09 AM
to pvlib-python
Why 70MWdc? This morning, I don't know. Yesterday I must have crossed conversations, I was looking at output with figures of that magnitude but not the file you posted. Now it will bother me until I find that other file :)

Cliff 

Will Hobbs

Feb 10, 2026, 1:16:32 PM
to pvlib-python
I glanced at the boolean 8760, and I think your measured power observes daylight saving time and your modeled power does not. That, combined with shifting the NSRDB data timestamps by 30 min when they shouldn't be shifted, probably explains a lot of your issues.

Confirming array azimuth is also helpful when there are "time of day" issues. On both Google and Bing maps satellite images, it looks like the tracker axis azimuth is about 181 deg, not 180 (or 1 instead of 0, in the convention I think you are using). Being off by 1 degree won't matter too much, but every little bit helps, so you could try updating that.

As an aside, I also see what *looks* like about 5% of trackers are misaligned (stuck?) in the southern portion of the plant in the Google maps imagery. That could be something to keep in mind as a possible source of underperformance in the actual data. 5% of trackers being stuck is probably a ~1-2% energy hit.
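
The 1-2% estimate can be read as a first-order weighting: plant energy loss is roughly the fraction of trackers stuck times the energy loss of a stuck tracker relative to a tracking one. Back-solving, 5% stuck producing a 1-2% hit implies each stuck tracker loses roughly 20-40% of its energy. That per-tracker range is inferred from the numbers above, not measured, and in practice it depends on the angle at which the trackers are stuck. The helper names are hypothetical, and the sketch ignores electrical mismatch between stuck and healthy rows sharing an inverter.

```python
def plant_energy_loss(fraction_stuck, loss_per_stuck_tracker):
    """First-order plant-level loss, assuming stuck and healthy trackers are
    otherwise identical and ignoring electrical mismatch effects."""
    return fraction_stuck * loss_per_stuck_tracker


def implied_loss_per_stuck_tracker(plant_loss, fraction_stuck):
    """Invert the first-order relation to get the per-tracker loss."""
    return plant_loss / fraction_stuck


for plant_loss in (0.01, 0.02):
    per_tracker = implied_loss_per_stuck_tracker(plant_loss, 0.05)
    print(f"{plant_loss:.0%} plant loss -> {per_tracker:.0%} loss per stuck tracker")
```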

Will
