Small differences between two observed data series (directly from the nc file and from calib.tv)

amir kk

Jul 12, 2018, 10:44:38 PM

to shyft

Hello everybody,

I extracted the observed data directly from discharge.nc and also took the observed data used in the calibration (calib.tv[0].ts.values), then plotted the two series together. They don't match perfectly. I am curious why there are small differences between these two observed data series. Shouldn't they be exactly the same? Is discharge.nc the only source of observed discharge data?

All targets use s_corr: 1.0, s_var: 1.0, s_bias: 1.0 and weight: 1.0.

Thanks in advance.

Best Regards,

Amir

Yisak Abdella

Jul 13, 2018, 10:34:25 AM

to shyft

Hi

This most probably has to do with the interpretation of the timeseries type (called 'point_interpretation_policy' in shyft). A shyft timeseries can have one of two interpretation policies: POINT_AVERAGE_VALUE or POINT_INSTANT_VALUE. The repository you are using to read the discharge data in discharge.nc is CFTsRepository (in cf_ts_repository.py). This repository does not explicitly set the point_interpretation_policy, because it uses 'api.TsFactory().create_point_ts' instead of the 'api.TimeSeries' constructor (which requires the interpretation to be set initially).

In any case, the repository has no way of knowing how the timeseries in the netcdf file should be interpreted. CFTsRepository is a general repository (i.e. not connected to an official database with a defined timeseries interpretation, as MetNetcdfRepo is, for example). It is therefore the user's responsibility to ensure that the correct interpretation is set when using such repositories, or alternatively to save this information in the netcdf file. Only the user knows what he/she has saved in the file and how it should be interpreted.

In your case, I suspect that the timeseries interpretation is set by default to POINT_INSTANT_VALUE (i.e. discharge varies linearly between the discharge points). Before this timeseries is saved to the TargetSpecificationVector (calib.tv), it is averaged over the specified time_axis. This averaging converts the series type from POINT_INSTANT_VALUE to POINT_AVERAGE_VALUE, and that leads to the differences. The discharges in your netcdf file should be interpreted as POINT_AVERAGE_VALUE and not POINT_INSTANT_VALUE. In other words, the two series you plotted are actually not the same series: one is the original series with POINT_INSTANT_VALUE interpretation and the other is the averaged version of it. To fix this, just set the point_interpretation_policy of the discharge series to POINT_AVERAGE_VALUE. You can do this by adding this line

tsp[ts_info['uid']].set_point_interpretation(api.point_interpretation_policy.POINT_AVERAGE_VALUE)

right before this line

https://github.com/statkraft/shyft/blob/e756f503e7c26ebb8da06d203cef741cf1fef221/shyft/orchestration/simulators/config_simulator.py#L138
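
To illustrate why the two interpretations give slightly different numbers, here is a minimal sketch in plain Python. It does not use shyft and is not shyft's implementation; the function name `average_over_intervals`, the sample values, and the assumption of evenly spaced points (with the last point used only as an interval end) are all made up for illustration. It averages the same points over each interval, once treating them as instantaneous values joined by straight lines and once treating each value as already being the interval average.

```python
def average_over_intervals(values, linear_between_points):
    """Mean value over each interval [t_i, t_{i+1}) of an evenly spaced series.

    Hypothetical helper for illustration only; not part of shyft.
    """
    if linear_between_points:
        # Instant-value reading: the signal is a straight line between
        # neighbouring points, so its mean over one interval is the
        # midpoint of the two end values.
        return [(a + b) / 2.0 for a, b in zip(values[:-1], values[1:])]
    # Average-value reading: each value already is the mean of the
    # interval that starts at its timestamp, so nothing changes.
    return list(values[:-1])


# Made-up discharge values at four evenly spaced timestamps.
observed = [10.0, 12.0, 11.0, 15.0]

as_instant = average_over_intervals(observed, linear_between_points=True)
as_average = average_over_intervals(observed, linear_between_points=False)

print(as_instant)  # [11.0, 11.5, 13.0]
print(as_average)  # [10.0, 12.0, 11.0]
```

With the average-value reading the averaged series reproduces the stored values, which is the behaviour you expected when comparing discharge.nc with calib.tv. With the instant-value reading every averaged value is pulled toward its neighbour, which would show up as small differences like the ones in your plot.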