Start on a data model

9 views
Skip to first unread message

Cameron Shorter

unread,
Oct 26, 2025, 6:58:34 AM10/26/25
to solarmap-nz-tech
I've had a first crack at defining a data model, which I'm aware has many limitations, but which I hope might be a starting point for our conversations.
image.png
Source is in draw.io format, and I'd be happy if someone wants to fix / build upon this.
Design is currently in this folder. (Probably should move it to git at some point.)

Feedback welcomed, on the content, or the approach, or tools.

--
Cameron Shorter
Business Analyst / Engineer / Technical Writer

Cameron Shorter

unread,
Oct 26, 2025, 5:44:53 PM10/26/25
to solarmap-nz-tech
After mulling over this, I've added a "Scenario" data type.
This is a way to frame the resilience and economic viability of scenarios such as:
* A heatwave, a cloudy week, a blackout, a normal day.

A building/region report is an aggregate of how it handles all scenarios it will face in a year.

We can then discuss:
* What percentage of buildings in a region are economically viable?
* How many more if we changed the incentives?

What are our mitigating strategies and associated costs?
* Is it cheaper to incentivise local battery installations, or build new power stations and high voltage transmission wires.

image.png

hawk...@electricfish.com

unread,
Oct 26, 2025, 6:37:49 PM10/26/25
to solarmap-nz-tech
Hi Cameron -

I took a stab at translating the ERD into SQL. There were some things I wasn't sure about, which I left in comments.

model.sql

Jenny Sahng

unread,
Oct 26, 2025, 10:30:22 PM10/26/25
to hawk...@electricfish.com, solarmap-nz-tech
Hey Cameron, thanks for giving this a first go, great to have something to talk to. And thanks Richard for a first draft on the SQL to create it - that actually helped a lot to flatten these matrices in my mind into something 2D (would you mind putting that up as a PR so that we can comment in-line on it?

I like the elegance of having one MthHrMatrix, whether the value it's storing is energy generation, energy use, weather profile, buy/sell price, etc. However, I do find it a bit hard to wrap my head around (and do let me know if this kind of SSOT matrix is a GIS data convention, in which case of course we should stick with it). Also, I want to keep coming back to the MVP that Josh & Steve mentioned - keeping things like savings calculations out of scope, as long as the DB schema we come up with is extensible to later encompass it.

Therefore, here's my proposed iteration with a smaller number of tables: Building, Scenario, HourlySolarPotential, and Run.

Building
  • id: uuid (primary key)
  • building_address: str
  • region: Enum
  • pv_viable_m2: float (sqm of viable space on building)
  • pv_viable_kw: float (size of the solar PV system that would fit)
  • row_created_date
  • row_updated_date
HourlySolarPotential
  • id: uuid (primary key)
  • building_id: uuid (foreign key to Building table)
  • scenario_id: uuid (foreign key to Scenario table)
  • month: int[1,12]
  • hour: int[0,23]
  • pv_generation_kwh: float (solar generation potential for this building, in this month, and in this hour)
  • season: Enum (bucketing the months into seasons, for easy aggregating later. totally optional)
  • row_created_date
  • row_updated_date
Scenario
For MVP, there will only be one Scenario - the "normal" scenario based on historical weather data. This could be extended in future to add other scenarios like Cameron suggested. I've left out things like financial viability etc. as per Josh & Steve's request to stick to the generation potential, and leave the economics out of the MVP.
  • id: uuid (primary key)
  • description: string, start with just defaulted to "Normal" and this is our best guess at what a typical year.
  • month: int[1,12]
  • hour: int[0,23]
  • (add some stuff here from whatever weather inputs we are using for our estimates!)
  • row_created_date
  • row_updated_date
Run
We can call this something else, but it's metadata about the computation run itself, both for reproducibility and also to figure out if some of the estimates are based on different inputs or an outdated model compared to others. Not sure how necessary this is.
  • id: uuid (primary key)
  • run_by: str (username of who ran it/pushed these results. This is kind of getting into logging territory)
  • model_version: str (semantic version of where the model was at when this was run; this requires us to use semantic versioning in our github repo; we could instead just save the latest commit hash and call it `latest_commit_hash`)
  • data_input: str (file name of the input DSM?)
  • data_weather: str (file name to the weather data - although I understand maybe this is what the CloudyDayProfiles table was trying to achieve? But that seems like uploading input data, which I'm not sure we want to be doing...)
  • Anything else to protect reproducibility?
Information about the region's solar potential, a building's annual solar potential, etc. could just be aggregated views on top of this. Thoughts?

--
You received this message because you are subscribed to the Google Groups "solarmap-nz-tech" group.
To unsubscribe from this group and stop receiving emails from it, send an email to solarmap-nz-te...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/solarmap-nz-tech/dd48f9fa-b946-4334-831b-c1c4869fecdan%40googlegroups.com.

hawk...@electricfish.com

unread,
Oct 27, 2025, 2:26:24 PM10/27/25
to solarmap-nz-tech
I've validated my SQL and pushed it, but it's still the larger model.

I think starting simpler is important (migrating table data is not hard!) so kudos to Jenny for that. 

Two questions:
* Do we need UUIDs for PKs or can we just use a sequence?
* What is the idea behind the date stamps? Is it just a versioning system or did we want to do time travel?

hawk...@electricfish.com

unread,
Oct 27, 2025, 2:37:20 PM10/27/25
to solarmap-nz-tech
One other odd question...how do we define "season"? Is it based on solar irradiance, in which case we probably want to use the "cross-quarter days" (e.g., Feb-Apr is spring) or average temperatures (which would be more like the usual season boundaries (e.g., Mar-Jun is spring)?

Cameron Shorter

unread,
Oct 27, 2025, 4:24:57 PM10/27/25
to hawk...@electricfish.com, solarmap-nz-tech
A few quick responses, with more to come after I mull over things:

Schema design sketching:
* Praise to Richard for knocking up SQL so quickly. I'm not so proficient.
* Q: What is our preferred way to brainstorm and comment on each other's schema ideas? I'm liking the simple structure that Jenny put into an email. Maybe we use that structure in a google doc or git pull.
 
Re database application: 
* I'm in favour of starting with Geopackage, which is an SQLite in one file, is a standard, and widely supported by all geospatial applications, like QGIS. (We can move to PostGIS later if we are looking for performance, which is also widely supported. I'm open to considering DuckDB, but am skeptical that it would be the right database for us. Happy to have someone attempt to convince me otherwise.

Re: Complex vs Simple schema:
* Yes, good idea to start simple.
* I was initially considering the complex as well, so that we ensure that we can scale. Even if we don't populate or implement now.

Re GIS Database basics:
* Re rows/columns: In spatial databases, we typically talk about layers, which equate to a table of similar geometric features: usually Points or Lines, or Polygons. So we'd have a buildings layer, and a regions layer (where Queenstown would be one feature within the region). These all include an attribute called Geometry. 
* It seems grid data like the Digital Surface Model can be stored in the GeoPackage. Interested to hear from others whether it is the right way to go.

Re timestamps:
* I'd suggest timestamps are not collected at the feature level, but rather at the entire layer level. (To reduce data size, and we don't need the granularity)
* So we know when we create a layer, by a specific data run, but no more.

Re season:
* We are most interested in the angle of the sun and shading profiles. We are less interested in weather characteristics.
* Note that the angle of the sun in Autumn is the same as Spring, so we only need to record 3 seasons from a shading perspective.

Jenny Sahng

unread,
Nov 2, 2025, 4:00:41 PM11/2/25
to Cameron Shorter, hawk...@electricfish.com, solarmap-nz-tech
Following up on this thread after reviewing Cameron's v3 data schema:
  • Yeah good point Richard that we don't need UUIDs, looks like Cameron has incorporated that and all the IDs are now just integers - works for me!
  • The date stamps were just for versioning, not to turn it into an event-based replayable thing - so maybe it's unnecessary especially if we have a run ID which would also have the time it was run/computed.
Reply all
Reply to author
Forward
0 new messages