elkarel
unread,Apr 23, 2012, 7:01:09 AM4/23/12You do not have permission to delete messages in this group
Either email addresses are anonymous for this group or you need the view member email addresses permission to view the original message
to
I am beginner in Data Warehouse Design. I have red some theory, but
recently met a practical problem with a design of a OLAP cube. I use
star schema.
Lets say I have 2 dimension tables and 1 fact table:
Dimension Gazetteer:
- dimension_id
- country_name
- province_name
- district_name
Dimension Device:
- dimension_id
- device_category
- device_subcategory
Fact table:
- gazetteer_id
- device_dimension_id
- hazard_id (column for measure called "number of hazards")
- area_m2 (column for measure called "total area occupied")
A "business object" (which is a mine field actually) can have multiple
devices, is located in a single location (Gazetteer) and ocuppies X
square meters.
So in order to know which device categories there are, I created a
fact per each device in hazard like this:
+--------------+---------------------+-----------------------
+-----------+
| gazetteer_id | device_dimension_id | hazard_id |
area_m2 |
+--------------+---------------------+-----------------------
+-----------+
| 123 | 321 | 0a0a-502c-11aa1331e98 |
6000 |
+--------------+---------------------+-----------------------
+-----------+
| 123 | 654 | 0a0a-502c-11aa1331e98 |
6000 |
+--------------+---------------------+-----------------------
+-----------+
| 123 | 987 | 0a0a-502c-11aa1331e98 |
6000 |
+--------------+---------------------+-----------------------
+-----------+
I defined a measure "number of hazards" as distinct-count of
hazard_id.
I also defined a "total area occupied" measure as a sum of area_m2.
Now I can use the dimension gazetteer and device and know how many
hazards there are with given dimension members.
But the problem is the "total area occupied" measure: because it is
defined as a sum, it gives a value n-times higher than the actual
area, where n is th number of devices of the particular hazard object.
For example, with the data above would give 18000m2.
How would you solve this problem?
I am using the Pentaho stack but I believe it is a generic question.
Thanks in advance for any suggestion