When to normalize database?

38 views
Skip to first unread message

william yang

unread,
Nov 26, 2015, 10:56:16 PM11/26/15
to mongodb-user
Hello!

I have got a setup structure as follows (in golang + Mgo v2 driver):

type Unit struct {
 
// Binary JSON Identity
 ID bson
.ObjectId `bson:"_id,omitempty"`
 
// App-level Identity
 
BldgUUID   string `bson:"ub"`
 
FloorUUID  string `bson:"uf"`
 
UnitUUID   string `bson:"uu"`
 
UnitNumber string `bson:"un"`
 
// Tenant Info
 
TeanantID string `bson:"ut"`
 
// Leasing Term Info
 
LeaseStart time.Time `bson:"us"`
 
LeaseEnd   time.Time `bson:"ue"`
 
LeaseCost float32 `bson:"uc"`
 
// Statistics
 
Created time.Time `bson:"uz"`
}

type
Floor struct {
 
// Binary JSON Identity
 ID bson
.ObjectId `bson:"_id,omitempty"`
 
// App-level Identity
 
FloorUUID string `bson:"f"`
 
// Floor Info
 
FloorNumber int `bson:"l"`
 
// Units
 
FloorUnits []string `bson:"u"`
 
// Statistics
 
Created time.Time `bson:"y"`
}


type
Building struct {
 
// Binary JSON Identity
 ID bson
.ObjectId `bson:"_id,omitempty"`
 
// App-level Identity
 
BldgUUID string `bson:"b"`
 
// Address Info
 
BldgNumber  string `bson:"i"` // Street Number
 
BldgStreet  string `bson:"s"` // Street
 
BldgCity    string `bson:"c"` // City
 
BldgState   string `bson:"t"` // State
 
BldgCountry string `bson:"x"` // Country
 
// Building Info
 
BldgName      string `bson:"w"`
 
BldgOwner     string `bson:"o"`
 
BldgMaxTenant int    `bson:"m"`
 
BldgNumTenant int    `bson:"n"`
 
// Floors
 
BldgFloors []Floor `bson:"p"`
 
// Statistics
 
Created time.Time `bson:"z"`
}

Should I normalize Units into a table of its own and have FloorUnits[] hold reference ID's to where the units are? At this stage I feel like all the pipeline and aggregation work may be slowing down my queries. Such as for example:

func GetUnitsFromFloor(database *mgo.Database, bhash string, floor_num string) {

fn, _ := strconv.Atoi(floor_num)
var resp map[string]interface{}

pipeline := []bson.M{
{"$match": bson.M{"b": bhash}},
{"$unwind": "$p"},
{"$unwind": "$p.u"},
{"$match": bson.M{
"p.l": fn,
}},
{"$project": bson.M{
"uid": "$p.u.uu",
}},
}

iter := database.C("buildings").Pipe(pipeline).Iter()
for iter.Next(&resp) {
fmt.Printf("Result: %s\n", resp["uid"])
}
}

I feel like I'm mostly right but the really old MySQL part of me is telling me I should split this into 2 tables. At this stage am I doing anything wrong or against Mongo conventions? Should I continue with the style of aggregate pipelining like above?

Thanks!

Wan Bachtiar

unread,
Dec 4, 2015, 12:35:30 AM12/4/15
to mongodb-user

At this stage am I doing anything wrong or against Mongo conventions?

Hi William,

Data in MongoDB has a flexible schema. Flexible schema allows you to focus on your application design and let the database design conform for the benefit of the application (See Domain Driven Design approach).

Before deciding whether to normalise the design or not, you need to consider the application usage of the data (i.e. queries and data processing) as well as the inherent structure of the data itself. i.e. You need to make sure the schema change benefit your application in overall, not just one type of operation. 

See Data Modelling Introduction and Data Model Examples and Patterns for more information and examples.


Should I continue with the style of aggregate pipelining like above?

Based on your aggregation pipeline in GetUnitsFromFloor(), a similar result could be achieved using find(). For example, to query all units from building gen and floor number 10 via Mongo Shell:

db.buildings.find(
    /* Match building 'gen' and floor number '10' */
    {"b": "gen", 
     "p.l": 10, 
    },
    /* Project matching floors units id */
    {"p.$.u.u":1}
);

Try using the above find() solution, and see the performance difference.


An alternative design approach that you could try is to use sub-documents rather than an array of documents. This is given that a floor number is unique within a building, and a unit number is unique within a building floor.
For example:

{   BldgUUID : "1234-5678-9012-3456",
    BldgName : "gen",
    Floors: {
        "1": {
            CreatedTime: "2015-12-04T15:21:30",
            Units: {
                "1": {
                    TenantID: "1234",
                    LeaseStart: "2015-12-01T00:00:00",
                    LeaseCost: "500"
                },
                "2": {
                    TenantID: "5678",
                    LeaseStart: "2015-11-01T00:00:00",
                    LeaseCost: "400"
                }
            }
        },
        "2": {
            ...
        }
}}

This way you could have a more meaningful document. For example to find out a lease cost from a building document for floor number 1 and unit number 1:

building["Floors"]["1"]["Units"]["1"].LeaseCost


Regards,

Wan.

Reply all
Reply to author
Forward
0 new messages