When to normalize database?

38 views

Skip to first unread message

william yang

unread,

Nov 26, 2015, 10:56:16 PM11/26/15

to mongodb-user

Hello!

I have got a setup structure as follows (in golang + Mgo v2 driver):

type Unit struct {
 // Binary JSON Identity
 ID bson.ObjectId `bson:"_id,omitempty"`
 // App-level Identity
 BldgUUID   string `bson:"ub"`
 FloorUUID  string `bson:"uf"`
 UnitUUID   string `bson:"uu"`
 UnitNumber string `bson:"un"`
 // Tenant Info
 TeanantID string `bson:"ut"`
 // Leasing Term Info
 LeaseStart time.Time `bson:"us"`
 LeaseEnd   time.Time `bson:"ue"`
 LeaseCost float32 `bson:"uc"`
 // Statistics
 Created time.Time `bson:"uz"`
}

type Floor struct {
 // Binary JSON Identity
 ID bson.ObjectId `bson:"_id,omitempty"`
 // App-level Identity
 FloorUUID string `bson:"f"`
 // Floor Info
 FloorNumber int `bson:"l"`
 // Units
 FloorUnits []string `bson:"u"`
 // Statistics
 Created time.Time `bson:"y"`
}


type Building struct {
 // Binary JSON Identity
 ID bson.ObjectId `bson:"_id,omitempty"`
 // App-level Identity
 BldgUUID string `bson:"b"`
 // Address Info
 BldgNumber  string `bson:"i"` // Street Number
 BldgStreet  string `bson:"s"` // Street
 BldgCity    string `bson:"c"` // City
 BldgState   string `bson:"t"` // State
 BldgCountry string `bson:"x"` // Country
 // Building Info
 BldgName      string `bson:"w"`
 BldgOwner     string `bson:"o"`
 BldgMaxTenant int    `bson:"m"`
 BldgNumTenant int    `bson:"n"`
 // Floors
 BldgFloors []Floor `bson:"p"`
 // Statistics
 Created time.Time `bson:"z"`
}

Should I normalize Units into a table of its own and have FloorUnits[] hold reference ID's to where the units are? At this stage I feel like all the pipeline and aggregation work may be slowing down my queries. Such as for example:

func GetUnitsFromFloor(database *mgo.Database, bhash string, floor_num string) {

	fn, _ := strconv.Atoi(floor_num)
	var resp map[string]interface{}

	pipeline := []bson.M{
		{"$match": bson.M{"b": bhash}},
		{"$unwind": "$p"},
		{"$unwind": "$p.u"},
		{"$match": bson.M{
			"p.l": fn,
		}},
		{"$project": bson.M{
			"uid": "$p.u.uu",
		}},
	}

	iter := database.C("buildings").Pipe(pipeline).Iter()
	for iter.Next(&resp) {
		fmt.Printf("Result: %s\n", resp["uid"])
	}
}

I feel like I'm mostly right but the really old MySQL part of me is telling me I should split this into 2 tables. At this stage am I doing anything wrong or against Mongo conventions? Should I continue with the style of aggregate pipelining like above?

Thanks!

Wan Bachtiar

unread,

Dec 4, 2015, 12:35:30 AM12/4/15

to mongodb-user

At this stage am I doing anything wrong or against Mongo conventions?

Hi William,

Data in MongoDB has a flexible schema. Flexible schema allows you to focus on your application design and let the database design conform for the benefit of the application (See Domain Driven Design approach).

Before deciding whether to normalise the design or not, you need to consider the application usage of the data (i.e. queries and data processing) as well as the inherent structure of the data itself. i.e. You need to make sure the schema change benefit your application in overall, not just one type of operation.

See Data Modelling Introduction and Data Model Examples and Patterns for more information and examples.

Should I continue with the style of aggregate pipelining like above?

Based on your aggregation pipeline in GetUnitsFromFloor(), a similar result could be achieved using find(). For example, to query all units from building gen and floor number 10 via Mongo Shell:

db.buildings.find(
    /* Match building 'gen' and floor number '10' */
    {"b": "gen", 
     "p.l": 10, 
    },
    /* Project matching floors units id */
    {"p.$.u.u":1}
);

Try using the above find() solution, and see the performance difference.

An alternative design approach that you could try is to use sub-documents rather than an array of documents. This is given that a floor number is unique within a building, and a unit number is unique within a building floor.
For example:

{   BldgUUID : "1234-5678-9012-3456",
    BldgName : "gen",
    Floors: {
        "1": {
            CreatedTime: "2015-12-04T15:21:30",
            Units: {
                "1": {
                    TenantID: "1234",
                    LeaseStart: "2015-12-01T00:00:00",
                    LeaseCost: "500"
                },
                "2": {
                    TenantID: "5678",
                    LeaseStart: "2015-11-01T00:00:00",
                    LeaseCost: "400"
                }
            }
        },
        "2": {
            ...
        }
}}

This way you could have a more meaningful document. For example to find out a lease cost from a building document for floor number 1 and unit number 1:

building["Floors"]["1"]["Units"]["1"].LeaseCost

Regards,

Wan.

Reply all

Reply to author

Forward

0 new messages