Best practice for handling denormalization


Rhys Patterson

May 5, 2015, 8:14:39 PM
to fireba...@googlegroups.com
After years of RDBMS I am coming to terms with NoSQL's advantages and caveats, and Firebase is making it a much easier and enjoyable process. I am stuck, however, with handling duplicate data due to denormalization, and was hoping I could get insight into possible solutions.

As a contrived example, I have two entities, Users and Groups, with data for each stored under the root paths users and groups respectively.
When a user joins a group, I store the following data:
  • A mapped ID reference to the group within that users path (/users/{user_id_1}/group/{group_id_1} = true)
  • A mapped ID reference to the user within the group path (/groups/{group_id_1}/users/{user_id_1} = true)
  • If they are the group administrator, an entry referencing this (/group_admin/{group_id_1}/{user_id_1} = true)
  • An array summarizing the group administrators' names (/groups/{group_id_1}/admins = [0: John Smith, 1: Jane Doe])
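The writes in the bullets above can be collected in one place, which makes them easier to keep consistent. Here is a minimal sketch: the paths follow the example above, but the function name and signature are hypothetical.

```javascript
// Sketch: build every denormalized write location for a "user joins group"
// event in one object. Path layout follows the bullets above; the function
// name and parameters are assumptions for illustration.
function joinGroupUpdates(userId, groupId, isAdmin, adminNames) {
  var updates = {};
  updates["users/" + userId + "/group/" + groupId] = true;
  updates["groups/" + groupId + "/users/" + userId] = true;
  if (isAdmin) {
    updates["group_admin/" + groupId + "/" + userId] = true;
    updates["groups/" + groupId + "/admins"] = adminNames;
  }
  return updates;
}
```

On SDK versions that support multi-location updates, the whole object can be passed to a single ref.update(updates) call so all locations change together; otherwise each path has to be written individually, which is exactly the consistency problem described here.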

What is the best practice to guarantee that all this data is maintained during CRUD operations? My current ideas, with concerns:
  1. Maintain the processing and checking of this within each client app I develop that affects this data
    Concern: issues if the latest app version maintains more relations but a client is running an outdated app version

  2. Build a cron worker that runs on a regular interval to verify relation integrity, with potential efficiency gains from indexing my main 'models' with updated timestamps so only changes since the last run are processed
    Concern: A delay in building these relations for the users

Any help is appreciated. 
Loving Firebase, and the security rules are amazing, which makes me think my requirements might have a similarly elegant solution.

Vincent Bergeron

May 5, 2015, 8:41:48 PM
to fireba...@googlegroups.com

Would it be possible for you to use this structure instead?

/users/{userId}
/groups/{groupId}
/users_groups/{associationKey} (with userId: key, groupId: key, isAdmin: boolean as the entity data)

This way you only have to manage one relation node, and you can query the users_groups node by userId or groupId.

Also, with this approach, retrieving a user profile does not also pull down all of that user's group associations. The same logic applies to groups: you do not retrieve all of the user associations.

Sometimes, even with a NoSQL data store, you have to think as you would with an RDBMS.
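As a concrete sketch of this structure, each association entry is a small object stored under a Firebase-generated key. The field names come from the structure above; the helper function is hypothetical.

```javascript
// One users_groups entry under the structure described above; the helper
// name is an assumption for illustration.
function makeAssociation(userId, groupId, isAdmin) {
  return { userId: userId, groupId: groupId, isAdmin: isAdmin };
}

// With the 2.x JS client this would be stored under a push ID, e.g.:
//   new Firebase(url).child("users_groups")
//       .push(makeAssociation("user_id_1", "group_id_1", true));
```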

Rhys Patterson

May 5, 2015, 9:21:12 PM
to fireba...@googlegroups.com
Thanks, that is insightful. I'm still working through my understanding of where to store linking/lookup references with this structure, as opposed to relational DBs.

Is the associationKey you refer to a unique ID (e.g. a Firebase push ID)? If so, is there an efficient way to query all groups for a user ID, or conversely all users for a group?

Vincent Bergeron

May 5, 2015, 9:25:48 PM
to fireba...@googlegroups.com
I'm referring to the Firebase unique ID.

To query, it depends on your language. If you're using JavaScript, here's an example:

var ref = new Firebase("firebaseUrl");

// All groups for a user:
ref.child("users_groups").orderByChild("userId").equalTo(myUserId).on('child_added', function(snapshot) { ... });

// All users in a group:
ref.child("users_groups").orderByChild("groupId").equalTo(myGroupId).on('child_added', function(snapshot) { ... });

Don't forget to add an index for each key in the Rules tab:

"users_groups": {
   ".indexOn": ["userId", "groupId"]
}

Rhys Patterson

May 5, 2015, 9:29:13 PM
to fireba...@googlegroups.com
Perfect, I actually came across that query solution as you replied. Much appreciated, Vincent!
In fact, I knew of the querying (and indexing), but I had to confirm that AngularFire, which I am prototyping with, supports it; it does, via $firebaseArray.

Thanks again.

Rhys Patterson

May 6, 2015, 11:42:26 AM
to fireba...@googlegroups.com
Do you have any suggestions for handling cache-like denormalisation?

For example, I could store the users' names within the users_groups pivot entries, but every version of the client code would have to maintain this whenever users are added, modified or deleted. I understand this is possible and essentially comes down to formally structured model management, but I'm not sure whether it is best practice or whether there are alternatives I have not considered.

The pivot itself works perfectly for unique data (e.g. admin status), but I'm still vague on solutions for managing duplicated data.

Kato Richardson

May 6, 2015, 11:47:14 AM
to fireba...@googlegroups.com
Rhys,

The readme on firebase-multi-write covers some use cases and alternatives. It's not exactly what you've asked for here, but it's a good read.

Cheers,
Kato


To view this discussion on the web visit https://groups.google.com/d/msgid/firebase-talk/d60365bf-2886-49c7-a16e-10361e0c94fa%40googlegroups.com.

Vincent Bergeron

May 6, 2015, 11:47:23 AM
to fireba...@googlegroups.com
Actually, what I would do in your case is not a data solution but a process solution.

I would create a Node.js application that does just that: manage the "cache". It listens for child_added, child_changed and child_removed events on the users node and updates the users_groups node accordingly.

With that solution you have only one piece of code doing the job, and you do not have to write the logic into every client.
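To illustrate this, here is a minimal sketch of the fan-out logic such a worker would run (node and field names such as userName are assumptions). The synchronization itself is kept as a pure function, with the Firebase wiring shown in comments:

```javascript
// Given all users_groups entries and a changed user, compute the updates
// needed to keep the cached userName fields in sync. Field names
// (userId, userName) are assumptions for illustration.
function nameFanout(associations, userId, newName) {
  var updates = {};
  Object.keys(associations).forEach(function (key) {
    if (associations[key].userId === userId) {
      updates["users_groups/" + key + "/userName"] = newName;
    }
  });
  return updates;
}

// Wiring sketch with the 2.x JS client used earlier in the thread:
// root.child("users").on("child_changed", function (snap) {
//   root.child("users_groups").orderByChild("userId").equalTo(snap.key())
//       .once("value", function (assocs) {
//         // apply nameFanout(assocs.val(), snap.key(), snap.child("name").val())
//       });
// });
```

Keeping the fan-out pure makes the worker easy to test without a live Firebase connection, and since only this one process writes the cached fields, outdated clients never need to know about them.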

VB

Rhys Patterson

May 6, 2015, 9:04:36 PM
to fireba...@googlegroups.com
Thanks Vincent, good advice that is along the lines of my original cron-worker concept, but taking advantage of Firebase's push events is much more elegant. Appreciated once again.

I think what was confusing me was the idea of denormalization and duplicate content representing the same data; perhaps the cron/listener solution is the standard way to manage this.