[ExamVT13] Block 2: Dependencies and Normal Forms

119 views
Skip to first unread message

Niklas Broberg

unread,
Mar 7, 2014, 2:34:21 AM3/7/14
to tda357...@googlegroups.com
==================================================
2A (8p)

A website that hosts a centralized revision control system needs a database to store information about repositories, branches, users and patches.
The revision control system stores information about documents contained in some repository, and how those documents are successively changed. Users with access to a repository may look at the revision history to see what changes have been made, and look at particular older versions of documents. Revisions are done in the form of patches, where each patch specifies a set of changes to the repository. Further, several branches of a particular repository can exist, each specifying a different set of revisions (i.e. patches), thus allowing several versions of a repository to exist in parallel.
For the site in question, users must be registered in order to create repositories or supply patches. For each user the database should store their unique login name, the name to be displayed, a unique email adress, and an encrypted password.
Each repository is owned by some user. It also has a name, a description, and a unique identifying short name, used as part of the web page adress for the repository. On the page for the repository, the contact details of the owner will be displayed.
Each branch of the repository is identified by another short name (e.g. “master”, “experimental”, etc), unique within the repository. Users other than the owner can be given read or write access to different branches.
Each patch is created as a set of changes against a particular branch, by some user with write access to that branch. For each patch, the system stores a unique patch id (a hash code); the branch and repository that the patch applies to; the user details; a user-supplied title; the time the patch was applied; and finally the changes themselves. The order of patches is important, so for a given repository, no two patches can be applied at exactly the same time.

You are given the following schema of their intended database:

Users(login, name, encPwd)
Repositories(shortName, repoName, owner, ownerName, ownerEmail)
owner -> Users.login
Branches(branchName, repository)
repository -> Repositories.shortName
HasAccess(user, branch, repository, level)
user -> Users.login
(branch, repository) -> Branches.(branchName, repository)
Patches(patchId, repository, branch, byUser, userEmail, title, time, changes)
(branch, repository) -> Branches.(branchName, repository)
byUser -> Users.login

This schema is not fully normalized, and thus suffers from a number of problems. It is your task to solve these by normalization of the schema.

(i) (4p)
For the given domain, identify all functional dependencies that are expected to hold.

(ii) (1p)
With the dependencies you have found, identify all BCNF violations in the relations of the database.

(iii) (3p)
Do a complete normalization of the schema, so that all relations are in BCNF. Also ensure that all key constraints are properly captured. (It’s the end product that’s important, not the steps you take to get there.)

==================================================

2B (12p)
Consider the domain presented for the revision control system above, and then consider the hypothetical independencies listed below. For each of those, state whether you expect it to hold for this domain, and explain why/why not.

i. login ->> encPwd
ii. shortName ->> branchName
iii. branchName, repository ->> login, level
iv. login, repository ->> branchName, level

(Note for students having taken the course several years ago: Back then we referred to independencies as “multi-valued dependencies (MVDs)”. The two terms are equivalent.)
==================================================

Niklas Broberg

unread,
Mar 13, 2014, 11:38:33 AM3/13/14
to tda357...@googlegroups.com
Suggested answers:
 
==================================================
2A (8p)

A website that hosts a centralized revision control system needs a database to store information about repositories, branches, users and patches.
[...]
This schema is not fully normalized, and thus suffers from a number of problems. It is your task to solve these by normalization of the schema.

(i) (4p)
For the given domain, identify all functional dependencies that are expected to hold.

The first thing to realize is what the set of attributes is. Several attributes will occur several times under different names - but they are still conceptually the same attribute. For instance, 'shortName' and 'repository' are the same attribute, as are 'login' and 'owner'. Which name I use to refer to these attributes doesn't really matter. I will consistently use the one that is the 'root' of all the others, i.e. the one that the others reference.

(1) login -> email, name, encPwd
(2) email -> login (, name, encPwd)
(3) shortName -> repoName, login 
(4) shortName, branchName, login -> level
(5) patchId -> shortName, branchName, login, title, time, changes
 
(ii) (1p)
With the dependencies you have found, identify all BCNF violations in the relations of the database.

(1) violates BCNF for both Repositories and Patches, as does (2). None of the other FDs violate BCNF for any of the relations.
 
(iii) (3p)
Do a complete normalization of the schema, so that all relations are in BCNF. Also ensure that all key constraints are properly captured. (It’s the end product that’s important, not the steps you take to get there.)

We need to "fix" Repositories and Patches - both times the fix is to decompose away the user email and name. We get the following relations instead (apart from the ones that already exist):

Repositories(_shortName_, repoName, owner)
  owner -> Users.login
Patches(_patchId_, repository, branch, byUser, title, time, changes)
  (repository, branch) -> Branches.(repository, branchName)
Emails(_user_, email)
  user -> Users.login

Users' names are already present in Users. (2) indicates that we need a uniqueness constraint on Emails.email. It would be perfectly fine (even preferable, but not necessary in order to fulfill BCNF) to merge Emails with Users, to instead get one relation:

Users(_login_, name, encPwd, email)
  email unique
 
==================================================

2B (12p)
Consider the domain presented for the revision control system above, and then consider the hypothetical independencies listed below. For each of those, state whether you expect it to hold for this domain, and explain why/why not.

i. login ->> encPwd

Definitely holds. All FDs are INDs.
 
ii. shortName ->> branchName

Does not hold. For a given repository, the set of branch names in that repository are not independent of everything else, e.g. user access levels and patches are not independent of branchNames.
 
iii. branchName, repository ->> login, level

With a caveat it holds - we graded "kindly" and allowed both solutions. It doesn't literally hold, since login is not independent of e.g. email and user name. But if you make the assumption, which a few did, that we by 'login' mean 'login and all that it functionally determines', then it holds. For a given branch of a repository, the set of users (and user information) and their respective access levels are naturally dependent on one another (not every user has every access level), but they are independent of all other attributes.
 
iv. login, repository ->> branchName, level

Does not hold. The question is, for a given user and repository, are the set of branches and access levels for that user (dependent on each other but) independent of everything else? No - each branch is related to a set of patches, and not every branch is related to every patch.

Niklas Broberg

unread,
Mar 13, 2014, 3:21:32 PM3/13/14
to tda357...@googlegroups.com
==================================================
2A (8p)

The order of patches is important, so for a given repository, no two patches can be applied at exactly the same time.

Ah, no one has ever gotten full points on one of my exams (yet) - not even me! Of course we should also have the following FD:

shortName, time -> patchId

Which in turn induces the following uniqueness constraint on Patches:

(shortName, time) unique

I apologize for the potential confusion.
Reply all
Reply to author
Forward
0 new messages