2A (8p)
A website that hosts a centralized revision control system needs a database to store information about repositories, branches, users and patches.
The revision control system stores information about documents contained in some repository, and how those documents are successively changed. Users with access to a repository may look at the revision history to see what changes have been made, and look at particular older versions of documents. Revisions are done in the form of patches, where each patch specifies a set of changes to the repository. Further, several branches of a particular repository can exist, each specifying a different set of revisions (i.e. patches), thus allowing several versions of a repository to exist in parallel.
For the site in question, users must be registered in order to create repositories or supply patches. For each user the database should store their unique login name, the name to be displayed, a unique email adress, and an encrypted password.
Each repository is owned by some user. It also has a name, a description, and a unique identifying short name, used as part of the web page adress for the repository. On the page for the repository, the contact details of the owner will be displayed.
Each branch of the repository is identified by another short name (e.g. “master”, “experimental”, etc), unique within the repository. Users other than the owner can be given read or write access to different branches.
Each patch is created as a set of changes against a particular branch, by some user with write access to that branch. For each patch, the system stores a unique patch id (a hash code); the branch and repository that the patch applies to; the user details; a user-supplied title; the time the patch was applied; and finally the changes themselves. The order of patches is important, so for a given repository, no two patches can be applied at exactly the same time.
You are given the following schema of their intended database:
Users(login, name, encPwd)
Repositories(shortName, repoName, owner, ownerName, ownerEmail)
owner -> Users.login
Branches(branchName, repository)
repository -> Repositories.shortName
HasAccess(user, branch, repository, level)
user -> Users.login
(branch, repository) -> Branches.(branchName, repository)
Patches(patchId, repository, branch, byUser, userEmail, title, time, changes)
(branch, repository) -> Branches.(branchName, repository)
byUser -> Users.login
This schema is not fully normalized, and thus suffers from a number of problems. It is your task to solve these by normalization of the schema.
(i) (4p)
For the given domain, identify all functional dependencies that are expected to hold.
(ii) (1p)
With the dependencies you have found, identify all BCNF violations in the relations of the database.
(iii) (3p)
Do a complete normalization of the schema, so that all relations are in BCNF. Also ensure that all key constraints are properly captured. (It’s the end product that’s important, not the steps you take to get there.)
2B (12p)
Consider the domain presented for the revision control system above, and then consider the hypothetical independencies listed below. For each of those, state whether you expect it to hold for this domain, and explain why/why not.
i. login ->> encPwd
ii. shortName ->> branchName
iii. branchName, repository ->> login, level
iv. login, repository ->> branchName, level
(Note for students having taken the course several years ago: Back then we referred to independencies as “multi-valued dependencies (MVDs)”. The two terms are equivalent.)