Design Theory for Relational Databases

Functional Dependencies

There is a design theory for relations that lets use examine a design carefully and make improvements based on a few simple principles.

The so called "functional dependencies" is a statement of a type that generalizes the idea of a key for a relation.

Definition of Functional Dependency

A functional dependency on a relation (R) is a statement of the form "If two tuples of (R) agree on all of the attributes (A_1,A_2,cdots,A_n)", then they must agree on all of another list of attributes (B_1,B_2,cdots,B_m).

We write this functional dependency as (A_1A_2cdots A_n ightarrow B_1B_2cdots B_m) and say that "(A_1A_2cdots A_n) functionally determines (B_1B_2cdots B_m)".

If we can make sure that every record of the relation (R) will be one in which a given functional dependency is true, then we say the relation (R) satisfies an functional dependency (f).

Keys of Relations

We say a set of one or more attributes ({A_1,A_2,cdots,A_n}) is a key for a relation (R) if:

Those attributes functionally determine all other attributes of the relation.
No proper subset of ({A_1,A_2,cdots,A_n}) functionally determine all other attributes of (R).

Superkeys

A set of attributes that contains a key is called a superkey, short for "superset of a key". Every superkey satisfies the first condition of a key, however, the second condition is not satisfied for a superkey that contains more than just a key.

Rules About Functional Dependencies

Suppose we are told of a set of functional dependencies that a relation satisfies. Often, we can deduce that the relation must satisfy certain other functional dependencies. This ability of discovering additional functional dependencies is essential when we talk about the design of good relation schemas later.

Reasoning About Functional Dependencies

Functional dependencies often can be presented in several different ways, without changing the set of legal instances of the relation. We say:

The Splitting/Combining Rule

Splitting rule: If (A_1A_2cdots A_n ightarrow B_1B_2cdots B_m), then (A_1A_2cdots A_n ightarrow B_i) for (i=1,2,cdots,m).

Combining Rule: If (A_1A_2cdots A_n ightarrow B_i) for (i=1,2,cdots,m), then (A_1A_2cdots A_n ightarrow B_1B_2cdots B_m).

Trivial Functional Dependencies

A constraint of any kind on a relation is said to be trivial if it holds for every instance of the relation, regardless of what other constraints are assumed.

Say (X ightarrow Y), if (Y) is not a subset of (X), then the functional dependency (X ightarrow Y) is not trivial. If (Y) is a subset of (X), the functional dependency is trivial.

Computing the Closure of Attributes

Starting with the given set of attributes, we repeatedly expand the set by adding the right sides of functional dependencies as soon as we have included their left sides. Eventually, we cannot expand the set any further, and the resulting set is the closure.

We denote the closure of a set of attributes (A_1A_2cdots A_n) by ({A_1,A_2,cdots,A_n}^+).

By computing the closure of any set of attributes, we can test whether any given functional dependency (A_1A_2cdots A_n ightarrow B) follows from a set of functional dependencies (S). If (B) is in ({A_1,A_2,cdots,A_n}^+), then the functional dependency does follow from (S). Else, it does not follow from (S).

Armstrong Principles System

In 1974, W. W. Armstrong concluded a set of functional dependencies deduction rules, which is called the Armstrong Principles System.

Say a relation (R(U,F)), (U) is a set of attributes and (F) is a set of functional dependencies. For the relation (R), we have such deduction rules:

Reflexivity: If (Ysubseteq Xsubseteq U), then (X ightarrow Y).
Augmentation: If (X ightarrow Y), then (XZ ightarrow YZ).
Transitivity: If (X ightarrow Y, Y ightarrow Z), then (X ightarrow Z).
The Combining Rule: If (X ightarrow Y,X ightarrow Z), then (X ightarrow YZ).
The Pseudo Transitivity Rule: If (X ightarrow Y, WY ightarrow Z), then (XW ightarrow Z).
The Decomposition Rule: If (X ightarrow Y) and (Zsubseteq Y), then (X ightarrow Z).

Closing Sets of Functional Dependencies

Sometimes we have a choice of which functional dependencies we use to represent the full set of functional dependencies for a relation. If we were given a set of functional dependencies (S), then any set of functional dependencies equivalent to (S) is said to be a basis for S.

A minimal basis for a relation is a basis (B) that satisfies three conditions:

All the functional dependencies in (B) have singleton right sides.
If any functional dependency is removed from (B), the result is no longer a basis.
If for any functional dependency in (B) we remove one or more attributes from the left side of (F), the result is no longer a basis.

Projecting Functional Dependencies

Suppose we have a relation (R) with set of functional dependencies (S), and we project (R) by computing (R_1=pi_L(R)), for some list of attributes (R). When functional dependencies hold in (R_1)?

This algorithm is very useful when we need to decompose a relation.

Algorithm:

Let (T) be the eventual output set of functional dependencies. Initially, (T) is empty.
For each set of attributes (X) that is a subset of the attributes of (R_1), compute (X^+). This computation is performed with respect to the set of functional dependencies (S), and may involve attributes that are in the schema of (R) but not (R_1). Add to all nontrivial functional dependencies (X ightarrow A) such that (A) is both in (X^+) and an attribute of (R_1).
Now, (T) is a basis for the functional dependencies that hold in (R_1), but may not be the minimal basis. We can construct a minimal basis by modifying (T) as follows:
1. If there is an functional dependency (F) in (T) that follows from the other functional dependencies in (T), remove (F) from (T).
2. Let (Y ightarrow B) be an functional dependency in (T), with at least two attributes in (Y), and let (Z) be (Y) with one of its attributes removed. If (Z ightarrow B) follows from the functional dependencies in (T), then replace (Y ightarrow B) by (Z ightarrow B).
3. Repeat the above steps in all possible ways until no more changes to (T) can be made.

By definition, it may be a little hard to understand. Let's take a example: Suppose (R(A,B,C,D)) has functional dependencies (A ightarrow B,B ightarrow C,) and (C ightarrow D). Suppose also that we wish to project out the attribute (B), leaving a relation (R_1(A,C,D)). In principle, to find the functional dependencies for (R_1), we need to take the closure of all eight subsets of ({A,B,C }), using the full set of functional dependencies, including those involving (B). However, there are some obvious simplifications we can make.

Thus, we may start with the closures of the singleton sets, and then move on to the doubleton sets if necessary. For each closure of a set (X), we add the functional dependency (X ightarrow E) for each attribute (E) that is in (X^+) and in the schema of (R_1), but not in (X).

First, ({A}^+ = {A,B,C,D}). Thus, (A ightarrow C) and (A ightarrow D) hold in (R_1). Note that (A ightarrow B) is true in (R), but makes no sense in (R_1) because (B) is not an attribute of (R_1).

Next, we consider ({C}^+ = {C,D}), from which we get the additional functional dependency (C ightarrow D) for (R_1).

Since ({D}^+ = {D}), we can add no more functional dependencies, and are done with the singletons.

Since ({A}^+) already includes all attributes in (R_1), there is no point in considering any superset of ({A}). Thus, the only doubleton whose closure we need to take is ({C,D}^+ = {C,D}). This observation allows us to add nothing. We are done with the closures, and functional dependencies we have discovered are (A ightarrow C,A ightarrow D) and (C ightarrow D). A simpler, equivalent set of functional dependencies for (R_1) is (A ightarrow C, C ightarrow D).

Design of Relational Database Schemas

Careless selection of a relational database schema can lead to redundancy and related anomalies.

Anomalies

Problems such as redundancy occur when we try to cram too much into a single relation are called anomalies.

The principal kinds of anomalies that we encounter are:

Redundancy
Update Anomalies. We mat change information in one tuple but leave the same information unchanged in another.
Deletion Anomalies. If a set of values becomes empty, we may lose other information as a side effect.

Decomposing Relations

The accepted way to eliminate these anomalies is to decompose relations. Decomposition of (R) involves splitting the attributes of (R) to make the schemas of two new relations.

Given a relation (R(A_1,A_2,cdots, A_n)), we may decompose (R) into two relations (S(B_1,B_2,cdots, B_m)) and (T(C_1,C_2,cdots,C_k)) such that:

({A_1,A_2,cdots, A_n} = {B_1,B_2,cdots, B_m}cup {C_1,C_2,cdots,C_k}).
(S= pi_{B_1,B_2,cdots,B_m}(R))
(T=pi_{C_1,C_2,cdots,C_k}(R))

Boyce-Codd Normal Form

The goal of decomposition is to replace a relation by several that do not exhibit anomalies. There is a condition called Boyce-Codd normal form, or BCNF, under which the anomalies discussed can be guaranteed not to exist.