REVIEW:
RELATIONAL DATA STRUCTURE Definitions:
Formal relational term informal term(s)
relation table or file
tuple row or record
attribute column or field
The Domain of an attribute is the pool of legal values for that attr.
and corresponds roughly to notion of data type.
COMPOSITE DOMAIN: Product of simple domains
RELATION on sets, D1,...Dn consists of a schema (heading) and a body.
SCHEMA = A set of attribute names, A1,...An, one for each domain
(Ai is associated with domain, Di)
BODY = A set of tuple instances: t:{A1..An} -> UNION(D1..Dn),
such that t(Ai) is in Di, for i=1..n.
DEGREE OF A RELATION = number of attributes (informally it is
the "width" of relation and is constant)
CARDINALITY OF A RELATION = # of tuples (informally it is the
"depth" of relation and it varies over time)
There are no duplicate tuples (Some DBMSs allows duplicates,
but then they're not strictly relational)
To review a few things:
CANDIDATE KEY is a set of attributes, K=(Ai..Ak) from the schema,
A1..An, which, at all times, satisfies:
1. UNIQUENESS: no 2 distinct tuples have same K-value,
2. MINIMALITY: If any of the Ai..Ak are discarded the uniqueness property no longer holds.
PRIMARY KEY = one particular candidate key designated as primary
(it is usually chosen as the main look-up or identifier attribute)
ALTERNATE KEY: non-primary candidate key
For example, the NDSU STUDENT relation may have NAID as Primary key and SSN as alternate key.
An example database (called EDUC):
STUDENT-FILE COURSE FILE
.________________. ._____________.
|S#|SNAME |LCODE | |C#|CNAME|SITE|
|==|======|======| |==|=====|====|
|25|CLAY |NJ5101| |8 |DSDE |ND |
|32|THAISZ|NJ5102| |7 |CUS |ND |
|38|GOOD |FL6321| |6 |3UA |NJ |
|17|BAID |NY2091| |5 |3UA |ND |
|57|BROWN |NY2092| `-------------'
`----------------'
ENROLL FILE STATUS FILE
.___________. ._____________
|S#|C#|GRADE| |LCODE |STATUS|
|==|==|=====| |======|======|
|32|8 | 89 | |NJ5101| 1 |
|32|7 | 91 | |NJ5102| 1 |
|25|7 | 68 | |FL6321| 4 |
|25|6 | 76 | |NY2091| 3 |
|32|6 | 62 | |NY2092| 3 |
|38|6 | 98 | `-------------'
|17|5 | 96 |
------------
Every relation has at least 1 candidate key (it may be whole schema)
FOREIGN KEY = attribute(s) in one relation, each of whose values match
one in primary key of another relation.
(both attributes must have the same domain)
ENROLL.C# is foreign key to COURSE.C#
INTEGRITY CONSTRAINTS FOR RELATIONAL DATABASEs
The following two Integrity Constraints are almost always specified.
ENTITY INTEGRITY: no attribute value in a primary key can by null.
REFERENTIAL INTEGRITY: IF S.B (attribute, B, in the relation, S) is a
foreign key attribute to R.A (primary key in relation, R) then
every S.B-value must appear in R.A or be wholly null.
Maintaining referential integrity:
1. automatically cascading of deletes [updates] to the foreign
key reference when corresponding primary key value
is deleted [updated].
2. restrict primary key deletes to cases where there's no
foreign key reference
3. automatically set foreign key referencing values to null.
END REVIEW
----------
RELATIONAL ALGEBRA operators:
Traditional operators:
union,
intersection,
difference,
Cartesian product
Two relations are UNION COMPATIBLE if they have same degree and the same domains
- Assume relations, A and B, below are union compatible.
- R.A refers to attribute, A, of relation, R
UNION of A and B (A ∪ B) is the set of all tuples which are
either in A or in B (or in both).
SELECT emp.e# FROM emp WHERE emp.title=manager
UNION
SELECT emp.e# FROM emp WHERE emp.title=coach;
equivalently
SELECT emp.e# FROM emp
WHERE emp.title=manager OR emp.title=coach;
INTERSECTION A and B (A ∩ B) is the set of tuples which
are both in A and in B.
SELECT emp.e# FROM emp WHERE emp.title=manager
INTERSECTION
SELECT emp.e# FROM emp WHERE emp.title=coach;
equivalently
SELECT emp.e# FROM emp
WHERE emp.title=manager AND emp.title=coach;
i.e., can use INTERSECTION SQL operator or
AND logical operator (within SQL WHERE clause)
DIFFERENCE between A and B (A-B) is the set of tuples
in A but not in B.
SELECT * FROM emp
EXCEPT SELECT * FROM emp WHERE emp.title=manager
CARTESIAN PRODUCT of C and D (CxD) = set of all concatenations of a tuple
from C with a tuple from D.
SELECT emp.*, dept.* FROM emp, dept;
AxB does not require that C and D to be union compatible.
Note: R(A,B) x S(B,C) can be written RXS (R.A, R.B, S.B, S.C),
RxS(A,B,B,C) would'nt work due to the duplication of attr names.
therefore one can specify the Cartesian product of a relation with itself,
for R(A,B):
RxR=(R.A, R.B, R'.A, R'.B)
where R' is an alias for R (another name)
The following operations are commutative (x op y) = (y op x)
union, intersection (and product, under the strict definition of a relation
i.e., no order to the attributes)
(but not the difference operation)
The following operations are associative (x op y) op z = x op (y op z)
union, intersection, product
(but not the difference operation)
SPECIAL realtional operators (select, project, join, divide)
Let Θ be any predicate on the attributes
(conjunctive and/or disjunctive combination of simple comparative
operators:
= < > ≤ ≥ )
THETA-SELECTION of Relation R on attributes X and Y
R WHERE (R.X Θ R.Y) is the set of tuples, {t: (t.X Θ t.Y) is true}
ENROLL
.____________.
|S#|C# |GRADE|
|==|===|=====|
|32|89 | 89 |
|32|70 | 91 |
|25|70 | 68 |
|25|62 | 76 |
|32|62 | 62 |
|38|62 | 98 |
|17|51 | 96 |
in SQL: SELECT * FROM ENROLL WHERE ENROLL.C# ≥ ENROLL.GRADE;
.____________.
|S#|C# |GRADE|
|==|===|=====|
|32|89 | 89 |
|25|70 | 68 |
|32|62 | 62 |
When there is no abiguity, may be writen: ENROLL WHERE C# ≥ GRADE
ENROLL WHERE GRADE ≥ 80 produces:
.____________.
|S#|C# |GRADE|
|==|===|=====|
|32|89 | 89 |
|32|70 | 91 |
|38|62 | 98 |
|17|51 | 96 |
Theta extents to any condition, that is, to arbitrary predicates:
BOOLEAN COMBINATIONS OF SIMPLE COMPARISON EXPRESSIONS.
ENROLL WHERE (C#=GRADE AND GRADE ≥ 80)
.____________.
|S#|C# |GRADE|
|==|===|=====|
|32|89 | 89 |
------------
PROJECTION of R on a set of attribs, X..Z is the set of tuples,
{t=(x..z) such that there is a tuple t in R with t.X=x,..,t.Z=z}.
projection is denoted, R[X,..Z] (basically removes columns from R
whereas selection removes rows from R)
(can be implemented by removing all unspecified attributes and
eliminating duplicates)
ENROLL
.____________.
|S#|C# |GRADE|
|==|===|=====|
|32|89 | 89 |
|32|70 | 91 |
|25|70 | 91 |
|25|62 | 76 |
|32|62 | 76 |
|38|62 | 76 |
|17|51 | 76 |
ENROLL[C#,GRADE] In SQl: SELECT C#, GRADE FROM ENROLL
._________.
|C# |GRADE|
|===|=====|
|89 | 89 |
|70 | 91 |
|62 | 76 |
|51 | 76 |
ENROLL[GRADE]
._____.
|GRADE|
|=====|
| 89 |
| 91 |
| 76 |
Note: Many systems which call themselves "relational" have a projection
operator which does not eliminate duplicates
(strictly speaking, then, result is a bag and not a set
and therefore result is not a relation).
For such systems, the result of this projection if the duplicates are
not eliminated is:
._____.
|GRADE|
|=====|
| 89 |
| 91 |
| 91 |
| 76 |
| 76 |
| 76 |
| 76 |
THETA-JOIN of relation R and S is the
set of tuples, t, which are the concatenation of a tuple,
a, from R and a tuple,
b, from S,
such that a Θ b is true.
- Θ-join can be constructed using product, then a Θ selection
- Θ is called the join Condition
- if Θ is "equality" operator attributes with the same domain
then the Θ-join is called the EQUIJOIN
- if Θ is "equality" on all pairs with the same name (and domain),
then the Θ-join is the NATURAL JOIN
STUDENT
.________________.
|S#|SNAME |LCODE |
|==|======|======|
|25|CLAY |NJ5101|
|32|THAISZ|NJ5102|
|38|GOOD |FL6321|
|17|BAID |NY2091|
|57|BROWN |NY2092|
`----------------'
ENROLL
.___________.
|S#|C#|GRADE|
|==|==|=====|
|32|8 | 89 |
|32|7 | 91 |
|25|7 | 68 |
|25|6 | 76 |
|32|6 | 62 |
|38|6 | 98 |
|17|5 | 96 |
-----------
STUDENT JOIN ENROLL WHERE ( STUDENT.S# = ENROLL.S# ) (this is an equijoin)
In SQL: SELECT * FROM STUDENT, ENROLL WHERE STUDENT.S# = ENROLL.S#;
.___________________________________.
|S.S#|S.SNAME |S.LCODE |E.C#|E.GRADE|
|====|========|========|====|=======|
|25 | CLAY | NJ5101| 7 | 68 |
|25 | CLAY | NJ5101| 6 | 76 |
|32 | THAISZ| NJ5102| 8 | 89 |
|32 | THAISZ| NJ5102| 7 | 91 |
|32 | THAISZ| NJ5102| 6 | 62 |
|38 | GOOD | FL6321| 6 | 98 |
|17 | BAID | NY2091| 5 | 96 |
Closest thing to an inverse of the join is the DIVISION operator.
(the join can be thought of as a type of "multiply")
DIVISION: Dividing relation A(X,Y) (degree=m+n) by B(Y) (deg=n)
gives a quotient relation, A/B(X) (deg=m):
___________.<-degrees->._
/ / / \
m n n m
| | | |
v v v v
Given A(X,Y) and B(Y), x is in (A/B) iff (x,y) is in A for every y in B.
or x is in (A/B) iff {x} x B is a subset of A.
(x gets paired (in A) with every y in B)
For instance, if you needed to find all students who took all 3 courses
{5,6,7}, the answer is E/C below.
ENROLL[S#,C#] = E
._____.
|S#|C#|
|==|==|
|25|7 |
|25|6 |
|32|6 |
|32|7 |
|32|6 |
|38|6 |
|25|5 |
COURSE[C#] = C
.__.
|C#|
|==|
|7 |
|6 |
|5 |
E/C
.__.
|S#|
|==|
|25|
RelAlg and SQL
NOTE: relational algebra is procedural in sense that user must
specify "how" solutions are gained,
not just "what" is to be produced.
e.g., DO join, then project, then...