REVIEW:


RELATIONAL DATA STRUCTURE Definitions: 


          Formal relational term   informal term(s)

          relation                 table  or  file
          tuple                    row    or  record
          attribute                column or  field



          The Domain of an attribute is the pool of legal values for that attr.
          and corresponds roughly to notion of data type.

          COMPOSITE DOMAIN: Product of simple domains




RELATION on sets, D1,...Dn consists of a schema (heading) and a body.


   SCHEMA = A set of attribute names, A1,...An, one for each domain
                     (Ai is associated with domain, Di)


   BODY = A set of tuple instances: t:{A1..An} -> UNION(D1..Dn),
                      such that t(Ai) is in Di,  for i=1..n.




   DEGREE OF A RELATION = number of attributes (informally it is
                      the "width" of relation and is constant)



   CARDINALITY OF A RELATION = # of tuples (informally it is the
                      "depth" of relation and it varies over time)




     There are no duplicate tuples (Some DBMSs allows duplicates,
                     but then they're not strictly relational)









To review a few things:

CANDIDATE KEY is a set of attributes, K=(Ai..Ak) from the schema,
                A1..An, which, at all times, satisfies:

1. UNIQUENESS: no 2 distinct tuples have same K-value,

2. MINIMALITY: If any of the Ai..Ak are discarded the uniqueness property no longer holds.



PRIMARY KEY = one particular candidate key designated as primary
               (it is usually chosen as the main look-up or identifier attribute)



ALTERNATE KEY: non-primary candidate key

   For example, the NDSU STUDENT relation may have NAID as Primary key and SSN as alternate key.



An example database (called EDUC):

 STUDENT-FILE       COURSE FILE
.________________. ._____________.
|S#|SNAME |LCODE | |C#|CNAME|SITE|
|==|======|======| |==|=====|====|
|25|CLAY  |NJ5101| |8 |DSDE |ND  |
|32|THAISZ|NJ5102| |7 |CUS  |ND  |
|38|GOOD  |FL6321| |6 |3UA  |NJ  |
|17|BAID  |NY2091| |5 |3UA  |ND  |
|57|BROWN |NY2092| `-------------'
`----------------'

 ENROLL FILE       STATUS FILE
.___________.    ._____________
|S#|C#|GRADE|    |LCODE |STATUS|
|==|==|=====|    |======|======|
|32|8 | 89  |    |NJ5101|  1   |
|32|7 | 91  |    |NJ5102|  1   |
|25|7 | 68  |    |FL6321|  4   |
|25|6 | 76  |    |NY2091|  3   |
|32|6 | 62  |    |NY2092|  3   |
|38|6 | 98  |    `-------------'
|17|5 | 96  |
 ------------
  
Every relation has at least 1 candidate key (it may be whole schema)



FOREIGN KEY = attribute(s) in one relation, each of whose values match
           one in primary key of another relation.
           (both attributes must have the same domain)


           ENROLL.C# is foreign key to COURSE.C#





INTEGRITY CONSTRAINTS FOR RELATIONAL DATABASEs



The following two Integrity Constraints are almost always specified.


ENTITY INTEGRITY: no attribute value in a primary key can by null.


REFERENTIAL INTEGRITY: IF S.B (attribute, B, in the relation, S) is a
      foreign key attribute to R.A  (primary key in relation, R) then
      every S.B-value must appear in R.A or be wholly null.




Maintaining referential integrity:


1. automatically cascading of deletes [updates] to the foreign
   key reference when corresponding primary key value
   is deleted [updated].


2. restrict primary key deletes to cases where there's no
   foreign key reference

3. automatically set foreign key referencing values to null.



END REVIEW
----------





RELATIONAL ALGEBRA operators:

Traditional operators:
    union,
    intersection,
    difference,
    Cartesian product



Two relations are UNION COMPATIBLE if they have same degree and the same domains
     - Assume relations, A and B, below are union compatible.
     - R.A refers to attribute, A, of relation, R





UNION of A and B (A ∪ B) is the set of all tuples which are
                     either in A or in B (or in both).


      SELECT emp.e#   FROM  emp  WHERE  emp.title=manager
      UNION
      SELECT emp.e#   FROM  emp  WHERE  emp.title=coach;

      equivalently

      SELECT emp.e#   FROM  emp
      WHERE  emp.title=manager OR  emp.title=coach;







INTERSECTION A and B  (A ∩ B)  is the set of tuples which
                               are both in A and in B.

      SELECT emp.e#   FROM  emp  WHERE  emp.title=manager
      INTERSECTION
      SELECT emp.e#   FROM  emp  WHERE  emp.title=coach;

      equivalently

      SELECT emp.e#   FROM  emp
      WHERE  emp.title=manager AND  emp.title=coach;


i.e., can use INTERSECTION SQL operator or
              AND logical operator (within SQL WHERE clause)









DIFFERENCE between A and B  (A-B)  is the set of tuples
                                   in A but not in B.

      SELECT * FROM emp
      EXCEPT SELECT * FROM emp WHERE emp.title=manager







CARTESIAN PRODUCT of C and D (CxD) = set of all concatenations of a tuple
                              from C with a tuple from D.


      SELECT emp.*, dept.* FROM emp, dept;



      AxB  does not require that C and D to be union compatible.



      Note: R(A,B) x S(B,C) can be written   RXS (R.A, R.B, S.B, S.C),
            RxS(A,B,B,C) would'nt work due to the duplication of attr names.



      therefore one can specify the Cartesian product of a relation with itself,

      for R(A,B):

      RxR=(R.A, R.B, R'.A, R'.B)

      where R' is an alias for R (another name)








The following operations are commutative   (x op y) = (y op x)

    union, intersection (and product, under the strict definition of a relation
                                      i.e., no order to the attributes)

                      (but not the difference operation)






The following operations are associative    (x op y) op z = x op (y op z)  

    union, intersection, product

                       (but not the difference operation)






SPECIAL realtional operators (select, project, join, divide)


   Let Θ be any predicate on the attributes

       (conjunctive and/or disjunctive combination of simple comparative
        operators:

        =  <  >  ≤  ≥ ) 





THETA-SELECTION of Relation R on attributes X and Y


    R WHERE (R.X Θ R.Y)   is the set of tuples, {t: (t.X Θ t.Y) is true}


 ENROLL
.____________.  
|S#|C# |GRADE| 
|==|===|=====|
|32|89 | 89  |    
|32|70 | 91  |   
|25|70 | 68  |  
|25|62 | 76  | 
|32|62 | 62  |
|38|62 | 98  |
|17|51 | 96  |
  


in SQL:  SELECT *   FROM ENROLL   WHERE  ENROLL.C# ≥ ENROLL.GRADE;
.____________.  
|S#|C# |GRADE| 
|==|===|=====|
|32|89 | 89  |    
|25|70 | 68  |  
|32|62 | 62  |


When there is no abiguity, may be writen: ENROLL WHERE  C# ≥ GRADE 




ENROLL WHERE  GRADE ≥ 80  produces:
.____________.  
|S#|C# |GRADE| 
|==|===|=====|
|32|89 | 89  |    
|32|70 | 91  |   
|38|62 | 98  |
|17|51 | 96  |



Theta extents to any condition, that is, to arbitrary predicates:
 
      BOOLEAN COMBINATIONS OF SIMPLE COMPARISON EXPRESSIONS.




ENROLL WHERE (C#=GRADE AND GRADE ≥ 80)
.____________.  
|S#|C# |GRADE| 
|==|===|=====|
|32|89 | 89  |    
 ------------





PROJECTION of R on a set of attribs, X..Z is the set of tuples,


      {t=(x..z) such that there is a tuple t in R with t.X=x,..,t.Z=z}.




      projection is denoted,  R[X,..Z]   (basically removes columns from R
                                        whereas selection removes rows from R)



      (can be implemented by removing all unspecified attributes and
       eliminating duplicates)



 ENROLL
.____________.  
|S#|C# |GRADE| 
|==|===|=====|
|32|89 | 89  |    
|32|70 | 91  |   
|25|70 | 91  |  
|25|62 | 76  | 
|32|62 | 76  |
|38|62 | 76  |
|17|51 | 76  |
  

ENROLL[C#,GRADE]   In SQl:    SELECT C#, GRADE  FROM ENROLL
._________.  
|C# |GRADE| 
|===|=====|
|89 | 89  |    
|70 | 91  |   
|62 | 76  | 
|51 | 76  |

ENROLL[GRADE]
._____.  
|GRADE| 
|=====|
| 89  |    
| 91  |   
| 76  | 


Note:  Many systems which call themselves "relational" have a projection
       operator which does not eliminate duplicates

       (strictly speaking, then, result is a bag and not a set 
                          and therefore result is not a relation).



       For such systems, the result of this projection if the duplicates are
                         not eliminated is:

._____.  
|GRADE| 
|=====|
| 89  |    
| 91  |   
| 91  |  
| 76  | 
| 76  |
| 76  |
| 76  |




THETA-JOIN of relation R and S is the


    set of tuples, t, which are the concatenation of a tuple,

                  a, from R and a tuple,
                  b, from S,

    such that  a Θ b  is true.





    - Θ-join can be constructed using product, then a Θ selection

    - Θ is called the join Condition

    - if Θ is "equality" operator attributes with the same domain
      then the Θ-join is called the EQUIJOIN
 

    - if Θ is "equality" on all pairs with the same name (and domain),
      then the Θ-join is the NATURAL JOIN
                          


 STUDENT
.________________. 
|S#|SNAME |LCODE | 
|==|======|======|
|25|CLAY  |NJ5101| 
|32|THAISZ|NJ5102| 
|38|GOOD  |FL6321| 
|17|BAID  |NY2091| 
|57|BROWN |NY2092|
`----------------'

 ENROLL
.___________.
|S#|C#|GRADE|
|==|==|=====|
|32|8 | 89  |
|32|7 | 91  |
|25|7 | 68  |
|25|6 | 76  |
|32|6 | 62  |
|38|6 | 98  |
|17|5 | 96  |
 -----------
  



STUDENT JOIN ENROLL WHERE ( STUDENT.S# = ENROLL.S# )   (this is an equijoin)

In SQL: SELECT *    FROM STUDENT, ENROLL    WHERE STUDENT.S# = ENROLL.S#;
.___________________________________. 
|S.S#|S.SNAME |S.LCODE |E.C#|E.GRADE|
|====|========|========|====|=======|
|25  |  CLAY  |  NJ5101| 7  |  68   |
|25  |  CLAY  |  NJ5101| 6  |  76   |
|32  |  THAISZ|  NJ5102| 8  |  89   | 
|32  |  THAISZ|  NJ5102| 7  |  91   | 
|32  |  THAISZ|  NJ5102| 6  |  62   | 
|38  |  GOOD  |  FL6321| 6  |  98   |
|17  |  BAID  |  NY2091| 5  |  96   |





        Closest thing to an inverse of the join is the DIVISION operator.

                (the join can be thought of as a type of "multiply")
              



DIVISION: Dividing relation A(X,Y) (degree=m+n) by B(Y) (deg=n)
                  gives a quotient relation, A/B(X) (deg=m):


          ___________.<-degrees->._
         / /        /              \
        m n        n                m
        | |        |                |
        v v        v                v

Given A(X,Y) and B(Y),    x is in (A/B)    iff (x,y) is in A for every y in B.



    or  x is in (A/B)    iff {x} x B  is a subset of A.

       (x gets paired (in A) with every y in B)





For instance, if you needed to find all students who took all 3 courses
                                       {5,6,7}, the answer is E/C below.

ENROLL[S#,C#] = E
._____. 
|S#|C#|
|==|==|
|25|7 |
|25|6 |
|32|6 | 
|32|7 | 
|32|6 | 
|38|6 |
|25|5 |


COURSE[C#] = C
.__. 
|C#|
|==|
|7 |
|6 |
|5 |


E/C
.__.
|S#|
|==|
|25|


RelAlg and SQL


NOTE: relational algebra is procedural in sense that user must
 specify "how" solutions are gained,
 not just "what" is to be produced.


e.g.,   DO join, then project, then...