MATHEMATICAL CONCEPTS


Some of the mathematical concepts needed for this course are collected here.


The following diagram attempts to show the IS-A relationships among some of these concepts.


                                   Cartesian Product 
                                           |
                                           |
                                        Relation 
                                       /        \ 
                                      /          \ 
                                     /            \ 
                           RelationExtension    RelationIntension 
                             /        \              /        \
                            /          \            /          \
                           /            \       Schema    FunctionalDependency
                          /              \             
                         /                \
Similarity            BinaryRelation       N-aryRelation 
  ¦  \               /        ¦  \  \                                         
  ¦   \             /         ¦   \  \                                     
  ¦   Distance     /          ¦    \  \_____________________
  ¦               /           ¦     \                        \
  ¦              /            ¦      \                        \
  ¦   ReflexiveBinaryRelation ¦        1-ManyRelationship      Many-ManyRelation
  ¦           /          \    ¦                          \
  ¦          /            \__ ¦ _                         \
   \        /                 ¦  \                         \_____
    \      /                  ¦   \                              \
  .-EquivalenceRelation---.   ¦   .PartialOrderRelation-.         \
  |                       |   ¦   |     ¦               |______    \
Function(Classifier)      |   ¦   |     ¦               |       \   \ 
  |           UndirectedGraph ¦   |  Lattice            |       Ontology
  |                       |   ¦   |                     |    (ConceptHierarchy)
  `-Partition(Clustering)-'   ¦   `-DirectedAcyclicGraph'        /        \
          ¦                   /                                 /          \
          ¦                  /                                 /            \
          ¦                 /                               PartOf          IsA
        Query              /                                                /
       /  ¦  \            /                                                /
      /   ¦   \       ___/                                                /
     /    ¦    \     /                                                   /
Select   OLAP   Join                                                    /
Project   / \                                                          /
 \       /   \           _____________________________________________/
  \     /     \         /
   \   /       \       /
   Slice        \     /
   Dice         Rollup





For purposes of these notes, we define (somewhat simplified):



A TUPLE on Domains (sets), A1..An, is an n-set (set with n elements),
{a1..an}, SuchThat : ForAll i, ai IsIn Ai



A RELATION, R(A1..An) on Domains, A1..An is a set of n-tuples on those domains.

One can model R as a subset of A1×..×An (Multidimensional Model), or

one can model R as a set of A1..An tuples (Horizontal Model), or

one can model R as any lossless collection of predicate maps from a key for the tuple set to {0,1} (Vertical Model).
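
The three models can be sketched side by side on a toy relation. This is only an illustration: the domains A1 and A2, the relation R, the integer keys, and the helper rebuild are all hypothetical choices, and the vertical model shown (one predicate map per attribute value) is just one of many lossless collections.

```python
from itertools import product

# Hypothetical toy domains and relation, chosen only for illustration.
A1 = ["x", "y"]
A2 = [0, 1, 2]
R = {("x", 0), ("y", 2), ("x", 2)}

# Horizontal model: R is simply the set of its tuples.
horizontal = set(R)

# Multidimensional model: one cell per point of A1 x A2, holding 1 if the
# point is in R and 0 otherwise.
multidimensional = {point: int(point in R) for point in product(A1, A2)}

# Vertical model: give each tuple a key (an assumption: integer keys), then
# store one predicate map (key -> {0,1}) per (attribute index, value) pair.
keyed = dict(enumerate(sorted(R)))
vertical = {
    (i, v): {k: int(t[i] == v) for k, t in keyed.items()}
    for i, domain in enumerate([A1, A2])
    for v in domain
}

def rebuild(vertical, keys, domains):
    """Recover the tuple set from the predicate maps (shows losslessness)."""
    out = set()
    for k in keys:
        out.add(tuple(
            next(v for v in dom if vertical[(i, v)][k] == 1)
            for i, dom in enumerate(domains)
        ))
    return out

from_cells = {p for p, bit in multidimensional.items() if bit}
from_maps = rebuild(vertical, keyed.keys(), [A1, A2])
```

Both from_cells and from_maps give back the horizontal tuple set, which is what "lossless" means here.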




A BINARY RELATION, R(A,B), on {A,B} can be modeled as a bipartite graph, G = ( A DisjointUnion B , R )
(i.e., the node set is the disjoint union of A and B and the edge set is R).

It is usually more revealing to use a scatter plot (of (a,b) pairs in the A-B plane)
than to connect each a on the A-line with its b on the perpendicular B-line.





An N-ARY RELATION, R(A1..An), can be modeled as an n-partite graph, G=(A1 DisjointUnion ... DisjointUnion An, R)
(i.e., node set = DisjointUnion{A1..An} and edge set = R, where an edge is an n-sided polygon).

Again it is often more revealing to display the edges as scatter plot points in A1×..×An
instead of displaying them as convex hulls of axis positions (i.e., displaying edge (a1...an)
as the convex hull connecting axis position a1 on axis A1, ..., axis position an on axis An;
i.e., as the polyhedron connecting the n axis positions.)

A 3rd alternative is a slalom plot in which the n axes are positioned as side-by-side vertical lines
and the positions, ai on axes Ai are connected (i.e., each edge is represented by a line graph).




A FUNCTION, f, with domain D and range R, is a Relation on {D,R} SuchThat :  ForAll d in D ThereExists r in R with (d,r) in f
and ForAll (d1,r1) and (d2,r2) in f, if d1 = d2 then r1 = r2.
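
The two conditions (every d in D occurs, and no d occurs with two different values) can be checked directly on a finite set of pairs. A minimal sketch; the name is_function and the sample pairs are hypothetical.

```python
def is_function(f, D):
    """True if the set of pairs f is a function with domain D."""
    pairs = set(f)
    firsts = [d for d, _ in pairs]
    # Totality: the first components are exactly the elements of D.
    total = set(firsts) == set(D)
    # Single-valuedness: no d is paired with two different r values.
    single_valued = len(firsts) == len(set(firsts))
    return total and single_valued

ok = is_function({(1, "a"), (2, "a")}, {1, 2})
two_values = is_function({(1, "a"), (1, "b"), (2, "a")}, {1, 2})
not_total = is_function({(1, "a")}, {1, 2})
```

Here ok is True, while two_values and not_total are False (the first violates single-valuedness, the second leaves 2 without a value).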





A PARTIAL ORDER is a Reflexive Binary Relation, R, on S SuchThat 
              ForAll s in S,   (s,s) IsIn R                (reflexive)
              if (s,t) and (t,s) AreIn R then s=t          (antisymmetric)
              if (s,t), (t,u) AreIn R then (s,u) IsIn R    (transitive)





A LINEAR ORDER, R, on S is a Partial Order SuchThat  ForAll s NotEqualTo t in S, (s,t) or (t,s) IsIn R 
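
For a finite S the defining properties can be tested by brute force. The sketch below uses divisibility on {1,2,3,6} as a hypothetical example of a partial order that is not linear (2 and 3 are incomparable); the function names are illustrative only.

```python
def is_reflexive(R, S):
    return all((s, s) in R for s in S)

def is_antisymmetric(R):
    return all(s == t for (s, t) in R if (t, s) in R)

def is_transitive(R):
    return all((s, u) in R for (s, t) in R for (t2, u) in R if t == t2)

def is_partial_order(R, S):
    return is_reflexive(R, S) and is_antisymmetric(R) and is_transitive(R)

def is_linear_order(R, S):
    # A partial order in which every pair of distinct elements is comparable.
    return is_partial_order(R, S) and all(
        (s, t) in R or (t, s) in R for s in S for t in S if s != t
    )

S = {1, 2, 3, 6}
divides = {(a, b) for a in S for b in S if b % a == 0}   # a divides b
leq = {(a, b) for a in S for b in S if a <= b}           # usual numeric order
```

With these definitions, divides is a partial order but not a linear order, while leq is both.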



A RESTRICTION, L'=(S',«'), of a partial order, L=(S,«), is a POset  SuchThat
               S'  IsSubSetOf  S  and  ForAll a,b in S',  a «' b  iff  a « b



A PREFIX L' of L  is a restriction  SuchThat  if a IsIn S'  then all S-predecessors of a are in S'.
          (Prefix closed under the predecessorship operation).

          The definition says something about « and also about S'.






A SIMILARITY on a set, S, is a function, sim: S×S --> LOS (a Linearly Ordered Set) SuchThat :
              sim(x,y) = sim(y,x)     ForAll x,y in S  and
              sim(x,x) = MaxVal(LOS)  ForAll x in S





An EQUIVALENCE RELATION is a Reflexive Binary Relation, R, on S  SuchThat :
              (s,s) IsIn R  ForAll s in S                            (reflexive)
              if (s1,s2) IsIn R  then (s2,s1) IsIn R                  (symmetric)
              if (s1,s2) and (s2,s3) AreIn R  then (s1,s3) IsIn R     (transitive)

Note that a transitive similarity with range {0,1} defines an equivalence relation, R, by: (s1,s2) IsIn R iff sim(s1,s2)=1.
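
As a small illustration of the note, the sketch below builds R from a {0,1}-valued similarity and checks the three properties by brute force. The similarity same_residue (agreement modulo 3) and the contrasting "within distance 1" relation are hypothetical examples.

```python
def same_residue(x, y):
    """A {0,1}-valued similarity: 1 when x and y agree modulo 3."""
    return int(x % 3 == y % 3)

S = range(7)
R = {(x, y) for x in S for y in S if same_residue(x, y) == 1}

reflexive = all((s, s) in R for s in S)
symmetric = all((t, s) in R for (s, t) in R)
transitive = all((s, u) in R for (s, t) in R for (t2, u) in R if t == t2)

# For contrast, a similarity that is NOT transitive: "within distance 1".
near = {(x, y) for x in S for y in S if abs(x - y) <= 1}
near_transitive = all(
    (s, u) in near for (s, t) in near for (t2, u) in near if t == t2
)
```

R passes all three checks; near fails transitivity because (0,1) and (1,2) are in it but (0,2) is not.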




(S,R) is a LATTICE if R is a Partial Order on S and  ForAll s1 ≠ s2 in S ThereExists s3 in S  SuchThat:
       (s1,s3) IsIn R and (s2,s3) IsIn R   (i.e., every pair has an upper bound)





A PARTITION, P={C1..Cn}, of S is a subset of 2^S = {s | s IsASubsetOf  S} with

             mutual exclusion      ( ForAll i NotEqual j, Ci Intersection Cj = empty ) and
             collective exhaustion ( Union i=1..n Ci = S ).
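
The two conditions translate directly into a check on a finite collection of sets. A minimal sketch; is_partition is a hypothetical name, and (following the definition as stated) it does not separately require the components to be nonempty.

```python
def is_partition(P, S):
    """True if the components in P are pairwise disjoint and their union is S."""
    comps = [frozenset(c) for c in P]
    exclusive = all(
        comps[i].isdisjoint(comps[j])
        for i in range(len(comps))
        for j in range(i + 1, len(comps))
    )
    exhaustive = set().union(*comps) == set(S)
    return exclusive and exhaustive

good = is_partition([{1, 2}, {3}], {1, 2, 3})
overlapping = is_partition([{1, 2}, {2, 3}], {1, 2, 3})
incomplete = is_partition([{1}, {3}], {1, 2, 3})
```

Only good is True: overlapping fails mutual exclusion and incomplete fails collective exhaustion.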



A LABELED_PARTITION is SuchThat : ForAll Ci ThereExist a label assigned to it (from some label space, L)




A PARTITION LATTICE of S is the lattice ordering of all Partitions of S under the
                    ordering of sub-partitions, where a sub-partition, Q, of a
                    partition, P, is SuchThat  every component of Q is contained in a component of P (exactly one).
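
The sub-partition ordering can be sketched as below, reading "sub-partition" as refinement: each component of Q lies inside exactly one component of P. That reading is an assumption about the intended meaning, and is_subpartition is a hypothetical name.

```python
def is_subpartition(Q, P):
    """True if every component of Q is contained in exactly one component of P."""
    return all(
        sum(1 for p in P if set(q) <= set(p)) == 1
        for q in Q
    )

P = [{1, 2}, {3, 4}]
Q = [{1}, {2}, {3, 4}]      # splits the component {1,2} of P
finest = [{1}, {2}, {3}, {4}]
coarsest = [{1, 2, 3, 4}]
```

Under this ordering Q is a sub-partition of P but not conversely, the partition into singletons is below every partition, and the one-component partition is above every partition.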



A CONCEPT HIERARCHY of an attribute is the Partition Lattice of that attribute
                     (any user defined concept hierarchy, based on some domain knowledge,
                      is a sub-Lattice of this Lattice).



An ONTOLOGY is a controlled vocabulary and encyclopedia of all concept hierarchies
             taken over an entire cohesive collection of related concepts.



A CLUSTERING is a good Partition from the Partition Lattice of R    (good = SuchThat :
                     intra-component pairs are very similar  and
                     inter-component pairs are very dissimilar).



CLASSIFICATION is assigning a class value to an unclassified tuple
                based on the current state of a Training Set.

(It assumes Training Set is sufficiently developed to have all relationships expected.)
Usually a Model is developed from the Training Set to do the class label assignment, which
can be thought of as a function which relates non-class attribute values to class label values.

(E.g., Nearest Neighbors Vote; Decision Tree Model, Bayesian Model; Neural Network Model...).




An EAGER CLASSIFIER (e.g., Decision Tree, Bayes...) builds a model to represent
	             the feature-to-class information found in the Training Set.

(Can be viewed as selecting a partition of T whose Component Class Histograms
are SuchThat : the maximal class is sufficiently more populous than the next highest).




A SAMPLE-BASED or LAZY CLASSIFIER (e.g., K-Nearest-Neighbor) builds no model ahead of time,
                but the entire Training Set is employed for each classification.

(Can be viewed as finding a neighborhood around the unclassified tuple
SuchThat : the Class Histogram is sufficiently discriminatory.)


(Rings emanating out from the sample can be given decreasing vote values for a
better vote (PINE method).)
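
A lazy classifier can be sketched in a few lines. The example below is a plain K-Nearest-Neighbor majority vote with squared Euclidean distance; it is not the PINE ring-weighted vote, and the training data, the function name knn_classify, and the default k=3 are hypothetical.

```python
from collections import Counter

def knn_classify(training, query, k=3):
    """training: list of (feature_tuple, label) pairs.
    Returns the most common label among the k training points nearest to query."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # No model is built ahead of time: the whole training set is scanned per query.
    nearest = sorted(training, key=lambda row: dist2(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

training = [
    ((0, 0), "red"), ((0, 1), "red"), ((1, 0), "red"),
    ((5, 5), "blue"), ((5, 6), "blue"), ((6, 5), "blue"),
]
near_origin = knn_classify(training, (0.2, 0.1))
far_corner = knn_classify(training, (5.5, 5.5))
```

The Counter plays the role of the Class Histogram of the neighborhood; a weighted vote would replace the unit counts with weights that decrease with distance.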




RULE MINING is discovering good antecedent ⇒ consequent relationships in 2^T.



The Antecedent and Consequent sets can be defined as the TRUE SETS of tuple-predicates.

To reach the full generality of what rule mining encompasses, we allow the training table
  to be rolled up to any level in any of its domain concept hierarchies (ontologies).

Thus, wrt the item-transaction terminology, itemsets are any collection of nodes,
  at most one from any given concept hierarchy.

These nodes are sets of tuples (the set of tuples containing that node in that attribute).

Therefore the predicates are set-containment predicates wrt the original training table;
  an item is a product of domain subsets, at most one per domain.

In Market Basket Research, where every domain is {0,1}, usually, the only allowable
  antecedent-consequent subsets are the {1} subsets.

Thus, we can fully specify an itemset by simply specifying the attributes (items) involved
  (so it is a schema-level specification).

A Functional Dependency in the Training Table is an intension-level specification that, within
  given disjoint antecedent and consequent attribute sets, exactly one 100% confidence rule holds
  ForAll antecedent items, not only for the training set (current state) but also ForAll future training
  states (intensional rule).
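
Rule confidence and the current-state side of a functional dependency can be sketched as follows. The table rows, the attribute names, and the helpers confidence and fd_holds are hypothetical. Note that the code can only inspect the current state: it can refute a functional dependency, but it cannot establish the intensional claim about all future states.

```python
def confidence(rows, antecedent, consequent):
    """antecedent, consequent: dicts mapping attribute -> required value.
    Returns the fraction of antecedent-matching rows that also match the
    consequent, or None if no row matches the antecedent."""
    matches = [r for r in rows if all(r[a] == v for a, v in antecedent.items())]
    if not matches:
        return None
    hits = [r for r in matches if all(r[c] == v for c, v in consequent.items())]
    return len(hits) / len(matches)

def fd_holds(rows, lhs, rhs):
    """True if, in the current rows, equal lhs values always carry equal rhs values."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

rows = [
    {"zip": "58102", "city": "Fargo", "bread": 1, "milk": 1},
    {"zip": "58102", "city": "Fargo", "bread": 1, "milk": 0},
    {"zip": "58501", "city": "Bismarck", "bread": 0, "milk": 1},
    {"zip": "58501", "city": "Bismarck", "bread": 1, "milk": 1},
]
bread_milk = confidence(rows, {"bread": 1}, {"milk": 1})
zip_city = fd_holds(rows, ["zip"], ["city"])
bread_fd = fd_holds(rows, ["bread"], ["milk"])
```

In this toy table the market-basket rule bread ⇒ milk has confidence 2/3, every zip value determines a single city (100% confidence rules), and bread does not determine milk.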







GRAPHS   A good Online Graph Theory Text





An UNDIRECTED GRAPH is a pair,  G = (N,E)  SuchThat  :

                     N = set of nodes

                     E = set of 2-sets of nodes called edges.


Note that this definition does not allow a reflexive edge, (x,x).
If one wishes to allow such loop edges one needs to replace 2-set with 2-bag.

We diagram {n1,n2} in E as  n1 -- n2




A PATH in G, (n1,n2..nk) is a sequence of nodes SuchThat : {ni,ni+1} in E   ForAll i=1..k-1


A Graph is CONNECTED if there is a path connecting every pair of nodes.




   G'= (N',E') is a   SUBGRAPH of G if   N' IsContainedIn N    and    E' IsContainedIn E.

   G'= (N',E') is the SUBGRAPH of G INDUCED BY N' if N' IsContainedIn N  and  E' = { {s,d} in E | s and d in N' }

   G'= (N',E') is the CONTRACTION of G INDUCED BY Partition, N', of N if
                      ([n] is the partition component containing n)
                      E' = { {[n1],[n2]} | {n1,n2} in E and [n1] NotEqualTo [n2] in N' }
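
Both constructions can be sketched on a small path graph 1 -- 2 -- 3 -- 4. The sketch assumes edges are stored as frozensets (2-sets) and reads the contraction as having one node per partition component, with an edge between two different components whenever some original edge joins them; the function names are hypothetical.

```python
def induced_subgraph(N, E, N_sub):
    """Keep the nodes in N_sub and exactly those edges with both ends in N_sub."""
    N_sub = set(N_sub)
    return N_sub, {e for e in E if e <= N_sub}

def contraction(N, E, partition):
    """One node per partition component; edges join distinct components."""
    comp_of = {n: frozenset(c) for c in partition for n in c}
    nodes = {frozenset(c) for c in partition}
    edges = set()
    for e in E:
        a, b = tuple(e)
        if comp_of[a] != comp_of[b]:      # edges inside a component disappear
            edges.add(frozenset({comp_of[a], comp_of[b]}))
    return nodes, edges

N = {1, 2, 3, 4}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}

sub_nodes, sub_edges = induced_subgraph(N, E, {1, 2, 4})
con_nodes, con_edges = contraction(N, E, [{1, 2}, {3, 4}])
```

The subgraph induced by {1,2,4} keeps only the edge {1,2}; contracting by the partition {{1,2},{3,4}} leaves two nodes joined by a single edge (coming from the original edge {2,3}).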






A PARTITION of G is a set of subgraphs of G, G1=(N1,E1), G2=(N2,E2) ... Gk=(Nk,Ek),
             where Ei IsContainedIn E  and   DisjointUnionOf{Ni} = N.


         Each Gi is called a COMPONENT of the partition.




         A Graph G = (N,E)
 
        a                b
        O----------------O
        /\              /|
       /  \            / |
      /    O g        /  |
     /     |\        /   |
    /      | \      /    |
   /       |  \    /     |
  /        |   \  /      |
 /         |    \/       |
O----------O    O--------O
f          e    d        c
              
 N = { a,b,c,d,e,f,g }  
     
E = { {a,b},{a,f},{a,g},{b,c},{b,d},{c,d},{d,g},{e,g},{e,f}  }                                    
 
           Unconnected Graph

           b              f 
           O              O    
           |\             |  
           | \            |   
           |  \           |
           |   O c        | 
           |  /           | 
           | /            O 
           O              e  
           d       
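
Connectivity of the two graphs drawn above can be checked with a breadth-first search. A minimal sketch: each 2-set edge is written as an unordered pair (a tuple), and the names components and is_connected are hypothetical.

```python
from collections import deque

def components(N, E):
    """Connectivity components of the undirected graph (N, E), found by BFS."""
    adj = {n: set() for n in N}
    for a, b in E:              # each edge {a, b} is given as an unordered pair
        adj[a].add(b)
        adj[b].add(a)
    seen, parts = set(), []
    for start in sorted(N):
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            n = queue.popleft()
            for m in adj[n] - comp:
                comp.add(m)
                queue.append(m)
        seen |= comp
        parts.append(comp)
    return parts

def is_connected(N, E):
    return len(components(N, E)) <= 1

# The graph G = (N, E) from the first diagram.
N1 = set("abcdefg")
E1 = [("a", "b"), ("a", "f"), ("a", "g"), ("b", "c"), ("b", "d"),
      ("c", "d"), ("d", "g"), ("e", "g"), ("e", "f")]

# The unconnected graph from the second diagram.
N2 = set("bcdef")
E2 = [("b", "c"), ("b", "d"), ("c", "d"), ("e", "f")]

parts2 = components(N2, E2)
```

The first graph is connected; the second splits into the components {b,c,d} and {e,f}.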
              
 


A HYPERGRAPH is a pair, G=(N,E), of disjoint sets, where E is a set of nonempty subsets of N.



A DIRECTED GRAPH (or digraph), is a graph, G=(N,E) SuchThat : ThereExist maps  Init: E --> N  and  Term: E --> N
                  (e in E is directed from Init(e) to Term(e)).
                                    



The sequence, n1..nk, is a PATH from n1 to nk if (ni, ni+1) in E for i=1..k-1






A Path, n1..nk, is


TRIVIAL if ni=nj  ForAll i,j=1,...,k

SIMPLE if all ni are distinct, except possibly n1 and/or nk.

A CYCLE is a simple nontrivial path with n1=nk (simple closed path).

A cycle is MINIMAL if, for nodes ni and nj in the cycle, every edge (ni,nj) in E
           is in the cycle.


                   A Digraph
                a                b            
                O-------------- >O              
                ^^              /|         
               /  \            / |         
              /    O g        /  |         
             /     ^^        /   |        
            /      | \      /    |         
           /       |  \    /     |             
          /        |   \  /      |          
         /         |    \V       V                
        O< --------O    O< ------O             
        f          e    d        c                
              
    N = { a,b,c,d,e,f,g }  
     
    E = { (a,b),(b,c),(c,d),(d,g),(b,d),(e,g),(e,f),(f,a),(g,a)  } 



A DIRECTED ACYCLIC GRAPH (DAG) is a digraph containing no cycles.


SOURCE = node with outgoing but no incoming edges.


SINK   = node with incoming but no outgoing edges.


ROOTED DAG = dag with unique source.

   If (a,b) in E, a is called a PARENT of b and b is called a CHILD of a.

   For a path from a to b, a is an ANCESTOR   of b
                      and  b is a  DESCENDANT of a.

   If b ≠ a, a is a PROPER ANCESTOR   of b and b is a PROPER DESCENDANT of a.

   If b ≠ a, and neither is an ancestor of the other, then they are UNRELATED.



A TOPOLOGICAL SORT of a digraph, G, is a sequence of all of the nodes of G SuchThat
       if a appears before b in the sequence, then there is no path from b to a.




Proposition A.1:  A digraph can be topologically sorted iff it is a dag.
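
Proposition A.1 can be seen at work with a standard topological sort (Kahn's algorithm, which repeatedly removes a node with no incoming edges). The sketch returns None when no ordering exists. It is applied to the digraph drawn earlier in these notes, which contains the cycle a, b, d, g, a, and to a hypothetical variant with the edge (g,a) removed.

```python
from collections import deque

def topological_sort(N, E):
    """Return a topological order of the digraph (N, E), or None if it has a cycle."""
    indegree = {n: 0 for n in N}
    out = {n: [] for n in N}
    for a, b in E:
        out[a].append(b)
        indegree[b] += 1
    ready = deque(sorted(n for n in N if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in out[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    # Nodes on a cycle never reach indegree 0, so they are never emitted.
    return order if len(order) == len(N) else None

N = list("abcdefg")
E = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "g"), ("b", "d"),
     ("e", "g"), ("e", "f"), ("f", "a"), ("g", "a")]
E_dag = [edge for edge in E if edge != ("g", "a")]   # assumed variant: drop (g,a)

cyclic_result = topological_sort(N, E)
dag_result = topological_sort(N, E_dag)
```

For the original edge set the result is None; with (g,a) removed the graph is a dag whose only topological order is e, f, a, b, c, d, g.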





TRANSITIVE CLOSURE: if G=(N,E) is a digraph and G+=(N,E') is SuchThat

(a,b) in E' iff there is a nontrivial path from a to b in G then G+ is a transitive closure of G.




Proposition A1.5:  ForAll G there is one and only one transitive closure

          i.e., G --> G+ is a function.

          The transitive closure is obtained by including in E',
          (ni,nj) for every non-trivial path in G from ni to nj.
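
The construction in Proposition A1.5 can be sketched as a reachability search from every node. This is a straightforward, unoptimized illustration; transitive_closure is a hypothetical name, and "nontrivial path" is read as a path using at least one edge.

```python
def transitive_closure(N, E):
    """Return E' with (a, b) in E' iff some path with at least one edge leads from a to b."""
    succ = {n: set() for n in N}
    for a, b in E:
        succ[a].add(b)
    closure = set()
    for start in N:
        # Depth-first search from the successors of start, so that start itself
        # is only reached when it lies on a cycle.
        stack, reached = list(succ[start]), set()
        while stack:
            n = stack.pop()
            if n not in reached:
                reached.add(n)
                stack.extend(succ[n])
        closure |= {(start, n) for n in reached}
    return closure

chain = transitive_closure({1, 2, 3}, {(1, 2), (2, 3)})
loop = transitive_closure({1, 2}, {(1, 2), (2, 1)})
```

The chain 1, 2, 3 gains the edge (1,3). The two-node cycle gains (1,1) and (2,2), which illustrates one direction of Proposition A.2: the closure of a non-dag is not a dag. Applying the function to its own output changes nothing.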





Proposition A.2: G+ is a dag iff G is a dag.




A dag, G, is TRANSITIVELY CLOSED iff G=G+.



A TREE is a rooted dag with a unique path from the root to each node.



 

A Partial order (PO) is: ( S, < ) where S is called the DOMAIN, and the binary relation, LessThan, is
   irreflexive:  a NotLessThan a 
   transitive:   if a LessThan b  and  b LessThan c  then  a LessThan c




A Partial Order L'=(S',LessThan') is a RESTRICTION of L = ( S, LessThan ) on domain S'
   if S' SubsetOf  S and
   if  ForAll a,b in S',  a LessThan' b  iff  a LessThan b




L' is a PREFIX of L (L' LessThanOrEqual L) if L' is a restriction and, if a is in S', then all S-predecessors of a are in S'.
 
A Prefix is closed under the "predecessorship" operation.
    The definition says something about LessThan' and also S'.





Often in a DAG we only show non-transitive edges,
      i.e., if a « b and b « c we do not include the edge (a,c).




Note that, given a POSet, L, we often diagram it as a DAG using the duality just described.

However, since a diagram is intended to help visualization, and therefore should not be cluttered,
 we generally do not include all edges (We don't display the closure - in fact, usually we
 display the minimal dag that corresponds to the POSet).
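
One way to obtain such an uncluttered diagram from a finite dag is to drop every edge that is implied by a longer path (a construction usually called the transitive reduction). The sketch below assumes its input is a dag; transitive_reduction and the divisibility example are hypothetical illustrations.

```python
def transitive_reduction(N, E):
    """For a dag (N, E): drop each edge (a, b) for which another path from a to b exists."""
    succ = {n: set() for n in N}
    for a, b in E:
        succ[a].add(b)

    def reachable_without(src, dst, skip_edge):
        # Depth-first search from src that never uses skip_edge.
        stack, seen = [src], set()
        while stack:
            n = stack.pop()
            for m in succ[n]:
                if (n, m) == skip_edge or m in seen:
                    continue
                if m == dst:
                    return True
                seen.add(m)
                stack.append(m)
        return False

    return {(a, b) for a, b in E if not reachable_without(a, b, (a, b))}

# Strict divisibility on {1,2,3,6}, given with all of its (closed) edges.
S = {1, 2, 3, 6}
closed = {(a, b) for a in S for b in S if a != b and b % a == 0}
diagram = transitive_reduction(S, closed)
```

For the divisibility order the edge (1,6) is dropped, since it is implied by 1, 2, 6; the remaining four edges form the familiar diamond diagram.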




QUESTION:  Is it correct to say THE minimal dag here?

Is this theorem true?  Given a dag, G=(S,E), ThereExistUnique  subdag G'=(S,E') SuchThat
 C(G') = C(G) and if G''=(S,E'') is a subdag of G'  SuchThat  C(G'') = C(G) then G'' = G'.







DUALITIES:




 PARTITION   IsEquivalentTo   FUNCTION   IsEquivalentTo
 EQUIVALENCE RELATION   IsEquivalentTo   UNDIRECTED GRAPH


Assuming a Partition has uniquely labeled components
(that would be required for unambiguous reference).



The   Partition-Induced              Function   takes a point to the label of its component.

The   Function-Induced               Equivalence Relation   equates a pair iff they map to the same value.

The   Equivalence Relation-Induced   Undirected Graph   has an edge for each equivalent pair.

The   Undirected Graph-Induced       Partition   is its connectivity component partition.
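
The four induced constructions can be chained into a round trip that returns the original partition. A minimal sketch under stated assumptions: a labeled partition is stored as a dict from label to component, graph edges are 2-sets (so the reflexive pairs of the equivalence relation produce no edges), and all function names are hypothetical.

```python
def partition_to_function(labeled_partition):
    """Takes each point to the label of its component."""
    return {x: label for label, comp in labeled_partition.items() for x in comp}

def function_to_equivalence(f):
    """Equates a pair iff both points map to the same value."""
    return {(x, y) for x in f for y in f if f[x] == f[y]}

def equivalence_to_graph(R):
    """One edge per equivalent pair of distinct points."""
    nodes = {x for pair in R for x in pair}
    edges = {frozenset((x, y)) for x, y in R if x != y}
    return nodes, edges

def graph_to_partition(nodes, edges):
    """Connectivity components, found by merging components along edges."""
    comp = {n: {n} for n in nodes}
    for e in edges:
        a, b = tuple(e)
        if comp[a] is not comp[b]:
            merged = comp[a] | comp[b]
            for n in merged:
                comp[n] = merged
    return {frozenset(c) for c in comp.values()}

labeled = {"even": {0, 2, 4}, "odd": {1, 3}}
f = partition_to_function(labeled)
R = function_to_equivalence(f)
nodes, edges = equivalence_to_graph(R)
round_trip = graph_to_partition(nodes, edges)
```

The round trip recovers the components {0,2,4} and {1,3}; only the labels are lost, which is why unique labels are assumed for the partition-to-function step.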






 PARTIALLY ORDERED SET    IsEquivalentTo    DIRECTED ACYCLIC GRAPH  



              
The   Directed Acyclic Graph-Induced  Partially Ordered Set   contains ( s1, s2 ) iff it is an edge in the closure.

The   Partially Ordered Set-Induced   Directed Acyclic Graph   contains ( s1, s2 ) iff it is in the POSET.