Data Modelling


Addtional information can be found at:
 Data Modeling
 ERA Model
 ER Model

ENTITY-RELATIONSHIP MODEL (ER) is a high-level conceptual data model.

It is used to "get an idea" of how the data is to be organized in a company or enterprise.

Thus, in consultation with the creators and users of the data,
 ER diagrams provide a graphical picture of the data and data relationships
 and constraints as it is used by the organization.

IN ER modeling,

an ENTITY is a "Thing" in the real world with an independent existence.
  physical existence:    person,  car, house,  employee	
  conceptual existence:  company, job, course



We will use a COMPANY as an example in this discussion.  The COMPANY is described
to us as follows:



1. The company is organized into depts - each with a name, number, manager.
   Each manager has a startdate.
   Each department can have several locations.

2. Departments control projects - each with a name, number, location.

3. Each employee has a name, SSN, sex, address, salary, birthdate, dept, supervisor.

   An employee may work on several projects (not necessarily all controlled by his dept)
   for which we keep hoursworked by project.

4. Each employee dependent has a name, sex, birthdate and relationship.




In ER diagrams, entities are represented graphically in boxes:
 ________   _________   ____   _______
|EMPLOYEE| |DEPENDENT| |DEPT| |PROJECT|
|________| |_________| |____| |_______|



An attribute (or property) of an entity describes that entity.


An ENTITY has a TYPE.   It includes the entity name and the list of its attributes.

The ENTITY TYPE SCHEMA describes the common structure shared by all entities
 of that type.  e.g.,  Project (Name,Num,Loc,ControlDept)


Note:  Entity Type does not change often - very static.


An ENTITY INSTANCE is an individual occurrence of an entity of a particular type
   at a particular time.

e.g,,  (Dome, 46, 19 Ave N & Univ, Athletics)
       (IACC, 52, Bolley & Centennial, C.S.)
       (Bean Res,31,12 Ave N & Federal,P.S.)
        . . .

Note:  Entity instances get added and even changed often.
       They are very dynamic.



In ER modeling, ATTRIBUTES are written next to the Entity they describe - usually
something like the following:


Name---------------- .                      Name-------------- .
Number---------------|   ____________       Number-------------|   _________
Locations------------|--|_DEPARTMENT_|      Location-----------|--|_PROJECT_|
Manager--------------|                      ControlDepartment--' 
ManagerStartDate-----'
                        .-- Name
 ____________ 		|---SSN                             .- Employee
|__EMPLOYEE__|----------|---Sex             ____________    |--DependentName
                        |---Address        |__DEPENDENT_|---|--Sex
                        |---Salary                          |--BirthDate
                        |---BirthDate                       `--Relationship
                        |---Department
                        |---Supervisor
                        `---WorksOn



There are many categories of attribues:


COMPOSITE ATTRIBUTE are attributes that are subdivided into smaller parts
                    with independent meaning.

   e.g., the Name attribute of Employee may be subdivided into FName, Minit, LName.

         To indicate Name is so subdivided, we use:      Name (FName, Minit, LName)

   Also, WorksOn may be a composite of Project and Hours:  WorksOn (Project, Hours)


SINGLE-VALUED ATTRIBUTE: one value per entry.


MULTIVALUED ATTRIBUTE have multiple values per entry:

   eg, Locations (as an attribute of Department - since a Department
       can have multiple locations)

       To indicate that Locations is a Multivalued Attribute, we use   {Locations}
       WorksOn may also be mutlivalued (as well as being composite):
                                                      { WorksOn (Project, Hours) }



A DERIVED ATTRIBUTE is an attribute whose value can be calculated from other
                    attribute values.
                    eg, Age can be calculated from BirthDate and CurrentDate.



KEY ATTRIBUTE: Each value can occur at most once. (has the uniqueness property)
               Used to identify entity instances. We will * key attribute(s).



ATTRIBUTE DOMAIN: Set of values that may be assigned (also called Value Set).



Thus the Preliminary Design of Entity Types for the COMPANY database is.

*Name--------------- .                      *Name------------- .
*Number--------------|   ____________       *Number------------|   _________
{Locations}----------|--|_DEPARTMENT_|      Location-----------|--|_PROJECT_|
Manager--------------|                      ControlDepartment--' 
ManagerStartDate-----'
                        .-- Name(FName,Minit,LName)
 ____________ 		|---*SSN                            .--Employee
|__EMPLOYEE__|----------|---Sex             ____________    |--*DependentName
                        |---Address        |__DEPENDENT_|---|--Sex
                        |---Salary                          |--BirthDate
                        |---BirthDate                       `--Relationship
                        |---Department
                        |---Supervisor
                        `--{WorksOn(Project,Hours)}



RELATIONSHIPS among entities E1,,En express relationships among them:

Relationships have RELATIONSHIP TYPEs (consisting of the names of the entities and
the name of the relationship).
                                                       
A Relationship type diagram for a relationship between EMPLOYEE and DEPARTMENT
                            called "WorksFor" is as follows: 

           ________    . - ` - .   __________
          |EMPLOYEE|-< WorksFor >-|DEPARTMENT|
                      `- . _ .- '


The RELATIONSHIP INSTANCEs for the above relationship might be, for instance:

         ( John Q. Smith, Athletics )
         ( Fred T. Brown, Comp. Sci.)
         ( Betty R. Hahn, Business  )
	   .  
	   .  
	   .  



RELATIONSHIP DEGREE: Number of participating entities (usually 2)


ROLE of an Entity Type in a Relationship:
     If an entity participates more than once in the same relationship,
     role names are needed to distinguish the multiple participations.

(eg, Supervisor, Supervisee in the Supervision relationship)

     - Called Recursive Relationships.
     - Unnecessary if entity types are distinct.



CONSTRAINTS on a relationship.


The CARDINALITY CONSTRAINT can be 1:1, N:1 or N:M,

        1:1  MANAGES    (EMPLOYEE, DEPARTMENT)

        N:1  WORKS_FOR  (EMPLOYEE, DEPARTMENT)

        N:M  WORKS_ON   (EMPLOYEE, PROJECT)



The PARTICIPATION CONSTRAINT (for an entity in a relationship)
                             can be Total, Partial or Min-Max

       Total:   Every EMPLOYEE  WORKS_FOR  a DEPARTMENT

       Partial: Not every EMPLOYEE  MANAGES  a DEPARTMENT



RELATIONSHIP can have ATTRIBUTES (properties) as well:
                  eg, Hours for WORKS_ON Relationship,
                  Manager_Start_Date  in  MANAGES  relationship.



In the COMPANY example, there are 6 relationships;

cardinality
1:1         MANAGES   (EMPLOYEE, DEPARTMENT)
                       partial   total          <-participation

1:N         WORKS_FOR (DEPARTMENT, EMPLOYEE)
                       total       total

M:N         WORKS_ON  (EMPLOYEE, PROJECT)
                       total     total

1:N         CONTROLS  (DEPARTMENT, PROJECT)
                       partial     total

                         supervisor supervisee  <-role
1:N         SUPERVISION (EMPLOYEE,  EMPLOYEE)
                         partial    partial

1:N         DEPENDENTS_OF(EMPLOYEE, DEPENDENT)
                          partial   total


COMPANY Entity-Relationship Diagram showing the Schema:

(double connecting lines means "total" participation while single line means partial.)


                         1 .- '` - .
                    ..====( MANAGES )
                    ||     ` - ._.-'
                    ||  .- - - -.  |1
                    || (WORKS_FOR) |
                    ||  `- ._. -|| |
*Name-----------.   ||   1||   N|| |
*Number---------|  _||____||_   || |
{Locations}-----|-|DEPARTMENT| // /
number_employees' `----------'// /
                  /          // /
                 /          // /
      .- ` -    /1         // /
   ..(CONTROLS)'          // /
  N|| `-._ .-'           // /
   ||                   // /
   ||    . - ` - .     // /
   ||   (SUPERVISE)   // /
   ||    `-._ .- '   // /
   ||     1|  |N    // /
   ||      |__|____//_/ .Name(FN,Mi,LN)
   ||      | EMPLOYEE |-|-*SSN
   || 	   `----------' |-Sex
   || 	        //  |   |-Address
   ||  Hours.  //   |   |-Salary
   \\   .---`-// M  |   `-BirthDate
    \\ (WORKS_ON)   |
     \\ `-._ .-'   1|
      \\    N ||  .- - - ^ - - -.
       \\     || (  Dependent_0f )
        \\    ||  `- - - __  - -'
         \\   ||         ||N
*Nane---. \\__||___.     ||
*Number-|--|PROJECT|     ||
Location'  `-------'     ||
                         ||
*DependentName----. .====||=======.
Sex---------------|-|| DEPENDENT ||
BirthDate---------| `============='
Relationship------'



***WE WILL STOP HERE IN THESE NOTES ***********************************

The Entity Relationship Model (ER) described above is for "traditional"
database applications such as data processing in business and industry.

Since the 1980's newer applications of database technology have become
commonplace. ; these include databases for engineering design and manufacturing
(CAD/CAM), telecommunications, images and graphics, multimedia (storing pictures,
voice messages and video clips...), data mining, data warehousing, geographic
information systems (GIS), and databases for indexing the WWW, among others.
These types have more complex requirements.  To represent these requirements as
accurately and clearly as possible, designers must use additional concepts.
This has lead to Enhanced Entity Relationship (EER) Models.  Chapter 4 of
the text treats one particular EER.


ENHANCED ENTITY RELATIONSHIP (EER) MODEL

EER includes all the concept of the ER model above and concepts of subclass,
superclass, specialization, generalization and category.

EER also includes the mechanisms of attribute and relationship inheritance.

EER Concepts:

SUBCLASS of an entity type.

 "Entity type" is used to represent both a type of entity and the entity
        set or collection of entities of that type that exists in the DB.

        eg, EMPLOYEE describes type and current set of EMPLOYEE entities.
            EMPLOYEEs could be further grouped into SEC, ENG, MGR, TECH, etc.,
            where each is a subclass of EMP and EMP is a superclass of each.

       We call the relationship, the CLASS/SUBCLASS (or ISA) relationship.

 Subclasses need not be exhaustive of their superclass (not every superclass
                                  member needs to be in a subclass).

 Subclasses need not be exclusive within their superclass (superclass members
                                  may be in several subclasses).

 Subclass members inherit all attributes of the superclass (type inheritance).
 Subclass members inherit all relationships of superclass (relationship inheritance).

 Subclass with its own attributes and relationships along with the attributes
         and relationships of its superclass can be an entity type of its own.

SPECIALIZATION is process of defining a set of subclasses of an entity type (called
the superclass of the specialization).  Subclasses are defined on the basis of some
distinguishing characteristic (eg, EMP, above, are specialized in terms of jobtype).

There are two reason for including class/subclass relationships and specializations;

1.  Certain attributes may apply to some but not all of the class entities.  A subclass
is defined in order to group the entities to which these attributes apply (eg, SEC's
have TypingSpeed attr but other EMP's don't).

2.  Some relationships apply only to subclass entities (eg, ENG's belong to the IEEE
engineering society but SEC's and TECH's don't).

If a condition placed on an attribute of the superclass entities determines the subclass
(eg, JobType of EMP for SECR) then we say the subclass is "PREDICATE DEFINED" and the
 predicate is called the "DEFINING PREDICATE" of that subclass (otherwise it is a
"User Defined" subclass).

If all subclasses in a specialization are predicate defined on the same attribute,
the specialization is called an "ATTRIBUTE-DEFINED SPECIALIZATION" and the attribute
is called the "DEFINING ATTRIBUTE" of the specialization (or discriminator in UML).

Two other constraints apply to specialization, disjointness and completeness.

The DISJOINTNESS constraint specifies that the subclasses of the specialization must
be disjoint (and then we put a "d" within the circle in the diagram).

The COMPLETENESS constraint may be total or partial.  A TOTAL specification constraint
specifies every entity in the superclass (and then we use double lines in the diagram).
A PARTIAL specification is any non total (and we use single lines in the diagram).

Hence there are four possible constraints on specialization:
Disjoint, total
Disjoint, partial
Overlapping, total
Overlapping, Partial

A superclass identified through generalization is usually total.

There may be several levels of specialization (forming a hierarchy (in which every
subclass is in only one class/subclass relationship) called a HIERARCHY SPECIALIZATION or
forming a lattice, then called a LATTICE SPECIALIZATION).

A LEAF NODE is a node in the hierarchy or lattice which has no subclasses.

A subclass with more than one superclass is called a SHARED SUBCLASS (Note some other
models do not allow).

There are, similarly, the notions of GENERALIZATION HIERARCHIES AND LATTICES.

We note that successive specialization amounts to a top-down conceptual refinement process
and successive generalization amounts to a bottom-up conceptual synthesis process.

            __________  .Name(FN,Mi,LN)
           | EMPLOYEE |-|-*SSN
      	   `----------' |-Sex
      	        /   |   |-Address
               /    |   |-Salary
              /     U   `-BirthDate
             /       \          .
            /       __\__    . ' ` .    _______
          .-.      | MGR |----|PROJECT|
         ( d )     `-----\   `.   .'   `-------'
       /  `-' \           \    `.'
      /    |    \          \
     U     U      U         \
.___/_.  ._|_.    ._\__.     \
|SECR |  |ENG|    |TECH|      \
.-----'  `--.\    `---.'       \
:_Typing  | : \       :_TGrade  \
  Speed   | :  \____________     \
          | :               \     \
          | :_EngType        U     U
          |                   \_____\___
          |                   | ENG-MGR |
     . --'`-- .     ________  `---------'
    (BELONGS TO)---|IEEE SOC|
     `- ._ . -'    |________|



GENERALIZATION is the reverse process (to specialization) in which we suppress the
differences among several entity types, identify their common features and generalize
them into a single superclass, of which the original entity types are subclasses.

An example would be taking entity types, CAR and TRUCK, and forming the
generalization, VEHICLE.

       __________  .-*VehicleId   
      | VEHICLE  |-|-*LicensePlateNo
      `----------' `-Price
      	   ||       
           ||     
          .-.    
         ( d )  
       /  `-'\ 
      /        \
     U          U
.___/_.        ._\___.
| CAR |        |TRUCK|
.-----'        `----.'
|-NoOfPassengers    |-NoOfAxles
`-MaxSpeed          `-Tonnage
           

|| Perrizo's Home || NDSU Home


perrizo@plains.nodak.edu