DATABASE SYSTEMS, QUERIES, OPERATORS, FILES and DISKS
review of relational concepts
A database system can be viewed as having the following
software layers or components:
QUERIES coming from users (or Transactions or user-workload requests)
:
:
: SQL (or some other User Interface Language)
:
v
QUERY OPTIMIZATION LAYER
:
:
: Relational Operators (Select, Project, Join, ...)
:
v
DATABASE OPERATOR LAYER
:
:
: File processing operators (open,close file, read/write record, ...)
:
v
FILE MANAGER LAYER (provide the file concept)
:
:
: Buffer managment operators (read/flush page, ...)
:
v
BUFFER MANAGER LAYER
:
:
: Disk transfer operators (malloc, read/write block...
:
v
DISK SPACE MANAGER LAYER
:
:
v
DB on DISK
DISK SPACE MANAGER deals with space on disk
- offers an interface to higher layers (mainly the BUFFER MGR) consisting of:
allocate/deallocate space; read/write block
- can be implement on a raw disk system directly,
then it would likely access data as follows:
read block b of track t of cylinder c on disk d
- or can use OS file system (OS file = sequence of bytes)
then it would likely access data as follows:
read bytes b of file f
and then the Operating System file manager would translate that into
read block b of track t of cylinder c on disk d
(most systems do not use the OS files system
- for portability reasons,
- to avoid OS file size peculiarities (limitations)
BUFFER MANAGER partitions the main memory allocated to the DBMS into buffer page frames,
brings pages to and from disk as requested by higher layers
(mainly the FILE Manager).
FILE MANAGER supports the file concept to higher layers
(DBMS file = collection of records and pages of records)
supports access paths to the data in those files (e.g., Indexes).
Not all Higher level DBMS code recognizes/ uses page concept.
Almost all DBMS use the record concept, though.
DATABASE OPERATOR LAYER implements physical data model operators
(e.g., relational operators; select, project, join...)
QUERY OPTIMIZER produces efficient execution plans for answering user queries
(e.g., execution plans as trees of relational operators:
select, project, join, union, intersect
translated from, e.g., SQL queries).
- SQL is not adequate to answer all user-database questions.
- e.g., Knowledge workers working on Data Warehouses ask
"what if" questions (On-Line Analytic Processing or OLAP)
not retrieval questions (SQL)
Disk I/O minimization is still main performance objective in designing a DBMS.
Thus, clustering records on disk correctly is important.
CLUSTERING = storing logically related records (those accessed together)
physically close, to reduce disk access time.
If the workload typically requests sequential access by SUPPLIER#,
then cluster:
SUPPLIER-1-record close to SUPPLIER-2-record, ...
intra-file clustering
If the workload typically requests access to an individual specific SUPPLIER-#
followed by access to its shipment data, cluster:
SUPPLIER-1-record , SUPLLIER-1's_shipment_data,
SUPPLIER-2-record , SUPPLIER-2's_shipment_data,
...
inter-file clustering
FILE MANAGER RECORD IDENTIFICATION
RID = Record Identifier = permanently assigned record identifier (page#, record#)
RRN = Relative Record Number = permanently assigned order number
- usually an arrival order number
ENTITY-RELATIONSHIP MODEL (ER)
(covered in earlier powerpoint slides - this section is for review and examples)
A high-level conceptual data model.
Used to "get an idea" of how data might be organized in a
company or enterprise so that a "SCHEMA" (set of design specs)
can be constructed to guide the development and definition of
relations (if the relational model is to be used).
Thus, in consultation with the creators and users of the data,
ER diagrams provide a graphical picture of the data and data
relationships and constraints as used by the organization.
In ER modeling,
==============
ENTITY is a "Thing" in real world with an independent existence.
physical existence: person, car, house, employee
conceptual existence: company, job, course
RELATIONSHIP is a relationship between two or more entities.
"enrollment" is a relationship between student and course
"works-for" is a relationship between employee and department
An ER Example:
=============
COMPANY is described to us as follows:
1. The company is organized into depts
- each with a name, number, manager.
- Each manager has a startdate.
- Each department can have several locations.
2. Departments control projects
- each with a name, number, location.
3. Each employee has a name, SSN, sex, address, salary,
birthdate, dept, supervisor.
- An employee may work on several projects
(not necessarily all controlled by his dept)
for which we keep hoursworked by project.
4. Each employee dependent has a name, sex, birthdate
and relationship.
In ER diagrams, entities are represented graphically in boxes:
________ _________ ____ _______
|EMPLOYEE| |DEPENDENT| |DEPT| |PROJECT|
|________| |_________| |____| |_______|
An attribute (or property) of an entity describes that entity.
An ENTITY has a TYPE, including name and list of its attributes.
ENTITY TYPE SCHEMA describes the common structure shared
by all entities of that type.
Project (Name, Num,Location, Dept)
ENTITY INSTANCE is an individual occurrence of an entity
of a particular type at a particular time.
(Dome, 46, 19 Ave N & Univ, Athletics)
(IACC, 52, Bolley & Centennial, C.S.)
(Bean Res, 31, 12 Ave N & Federal, P.S.)
. . .
Entity Type does not change often - very static.
Entity instances get added, changed often - very dynamic
ATTRIBUTES are written next to Entity they describe,
usually something like the following:
Name-------------. Name-------------.
Number-----------| __________ Number-----------| ________
Locations--------|--|DEPARTMENT| Location---------|-|_PROJECT|
Manager----------| ControlDepartment'
ManagerStartDate-'
.--Name
____________ |--SSN .-Employee
|__EMPLOYEE__|----|--Sex __________ |-DependentName
|--Address |_DEPENDENT|--|-Sex
|--Salary |-BirthDate
|--BirthDate `-Relationship
|--Department
|--Supervisor
`--WorksOn
CATEGORIES OF ATTRIBUTES:
========================
COMPOSITE ATTRIBUTE are attributes that are subdivided into
smaller parts with independent meaning.
e.g., Name attribute of Employee may be subdivided
into FName, Minit, LName.
Indicated: Name (FName, Minit, LName)
Also, WorksOn may be a composite attr of Employee
composed of Project and Hours:
WorksOn (Project, Hours)
SINGLE-VALUED ATTRIBUTE: one value per entry.
MULTIVALUED ATTRIBUTE (repeating group) have multiple values per entry:
eg, Locations (as an attribute of Department
since a Department can have multiple locations)
- Multivalued Attribute, use {Locations}
- WorksOn may be a mutlivalued attr of Employee
as well composite:
{WorksOn (Project,Hours)}
DERIVED ATTRIBUTE is an attribute whose value can be
calculated from other attribute values.
eg, Age calculated from BirthDate and CurrentDate.
KEY ATTRIBUTE: Each value can occur at most once.
(has the uniqueness property)
Used to identify entity instances.
We will * key attribute(s).
ATTRIBUTE DOMAIN: Set of values that may be assigned
(also called Value Set).
Thus the Preliminary Design of Entity Types for COMPANY db is.
*Name------------. *Name------------.
*Number----------| __________ *Number----------| ________
{Locations}------|--|DEPARTMENT| Location---------|-|_PROJECT|
Manager----------| ControlDepartment'
ManagerStartDate-'
.-Name(FName,Minit,LName)
____________ |-*SSN .-Employee
|__EMPLOYEE__|---|-Sex _________ |-*DependentName
|-Address |DEPENDENT|-|-Sex
|-Salary |-BirthDate
|-BirthDate `-Relationship
|-Department
|-Supervisor
`-{WorksOn(Project,Hours)}
RELATIONSHIPS among entities express relationships among them:
Relationships have RELATIONSHIP TYPEs
(consisting of the names of the entities and
the name of the relationship).
A Relationship type diagram for a relationship between
EMPLOYEE and DEPARTMENT called "WorksFor" is diagrammed:
(in a roundish box)
________ .--------. __________
|EMPLOYEE|-( WorksFor )-|DEPARTMENT|
`--------'
RELATIONSHIP INSTANCEs for the above relationship might be, eg:
( John Q. Smith, Athletics )
( Fred T. Brown, Comp. Sci.)
( Betty R. Hahn, Business )
.
.
.
RELATIONSHIP DEGREE: Number of participating entities (usually 2)
If an entity participates more than once in
the same relationship, then ROLE NAMES are
needed to distinguish multiple participations.
eg, Supervisor, Supervisee in Supervision relationship
- Called Reflexive Relationships.
- Unnecessary if entity types are distinct.
One decision that has to be made is to decide whether attribute
or relationship is the appropriate way to model, for instance,
"WorksOn". Above we modeled it as an attribute of EMPLOYEE
{WorksOn(Project,Hours)}
The fact that it is multivalued and composite (involving another
entity, project) may suggest that it would be better to model
it as a relationship (i.e., it makes a very complex attribute!)
WORKS_FOR(EMPLOYEE, DEPARTMENT)
CONSTRAINTS ON A RELATIONSHIP
=============================
CARDINALITY CONSTRAINT can be
1-to-1
many-to-1
1-to-many or
many-to-many
1 to 1: MANAGES(EMPLOYEE, DEPARTMENT)
Each manager MANAGES 1 dept
Each dept is MANAGED-BY 1 manager
Many to 1: WORKS_FOR(EMPLOYEE, DEPARTMENT)
Each employee WORKS_FOR 1 dept
Each dept is WORKED_FOR by many emps
Many to Many: WORKS_ON(EMPLOYEE, PROJECT)
Each employee WORKS_ON many projects
Each project is WORKED_ON by many employees
PARTICIPATION CONSTRAINT (for an entity in a relationship)
can be Total, Partial or Min-Max
Total: Every EMPLOYEE WORKS_FOR some DEPARTMENT
Partial: Not every EMPLOYEE MANAGES some DEPT
RELATIONSHIP can have ATTRIBUTES (properties) as well:
eg, Hours for WORKS_ON Relationship,
Manager_Start_Date in MANAGES relationship.
6 RELATIONSHIPS;
===============
(role names, if any, above)
CARDINALITY RELATIONSHIP ATTRIBUTES
----------- ------------ ------(participation below)
1:1 MANAGES (EMPLOYEE, DEPARTMENT)
partial total
1:many WORKS_FOR (DEPARTMENT, EMPLOYEE)
total total
many:many WORKS_ON (EMPLOYEE, PROJECT)
total total
1:many CONTROLS (DEPARTMENT, PROJECT)
partial total
Reflexive relationship with role names ----.---------.
supervisor supervisee
1:many SUPERVISION (EMPLOYEE, EMPLOYEE)
partial partial
1:many DEPENDENTS_OF (EMPLOYEE, DEPENDENT)
partial total
COMPANY Entity-Relationship Diagram
===================================
(showing the Schema)
(double connecting lines means "total" while
single line means partial participation.)
.-------------------.
( MANAGES )
`-------------------'
1|| |1
|| .--------. |
|| ( WORKS_FOR) |
|| `--------' |
*Name-----------. || 1||many| |
*Number---------| _||____|| || /
{Locations}-----|-DEPARTMENT| // /
number_employees' `---------' // /
/ // /
.----------. /1 // /
( CONTROLS ) // /
`----------' // /
many // /
|| .-------. // /
|| (SUPERVISE) // /
|| `-------' // /
|| 1| |many // /
|| 'er| |'ee // /
|| |__|_____//___/ .Name(FN,Mi,LN)
|| | EMPLOYEE |---|-*SSN
|| `-------------' |-Sex
|| // | |-Address
|| Hours-. // | |-Salary
|| | /many | `-BirthDate
\\ .------. |
\\ (WORKS_ON) |
\\ `------' 1|
\\ many| .-------------.
\\ || ( Dependent_0f )
\\ || `-------------'
\\ || |many
*Nane---. \\__||___. ||
*Number-|--|PROJECT| ||
Location' `-------' ||
||
*DependentName---. .====||=======.
Sex--------------|--|| DEPENDENT ||
BirthDate--------| `============='
Relationship-----'
Data analysis = finding RELATIONSHIPS in data
We have already seen the relationship concept many times
We have just discussed relationships wrt
ER diagrams
relations R(A1,...,An) is a degree = n relationship
ENROLL FILE is a degree-3 relationship for S#, C#, GRADE
.___________.
|S#|C#|GRADE|
|==|==|=====|
|32|8 | 89 |
|32|7 | 91 |
|25|7 | 68 |
|25|6 | 76 |
|32|6 | 62 |
|38|6 | 98 |
|17|5 | 96 |
-----------
Data Warehouse relationships:
- wrt a DATA CUBE (or Subject Cube or Fact Cube):
Subject(D1,...,Dn) is a degree = n relationship among n dimensions
____________________________
y / / / / /|
t Chicago/ / / / / |
c / 854 / 882 / 89 / 623 / |
/______/______/______/______/ |
N / / / / /| |
O New York/ / / / / | /|
I / 1087 / 968 / 38 / 872 / | / |
T /______/______/______/______/ |/ 8|
A / / / / /| / 9 |
C Toronto/ / / / / | /|6 /|
O / 818 / 746 / 43 / 591 / | / | / |
L /______/______/______/______/ |/ 5|/ 9|
/ / / / /| | 2 | 8 |
Vancouver/ / / / / | /|9 /|7 /|
/ / / / / | / | / 2 / |
/______/______/______/______/ |/ 2|/ 0|/ 0|
| | | | | | 8 | 0 | 7 /
T Q1 | 605 | 825 | 14 | 400 | /|6 /|1 /|8 /
I | | | | | / | / | / | /
M |______|______|______|______|/ |/ 8|/ 4|/
E | | | | | | 2 | 8 /
Q2 | 680 | 952 | 31 | 512 | /|7 /|9 /
q | | | | | / | / | /
t |______|______|______|______|/ |/ 4|/
r | | | | | | 8 /
s Q3 | 812 | 1023 | 30 | 501 | /|7 /
| | | | | / | /
|______|______|______|______|/ |/
| | | | | /
Q4 | 927 | 1038 | 38 | 580 | /
| | | | | /
|______|______|______|______|/
I T E M shoes socks caps shirts
FILES
The three basic file organizations supported by the
File Manager of most DBMSs are:
HEAP FILES (files of un-ordered records)
SORTED or CLUSTERED FILES ( records sorted or clustered on some field(s) )
HASHED FILES (files in which records are positioned based
on a hash function on some field(s) )
file structures slides
The following is an ascii text treatment of heap and clustered files
(for additional coverage examples, practice and review).
HEAP FILE
Records are not ordered.
Supported operations:
- "create" "destroy" file
- "insert" "delete" record
- "get next record" "get all records" (scan file)
Uniform record slots (assuming fixed length records)
are arranged consecutively on pages.
Record insert?
Method-1: System inserts new records at the end of the file (need indicator)
then moves last record into freed slot following a deletion,
then updates indicator.
- doesn't allow support of the RID or RRN concept
Or a deleted record slot can remain empty (until file reorganized)
- allows support of RID/RRN concept
________________ <- page
| record |0
|________________|
| record |1
|________________|
| record |2
|________________|
| |3
|________________|
| |4
|________________|
| |5
|________________|
| 3 | <- next open slot indicator
|____|
Method-2: Insert in any open slot.
Must maintain a data structure indicating open slots
(e.g., bit filter (or bit map) identifies open slots)
- as a list or
- as a bit_filter
________________ <- page
| record |
|________________|
| | 101001 <-- availability_bit_filter
|________________|
| record |
|________________|
| |
|________________|
| |
|________________|
| record |
|________________|
Supports RID concept.
If we want all records with a given value in particular field or
attribute, we need an "index" (separate file pairing that
field value with the RIDs of the records containing it,
then can use a "get record with RID" to retrieve)
Of course index files must provide a fast way to find the
particular value entries of interest (the heap file
organization for index files would makes little sense).
Index files are usually sorted files. Indexes are examples of ACCESS PATHS.
SORTED or CLUSTERED FILE
Advantages over heap files include:
- reading records in that particular order is very efficient
- finding next record in order is efficient
For efficient "value-based" ordering (clustering),
a level of indirection is useful (unpacked, record-pointer page-format)
.------------------.
.- - - - - -> |P#3| |
. |___|______________|
. 0|record w RID (3,3)|
____RID___. |__________________|
pg#|offset| 1|record w RID (3,0)|
_3_|__4___| |__________________|
2|record w RID (3,4)|
|__________________|
3|record w RID (3,2)|
|__________________|
4|record w RID (3,1)|
|__________________|
5| |
|__________________|
| | | | |2|0|3|4|1| < - unpacked, record-pointer page-format
`------------------'
... 7 6 5 4 3 2 1 0
What happens when a page fills up???
One way is to use an overflow page.
Overflow page
.---------------------. .----------------------.
.- - - - - -> |P#3| |*-> record with RID (3,6)|
. |___|_________________| |______________________|
. 0|record with RID (3,3)| | |
____RID___. |_____________________| |______________________|
pg#|offset| 1|record with RID (3,0)| | |
_3_|__6___| |_____________________| . . .
2|record with RID (3,4)|
|_____________________|
3|record with RID (3,2)|
|_____________________|
4|record with RID (3,1)|
|_____________________|
5|record with RID (3,5)|
|_____________________|
| | |*|5|2|0|3|4|1|
`---------------------'
. . . 7 6 5 4 3 2 1 0
When a page fills up and a record must be inserted and clustered
between (3,1) and (3,5), one solution is to simply place it on an
overflow page in arrival order. Then the overflow page is
scanned like an unordered file page, when necessary.
Periodically overflow pgs are sorted, merged with other pgs
(RIDs reassigned). This is called "File Reorganization".
HASH FILES
==========
Hashing structures in general (both hash files and in memory hash tables.
Each record is placed in an Address bucket
(which is a page, if the hash structure is a hash FILE )
computed from a "hash" function applied to a key value.
.-----------.
KEY --> | HASH FCTN | --> BUCKET#
`-----------'
provides fast random access
----------------
Bucket (page) # 0
<-'
----------------
Bucket (page) # 1
<-'
----------------
Bucket (page) # 2
<-'
----------------
Bucket (page) # 3
<-'
----------------
Bucket (page) # 4
<-'
----------------
Bucket (page) # 5
<-'
----------------
Bucket (page) # 6
<-'
----------------
Example:
Suppose we have the following 14 records to store in 7 buckets,
We will assume each bucket will hold two records
called the blocking factor (bfr):
NM AGE CTY CODE
su 67 gfs M23
kz 55 gfs Y21
ng 34 ffs N12
mo 17 mhd N12
jo 37 gfs M24
bo 46 bis Y32
ab 34 gfs Y23
ez 38 min Y32
aq 54 fgo M24
qk 34 ffs N21
cp 14 dls N12
oy 82 fgo M23
he 17 zap Y32
ai 24 fgo M78
We will take k to be a hash function on the attribute, NM, as follows.
Map the letters to numbers sequentially as follows:
a->0 b->1 c->2 d->3 e->4 f->5 g->6 h->7 i->8 j->9
k->10 l->11 m->12 n->13 o->14 p->15 q->16 r->17 s->18 t->19
u->20 v->21 w->22 x->23 y->24 z->25
then add the two numbers together, then apply Mod7
So k(NM) = Mod7(sum)
_______________
su sums to 38, Mod7(38)=3, therefore the record, | su 67 gfs M23 |
---------------
"hashs" to bucket 3
--------------v0
--------------v1
--------------v2
--------------v3
su 67 gfs M23
--------------v4
--------------v5
--------------v6
----------------
kz sums to 35, Mod7(35) = 0
--------------v0
kz 55 gfs Y21
--------------v1
--------------v2
--------------v3
su 67 gfs M23
--------------v4
--------------v5
--------------v6
----------------
ng sums to 19, Mod7(19) = 5
--------------v0
kz 55 gfs Y21
--------------v1
--------------v2
--------------v3
su 67 gfs M23
--------------v4
--------------v5
ng 34 ffs N12
--------------v6
----------------
mo sums to 26, Mod7(26) = 5
--------------v0
kz 55 gfs Y21
--------------v1
--------------v2
--------------v3
su 67 gfs M23
--------------v4
--------------v5
ng 34 ffs N12
mo 17 mhd N12
--------------v6
----------------
Note: Bucket 5 is now full.
If another record hashes to 5 it will cause a "collision"
jo sums to 23, Mod7(23) = 2
--------------v0
kz 55 gfs Y21
--------------v1
--------------v2
jo 37 gfs M24
--------------v3
su 67 gfs M23
--------------v4
--------------v5
ng 34 ffs N12
mo 17 mhd N12
--------------v6
----------------
bo sums to 15, Mod7(15) = 1
--------------v0
kz 55 gfs Y21
--------------v1
bo 46 bis Y32
--------------v2
jo 37 gfs M24
--------------v3
su 67 gfs M23
--------------v4
--------------v5
ng 34 ffs N12
mo 17 mhd N12
--------------v6
----------------
ab sums to 1, Mod7(1) = 1
--------------v0
kz 55 gfs Y21
--------------v1
bo 46 bis Y32
ab 34 gfs Y23
--------------v2
jo 37 gfs M24
--------------v3
su 67 gfs M23
--------------v4
--------------v5
ng 34 ffs N12
mo 17 mhd N12
--------------v6
----------------
Note: Bucket 1 is now full.
Another hash to 5 will cause a "collision"
ez sums to 29, Mod7(29) = 1
This is called a collision or overflow condition.
One method of handling overflows is OPEN ADDRESSING
in which we search forward for the next available slot, which is in bucket 3
--------------v0
kz 55 gfs Y21
--------------v1
bo 46 bis Y32
ab 34 gfs Y23
--------------v2
jo 37 gfs M24
ez 38 min Y32
--------------v3
su 67 gfs M23
--------------v4
--------------v5
ng 34 ffs N12
mo 17 mhd N12
--------------v6
----------------
aq sums to 16, Mod7(16) = 2 (overflow because of previous open addressing)
Search ahead to find the next open slot (in bucket 3 also)
--------------v0
kz 55 gfs Y21
--------------v1
bo 46 bis Y32
ab 34 gfs Y23
--------------v2
jo 37 gfs M24
ez 38 min Y32
--------------v3
su 67 gfs M23
aq 54 fgo M24
--------------v4
--------------v5
ng 34 ffs N12
mo 17 mhd N12
--------------v6
----------------
qk sums to 26, Mod7(26) = 5 Overflow! Search ahead for open slot (to 6)
--------------v0
kz 55 gfs Y21
--------------v1
bo 46 bis Y32
ab 34 gfs Y23
--------------v2
jo 37 gfs M24
ez 38 min Y32
--------------v3
su 67 gfs M23
aq 54 fgo M24
--------------v4
--------------v5
ng 34 ffs N12
mo 17 mhd N12
--------------v6
qk 34 ffs N21
----------------
cp sums to 17, Mod7(17) = 3 Overflow! Search ahead for open slot (to 4).
--------------v0
kz 55 gfs Y21
--------------v1
bo 46 bis Y32
ab 34 gfs Y23
--------------v2
jo 37 gfs M24
ez 38 min Y32
--------------v3
su 67 gfs M23
aq 54 fgo M24
--------------v4
cp 14 dls N12
--------------v5
ng 34 ffs N12
mo 17 mhd N12
--------------v6
qk 34 ffs N21
----------------
oy sums to 38, Mod7(38) = 3 Overflow! Search ahead for open slot (to 4)
--------------v0
kz 55 gfs Y21
--------------v1
bo 46 bis Y32
ab 34 gfs Y23
--------------v2
jo 37 gfs M24
ez 38 min Y32
--------------v3
su 67 gfs M23
aq 54 fgo M24
--------------v4
cp 14 dls N12
oy 82 fgo M23
--------------v5
ng 34 ffs N12
mo 17 mhd N12
--------------v6
qk 34 ffs N21
----------------
he sums to 11, Mod7(11) = 4 Overflows even tho no previous key hashed to 4!
Search for next open slot (at 6)
--------------v0
kz 55 gfs Y21
--------------v1
bo 46 bis Y32
ab 34 gfs Y23
--------------v2
jo 37 gfs M24
ez 38 min Y32
--------------v3
su 67 gfs M23
aq 54 fgo M24
--------------v4
cp 14 dls N12
oy 82 fgo M23
--------------v5
ng 34 ffs N12
mo 17 mhd N12
--------------v6
qk 34 ffs N21
he 17 zap Y32
----------------
ai sums to 8, Mod7(8) = 1 Overflows. Search for next open slot (at 0)
( Search wraps around!)
--------------v0
kz 55 gfs Y21
ai 24 fgo M78
--------------v1
bo 46 bis Y32
ab 34 gfs Y23
--------------v2
jo 37 gfs M24
ez 38 min Y32
--------------v3
su 67 gfs M23
aq 54 fgo M24
--------------v4
cp 14 dls N12
oy 82 fgo M23
--------------v5
ng 34 ffs N12
mo 17 mhd N12
--------------v6
qk 34 ffs N21
he 17 zap Y32
----------------
Notice how often overflow occured toward the end. (The last 7 tuples!)
Retrieval? To find name, MN, apply k(MN),
if it's not in the bucket it is supposed to be in,
search forward until it is found,
if not found by the end of the hash table, , start at the top,
search forward until found or original bucket is reached,
if not found, it is not present.
Retrieve ai's record!
K(ai) = 1
Retrieve bucket 1, not there!
Retrieve bucket 2, not there!
Retrieve bucket 3, not there!
Retrieve bucket 4, not there!
Retrieve bucket 5, not there!
Retrieve bucket 6, not there!
Retrieve bucket 0, there it is!
What are the lessons learned about hashing from this example?
1. Ratio of hash-space-size to number-of-records should be high?
(How high? How dense should the table be allowed to become?)
Studies suggest no more that 70-80% full is best.
Other important tuning questions:
How large should each bucket be? (bucket size or blocking factor, bfr?)
What hash functions are best?
mod,
digit selections,
order preserving (left-3 digits) combinations?
foldings
INTERNAL HASH TABLES (in main memory):
To hash a relation, R(A1,...,An) into an internal hash table
(in main memory) on a chosen attribute, Ai,
start with an in-memory Array of M Record Slots ( numbered 0 to M-1 ).
Choose a hash function which transforms each Ai-value into a hash-value
in {0, ..., M-1} (eg, modM if Ai is positive numeric)
In the Internal hash table, each "bucket" has bfr record slots
(often bfr = 1).
Upon a collision (full bucket) the system can use
- OPEN ADDRESSING; use available next bucket
===============
- CHAINING: (overflow buckets) link to overflow areas
========
- REHASH
======
- other?
Internal Table[M=11] h(S#)=S#mod11
Suppose the table so far is:
bucket
0
1
|57|BROWN |NY |NY| 2
|25|CLAY |OUTBK|NJ| 3
4
|38|GOOD |GATER|FL| 5
|17|BAID |NY |NY| 6
7
8
9
|32|THAISZ|KNOB |NJ| 10
Next, insrt an new record, |16|BARBIE|NY|NY| into the table:
bucket
0
1
|57|BROWN |NY |NY| 2
|25|CLAY |OUTBK|NJ| 3
4 _______________
|38|GOOD |GATER|FL| 5 <--insert |16|BARBIE|NY|NY|. h(16)=5 Collision!
|17|BAID |NY |NY| 6 If OPEN ADDRESSING is used, insert at 7
7 < - - - - - - - - - - - - - - - - - - - -'
8 :
9 :
|32|THAISZ|KNOB |NJ| 10 :
:
:
Resulting in the table: :
:
bucket :
0 :
1 :
|57|BROWN |NY |NY| 2 :
|25|CLAY |OUTBK|NJ| 3 :
4 :
|38|GOOD |GATER|FL| 5 :
|17|BAID |NY |NY| 6 :
|16|BARBIE|NY |NY| 7 <- '
8
9
|32|THAISZ|KNOB |NJ| 10
OR, if CHAINING is used to handle collision, set aside
an overflow area just for colliding records (e.g., slots 11 onward)
and use an Overflow_pointer field, OFP, to point from 5 to that overlfow slot
OFP bucket
vv v-----
0
1
|57|BROWN |NY |NY| | 2
|25|CLAY |OUTBK|NJ| | 3
| | | | 4
|38|GOOD |GATER|FL|11|__5_
|17|BAID |NY |NY| | 6 :
| | | | | 7 :
| | | | | 8 :
| | | | | 9 :
|32|THAISZ|KNOB |NJ| | 10 :
:
:
====OVERFLOW_AREA=== :
|16|BARBIE|NY |NY| | 11<'
Next, Insert |35|DODD|FGO|ND|. h(35)=2 Collision!
Place new record in slot 12 and "chain" it to 2 (2 points to 12)
OFP bucket
vv v-----
0
1
|57|BROWN |NY |NY|12|__2_
|25|CLAY |OUTBK|NJ| | 3 :
| | | | 4 :
|38|GOOD |GATER|FL|11| 5 :
|17|BAID |NY |NY| | 6 :
| | | | | 7 :
| | | | | 8 :
| | | | | 9 :
|32|THAISZ|KNOB |NJ| | 10 :
:
:
====OVERFLOW_AREA=== :
|16|BARBIE|NY |NY| | 11 :
|35|DODD |FGO |ND| | 12<'
Next, Insert |60|DENT|FGO|ND|. h(60)=5 Collision! Place new record
in next overflow record slot 13 and "chain" it to 5 (5 points to 13)
OFP bucket
vv v-----
0
1
|57|BROWN |NY |NY|12| 2
|25|CLAY |OUTBK|NJ| | 3
| | | | 4
|38|GOOD |GATER|FL|11| 5
|17|BAID |NY |NY| | 6
| | | | | 7
| | | | | 8
| | | | | 9
|32|THAISZ|KNOB |NJ| | 10
====OVERFLOW_AREA===
|16|BARBIE|NY |NY|13|_11__
|35|DODD |FGO |ND| | 12 :
|60|DENT |FGO |ND| | 13<-'
Next, Insert |13|LISA|MHD|MN|. h(13)=2 Collision! Place new record
in next overflow slot (14) and "chain" it to 2 (2 points to 12 points to 14)
OFP bucket
vv v-----
0
1
|57|BROWN |NY |NY|12| 2
|25|CLAY |OUTBK|NJ| | 3
| | | | 4
|38|GOOD |GATER|FL|11| 5
|17|BAID |NY |NY| | 6
| | | | | 7
| | | | | 8
| | | | | 9
|32|THAISZ|KNOB |NJ| | 10
====OVERFLOW_AREA===
|16|BARBIE|NY |NY|13| 11
|35|DODD |FGO |ND|14|_12__
|60|DENT |FGO |ND| | 13 :
|13|LISA |MHD |MN| | 14<-'
Chaining doesn't change previously set pointers.
Another overflow handling method:
Rehash each collision with a different hash function
(e.g., with h'=mod9 )
bucket
0
|16|BARBIE|NY |NY| 1
|57|BROWN |NY |NY| 2
|25|CLAY |OUTBK|NJ| 3
4
|38|GOOD |GATER|FL| 5 <-insert |16|BARBIE|NY|NY| collides! Rehash: h'(16)=7
|17|BAID |NY |NY| 6 :
7 <- - - - - - - - - - - - - - - - - - - - - - - - - - '
8 :
9 :
|32|THAISZ|KNOB |NJ| 10 :
:
:
:
bucket :
0 :
|16|BARBIE|NY |NY| 1 :
|57|BROWN |NY |NY| 2 :
|25|CLAY |OUTBK|NJ| 3 :
4 :
|38|GOOD |GATER|FL| 5 :
|17|BAID |NY |NY| 6 :
|16|BARBIE|NY |NY| 7 < -'
8
9
|32|THAISZ|KNOB |NJ| 10
EXTERNAL HASHING
================
Typically a "bucket" = a disk page.
Hash function maps to a "Relative Page Addr" or RPA (0,1,2..)
A Lookup_Table translates RPA to actual allocated physical-page-addr (PPA).
h(S#)=S#mod11
pg-# lookup tbl
_____________________23 RPA PPA
|57|BROWN |NY |NY| | 0 13
|__|______|_____|__|___| 1 64
2 23
3 45
_____________________45 4 62
|25|CLAY |OUTBK|NJ| | 5 78
|__|______|_____|__|___| 6 24
7 52
8 36
_____________________78 9 2
|38|GOOD |GATER|FL| | 10 98
|16|BARBIE|NY___|NY|___|
overflow PPA: 39 ( doesn't need RPAs)
_____________________24
|17|BAID |NY |NY| |
|__|______|_____|__|___|
_____________________98
|32|THAISZ|KNOB |NJ| |
|__|______|_____|__|___|
OVERFLOW PG 3
_____________________39
| | | | | |
|__|______|_____|__|___|
Chain for overflow, by having "pointer" to linked list of overflow record
positions on overflow buckets (pages) ( bfr=2, i.e., 2 slots/page )
pg-# lookup tbl
_____________________23 RPA PPA
|57|BROWN |NY |NY| | 0 13
|__|______|_____|__|___| 1 64
2 23
3 45
_____________________45 4 62
|25|CLAY |OUTBK|NJ| | 5 78
|__|______|_____|__|___| 6 24
7 52
8 36
_____________________78 9 2
|38|GOOD |GATER|FL| | 10 98
|16|BARBIE|NY___|NY|___|
overflow PPA: 39 (doesn't need RPA)
_____________________24
|17|BAID |NY |NY| |
|__|______|_____|__|___|
_____________________98
|32|THAISZ|KNOB |NJ| |
|__|______|_____|__|___|
OVERFLOW PG 3
_____________________39
|27|JONES |MHD |MN| |
|__|______|_____|__|___|
Insert S#=27 RPA=h(27)=5 --> PPA=78 Collision! Chain it from
RID (78, 1) to RID ( 39, 0)
PPA,offset PPA,offset
pg-#
______________________23
|57|BROWN |NY |NY| |
|__|______|_____|__|____|
______________________45
|25|CLAY |OUTBK|NJ| |
|__|______|_____|__|____|
______________________78
|38|GOOD |GATER|FL| |
|16|BARBIE|NY___|NY|39,0|
______________________24
|17|BAID |NY |NY| |
|__|______|_____|__|____|
______________________98
|32|THAISZ|KNOB |NJ| |
|__|______|_____|__|____|
OVERFLOW PG 3
______________________39
|27|JONES |MHD |MN| |
|__|______|_____|__|____|
Except for the fact that RIDs are used as points rather than just record
slot numbers, it is exactly the same as chaining for internal hash tables.
HASHING TECHNIQUES THAT ALLOW DYNAMIC FILE EXPANSION
(as needed, rather than up-front, one-time)
Extensible Hashing
Linear Hashing
The following are some powerpoint slides on extensible hashing.
hashing slides
The following are ascii notes on extensible hashing,
for your review, elaboration and additional examples.
EXTENDIBLE HASHING
The following is a text based treatment of Extentible hashing which
was covered in powerpoint slides. We will skip this part in lecture,
but it may be useful as reference material and as practice with the
technique.
Extendible hashing has a directory of length 2^d which is a FILE
with hash_function_values (psuedokey values) and
page-pointers to the page of record(s) with that value.
The Pseudokey value is expressed as a binary string.
The key value, k, is hashed to produce
the pseudokey value, k' , ie, h(k)=k'.
We then look up least significant d-bits of k' in the directory
(d is called the "global depth number"),
then follow the ptr to the page with the record(s) containing k.
For each page, keep a "local depth number" = minimum number of
bits needed for any record on that page
- Local depth is never greater than global depth.
When page fills up, split it, reset the directory ptrs and
increase local depths by 1.
When page split requires it, double directory size and
increment global depth by 1.
h=mod11 h(57)=2 = 10
h(22)=0 = 00
DIRECTORY global-depth=1
pg=14, local-depth=1 pg# k'
|57|BROWN |NY |NY| 14 0
|22|CLAY |OUTBK|NJ| 14 1
insert |42|CASEY|DENT|MN| h(42)=9 = 1001 Collision!
A MALLOC is done for a new page, giving, for instance, page 17.
Split bucket 14 into 14 and 17
h(22)=0 = 0000
h(57)=2 = 0010
h(42)=9 = 1001
pg=14, ld=1 pg# k' gd=1
|57|BROWN |NY |NY| 14 0
|22|CLAY |OUTBK|NJ| 17 1
pg=17, ld=1
|42|CASEY |DENT |MN|
insert |38|GOOD|GATER|FL|, h(38)=5=0101
h(22)=0 = 0000
h(57)=2 = 0010
h(42)=9 = 1001
h(38)=5 = 0101
pg=14, ld=1 pg# k' gd=1
|57|BROWN |NY |NY| 14 0
|22|CLAY |OUTBK|NJ| 17 1
pg=17, ld=1
|42|CASEY |DENT |MN|
|38|GOOD |GATER|FL|
insert |14|BARBIE|NY|NY| h(14)=3=0011 Collision!
h(22)=0 = 0000
h(57)=2 = 0010
h(42)=9 = 1001
h(38)=5 = 0101
h(14)=3 = 0011
A MALLOC is done for a new page, giving, for instance, page 32.
Split 17 into 17 and 32. Copy directory and add hi-order digits.
pg=14, ld=1 pg# k' gd=2
|57|BROWN |NY |NY| 14 00
|22|CLAY |OUTBK|NJ| 17 01
14 10
pg=17, ld=2 32 11
|42|CASEY |DENT |MN|
|38|GOOD |GATER|FL|
pg=32, ld=2
|14|BARBIE|NY |NY|
delete |38|GOOD|GATER|FL|
h(22)=0 = 0000
h(57)=2 = 0010
h(42)=9 = 1001
h(38)=5 = 0101
h(14)=3 = 0011
pg=14, ld=1 pg# k' gd=2
|57|BROWN |NY |NY| 14 00
|22|CLAY |OUTBK|NJ| 17 01
14 10
pg=17, ld=2 32 11
|42|CASEY |DENT |MN|
pg=32, ld=2
|14|BARBIE|NY |NY|
delete |42|CASEY |DENT|MN|
h(22)=0 = 0000
h(57)=2 = 0010
h(42)=9 = 1001
h(38)=5 = 0101
h(14)=3 = 0011
pg=14, ld=1 pg# k' gd=1 pg# k' gd=1
|57|BROWN |NY |NY| 14 00 ===========> 14 0
|22|CLAY |OUTBK|NJ| Contract dir 32 1
14 10 ===========>
pg=17 deallocated 32 11
pg=32, ld=1
|14|BARBIE|NY |NY|
LINEAR HASHING
Starts with M buckets (numbered 0, 1, ..., M-1
and initial hash function, h0=modM
(or more generally, h0(key) = h(key)modM for any
hash function h which maps into the integers)
Use Chaining in shared overflow-pages to handle overflows.
At the first overflow, split page-0 into pages 0 and M.
Rehash the page-0 records using h1=mod2M.
Henceforth whenever h0 yields a value <= 0, rehash using h1=mod2M
At the next overflow, split page-1 into pages 1 and M+1.
Rehash the page-1 records using h1=mod2M.
Henceforth when h0 yields a value <= 1, use h1.
etc. Until all of the original M pages have been split (after M collisions),
then rehash all overflow records using h1.
Relabel h1 as h0, start a new "round" by repeating` the process for future
collisions
(i.e., now we have buckets 0,1,...,2M-1 and h0 = MOD(2M)...
To search, let n = number of splits so far in the given round,
if h0(key) is not greater than n,
then use h1, else use h0.
Example: M=11
h0(S#)=S#mod11 Lookup_Tbl
rel actual
0 13
pg-# 1 64
|57|BROWN |NY |NY| 23 2 23
3 45
|25|CLAY |OUTBK|NJ| 45 4 62
5 78
|38|GOOD |GATER|FL| 78 6 24
|16|BARBIE|NY |NY| 7 52
8 36
|17|BAID |NY |NY| 24 9 2
10 98
|32|THAISZ|KNOB |NJ| 98
Insert: |27|JONES|MHD|MN|, h0(27)=5 Collision!
Lookup_Tbl
rel actual
0 13 < -
pg-# 1 64 \
|57|BROWN |NY |NY| 23 2 23
3 45 \
|25|CLAY |OUTBK|NJ| 45 4 62
5 78 \ split 0 into 0,11
|38|GOOD |GATER|FL| 78 6 24 /
|16|BARBIE|NY |NY|(3,1) 7 52
8 36 /
|17|BAID |NY |NY| 24 9 2
10 98 /
|32|THAISZ|KNOB |NJ| 98 11 21 < -
------------------------
overflow pgs 3,29,100
|27|JONES |MHD |MN| 3
Insert: |33|SMITH|MHD|MN|, h0(33)=0, so use h1(33)=33mod22=11
pg# rel actual
|33|SMITH |MHD |MN| 21 0 13
1 64
|57|BROWN |NY |NY| 23 2 23
3 45
|25|CLAY |OUTBK|NJ| 45 4 62
5 78
|38|GOOD |GATER|FL| 78 6 24
|16|BARBIE|NY |NY|(3,1) 7 52
8 36
|17|BAID |NY |NY| 24 9 2
10 98
|32|THAISZ|KNOB |NJ| 98 11 21
------------------------
|27|JONES |MHD |MN| 3
overflow pgs 3,29,100---
n=1 (next bucket to split)
When n=11, all buckets have been split,
apply h1 to all records in overflow area.
INDEXES
Indices are auxiliary files which provide a vertical view of the data
(one attribute per index).
INDEX TYPES
LINEAR INDEX STRUCTURE (simplest type)
A lookup file, with a "value"-column and a "pointer"-column.
That is clustered (ordered) on the value-column for fast access to
particular values (using binary search of the index file).
Pointer can be RID or RRN (or just page-# in some cases)
__pg1_______
|RRN | S# |ST| STATE-INDEX
|====|====|==| .__._____.
| 1 | 25 |NJ|<._ PTR|STATE|
|----|----|--| \ |==|=====|
| 2 | 32 |NJ|<-. \ .--|3_|_FL__|
|----|----|--| `-\-/---|2_|_NJ__|
| 3 | 38 |FL|<-----'----|1_|_NJ__|
|----|----|--| __-|4_|_NY__|
| 4 | 17 |NY|<------' _-|5_|_NY__|
`------------' '
_pg2________ |
| 5 | 57 |NY|<------'
|----|----|--|
PRIMARY INDEX: I(k,p)
k = ordered or clustered "key" field values from ordered
or clustered field of file with uniqueness property
Uniqueness property: individual value occurrences are "unique"
i.e., each value can occur at most once.
p = pointer to page containing record(s) with value, k
Primary indexes can be either:
DENSE: (every record is indexed) or
NON-DENSE: only key-values of records at the beginning
of a page are indexed (anchor record of page).
(and then the pointer is page-# only)
Example:
-------
Assume the blocking factor (bfr) is 2
which means 2 records/page.
_STUDENT________
|S#|SNAME |LCODE |
|==|------|------|pg
|17|BAID |NY2091|1
|25|CLAY |NJ5101|1
|----------------|
|32|THAISZ|NJ5102|2
|38|GOOD |FL6321|2
|----------------|
|57|BROWN |NY2092|3
|83|THOM |ND3450|3
Non-dense Primary Index on S#
|S#|pg|
|17| 1|
|32| 2|
|57| 3|
_STUDENT________
|S#|SNAME |LCODE |
|==|------|------|pg
|17|BAID |NY2091|1
|25|CLAY |NJ5101|1
|----------------|
|32|THAISZ|NJ5102|2
|38|GOOD |FL6321|2
|----------------|
|57|BROWN |NY2092|3
|83|THOM |ND3450|3
Dense Primary Index on S#
|S#|pg|offset
|17| 1| 0|
|25| 1| 1|
|32| 2| 0|
|38| 2| 1|
|57| 3| 0|
|83| 3| 1|
`---v-'
RID
Inserting and deleting are major problems.
- must move records to maintain ordering
- anchors change (in non-dense case)
USING OVERFLOW PAGES and UNPACKED RECORD-SLOT FORMAT (1 level of indirection)
Overflow page
.---------------------. .----------------------.
.- - - - - -> |P#3| *|->| record with RID (3,6)|
. |___|_________________| |______________________|
. 0|record with RID (3,3)| | |
____RID___. |_____________________| |______________________|
pg#|offset| 1|record with RID (3,0)| | |
_3_|__6___| |_____________________| . . .
2|record with RID (3,4)|
|_____________________|
3|record with RID (3,2)|
|_____________________|
4|record with RID (3,1)|
|_____________________|
5|record with RID (3,5)|
|_____________________|
| | |*|5|2|0|3|4|1|< - unpacked record-slot-ptr table
`---------------------'
. . . 7 6 5 4 3 2 1 0
CLUSTERING INDEX: (like a primary index except that the
attribute nee not be a key)
- the file must be clustered on the attribute, k
- the pointer for any k is the address of 1st page with that k-value
ENROLL2
.-----------.
|S#|C#|GRADE|
|==|==|-----|pg
|17|6 | 96 |1
|25|6 | 76 |1
|-----------|
|32|6 | 62 |2
|38|6 | 98 |2
|-----------|
|32|6 | 91 |3
|25|7 | 68 |3
|-----------|
|32|8 | 89 |4
|17|9 | 95 |4
`-----------'
|C#|pg| Dense Clustering_Index on C#
|6 | 1|
|7 | 3|
|8 | 4|
|9 | 4|
or
ENROLL2
.-----------.
|S#|C#|GRADE|
|==|==|-----|pg
|17|6 | 96 |1
|25|6 | 76 |1
|-----------|
|32|6 | 62 |2
|38|6 | 98 |2
|-----------|
|32|6 | 91 |3
|25|7 | 68 |3
|-----------|
|32|8 | 89 |4
|17|9 | 95 |4
`-----------'
|C#|pg| Non-dense Clustering_Index on C#
|6 | 1| (indexing new anchor records only)
|8 | 4|
There's no more search overhead with this 2nd type of clustering index.
- How can you know which page has C#=7?
- How can you know which page has C#=9?
In general, there are insert/delete problems because of clustering.
Solution? Reserve 1 page for each value and use an overflow ptr
___________
|S#|C#|GRADE| ENROLL (clustered on C#)
|==|==|-----|pg
|17|5 | 96 |1
|25|5 | 76 |1
|-----------|
|32|6 | 62 |2
|38|6 | 98 |2
|-----------|
|32|7 | 91 |3
|25|7 | 68 |3
|-----------|
|32|8 | 89 |4
|17|8 | 95 9|4
`----------\'
\ overflow pointer
:
.-----------.v
|25|8 | 86 |9 (overflow page for value, C#=8)
| |
`-----------'
Clustering Index on C#
|C#|pg|
|5 | 1|
|6 | 2|
|7 | 3|
|8 | 4|
SECONDARY INDEX: These indexes are the same except,
- the file is need not be clustered on k
- p points to the page or record containing k
- every record must be indexed (dense)
Option1: If there are multiple occurences of k,
use multiple index entries for that k.
|S#|C#|GRADE| ENROLL (unclustered C#)
|32|8 | 89 |1
|25|6 | 76 |1
|32|6 | 62 |2
|25|8 | 86 |2
|38|6 | 98 |3
|32|7 | 91 |3
|17|5 | 96 |4
|25|7 | 68 |4
|17|8 | 95 |5
|C#|page| Secondary_Index, opt1 on C#
|5 | 4 |
|6 | 1 |
|6 | 2 |
|6 | 3 |
|7 | 3 |
|7 | 4 |
|8 | 1 |
|8 | 2 |
|8 | 4 |
Option2: Use repeating groups of pointers
(requires variable length pointer(s) field
|S#|C#|GRADE| ENROLL (unclustered C#)
|32|8 | 89 |1
|25|6 | 76 |1
|32|6 | 62 |2
|25|8 | 86 |2
|38|6 | 98 |3
|32|7 | 91 |3
|17|5 | 96 |4
|25|7 | 68 |4
|17|8 | 95 |5
|C#|page Secondary_Index, opt2 on C#
|5 | 4
|6 | 1,2,3
|7 | 3,4
|8 | 1,2,4
Option3: Use 1 index entry for each value, 1 pointer to "list"
or "linked list" of record pointers. (1 level of indirection)
|S#|C#|GRADE| ENROLL (unclustered C#)
|32|8 | 89 |1
|25|6 | 76 |1
|32|6 | 62 |2
|25|8 | 86 |2
|38|6 | 98 |3
|32|7 | 91 |3
|17|5 | 96 |4
|25|7 | 68 |4
|17|8 | 95 |5
|C#|page Secondary_Index, opt3 on C#
|5 | -->|4|
|6 | -->|1|->|2|->|3|
|7 | -->|3|->|4|
|8 | -->|1|->|2|->|4|
INVERTED FILE = A file in which there is an index on every field
(All are secondary indexes except the CITY index.
the CITY index is a clustering index)
._________.________.______._________.
|GATER 1,1|KNOB 1,2|NY 2,2|OUTBK 2,1| CITY
`---------`--------`------`---------'
___________________
S# |S#|SNAME | CITY |ST|
====== |==|======|======|==|pg
|17 2| |38|GOOD | GATER|FL|1 STATE
|----| |--|------|------|--| =======
|25 2| |32|THAISZ| KNOB |NJ|1 |FL 1 |
|----| |--|------|------|--| |-----|
|32 1| |25|CLAY | OUTBK|NJ|2 |NJ 1 |
|----| |--|------|------|--| |-----|
|38 1| |17|BAID | NY |NY|2 |NJ 2 |
`----' `------------------- |-----|
|NY 2 |
`-----'
.______.______.______.________.
SNAME|BAID 2|CLAY 2|GOOD 1|THAISZ 1|
`------`------`------`--------'
MULTILEVEL INDEX:
For any index, since it is a file clustered on k,
it can have a primary or clustering index on it.
(constituting the second level of the multilevel index).
_ STUDENT_______
|S#|SNAME |LCODE |pg
|==|======|======|
|17|BAID |NY2091|1
|25|CLAY |NJ5101|1
|--|------|------|
|32|THAISZ|NJ5102|2
|38|GOOD |FL6321|2
|--|------|------|
|57|BROWN |NY2092|3
|83|THOM |ND3450|3
|--|------|------|
|91|PARK |MN7334|4
|94|SIVA |OR1123|4
`--`------`------'
|S#|pg|pg(of index file) S#-index (nondense, primary)
|--|--|
|17| 1|1
|32| 2|1
|--|--|
|57| 3|2
|91| 4|2
`--`--'
2nd_LEV (a second level, nondense index)
|S#|pg|
|--|--|
|17| 1|
|57| 2|
`--`--'
Multilevel index forms a "tree"
- nodes at the last level are called leaf nodes
- nodes at the other levels are called internal nodes or inodes
- Inserts and deletes can be troublesome.
Indexes slides
The following is a text based treatment of B+ Tree Indexes, which
was covered in powerpoint slides. We will skip this part in lecture,
but it may be useful as reference material and as practice with the technique.
B+ TREE INDEX
(record pointers are in leaf nodes only)
Leaf level (is called the SEQUENCE SET) and is a
linked list of blocks containing all key values in order
(for fast sequential access), together with their record ptr(s)
Inodes (is called the INDEX SET), for fast direct access, is a
tree index into the sequence set.
- never becomes unbalanced
- limits wasted space (keeps nodes at least 1/2 full)
- sequential scans (in order) are fast
The structure of B+ index of order p:
1.Each index set inode is of the form
[P0, K1, P1, K2, P2, . . . Kq, Pq ] q <= p
| | | |
subtree subtree subtree subtree
containing containing containing containing
all values all vaules all values all values
lessthan K1 at least K1 & at least K2 and at least Kq
less than K2 less than K3
All Ki values appear again at leaf level (in order)
together with their record pointer(s).
The structure of a sequence set node:
[ K1,K1ptr(s), K2,K2ptr(s), ... Kq-1,Kq-1ptr(s), Pnext ]
Pnext points to next sequence set node of the B+ tree.
B) Insertion into B+ tree
Insert into the sequence set node first.
If sequence set node is not full, just insert.
If sequence set node is full,
the node overflows and must be split.
First j=ceil(p/2) entries in original node are kept there and
remaining entries moved to a new leaf node.
jth value (middle) is replicated to the parent internal node.
recursively, If parent inode is full, it is split, etc.
(Note, the reason the tree stays balanced is that the only way
a new level gets introduced is via a new root, which is part
of every path).
C) Deletion of B+ tree
When an entry is deleted, it is removed from the leaf level.
If it happens to occur in an inode, it is also removed from
there.
Deletion may cause the number of entries in the leaf node
to fall below the minimum required.
In this case, find a sibling leaf node that is more than
half full and redistribute entries among node and its
sibling such that both are at least half full; otherwise
the node is merged with one of its siblings.
This can propagate and reduce tree levels.
The tree always stays balanced.
There are many variations on the basic B+-tree structure,
owing to its popularity and excellent service (providing
both fast sequential file access and fast direct file access)
- One such variation is to defer key value deletes from the
index set until its sequence set node is merged with an
adjacent sequence set node (as described above).
- Another is the page-oriented method used by IBM,
as briefly described below:
MULTIDIMENSIONAL INDEXES
Multidimensional data (datawarehouses and data cubes) were covered in the
previous set of notes.
Multidimensional data almost always requires multidimensional indexing
One dimensional indexes assume a single search column or key
(which can be a composite column or key)
Data structures that support queries into multidimensional data
specifically, fall in two categories:
1. Hash-table-like (e.g., Grid files and partitioned hash fctns)
2. Tree-like
(eg, multi-key indexes, kd-trees, quad-trees (for sets of points);
R-trees (for sets of regions as well as sets of points) ),
Predicate-trees (P-trees) for vertical compressed, representations of data
Hash-like Structures for Multidimensional Data
--------------------
Grid Files
- partition the POINTS space into a grid
- in each dim "grid lines" partition space into stripes
- points that fall right on a grid line belong to the stripe above it
(i.e., grid-lines are the lower boundaries)
- example: 12 customer(age,sal) data points (i.e., records or tuples)
age,sal
(24,60)
(46,60)
(50,80)
(50,100)
(50,120)
(70,100)
(84,140)
(30,260)
(26,400)
(44,360)
(50,280)
(60,260)
if grid lines at age=40 and age=65
SAL=90K SAL=224K
4 5
0 6
. ---- ---- ---- ---- ---- ---- ---- ---- ---- ----.
400K | * : : |
380K | : : |
360K | : * : |
340K | : : |
320K | : : |
300K | : : |
280K | : * : |
260K | * : : * |
S 240K | : : |
A 220K | ...................:.......:.....................|224K
L 200K | : : |
180K | : : |
160K | : : |
140K | : : * |
120K | : * : |
100K | ...................:....*..:......*..............| 90K
80K | : * : |
60K | * : * : |
40K | : : |
20K | : : |
` ---- ---- ---- ---- ---- ---- ---- ---- ---- ----'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0
AGE
Grid hash function is defined using a table lookup:
Age: 0-39|40-55| 56+
Sal _____|_____|_____
0-90K |_ptr1|_ptr2|_____|
90-224K|_____|_ptr5|_ptr6|
224K+ |_ptr7|_ptr8|_ptr9|
____________ :
|__(24,60)___| < - ptr1 --'
|____________|
____________ :
|__(46,60)___| < - ptr2 --'
|__(50,80)___|
____________ :
|__(50,100)__| < - ptr5 --'
|__(50,120)__|
____________ :
|__(70,100)__| < - ptr6 --'
|__(84,140)__|
____________ :
|__(30,260)__| < - ptr7 --'
|__(26,400)__|
____________ :
|__(44,360)__| < - ptr8 --'
|__(50,280)__|
____________ :
|__(60,260)__| < - ptr9 --'
|____________|
Inserting into Grid files:
If there is room, insert, else (two methods)
1. add overflow block and chain it to the primary block, or
2. reorganize the structure by adding or moving grid lines
(similar to dynamic hashing)
A problem with Grid files is that the number of buckets grows
exponentially with dimension and the grid may become sparse.
Partitioned Hash Functions
==========================
Actually a sequence of hash functions, h=(h1,...hn) such that
hi produces the ith segment of bits in the hash key,
that is, h(a) is the concatenation of h1(a)h2(a)..hn(a).
- example: The data file is CUSTOMER(AGE,SAL) again
(24,60) (46,60) (50,80) (50,100) (50,120) (70,100)
(84,140) (30,260) (26,400) (44,360) (50,280) (60,260)
We use 3 bit hash values,
1st bit for age with hash function, mod2(tens_digit of age) and
last 2 bits for salary with hash fctn, mod4(hundreds_digit of sal)
The lookup table is:
mod4 of the hundreths digit of salary
//
||
vv
000 |ptr0-|- - >
001 |ptr1-|- - >
010 |ptr2-|- - >
011 |ptr3-|- - >
100 |ptr4-|- - >
101 |ptr5-|- - >
110 |ptr6-|- - >
111 |ptr7-|- - >
^
|
`-mod 2 of the tens of age
tens digit of age
.- - - - - - - - -.
Starting with (24,060)- - > mod2(2) = 0 - - - - - - - - - .
: mod4(0) = 00 concatenated = 000
` - - - - - - -' ``- - - - - - - - - -''
100ths digit of sal
.- - ptr0 (from 000)
v___________
|__(24,60)___|
|____________|
(46,60) - - > mod2(4) = 0
mod4(0) = 00, concatenated = 000
.- - ptr0 (from 000)
v___________
|__(24,60)___|
|__(46,60)___|
(50,80) - - > mod2(5) = 1
mod4(0) = 00, concatenated = 100
.- - ptr0 (from 000)
v___________
|__(24,60)___|
|__(46,60)___|
.- - ptr4 (from 100)
v___________
|__(50,80)___|
|____________|
Etc. yielding the following:
.- - ptr0 (from 000)
v___________ ____________
|__(24,60)___| - - - - > |__(26,400)__| (overflow block)
|__(46,60)___| |____________|
.- - ptr1 (from 001)
v___________
|__(84,140)__|
|____________|
.- - ptr2 (from 010)
v___________
|__(60,260)__|
|____________|
.- - ptr3 (from 011)
v___________
|__(44,360)__|
|____________|
.- - ptr4 (from 100)
v___________
|__(50,80)___|
|____________|
.- - ptr5 (from 101)
v___________ ____________
|__(50,100)__| - - - - > |__(70,100)__|
|__(50,120)__| |____________|
.- - ptr6 (from 110)
v___________ ____________
|__(30,260)__| - - - - > |__(50,280)__|
|____________| |____________|
.- - ptr7 (from 111)
v___________
|____________|
|____________|
Tree-like Structured Indexes for Multidimensional Data
------------------------------------------------------
Multiple-key indexes
--------------------
- assume there are several attributes representing "dimensions"
of the data points (data cube tuples)
- use a multilevel index e.g., suppose there are 2 attributes:
Provides a second level of Indexes on 2nd attribute
(to all tuples with same 1st attr val)
/|-- >
/ |-- >
Index on .- > < |-- >
1st attr/ \ |..
/|/ \|-- >
/ |
/ | /|-- >
/ | / |-- >
-- > < |---- > < |-- >
\ | \ |..
\ |\ \|-- >
\ | \
\| \ /|-- >
\ \ / |-- >
\ `> < |-- >
\ \ |..
\ \|-- >
\
`- > . . .
Take the multiple key attributes, age, sal again
age,sal
(24,60)
(24,260)
(24,400)
(50,80)
(50,100)
(50,120)
(50,280)
(60,100)
(60,260)
(84,140)
. - - - - - - - - -> (24,60)
/ .- - - - - - -> (24,60)
/ / .- - - -> (24,400)
___/_____/______/_
. - > |_60_|_260_|_400__|
/ .- - - - - - - - - > (50,80)
age / ____/________________
24-' . - > |_80_|_100|_120_|_280_|- > (50,280)
50---' \ ` - - - - > (50,120)
60. _______ `- - - - - - - > (50,100)
84 `- >|100|260| - - - - - - - - - - > (60,260)
\ `- - - - - - - - - - - - - > (60,100)
\ _____
`- >|_140_|- - - - - - - - - - - > (84,140)
k-dimensional-trees (kd-trees)
=============================
- interior nodes have (Attribute, Value, LowPtr, HiPtr)
- Value is a value which splits data points
- The example below will show (a, V, down, up)
with pointers going down for LowPtr and up for HiPtr.
(We'll go up on greater or equal).
- Attributes used for different levels in the index are different
and ROTATE among the dimensions (round robin).
- The leaves are blocks of records
(assume data blocks hold 2 data points,
i.e., the blocking factor, bfr, is 2).
- to search: decide along the tree until you reach a leaf
(going up on greater or equal)
- to insert: decide along the tree until you reach the proper leaf
if there is room there,
insert;
else split the block and
divide its contents according to the appropriate
attribute (next one in the rotation).
- Example: (insert into kd-tree in this order using age first then sal):
age,sal
(50,80)
(84,140)
(30,260)
(44,360)
(50,120)
(70,100)
(24,60)
(26,400)
(50,280)
(46,60)
(60,260)
(50,100)
insert the first 2 pairs (no tree yet, since just 1 leaf block):
50,80
84,140
age sal
30,260 (leaf is full so split it and divide the contents by sal=150)
30,260
sal /
{ ,150} <
\
50,80
84,140
age sal
44,360 (leaf is not full so insert)
30,260
44,360
sal /
{ ,150} <
\
50,80
84,140
age sal
50,120 (leaf is full so split, divide contents by age=55)
30,260
44,360
sal /
{ ,150} <
\ 84,140
\age /
{ 55, }<
\
50,80
50,120
age sal
70,100 (leaf is not full so insert)
30,260
44,360
sal /
{ ,150} <
\ 84,140
\ 70,100
\ age /
{ 55, }<
\
50,80
50,120
age sal
24,60 (leaf is full so split, divide contents by sal=75)
30,260
44,360
sal /
{ ,150} <
\ 84,140
\ 70,100
\ age /
{ 55, }<
\ 50,80
\ 50,120
\ sal /
{ , 75}<
\
24,60
age sal
26,400 (leaf is full so split, divide contents by age=28)
30,260
44,360
age /
{ 28, }<
/ \
/ 26,400
/
sal /
{ ,150} <
\ 84,140
\ 70,100
\ age /
{ 55, <
\ 50,80
\ 50,120
\ sal /
{ , 75}<
\
24,60
age sal
50,280 (full so split, divide contents by salary=300
44,360
sal /
{300, }<
age / \
{ 28, }< 30,260
/ \ 50,280
/ 26,400
/
sal /
{ ,150} <
\ 84,140
\ 70,100
\ age /
{ 55, <
\ 50,80
\ 50,120
\ age /
{ , 75}<
\
24,60
age sal
46,60 (leaf is not full so insert)
44,360
sal /
{ ,300}<
age / \
{ 28, < 30,260
/ \ 50,280
/ 26,400
/
sal /
{ ,150} <
\ 84,140
\ 70,100
\ age /
{ 55, <
\ 50,80
\ 50,120
\ age /
{ , 75}<
\
24,60
46,60
age sal
60,260 (full so split, divide contents by age=40
44,360
sal /
{ ,300}<
age / \ 30,260
{ 28, }< \age /
/ \ { 40, }<
/ 26,400 \
/ 50,280
sal / 60,260
{ ,150} <
\ 84,140
\ 70,100
\ age /
{ 55, }<
\ 50,80
\ 50,120
\ sal /
{ , 75}<
\
24,60
46,60
age sal
50,100 (full so split using age=50
(leaf is full again so split using sal=90)
44,360
sal /
{ ,300}< 30,260
age / \ 60,260
{ 28, }< \age /
/ \ { 40, }<
/ 26,400 \
/ 50,280
sal /
{ ,150} < 50,120
\ 84,140 50,100
\ 70,100 sal /
\ age / { , 90}<
{ 55, }< / \
\ age / 50,80
\ { 50, }<
\ sal / \
{ , 75}<
\
24,60
46,60
QUAD-TREES
==========
- Interior nodes (Inodes) correspond to rectangulars in 2-D
(more generally, they can be constructed to represent hypercubes
higher dimensional space)
- If the number of pts in the rectangle fits in a block, it's a leaf,
else the rectangle is treated as interior node with children
corresponding to its 4 quadrants.
- to insert into the quad treee index:
search to find the proper leaf;
if there is room, insert;
else split node into 4 quadrants,
divide contents appropriately.
Example: Build the Quad-tree index as it would develop,
assuming (age,sal) arrive in this order:
age,sal
(24,60)
(46,60)
(50,80)
(50,100)
(50,120)
(70,100)
(84,140)
(30,260)
(26,400)
(44,360)
(50,280)
(60,260)
insert: (24,60) (46,60)
__________________________________________________
400K | |
380K | |
360K | |
340K | |
320K | |
300K | |
280K | |
260K | |
SAL 240K | |
220K | |
200K | |
180K | |
160K | |
140K | |
120K | |
100K | |
80K | |
60K | * * |
40K | |
20K | |
`--------------------------------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0
AGE
(24,60) (Only leaf node)
(46,60)
Next, insert (50,80)
Single leaf is full, so split (e.g., at AGE=50 and SAL=200)
and divide contents by quadrants:
__________________________________________________
400K | | |
380K | | |
360K | | |
340K | | |
320K | | |
300K | | |
280K | | |
260K | | |
SAL 240K | | |
220K | | |
200K |_________________________|________________________|
180K | | |
160K | | |
140K | | |
120K | | |
100K | | |
80K | * |
60K | * * | |
40K | | |
20K | | |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.-NW-
/
/---NE-
age,sal /
(50,200) <
\
\---SW-(24,60)
\ (46,60
\
`SE-(50,80)
Next, insert (50,100)
__________________________________________________
400K | | |
380K | | |
360K | | |
340K | | |
320K | | |
300K | | |
280K | | |
260K | | |
SAL 240K | | |
220K | | |
200K |_________________________|________________________|
180K | | |
160K | | |
140K | | |
120K | | |
100K | * |
80K | * |
60K | * * | |
40K | | |
20K | | |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.NW-
/
/---NE-
/
(50,200) <
\
\---SW-(24,60)
\ (46,60)
\
`SE-(50,80)
(50,100)
insert (50,120) (overflow, so split SE)
__________________________________________________
400K | | |
380K | | |
360K | | |
340K | | |
320K | | |
300K | | |
280K | | |
260K | | |
SAL 240K | | |
220K | | |
200K |_________________________|________________________|
180K | | : |
160K | | : |
140K | | : |
120K | * : |
100K | *...........:............|
80K | * : |
60K | * * | : |
40K | | : |
20K | | : |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.NW-
/
/
/---NE-
/
(50,200) < .NW-(50,100)
\ / (50,120)
\---SW-(24,60) /
\ (46,60) /---NE-
\ /
`SE-(75,100)<
\----SW-(50,80)
\
\
`-SE-
insert (70,100) (overflows, so split NW)
__________________________________________________
400K | | |
380K | | |
360K | | |
340K | | |
320K | | |
300K | | |
280K | | |
260K | | |
SAL 240K | | |
220K | | |
200K |_________________________|________________________|
180K | | . : |
160K | | . : |
140K | | - - + - - : |
120K | * . : |
100K | *.........*.:............|
80K | * : |
60K | * * | : |
40K | | : |
20K | | : |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.NW-
/ .-NW-
/ /
/---NE- /---NE-(70,100)
/ /
(50,200) < .NW-(62.5,150)-<
\ / \----SW(50,100)
\---SW-(24,60) / \ (50,120)
\ (46,60) /---NE- \
\ / `-SE
`SE-(75,100)<
\----SW-(50,80)
\
\
`-SE-
insert (84,140), (30,260), (26,400)
__________________________________________________
400K | * | |
380K | | |
360K | | |
340K | | |
320K | | |
300K | | |
280K | | |
260K | * | |
SAL 240K | | |
220K | | |
200K |_________________________|________________________|
180K | | . : |
160K | | . : |
140K | | - - + - - : * |
120K | * . : |
100K | *.........*.:............|
80K | * : |
60K | * * | : |
40K | | : |
20K | | : |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.NW-(30,260)
/ (26,400) .-NW-
/ /
/---NE- /---NE-(70,100)
/ /
(50,200) < .NW-(62.5,150)-<
\ / \----SW(50,100)
\---SW-(24,60) / \ (50,120)
\ (46,60) /---NE-(84,140) \
\ / `-SE
`SE-(75,100)<
\----SW-(50,80)
\
\
`-SE-
insert (44,360) (overflows: split NW)
__________________________________________________
400K | :* | |
380K | : | |
360K | : * | |
340K | : | |
320K | : | |
300K |............:............| |
280K | : | |
260K | : * | |
SAL 240K | : | |
220K | : | |
200K |____________:____________|________________________|
180K | | . : |
160K | | . : |
140K | | - - + - - : * |
120K | * . : |
100K | *.........*.:............|
80K | * : |
60K | * * | : |
40K | | : |
20K | | : |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.NW-
/
/--NE-(26,400)
/ (44,360)
/
.-NW-(25,300)-< ----SW-
/ \
/ `---SE-(30,260)
/ .-NW-
/ /
/---NE- /---NE-(70,100)
/ /
(50,200) < .NW-(62.5,150)-<
\ / \----SW(50,100)
\---SW-(24,60) / \ (50,120)
\ (46,60) /---NE-(84,140) \
\ / `-SE
`SE-(75,100)<
\----SW-(50,80)
\
\
`-SE-
insert (50,280), (60,260)
__________________________________________________
400K | :* | |
380K | : | |
360K | : * | |
340K | : | |
320K | : | |
300K |............:............| |
280K | : * |
260K | : * | * |
SAL 240K | : | |
220K | : | |
200K |____________:____________|________________________|
180K | | . : |
160K | | . : |
140K | | - - + - - : * |
120K | * . : |
100K | *.........*.:............|
80K | * : |
60K | * * | : |
40K | | : |
20K | | : |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.NW-
/
/--NE-(26,400)
/ (44,360)
.-NW-(25,300)-< ----SW-
/ \
/ `---SE-(30,260)
/ .-NW-
/ /
/---NE-(50,280) /---NE-(70,100)
/ (60,260) /
(50,200) < .NW-(62.5,150)-<
\ / \----SW(50,100)
\---SW-(24,60) / \ (50,120)
\ (46,60) /---NE-(84,140) \
\ / `-SE
`SE-(75,100)<
\----SW-(50,80)
\
`-SE-
There is, of course, a similarity between QUAD Trees and P-trees.
The difference is:
Quad-Trees are search trees and P-trees are the actual data.
R-TREES (Region tree)
======================
- inodes of an R-tree correspond to interior regions,
(which can be overlapping)
(usually regions are rectangles, tho, not necessarily)
- R-tree regions have subregions that represent the
contents of their children
- And the subregions need not cover the region they
subdivide (but all data must be within a subregion)
Example, Consider the spatial image:
100 __________________________________________________
| |
| |
| |
| |
| .-----------. |
| | | |
| |school | |
| |___________| |
| |
| |
|-------------------------. |
| road1 | .-------. |
|-------------------------| |house2 | |
| |r | |_______| |
| .------.________|o_|________________________|
| |house1|________|a_|________pipeline________|
| |______| |d | |
| |2 | |
| | | |
| | | |
0 `--------------------------------------------------'
0 100
Assume a leaf can hold 6 regions (bfr=6)
and that the 6 regions or objects above are together on 1 leaf block,
whose region is shown as the outer dotted rectangle - .
\
|
100__________________________________________________ /
| | /
| | /
..................................................../
: :
: .-----------. :
: | | :
: |school | :
: |___________| :
: :
: :
:-------------------------. :
: road1 | .-------. :
:----------------------+--| |house2 | :
: |r | |_______| :
: .--------.________|o_|________________________:
: | house1|________|a_|________pipeline________:
: | | |d | :
: |________| |2 | :
: | | :
: | | :
:..................................................:
0 100
Therefore the R-tree has a root and one leaf as follows:
.---------------------.
| ( (0,0), (100,90) ) | (Outer dotted region)
|_____________________|
:
v (full leaf with six objects)
.-----------------------------------------------------.
| road1 | road2 | house1 | school | house2 | pipeline |
+-------+-------+--------+--------+--------+----------+
Now suppose a local cellular phone company adds a POP
(pt of presence or antenna) as below:
100__________________________________________________
| |
| |
....................................................
: :
: .-----------. :
: | | .---. :
: |school | |pop| :
: |___________| `---' :
: :
: :
:-------------------------. :
: road1 | .-------. :
:----------------------+--| |house2 | :
: |r | |_______| :
: .--------.________|o_|________________________:
: | house1|________|a_|________pipeline________:
: | | |d | :
: |________| |2 | :
: | | :
: | | :
0 :--------------------------------------------------:
0 100
Since the 7 objects do not fit in one leaf, we split leaf,
putting 4 in one new leaf and 3 in the other (attempting to
minimize the overlap and to split evenly?).
Redefine enclosing regions:
.---------------------.
| ( (0,0), (100,90) ) | (Outer dotted region)
|_____________________|
:
v
.-----------------------.--------------------------.
| ( (0,0), (60,50) ) | ( (20,20), (100,80) ) |
|_______________________|__________________________|
: :
: :
v v
.------------------.---. .------------------------------.
|road1|road2|house1| | |school|house2| pipeline | pop |
+-----+-----+------+---+ +------+------+----------+-----+
100 __________________________________________________
| |
| |
..|..................................................|..
: | ...........................................|.:
: | : .-----------. |::
: | : | | .---. |::
: | : |school | |pop| |::
: | : |___________| `---' |::
: | : |::
:.|......:.................... |::
::|------:------------------.: |::
::| :road1 |: .-------. |::
::|------:---------------+--|: |house2 | |::
::| : | |: |_______| |::
::| .-:------.________|__|:_______________________|::
::| | :house1|________|__|:_______pipeline________|::
::| | :......|........|..|:.......................|::
::| |________| |ro|: | :
::| |ad|: | :
::| |2 |: | :
0 ::`--------------------------:-----------------------' :
::...........................: :
:......................................................:
0 100
Suppose we insert another house (house3) below house2.
100 __________________________________________________
| |
| |
| ...........................................|.
| : .-----------. |:
| : | | .---. |:
| : |school | |pop| |:
| : |___________| `---' |:
| : |:
.|......:.................... |:
:|------:------------------.: |:
:| :road1 |: .-------. |:
:|------:---------------+--|: |house2 | |:
:| : | |: |_______| |:
:| .-:------.________|__|:_______________________|:
:| | :house1|________|__|:_______pipeline________|:
:| | :......|........|..|:.......................|:
:| |________| |ro|: .-------. |
:| |ad|: |house3 | |
:| |2 |: |_______| |
0 :`--------------------------:-----------------------'
:...........................:
0 100
Since house3 is not in a either of the regions,
we must decide to expand one of the regions.
If we pick the first, we add 1000 square units and if we pick
the 2nd we add 1200 square units, thus we pick the first:
100 __________________________________________________
| |
| |
| ...........................................|.
| : .-----------. |:
| : | | .---. |:
| : |school | |pop| |:
| : |___________| `---' |:
| : |:
.|......:................................. |:
:|------:------------------. : |:
:| :road1 | .-------.: |:
:|------:---------------+--| |house2 |: |:
:| : | | |_______|: |:
:| .-:------.________|__|_____________:__________|:
:| | :house1|________|__|________pipeline________|:
:| | :......|........|..|.............:..........|:
:| |________| |ro| .-------.: |
:| |ad| |house3 |: |
:| |2 | |_______|: |
0 :`---------------------------------------:----------'
:........................................:
0 100
.---------------------.
| ( (0,0), (100,90) ) | (Outer dotted region)
|_____________________|
:
v
.-----------------------.--------------------------.
| ( (0,0), (80,50) ) | ( (20,20), (100,80) ) |
|_______________________|__________________________|
: :
: :
v v
.------------------.------.------. .--------------------------.
|road1|road2|house1|house2|house3| |school|house2|pipeline|pop|
+-----+-----+------+------+------+ +------+------+--------+---+
We note that house2 is in both subregions.
A brief look at an additional index still in use:
BINARY RADIX TREE INDEX
=======================
(aka, trie) (e.g., used in IBM AS/400s)
Similar to B-tree, except
- only the common parts of key values are embedded in inodes
- a single bit is used to make the navigation direction decision at each level
(0 for up and 1 for down). (zero-based bit positions are used)
Example: (in this example, the tree structure is being
built left-to-right)
Starting with an empty structure,
INSERT [JAY,LA,25,STAR], (assigned RRN=1)
nam_INDEX CUSTOMER FILE
name RRN RRN nam loc age job
JAY 1 1 JAY LA 25 STAR
INSERT [JON,LA,45,HOOD], assign RRN=2
1st letters are same (J), so that common part is embedded in root
2nd letters are A and O, bit-3 is first difference (makes the decision)
0123 4567 < - bit positions
EBCDIC for A=1100 0001
and O=1101 0110
so we go up for A (bit-3 = 0) and down for O (bit-3=1)
CUSTOMER
RRN nam loc age job
name 1 JAY LA 25 STAR
part RRN 2 JON LA 45 HOOD
b3 AY 1
J <
ON 2
INSERT [JAN,RO,93,DOCT], assign RRN=3
(EBCDIC for N=1101 0101
and Y=1110 0000)
INDEX CUSTOMER
b2 N 3 RNN nam loc age job
b3 A < 1 JAY LA 25 STAR
J < Y 1 2 JON LA 45 HOOD
ON 2 3 JAN RO 93 DOCT
INSERT [SUE,RO,16,PRGR], assign RRN=4
(EBCDIC for J=1101 0001
and S=1110 0010)
INDEX CUSTOMER
b2 N 3 RNN nam loc age job
b3 A < 1 JAY LA 25 STAR
J < Y 1 2 JON LA 45 HOOD
b2 ON 2 3 JAN RO 93 DOCT
< 4 SUE RO 16 PRGR
SUE 4
*************************
*************************
*************************
*************************
*************************
*************************
*************************
*************************
APPENDIX
Multidimensional queries in SQL
===============================
Nearest neighbor queries:
Represent points as a relation; Points(x,y,A1,...)
- x,y are the spatial "structure" attributes (key attributes).
- The other attributes, Ai, represent properties or features
of the points.
- Find the nearest point to (10,20), for example?
SELECT *
FROM POINTS p
WHERE NOT EXISTS(
SELECT *
FROM POINTS q
WHERE (q.x-10)^2 + (q.y-20)^2 < p.x-10)^2 + (p.y-20)^2
)
"Select points, p s.t. there does not exist a point q closer to (10,20)"
Represent rectangles by the relation:
Rectangles( id, (xll,yll), (xur,yur) )
- Find rectangles enclosing (10,20), for example?
- SELECT id
FROM Rectangles
WHERE xll<=10 AND yll<=20 AND xur>=10 AND yur>=20;
******************************************************
******************************************************
IBM DB-2 B+-tree index structure:
The file containing index data includes a Space Map Page (SMP)
at regular intervals.
Each SMP contains allocation/deallocation status of a certain
number of the following pages in the file. An index page is
in the allocated state if it is part of an index.
All leaf pages of an index contain key_value-RID pairs
(referred to as keys)
The leaf pages are forward and backward chained using.
Every nonleaf page contains a certain number of child page ptrs
(page#s) and 1 less highkey
Each highkey is associated with one child page ptr
(last page ptr has no highkey associate).
A highkey for a given child page is > the highest key in the
child page. (right on equal).
Keys are stored in the leaves and nonleaves are varying lengths
One of the indexes of a table may be designated CLUSTERING INDEX
In a UNIQUE index key-values appear only once.
In a NON-UNIQUE index key-values may appear many times
(keyvalues stored only once on a leaf, followed by RID list).
(Note F <= X < L for every KeyValue, X, in the middle subtree)
0______
|_______| 3 TreeLevel
/F |L \
__/ | \____
1______/ 2__|___ \ 3________
|_______| |_______| |_________| 2
|C | | |P \
| \ \ | \
| | | | \
4___v 5_v___ 6_v__ 7_v__ 8_v__
| |- >| |- >| |- >| |- >| | 1
|_____|< -|_____|< -|_____|< - |_____|< -|_____|
(# HighKeys in each
leaf = #children-1)
Leaf Page of a Nonunique Index
(i.e., leaf page 6 in the above index)
.-----------------------------------------------.
| SM_bit | Delete_bit | PreviousPage | NextPage |
|-----------------------------------------------|
|lf | cluster of duplicates |
|---' .---.---.---. .---.---. |
|VN | | H | 4 | 8 | | G | 5 | |
|---' `---+---+---' `---+---' |
| ^ ^ |
| .---.---. \ .---.---. \ |
| | K | 3 | \ | F | 7 |< ---\--. |
| `---+---' \ `---+---' \ \ |
| ^ \_____________ \ \ |
| \____________________--.\-.-\.-\|
| | | | | |KeyMap
`----------------------------------------------- (ptrs to
entries in KeyValue sequence)
Non-Leaf Root page, 0, above
.-----------------------------------------------.
|lf |latch flag | 3 | RightMost
|---' ---| child ptr
|VN | .---.---. .---.---. |
|---' | L | 2 |< -. | F | 1 |< ------ . |
| `---+---' \ `---+---' \ |
| \___________________--.-\|
| | | |KeyMap
`-----------------------------------------------' (ptrs to
pairs in HighKey sequence)
Non-Leaf page, 3, above
.-----------------------------------------------.
|lf |latch flag | 8 | RightMost
|---' `---| child ptr
|VN |Version Number .---.---. |
|---' | P | 7 |< ------ . |
| `---+---' \ |
| .-\|
| | |KeyMap
`-----------------------------------------------' (ptrs to
pairs in HighKey sequence)
Structure Modification Operators - SMOs
(page splits and page deletions)
- SMOs are performed by the same transaction that
encountered a need for it.
- A page is removed from tree at the time the only key
in page is deleted.
- When page is split by trans, other trans are not
prevented from reading it or even modifying it before
the trans which performed the split commits.
- SMOs are propagated in the tree bottom-up without a
"safe" node.
- Latches at lowest level must be released before acquiring
any at higher levels (prevents latch deadlock).
- Once an index is created, its root does not change
(i.e., when a root split is needed, its contents are
copied to a new page which becomes the root's only
child and which is then split)
- During the split of a leaf, split to the right
(i.e., the higher KeyValues are put on the new page it
inherits the old HighKey and the HighKey of the original
page is set to the smallest keyvalue moved).
Space Map Pages
- SMPs exist in the index file at regular intervals.
- Each SMP contains the allocation/deallocation status of
the following pages in the file.
Some numbers:
CACHE: (System cache - on integrated circuits or chips)
- level-1: (on board) on processor chip
- for holding instructions, parameters, etc.
- typically 64KB
- level-2: (SRAM - static RAM)
- need not be refreshed constantly
- lost when power is off however
- typically 512KB or larger.
- One sees references to "level 3" cache as well. Definitions
aren't completely consistent but suffice it to say that as
the level goes down, the access speed typically goes down
(and the capacity goes up).
MAIN MEMORY:
Typically from 32 MB to ?? these days.
(how much RAM do you have in your personal workstation,
at home? at work?)
Random access:
any byte can be obtained in about same amount of time.
largeness terminology review:
name: deka hecto kilo mega giga tera peta exa
symbol: da h k or K M G T P E
power of 10: 1 2 3 6 9 12 15 18
smallness terminology review:
name: deci centi milli micro nano pico femto atto
symbol: d c m mu n p f a
power of 10: -1 -2 -3 -6 -9 -12 -15 -18
Most main memory is composed of DRAM - Dynamic RAM
needs to be refreshed and recharged hundred_times/sec
- working memory of most computers
- design has remained unchanged but capacities/densities change
- over last 10 years chip capacities are increasing:
256Kb - 1 Mb - 4 Mb - 16 Mb - 64Mb - 256 Mb...
- MM range from 64 MB - 128 MB - 512MB - 1 GB - 10 GB ...
- speed of DRAM has not increased
(moving data between processor and L2 ~ 10 nanosec.
(moving data between L2 cache and MM ~100 nanosec.)
resulting in performance gap between MM and processors.
- As a result, designers add L2 cache
- And new "Fast Page Mode" memories have been developed
which allow sequential reading of entire page once initial
address has been specified (no waiting for external memory
controller to supply next address within the page)
See, eg, following types (EDO,BEDO, SDRAM, SLDRAM, RDRAM):
- Extended Data Out (EDO) and "Burst EDO" (BEDO)
- Synchronous DRAM (SDRAM) (popular)
- SyncLink DRAM (SLDRAM)
- Rambus DRAM (RDRAM)
(transfers data over "Rambus Channel")
(note: there might be a paper topic for someone with expertise
in this area, namely "Fast Page Mode Memories Now and in the
Future and How, Specifically, They Will Affect DBMSs")
VIRTUAL MEMORY:
program data occupies the "virtual memory address space"
(typically 2^32 addresses)
actual RAM memory capacity is typically much smaller in size
than the virtual address space.
most of the content of virtual memory is actually stored on disk
(in blocks of 4-~56KB)
virtual memory contents are moved between disk and
actual MM in blocks (aka "pages")
In fact, main memory databases manage their data entirely thru
the virtual address mechanism, letting the operating system
manage the bringing of needed data into main memory.
with 64-bit virtual address spaces, 2^64 addresses
- = 16 exawords of virtual address space!!
- To convert, remember that
2^10 ~= 10^3 so divide the power of 2 by ~3.33
2NDARY STORAGE (or online):
usually magnetic disk
files are moved between disk and MM in blocks (4K - 56K or larger)
in a file system, when a file is opened for reading,
a buffer block is reserved for it and
it is filled with the first block from the file,
then the next block etc.
a database system will manage usually disk blocks itself, rather than
relying on the OS file sytem to move blocks in and out.
In either case, it takes 10-30 milliseconds
to transfer a block between disk and memory.
By comparison, one million instructions can be performed on the
contents of a block in that time.
Therefore I/O dominates the cost of database activity.
Individual hard disk sizes (on desktop and deskside computers)
range from 5-50 GB these days.
Bottom line: secondary storage is
~100,000 times slower than MM to access,
~100 times large than MM,
cost ~$.05 per MB, whereas, MM cost ~$1 per
MB (or less!)
Typical measures associated with disks these days are:
- Rotation speed = 5400 RPM (1 rot every 11 milsec)
- Number of platters/unit = 5
- Num of surfaces/platters = 2
- Number of tracks/surface = ~10,000
- Number of bytes/track = ~100,000
- Number of sectors/track = 12 - 500
- Number of bytes/sector = 512 - 4096
( See page 118 or EN text for a picture)
Several techniques are used to speed up disk access:
divide data among several disks (striping, RAID)
mirroring disks
clustering (organizing data that will be accessed
together by track/cylinder
prefetching (read/write entire tracks)
double buffering (filling one buffer while using another)
elevator algorithm (queue request by cylinder;
head stops at each cylinder,
fills all requests from that cylinder,
moves to next cylinder, etc.
- Not First-Come-First-Serve)
TERTIARY STORAGE (or nearline and offline):
to serve very large storage needs (e.g., satellite imagery)
tertiary storage is used
characterized by terabytes/petabytes capacities and r/w speeds
~10 times slower than disk.
offline storage is tape storage
(offline, on a shelf, must be mounted by a human for access)
nearline:
- Optical-disk Juke boxes; rack of CD-ROMS mounted/dismounted
by a robotic arm, Tape-cartridge "silos".
- Tape silo:
large size devise that holds racks of cartridge tapes,
robotic arm mountable.
Bottom line: teriary storage is ~1000 times slower than disk,
Bottom line: teriary storage is ~1000 times slower than disk,
~1000 times more capacity
volatile storage "forgets" contents when powered off
nonvolatile storage "keeps" its contents intact for periods
of time after power off.
MM is generally volatile and disk is nonvolatile.
There is nonvolatile MM, e.g.,:
flash memory (Electrically Erasable Programmable ROM - EEPROM)
on cards, 1-32 KB,
RAM disk (battery backup for main power supply)