INDEXES
Indices are auxiliary files which provide a vertical view of the data
(one attribute per index).
INDEX TYPES
LINEAR INDEX STRUCTURE (simplest type)
A lookup file, with a "value"-column and a "pointer"-column.
That is clustered (ordered) on the value-column for fast access to
particular values (using binary search of the index file).
Pointer can be RID or RRN (or just page-# in some cases)
__pg1_______
|RRN | S# |ST| STATE-INDEX
|====|====|==| .__._____.
| 1 | 25 |NJ|<._ PTR|STATE|
|----|----|--| \ |==|=====|
| 2 | 32 |NJ|<-. \ .--|3_|_FL__|
|----|----|--| `-\-/---|2_|_NJ__|
| 3 | 38 |FL|<-----'----|1_|_NJ__|
|----|----|--| __-|4_|_NY__|
| 4 | 17 |NY|<------' _-|5_|_NY__|
`------------' '
_pg2________ |
| 5 | 57 |NY|<------'
|----|----|--|
PRIMARY INDEX: I(k,p)
k = ordered or clustered "key" field values from ordered
or clustered field of file with uniqueness property
Uniqueness property: individual value occurrences are "unique"
i.e., each value can occur at most once.
p = pointer to page containing record(s) with value, k
Primary indexes can be either:
DENSE: (every record is indexed) or
NON-DENSE: only key-values of records at the beginning
of a page are indexed (anchor record of page).
(and then the pointer is page-# only)
Example:
-------
Assume the blocking factor (bfr) is 2
which means 2 records/page.
_STUDENT________
|S#|SNAME |LCODE |
|==|------|------|pg
|17|BAID |NY2091|1
|25|CLAY |NJ5101|1
|----------------|
|32|THAISZ|NJ5102|2
|38|GOOD |FL6321|2
|----------------|
|57|BROWN |NY2092|3
|83|THOM |ND3450|3
Non-dense Primary Index on S#
|S#|pg|
|17| 1|
|32| 2|
|57| 3|
_STUDENT________
|S#|SNAME |LCODE |
|==|------|------|pg
|17|BAID |NY2091|1
|25|CLAY |NJ5101|1
|----------------|
|32|THAISZ|NJ5102|2
|38|GOOD |FL6321|2
|----------------|
|57|BROWN |NY2092|3
|83|THOM |ND3450|3
Dense Primary Index on S#
|S#|pg|offset
|17| 1| 0|
|25| 1| 1|
|32| 2| 0|
|38| 2| 1|
|57| 3| 0|
|83| 3| 1|
`---v-'
RID
Inserting and deleting are major problems.
- must move records to maintain ordering
- anchors change (in non-dense case)
USING OVERFLOW PAGES and UNPACKED RECORD-SLOT FORMAT
(1 level of indirection)
Overflow page
.---------------------. .----------------------.
.- - - - - -> |P#3| *|->| record with RID (3,6)|
. |___|_________________| |______________________|
. 0|record with RID (3,3)| | |
____RID___. |_____________________| |______________________|
pg#|offset| 1|record with RID (3,0)| | |
_3_|__6___| |_____________________| . . .
2|record with RID (3,4)|
|_____________________|
3|record with RID (3,2)|
|_____________________|
4|record with RID (3,1)|
|_____________________|
5|record with RID (3,5)|
|_____________________|
| | |*|5|2|0|3|4|1|< - unpacked record-slot-ptr table
`---------------------'
. . . 7 6 5 4 3 2 1 0
CLUSTERING INDEX:
(like a primary index except that the
attribute nee not be a key)
- the file must be clustered on the attribute, k
- the pointer for any k is the address of 1st page with that k-value
ENROLL2
.-----------.
|S#|C#|GRADE|
|==|==|-----|pg
|17|6 | 96 |1
|25|6 | 76 |1
|-----------|
|32|6 | 62 |2
|38|6 | 98 |2
|-----------|
|32|6 | 91 |3
|25|7 | 68 |3
|-----------|
|32|8 | 89 |4
|17|9 | 95 |4
`-----------'
|C#|pg| Dense Clustering_Index on C#
|6 | 1|
|7 | 3|
|8 | 4|
|9 | 4|
or
ENROLL2
.-----------.
|S#|C#|GRADE|
|==|==|-----|pg
|17|6 | 96 |1
|25|6 | 76 |1
|-----------|
|32|6 | 62 |2
|38|6 | 98 |2
|-----------|
|32|6 | 91 |3
|25|7 | 68 |3
|-----------|
|32|8 | 89 |4
|17|9 | 95 |4
`-----------'
|C#|pg| Non-dense Clustering_Index on C#
|6 | 1| (indexing new anchor records only)
|8 | 4|
There's no more search overhead with this 2nd type of clustering index.
- How can you know which page has C#=7?
- How can you know which page has C#=9?
In general, there are insert/delete problems because of clustering.
Solution? Reserve 1 page for each value and use an overflow ptr
___________
|S#|C#|GRADE| ENROLL (clustered on C#)
|==|==|-----|pg
|17|5 | 96 |1
|25|5 | 76 |1
|-----------|
|32|6 | 62 |2
|38|6 | 98 |2
|-----------|
|32|7 | 91 |3
|25|7 | 68 |3
|-----------|
|32|8 | 89 |4
|17|8 | 95 9|4
`----------\'
\ overflow pointer
:
.-----------.v
|25|8 | 86 |9 (overflow page for value, C#=8)
| |
`-----------'
Clustering Index on C#
|C#|pg|
|5 | 1|
|6 | 2|
|7 | 3|
|8 | 4|
SECONDARY INDEX:
These indexes are the same except,
- the file is need not be clustered on k
- p points to the page or record containing k
- every record must be indexed (dense)
Option1: If there are multiple occurences of k,
use multiple index entries for that k.
|S#|C#|GRADE| ENROLL (unclustered C#)
|32|8 | 89 |1
|25|6 | 76 |1
|32|6 | 62 |2
|25|8 | 86 |2
|38|6 | 98 |3
|32|7 | 91 |3
|17|5 | 96 |4
|25|7 | 68 |4
|17|8 | 95 |5
|C#|page| Secondary_Index, opt1 on C#
|5 | 4 |
|6 | 1 |
|6 | 2 |
|6 | 3 |
|7 | 3 |
|7 | 4 |
|8 | 1 |
|8 | 2 |
|8 | 4 |
Option2: Use repeating groups of pointers
(requires variable length pointer(s) field
|S#|C#|GRADE| ENROLL (unclustered C#)
|32|8 | 89 |1
|25|6 | 76 |1
|32|6 | 62 |2
|25|8 | 86 |2
|38|6 | 98 |3
|32|7 | 91 |3
|17|5 | 96 |4
|25|7 | 68 |4
|17|8 | 95 |5
|C#|page Secondary_Index, opt2 on C#
|5 | 4
|6 | 1,2,3
|7 | 3,4
|8 | 1,2,4
Option3: Use 1 index entry for each value, 1 pointer to "list"
or "linked list" of record pointers. (1 level of indirection)
|S#|C#|GRADE| ENROLL (unclustered C#)
|32|8 | 89 |1
|25|6 | 76 |1
|32|6 | 62 |2
|25|8 | 86 |2
|38|6 | 98 |3
|32|7 | 91 |3
|17|5 | 96 |4
|25|7 | 68 |4
|17|8 | 95 |5
|C#|page Secondary_Index, opt3 on C#
|5 | -->|4|
|6 | -->|1|->|2|->|3|
|7 | -->|3|->|4|
|8 | -->|1|->|2|->|4|
INVERTED FILE = A file in which there is an index on every field
(All are secondary indexes except the CITY index.
the CITY index is a clustering index)
._________.________.______._________.
|GATER 1,1|KNOB 1,2|NY 2,2|OUTBK 2,1| CITY
`---------`--------`------`---------'
___________________
S# |S#|SNAME | CITY |ST|
====== |==|======|======|==|pg
|17 2| |38|GOOD | GATER|FL|1 STATE
|----| |--|------|------|--| =======
|25 2| |32|THAISZ| KNOB |NJ|1 |FL 1 |
|----| |--|------|------|--| |-----|
|32 1| |25|CLAY | OUTBK|NJ|2 |NJ 1 |
|----| |--|------|------|--| |-----|
|38 1| |17|BAID | NY |NY|2 |NJ 2 |
`----' `------------------- |-----|
|NY 2 |
`-----'
.______.______.______.________.
SNAME|BAID 2|CLAY 2|GOOD 1|THAISZ 1|
`------`------`------`--------'
MULTILEVEL INDEX:
For any index, since it is a file clustered on k,
it can have a primary or clustering index on it.
(constituting the second level of the multilevel index).
_ STUDENT_______
|S#|SNAME |LCODE |pg
|==|======|======|
|17|BAID |NY2091|1
|25|CLAY |NJ5101|1
|--|------|------|
|32|THAISZ|NJ5102|2
|38|GOOD |FL6321|2
|--|------|------|
|57|BROWN |NY2092|3
|83|THOM |ND3450|3
|--|------|------|
|91|PARK |MN7334|4
|94|SIVA |OR1123|4
`--`------`------'
|S#|pg|pg(of index file) S#-index (nondense, primary)
|--|--|
|17| 1|1
|32| 2|1
|--|--|
|57| 3|2
|91| 4|2
`--`--'
2nd_LEV (a second level, nondense index)
|S#|pg|
|--|--|
|17| 1|
|57| 2|
`--`--'
Multilevel index forms a "tree"
- nodes at the last level are called leaf nodes
- nodes at the other levels are called internal nodes or inodes
- Inserts and deletes can be troublesome.
Indexes powerpoint slides covering also ISAM and B+tree indexes
The following is a text based treatment of B+ Tree Indexes.
B+-trees were covered in powerpoint slides, but this additional treatment
may be useful as reference material and as practice with the technique.
B+ TREE INDEX
(record pointers are in leaf nodes only)
Leaf level (is called the SEQUENCE SET) and is a
linked list of blocks containing all key values in order
(for fast sequential access), together with their record ptr(s)
Inodes (is called the INDEX SET), for fast direct access, is a
tree index into the sequence set.
- never becomes unbalanced
- limits wasted space (keeps nodes at least 1/2 full)
- sequential scans (in order) are fast
The structure of B+ index of order p:
1.Each index set inode is of the form
[P0, K1, P1, K2, P2, . . . Kq, Pq ] q <= p
| | | |
subtree subtree subtree subtree
containing containing containing containing
all values all vaules all values all values
lessthan K1 at least K1 & at least K2 and at least Kq
less than K2 less than K3
All Ki values appear again at leaf level (in order)
together with their record pointer(s).
The structure of a sequence set node:
[ K1,K1ptr(s), K2,K2ptr(s), ... Kq-1,Kq-1ptr(s), Pnext ]
Pnext points to next sequence set node of the B+ tree.
B) Insertion into B+ tree
Insert into the sequence set node first.
If sequence set node is not full, just insert.
If sequence set node is full,
the node overflows and must be split.
First j=ceil(p/2) entries in original node are kept there and
remaining entries moved to a new leaf node.
jth value (middle) is replicated to the parent internal node.
recursively, If parent inode is full, it is split, etc.
(Note, the reason the tree stays balanced is that the only way
a new level gets introduced is via a new root, which is part
of every path).
C) Deletion of B+ tree
When an entry is deleted, it is removed from the leaf level.
If it happens to occur in an inode, it is also removed from
there.
Deletion may cause the number of entries in the leaf node
to fall below the minimum required.
In this case, find a sibling leaf node that is more than
half full and redistribute entries among node and its
sibling such that both are at least half full; otherwise
the node is merged with one of its siblings.
This can propagate and reduce tree levels.
The tree always stays balanced.
There are many variations on the basic B+-tree structure,
owing to its popularity and excellent service (providing
both fast sequential file access and fast direct file access)
- One such variation is to defer key value deletes from the
index set until its sequence set node is merged with an
adjacent sequence set node (as described above).
- Another is the page-oriented method used by IBM,
as briefly described below:
MULTIDIMENSIONAL INDEXES
Multidimensional data (datawarehouses and data cubes) were covered in the
previous set of notes.
Multidimensional data almost always requires multidimensional indexing
One dimensional indexes assume a single search column or key
(which can be a composite column or key)
Data structures that support queries into multidimensional data
specifically, fall in two categories:
1. Hash-table-like (e.g., Grid files and partitioned hash fctns)
2. Tree-like
(eg, multi-key indexes, kd-trees, quad-trees (for sets of points);
R-trees (for sets of regions as well as sets of points) ),
Predicate-trees (P-trees) for vertical compressed, representations of data
Hash-like Structures for Multidimensional Data
Grid Files
- partition the POINTS space into a grid
- in each dim "grid lines" partition space into stripes
- points that fall right on a grid line belong to the stripe above it
(i.e., grid-lines are the lower boundaries)
- example: 12 customer(age,sal) data points (i.e., records or tuples)
age,sal
(24,60)
(46,60)
(50,80)
(50,100)
(50,120)
(70,100)
(84,140)
(30,260)
(26,400)
(44,360)
(50,280)
(60,260)
if grid lines at age=40 and age=65
SAL=90K SAL=224K
4 5
0 6
. ---- ---- ---- ---- ---- ---- ---- ---- ---- ----.
400K | * : : |
380K | : : |
360K | : * : |
340K | : : |
320K | : : |
300K | : : |
280K | : * : |
260K | * : : * |
S 240K | : : |
A 220K | ...................:.......:.....................|224K
L 200K | : : |
180K | : : |
160K | : : |
140K | : : * |
120K | : * : |
100K | ...................:....*..:......*..............| 90K
80K | : * : |
60K | * : * : |
40K | : : |
20K | : : |
` ---- ---- ---- ---- ---- ---- ---- ---- ---- ----'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0
AGE
Grid hash function is defined using a table lookup:
Age: 0-39|40-55| 56+
Sal _____|_____|_____
0-90K |_ptr1|_ptr2|_____|
90-224K|_____|_ptr5|_ptr6|
224K+ |_ptr7|_ptr8|_ptr9|
____________ :
|__(24,60)___| < - ptr1 --'
|____________|
____________ :
|__(46,60)___| < - ptr2 --'
|__(50,80)___|
____________ :
|__(50,100)__| < - ptr5 --'
|__(50,120)__|
____________ :
|__(70,100)__| < - ptr6 --'
|__(84,140)__|
____________ :
|__(30,260)__| < - ptr7 --'
|__(26,400)__|
____________ :
|__(44,360)__| < - ptr8 --'
|__(50,280)__|
____________ :
|__(60,260)__| < - ptr9 --'
|____________|
Inserting into Grid files:
If there is room, insert, else (two methods)
1. add overflow block and chain it to the primary block, or
2. reorganize the structure by adding or moving grid lines
(similar to dynamic hashing)
A problem with Grid files is that the number of buckets grows
exponentially with dimension and the grid may become sparse.
Partitioned Hash Functions
Actually a sequence of hash functions, h=(h1,...hn) such that
hi produces the ith segment of bits in the hash key,
that is, h(a) is the concatenation of h1(a)h2(a)..hn(a).
- example: The data file is CUSTOMER(AGE,SAL) again
(24,60) (46,60) (50,80) (50,100) (50,120) (70,100)
(84,140) (30,260) (26,400) (44,360) (50,280) (60,260)
We use 3 bit hash values,
1st bit for age with hash function, mod2(tens_digit of age) and
last 2 bits for salary with hash fctn, mod4(hundreds_digit of sal)
The lookup table is:
mod4 of the hundreths digit of salary
//
||
vv
000 |ptr0-|- - >
001 |ptr1-|- - >
010 |ptr2-|- - >
011 |ptr3-|- - >
100 |ptr4-|- - >
101 |ptr5-|- - >
110 |ptr6-|- - >
111 |ptr7-|- - >
^
|
`-mod 2 of the tens of age
tens digit of age
.- - - - - - - - -.
Starting with (24,060)- - > mod2(2) = 0 - - - - - - - - - .
: mod4(0) = 00 concatenated = 000
` - - - - - - -' ``- - - - - - - - - -''
100ths digit of sal
.- - ptr0 (from 000)
v___________
|__(24,60)___|
|____________|
(46,60) - - > mod2(4) = 0
mod4(0) = 00, concatenated = 000
.- - ptr0 (from 000)
v___________
|__(24,60)___|
|__(46,60)___|
(50,80) - - > mod2(5) = 1
mod4(0) = 00, concatenated = 100
.- - ptr0 (from 000)
v___________
|__(24,60)___|
|__(46,60)___|
.- - ptr4 (from 100)
v___________
|__(50,80)___|
|____________|
Etc. yielding the following:
.- - ptr0 (from 000)
v___________ ____________
|__(24,60)___| - - - - > |__(26,400)__| (overflow block)
|__(46,60)___| |____________|
.- - ptr1 (from 001)
v___________
|__(84,140)__|
|____________|
.- - ptr2 (from 010)
v___________
|__(60,260)__|
|____________|
.- - ptr3 (from 011)
v___________
|__(44,360)__|
|____________|
.- - ptr4 (from 100)
v___________
|__(50,80)___|
|____________|
.- - ptr5 (from 101)
v___________ ____________
|__(50,100)__| - - - - > |__(70,100)__|
|__(50,120)__| |____________|
.- - ptr6 (from 110)
v___________ ____________
|__(30,260)__| - - - - > |__(50,280)__|
|____________| |____________|
.- - ptr7 (from 111)
v___________
|____________|
|____________|
Tree-like Structured Indexes for Multidimensional Data
Multiple-key indexes
- assume there are several attributes representing "dimensions"
of the data points (data cube tuples)
- use a multilevel index e.g., suppose there are 2 attributes:
Provides a second level of Indexes on 2nd attribute
(to all tuples with same 1st attr val)
/|-- >
/ |-- >
Index on .- > < |-- >
1st attr/ \ |..
/|/ \|-- >
/ |
/ | /|-- >
/ | / |-- >
-- > < |---- > < |-- >
\ | \ |..
\ |\ \|-- >
\ | \
\| \ /|-- >
\ \ / |-- >
\ `> < |-- >
\ \ |..
\ \|-- >
\
`- > . . .
Take the multiple key attributes, age, sal again
age,sal
(24,60)
(24,260)
(24,400)
(50,80)
(50,100)
(50,120)
(50,280)
(60,100)
(60,260)
(84,140)
. - - - - - - - - -> (24,60)
/ .- - - - - - -> (24,60)
/ / .- - - -> (24,400)
___/_____/______/_
. - > |_60_|_260_|_400__|
/ .- - - - - - - - - > (50,80)
age / ____/________________
24-' . - > |_80_|_100|_120_|_280_|- > (50,280)
50---' \ ` - - - - > (50,120)
60. _______ `- - - - - - - > (50,100)
84 `- >|100|260| - - - - - - - - - - > (60,260)
\ `- - - - - - - - - - - - - > (60,100)
\ _____
`- >|_140_|- - - - - - - - - - - > (84,140)
k-dimensional-trees (kd-trees)
- interior nodes have (Attribute, Value, LowPtr, HiPtr)
- Value is a value which splits data points
- The example below will show (a, V, down, up)
with pointers going down for LowPtr and up for HiPtr.
(We'll go up on greater or equal).
- Attributes used for different levels in the index are different
and ROTATE among the dimensions (round robin).
- The leaves are blocks of records
(assume data blocks hold 2 data points,
i.e., the blocking factor, bfr, is 2).
- to search: decide along the tree until you reach a leaf
(going up on greater or equal)
- to insert: decide along the tree until you reach the proper leaf
if there is room there,
insert;
else split the block and
divide its contents according to the appropriate
attribute (next one in the rotation).
- Example: (insert into kd-tree in this order using age first then sal):
age,sal
(50,80)
(84,140)
(30,260)
(44,360)
(50,120)
(70,100)
(24,60)
(26,400)
(50,280)
(46,60)
(60,260)
(50,100)
insert the first 2 pairs (no tree yet, since just 1 leaf block):
50,80
84,140
age sal
30,260 (leaf is full so split it and divide the contents by sal=150)
30,260
sal /
{ ,150} <
\
50,80
84,140
age sal
44,360 (leaf is not full so insert)
30,260
44,360
sal /
{ ,150} <
\
50,80
84,140
age sal
50,120 (leaf is full so split, divide contents by age=55)
30,260
44,360
sal /
{ ,150} <
\ 84,140
\age /
{ 55, }<
\
50,80
50,120
age sal
70,100 (leaf is not full so insert)
30,260
44,360
sal /
{ ,150} <
\ 84,140
\ 70,100
\ age /
{ 55, }<
\
50,80
50,120
age sal
24,60 (leaf is full so split, divide contents by sal=75)
30,260
44,360
sal /
{ ,150} <
\ 84,140
\ 70,100
\ age /
{ 55, }<
\ 50,80
\ 50,120
\ sal /
{ , 75}<
\
24,60
age sal
26,400 (leaf is full so split, divide contents by age=28)
30,260
44,360
age /
{ 28, }<
/ \
/ 26,400
/
sal /
{ ,150} <
\ 84,140
\ 70,100
\ age /
{ 55, <
\ 50,80
\ 50,120
\ sal /
{ , 75}<
\
24,60
age sal
50,280 (full so split, divide contents by salary=300
44,360
sal /
{300, }<
age / \
{ 28, }< 30,260
/ \ 50,280
/ 26,400
/
sal /
{ ,150} <
\ 84,140
\ 70,100
\ age /
{ 55, <
\ 50,80
\ 50,120
\ age /
{ , 75}<
\
24,60
age sal
46,60 (leaf is not full so insert)
44,360
sal /
{ ,300}<
age / \
{ 28, < 30,260
/ \ 50,280
/ 26,400
/
sal /
{ ,150} <
\ 84,140
\ 70,100
\ age /
{ 55, <
\ 50,80
\ 50,120
\ age /
{ , 75}<
\
24,60
46,60
age sal
60,260 (full so split, divide contents by age=40
44,360
sal /
{ ,300}<
age / \ 30,260
{ 28, }< \age /
/ \ { 40, }<
/ 26,400 \
/ 50,280
sal / 60,260
{ ,150} <
\ 84,140
\ 70,100
\ age /
{ 55, }<
\ 50,80
\ 50,120
\ sal /
{ , 75}<
\
24,60
46,60
age sal
50,100 (full so split using age=50
(leaf is full again so split using sal=90)
44,360
sal /
{ ,300}< 30,260
age / \ 60,260
{ 28, }< \age /
/ \ { 40, }<
/ 26,400 \
/ 50,280
sal /
{ ,150} < 50,120
\ 84,140 50,100
\ 70,100 sal /
\ age / { , 90}<
{ 55, }< / \
\ age / 50,80
\ { 50, }<
\ sal / \
{ , 75}<
\
24,60
46,60
QUAD-TREES
- Interior nodes (Inodes) correspond to rectangulars in 2-D
(more generally, they can be constructed to represent hypercubes
higher dimensional space)
- If the number of pts in the rectangle fits in a block, it's a leaf,
else the rectangle is treated as interior node with children
corresponding to its 4 quadrants.
- to insert into the quad treee index:
search to find the proper leaf;
if there is room, insert;
else split node into 4 quadrants,
divide contents appropriately.
Example: Build the Quad-tree index as it would develop,
assuming (age,sal) arrive in this order:
age,sal
(24,60)
(46,60)
(50,80)
(50,100)
(50,120)
(70,100)
(84,140)
(30,260)
(26,400)
(44,360)
(50,280)
(60,260)
insert: (24,60) (46,60)
__________________________________________________
400K | |
380K | |
360K | |
340K | |
320K | |
300K | |
280K | |
260K | |
SAL 240K | |
220K | |
200K | |
180K | |
160K | |
140K | |
120K | |
100K | |
80K | |
60K | * * |
40K | |
20K | |
`--------------------------------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0
AGE
(24,60) (Only leaf node)
(46,60)
Next, insert (50,80)
Single leaf is full, so split (e.g., at AGE=50 and SAL=200)
and divide contents by quadrants:
__________________________________________________
400K | | |
380K | | |
360K | | |
340K | | |
320K | | |
300K | | |
280K | | |
260K | | |
SAL 240K | | |
220K | | |
200K |_________________________|________________________|
180K | | |
160K | | |
140K | | |
120K | | |
100K | | |
80K | * |
60K | * * | |
40K | | |
20K | | |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.-NW-
/
/---NE-
age,sal /
(50,200) <
\
\---SW-(24,60)
\ (46,60
\
`SE-(50,80)
Next, insert (50,100)
__________________________________________________
400K | | |
380K | | |
360K | | |
340K | | |
320K | | |
300K | | |
280K | | |
260K | | |
SAL 240K | | |
220K | | |
200K |_________________________|________________________|
180K | | |
160K | | |
140K | | |
120K | | |
100K | * |
80K | * |
60K | * * | |
40K | | |
20K | | |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.NW-
/
/---NE-
/
(50,200) <
\
\---SW-(24,60)
\ (46,60)
\
`SE-(50,80)
(50,100)
insert (50,120) (overflow, so split SE)
__________________________________________________
400K | | |
380K | | |
360K | | |
340K | | |
320K | | |
300K | | |
280K | | |
260K | | |
SAL 240K | | |
220K | | |
200K |_________________________|________________________|
180K | | : |
160K | | : |
140K | | : |
120K | * : |
100K | *...........:............|
80K | * : |
60K | * * | : |
40K | | : |
20K | | : |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.NW-
/
/
/---NE-
/
(50,200) < .NW-(50,100)
\ / (50,120)
\---SW-(24,60) /
\ (46,60) /---NE-
\ /
`SE-(75,100)<
\----SW-(50,80)
\
\
`-SE-
insert (70,100) (overflows, so split NW)
__________________________________________________
400K | | |
380K | | |
360K | | |
340K | | |
320K | | |
300K | | |
280K | | |
260K | | |
SAL 240K | | |
220K | | |
200K |_________________________|________________________|
180K | | . : |
160K | | . : |
140K | | - - + - - : |
120K | * . : |
100K | *.........*.:............|
80K | * : |
60K | * * | : |
40K | | : |
20K | | : |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.NW-
/ .-NW-
/ /
/---NE- /---NE-(70,100)
/ /
(50,200) < .NW-(62.5,150)-<
\ / \----SW(50,100)
\---SW-(24,60) / \ (50,120)
\ (46,60) /---NE- \
\ / `-SE
`SE-(75,100)<
\----SW-(50,80)
\
\
`-SE-
insert (84,140), (30,260), (26,400)
__________________________________________________
400K | * | |
380K | | |
360K | | |
340K | | |
320K | | |
300K | | |
280K | | |
260K | * | |
SAL 240K | | |
220K | | |
200K |_________________________|________________________|
180K | | . : |
160K | | . : |
140K | | - - + - - : * |
120K | * . : |
100K | *.........*.:............|
80K | * : |
60K | * * | : |
40K | | : |
20K | | : |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.NW-(30,260)
/ (26,400) .-NW-
/ /
/---NE- /---NE-(70,100)
/ /
(50,200) < .NW-(62.5,150)-<
\ / \----SW(50,100)
\---SW-(24,60) / \ (50,120)
\ (46,60) /---NE-(84,140) \
\ / `-SE
`SE-(75,100)<
\----SW-(50,80)
\
\
`-SE-
insert (44,360) (overflows: split NW)
__________________________________________________
400K | :* | |
380K | : | |
360K | : * | |
340K | : | |
320K | : | |
300K |............:............| |
280K | : | |
260K | : * | |
SAL 240K | : | |
220K | : | |
200K |____________:____________|________________________|
180K | | . : |
160K | | . : |
140K | | - - + - - : * |
120K | * . : |
100K | *.........*.:............|
80K | * : |
60K | * * | : |
40K | | : |
20K | | : |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.NW-
/
/--NE-(26,400)
/ (44,360)
/
.-NW-(25,300)-< ----SW-
/ \
/ `---SE-(30,260)
/ .-NW-
/ /
/---NE- /---NE-(70,100)
/ /
(50,200) < .NW-(62.5,150)-<
\ / \----SW(50,100)
\---SW-(24,60) / \ (50,120)
\ (46,60) /---NE-(84,140) \
\ / `-SE
`SE-(75,100)<
\----SW-(50,80)
\
\
`-SE-
insert (50,280), (60,260)
__________________________________________________
400K | :* | |
380K | : | |
360K | : * | |
340K | : | |
320K | : | |
300K |............:............| |
280K | : * |
260K | : * | * |
SAL 240K | : | |
220K | : | |
200K |____________:____________|________________________|
180K | | . : |
160K | | . : |
140K | | - - + - - : * |
120K | * . : |
100K | *.........*.:............|
80K | * : |
60K | * * | : |
40K | | : |
20K | | : |
`-------------------------+------------------------'
1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 AGE
.NW-
/
/--NE-(26,400)
/ (44,360)
.-NW-(25,300)-< ----SW-
/ \
/ `---SE-(30,260)
/ .-NW-
/ /
/---NE-(50,280) /---NE-(70,100)
/ (60,260) /
(50,200) < .NW-(62.5,150)-<
\ / \----SW(50,100)
\---SW-(24,60) / \ (50,120)
\ (46,60) /---NE-(84,140) \
\ / `-SE
`SE-(75,100)<
\----SW-(50,80)
\
`-SE-
There is, of course, a similarity between QUAD Trees and P-trees.
The difference is:
Quad-Trees are search trees and P-trees are the actual data.
R-TREES (Region tree)
- inodes of an R-tree correspond to interior regions,
(which can be overlapping)
(usually regions are rectangles, tho, not necessarily)
- R-tree regions have subregions that represent the
contents of their children
- And the subregions need not cover the region they
subdivide (but all data must be within a subregion)
Example, Consider the spatial image:
100 __________________________________________________
| |
| |
| |
| |
| .-----------. |
| | | |
| |school | |
| |___________| |
| |
| |
|-------------------------. |
| road1 | .-------. |
|-------------------------| |house2 | |
| |r | |_______| |
| .------.________|o_|________________________|
| |house1|________|a_|________pipeline________|
| |______| |d | |
| |2 | |
| | | |
| | | |
0 `--------------------------------------------------'
0 100
Assume a leaf can hold 6 regions (bfr=6)
and that the 6 regions or objects above are together on 1 leaf block,
whose region is shown as the outer dotted rectangle - .
\
|
100__________________________________________________ /
| | /
| | /
..................................................../
: :
: .-----------. :
: | | :
: |school | :
: |___________| :
: :
: :
:-------------------------. :
: road1 | .-------. :
:----------------------+--| |house2 | :
: |r | |_______| :
: .--------.________|o_|________________________:
: | house1|________|a_|________pipeline________:
: | | |d | :
: |________| |2 | :
: | | :
: | | :
:..................................................:
0 100
Therefore the R-tree has a root and one leaf as follows:
.---------------------.
| ( (0,0), (100,90) ) | (Outer dotted region)
|_____________________|
:
v (full leaf with six objects)
.-----------------------------------------------------.
| road1 | road2 | house1 | school | house2 | pipeline |
+-------+-------+--------+--------+--------+----------+
Now suppose a local cellular phone company adds a POP
(pt of presence or antenna) as below:
100__________________________________________________
| |
| |
....................................................
: :
: .-----------. :
: | | .---. :
: |school | |pop| :
: |___________| `---' :
: :
: :
:-------------------------. :
: road1 | .-------. :
:----------------------+--| |house2 | :
: |r | |_______| :
: .--------.________|o_|________________________:
: | house1|________|a_|________pipeline________:
: | | |d | :
: |________| |2 | :
: | | :
: | | :
0 :--------------------------------------------------:
0 100
Since the 7 objects do not fit in one leaf, we split leaf,
putting 4 in one new leaf and 3 in the other (attempting to
minimize the overlap and to split evenly?).
Redefine enclosing regions:
.---------------------.
| ( (0,0), (100,90) ) | (Outer dotted region)
|_____________________|
:
v
.-----------------------.--------------------------.
| ( (0,0), (60,50) ) | ( (20,20), (100,80) ) |
|_______________________|__________________________|
: :
: :
v v
.------------------.---. .------------------------------.
|road1|road2|house1| | |school|house2| pipeline | pop |
+-----+-----+------+---+ +------+------+----------+-----+
100 __________________________________________________
| |
| |
..|..................................................|..
: | ...........................................|.:
: | : .-----------. |::
: | : | | .---. |::
: | : |school | |pop| |::
: | : |___________| `---' |::
: | : |::
:.|......:.................... |::
::|------:------------------.: |::
::| :road1 |: .-------. |::
::|------:---------------+--|: |house2 | |::
::| : | |: |_______| |::
::| .-:------.________|__|:_______________________|::
::| | :house1|________|__|:_______pipeline________|::
::| | :......|........|..|:.......................|::
::| |________| |ro|: | :
::| |ad|: | :
::| |2 |: | :
0 ::`--------------------------:-----------------------' :
::...........................: :
:......................................................:
0 100
Suppose we insert another house (house3) below house2.
100 __________________________________________________
| |
| |
| ...........................................|.
| : .-----------. |:
| : | | .---. |:
| : |school | |pop| |:
| : |___________| `---' |:
| : |:
.|......:.................... |:
:|------:------------------.: |:
:| :road1 |: .-------. |:
:|------:---------------+--|: |house2 | |:
:| : | |: |_______| |:
:| .-:------.________|__|:_______________________|:
:| | :house1|________|__|:_______pipeline________|:
:| | :......|........|..|:.......................|:
:| |________| |ro|: .-------. |
:| |ad|: |house3 | |
:| |2 |: |_______| |
0 :`--------------------------:-----------------------'
:...........................:
0 100
Since house3 is not in a either of the regions,
we must decide to expand one of the regions.
If we pick the first, we add 1000 square units and if we pick
the 2nd we add 1200 square units, thus we pick the first:
100 __________________________________________________
| |
| |
| ...........................................|.
| : .-----------. |:
| : | | .---. |:
| : |school | |pop| |:
| : |___________| `---' |:
| : |:
.|......:................................. |:
:|------:------------------. : |:
:| :road1 | .-------.: |:
:|------:---------------+--| |house2 |: |:
:| : | | |_______|: |:
:| .-:------.________|__|_____________:__________|:
:| | :house1|________|__|________pipeline________|:
:| | :......|........|..|.............:..........|:
:| |________| |ro| .-------.: |
:| |ad| |house3 |: |
:| |2 | |_______|: |
0 :`---------------------------------------:----------'
:........................................:
0 100
.---------------------.
| ( (0,0), (100,90) ) | (Outer dotted region)
|_____________________|
:
v
.-----------------------.--------------------------.
| ( (0,0), (80,50) ) | ( (20,20), (100,80) ) |
|_______________________|__________________________|
: :
: :
v v
.------------------.------.------. .--------------------------.
|road1|road2|house1|house2|house3| |school|house2|pipeline|pop|
+-----+-----+------+------+------+ +------+------+--------+---+
We note that house2 is in both subregions.
A brief look at an additional index still in use:
BINARY RADIX TREE INDEX
(aka, trie) (e.g., used in IBM AS/400s)
Similar to B-tree, except
- only the common parts of key values are embedded in inodes
- a single bit is used to make the navigation direction decision at each level
(0 for up and 1 for down). (zero-based bit positions are used)
Example: (in this example, the tree structure is being
built left-to-right)
Starting with an empty structure,
INSERT [JAY,LA,25,STAR], (assigned RRN=1)
nam_INDEX CUSTOMER FILE
name RRN RRN nam loc age job
JAY 1 1 JAY LA 25 STAR
INSERT [JON,LA,45,HOOD], assign RRN=2
1st letters are same (J), so that common part is embedded in root
2nd letters are A and O, bit-3 is first difference (makes the decision)
0123 4567 < - bit positions
EBCDIC for A=1100 0001
and O=1101 0110
so we go up for A (bit-3 = 0) and down for O (bit-3=1)
CUSTOMER
RRN nam loc age job
name 1 JAY LA 25 STAR
part RRN 2 JON LA 45 HOOD
b3 AY 1
J <
ON 2
INSERT [JAN,RO,93,DOCT], assign RRN=3
(EBCDIC for N=1101 0101
and Y=1110 0000)
INDEX CUSTOMER
b2 N 3 RNN nam loc age job
b3 A < 1 JAY LA 25 STAR
J < Y 1 2 JON LA 45 HOOD
ON 2 3 JAN RO 93 DOCT
INSERT [SUE,RO,16,PRGR], assign RRN=4
(EBCDIC for J=1101 0001
and S=1110 0010)
INDEX CUSTOMER
b2 N 3 RNN nam loc age job
b3 A < 1 JAY LA 25 STAR
J < Y 1 2 JON LA 45 HOOD
b2 ON 2 3 JAN RO 93 DOCT
< 4 SUE RO 16 PRGR
SUE 4