INDEXES






Indices are auxiliary files which provide a vertical view of the data
 (one attribute per index).






INDEX TYPES





LINEAR INDEX STRUCTURE (simplest type) 

     A lookup file, with a "value"-column and a "pointer"-column.

     That is clustered (ordered) on the value-column for fast access to
      particular values (using binary search of the index file).

     Pointer can be RID or RRN  (or just page-# in some cases)

 __pg1_______
|RRN | S# |ST|           STATE-INDEX
|====|====|==|           .__._____.
| 1  | 25 |NJ|<._        PTR|STATE|
|----|----|--|   \       |==|=====|
| 2  | 32 |NJ|<-. \   .--|3_|_FL__|
|----|----|--|   `-\-/---|2_|_NJ__|
| 3  | 38 |FL|<-----'----|1_|_NJ__|
|----|----|--|        __-|4_|_NY__|
| 4  | 17 |NY|<------' _-|5_|_NY__|
`------------'        '
 _pg2________        |
| 5  | 57 |NY|<------'
|----|----|--|




PRIMARY INDEX:  I(k,p)


     k =  ordered or clustered "key" field values from ordered
          or clustered field of file with uniqueness property

     Uniqueness property: individual value occurrences are "unique"
                          i.e., each value can occur at most once.



     p =  pointer to page containing record(s) with value, k




     Primary indexes can be either:

             DENSE: (every record is indexed)  or

             NON-DENSE: only key-values of records at the beginning
             of a page are indexed (anchor record of page).
             (and then the pointer is page-# only)



Example:
-------

Assume the blocking factor (bfr) is 2
        which means 2 records/page.

 _STUDENT________  
|S#|SNAME |LCODE |
|==|------|------|pg
|17|BAID  |NY2091|1
|25|CLAY  |NJ5101|1
|----------------|
|32|THAISZ|NJ5102|2
|38|GOOD  |FL6321|2
|----------------|
|57|BROWN |NY2092|3
|83|THOM  |ND3450|3

Non-dense Primary Index on S#
|S#|pg|
|17| 1|
|32| 2|
|57| 3|




 _STUDENT________  
|S#|SNAME |LCODE |
|==|------|------|pg
|17|BAID  |NY2091|1
|25|CLAY  |NJ5101|1
|----------------|
|32|THAISZ|NJ5102|2
|38|GOOD  |FL6321|2
|----------------|
|57|BROWN |NY2092|3
|83|THOM  |ND3450|3

Dense Primary Index on S#
|S#|pg|offset
|17| 1| 0|
|25| 1| 1|
|32| 2| 0|
|38| 2| 1|
|57| 3| 0|
|83| 3| 1|
   `---v-'
      RID





Inserting and deleting are major problems.
     - must move records to maintain ordering
     - anchors change (in non-dense case)





USING OVERFLOW PAGES and UNPACKED RECORD-SLOT FORMAT 
 (1 level of indirection)

                                            Overflow page
               .---------------------.  .----------------------.
 .- - - - - -> |P#3|                *|->| record with RID (3,6)|
 .             |___|_________________|  |______________________|
 .            0|record with RID (3,3)|  |                      |
____RID___.    |_____________________|  |______________________|
pg#|offset|   1|record with RID (3,0)|  |                      |
_3_|__6___|    |_____________________|      . . .
              2|record with RID (3,4)|
               |_____________________|
              3|record with RID (3,2)|
               |_____________________|
              4|record with RID (3,1)|
               |_____________________|
              5|record with RID (3,5)|
               |_____________________|
               |     | |*|5|2|0|3|4|1|< - unpacked record-slot-ptr table
               `---------------------'
               . . .  7 6 5 4 3 2 1 0







CLUSTERING INDEX: 
(like a primary index except that the

                    attribute nee not be a key)

        - the file must be clustered on the attribute, k
        - the pointer for any k is the address of 1st page with that k-value

ENROLL2
.-----------.
|S#|C#|GRADE|
|==|==|-----|pg
|17|6 | 96  |1
|25|6 | 76  |1
|-----------|
|32|6 | 62  |2
|38|6 | 98  |2
|-----------|
|32|6 | 91  |3
|25|7 | 68  |3
|-----------|
|32|8 | 89  |4
|17|9 | 95  |4
`-----------'

|C#|pg|     Dense Clustering_Index on C#
|6 | 1|
|7 | 3|
|8 | 4|
|9 | 4|





or

ENROLL2
.-----------.
|S#|C#|GRADE|
|==|==|-----|pg
|17|6 | 96  |1
|25|6 | 76  |1
|-----------|
|32|6 | 62  |2
|38|6 | 98  |2
|-----------|
|32|6 | 91  |3
|25|7 | 68  |3
|-----------|
|32|8 | 89  |4
|17|9 | 95  |4
`-----------'

|C#|pg|     Non-dense Clustering_Index on C#
|6 | 1|       (indexing new anchor records only)
|8 | 4|






There's no more search overhead with this 2nd type of clustering index.
       - How can you know which page has C#=7?
       - How can you know which page has C#=9?


In general, there are insert/delete problems because of clustering.

Solution?  Reserve 1 page for each value and use an overflow ptr

 ___________
|S#|C#|GRADE| ENROLL (clustered on C#)
|==|==|-----|pg
|17|5 | 96  |1
|25|5 | 76  |1
|-----------|
|32|6 | 62  |2
|38|6 | 98  |2
|-----------|
|32|7 | 91  |3
|25|7 | 68  |3
|-----------|
|32|8 | 89  |4
|17|8 | 95 9|4
`----------\'
            \ overflow pointer
             :
.-----------.v
|25|8 | 86  |9  (overflow page for value, C#=8)
|           |
`-----------'

Clustering Index on C#
|C#|pg|
|5 | 1|
|6 | 2|
|7 | 3|
|8 | 4|







SECONDARY INDEX: 
 These indexes are the same except,

          - the file is need not be clustered on k

          - p points to the page or record containing k

          - every record must be indexed (dense)




Option1: If there are multiple occurences of k,
 use multiple index entries for that k.

            |S#|C#|GRADE|  ENROLL (unclustered C#)
            |32|8 | 89  |1
            |25|6 | 76  |1
            |32|6 | 62  |2
            |25|8 | 86  |2
            |38|6 | 98  |3
            |32|7 | 91  |3
            |17|5 | 96  |4
            |25|7 | 68  |4
            |17|8 | 95  |5

            |C#|page|   Secondary_Index, opt1 on C#
            |5 | 4  |
            |6 | 1  |
            |6 | 2  |
            |6 | 3  |
            |7 | 3  |
            |7 | 4  |
            |8 | 1  |
            |8 | 2  |
            |8 | 4  |



Option2:  Use repeating groups of pointers
          (requires variable length pointer(s) field 

            |S#|C#|GRADE|  ENROLL (unclustered C#)
            |32|8 | 89  |1
            |25|6 | 76  |1
            |32|6 | 62  |2
            |25|8 | 86  |2
            |38|6 | 98  |3
            |32|7 | 91  |3
            |17|5 | 96  |4
            |25|7 | 68  |4
            |17|8 | 95  |5

            |C#|page        Secondary_Index, opt2 on C#
            |5 | 4  
            |6 | 1,2,3
            |7 | 3,4
            |8 | 1,2,4



Option3:  Use 1 index entry for each value, 1 pointer to "list"
          or "linked list" of record pointers. (1 level of indirection)

            |S#|C#|GRADE|  ENROLL (unclustered C#)
            |32|8 | 89  |1
            |25|6 | 76  |1
            |32|6 | 62  |2
            |25|8 | 86  |2
            |38|6 | 98  |3
            |32|7 | 91  |3
            |17|5 | 96  |4
            |25|7 | 68  |4
            |17|8 | 95  |5

            |C#|page   Secondary_Index, opt3 on C#
            |5 | -->|4|              
            |6 | -->|1|->|2|->|3|
            |7 | -->|3|->|4| 
            |8 | -->|1|->|2|->|4|






INVERTED FILE  = A file in which there is an index on every field
               (All are secondary indexes except the CITY index.
                the CITY index is a clustering index)
._________.________.______._________.
|GATER 1,1|KNOB 1,2|NY 2,2|OUTBK 2,1| CITY
`---------`--------`------`---------'

         ___________________
 S#     |S#|SNAME | CITY |ST|
======  |==|======|======|==|pg  
|17 2|  |38|GOOD  | GATER|FL|1   STATE
|----|  |--|------|------|--|   =======
|25 2|  |32|THAISZ| KNOB |NJ|1  |FL 1 |
|----|  |--|------|------|--|   |-----|
|32 1|  |25|CLAY  | OUTBK|NJ|2  |NJ 1 |
|----|  |--|------|------|--|   |-----|
|38 1|  |17|BAID  | NY   |NY|2  |NJ 2 |
`----'  `-------------------    |-----|
                                |NY 2 |
                                `-----'
     .______.______.______.________.
SNAME|BAID 2|CLAY 2|GOOD 1|THAISZ 1|
     `------`------`------`--------'







MULTILEVEL INDEX: 
           For any index, since it is a file clustered on k,
           it can have a primary or clustering index on it.
           (constituting the second level of the multilevel index).


 _ STUDENT_______
|S#|SNAME |LCODE |pg
|==|======|======|
|17|BAID  |NY2091|1
|25|CLAY  |NJ5101|1
|--|------|------|
|32|THAISZ|NJ5102|2
|38|GOOD  |FL6321|2
|--|------|------|
|57|BROWN |NY2092|3
|83|THOM  |ND3450|3
|--|------|------|
|91|PARK  |MN7334|4
|94|SIVA  |OR1123|4
`--`------`------'


|S#|pg|pg(of index file)      S#-index (nondense, primary)
|--|--|
|17| 1|1
|32| 2|1    
|--|--|
|57| 3|2   
|91| 4|2  
`--`--'

2nd_LEV   (a second level, nondense index)
|S#|pg|
|--|--|
|17| 1|
|57| 2|
`--`--'



Multilevel index forms a "tree"
     - nodes at the last level are called leaf nodes
     - nodes at the other levels are called internal nodes or inodes


 - Inserts and deletes can be troublesome.









Indexes powerpoint slides covering also ISAM and B+tree indexes










The following is a text based treatment of B+ Tree Indexes.
B+-trees were covered in powerpoint slides, but this additional treatment
may be useful as reference material and as practice with the technique.



B+ TREE INDEX 


(record pointers are in leaf nodes only)

Leaf level (is called the SEQUENCE SET) and is a
 linked list of blocks containing all key values in order
 (for fast sequential access), together with their record ptr(s)

Inodes (is called the INDEX SET), for fast direct access, is a
 tree index into the sequence set.

   - never becomes unbalanced
   - limits wasted space (keeps nodes at least 1/2 full)
   - sequential scans (in order) are fast




The structure of B+ index of order p:





1.Each index set inode is of the form


[P0,        K1, P1,       K2, P2,    .  .  .   Kq, Pq ]   q <= p
 |              |             |                    |
subtree      subtree         subtree              subtree
containing   containing      containing           containing
all values   all vaules      all values           all values
lessthan K1  at least K1 &   at least K2 and      at least Kq
             less than K2    less than K3                        


All Ki values appear again at leaf level (in order)
              together with their record pointer(s).





The structure of a sequence set node:

[ K1,K1ptr(s),  K2,K2ptr(s), ... Kq-1,Kq-1ptr(s),  Pnext ]  

Pnext points to next sequence set node of the B+ tree.







B) Insertion into B+ tree
	
Insert into the sequence set node first.
If sequence set node is not full, just insert.
If sequence set node is full,
   the node overflows and must be split.

   First j=ceil(p/2) entries in original node are kept there and
   remaining entries moved to a new leaf node.
   
   jth value (middle) is replicated to the parent internal node.

   recursively, If parent inode is full, it is split, etc.

(Note, the reason the tree stays balanced is that the only way
 a new level gets introduced is via a new root, which is part
 of every path).







C) Deletion of B+ tree

When an entry is deleted, it is removed from the leaf level.

If it happens to occur in an inode, it is also removed from
 there. 

   Deletion may cause the number of entries in the leaf node
     to fall below the minimum required.

   In this case, find a sibling leaf node that is more than
     half full and redistribute entries among node and its
     sibling such that both are at least half full; otherwise 

     the node is merged with one of its siblings. 

This can propagate and reduce tree levels.





The tree always stays balanced.


There are many variations on the basic B+-tree structure,
 owing to its popularity and excellent service (providing
 both fast sequential file access and fast direct file access)

   - One such variation is to defer key value deletes from the
     index set until its sequence set node is merged with an
     adjacent sequence set node (as described above). 

   - Another is the page-oriented method used by IBM,
     as briefly described below:







MULTIDIMENSIONAL INDEXES 




Multidimensional data (datawarehouses and data cubes) were covered in the
previous set of notes.

Multidimensional data almost always requires multidimensional indexing


One dimensional indexes assume a single search column or key
          (which can be a composite column or key)







Data structures that support queries into multidimensional data
 specifically, fall in two categories:

1. Hash-table-like (e.g., Grid files and partitioned hash fctns)

2. Tree-like
   (eg, multi-key indexes, kd-trees, quad-trees (for sets of points);
    R-trees (for sets of regions as well as sets of points) ),
    Predicate-trees (P-trees) for vertical compressed, representations of data






Hash-like Structures for Multidimensional Data 


Grid Files 
 - partition the POINTS space into a grid
   - in each dim "grid lines" partition space into stripes
   - points that fall right on a grid line belong to the stripe above it
          (i.e., grid-lines are the lower boundaries)


 - example: 12 customer(age,sal) data points     (i.e., records or tuples)

age,sal

(24,60)
(46,60)
(50,80)
(50,100)
(50,120)
(70,100)
(84,140)
(30,260)
(26,400)
(44,360)
(50,280)
(60,260)

if grid lines at age=40 and age=65
                 SAL=90K    SAL=224K

                             4       5
                             0       6
        . ---- ---- ---- ---- ---- ---- ---- ---- ---- ----.
  400K  |             *      :       :                     |
  380K  |                    :       :                     |
  360K  |                    : *     :                     |
  340K  |                    :       :                     |
  320K  |                    :       :                     |
  300K  |                    :       :                     |
  280K  |                    :    *  :                     |
  260K  |               *    :       : *                   |
S 240K  |                    :       :                     |
A 220K  | ...................:.......:.....................|224K
L 200K  |                    :       :                     |
  180K  |                    :       :                     |
  160K  |                    :       :                     |
  140K  |                    :       :             *       |
  120K  |                    :    *  :                     |
  100K  | ...................:....*..:......*..............| 90K
   80K  |                    :    *  :                     |
   60K  |            *       :  *    :                     |
   40K  |                    :       :                     |
   20K  |                    :       :                     |
        ` ---- ---- ---- ---- ---- ---- ---- ---- ---- ----'
              1    2    3    4    5    6    7    8    9        
              0    0    0    0    0    0    0    0    0   
                             AGE




Grid hash function is defined using a table lookup:

    Age: 0-39|40-55| 56+   
Sal     _____|_____|_____ 
0-90K  |_ptr1|_ptr2|_____|
90-224K|_____|_ptr5|_ptr6|
224K+  |_ptr7|_ptr8|_ptr9|
 ____________             :
|__(24,60)___| < - ptr1 --'
|____________|
 ____________             :
|__(46,60)___| < - ptr2 --'
|__(50,80)___|
 ____________             :
|__(50,100)__| < - ptr5 --'
|__(50,120)__|
 ____________             :
|__(70,100)__| < - ptr6 --'
|__(84,140)__|
 ____________             :
|__(30,260)__| < - ptr7 --'
|__(26,400)__|
 ____________             :
|__(44,360)__| < - ptr8 --'
|__(50,280)__|
 ____________             :
|__(60,260)__| < - ptr9 --'
|____________|




Inserting into Grid files:

If there is room, insert, else (two methods)

1. add overflow block and chain it to the primary block, or

2. reorganize the structure by adding or moving grid lines
              (similar to dynamic hashing)


A problem with Grid files is that the number of buckets grows
 exponentially with dimension and the grid may become sparse.










Partitioned Hash Functions 



Actually a sequence of hash functions, h=(h1,...hn) such that
 hi produces the ith segment of bits in the hash key,
 that is, h(a) is the concatenation of h1(a)h2(a)..hn(a).

  - example: The data file is  CUSTOMER(AGE,SAL)   again

    (24,60)  (46,60)  (50,80)  (50,100) (50,120) (70,100)
    (84,140) (30,260) (26,400) (44,360) (50,280) (60,260)
    
We use 3 bit hash values,

 1st bit for age with hash function,    mod2(tens_digit of age) and
last 2 bits for salary  with hash fctn, mod4(hundreds_digit of sal)


The lookup table is:

   mod4 of the hundreths digit of salary
  //
 ||
 vv
000 |ptr0-|- - >
001 |ptr1-|- - >
010 |ptr2-|- - >
011 |ptr3-|- - >
100 |ptr4-|- - >
101 |ptr5-|- - >
110 |ptr6-|- - >
111 |ptr7-|- - >
^
|
`-mod 2 of the tens of age

                tens digit of age
               .- - - - - - - - -.
Starting with (24,060)- - > mod2(2) = 0 - - - - - - - - - .
                  :         mod4(0) = 00   concatenated = 000
                  ` - - - - - - -'    ``- - - - - - - - - -''
                100ths digit of sal

 .- - ptr0 (from 000)
 v___________    
|__(24,60)___|
|____________|  






              (46,60) - - > mod2(4) = 0
                            mod4(0) = 00,  concatenated = 000
 .- - ptr0 (from 000)
 v___________    
|__(24,60)___|
|__(46,60)___|  






              (50,80) - - > mod2(5) = 1
                            mod4(0) = 00,  concatenated = 100
 .- - ptr0 (from 000)
 v___________    
|__(24,60)___|
|__(46,60)___|  

 .- - ptr4 (from 100)
 v___________             
|__(50,80)___|
|____________|






Etc.  yielding the following:


 .- - ptr0 (from 000)
 v___________             ____________             
|__(24,60)___| - - - - > |__(26,400)__| (overflow block)
|__(46,60)___|           |____________|

 .- - ptr1 (from 001)
 v___________             
|__(84,140)__|
|____________|

 .- - ptr2 (from 010)
 v___________             
|__(60,260)__|
|____________|

 .- - ptr3 (from 011)
 v___________             
|__(44,360)__|
|____________|

 .- - ptr4 (from 100)
 v___________             
|__(50,80)___|
|____________|

 .- - ptr5 (from 101)
 v___________             ____________             
|__(50,100)__| - - - - > |__(70,100)__| 
|__(50,120)__|           |____________|

 .- - ptr6 (from 110)
 v___________             ____________             
|__(30,260)__| - - - - > |__(50,280)__|
|____________|           |____________|

 .- - ptr7 (from 111)
 v___________
|____________|
|____________|




Tree-like Structured Indexes for Multidimensional Data 



Multiple-key indexes 


      - assume there are several attributes representing "dimensions"
              of the data points (data cube tuples)

      - use a multilevel index   e.g., suppose there are 2 attributes:



      Provides a second level of Indexes on 2nd attribute
      (to all tuples with same 1st attr val)                                    

                    /|-- >                                          
                   / |-- >                                          
    Index on .- > <  |-- >                                           
    1st attr/      \ |..                                             
         /|/        \|-- >                                            
        / |                                    
       /  |         /|-- >                                          
      /   |        / |-- >                                          
-- > <    |---- > <  |-- >                                           
      \   |        \ |..                                             
       \  |\        \|-- >                                            
        \ | \                                                        
         \|  \      /|-- >                                          
           \  \    / |-- >                                         
            \  `> <  |-- >                                           
             \     \ |..                                             
              \     \|-- >                                            
               \                                                    
                `- > . . .  



Take the multiple key attributes, age, sal again

age,sal
(24,60)
(24,260)
(24,400)
(50,80) 
(50,100)
(50,120)
(50,280)
(60,100)
(60,260)
(84,140)
                  . - - - - - - - - -> (24,60)
                 /     .- - - - - - -> (24,60)                 
                /     /      .- - - -> (24,400)
            ___/_____/______/_
      . - > |_60_|_260_|_400__|
     /            .- - - - - - - - - > (50,80)
age /        ____/________________
24-'  . - > |_80_|_100|_120_|_280_|- > (50,280)
50---'               \     ` - - - - > (50,120)
60.     _______       `- - - - - - - > (50,100)
84 `- >|100|260| - - - - - - - - - - > (60,260)
  \       `- - - - - - - - - - - - - > (60,100)
   \     _____      
    `- >|_140_|- - - - - - - - - - - > (84,140)







k-dimensional-trees (kd-trees)




         - interior nodes have (Attribute, Value, LowPtr, HiPtr)

            - Value is a value which splits data points

            - The example below will show (a, V, down, up)
              with pointers going down for LowPtr and up for HiPtr.
              (We'll go up on greater or equal).



 - Attributes used for different levels in the index are different
   and ROTATE among the dimensions (round robin).

 - The leaves are blocks of records

   (assume data blocks hold 2 data points,
    i.e., the blocking factor, bfr, is 2).
              


 - to search: decide along the tree until you reach a leaf
              (going up on greater or equal)



 - to insert: decide along the tree until you reach the proper leaf
              if there is room there,
                 insert;
              else split the block and
                 divide its contents according to the appropriate
                 attribute (next one in the rotation).




 - Example: (insert into kd-tree in this order using age first then sal):

age,sal
(50,80)
(84,140)
(30,260)
(44,360)
(50,120)
(70,100)
(24,60)
(26,400)
(50,280)
(46,60)
(60,260)
(50,100)

insert the first 2 pairs (no tree yet, since just 1 leaf block):
50,80
84,140

age sal
 30,260   (leaf is full so split it and divide the contents by sal=150)
            30,260

     sal   /
{   ,150} <                                                                     
           \                                                           
            50,80                                                      
            84,140                                                     
                                                                       

age sal
 44,360 (leaf is not full so insert)
           30,260
           44,360
    sal   /
{  ,150} <                                                                     
          \                                                           
           50,80                                                      
           84,140                                                     
                                                                      

age sal
 50,120 (leaf is full so split, divide contents by age=55)
           30,260
           44,360
    sal   /
{  ,150} <
          \          84,140                                                 
           \age     /
           { 55,  }< 
                    \
                     50,80                                                      
                     50,120

age sal
 70,100 (leaf is not full so insert)
           30,260
           44,360
    sal   /
{  ,150} <
          \            84,140                                
           \           70,100
            \ age     /
             { 55,  }< 
                      \
                       50,80                               
                       50,120
                                                                       

age sal
 24,60 (leaf is full so split, divide contents by sal=75)
           30,260
           44,360
    sal   /
{  ,150} <
          \            84,140                                
           \           70,100
            \ age     /
             { 55,  }< 
                      \            50,80
                       \           50,120
                        \    sal  /
                         {  , 75}<
                                  \                   
                                   24,60                                                      
age sal
 26,400 (leaf is full so split, divide contents by age=28)
                        30,260
                        44,360
               age     /
              { 28,  }<
             /         \                               
            /           26,400
           /                        
    sal   /
{  ,150} <
          \            84,140                                
           \           70,100
            \ age     /
             { 55,   < 
                      \            50,80
                       \           50,120
                        \    sal  /
                         {  , 75}<
                                  \                   
                                   24,60                                                      

age sal
 50,280 (full so split, divide contents by salary=300

                                  44,360
                        sal      /
                       {300,   }<
               age     /         \
              { 28,  }<           30,260
             /         \          50,280               
            /           26,400
           /                        
    sal   /
{  ,150} <
          \            84,140                                
           \           70,100
            \ age     /
             { 55,   < 
                      \            50,80
                       \           50,120
                        \    age  /
                         {  , 75}<
                                  \                   
                                   24,60                                                      

age sal
 46,60 (leaf is not full so insert)
                                  44,360
                            sal  /
                       {   ,300}<
               age     /         \
              { 28,   <           30,260
             /         \          50,280               
            /           26,400
           /                        
    sal   /
{  ,150} <
          \            84,140                                
           \           70,100
            \ age     /
             { 55,   < 
                      \            50,80
                       \           50,120
                        \    age  /
                         {  , 75}<
                                  \                   
                                   24,60                              
                                   46,60 



age sal
 60,260 (full so split, divide contents by age=40 
                                  44,360
                            sal  /
                       {   ,300}<          
               age     /         \           30,260
              { 28,  }<           \age     /  
             /         \          { 40,  }<                
            /           26,400             \
           /                                 50,280
    sal   /                                  60,260
{  ,150} <
          \            84,140                                
           \           70,100
            \ age     /
             { 55,  }< 
                      \            50,80
                       \           50,120
                        \    sal  /
                         {  , 75}<
                                  \                   
                                   24,60    
                                   46,60 


age sal
 50,100 (full so split using age=50
       (leaf is full again so split using sal=90)
                                  44,360
                            sal  /
                       {   ,300}<            30,260
               age     /         \           60,260
              { 28,  }<           \age     /  
             /         \          { 40,  }<                
            /           26,400             \
           /                                 50,280
    sal   /
{  ,150} <                                            50,120
          \            84,140                         50,100       
           \           70,100                   sal  /
            \ age     /                     {  , 90}<
             { 55,  }<                      /        \
                      \            age     /          50,80
                       \          { 50,  }<    
                        \    sal  /        \
                         {  , 75}<
                                  \                   
                                   24,60   
                                   46,60 









QUAD-TREES




 - Interior nodes (Inodes) correspond to rectangulars in 2-D

   (more generally, they can be constructed to represent hypercubes
    higher dimensional space)


 - If the number of pts in the rectangle fits in a block, it's a leaf,
   else the rectangle is treated as interior node with children
   corresponding to its 4 quadrants.

 - to insert into the quad treee index:

     search to find the proper leaf;
     if there is room, insert;
     else split node into 4 quadrants,
       divide contents appropriately.







Example:  Build the Quad-tree index as it would develop,
 assuming (age,sal) arrive in this order:

age,sal
(24,60)
(46,60)
(50,80)
(50,100)
(50,120)
(70,100)
(84,140)
(30,260)
(26,400)
(44,360)
(50,280)
(60,260)





insert: (24,60) (46,60)
            __________________________________________________
     400K  |                                                  |
     380K  |                                                  |
     360K  |                                                  |
     340K  |                                                  |
     320K  |                                                  |
     300K  |                                                  |
     280K  |                                                  |
     260K  |                                                  |
SAL  240K  |                                                  |
     220K  |                                                  |
     200K  |                                                  |
     180K  |                                                  |
     160K  |                                                  |
     140K  |                                                  |
     120K  |                                                  |
     100K  |                                                  | 
      80K  |                                                  |
      60K  |            *          *                          |
      40K  |                                                  |
      20K  |                                                  |
           `--------------------------------------------------'
                 1    2    3    4    5    6    7    8    9    
                 0    0    0    0    0    0    0    0    0   
                                    AGE
(24,60)   (Only leaf node)
(46,60)







Next, insert (50,80)

Single leaf is full, so split    (e.g., at AGE=50 and SAL=200)
and divide contents by quadrants:

            __________________________________________________
     400K  |                         |                        |
     380K  |                         |                        |
     360K  |                         |                        |
     340K  |                         |                        |
     320K  |                         |                        |
     300K  |                         |                        |
     280K  |                         |                        |
     260K  |                         |                        |
SAL  240K  |                         |                        |
     220K  |                         |                        |
     200K  |_________________________|________________________|
     180K  |                         |                        |
     160K  |                         |                        |
     140K  |                         |                        |
     120K  |                         |                        |
     100K  |                         |                        | 
      80K  |                         *                        |
      60K  |            *          * |                        |
      40K  |                         |                        |
      20K  |                         |                        |
           `-------------------------+------------------------'
                 1    2    3    4    5    6    7    8    9    
                 0    0    0    0    0    0    0    0    0   AGE
              .-NW-
             /
            /---NE-
age,sal    /
(50,200)  <                                                                     
           \                                                           
            \---SW-(24,60)                                                     
             \     (46,60                                               
              \                                                         
               `SE-(50,80)                                                     
                                                                       





Next, insert (50,100)

            __________________________________________________
     400K  |                         |                        |
     380K  |                         |                        |
     360K  |                         |                        |
     340K  |                         |                        |
     320K  |                         |                        |
     300K  |                         |                        |
     280K  |                         |                        |
     260K  |                         |                        |
SAL  240K  |                         |                        |
     220K  |                         |                        |
     200K  |_________________________|________________________|
     180K  |                         |                        |
     160K  |                         |                        |
     140K  |                         |                        |
     120K  |                         |                        |
     100K  |                         *                        | 
      80K  |                         *                        |
      60K  |            *          * |                        |
      40K  |                         |                        |
      20K  |                         |                        |
           `-------------------------+------------------------'
                 1    2    3    4    5    6    7    8    9    
                 0    0    0    0    0    0    0    0    0   AGE
              .NW-
             /
            /---NE-
           /
(50,200)  <                                                                     
           \                                                           
            \---SW-(24,60)
             \     (46,60)
              \
               `SE-(50,80)                                                     
                   (50,100)                                             
                                                                       






insert (50,120) (overflow, so split SE)
            __________________________________________________
     400K  |                         |                        |
     380K  |                         |                        |
     360K  |                         |                        |
     340K  |                         |                        |
     320K  |                         |                        |
     300K  |                         |                        |
     280K  |                         |                        |
     260K  |                         |                        |
SAL  240K  |                         |                        |
     220K  |                         |                        |
     200K  |_________________________|________________________|
     180K  |                         |           :            |
     160K  |                         |           :            |
     140K  |                         |           :            |
     120K  |                         *           :            |
     100K  |                         *...........:............| 
      80K  |                         *           :            |
      60K  |            *          * |           :            |
      40K  |                         |           :            |
      20K  |                         |           :            |
           `-------------------------+------------------------'
                 1    2    3    4    5    6    7    8    9    
                 0    0    0    0    0    0    0    0    0   AGE

               .NW-
              /
             /
            /---NE-
           /
(50,200)  <                     .NW-(50,100)
           \                   /    (50,120)
            \---SW-(24,60)    /
             \     (46,60)   /---NE-   
              \             /           
               `SE-(75,100)<
                            \----SW-(50,80)
                             \             
                              \                                               
                               `-SE-
                                               
                                                                       





insert (70,100) (overflows, so split NW)
            __________________________________________________
     400K  |                         |                        |
     380K  |                         |                        |
     360K  |                         |                        |
     340K  |                         |                        |
     320K  |                         |                        |
     300K  |                         |                        |
     280K  |                         |                        |
     260K  |                         |                        |
SAL  240K  |                         |                        |
     220K  |                         |                        |
     200K  |_________________________|________________________|
     180K  |                         |     .     :            |
     160K  |                         |     .     :            |
     140K  |                         | - - + - - :            |
     120K  |                         *     .     :            |
     100K  |                         *.........*.:............| 
      80K  |                         *           :            |
      60K  |            *          * |           :            |
      40K  |                         |           :            |
      20K  |                         |           :            |
           `-------------------------+------------------------'
                 1    2    3    4    5    6    7    8    9    
                 0    0    0    0    0    0    0    0    0   AGE

               .NW-
              /                                    .-NW-
             /                                    /
            /---NE-                              /---NE-(70,100)
           /                                    /
(50,200)  <                     .NW-(62.5,150)-<           
           \                   /                \----SW(50,100)
            \---SW-(24,60)    /                  \     (50,120)
             \     (46,60)   /---NE-              \
              \             /                      `-SE
               `SE-(75,100)<
                            \----SW-(50,80)
                             \             
                              \                                               
                               `-SE-






insert (84,140), (30,260), (26,400)
            __________________________________________________
     400K  |             *           |                        |
     380K  |                         |                        |
     360K  |                         |                        |
     340K  |                         |                        |
     320K  |                         |                        |
     300K  |                         |                        |
     280K  |                         |                        |
     260K  |               *         |                        |
SAL  240K  |                         |                        |
     220K  |                         |                        |
     200K  |_________________________|________________________|
     180K  |                         |     .     :            |
     160K  |                         |     .     :            |
     140K  |                         | - - + - - :    *       |
     120K  |                         *     .     :            |
     100K  |                         *.........*.:............| 
      80K  |                         *           :            |
      60K  |            *          * |           :            |
      40K  |                         |           :            |
      20K  |                         |           :            |
           `-------------------------+------------------------'
                 1    2    3    4    5    6    7    8    9    
                 0    0    0    0    0    0    0    0    0   AGE

               .NW-(30,260)
              /    (26,400)                        .-NW-
             /                                    /
            /---NE-                              /---NE-(70,100)
           /                                    /
(50,200)  <                     .NW-(62.5,150)-<           
           \                   /                \----SW(50,100)
            \---SW-(24,60)    /                  \     (50,120)
             \     (46,60)   /---NE-(84,140)      \
              \             /                      `-SE
               `SE-(75,100)<
                            \----SW-(50,80)
                             \             
                              \                                               
                               `-SE-
                                                                       





insert (44,360) (overflows: split NW)
            __________________________________________________
     400K  |            :*           |                        |
     380K  |            :            |                        |
     360K  |            :         *  |                        |
     340K  |            :            |                        |
     320K  |            :            |                        |
     300K  |............:............|                        |
     280K  |            :            |                        |
     260K  |            :  *         |                        |
SAL  240K  |            :            |                        |
     220K  |            :            |                        |
     200K  |____________:____________|________________________|
     180K  |                         |     .     :            |
     160K  |                         |     .     :            |
     140K  |                         | - - + - - :    *       |
     120K  |                         *     .     :            |
     100K  |                         *.........*.:............| 
      80K  |                         *           :            |
      60K  |            *          * |           :            |
      40K  |                         |           :            |
      20K  |                         |           :            |
           `-------------------------+------------------------'
                 1    2    3    4    5    6    7    8    9    
                 0    0    0    0    0    0    0    0    0   AGE

                                    .NW-                
                                   /                     
                                  /--NE-(26,400)        
                                 /      (44,360)
                                /             
                 .-NW-(25,300)-< ----SW-
                /               \                        
               /                 `---SE-(30,260)         
              /                                    .-NW-
             /                                    /
            /---NE-                              /---NE-(70,100)
           /                                    /
(50,200)  <                     .NW-(62.5,150)-<           
           \                   /                \----SW(50,100)
            \---SW-(24,60)    /                  \     (50,120)
             \     (46,60)   /---NE-(84,140)      \
              \             /                      `-SE
               `SE-(75,100)<
                            \----SW-(50,80)
                             \             
                              \                                               
                               `-SE-







insert (50,280), (60,260)
            __________________________________________________
     400K  |            :*           |                        |
     380K  |            :            |                        |
     360K  |            :         *  |                        |
     340K  |            :            |                        |
     320K  |            :            |                        |
     300K  |............:............|                        |
     280K  |            :            *                        |
     260K  |            :  *         |    *                   |
SAL  240K  |            :            |                        |
     220K  |            :            |                        |
     200K  |____________:____________|________________________|
     180K  |                         |     .     :            |
     160K  |                         |     .     :            |
     140K  |                         | - - + - - :    *       |
     120K  |                         *     .     :            |
     100K  |                         *.........*.:............| 
      80K  |                         *           :            |
      60K  |            *          * |           :            |
      40K  |                         |           :            |
      20K  |                         |           :            |
           `-------------------------+------------------------'
                 1    2    3    4    5    6    7    8    9    
                 0    0    0    0    0    0    0    0    0   AGE

                                   .NW-                
                                  /                     
                                 /--NE-(26,400)        
                                /      (44,360)
                 .-NW-(25,300)-< ----SW-
                /               \                        
               /                 `---SE-(30,260)         
              /                                    .-NW-
             /                                    /
            /---NE-(50,280)                      /---NE-(70,100)
           /       (60,260)                     /
(50,200)  <                     .NW-(62.5,150)-<           
           \                   /                \----SW(50,100)
            \---SW-(24,60)    /                  \     (50,120)
             \     (46,60)   /---NE-(84,140)      \
              \             /                      `-SE
               `SE-(75,100)<
                            \----SW-(50,80)
                             \                                               
                              `-SE-



There is, of course, a similarity between QUAD Trees and P-trees.

The difference is:
Quad-Trees are search trees and P-trees are the actual data.






R-TREES  (Region tree)




       - inodes of an R-tree correspond to interior regions,
         (which can be overlapping)
         (usually regions are rectangles, tho, not necessarily)

       - R-tree regions have subregions that represent the
         contents of their children

       - And the subregions need not cover the region they
         subdivide (but all data must be within a subregion)






Example,  Consider the spatial image:

100  __________________________________________________
    |                                                  |
    |                                                  |
    |                                                  |
    |                                                  |
    |        .-----------.                             |
    |        |           |                             |
    |        |school     |                             |
    |        |___________|                             |
    |                                                  |
    |                                                  |
    |-------------------------.                        |
    |      road1              |    .-------.           |
    |-------------------------|    |house2 |           |
    |                      |r |    |_______|           |
    |      .------.________|o_|________________________|
    |      |house1|________|a_|________pipeline________|
    |      |______|        |d |                        |
    |                      |2 |                        |
    |                      |  |                        |
    |                      |  |                        |
  0 `--------------------------------------------------'
     0                                                 100





Assume a leaf can hold 6 regions (bfr=6)
 and that the 6 regions or objects above are together on 1 leaf block,
  whose region is shown as the outer dotted rectangle - .
                                                         \
                                                          |
100__________________________________________________    /
  |                                                  |  / 
  |                                                  | /   
  ..................................................../
  :                                                  :  
  :        .-----------.                             :
  :        |           |                             :
  :        |school     |                             :
  :        |___________|                             :
  :                                                  :
  :                                                  :
  :-------------------------.                        :
  :       road1             |    .-------.           :
  :----------------------+--|    |house2 |           :
  :                      |r |    |_______|           :
  :    .--------.________|o_|________________________:
  :    |  house1|________|a_|________pipeline________:
  :    |        |        |d |                        :
  :    |________|        |2 |                        :
  :                      |  |                        :
  :                      |  |                        :
  :..................................................:
  0                                                 100








Therefore the R-tree has a root and one leaf as follows:

 .---------------------.
 | ( (0,0), (100,90) ) |     (Outer dotted region)
 |_____________________|
                  :
                  v       (full leaf with six objects)
.-----------------------------------------------------.
| road1 | road2 | house1 | school | house2 | pipeline |
+-------+-------+--------+--------+--------+----------+









Now suppose a local cellular phone company adds a POP
 (pt of presence or antenna) as below:

100__________________________________________________    
  |                                                  |   
  |                                                  | 
  ....................................................
  :                                                  :  
  :        .-----------.                             :
  :        |           |         .---.               :
  :        |school     |         |pop|               :
  :        |___________|         `---'               :
  :                                                  :
  :                                                  :
  :-------------------------.                        :
  :       road1             |    .-------.           :
  :----------------------+--|    |house2 |           :
  :                      |r |    |_______|           :
  :    .--------.________|o_|________________________:
  :    |  house1|________|a_|________pipeline________:
  :    |        |        |d |                        :
  :    |________|        |2 |                        :
  :                      |  |                        :
  :                      |  |                        :
0 :--------------------------------------------------:
  0                                                100







Since the 7 objects do not fit in one leaf, we split leaf,
 putting 4 in one new leaf and 3 in the other (attempting to
 minimize the overlap and to split evenly?).

Redefine enclosing regions:

 .---------------------.
 | ( (0,0), (100,90) ) |     (Outer dotted region)
 |_____________________|
                  :
                  v
 .-----------------------.--------------------------.
 | ( (0,0), (60,50) )    | ( (20,20), (100,80) )    | 
 |_______________________|__________________________|
                      :                        :
                      :                        :
                      v                        v
.------------------.---.  .------------------------------.
|road1|road2|house1|   |  |school|house2| pipeline | pop |
+-----+-----+------+---+  +------+------+----------+-----+

100  __________________________________________________    
    |                                                  |   
    |                                                  | 
  ..|..................................................|..
  : |       ...........................................|.:  
  : |      : .-----------.                             |::
  : |      : |           |         .---.               |::
  : |      : |school     |         |pop|               |::
  : |      : |___________|         `---'               |::
  : |      :                                           |::
  :.|......:....................                       |::
  ::|------:------------------.:                       |::
  ::|      :road1             |:   .-------.           |::
  ::|------:---------------+--|:   |house2 |           |::
  ::|      :               |  |:   |_______|           |::
  ::|    .-:------.________|__|:_______________________|::
  ::|    | :house1|________|__|:_______pipeline________|::
  ::|    | :......|........|..|:.......................|::
  ::|    |________|        |ro|:                       | :
  ::|                      |ad|:                       | :
  ::|                      |2 |:                       | :
0 ::`--------------------------:-----------------------' :
  ::...........................:                         :
  :......................................................:
     0                                                100







Suppose we insert another house (house3) below house2.

100  __________________________________________________    
    |                                                  |   
    |                                                  | 
    |       ...........................................|.   
    |      : .-----------.                             |: 
    |      : |           |         .---.               |: 
    |      : |school     |         |pop|               |: 
    |      : |___________|         `---'               |: 
    |      :                                           |: 
   .|......:....................                       |: 
   :|------:------------------.:                       |: 
   :|      :road1             |:   .-------.           |: 
   :|------:---------------+--|:   |house2 |           |: 
   :|      :               |  |:   |_______|           |: 
   :|    .-:------.________|__|:_______________________|: 
   :|    | :house1|________|__|:_______pipeline________|: 
   :|    | :......|........|..|:.......................|: 
   :|    |________|        |ro|:   .-------.           |  
   :|                      |ad|:   |house3 |           |  
   :|                      |2 |:   |_______|           |  
0  :`--------------------------:-----------------------'  
   :...........................:                          
     0                                                100







Since house3 is not in a either of the regions, 
   we must decide to expand one of the regions.

If we pick the first, we add 1000 square units and if we pick
 the 2nd we add 1200 square units, thus we pick the first:

100  __________________________________________________    
    |                                                  |   
    |                                                  | 
    |       ...........................................|.   
    |      : .-----------.                             |: 
    |      : |           |         .---.               |: 
    |      : |school     |         |pop|               |: 
    |      : |___________|         `---'               |: 
    |      :                                           |: 
   .|......:.................................          |: 
   :|------:------------------.             :          |: 
   :|      :road1             |    .-------.:          |: 
   :|------:---------------+--|    |house2 |:          |: 
   :|      :               |  |    |_______|:          |: 
   :|    .-:------.________|__|_____________:__________|: 
   :|    | :house1|________|__|________pipeline________|: 
   :|    | :......|........|..|.............:..........|: 
   :|    |________|        |ro|    .-------.:          |  
   :|                      |ad|    |house3 |:          |  
   :|                      |2 |    |_______|:          |  
0  :`---------------------------------------:----------'  
   :........................................:             
     0                                                100






 .---------------------.
 | ( (0,0), (100,90) ) |     (Outer dotted region)
 |_____________________|
                  :
                  v
 .-----------------------.--------------------------.
 | ( (0,0), (80,50) )    | ( (20,20), (100,80) )    | 
 |_______________________|__________________________|
                       :                        :
                       :                        :
                       v                        v
.------------------.------.------.  .--------------------------.
|road1|road2|house1|house2|house3|  |school|house2|pipeline|pop|
+-----+-----+------+------+------+  +------+------+--------+---+


We note that house2 is in both subregions.







A brief look at an additional index still in use:



BINARY RADIX TREE INDEX




(aka, trie) (e.g., used in IBM AS/400s)




Similar to B-tree, except

 - only the common parts of key values are embedded in inodes

 - a single bit is used to make the navigation direction decision at each level
  (0 for up and 1 for down).  (zero-based bit positions are used)




Example:  (in this example, the tree structure is being
           built left-to-right)

Starting with an empty structure,




INSERT [JAY,LA,25,STAR], (assigned RRN=1)

nam_INDEX          CUSTOMER FILE
name  RRN          RRN nam  loc  age  job
 JAY  1              1 JAY  LA     25 STAR







INSERT [JON,LA,45,HOOD], assign RRN=2

1st letters are same (J), so that common part is embedded in root
2nd letters are A and O, bit-3 is first difference (makes the decision)

              0123 4567 < - bit positions
 EBCDIC for A=1100 0001
        and O=1101 0110

so we go up for A (bit-3 = 0) and down for O (bit-3=1)



                 CUSTOMER
                 RRN nam  loc  age  job
     name          1 JAY  LA     25 STAR
     part  RRN     2 JON  LA     45 HOOD
b3     AY  1
 J  <
       ON  2
    








INSERT [JAN,RO,93,DOCT], assign RRN=3
(EBCDIC for N=1101 0101
        and Y=1110 0000)

INDEX          CUSTOMER
     b2    N 3  RNN nam  loc  age  job
b3    A   <       1 JAY  LA     25 STAR
 J   <     Y 1    2 JON  LA     45 HOOD
      ON 2        3 JAN  RO     93 DOCT







INSERT [SUE,RO,16,PRGR], assign RRN=4
(EBCDIC for J=1101 0001
        and S=1110 0010)

INDEX          CUSTOMER
      b2   N 3  RNN nam  loc  age  job
   b3  A  <       1 JAY  LA     25 STAR
    J <    Y 1    2 JON  LA     45 HOOD
b2     ON 2       3 JAN  RO     93 DOCT
  <               4 SUE  RO     16 PRGR
    SUE 4