****************************************************************************************
* These notes contain NDSU confidential and proprietary material.  Patents are pending *
* on the concepts and applications of bSQ organization and P-tree technology, etc.     *
****************************************************************************************

THE DATA

Spatial Data Organizations

First we consider ways of organizing spatial data.  Spatial attributes such as
remotely sensed reflectances (R,G,B,NIR,..), ground attributes (yield levels,
soil moisture levels, elevations), etc., are referred to as "bands".

Let R(P, B1,...Bn) be the file or relation containing these data bands as columns
or attributes for a particular space or area, where P is the key (pixel coordinates,
x-y, of the points in the space) and  each column, B1,..,Bn, measures the level of
that attribute for each pixel location.  If the inverted list model is used rather
than the relational model (so that one can assume an ordering of the tuples), the
raster ordering of coordinates is usually assumed (first row, followed by second row,
followed by third row, ...)

This Relational (REL) organization is the starting point or basic organization.

In Band-SeQuential (BSQ) organization, the REL organization is projected into many
files - a separate file for each band or column.  The coordinate ordering is
assumed to be raster order, and thus need not be part of each band file.  Each
band file is then a 1-column file of the measurements for that attribute at each
pixel in raster order (e.g., TM data from Landsat satellites is organized as BSQ).


In Band-Interleaved-by-Line (BIL) there is just one file in which the
first row (line) of the first band is followed by the
first row of the second band, ..., followed by the
first row of the last band, followed by the
second row of the first band, followed by the
second row of the second band, ... etc.  (e.g., SPOT data from French Satellites is BIL)


In Band-Interleaved-by-Pixel (BIP), there is just one file in which the
first pixel-value of the first band is followed by the
first pixel-value of the second band,..., the
first pixel-value of the last band, followed by the
second pixel-value of the first band,...     (e.g., TIFF images are BIP).

We note BIP is nearly identical to REL except there are no explicit "record"
or row boundary markers (i.e., the data is not organized into records, but the
values are in the same order as they are in REL).


A new organization at the "interleaving extreme" end of this spectrum of
organizations is Band-Interleaved-by-bit (BIb) in which there is just one file, the
first bit of the first pixel-value of the first band is followed by the
first bit of the first pixel-value of the second band,..., the
first bit of the first pixel-value of the last band, followed by the
second bit of the first pixel-value of the first band,...

Another new organization, at the other end of this organization spectrum, is
bit-SeQuential (bSQ), in which each bit position of each band, B11..B18, B21..B28,
..., Bn1..Bn8, is a separate file.  We will use bSQ organization later in this course.

We have the following spectrum of Band-oriented organizations:

REL is the basic organization in which there is one file and no interleaving (we say
there is no interleaving since a relation is a "set" of tuples, not a sequence and
each tuple is a "set" of attribute values, not a sequence.  i.e., in a relation,
there is no ordering of values).

         more interleaving-- >

bSQ      BSQ       BIL      BIP      BIb

       < -- more files

A very simple illustrative example (with only 2 bands, each having only 2 rows and 2 columns)

BAND-1                                BAND-2
    254          127                       37           240
(1111 1110)  (0111 1111)              (0010 0101)     (1111 0000)

     14          193                      200            19
(0000 1110)  (1100 0001)              (1100 1000)     (0001 0011)

REL organization: RRN |x-y | B1 | B2 |  (a set of tuples,
                      |====|====|====|   each tuple is a set of attribute values)
                   0  |0,0 |254 | 37 |  
                   1  |0,1 |127 |240 |  
                   2  |1,0 | 14 |200 |  
                   3  |1,1 |193 | 19 |  

bSQ organization:  (16 files)
 B11 B12 B13 B14 B15 B16 B17 B18   B21 B22 B23 B24 B25 B26 B27 B28   
 --- --- --- --- --- --- --- ---   --- --- --- --- --- --- --- ---  
  1   1   1   1   1   1   1   0     0   0   1   0   0   1   0   1 
  0   1   1   1   1   1   1   1     1   1   1   1   0   0   0   0 
  0   0   0   0   1   1   1   0     1   1   0   0   1   0   0   0 
  1   1   0   0   0   0   0   1     0   0   0   1   0   0   1   1 


BSQ organization:  B1       B2  (two separate files, values given in decimal)
                  ----     ----
                   254       37
                   127      240
                    14      200
                   193       19

BIL organization:   (one file, values given in decimal)
254 127  37 240  14 193 200  19

BIP organization:   (one file, values given in decimal)
254  37 127 240  14 200 193  19

BIb organization:   (one file, values given in binary)
10 10 11 10 10 11 10 01 01 11 11 11 10 10 10 10
01 01 00 00 11 10 10 00 10 10 00 01 00 00 01 11

Through simple offset arithmetic, one can convert among these organizations.
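
The offset arithmetic can be sketched in Python for the 2-band, 2x2 example
above.  This is only an illustration: the function names (to_bil, to_bip,
to_bsq_bits, to_bib) are hypothetical, and the bands are assumed to be held in
memory as raster-ordered lists (i.e., BSQ).

```python
# Band values from the 2x2 example, one raster-ordered list per band (BSQ).
ROWS, COLS = 2, 2
bsq = [[254, 127, 14, 193],   # band 1
       [37, 240, 200, 19]]    # band 2

def to_bil(bsq, rows, cols):
    """Row r of band 1, row r of band 2, ..., then row r+1 of band 1, ..."""
    return [band[r * cols + c]
            for r in range(rows) for band in bsq for c in range(cols)]

def to_bip(bsq):
    """Pixel p of band 1, pixel p of band 2, ..., then pixel p+1 of band 1, ..."""
    return [band[p] for p in range(len(bsq[0])) for band in bsq]

def to_bsq_bits(bsq, nbits=8):
    """bSQ: one bit vector per (band, bit position); bit 1 is the high-order bit."""
    return {(k + 1, i + 1): [(v >> (nbits - 1 - i)) & 1 for v in band]
            for k, band in enumerate(bsq) for i in range(nbits)}

def to_bib(bsq, nbits=8):
    """BIb: per pixel, bit i of band 1, bit i of band 2, ..., then bit i+1."""
    return [(band[p] >> (nbits - 1 - i)) & 1
            for p in range(len(bsq[0])) for i in range(nbits) for band in bsq]

print(to_bil(bsq, ROWS, COLS))                  # compare with the BIL line above
print(to_bip(bsq))                              # compare with the BIP line above
print(to_bsq_bits(bsq)[(1, 8)])                 # bit file B18
print("".join(map(str, to_bib(bsq)[:16])))      # first pixel of the BIb file
```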


**************************************************************

Note that in traditional Market Basket Data Mining each "item"
is treated as a separate column or attribute of REL and the values
are Boolean (1 or 0, for yes or no).  Thus, for MBDM, we start with:

                         B1   B2   B3   B4   B5   B6   B7    ...
REL organization: |trans|hat |shoe|coat|milk|beer|soap|nails|...
                  |=====|====|====|====|====|====|====|=====|...
(one relation)    |tid-1| 1  | 0  | 0  | 1  | 0  | 1  | 0   |...
                  |tid-2| 0  | 0  | 0  | 0  | 1  | 0  | 0   |...
                  |tid-3| 0  | 0  | 1  | 0  | 0  | 0  | 0   |...
                  |tid-4| 0  | 1  | 0  | 1  | 0  | 1  | 0   |...
                  |tid-5| 0  | 0  | 0  | 0  | 1  | 0  | 1   |...
                   .  .  .

bSQ organization: B1   B2   B3   B4   B5   B6   B7    ...
                   1    0    0    1    0    1    0   
(separate          0    0    0    0    1    0    0    
 file              0    0    1    0    0    0    0  
 for each item)    0    1    0    1    0    1    0 
                   0    0    0    0    1    0    1
                   .  .  .

BSQ organization: B1   B2   B3   B4   B5   B6   B7    ...
                   1    0    0    1    0    1    0 
(identical to      0    0    0    0    1    0    0
 bSQ)              0    0    1    0    0    0    0   
                   0    1    0    1    0    1    0  
                   0    0    0    0    1    0    1 
                   .  .  .


BIL organization: (transactions are not in any natural 2-D arrangement
                   so we consider the tid's to constitute one big row)
10000...00010...00100...10010...01001...10010...00001...

BIP organization:
10010100000100001000001010100000101...

BIb organization: (each value is a single bit, thus BIb = BIP)


With Boolean data from a Market Basket Database,

 in bSQ=BSQ the data is organized into
    a separate file for each item ordered by transaction

 in BIL the data is organized onto one file ordered by item first
    and then by transaction.

 in BIP=BIb the data is organized onto one file ordered by transaction first
    and then by item.

Note:  Market Basket Data Mining is done assuming the REL organization.

**************************************************************



An example of spatial data comes from precision agriculture, where
 we subdivide or "grid" a field into "pixels" or points (usually evenly).

   0   1   2   3   4   5   6   7   8   9  10  11  12 
 .---.---.---.---.---.---.---.---.---.---.---.---.---.
0|   |   |   |   |   |   |   |   |   |   |   |   |   |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|
1|   |   |   |   |   |   |   |   |   |   |   |   |   |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|
2|   |   |   |   |   |   |   |   |   |   |   |   |   |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|
3|   |   |   |   |   |   |   |   |   |   |   |   |   |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|
4|   |   |   |   |   |   |   |   |   |   |   |   |   |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|
5|   |   |   |   |   |   |   |   |   |   |   |   |   |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|
6|   |   |   |   |   |   |   |   |   |   |   |   |   |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|
7|   |   |   |   |   |   |   |   |   |   |   |   |   |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|
8|   |   |   |   |   |   |   |   |   |   |   |   |   |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|
.|   |   |   |   |   |   |   |   |   |   |   |   |   |
.
.

The reflectance levels within given spectral ranges (e.g., Red, Green, Blue..)
are captured by a sensor and recorded in raster-ordered BANDs

RED-band
pix refl
0,0  24
0,1  26
0,2  49
0,3  68
0,4  93
0,5 119
.
.
.

The key for each band is the x,y coordinates.  This attribute is usually
    omitted since the raster ordering is taken to be understood.


So a "BAND" is a single attribute file of
the relative reflectance levels (expressed as numbers in [0, 255]) observed
in a particular color range (or non-visible range such as infra-red...) or an
agricultural band (yield levels - e.g., bushels per acre for each pixel).


An association rule example: "At points in a field where the midsummer
   Near-Infrared (NIR) reflectance is greater than 47 and
   Red reflectance is less than 32, the
   Yield will be greater than 128 bu/acre"


The rule is written  { NIR>47, R<32 } => { Y>128 }
  - the set, { NIR>47, R<32 } is called the "antecedent" of the rule
  - the set  { Y>128 } is called the "consequent" of the rule


"SUPPORT" of the rule = % (or ratio) of pixels with NIR>47 and R<32 and Y>128.
   - as a ratio, it can be expressed as
     |pixels containing (antecedent UNION consequent)| / |all pixels|


"CONFIDENCE" = %(or ratio) of pixels with NIR>47 and R<32 which also have Y>128
   - as a ratio, it can be expressed as
     |pixels containing (antecedent UNION consequent)| / |pixels containing antecedent|
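
As a small worked illustration of these two ratios, the following Python sketch
counts pixels directly.  The six (NIR, R, Y) pixels are made-up values chosen
only to exercise the definitions; they are not measurements.

```python
# Hypothetical pixels as (NIR, R, Y) triples; illustrative values only.
pixels = [
    (60, 20, 140),
    (50, 30, 130),
    (48, 31, 100),
    (40, 25, 150),
    (70, 40, 135),
    (55, 10, 129),
]

def antecedent(p):
    """True when the pixel satisfies { NIR>47, R<32 }."""
    nir, r, _ = p
    return nir > 47 and r < 32

def consequent(p):
    """True when the pixel satisfies { Y>128 }."""
    return p[2] > 128

n_ante = sum(1 for p in pixels if antecedent(p))
n_both = sum(1 for p in pixels if antecedent(p) and consequent(p))

support = n_both / len(pixels)    # pixels satisfying the whole rule / all pixels
confidence = n_both / n_ante      # pixels satisfying the whole rule / antecedent pixels
print(support, confidence)
```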


If the support and confidence of this rule are high, that suggests to the producer
that nitrogen fertilizer should be applied where NIR<=47 and/or R>=32,
so as to maximize the yield in those areas (get it up over 128 bu/acre).



For ARM, we need to formally define the notions of items, itemsets and
transactions in spatial datasets.

The items:         I = {(b,v) : b= a band, v= a reflectance value}


The transactions:  D = {t : t=(tid,t-itemset)},

tid=(x,y), the pixel row,col and
t-itemset = {(b,v): b ranges over all bands and
                    v is the reflectance at pixel, t, in band, b.}


Note right away that the sizes are very very large in the ARM sense
(e.g., for TM satellite images (with yield bands), there are ~40,000,000
transactions, 8*256 = 2048 items and 2^(2048) itemsets!)

The number of transactions (pixels) can be reduced by focusing on a
particular small area (e.g., a field).

The number of itemsets can be reduced by noting that a pixel can have only
one reflectance value from a given band.  Almost always we are interested
in knowing when the values are in a particular range or interval.  
Therefore we can restrict our itemset consideration to those composed of
one interval from each band.
---------------------------

In a given band there are 255 ways to pick the left endpoint of the interval
and for left-endpoint, l, there are 255-l ways to pick a right endpoint.
On the average there will be 127 ways to pick the right endpoint.
Thus, there are really only (255*127)^8 or ~(2^8*2^7)^8 = 2^120 ~ 10^36 ~
1,000,000,000,000,000,000,000,000,000,000,000,000 itemsets to consider.


  - We can reduce the number of items by partitioning the Bands into intervals
    and letting each interval correspond to a value.



Partitioning bands into intervals:

Equilength interval partitioning.

By truncating some of the right-most bits of the values (low order or least
significant bits) we can reduce the number of itemsets dramatically without
losing too much information (the low order bits show only slight differences).

For example, we can truncate the right-most 6 bits, resulting in 4 intervals,
   each of which we consider to be a "value" (e.g., identify each interval
   with its midpoint):

[0,64), [64,128), [128,192), [192,256)  identified with values, 32, 96, 160, 224

Then there are only 10^8 = 100,000,000 itemsets (10 intervals in each band, if
every run of adjacent base intervals counts as an interval: 4+3+2+1 = 10).
That's still a lot!
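
The truncation step can be sketched as a right shift.  The function name
truncate_to_interval is hypothetical; with 6 bits dropped, the 2 remaining
high-order bits select one of the 4 intervals listed above.

```python
def truncate_to_interval(value, drop_bits=6):
    """Map a value to (interval index, [lo, hi), midpoint) by dropping low-order bits."""
    width = 1 << drop_bits            # interval length, 64 when 6 bits are dropped
    index = value >> drop_bits        # the remaining high-order bits
    lo = index * width
    return index, (lo, lo + width), lo + width // 2

for v in (24, 68, 130, 255):
    print(v, truncate_to_interval(v))
```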


Further pruning can be done by understanding what kinds of rules are probably
of interest to the user and focusing on those only.  For instance:

For a precision farmer, there is probably little interest in rules of the type,
R>48 => G<134.

A physicist might be interested in relationships among colors observed
(both antecedent and consequent from visible bands), but the farmer is
interested only in relationships where the antecedent is from the color
bands and the consequent is from the yield band (he or she wants to know
what observed color combinations predict high yield).

Therefore, for precision agriculture, we could restrict to those rules that have
consequent from the yield band (and then only the particular interval which
indicates "high yield") and antecedent from the others, so 10^7 = 10,000,000
itemsets to consider.

We will refer to restrictions of this type (in the type of itemsets allowed for
antecedent and consequent based on interest) as restricting to rules which
are "of interest" (OI rules), as distinct from the notion of rules that are
"interesting".  OI rules can be interesting or not interesting, depending on
such measures as support and confidence, etc.


Slalom analogy:

Each transaction (pixel), t, is like a path down a ski hill, each
item is an interval in one band and therefore like a "gate" on the ski slope:

A transaction (pixel) "contains" an itemset, if it "goes thru" each gate
(has band-i reflectance in interval-i).

So if x is an itemset (set of "gates", one for each band),
s(x) is the proportion of paths passing thru the gates of x.

                  b1    b2    b3    b4    b5    b6    b7    b8 
                  |                 |    .---.        |     | 
      t---.       |                     / |   \       |     | 
           `---------------------------'  |    \      |     | 
                  |     |                 |      \    |    .----
                  |     |           |     |       \_______/ | 
 



Non-equi-length:
In some cases, it is better to let users partition a band into intervals of
uneven length, so that user knowledge can be applied to the partitioning.
E.g., band bi can be partitioned into 3 intervals, {[0,64), [64,128), [128,256)}
(if there aren't many values between 128 and 255).

Applying the user's domain knowledge can increase the accuracy and efficiency
of association rule mining.

Equi-depth partitioning (each partition has approx. the same number of pixels).
This can be done by setting the endpoints so that there are (approximately) the
same number of values in each interval (e.g., for two intervals, split at the
median value), etc.
  Sometimes this leads to more reasonable rules.
 

Whether partitioning is equilength or not, it can be easily characterized as:
For each band, b, choose interval end-points, e0=0, e1, ..., en+1=256,

 then the items are   ( b, [ej,ej+1) ), j=0,..,n

(in the equilength case there is a common length, ej+1 - ej = a constant).
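
A minimal Python sketch of this characterization follows.  The helper names
(interval_item, equi_depth_endpoints) and the sample reflectance values are
assumptions made for illustration.  Endpoints are kept as a sorted list and a
value is placed with a binary search.  The equi-depth helper simply cuts the
sorted values into equal-sized runs, so heavily repeated values could yield
empty intervals.

```python
from bisect import bisect_right

def interval_item(band, value, endpoints):
    """Return the item (band, [e_j, e_j+1)) whose interval contains value."""
    j = bisect_right(endpoints, value) - 1
    return band, (endpoints[j], endpoints[j + 1])

def equi_depth_endpoints(values, parts, top=256):
    """Endpoints chosen so each interval holds about the same number of values."""
    ordered = sorted(values)
    cuts = [ordered[(len(ordered) * j) // parts] for j in range(1, parts)]
    return [0] + cuts + [top]

# Uneven, user-chosen endpoints (the 3-interval example above).
print(interval_item("b1", 130, [0, 64, 128, 256]))

# Equi-depth endpoints from made-up reflectance values.
sample = [24, 26, 49, 68, 93, 119, 200, 250]
ends = equi_depth_endpoints(sample, 4)
print(ends, interval_item("R", 68, ends))
```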



We consider a data structure which is particularly well suited for data mining
spatial data.  Assume the data is in bSQ organization, B11,..,Bn8
(a separate file for each bit position of each band).

It is common practice to reduce data volume by truncating off a certain
number of low-order bits of each byte.  Thus we will speak of "8-bit values"
(the full byte values), "7-bit values" (with the low-order bit truncated),
"6-bit values" (with the two low-order bits truncated), ...

So assume each band has been separated by bit-position into 8 "bit-bands"
or bit vectors.

Let the ith bit band of the kth band be denoted, Bki.

Each bit-band can be represented using a spatial data structure called a
P-tree (for Peano-tree).  These P-trees are formulated to facilitate data
mining of spatial byte-bands.

***********************************************************
The Peano Count Tree (P-tree) concept and algebra is      *
NDSU confidential and proprietary material.               *
***********************************************************

The Peano-Tree for Bki is a lossless tree representation of the bit band
from which the bit band can be completely reconstructed and which also
contains the 1-bit count for each and every quadrant in the original space.


Example:
Suppose we have a band, Bk, in a 64 pixel space (8 rows by 8 columns):

11110001 10010010 11100011 11010101  10000000 11100101 01111000 00110011
10110001 11010011 11101010 11000001  11100100 00101101 00011110 01010101
11010001 10010010 11100011 11010101  10000000 11100101 01111000 00110011
10010001 11010011 11101010 11000001  11100100 10101101 10011110 01010101

11110001 10010010 11100011 11010101  10001110 11100101 11111000 10110011
10110001 11010011 11101010 11000001  11100110 10101101 10011110 11011101
11010001 10010010 11100011 10011101  10001010 11100101 11111000 10110111
00010001 11010011 11101010 10101001  11101100 10101101 10011110 11010101

Consider the bit-band, Bk1 of the above band file:

Bk1
1111 1100
1111 1000
1111 1100
1111 1110

1111 1111 
1111 1111
1111 1111
0111 1111

Which, of course, when saved on disk in raster sequence is:
1111 1100 1111 1000 1111 1100 1111 1110 1111 1111 1111 1111 1111 1111 0111 1111
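
Extracting a bit-band from the band file is a shift and a mask.  The sketch
below uses the 64 example values above; the names bit_band and grouped are
hypothetical helpers, and bit 1 is taken to be the high-order bit as in the
notes.

```python
BAND_K = """
11110001 10010010 11100011 11010101  10000000 11100101 01111000 00110011
10110001 11010011 11101010 11000001  11100100 00101101 00011110 01010101
11010001 10010010 11100011 11010101  10000000 11100101 01111000 00110011
10010001 11010011 11101010 11000001  11100100 10101101 10011110 01010101
11110001 10010010 11100011 11010101  10001110 11100101 11111000 10110011
10110001 11010011 11101010 11000001  11100110 10101101 10011110 11011101
11010001 10010010 11100011 10011101  10001010 11100101 11111000 10110111
00010001 11010011 11101010 10101001  11101100 10101101 10011110 11010101
"""
values = [int(tok, 2) for tok in BAND_K.split()]   # 64 byte values, raster order

def bit_band(values, i, nbits=8):
    """Bit i (1 = high-order) of every value, kept in the same raster order."""
    return [(v >> (nbits - i)) & 1 for v in values]

def grouped(bits, width=4):
    """Format a bit list as space-separated groups for easy comparison."""
    s = "".join(map(str, bits))
    return " ".join(s[j:j + width] for j in range(0, len(s), width))

print(grouped(bit_band(values, 1)))   # the raster sequence of Bk1
print(sum(bit_band(values, 1)))       # its 1-bit count
```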

For this bit-band, the P-tree structure is:
Pk1                       55
           ____________//    \\___________   
          /          __/      \_          \       
        16          8           15         16    
               ___//|\         /|\\__ 
              /   / | \       / | \  \
             3   0  4  1     4  4  3  4         
           1110       0010       1101          
                                                          

Here is how we arrive at it:

The root holds the count of 1-bits for its quadrant
                    (which is the entire bit array).
11111100
11111000
11111100
11111110   count=55
11111111 
11111111
11111111
01111111

Each inode has the 1-bit count for its quadrant (order inodes at each level
using Peano ordering (recursive raster ordering) or ul, ur, ll, lr).

cnt=16 - > cnt=8
  1111     1100
  1111     1000
  1111     1100
  1111     1110
          /
         /
        /
       /
      /
     /
    v
cnt=15 - > cnt=16
  1111     1111 
  1111     1111
  1111     1111
  0111     1111

giving:
                          55
           ____________//    \\___________   
          /          __/      \_          \       
        16          8           15         16    



Note that the ul and lr quadrants at this level need no further detailing
since they are entirely 1-bits.  Thus, the tree ends here for those quadrants.

For the ur quadrant:

           1100
           1000
           1100
           1110

recursively, we count 1-bits for the subquadrants in raster order:

        cnt=3 - > cnt=0
           11     00
           10     00
                /
               /
              /
             /
            v
        cnt=4 - > cnt=1
           11     00
           11     10


giving:                   55
           ____________//    \\___________   
          /          __/      \_          \       
        16          8           15         16    
               ___//|\                
              /   / | \                  
             3   0  4  1                        
                                                          
We note that only the ul and lr subquadrants need detailing and we
detail by listing the bits in raster order (this is a recursive step
also since we are now down to 1x1 quadrants and the count is either 1 or 0).

                          55
           ____________//    \\___________   
          /          __/      \_          \       
        16          8           15         16    
               ___//|\    
              /   / | \    
             3   0  4  1     
           1110       0010  




For the ll quadrant:
  1111  
  1111 
  1111
  0111

recursively, we count 1-bits for the subquadrants in raster order:

cnt=4 - > cnt=4
  11      11  
  11      11 
        /
       /
      /
     /
    /
   v
cnt=3 - > cnt=4
  11      11
  01      11

giving:                   55
           ____________//    \\___________   
          /          __/      \_          \       
        16          8           15         16    
               ___//|\         /|\\__ 
              /   / | \       / | \  \
             3   0  4  1     4  4  3  4         
           1110       0010                     

Finally, only the ll subquadrant needs detailing:

                          55
           ____________//    \\___________   
          /          __/      \_          \       
        16          8           15         16    
               ___//|\         /|\\__ 
              /   / | \       / | \  \
             3   0  4  1     4  4  3  4         
           1110       0010       1101          


If we complete the counts for all subquadrants, the leaf sequence is
just the well-known Peano ordering sequence for the bit-band.
(thus the terminology, "Peano Count Tree")

(and we can think of the P-tree as a compressed form of the Peano sequence
with the addition of having all quadrant 1-counts as well)

                                55
            _________________/|    \\_______________________________
           /                  |     \____________                   \       
         16                   8                  15                  16    
   _____//\\____       _____//\\____       _____//\\____       _____//\\____
  /     /  \    \     /     /  \    \     /     /  \    \     /     /  \    \
 4     4    4    4   3     0    4    1   4     4    3    4   4     4    4    4
1111 1111 1111 1111 1110 0000 1111 0010 1111 1111 1101 1111 1111 1111 1111 1111
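
The construction just walked through can be sketched recursively in Python.
This is an illustration under assumptions, not the implementation behind these
notes: a node is represented as (count, children), where children is None for
a pure quadrant (all 0's or all 1's) and otherwise a list of four nodes in
ul, ur, ll, lr order.  The name build_ptree is hypothetical.

```python
BK1 = ["11111100",
       "11111000",
       "11111100",
       "11111110",
       "11111111",
       "11111111",
       "11111111",
       "01111111"]

def build_ptree(rows, row=0, col=0, size=None):
    """Return (count, children) for the size x size quadrant at (row, col)."""
    if size is None:
        size = len(rows)
    count = sum(rows[r][c] == "1"
                for r in range(row, row + size)
                for c in range(col, col + size))
    if count == 0 or count == size * size:
        return (count, None)              # pure quadrant: the tree ends here
    half = size // 2
    children = [build_ptree(rows, row + dr, col + dc, half)
                for dr, dc in ((0, 0), (0, half), (half, 0), (half, half))]
    return (count, children)

tree = build_ptree(BK1)
print(tree[0])                                # root count
print([child[0] for child in tree[1]])        # counts of ul, ur, ll, lr
print([child[0] for child in tree[1][1][1]])  # counts under the ur quadrant
```

A 1x1 quadrant is always pure, so the recursion stops there without a special
case.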

Start_here
|
v
1-1  1-1    /1-1  0-0
 /  / /    /  /  / /
1-1/ 1-1  |  1-0/ 0-0
 ______/  |   ______/
/         |  /
1-1  1-1  |  1-1  0-0
 /  / /  /    /  / /
1-1/ 1-1'    1-1/ 1-0
 ___________________/
/
1-1  1-1    /1-1  1-1
 /  / /    /  /  / /
1-1/ 1-1  |  1-1/ 1-1
 ______/  |   ______/
/         |  /
1-1  1-1  |  1-1  1-1
 /  / /  /    /  / /
0-1/ 1-1'    1-1/ 1-1 <-End_here

Peano ordering is a "space filling" curve ordering which can be thought of as
"recursive raster ordering" (recursing over ever increasing quadrant sizes).

Hilbert is another "space filling" ordering which preserves distances better
than Peano (every move is to a neighbor).  It may result in better compression
but it does not appear to be as useful for our purposes.

Start_here
|
v
1-1  1-1--1 1--0-0
  |  |    | |    |
1-1  1-1  1-0  0-0
|      |       |  
1 1--1 1  1-1  0-0
| |  | |  | |    |
1-1  1-1  1 1--1-0
          |        
          |
1-1  1-1  1 1--1-1
| |  | |  | |    |
1 1--1 1  1-1  1-1
|      |       |          
1-1  1-1  1-1  1-1
  |  |    | |    |
0-1  1-1--1 1--1-1
^
|
End_here

Here is a discussion of the two orderings:

ab cd  ef gh
ij kl  mn op

qr st  uv wx
yz 01  23 45


AB CD  EF GH
IJ KL  MN OP

QR ST  UV WX
YZ 67  89 -+                                             

Peano ordering follows the pattern:
abij cdkl qryz st01 efmn ghop uv23 wx45 ABIJ CDKL QRYZ ST67 EFMN GHOP UV89 WX-+

Hilbert ordering follows the pattern:
abji qyzr s01t lkcd emnf ghpo wx54 3vu2 EMNF GHPO WX+- 9VU8 76ST LDCK JBAI QRZY

Note that in Hilbert, every move is to a neighbor, so it preserves locality best.
The bottom up construction is:
			       _
The pattern is: starting with  _|  in the upper left 4-value quadrant,
1. rotate along y=-x axis and drag down (completing an 8-value pattern)
2. fold the 8-value pattern to the right (completing a 16-value pattern
     - the 4x4 upper left quadrant)
3. rotate the 16-value pattern (along y=-x always) and drag right
4. fold the 32-value pattern down
5. rotate the 64-value pattern and drag down
6. fold the 128-value pattern right
7. rotate the 256-value pattern and drag right
8. fold the 512-value pattern down
 . . . 
                                 
So the 3 bitvectors with the orderings (reorderings) are:
Raster:
1111 1100 1111 1000 1111 1100 1111 1110 1111 1111 1111 1111 1111 1111 0111 1111
Peano:
1111 1111 1111 1111 1110 0000 1111 0010 1111 1111 1101 1111 1111 1111 1111 1111
Hilbert:
1111 1111 1111 1111 1101 0000 0001 1111 1111 1111 1111 1111 1111 1111 1111 1110
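
The raster-to-Peano reordering can be sketched as a recursive walk over
quadrants.  The generator name peano_order is hypothetical; Hilbert ordering
is not covered by this sketch.

```python
def peano_order(n):
    """Yield (row, col) over a 2^n x 2^n grid in Peano (recursive raster) order."""
    def walk(row, col, size):
        if size == 1:
            yield row, col
            return
        half = size // 2
        # Visit the subquadrants in ul, ur, ll, lr order.
        for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
            yield from walk(row + dr, col + dc, half)
    yield from walk(0, 0, 1 << n)

raster = ("1111 1100 1111 1000 1111 1100 1111 1110 "
          "1111 1111 1111 1111 1111 1111 0111 1111").replace(" ", "")
side = 8
peano = "".join(raster[r * side + c] for r, c in peano_order(3))
print(" ".join(peano[j:j + 4] for j in range(0, 64, 4)))
```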



To address quadrants we use a Quadrant-ID scheme: 

 - First assign level numbers to the quadrants (and the P-tree levels)

   The root is Level-n if there are 4^n elements (2^n X 2^n pixels) in the space.
   Each quadrant at Level-i has 4^i elements (2^i X 2^i pixels).

   Level-0 is the lowest level - the leaf level (2^0 X 2^0, i.e., 1x1, quadrants)

 - Assign 2-bit addresses to the quadrants within each level:
          ul=00   ur=01   ll=10   lr=11

 - A quadrant is identified by the sequence of its 2-bit addresses
   (along its inodes in its path):

   - Therefore the ul subquadrant of the ll subquadrant of the ur
     quadrant of the bit-band has QID:   01.10.00

                                   .--------.                                  
                                   |        |                            L3    
             _____________________/`--------'\___________________             
           /00                 01 /         \ 10                  \ 11         
       .------.           .------.           .------.           .------.        
      /`------'\         /`------'\         /`------'\         /`------'\  L2  
     / |    |   \       /  |    |  \       /  |    |  \       /   |    | \ 
    /  |    |    \     /   |    |   \     /   |    |   \     /    |    |  \
  00  01   10    11  00   01   10   11   00   01   10   11   00   01   10  11
.--. .--. .--. .--. .--. .--. .--. .--. .--. .--. .--. .--. .--. .--. .--. .--L1
`--' `--' `--' `--' `--' `--' `--' `--' `--' `--' `--' `--' `--' `--' `--' `--' 
----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|---L0
                              ^
                              |

     01.10.00 is QID of this 1x1 quadrant (or, writing each 2-bit address in decimal, 1.2.0)
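
A QID can be turned into a pixel location with the same kind of offset
arithmetic: the high bit of each 2-bit address selects the lower half (rows)
and the low bit selects the right half (columns).  The function name
qid_to_quadrant is hypothetical, and rows and columns are assumed to be
numbered from 0.

```python
def qid_to_quadrant(qid, n):
    """Return (row, col, side) of the quadrant named by qid in a 2^n x 2^n space.

    qid is a dotted string of 2-bit addresses (ul=00, ur=01, ll=10, lr=11)."""
    row = col = 0
    size = 1 << n
    for part in qid.split("."):
        address = int(part, 2)
        size //= 2
        row += (address >> 1) * size    # high bit: 0 = upper half, 1 = lower half
        col += (address & 1) * size     # low bit:  0 = left half,  1 = right half
    return row, col, size

print(qid_to_quadrant("01.10.00", 3))   # a 1x1 quadrant in the 8x8 space
print(qid_to_quadrant("01", 3))         # the whole ur quadrant
```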
                                                                   


Now continuing with the definitions of Peano Count Trees:

For the bit band, Bk1:

1111 1100
1111 1000
1111 1100
1111 1110

1111 1111 
1111 1111
1111 1111
0111 1111

We have P-tree, Pk1 (assuming the above data comes from band-k
and is the high order bit or bit-1 of that band)

Pk1                       55
           ____________//    \\___________   
          /          __/      \_          \       
        16          8           15         16    
               ___//|\         /|\\__ 
              /   / | \       / | \  \
             3   0  4  1     4  4  3  4         
           1110       0010       1101          


There is also Pk,0 for the other possible 1-bit value, 0.
Usually we don't express it because it is so easily derivable
from Pk,1 as the complement:

Pk0                        9
           ____________//    \\___________   
          /          __/      \_          \       
         0          8           1          0     
               ___//|\         /|\\__ 
              /   / | \       / | \  \
             1   4  0  3     0  0  1  0         
           0001       1101       0010          

Single bit P-trees can be built for the other bit positions as well:

Pk2 (using the second high-order bit instead of the 1st).
...
Pk8 (using the 8th high-order bit, i.e., the lowest-order bit).

11110001 10010010 11100011 11010101  10000000 11100101 01111000 00110011
10110001 11010011 11101010 11000001  11100100 00101101 00011110 01010101
11010001 10010010 11100011 11010101  10000000 11100101 01111000 00110011
10010001 11010011 11101010 11000001  11100100 10101101 10011110 01010101

11110001 10010010 11100011 11010101  10001110 11100101 11111000 10110011
10110001 11010011 11101010 11000001  11100110 10101101 10011110 11011101
11010001 10010010 11100011 10011101  10001010 11100101 11111000 10110111
00010001 11010011 11101010 10101001  11101100 10101101 10011110 11010101

Bk2:
1011 0110
0111 1001
1011 0110
0111 1001

1011 0110
0111 1001
1010 0110
0110 1001

Pk2:                           38
           _________________//    \\__________________
          /               __/      \_____             \       
    ____12               8                10  _          8 ________
   /   /  |\        ___//\\____         / | \_\__      /\\____    \
  /   /   | \      /   /  \    \       /  |   \  \    /  \    \    \
 2   4    2  4    2   2    2    2     2   4    2  2  2    2    2    2
1001      1001   0110 1001 0110 1001 1001      1001 1010 0110 1001 0110 1001


Bk3:
1010 0111
1010 1100
0010 0111
0010 1100

1010 0111
1010 1100
0010 0111
0011 1100

Pk3:                           33
           _________________//    \\__________________
          /               __/      \_____             \       
    ____ 6              10                7  _         10 ________
   /   /  |\        ___//\\____         / | \_\__      /\\____    \
  /   /   | \      /   /  \    \       /  |   \  \    /  \    \    \
 2   2    0  2    3   2    3    2     2   2    0  3  3    2    3    2
1010 1010  1010  0111 1100 0111 1100 1010 1010  1011 0111 1100 0111 1100


Bk4:
1101 0011
1100 0011
1101 0011
1100 0011

1101 0011
1100 0011
1101 0011
1100 0011

Pk4:                          36 
           _________________//    \\___________________
          /               __/      \______             \       
    ____ 10              8               _10            8  ________
   /   /  |\        ___//\\____         / | \_\__       /\\____    \
  /   /   | \      /   /  \    \       /  |   \  \     /  \    \    \
 4   1    4  1    0   4    0    4     4   1    4  1   0    4    0    4
     0100  0100                          0100    0100                  

Bk5:
0000 0010
0010 0110
0000 0010
0010 0110

0000 1010
0010 0111
0001 1010
0011 1110

Pk5:                          22 
           _________________//    \\___________________
          /               __/      \______             \       
    ____ 2               6                 4            10 ________
   /   /  |\        ___//\\____         / | \_\__       /\\____    \
  /   /   | \      /   /  \    \       /  |   \  \     /  \    \    \
 0   1    0  1    1   2    1    2     0   1    0  3   2    3    3    2
     0010   0010 0001 1010 0001 1010     0010    0111 1001 1011 1011 1010

Bk6:
0001 0100
0000 1111
0001 0100
0000 1111

0001 1100
0000 1111
0001 0101
0000 1111

Pk6:                          26 
           _________________//    \\___________________
          /               __/      \______             \       
    ____ 2              10                2              12________
   /   /  |\        ___//\\____         / | \_\__       /\\____    \
  /   /   | \      /   /  \    \       /  |   \  \     /  \    \    \
 0   1    0  1    3   2    3    2     0   1    0  1   4    2    3    3
     0100   0100 0111 0011 0111 0011     0100    0100     0011 0111 0111

Bk7:
0110 0001
0110 0010
0110 0001
0110 0010

0110 1001
0110 1010
0110 1001
0110 0010

Pk7:                            27
           ___________________//    \\___________________
          /                 __/      \______             \       
    ____ 8                 4                 8             7 ________
   /   /  |\__        ___//\\____         / | \_\__        /\\____    \
  /   /   |   \      /   /  \    \       /  |   \  \      /  \    \    \
 2   2    2    2    0   2    0    2     2   2    2  2    2    2    1    2
0101 1010 0101 1010   0110     0110  0101 1010 0101 1010 1010 0110 1000 0110


Bk8:
1011 0101
1101 0101
1011 0101
1101 0101

1011 0101
1101 0101
1011 0101
1101 0101

Pk8:                             40
           ___________________//    \\________________________
          /                 __/      \_______                 \       
    ____12                 8                _ 12               8 ________
   /   /  |\__        ___/ /\\____         / | \_\____        /\\____    \
  /   /   |   \      /    /  \    \       /  |   \    \      /  \    \    \
 3   3    3    3    2    2    2    2     3   3    3    3    2    2    2    2
1011 1101 1011 1101 0101 0101 0101 0101 1011 1101 1011 1101 0101 0101 0101 0101


These 8 P-trees are called the "basic P-trees"



The basic P-trees can be combined together to produce other
useful P-trees (including the original data again)

There is an "algebra" on the universe of P-trees for a spatial dataset.
It includes unary operator COMP (complement, sometimes denoted by ')
       and binary operators, AND, OR, XOR, etc.


COMP is done as follows:
   At level-i, replace each count, c, by (4^i - c)

AND is done as follows:
    For Pk,v AND Pk,j: Working down in depth-first order from the root until
        you reach a pure quadrant, Q (pure0 means all 0's & pure1 means all 1's)
    if the Pk,v branch terminates in pure1's at Q, Pk,v AND Pk,j|Q = Pk,j|Q
elseif the Pk,j branch terminates in pure1's at Q, Pk,v AND Pk,j|Q = Pk,v|Q 
elseif either terminates in a quadrant of pure0's at Q, Pk,v AND Pk,j|Q = 0|Q

OR is done as follows:
    For Pk,v OR Pk,j: Working down in depth-first order from the root until
        you reach a pure quadrant, Q (pure0 means all 0's & pure1 means all 1's)
    if the Pk,v branch terminates in pure0's at Q, Pk,v OR Pk,j|Q = Pk,j|Q
elseif the Pk,j branch terminates in pure0's at Q, Pk,v OR Pk,j|Q = Pk,v|Q 
elseif either terminates in a quadrant of pure1's at Q, Pk,v OR Pk,j|Q = 1|Q

XOR is done as follows:
    For Pk,v XOR Pk,j: Working down in depth-first order from the root until
        you reach a pure quadrant, Q (pure0 means all 0's & pure1 means all 1's)
    if the Pk,v branch terminates in pure0's at Q, Pk,v XOR Pk,j|Q = Pk,j|Q
elseif the Pk,j branch terminates in pure0's at Q, Pk,v XOR Pk,j|Q = Pk,v|Q 
elseif the Pk,v branch terminates in pure1's at Q, Pk,v XOR Pk,j|Q = COMP(Pk,j)|Q
elseif the Pk,j branch terminates in pure1's at Q, Pk,v XOR Pk,j|Q = COMP(Pk,v)|Q 
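
The COMP and AND rules above can be sketched in Python as follows.  This is only
an illustration under stated assumptions: the names build, comp and p_and are
hypothetical (not from these notes), each node is a (count, children) pair with
children=None for a pure quadrant, and the recursion runs down to single pixels
rather than stopping at 4-bit leaves.  The data are the Bk5 and Bk6 bit planes
listed above.

```python
BK5 = ["00000010", "00100110", "00000010", "00100110",
       "00001010", "00100111", "00011010", "00111110"]
BK6 = ["00010100", "00001111", "00010100", "00001111",
       "00011100", "00001111", "00010101", "00001111"]

def build(grid, r=0, c=0, size=None):
    """Return (count, children) for the size x size quadrant at (r, c)."""
    if size is None:
        size = len(grid)
    if size == 1:
        return (int(grid[r][c]), None)
    h = size // 2
    # Peano order: upper-left, upper-right, lower-left, lower-right
    kids = [build(grid, r + dr, c + dc, h) for dr in (0, h) for dc in (0, h)]
    count = sum(k[0] for k in kids)
    if count in (0, size * size):       # pure quadrant: no subtree is kept
        return (count, None)
    return (count, kids)

def comp(node, size):
    """COMP: at level i (size = 2^i) replace each count c by 4^i - c."""
    count, kids = node
    flipped = size * size - count
    if kids is None:
        return (flipped, None)
    return (flipped, [comp(k, size // 2) for k in kids])

def p_and(a, b, size):
    """AND: descend until one operand is pure, then copy or zero the quadrant."""
    full = size * size
    if a[0] == 0 or b[0] == 0:          # either side pure0 -> pure0
        return (0, None)
    if a[0] == full:                    # a is pure1 -> result is b on this quadrant
        return b
    if b[0] == full:                    # b is pure1 -> result is a on this quadrant
        return a
    kids = [p_and(x, y, size // 2) for x, y in zip(a[1], b[1])]
    count = sum(k[0] for k in kids)
    return (count, kids if count else None)

p5, p6 = build(BK5), build(BK6)
print(p5[0], [k[0] for k in p5[1]])     # 22 [2, 6, 4, 10]
print(comp(p5, 8)[0])                   # 42
```

OR and XOR follow the same pattern, with the pure0/pure1 cases exchanged as
described above.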


We can construct the P-trees for 2-bit values
       (gives all quadrant counts of "hits" on the particular 2-bit value).

   - Pk,11 (PCT for 11) = Pk1 AND Pk2

   - Pk,01 (PCT for 01) = COMP(Pk1) AND Pk2

   - Pk,10 (PCT for 10) = Pk1 AND COMP(Pk2)

   - Pk,00 (PCT for 00) = COMP(Pk1) AND COMP(Pk2)

and for 3-bit values, 4-bit values, etc.

     Pk,11 = Pk,1  AND Pk,2
     Pk,01 = Pk,1' AND Pk,2
     Pk,10 = Pk,1  AND Pk,2'
     Pk,00 = Pk,1' AND Pk,2'
     Pk,111= Pk,1  AND Pk,2  AND Pk,3           = Pk,11   AND Pk,3  
     Pk,101= Pk,1  AND Pk,2' AND Pk,3           = Pk,10   AND Pk,3  
     Pk,011= Pk,1' AND Pk,2  AND Pk,3           = Pk,01   AND Pk,3  
     Pk,001= Pk,1' AND Pk,2' AND Pk,3           = Pk,00   AND Pk,3  
     Pk,110= Pk,1  AND Pk,2  AND Pk,3'          = Pk,11   AND Pk,3'
     Pk,100= Pk,1  AND Pk,2' AND Pk,3'          = Pk,10   AND Pk,3' 
     Pk,010= Pk,1' AND Pk,2  AND Pk,3'          = Pk,01   AND Pk,3' 
     Pk,000= Pk,1' AND Pk,2' AND Pk,3'          = Pk,00   AND Pk,3' 
     Pk,1111=Pk,1  AND Pk,2  AND Pk,3  AND Pk,4 = Pk,111  AND Pk,4
     Pk,1011=Pk,1  AND Pk,2' AND Pk,3  AND Pk,4 = Pk,101  AND Pk,4
     Pk,0111=Pk,1' AND Pk,2  AND Pk,3  AND Pk,4 = Pk,011  AND Pk,4
     Pk,0011=Pk,1' AND Pk,2' AND Pk,3  AND Pk,4 = Pk,001  AND Pk,4
     Pk,1101=Pk,1  AND Pk,2  AND Pk,3' AND Pk,4 = Pk,110  AND Pk,4
     Pk,1001=Pk,1  AND Pk,2' AND Pk,3' AND Pk,4 = Pk,100  AND Pk,4
     Pk,0101=Pk,1' AND Pk,2  AND Pk,3' AND Pk,4 = Pk,010  AND Pk,4
     Pk,0001=Pk,1' AND Pk,2' AND Pk,3' AND Pk,4 = Pk,000  AND Pk,4
     Pk,1110=Pk,1  AND Pk,2  AND Pk,3  AND Pk,4'= Pk,111  AND Pk,4'
     Pk,1010=Pk,1  AND Pk,2' AND Pk,3  AND Pk,4'= Pk,101  AND Pk,4'
     Pk,0110=Pk,1' AND Pk,2  AND Pk,3  AND Pk,4'= Pk,011  AND Pk,4'
     Pk,0010=Pk,1' AND Pk,2' AND Pk,3  AND Pk,4'= Pk,001  AND Pk,4'
     Pk,1100=Pk,1  AND Pk,2  AND Pk,3' AND Pk,4'= Pk,110  AND Pk,4'
     Pk,1000=Pk,1  AND Pk,2' AND Pk,3' AND Pk,4'= Pk,100  AND Pk,4'
     Pk,0100=Pk,1' AND Pk,2  AND Pk,3' AND Pk,4'= Pk,010  AND Pk,4'
     Pk,0000=Pk,1' AND Pk,2' AND Pk,3' AND Pk,4'= Pk,000  AND Pk,4'
  . . .
Pk,00000000=Pk,1' & Pk,2' & Pk,3' & Pk,4' & Pk,5' & Pk,6' & Pk,7' & Pk,8'
  . . .
    

Actual storage might be done as: (assume Ln = root, ie, 4^n pixels)


Breadth-first layout is a structure with n+1 elements (one for each level).

Ln:     a 1+2*n bit field (to hold counts up to 4^n)

L(n-1): a 1+2*(n-1) bit field for each of the L(n-1)-quadrants if the root
                      is not pure ("mixed" root)
L(n-2): a 1+2*(n-2) bit field for each of the L(n-2)-quadrants whose L(n-1)
   . . .              parent is not pure ("mixed" parent)
Lk:     a 1+2*k     bit field for each of the Lk-quadrants whose L(k+1) parent
   . . .              is not pure ("mixed" parent)
L1:     a 1+2*1=3   bit field for each of the L1-quadrants whose L2 parent
                      is not pure ("mixed" parent)
L0:     a 1+2*0=1   bit field for each of the L0-quadrants whose
                      L1 parent is not pure ("mixed" parent)


Depth-first layout:
a 1+2*n bit field for the root-count. If the root is not pure, it is followed by
a 1+2*(n-1) bit field for the 0th L(n-1)-quadrant.
  if it is pure, a 1+2*(n-1) bit field for the 1st L(n-1)-quadrant,
     if it is pure, a 1+2*(n-1) bit field for the 2nd L(n-1)-quadrant,
        if it is pure, a 1+2*(n-1) bit field for the 3rd L(n-1)-quadrant,
        else a 1+2*(n-2) bit field for its 0th L(n-2)-quadrant,
           if it is pure, a 1+2*(n-2) bit field for the 1st L(n-2)-quadrant,
        ...
     else a 1+2*(n-2) bit field for its 0th L(n-2)-quadrant,
     ...

We will use breadth-first layout.

Then we actually store:
Pk,1                      55                      L3
           ____________//    \\___________   
          /          __/      \_          \       
        16          8           15         16     L2
               ___//|\         /|\\__ 
              /   / | \       / | \  \
             3   0  4  1     4  4  3  4           L1
           1110       0010       1101             L0
as:
0110111
10000 01000 01111 10000
011 000 100 001 100 100 011 100
1110 0010 1101
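
As a check on the layout above, here is a minimal Python sketch that emits the
breadth-first bit fields level by level.  The helper names are hypothetical, and
the 8x8 bit plane BK1 is an assumption: it was reconstructed from the Pk,1 tree
drawn above (counts plus L0 nibbles), not copied from the original data.

```python
# 8x8 bit plane reconstructed from the Pk,1 tree above (an assumption).
BK1 = ["11111100", "11111000", "11111100", "11111110",
       "11111111", "11111111", "11111111", "01111111"]

def build(grid, r=0, c=0, size=None):
    """Return (count, children); children is None for pure quadrants."""
    if size is None:
        size = len(grid)
    if size == 1:
        return (int(grid[r][c]), None)
    h = size // 2
    kids = [build(grid, r + dr, c + dc, h) for dr in (0, h) for dc in (0, h)]
    count = sum(k[0] for k in kids)
    return (count, None if count in (0, size * size) else kids)

def breadth_first_bits(root, n):
    """One string per level Ln..L0; each count uses a 1+2*level bit field."""
    lines, current = [], [root]
    for level in range(n, -1, -1):
        width = 1 + 2 * level
        fields = [format(count, "0{}b".format(width)) for count, _ in current]
        if level == 0:
            # show the four 1-bit fields of each mixed L1 parent as one nibble
            fields = ["".join(fields[i:i + 4]) for i in range(0, len(fields), 4)]
        lines.append(" ".join(fields))
        # only mixed nodes contribute children to the next level
        current = [kid for _, kids in current if kids for kid in kids]
    return lines

for line in breadth_first_bits(build(BK1), 3):
    print(line)
```

The printed lines should match the four stored lines shown above.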


Next we note that there is an even more concise storage method
using this same breadth-first layout.  Instead of storing quadrant 1-counts we
can simply store a "purity indicator".  Then the counts can be quickly reconstructed
from the purity mask tree (PMT) structure.  At each node in the PMT,
we will use 3-value logic: 11=pure1; 00=pure0; and 01=mixed quadrants,
except at Level-0, where there are no mixed quadrants, so we can use
 1=pure1 and 0=pure0.

PMTk1                     01                      L3
           ____________//    \\___________   
          /          __/      \_          \       
         11         01          01        11      L2
               ___//|\         /|\\__ 
              /   / | \       / | \  \
            01   00 11 01   11 11 01  11          L1
           1110       0010       1101             L0
store as:
01
11 01 01 11
01 00 11 01  11 11 01 11
1110 0010 1101

(for human understanding we can replace 2-bit symbols with
 1-char symbols: 1=pure1; 0=pure0; m=mixed):

PMTk1                      m                      L3
           ____________//    \\___________   
          /          __/      \_          \       
         1          m           m          1      L2
               ___//|\         /|\\__ 
              /   / | \       / | \  \
             m   0  1  m     1  1  m  1           L1
           1110       0010       1101             L0
store as:
m
1 m m 1
m 0 1 m  1 1 m 1
1110 0010 1101
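
A minimal Python sketch of the purity mask tree, under the same assumptions as
before: the function names are hypothetical and BK1 is the bit plane
reconstructed from the Pk,1 tree above.  A node is '0', '1', or a list of four
children (mixed).  The count function illustrates how 1-counts can be recovered
from the purity symbols alone.

```python
# 8x8 bit plane reconstructed from the Pk,1 tree above (an assumption).
BK1 = ["11111100", "11111000", "11111100", "11111110",
       "11111111", "11111111", "11111111", "01111111"]

def pmt(grid, r=0, c=0, size=None):
    """Return '0' (pure0), '1' (pure1) or a list of four child PMTs (mixed)."""
    if size is None:
        size = len(grid)
    if size == 1:
        return grid[r][c]
    h = size // 2
    kids = [pmt(grid, r + dr, c + dc, h) for dr in (0, h) for dc in (0, h)]
    if all(k == "0" for k in kids):
        return "0"
    if all(k == "1" for k in kids):
        return "1"
    return kids

def pmt_levels(root, n):
    """Breadth-first lines Ln..L0 using the 1-char symbols 1, 0 and m."""
    lines, current = [], [root]
    for level in range(n, -1, -1):
        syms = ["m" if isinstance(node, list) else node for node in current]
        if level == 0:
            # group the four L0 bits of each mixed L1 parent into one nibble
            syms = ["".join(syms[i:i + 4]) for i in range(0, len(syms), 4)]
        lines.append(" ".join(syms))
        current = [kid for node in current if isinstance(node, list) for kid in node]
    return lines

def count(node, size):
    """Recover the 1-count of a quadrant from the purity mask tree."""
    if node == "0":
        return 0
    if node == "1":
        return size * size
    return sum(count(kid, size // 2) for kid in node)

tree = pmt(BK1)
print(pmt_levels(tree, 3))
print(count(tree, 8))       # 55, the root count of Pk,1
```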




Before going further we will note here that we actually AND
 two of these using a depth-first AND algorithm on these breadth-first layouts:

PMTk1:  m
1 m m 1
m 0 1 m  1 1 m 0
1110 0010 1101

PMTk2:  m
m 1 m 0
0 0 m 1  m m m 0
1101 0110 0101 1110

then the PMTk1 AND PMTk2: root is m




Next, descend depth-first to:
v                      PMTk1
1 m m 1
m 0 1 m  1 1 m 0
1110 0010 1101

v                      PMTk2
m 1 m 0
0 0 m 1  m m m 0
1101 0110 0101 1110

Since quadrant (PMTk1)0 is pure1, (PMTk1 AND PMTk2)0 is
      (PMTk2)0 (the part of PMTk2 to the left of the line):
m|1 m 0
 |______
0 0 m 1 | m m m 0
    ____|
1101| 0110 0101 1110

Thus, so far,
PMTk1 AND PMTk2 has root, m, and lower levels:
m      
0 0 m 1
1101



Next, descend depth-first to:
  v                    PMTk1
1|m m 1
_|
m 0 1 m  1 1 m 0

1110 0010 1101

  v                    PMTk2
m|1 m 0
 |______
0 0 m 1 | m m m 0
    ____|
1101| 0110 0101 1110

Since quadrant (PMTk2)1 is pure1, (PMTk1 AND PMTk2)1 is
      (PMTk1)1 (the part of PMTk1 between the lines):
  v                    PMTk1
1|m|m 1
_| |____
m 0 1 m |1 1 m 0
        |_
1110 0010 | 1101

Thus, so far,
PMTk1 AND PMTk2 has root, m, and lower levels:
m m      
0 0 m 1  m 0 1 m
1101 1110 0010




Next, descend depth-first to:
    v                  PMTk1
1 m|m 1
   |____
m 0 1 m |1 1 m 0
        |_
1110 0010 | 1101

    v                  PMTk2
m 1|m 0
   |____
0 0 m 1 | m m m 0
    ____|
1101| 0110 0101 1110

Since both are mixed, install m in PMTk1 AND PMTk2 and then descend another level:

1 m|m 1               PMTk1
   |____ v
m 0 1 m |1 1 m 0
        |_
1110 0010 | 1101


m 1|m 0               PMTk2
   |____  v
0 0 m 1 | m|m m 0
    ____|  |
1101| 0110 |0101 1110

Since quadrant (PMTk1)2.0 is pure1, (PMTk1 AND PMTk2)2.0 is
      (PMTk2)2.0 (the part of PMTk2 between the lines):

Thus, so far,
PMTk1 AND PMTk2 has root, m, and lower levels:
m m m    
0 0 m 1   m 0 1 m   m
1101 1110 0010 0110




Next, descend depth-first:
1 m|m 1               PMTk1
   |______ v
m 0 1 m  1|1 m 0
          |
1110 0010 | 1101


m 1|m 0               PMTk2
   |_______ v
0 0 m 1   m|m|m 0
           | |__
1101  0110 |0101| 1110

Since quadrant (PMTk1)2.1 is pure1, (PMTk1 AND PMTk2)2.1 is
      (PMTk2)2.1 (the part of PMTk2 between the lines):

Thus, so far,
PMTk1 AND PMTk2 has root, m, and lower levels:
m m m    
0 0 m 1   m 0 1 m   m m
1101 1110 0010 0110 0101




Next, descend depth-first:
1 m|m 1               PMTk1
   |________ v
m 0 1 m  1 1|m 0
            |
1110 0010   |1101


m 1|m 0               PMTk2
   |_________ v
0 0 m 1   m m|m 0
             |__
1101  0110  0101| 1110

Since both are mixed, install m and descend
 (which at L0 is just to AND the nibbles):
1 m|m 1               PMTk1
   |________  
m 0 1 m  1 1|m 0
            |v
1110 0010   |1101


m 1|m 0               PMTk2
   |_________  
0 0 m 1   m m|m 0
             |__  v
1101  0110  0101| 1110

Thus, so far,
PMTk1 AND PMTk2 has root, m, and lower levels:
m m m    
0 0 m 1   m 0 1 m   m m m
1101 1110 0010 0110 0101 1100





Next, descend depth-first:
1 m|m 1               PMTk1
   |__________ v
m 0 1 m  1 1 m|0
              |___
1110 0010    1101 |


m 1|m 0               PMTk2
   |___________ v
0 0 m 1   m m m|0
               |______ 
1101  0110  0101  1110|

Since quadrant (PMTk1)2.3 is pure0, (PMTk1 AND PMTk2)2.3 is pure0.
Ascending to L2, quadrant (PMTk2)3 is pure0, so (PMTk1 AND PMTk2)3 is pure0 as well.

Thus,
PMTk11 = PMTk1 AND PMTk2 has root, m, and lower levels:
m m m 0
0 0 m 1   m 0 1 m   m m m 0
1101 1110 0010 0110 0101 1100
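
The walk-through above can be reproduced with a short Python sketch.  All names
(parse, pmt_and, levels_of) are hypothetical.  The sketch reads the breadth-first
lines into a nested tree, ANDs depth-first, and writes the result back out
breadth-first.  One added assumption: a mixed AND mixed quadrant whose children
all come out pure0 is collapsed to pure0, a case that does not arise in this
example.

```python
def parse(levels):
    """levels: breadth-first lines, root first, L0 nibbles last."""
    its = [iter(line.split()) for line in levels]
    last = len(levels) - 1
    def node(tok, depth):
        if tok != "m":
            return tok                      # '0' or '1'
        if depth + 1 == last:
            return next(its[last])          # mixed L1 node: its 4-bit nibble
        kids = [next(its[depth + 1]) for _ in range(4)]
        return [node(k, depth + 1) for k in kids]
    return node(next(its[0]), 0)

def pmt_and(a, b):
    if a == "0" or b == "0":
        return "0"
    if a == "1":
        return b                            # copy the other operand's subtree
    if b == "1":
        return a
    if isinstance(a, str):                  # two L0 nibbles: AND bit by bit
        bits = "".join("1" if x == y == "1" else "0" for x, y in zip(a, b))
        return "0" if bits == "0000" else bits
    kids = [pmt_and(x, y) for x, y in zip(a, b)]
    return "0" if all(k == "0" for k in kids) else kids

def levels_of(tree, height):
    out = [[] for _ in range(height)]
    def walk(n, d):
        if isinstance(n, list):             # mixed internal node
            out[d].append("m")
            for k in n:
                walk(k, d + 1)
        elif len(n) == 4:                   # mixed L1 node with its nibble
            out[d].append("m")
            out[d + 1].append(n)
        else:                               # pure node
            out[d].append(n)
    walk(tree, 0)
    return [" ".join(line) for line in out]

PMTK1 = ["m", "1 m m 1", "m 0 1 m  1 1 m 0", "1110 0010 1101"]
PMTK2 = ["m", "m 1 m 0", "0 0 m 1  m m m 0", "1101 0110 0101 1110"]
for line in levels_of(pmt_and(parse(PMTK1), parse(PMTK2)), 4):
    print(line)
```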










Implementation notes:
  - The lines can be replaced by pointers or cursors
  - One can view this entirely in terms of shifting cells
        from one of the operands to the result:
v
1 m m 1                PMTk1
m 0 1 m  1 1 m 0
1110 0010 1101
v
m 1 m 0                PMTk2
0 0 m 1  m m m 0
1101 0110 0101 1110

Since quadrant (PMTk1)0 is pure1, (PMTk11)0 is
      (PMTk2)0 (shift subtree to the left of the line to result)
m|1 m 0
 |______
0 0 m 1 | m m m 0
    ____|
1101| 0110 0101 1110

Thus, so far, PMTk11 is:
m      
0 0 m 1
1101



Next, shift from PMTk1.1 to PMTk11.1 (since PMTk2.1 is pure1)
v
m|m 1                  PMTk1
 |______
m 0 1 m |1 1 m 0
        |_
1110 0010 |1101

v
1 m 0                  PMTk2
m m m 0
0110 0101 1110

Thus, so far, PMTk11 is:
m m      
0 0 m 1  m 0 1 m
1101 1110 0010




Next, shift m to PMTk11.2 from both and descend (since both PMTki.2 are mixed)
v
m 1                  PMTk1
1 1 m 0
1101

v
m 0                  PMTk2
m m m 0
0110 0101 1110

Thus, so far, PMTk11 is:
m m m    
0 0 m 1  m 0 1 m
1101 1110 0010




Next, shift from PMTk2.2.0 to PMTk11.2.0 (since PMTk1.2.0 is pure1)
1                  PMTk1
v
1 1 m 0
1101

0                  PMTk2
v
m|m m 0
 |__
0110|0101 1110

Thus, so far, PMTk11 is:
m m m     
0 0 m 1  m 0 1 m  m
1101 1110 0010 0110




Next, shift from PMTk2.2.1 to PMTk11.2.1 (since PMTk1.2.1 is pure1)
1                  PMTk1
v
1 m 0
1101

0                  PMTk2
v
m| m 0
 |__
0101|1110

Thus, so far, PMTk11 is:
m m m     
0 0 m 1  m 0 1 m  m m
1101 1110 0010 0110 0101




Next, shift m from both to PMTk11.2.2 and descend (since both PMTki.2.2 are mixed)
(since the descent is to L0, AND the nibbles)
1                  PMTk1
v
m 0
1101

0                  PMTk2
v
m 0
1110

Thus, so far, PMTk11 is:
m m m     
0 0 m 1  m 0 1 m  m m m
1101 1110 0010 0110 0101 1100




Next, shift 0 from PMTk1.2.3 to PMTk11.2.3 (since PMTk1.2.3 is pure0)
1                  PMTk1
v
0


0                  PMTk2
v
0


Thus, so far, PMTk11 is:
m m m     
0 0 m 1  m 0 1 m  m m m 0
1101 1110 0010 0110 0101 1100



Ascend to L2 (quadrant 2 is finished).  Next, shift 0 from PMTk2.3 to PMTk11.3 (since PMTk2.3 is pure0)
v
1                  PMTk1

v
0                  PMTk2


Thus, so far, PMTk11 is:
m m m 0
0 0 m 1  m 0 1 m  m m m 0
1101 1110 0010 0110 0101 1100


*******************************************
A final storage arrangement is uncompressed PMTs using 4-value logic
and breadth-first layout (called PMTbr for "breadth-first and redundant"):

00=pure0 run
11=pure1 run
01=mixed run
10=uncompressed bit segment

For human readability we will enclose runlengths in:
()  for pure0 run
[]  for pure1 run
{}  for mixed run
<>  for uncompressed segment


PMT11:
m                       L5
1mm0                    L4
01mm 001m               L3
0m10 0010 1m10          L2
01m0 0100               L1
0011                    L0

becomes
PMTbr11:
{1}                   
[1]  {2}                                                       (1)
[4]  (1) [1] {2}                         (2)  [1] {1}          (4)
[16] (4) [4] (1) {1}         [1] (3) [1] (9)  [5] {1}      [1] (17)
[64] (16)[16](5) [1]{1}   (1)[4] (12)[4] (36) [20](1)[1](2)[4] (68)
[256](64)[64](20)[4](2)[2](4)[16](48)[16](144)[80](4)[4](8)[16](272)

and

PMTbr12:
{1}                   
{1}                      (1)  {1}       [1]
[1] (2)  {1}             (4)  [2]  (2)  [4]
[4] (9)  {2}         [1] (16) [8]  (8)  [16]
[16](38) [1]{3}      [6] (64) [32] (32) [64]
[64](152)[6](5)[4](1)[24](256)[128](128)[256]


ANDing these:
{1} & {1}                                        ={1}

[1]{2}(1) & {1}(1){1}[1]                         ={1}(1){1}(1)

[4](1)[1]{2}(2)[1]{1}(4) & [1](2){1}(4)[2](2)[4] =[1](2){1}(12)

[16](4)[4](1){1}[1](3)[1](9)[5]{1}[1](17) &
[4](9){2}[1](16)[8](8)[16]                       =[4](9){2}[1](48)

[64](16)[16](5)[1]{1}(1)[4](12)[4](36)[20](1)[1](2)[4](68) &
[16](38)[1]{3}[6](64)[32](32)[64]                =[16](38)[1]{3}[6](192)

[256](64)[64](20)[4](2)[2](4)[16](48)[16](144)[80](4)[4](8)[16](272) &
[64](152)[6](5)[4](1)[24](256)[128](128)[256]    =[64](152)[6](5)[4](1)[24](768)



ANDing and ORing are as above (except that OR with purei is just the reverse of
AND with purei)
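
Because every PMTbr level is stored at full width, the AND can be done one level
at a time, position by position.  The following Python sketch (function names
are hypothetical) expands the run-length notation, combines the symbols, and
re-encodes the result.  As in the lines above, mixed AND mixed is recorded as
mixed, even where the levels below turn out to be all 0.

```python
import re
from itertools import groupby

OPEN = {"(": "0", "[": "1", "{": "m"}
WRAP = {"0": "({})", "1": "[{}]", "m": "{{{}}}"}

def expand(line):
    """'[4](1){2}' -> ['1','1','1','1','0','m','m']"""
    out = []
    for bracket, n in re.findall(r"([(\[{])\s*(\d+)\s*[)\]}]", line):
        out.extend(OPEN[bracket] * int(n))
    return out

def encode(symbols):
    """Inverse of expand: collapse equal neighbours into run lengths."""
    return "".join(WRAP[s].format(len(list(g))) for s, g in groupby(symbols))

def and_sym(a, b):
    if a == "0" or b == "0":
        return "0"
    if a == "1":
        return b
    return a if b == "1" else "m"

def and_level(x, y):
    return encode(and_sym(a, b) for a, b in zip(expand(x), expand(y)))

print(and_level("[4](1)[1]{2}(2)[1]{1}(4)", "[1](2){1}(4)[2](2)[4]"))
# [1](2){1}(12)
```

For OR, the roles of "0" and "1" in and_sym would be exchanged; for COMP, the
"0" and "1" symbols would be swapped and "m" left alone.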


To complement, COMP(PMT): swap () and []



To extract a subquadrant, say, qid 0.0.2.3, from PMT12:
{1}                                             L5
{1}                      (1)  {1}       [1]     L4
[1] (2)  {1}             (4)  [2]  (2)  [4]     L3
[4] (9)  {2}         [1] (16) [8]  (8)  [16]    L2
[16](38) [1]{3}      [6] (64) [32] (32) [64]    L1
[64](152)[6](5)[4](1)[24](256)[128](128)[256]   L0

cut final 3 from L4, final 12 from L3, final 48 from L2, final 192 from L1,
final 768 from L0 (due to 0.0 qid segment)
{1}                                             L5
{1}                                             L4
[1] (2)  {1}                                    L3
[4] (9)  {2}         [1]                        L2
[16](38) [1]{3}      [6]                        L1
[64](152)[6](5)[4](1)[24]                       L0

cut initial 2 from L3, initial 8 from L2, initial 32 from L1,
initial 128 from L0 and cut final 1 from L3, final 4 from L2,
final 16 from L1, final 64 from L0 (due to 0.0.2 qid segment)
    (1)                                         L3
    (4)                                         L2
    (16)                                        L1
    (64)                                        L0

cut initial 3 from L2, initial 12 from L1,
initial 48 from L0 (due to 0.0.2.3 qid segment)
    (1)                                         L2
    (4)                                         L1
    (16)                                        L0
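
The cuts above amount to slicing each level at a computed offset.  A minimal
Python sketch follows, with hypothetical names and one assumption about the
notation: the leading 0 of the qid denotes the root, so qid 0.0.2.3 is passed
as the child path [0, 2, 3].

```python
import re
from itertools import groupby

OPEN = {"(": "0", "[": "1", "{": "m"}
WRAP = {"0": "({})", "1": "[{}]", "m": "{{{}}}"}

def expand(line):
    out = []
    for bracket, n in re.findall(r"([(\[{])\s*(\d+)\s*[)\]}]", line):
        out.extend(OPEN[bracket] * int(n))
    return out

def encode(symbols):
    return "".join(WRAP[s].format(len(list(g))) for s, g in groupby(symbols))

def extract(levels, path):
    """levels: PMTbr lines, root level first; path: child indices below the root."""
    index = 0
    for q in path:
        index = index * 4 + q              # position of the quadrant on its own level
    out = []
    for i, line in enumerate(levels[len(path):]):
        width = 4 ** i                     # the subtree widens by 4 per level
        out.append(encode(expand(line)[index * width:(index + 1) * width]))
    return out

PMTBR12 = [
    "{1}",
    "{1} (1) {1} [1]",
    "[1] (2) {1} (4) [2] (2) [4]",
    "[4] (9) {2} [1] (16) [8] (8) [16]",
    "[16](38) [1]{3} [6] (64) [32] (32) [64]",
    "[64](152)[6](5)[4](1)[24](256)[128](128)[256]",
]
print(extract(PMTBR12, [0, 2, 3]))         # ['(1)', '(4)', '(16)']
```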



*****************************************
A simpler example (with only 16 pixels but 4-bit values):
X-Y  B1   B2   B3   B4
0,0 0011 0111 1000 1011
0,1 0011 0011 1000 1111
0,2 0111 0011 0100 1011
0,3 0111 0010 0101 1011
1,0 0011 0111 1000 1011
1,1 0011 0011 1000 1011
1,2 0111 0011 0100 1011
1,3 0111 0010 0101 1011
2,0 0010 1011 1000 1111
2,1 0010 1011 1000 1111
2,2 1010 1010 0100 1011
2,3 1111 1010 0100 1011
3,0 0010 1011 1000 1111
3,1 1010 1011 1000 1111
3,2 1111 1010 0100 1011
3,3 1111 1010 0100 1011

B11  B12  B13  B14
0000 0011 1111 1111
0000 0011 1111 1111
0011 0001 1111 0001
0111 0011 1111 0011


P1,1                               The "basic"    P1,2      P1,3      P1,4
5                                datastructures   7         16        11
0 0 1 4                                           0 4 0 3             4 4 0 3
0001                                              0111                0111


P1,00     P1,01     P1,10      P1,11
7         4         2          3      
4 0 3 0   0 4 0 0   0 0 1 1    0 0 0 3 
1110                0001 1000  0111


P1,000   P1,010    P1,100    P1,110    P1,001    P1,011    P1,101     P1,111
0        0         0         0         7         4         2          3      
                                       4 0 3 0   0 4 0 0   0 0 1 1    0 0 0 3 
                                       1110                0001 1000  0111   


P1,0000  P1,0100   P1,1000   P1,1100   P1,0010   P1,0110   P1,1010   P1,1110
0        0         0         0         3         0         2         0      
                                       0 0 3 0             0 0 1 1        
                                       1110                0001 1000    

P1,0001  P1,0101   P1,1001   P1,1101   P1,0011   P1,0111   P1,1011   P1,1111
0        0         0         0         4         4         0         3      
                                       4 0 0 0   0 4 0 0             0 0 0 3 
                                                                     0111  
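
The band-1 root counts above can be checked with a short Python sketch.  It is
an illustration only: flat 16-bit masks stand in for the P-trees (so only root
counts are produced, not quadrant counts), and the function names are
hypothetical.

```python
# Band B1 of the 16-pixel example, in raster order.
B1 = ("0011 0011 0111 0111 0011 0011 0111 0111 "
      "0010 0010 1010 1111 0010 1010 1111 1111").split()
FULL = (1 << len(B1)) - 1

def basic(values, j):
    """Mask of pixels whose bit j is 1 (j=1 is the high-order bit); stands in for P1,j."""
    return sum(1 << i for i, v in enumerate(values) if v[j - 1] == "1")

def comp(mask):
    return FULL & ~mask

def value_tree(values, bits):
    """AND of the basic masks, complemented where the wanted bit is 0."""
    m = FULL
    for j, bit in enumerate(bits, start=1):
        p = basic(values, j)
        m &= p if bit == "1" else comp(p)
    return m

def root_count(mask):
    return bin(mask).count("1")

print([root_count(basic(B1, j)) for j in (1, 2, 3, 4)])                     # [5, 7, 16, 11]
print([root_count(value_tree(B1, v)) for v in ("00", "01", "10", "11")])    # [7, 4, 2, 3]
```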


B21  B22  B23  B24
0000 1000 1111 1110 
0000 1000 1111 1110    
1111 0000 1111 1100  
1111 0000 1111 1100 
                   
P2,1                                              P2,2      P2,3      P2,4
8                                                 2         16        10
0 0 4 4                                           2 0 0 0   4 4 4 4   4 2 4 0
                                                  1010                1010   
P2,00     P2,01     P2,10     P2,11
6         2         8         0      
2 4 0 0   2 0 0 0   0 0 4 4               
0101      1010


P2,000    P2,010    P2,100    P2,110    P2,001    P2,011    P2,101    P2,111
0         0         0         0         6         2         8         0      
                                        2 4 0 0   2 0 0 0   0 0 4 4          
                                        0101      1010    


P2,0000  P2,0100   P2,1000   P2,1100   P2,0010    P2,0110   P2,1010   P2,1110
0        0         0         0         2          0         4         0      
                                       0 2 0 0              0 0 0 4          
                                       0101                                  

P2,0001  P2,0101   P2,1001   P2,1101   P2,0011    P2,0111   P2,1011   P2,1111
0        0         0         0         4          2         4         0      
                                       2 2 0 0    2 0 0 0   0 0 4 0          
                                       0101 1010  1010



B31  B32  B33  B34
1100 0011 0000 0001                           
1100 0011 0000 0001                         
1100 0011 0000 0000                        
1100 0011 0000 0000                       
P3,1                                              P3,2      P3,3      P3,4
8                                                 8         0         2 
4 0 4 0                                           0 4 0 4             0 2 0 0
                                                                      0101
P3,00     P3,01     P3,10     P3,11
0         8         8         0      
          0 4 0 4   4 0 4 0               


P3,000    P3,010    P3,100    P3,110    P3,001    P3,011    P3,101    P3,111
0         8         8         0         0         0         0         0      
          0 4 0 4   4 0 4 0                                                  



P3,0000   P3,0100   P3,1000   P3,1100   P3,0010   P3,0110   P3,1010   P3,1110
0         6         8         0         0         0         0         0      
          0 2 0 4   4 0 4 0                                                  
          1010                                                               

P3,0001   P3,0101   P3,1001   P3,1101   P3,0011   P3,0111   P3,1011   P3,1111
0         2         0         0         0         0         0         0      
          0 2 0 0                                                            
          0101



B41  B42  B43  B44
1111 0100 1111 1111         
1111 0000 1111 1111          
1111 1100 1111 1111            
1111 1100 1111 1111             

P4,1                                             P4,2      P4,3      P4,4
16                                               5         16        16
                                                 1 0 4 0                    
                                                 0100
P4,00     P4,01     P4,10     P4,11
0         0         11        5      
                    3 4 0 4   1 0 4 0     
                    1011      0100


P4,000    P4,010    P4,100    P4,110    P4,001    P4,011    P4,101    P4,111
0         0         0         0         0         0         11        5      
                                                            3 4 0 4   1 0 4 0
                                                            1011      0100


P4,0000   P4,0100   P4,1000   P4,1100   P4,0010   P4,0110   P4,1010   P4,1110
0         0         0         0         0         0         0         0      
                                                                             
                                                                             
P4,0001   P4,0101   P4,1001   P4,1101   P4,0011   P4,0111   P4,1011   P4,1111
0         0         0         0         0         0         11        5      
                                                            3 4 0 4   1 0 4 0
                                                            1011      0100




How can these P-trees be used in Apriori Association Rule Mining?


The Apriori Algorithm for discovering all frequent itemsets:

First determine all frequent itemsets:
     start with 1-itemsets,
     then only consider unions of frequent 1-itemsets as candidate 2-itemsets, etc.

Next, for each frequent itemset found, search for the high-confidence rules it supports,
     by trying all 1-item consequents first (so that the antecedent is maximal size),
     then try 2-item consequents, but only those that are the union of 1-item consequents
          of high-confidence rules.  etc.


Assume that B1 is Yield, B2=blue, B3=green, B4=Red

Spatial Data mining

I={(b,v)|b=band=1..n, v=value=(1-bit or 2-bit or ...or 8 bit)}

T={pixels}

Admissible Itemsets (Asets)= {Int1 x Int2 x ... x Intn}
  where Inti is an interval in Band-i (some may be the entire band).

Modeled on Apriori, we first find all frequent itemsets.
  - pruned by specifying "restricted interest" (e.g., If Bn=Yield, user
    may wish to restrict attention to those Asets for which Intn is not all
    of Bn.  At the 1-bit value level in the value concept hierarchy,
    this means either y<128 or y>=128.  Then the user may want to restrict
    interest to those rules for which the consequent is Intn only)

For a frequent Aset, B=PROD(i=1..n)[Inti], rules are formed by partitioning
{1..n} into two disjoint sets, {i1..im} and {j1..jq} (note q+m=n)
and forming A=>C, where A=PROD(k=1..m)[Intik] and C=PROD(k=1..q)[Intjk]


As noted above, users may be interested only in rules where {j1..jq}
is a certain subset (such as {n}, above - then there is just one rule
of interest for each frequent set found and it must be checked as to
whether it is high-confidence or not).  If this is the case we further
restrict our definition of Asets to only those itemsets.

  - For restricted interest above, q=1, and C=Int1 (in the Yield band)



For rule, A=>C above, the support is the support of B and thus is the
count of pixels, p, such that p(i) is in all Inti, i=1..n.

The confidence of a rule, A=>C, is its support divided by the support of A
(supp(A) is the count of pixels, p, such that p(i) is in all Intik, k=1..m).

   - In restricted interest case, with B1=Yield B2=blue, B3=green, B4=red,
     we need to calculate supp(B)=supp(Int1xInt2xInt3xInt4); calculate
     supp(A)=supp(Int2xInt3xInt4).

    If supp(B) >= minsup  and  supp(B)/supp(A) >= minconf,
    then  A=>C is a strong rule.



A k-band Aset (kAset) is an Aset in which k of the Inti intervals are non-full
(i.e., in k of the bands the intervals are restricted - i.e., not the fully
 unrestricted intervals of all possible values)

We start by finding all frequent 1Asets.

Then the candidate 2Asets are those whose every 1Aset subset is frequent.
Etc., the candidate kAsets are those whose every (k-1)Aset subset is frequent.
That's the main pruning technique.


Next we look for a pruning technique based on the value concept hierarchy.
If we find all 1-bit frequent kAsets first, we can use the fact that a
2-bit kAset cannot be frequent if its "enclosing" 1-bit kAset is infrequent.

A 1-bit Aset "encloses" a 2-bit Aset if when the endpoints of the 2-bit Aset
are shifted right 1-bit position, it is a subset of the 1-bit Aset, etc.                                  




Assume minsupp=50% (at least 8 of the 16 pixels) and minconf=50%

1. FIND ALL FREQUENT 1Asets.
    for 1-bit values
       for B1
           2 possibilities for Int1: [1,1] and [0,0]
P1,1
5
0 0 1 4
0001
           supp([1,1]1)=5  not frequent
           supp([0,0]1)=11 frequent
 
       for B2
           2 possibilities for Int2: [1,1] and [0,0]
P2,1
8
0 0 4 4
           supp([1,1]2)=8  frequent
           supp([0,0]2)=8  frequent
 
       for B3
           2 possibilities for Int3: [1,1] and [0,0]
P3,1
8
4 0 4 0
           supp([1,1]3)=8  frequent
           supp([0,0]3)=8  frequent

       for B4
           2 possibilities for Int4: [1,1] and [0,0]
P4,1
16
           supp([1,1]4)=16 frequent
           supp([0,0]4)=0  not frequent


1L1 (1-bit value frequent 1Asets):
           supp([0,0]1)=11 frequent
           supp([1,1]2)=8  frequent
           supp([0,0]2)=8  frequent
           supp([1,1]3)=8  frequent
           supp([0,0]3)=8  frequent
           supp([1,1]4)=16 frequent

1C2 (1-bit-value candidate 2Asets):

         [0,0]1x[1,1]2   (support = root-count of P1,0 & P2,1 = 3, no) 
         [0,0]1x[0,0]2   (support = root-count of P1,0 & P2,0 = 8, yes)
         [0,0]1x[1,1]3   (support = root-count of P1,0 & P3,1 = 7, no)
         [0,0]1x[0,0]3   (support = root-count of P1,0 & P3,0 = 4, no)
         [0,0]1x[1,1]4   (support = root-count of P1,0 & P4,1 = 11, yes)

         [0,0]2x[1,1]3   (support = root-count of P2,0 & P3,1 = 4, no)
         [0,0]2x[0,0]3   (support = root-count of P2,0 & P3,0 = 4, no)
         [0,0]2x[1,1]4   (support = root-count of P2,0 & P4,1 = 8, yes)

         [1,1]2x[1,1]3   (support = root-count of P2,1 & P3,1 = 4, no)
         [1,1]2x[0,0]3   (support = root-count of P2,1 & P3,0 = 4, no)
         [1,1]2x[1,1]4   (support = root-count of P2,1 & P4,1 = 8, yes)

         [0,0]3x[1,1]4   (support = root-count of P3,0 & P4,1 = 8, yes)

         [1,1]3x[1,1]4   (support = root-count of P3,1 & P4,1 = 8, yes)
from:
P1,0
11
4 4 3 0
1110

P2,1        P2,0
8           8
0 0 4 4     4 4 0 0
 
P3,1        P3,0
8           8
4 0 4 0     0 4 0 4

P4,1         
16


1L2 (1-bit value frequent 2Asets):
         [0,0]1x[0,0]2   (support = root-count of P1,0 & P2,0 = 8, yes)
         [0,0]1x[1,1]4   (support = root-count of P1,0 & P4,1 = 11, yes)
         [0,0]2x[1,1]4   (support = root-count of P2,0 & P4,1 = 8, yes)
         [1,1]2x[1,1]4   (support = root-count of P2,1 & P4,1 = 8, yes)
         [0,0]3x[1,1]4   (support = root-count of P3,0 & P4,1 = 8, yes)
         [1,1]3x[1,1]4   (support = root-count of P3,1 & P4,1 = 8, yes)

1C3 (1-bit-value candidate 3Asets):
         [0,0]1x[0,0]2x[1,1]4 (support = rc of P1,0 & P2,0 & P4,1 = 8, yes)

1L3 (1-bit-value frequent 3Asets):
         [0,0]1x[0,0]2x[1,1]4 (support = rc of P1,0 & P2,0 & P4,1 = 8, yes)

Only the following frequent sets involve Yield:
         [0,0]1x[0,0]2        (support = root-count of P1,0 & P2,0 = 8, yes)
         [0,0]1x[1,1]4        (support = root-count of P1,0 & P4,1 = 11, yes)
         [0,0]1x[0,0]2x[1,1]4 (support = rc of P1,0 & P2,0 & P4,1 = 8, yes)

and the rules which can be formed with yield as the consequent are:
         [0,0]2 => [0,0]1        (support = 8)
         [1,1]4 => [0,0]1        (support = 11)
         [0,0]2x[1,1]4 => [0,0]1 (support = 8)

The supports of the antecedents are:
    supp([0,0]2) = 8
    supp([1,1]4) = 16
    supp([0,0]2x[1,1]4) = 8

The confidences are:
    conf( [0,0]2 => [0,0]1 )         = 1
    conf( [1,1]4 => [0,0]1 )         = 11/16
    conf( [0,0]2x[1,1]4 => [0,0]1 )  = 1

All are strong rules.
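
The 1-bit example above can be reproduced with the following Python sketch.  It
is a sketch under stated assumptions, not the notes' implementation: flat bit
masks stand in for P-tree root counts, all function names are hypothetical, an
item is a (band, high-order bit) pair, and minsup is passed as a pixel count
(8 of 16 for 50%).

```python
from fractions import Fraction
from itertools import combinations

BANDS = {
    1: "0011 0011 0111 0111 0011 0011 0111 0111 0010 0010 1010 1111 0010 1010 1111 1111",
    2: "0111 0011 0011 0010 0111 0011 0011 0010 1011 1011 1010 1010 1011 1011 1010 1010",
    3: "1000 1000 0100 0101 1000 1000 0100 0101 1000 1000 0100 0100 1000 1000 0100 0100",
    4: "1011 1111 1011 1011 1011 1011 1011 1011 1111 1111 1011 1011 1111 1111 1011 1011",
}
BANDS = {b: s.split() for b, s in BANDS.items()}
FULL = (1 << 16) - 1

def item_mask(band, bit):
    """Pixels whose high-order bit in the given band equals bit ('0' or '1')."""
    return sum(1 << i for i, v in enumerate(BANDS[band]) if v[0] == bit)

def support(aset):
    m = FULL
    for band, bit in aset:
        m &= item_mask(band, bit)          # plays the role of ANDing P-trees
    return bin(m).count("1")               # root count

def frequent_asets(minsup):
    level = [frozenset([(b, v)]) for b in BANDS for v in "01"
             if support([(b, v)]) >= minsup]
    found, k = list(level), 1
    while level:
        k += 1
        known = set(level)
        pool = sorted(set().union(*level))
        level = []
        for combo in combinations(pool, k):
            if len({b for b, _ in combo}) < k:
                continue                   # at most one interval per band
            # Apriori pruning: every (k-1)-subset must already be frequent
            if all(frozenset(s) in known for s in combinations(combo, k - 1)):
                if support(combo) >= minsup:
                    level.append(frozenset(combo))
        found += level
    return found

def yield_rules(minsup, minconf, yield_band=1):
    """Strong rules whose consequent is the Yield interval only."""
    rules = {}
    for aset in frequent_asets(minsup):
        cons = {item for item in aset if item[0] == yield_band}
        ante = aset - cons
        if cons and ante:
            conf = Fraction(support(aset), support(ante))
            if conf >= minconf:
                rules[ante] = conf
    return rules

for ante, conf in yield_rules(8, Fraction(1, 2)).items():
    print(sorted(ante), "=> [0,0]1  conf =", conf)
```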


Assume minsupp=60% (at least 10 of the 16 pixels, since 0.6*16=9.6) and minconf=60%

1. FIND ALL FREQUENT 1Asets.
    for 1-bit values
       for B1
           2 possibilities for Int1: [1,1] and [0,0]
P1,1
5
0 0 1 4
0001
           supp([1,1]1)=5  not frequent
           supp([0,0]1)=11 frequent
 
       for B2
           2 possibilities for Int2: [1,1] and [0,0]
P2,1
8
0 0 4 4
           supp([1,1]2)=8  not frequent
           supp([0,0]2)=8  not frequent
 
       for B3
           2 possibilities for Int3: [1,1] and [0,0]
P3,1
8
4 0 4 0
           supp([1,1]3)=8  not frequent
           supp([0,0]3)=8  not frequent

       for B4
           2 possibilities for Int4: [1,1] and [0,0]
P4,1
16
           supp([1,1]4)=16 frequent
           supp([0,0]4)=0  not frequent


1L1 (1-bit value frequent 1Asets):
                [0,0]1 =11
                [1,1]4 =16

1C2 (1-bit-value candidate 2Asets):


         [0,0]1x[1,1]4   (support = root-count of P1,0 & P4,1 = 11, yes)

from:
P1,0
11
4 4 3 0
1110

P2,1        P2,0
8           8
0 0 4 4     4 4 0 0
 
P3,1        P3,0
8           8
4 0 4 0     0 4 0 4

P4,1         
16


1L2 (1-bit value frequent 2Asets):
         [0,0]1x[1,1]4   (support = root-count of P1,0 & P4,1 = 11, yes)

1C3 (1-bit-value candidate 3Asets): empty

The supports of the antecedent:
    supp([1,1]4) = 16

    conf( [1,1]4 => [0,0]1 )         = 11/16

This one rule is a strong rule.


The frequent 1-bit 1Asets were:
                [0,0]1
                [1,1]4
Therefore the infrequent 1-bit 1Asets are:
                [1,1]1
                [0,0]2
                [1,1]2
                [0,0]3
                [1,1]3
                [0,0]4
Which means all enclosed 2-bit subintervals are infrequent:

                   0                                       1                         
         00                  01                  10                  11                 
   000       001       010       011       100       101       110       111
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

The candidate 2-bit band-1 intervals have left endpt in the 0-subtree.
([00,01] is [0,0] and [00,10] is a superset, so both are frequent)

[00,01] [00,10] are already known to be frequent

Others to consider are:
[00,00] [01,01] [01,10] [01,11]

For [00,00] we use P1,00, count=7, not frequent.
For [01,01] we use P1,01, count=4, not frequent.

For [01,10] we use P1,01 OR P1,10.  If it is frequent, so is its superset [01,11];
otherwise, for [01,11] we use P1,01 OR P1,10 OR P1,11.

P1,01 OR P1,10:
6
0 4 1 1
0001 1000
so for [01,10], using P1,01 OR P1,10, count=6, not frequent.

P1,01 OR P1,10 OR P1,11
9
0 4 1 4
0001
so for [01,11], using P1,01 OR P1,10 OR P1,11, count=9, not frequent.

Therefore the only new frequent 2-bit band-1 1Aset is:
[00,10]

etc.
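
The 2-bit interval supports for band 1 can be checked with a short Python
sketch.  As before, flat bit masks stand in for the P-trees and the function
names are hypothetical.  Selecting pixels by bit prefix gives the same mask as
ANDing the basic P-trees, and the interval is the OR of its value masks.

```python
# Band B1 of the 16-pixel example, in raster order.
B1 = ("0011 0011 0111 0111 0011 0011 0111 0111 "
      "0010 0010 1010 1111 0010 1010 1111 1111").split()

def value_mask(values, prefix):
    """Bit i is set when pixel i's value starts with the given bit prefix."""
    return sum(1 << i for i, v in enumerate(values) if v.startswith(prefix))

def interval_support(values, lo, hi):
    """Count pixels whose leading len(lo) bits lie in the interval [lo, hi]."""
    k = len(lo)
    covered = 0
    for v in range(int(lo, 2), int(hi, 2) + 1):
        covered |= value_mask(values, format(v, "0{}b".format(k)))   # OR of value masks
    return bin(covered).count("1")

MINSUP = 0.6 * len(B1)      # 60% of 16 pixels = 9.6
for lo, hi in [("00", "00"), ("01", "01"), ("01", "10"), ("01", "11"), ("00", "10")]:
    s = interval_support(B1, lo, hi)
    print("[{},{}]".format(lo, hi), s, "frequent" if s >= MINSUP else "not frequent")
```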