Source Code for Thesis
P-tree code
- Main class that represents a table, PTreeSetAD, extends PTreeSet (as defined in the API)
- Class PTreeAD is not part of the API because its interface is implementation-dependent
- Class PTreeSpec of the API is extended by PTreeSpecAD and PTreeSpecEnrichedAD
(maintains a copy of the initial state. Note that modifications to the API,
namely using a pattern and mask in PTreeSpec, will make this much more elegant.)
- Classes PTreeFormat, PTreeFormatG, and PTreeFormatS are defined in the API
- Interface PTreeFeeder is specialized into versions for different
dimensionality, PTreeFeeder1D and PTreeFeeder2D (these will no longer be
necessary for the current version of the API, because location information
will be part of each data point that is passed to PTreeSet). Implementations
of these interfaces are listed among the applications.
- Class PTreeInfo, extended to PTreeInfoAD, has to redo some things that
PTreeFeeder did already, which led to the introduction of BandInfo into the API.
- PTreeDemo shows off some of the functionality
Code used for classification (chapters 4-6)
- Main program for classification is KSig
- Feeders used are VersatileFeeder, VersatileFeederIndirect, and
VersatileSFeeder, which use the SortablePoint class (others have been used
but, I think, aren't currently)
- Statistics calculations are done in StatAD
- Bayesian methods (chapter 6) use CorrelationMatrixM, CorrelationM, PodiumOneM, and PodiumSetM
- The Podium method (chapter 4) uses CorrelationMatrixP, CorrelationP, PodiumOneP, and PodiumSetP
(The main difference between the two is the treatment of "not equal" for
categorical data. For the Podium method that weight had to be non-zero and was
determined by the Gaussian function at distance 1, because otherwise the curse
of dimensionality would not have been resolved. For the Bayesian methods this
was not critical, because few attributes were joined and the better speed of
considering fewer volumes was more important. A second difference lies in the
fact that the Bayesian methods were not limited to binary class labels. The
classes could have been designed much better, had they not been programmed
months behind schedule ...)
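
The treatment of "not equal" for categorical data in the Podium method can be illustrated with a minimal sketch. It is not taken from the thesis code: the class name CategoricalWeightSketch, its methods, the parameter sigma, the unnormalized form of the Gaussian, and the combination of per-attribute weights by multiplication are all assumptions made for illustration. The sketch only shows that evaluating a Gaussian at distance 1 gives unequal categorical values a small but non-zero weight, so a single mismatch does not remove a point from consideration.

```java
public class CategoricalWeightSketch {

    // Unnormalized Gaussian evaluated at the given distance (assumed form).
    static double gaussian(double distance, double sigma) {
        return Math.exp(-(distance * distance) / (2.0 * sigma * sigma));
    }

    // Weight contributed by one categorical attribute: equal values count as
    // distance 0, unequal values as distance 1, so the result is never 0.
    static double categoricalWeight(String a, String b, double sigma) {
        double distance = a.equals(b) ? 0.0 : 1.0;
        return gaussian(distance, sigma);
    }

    // Product of per-attribute weights for two points of equal length
    // (multiplicative combination is an assumption of this sketch).
    static double combinedWeight(String[] x, String[] y, double sigma) {
        double w = 1.0;
        for (int i = 0; i < x.length; i++) {
            w *= categoricalWeight(x[i], y[i], sigma);
        }
        return w;
    }

    public static void main(String[] args) {
        String[] x = {"red", "round", "small"};
        String[] y = {"red", "square", "small"};
        // One mismatch out of three attributes still yields a positive weight.
        System.out.println(combinedWeight(x, y, 1.0));
    }
}
```

If the "not equal" weight were 0 instead, a single differing attribute would set the combined weight to 0. With many attributes almost every pair of points differs somewhere, which is one way to read the remark about the curse of dimensionality.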
Code used for clustering (chapter 7)