MRS  1.0
A C++ Class Library for Statistical Set Processing
subpavings::SPSnode Class Reference

A derived class based on SPnode for processing sample data. More...

+ Inheritance diagram for subpavings::SPSnode:
+ Collaboration diagram for subpavings::SPSnode:

List of all members.

Public Member Functions

 SPSnode ()
 Default constructor.
 SPSnode (const ivector &v, bool cntOnly)
 Initialised constructor.
 SPSnode (const ivector &v)
 Initialised constructor.
 SPSnode (const ivector &v, size_t max, bool cntOnly)
 Initialised constructor.
 SPSnode (const ivector &v, size_t max)
 Initialised constructor.
 SPSnode (const LabBox &lb, bool cntOnly=false)
 Initialised constructor.
 SPSnode (const LabBox &lb, size_t max, bool cntOnly=false)
 Initialised constructor.
 SPSnode (const SPSnode &other)
 Copy constructor.
SPSnodeoperator= (SPSnode rhs)
 Copy assignment operator.
size_t getCounter () const
 Accessor for the counter.
virtual int getSplitDim () const
 Get the split dimension.
virtual real getSplitValue () const
 Get the split value.
real getCountsOnly () const
 Accessor for the countsOnly value.
NodeData getData () const
 Accessor for the node's data collection.
bool checkTreeStateLegal () const
 Check tree rooted at this is legal with respect to isSplittableNode().
bool checkTreeStateLegal (size_t minChildPoints, double minVol) const
 Check tree rooted at this is legal with respect to isSplittableNode(size_t minChildPoints, double minVol).
bool checkTreeStateLegal (size_t minChildPoints)
 Check tree rooted at this is legal with respect to isSplittableNode(size_t minChildPoints).
virtual bool isSplittableNode () const
bool isSplittableNode (size_t minChildPoints, double minVol) const
 Method to check whether a node is splittable.
bool isSplittableNode (size_t minChildPoints) const
 Method to check whether a node is splittable.
size_t getLeftCountIfSplit () const
 The count the left child would have if this node was split.
size_t getRightCountIfSplit () const
 The count the right child would have if this node was split.
size_t getMinChildCountIfSplit () const
 Smallest number of points in either child if this was split.
size_t getMinChildCountIfSplitNEW () const
Size_tVecgetChildrensLeftAndRightCountsIfSplit (Size_tVec &grandchildCounts) const
 return a container of counts for prospective grandchildren.
Size_tVecgetChildrensLeftAndRightCountsIfSplitNEW (Size_tVec &grandchildCounts) const
Size_tVecgetLeafNodeCounts (Size_tVec &counts) const
SPSnodePtrsgetLeavesInIntersection (const SPnode *const spn, SPSnodePtrs &leaves)
SPSnodeConstPtrsgetConstLeavesInIntersection (const SPnode *const spn, SPSnodeConstPtrs &leaves) const
SPSnodePtrsgetSubLeavesInIntersection (const SPnode *const spn, SPSnodePtrs &subleaves)
SPSnodeConstPtrsgetConstSubLeavesInIntersection (const SPnode *const spn, SPSnodeConstPtrs &subleaves) const
size_t getRootCounter () const
 The count in the node's ultimate ancestor root.
double getCountOverVolume () const
 The count divided by the node volume.
rvector getMean () const
 Get the sample mean.
std::pair< size_t, cxsc::real > getNonEmptyBoxSummary () const
 Get summary information on non-empty leaf box numbers and volumes.
real getSumLeafCountOverVol () const
 Get the sum of the count over volume in the leaf nodes.
size_t getSmallestLeafCount () const
 Get the count of the leaf with the smallest count.
size_t getLargestLeafCount () const
 Get the count in the leaf with the smallest count.
virtual real getLogLik (const size_t n) const
 Get this node's contribution to loglikelihood.
virtual dotprecision getSplitChangeLogLik () const
 Get change in log likelihood on split of this node.
virtual cxsc::real getUnscaledTreeLogLik () const
 Get the unscaled log likihood for the tree rooted at this.
virtual dotprecision getMergeChangeLogLik () const
 Get change in log likelihood on merge of this' leaf chidren.
dotprecision getBestSplitChangeEMPCOPERR (const size_t n) const
 Get best change in EMP under COPERR from splitting any leaf.
dotprecision getBestSplitChangeEMPAIC () const
 Get best change in EMP under AIC from splitting any leaf.
dotprecision getBestMergeChangeEMPCOPERR (const size_t n) const
 Get best change in EMP under COPERR from merging any subleaf.
dotprecision getBestMergeChangeEMPAIC () const
 Get best change in EMP under AIC from merging any subleaf.
real getEMPContributionCOPERR (const size_t n) const
 Get this node's scaled contribution to EMP under COPERR.
real getEMPContributionAIC (const size_t n) const
 Get this node's scaled contribution to EMP under AIC.
dotprecision getSplitChangeEMPCOPERR (const size_t n) const
 Get scaled change in sum term in EMP under COPERR on split.
dotprecision getSplitChangeEMPAIC () const
 Get change in sum term in EMP under AIC on split.
dotprecision getMergeChangeEMPCOPERR (const size_t n) const
 Get scaled change in sum term in EMP under COPERR on merge.
dotprecision getMergeChangeEMPAIC () const
 Get change in sum term in EMP under AIC on merge.
void reshapeToUnion (const SPnode &other)
 Reshape so that the tree rooted at this has shape that is the union of this shape and the shape of another tree.
void reshapeToUnion (const SPnode &other, size_t minChildPoints)
 Reshape so that the tree rooted at this has shape that is as close as possible to the union of this shape and the shape of another tree.
virtual std::ostream & nodePrint (std::ostream &os) const
 Output details of a specific node.
virtual std::ostream & leavesOutputTabsWithEMPs (const size_t bigN, std::ostream &os, int prec=5) const
 Output for for all leaves of a binary tree.
virtual std::ostream & leavesOutputTabsWithHistHeight (std::ostream &os, int prec=5) const
 Output for for all leaves of a binary tree.
virtual std::ostream & leavesOutputTabsWithHistHeight (const size_t bigN, std::ostream &os, int prec=5) const
 Output for for all leaves of a binary tree.
virtual std::ostream & leavesOutputTabsWithHistHeightAndEMPs (const size_t bigN, std::ostream &os, int prec=5) const
 Output for for all leaves of a binary tree.
dotprecision getEMPSumCOPERR (const size_t n) const
 Get scaled EMP sum under COPERR for tree rooted at this.
dotprecision getEMPSumAIC (const size_t n) const
 Get the unscaled EMP sum under AIC for tree rooted at this.
virtual void nodeExpand (int comp)
 Expand a leaf node.
virtual void nodeExpand (const SplitDecisionObj &boolTest, int comp)
 Expand a leaf node.
virtual void nodeExpand ()
 Expand a leaf node.
virtual void nodeExpand (const SplitDecisionObj &boolTest)
 Expand a leaf node.
virtual void nodeReabsorbChildren ()
 Reabsorbs both children of the node.
bool splitRootAtLeastToShapeSPS (std::vector< size_t > reqDepths, size_t minPoints=0)
 Try to get tree rooted at this into shape at least as deep as described by instruction reqDepths.
bool randomMCMCSplitRootAtLeastSPS (unsigned long int numLeaves, const MCMCPartitionGenerator &partitioner, RealMappedSPnode *rmsp, dotprecision &nlogn, int &ndepth, bool saveInstructions=false)
bool randomMCMCSplitRootAtLeastSPS (unsigned long int numLeaves, const MCMCPartitionGenerator &partitioner, RealMappedSPnode *rmsp, dotprecision &nlogn, int &ndepth, size_t minPoints, bool saveInstructions=false)
bool randomKnuthMCMCSplitRootAtLeastSPS (unsigned long int numLeaves, const MCMCPartitionGenerator &partitioner, RealMappedSPnode *rmsp, dotprecision &nlogn, int &ndepth, bool saveInstructions, const std::string &failureLogFilename="")
bool randomKnuthMCMCSplitRootAtLeastSPS (unsigned long int numLeaves, const MCMCPartitionGenerator &partitioner, RealMappedSPnode *rmsp, dotprecision &nlogn, int &ndepth, size_t minPoints, bool saveInstructions, const std::string &failureLogFilename="")
SPSnodeinsertOneFind (BigDataItr newItr, OPERATIONS_ON childInd, const SplitDecisionObj &boolTest)
 Tries to insert data into the leaves of the tree rooted at this.
const SPSnodefindContainingNode (const rvector &pt, OPERATIONS_ON childInd=ON_PARENT) const
 Find the leaf node which would contain a data point.
void unionTreeStructure (const SPSnode *const rhs)
cxsc::real getL1Distance (const SPSnode *const other) const
virtual void clearAllDataHeld () const
 Clear all the data associated with the tree rooted at this.
virtual void setCountsOnly (bool setTo)
 Set an indicator controlling whether this maintains all available statistics or not.
void swapSPS (SPSnode &spn)
 Swap this and another node.
virtual std::string nodeStringSummary () const
 Get a string summary of this node's properties.
virtual std::string doubleCheckStringSummary () const
 Get a string summary of the properties of nodes in the tree rooted at this which can be used for checking and debugging.
Accessors for links between the nodes.

Note that pointers for parent, leftChild, and rightChild are not reference counted so there could potentially be problems with the use of returned pointers (for instance, being used to delete nodes). These pointers might be better implemented with boost::shared_ptr .

SPSnodegetParent () const
 Accessor for the parent of a node.
SPSnodegetLeftChild () const
 Accessor for the left child of a node.
SPSnodegetRightChild () const
 Accessor for the right child of a node.
Get a container of all descendent leaf nodes.

Will contain just this if this is a leaf.

Parameters:
leavesa reference to a container of node pointers to fill in.
Returns:
a reference to the container leaves filled with pointers to leaf nodes.
SPSnodePtrsgetLeaves (SPSnodePtrs &leaves)
 Returns pointers to non-const nodes.
SPSnodeConstPtrsgetConstLeaves (SPSnodeConstPtrs &leaves) const
 Returns pointers to const nodes.
Get a container of all sub-leaf nodes.

Sub-leaf nodes (aka 'cherry nodes) have at least one child but any child must be a leaf, ie sub-leaves are the parents only of leaf nodes.

Will just contain this if this is a sub-leaf.

Parameters:
subleavesa reference to a container of node pointers to fill in.
Returns:
a reference to the container subleaves filled with pointers to sub-leaf nodes.
SPSnodePtrsgetSubLeaves (SPSnodePtrs &subleaves)
 Returns pointers to non-const nodes.
SPSnodeConstPtrsgetConstSubLeaves (SPSnodeConstPtrs &subleaves) const
 Returns pointers to const nodes.
Get the sample variance-covariance matrix.

This calculates the sample variance-covariance matrix for the sample and returns it as a d*d-dimensional vector of reals, where d is the dimension of the data.

cov(i,j) = [sumproduct(i,j)-sum(i)xsum(j)/counter]/(counter-1)

cov(i,j) is at index i*d+j in the returned RealVec (indices from 0 to d*d-1, where d is the dimension of the data).

Returns:
If variance-covariances are held (see countsOnly) and there are at least two data points (see getCounter()) then return the sample variance-covariance matrix as a vector. Otherwise return a RealVec of cxsc::SignalingNaN values.
RealVec getVarCovar () const
RealVecgetVarCovar (RealVec &varCovar) const

Protected Member Functions

virtual void stripData () const
virtual void stripOptionalStatsOnly ()
virtual void addOptionalStatsOnly ()
virtual void splitData (const SplitDecisionObj &boolTest)
 Send the data associated with this down to children.
virtual void nodeExpansionOnly (int comp)
 Expand the node with no reallocation of data.
bool _splitAtLeastToShapeSPS (std::vector< size_t > &reqDepths, size_t myDepth, size_t minPoints)
 Try to get tree rooted at this into shape at least as deep as described by instruction reqDepths.
bool _randomMCMCSplitAtLeastSPS (unsigned long int numLeaves, const MCMCPartitionGenerator &partitioner, RealMappedSPnode *rmsp, size_t myDepth, dotprecision &nlogn, int &ndepth, size_t minPoints)
int levelsUpToNextRight () const
bool _randomKnuthMCMCSplitAtLeastSPS (unsigned long int p, unsigned long int q, std::stack< size_t > &lastPair, const MCMCPartitionGenerator &partitioner, RealMappedSPnode *rmsp, size_t myDepth, dotprecision &nlogn, int &ndepth, size_t minPoints, const std::string &failureLogFilename, bool across=false)
void _reshapeToUnion (const SPnode *const other)
 Internal method to reshape this to a union.
bool _reshapeToUnion (const SPnode *const other, size_t minChildPoints, const std::string &errorFilename)
 Internal method to reshape this to a union with a restriction of minChildPoints.
virtual void _getUnscaledTreeLogLik (dotprecision &nlogn, int &ndepth, int depth) const
 Internal method to accumulate the components of the unscaled log likelihood for a tree rooted at this.
dotprecision & accumulateLeafCountOverVol (dotprecision &sum) const
 Accumulate a sum of node counts divided by node volumes.
void accumulateNonEmptyBoxSummary (size_t &nNonEmptyBoxes, real &vNonEmptyBoxVolumes) const
 Accumulate summary information on non-empty box numbers and volumes.
virtual std::ostream & nodeDataPrint (std::ostream &os) const
 Print the data in a specified format.
virtual std::ostream & nodeMeanPrint (std::ostream &os) const
 Print the mean in a specified format.
virtual std::ostream & nodeVarCovarPrint (std::ostream &os) const
 Print the variance-covariance in a specified format.
virtual std::ostream & leafOutputTabs (std::ostream &os) const
 Output for a node in a binary tree, tab-delimited.
virtual std::ostream & leafOutputTabsWithEMPs (const size_t bigN, std::ostream &os, int prec=5) const
 Output for a node in a binary tree, tab-delimited.
virtual std::ostream & leafOutputTabsWithHistHeight (const size_t bigN, std::ostream &os, int prec=5) const
 Output for a node in a binary tree, tab-delimited.
std::ostream & leafOutputTabsWithHistHeightAndEMPs (const size_t bigN, std::ostream &os, int prec=5) const
 Output for a node in a binary tree, tab-delimited.
virtual void clearNodeData () const
 Clears the node's data collection.
virtual void recalculateStats (const rvector &newdata) const
 Recalculate summary statistics associated with node.
virtual void recalculateSums (const rvector &newdata) const
 Recalculate summary statistics associated with node.
virtual void recalculateSumProducts (const rvector &newdata) const
 Recalculate summary statistics associated with node.
virtual void recalculateOptionalStatsAndGatherData (NodeData &container)
 Recalculates optional summary statistics associated with node from scratch, using all the data currently held.
virtual void recalculateOptionalStatsForData (const NodeData &nodedata) const

Protected Attributes

size_t spaceIndication
 An indication of the maximum number of data points a node needs to carry.
int splitDim
 Dimension the node's box has been split along.
real splitValue
 The value, on split dimension, where node's box was split.
bool countsOnly
 Determines the amount of statistical summary data in node.
mutable data members.

These data members are mutable to allow them to be modified by const functions as data passed to or through the node.

Only leaf nodes have data associated with them but the recursively computable statistics, such as counter and sum, are maintained for all nodes. Thus when a data point is sent to the root node and progresses down the tree to find which leaf node it should be associated with, the counter is incremented and the data sum increased for each non-leaf node it passes through (ie, where it is contained in the box of that node but that node is not a leaf node so the box has been sub- divided and the datapoint continues on to one of the children).

size_t counter
 A counter for how many data points are covered by theBox.
VecDotPrec dpSums
 A container representing the sum of the data points covered by theBox.
VecDotPrec dpSumProducts
 A container representing the sumproduct matrix of the data points covered by theBox.
NodeData dataItrs
 A container for the association of data with a node.

Detailed Description

A derived class based on SPnode for processing sample data.

The base class SPnode is a node in the representation of a regular subpaving as a binary tree. A node represents a box (interval vector). SPnodes are linked together to form the tree. The initial box of the subpaving is the box represented by the root node of the tree. A box which has been split will be represented as node with 1 or 2 children.

A subpaving of [x] (union of non-overlapping subboxes of [x]) is represented by the leaves (degenerate/child-less) nodes in the tree.

The SPSnode class has additional data members for statistical data analysis. The SPSnode class is used to form a regular subpaving representing containers of sample data where some data-related criteria is used to determine when the subpaving should be bisected. For example, keeping the number of data points associated with a node below a specified maximum -- "a maximally statistically equivalent blocks" criterion.

Data points are of type csxc::rvector.

Leaves of the SPSnode class have data associated with them in the form of pointers to some big collection of sample data. If an SPSnode is bisected the data associated with it descends to its children, so that only leaf SPSnodes have data associated with them. However, "recursively-computable statistical summaries", such as, count, sum, etc, of the data which would be contained in the box an SPSnode represents are kept for all SPSnodes and continue to be updated when the SPSnode has children and data reaching the node is passed on to be finally associated with a leaf.

For an algebraic statistical formalisation of Fisher's ideas on recursively computable statistics see "Notions of Sufficiency" by S.L. Lauritzen, Contributed Paper, 44th Session of the International Statistical Institute, Madrid, Spain, September 12th--22nd, 1983.

By default, all recursively computable statistics provided are maintained in each SPSnode. However, since this uses memory and is not always needed, an SPSnode can be constructed to only maintain count statistics.

When a box is split along some dimension d, only the right (upper) subbox is closed on the actual split value (the midpoint of the interval which is the dth element in the interval vector/box to be bisected) whereas the left (lower) box has an open interval on this value. Thus [1 5] is subdivided to [1 3) and [3 5]. The parent SPnode class can ignore this point but it must be addressed if we are to be able to decide which subbox (left or right) a datapoint sitting exactly on the split value in the split dimension should descend to. This is an extension of the empirical distribution to a partition of the root box with the leaves of the subpaving.

The class also needs to know which dimension the box represented by a node was split on and what the split value was, so that data reaching the node can be correctly (ie, in cognisance of the open and closed intervals described above) associated with either the right or left child. These values therefore become data members of the class, with default values for leaf nodes.


Constructor & Destructor Documentation

SPSnode::SPSnode ( const ivector &  v,
bool  cntOnly 
)

Initialised constructor.

Throws a MalconstructedBox_Error if v has no practical (0 or negative) dimensions.

Parameters:
vinterval vector for the box this represents.
cntOnlyan indicator for whether all available stats are maintained (cntOnly = false) or just counts (cntOnly = true).
Precondition:
v should be a proper ivector, with length >= 1.
Postcondition:
this will have box and countsOnly as specified.
                                               : SPnode(v),
  counter(0), splitDim(-1), splitValue(0.0), countsOnly(cntOnly)
{
  //invokes the base class constructor with ivector
  // and then initialises additional data members

  //dpSums, a vector of dotprecision terms, is not initialised
  //dpSumProducts, similarly not initialised
  
  spaceIndication = static_cast<size_t>(defaultMaxPts);
  //reserve space - not sure if important - leave for moment
  dataItrs.reserve(spaceIndication);
  
}
SPSnode::SPSnode ( const ivector &  v) [explicit]

Initialised constructor.

Throws a MalconstructedBox_Error if v has no practical (0 or negative) dimensions.

Parameters:
vinterval vector for the box this represents.
Precondition:
v should be a proper ivector, with length >= 1.
Postcondition:
this will have box as specified and countsOnly = false.
                                 : SPnode(v),
  counter(0), splitDim(-1), splitValue(0.0), countsOnly(false)
{
  //invokes the base class constructor with ivector
  // and then initialises additional data members

  //dpSums, a vector of dotprecision terms, is not initialised
  //dpSumProducts, similarly not initialised

  spaceIndication = static_cast<size_t>(defaultMaxPts);
  //reserve space - not sure if important - leave for moment
  dataItrs.reserve(spaceIndication);
  
}
SPSnode::SPSnode ( const ivector &  v,
size_t  max,
bool  cntOnly 
)

Initialised constructor.

Throws a MalconstructedBox_Error if v has no practical (0 or negative) dimensions.

Parameters:
vinterval vector for the box this represents.
maxan indication of the maximum number of datapoints that may eventually be associated with a leaf. This is used for efficiency only and does not constitute a maximum count for a node.
cntOnlyan indicator for whether all available stats are maintained (cntOnly = false) or just counts (cntOnly = true).
Precondition:
v should be a proper ivector, with length >= 1.
Postcondition:
this will have box and countsOnly as specified.
                                                           :
  SPnode(v),
  spaceIndication(max), counter(0), splitDim(-1), splitValue(0.0),
  countsOnly(cntOnly)
{
  //invokes the base class constructor with ivector argument
  // and then initialises additional data members

  //dpSums, a vector of dotprecision terms, is not initialised
  //dpSumProducts, similarly not initialised

  //reserve space - not sure if important - leave for moment
  dataItrs.reserve(spaceIndication+1);
}
SPSnode::SPSnode ( const ivector &  v,
size_t  max 
)

Initialised constructor.

Throws a MalconstructedBox_Error if v has no practical (0 or negative) dimensions.

Parameters:
vinterval vector for the box this represents.
maxan indication of the maximum number of datapoints that may eventually be associated with a leaf. This is used for efficiency only and does not constitute a maximum count for a node.
Precondition:
v should be a proper ivector, with length >= 1.
Postcondition:
this will have box as specified and countsOnly = false.
                                             :
  SPnode(v),
  spaceIndication(max), counter(0), splitDim(-1), splitValue(0.0),
  countsOnly(false)
{
  //invokes the base class constructor with ivector argument
  // and then initialises additional data members

  //dpSums, a vector of dotprecision terms, is not initialised
  //dpSumProducts, similarly not initialised

  //reserve space - not sure if important - leave for moment
  dataItrs.reserve(spaceIndication+1);
}
SPSnode::SPSnode ( const LabBox lb,
bool  cntOnly = false 
) [explicit]

Initialised constructor.

Throws a MalconstructedBox_Error if lb has a box with no practical dimensions.

Parameters:
lba labeled box which provides the box to be used for this.
cntOnlyan indicator for whether all available stats are maintained (cntOnly = false) or just counts (cntOnly = true).
Precondition:
lb should represent a proper box.
Postcondition:
this will have box and countsOnly as specified.
                                               : SPnode(lb), counter(0),
  splitDim(-1), splitValue(0.0), countsOnly(cntOnly)
{
  //invokes the base class constructor with LabBox argument
  // and then initialises additional data members

  //dpSums, a vector of dotprecision terms, is not initialised
  //dpSumProducts, similarly not initialised

  spaceIndication = static_cast<size_t>(defaultMaxPts);
  //reserve space - not sure if important - leave for moment
  dataItrs.reserve(spaceIndication);
}
SPSnode::SPSnode ( const LabBox lb,
size_t  max,
bool  cntOnly = false 
)

Initialised constructor.

Throws a MalconstructedBox_Error if v has no practical (0 or negative) dimensions.

Parameters:
lba labeled box which provides the box to be used for this.
maxan indication of the maximum number of datapoints that may eventually be associated with a leaf. This is used for efficiency only and does not constitute a maximum count for a node.
cntOnlyan indicator for whether all available stats are maintained (cntOnly = false) or just counts (cntOnly = true).
Precondition:
lb should represent a proper box.
Postcondition:
this will have box and countsOnly as specified.
                                                           : SPnode(lb),
  spaceIndication(max), counter(0), splitDim(-1), splitValue(0.0),
  countsOnly(cntOnly)
{
  //invokes the base class constructor with LabBox argument
  //and then initialises additional data members

  //dpSums, a vector of dotprecision terms, is not initialised
  //dpSumProducts, similarly not initialised

  //reserve space - not sure if important - leave for moment
  dataItrs.reserve(spaceIndication+1);
}
SPSnode::SPSnode ( const SPSnode other)

Copy constructor.

Note that if the node to be copied is a leaf, and therefore has a container of iterators to some data collection, then the node created by a copy constructor will simply copy that container of iterators - ie, both the original and copy are associated with the same data collection. If the node is contained within by some other object that also manages the data collection, it is advisable to ensure that copying that other object creates a different data collection and that the leaf nodes of the copied subpaving tree reference that new data collection.

Parameters:
otherthe node to be copied to construct this.
                                     : 
  spaceIndication(other.spaceIndication),
  counter(other.counter), dpSums(other.dpSums),
    dpSumProducts(other.dpSumProducts), splitDim(other.splitDim),
  splitValue(other.splitValue), countsOnly(other.countsOnly)
{
  if (other.theBox != NULL) {
    theBox = new ivector( other.getBox() );
  }
  nodeName = other.nodeName;
  
  //reserve space
  dataItrs.reserve((other.dataItrs).size());
  //copy dataItrs from other to this
  dataItrs = other.dataItrs;
  //recursion on the children
  if (other.leftChild) {
    nodeAddLeft(new SPSnode(*(other.getLeftChild())));
  }
  else leftChild=NULL;

  if (other.rightChild) {
    nodeAddRight(new SPSnode(*(other.getRightChild())));
  }
  else rightChild=NULL;

}

Member Function Documentation

bool SPSnode::_reshapeToUnion ( const SPnode *const  other,
size_t  minChildPoints,
const std::string &  errorFilename 
) [protected]

Internal method to reshape this to a union with a restriction of minChildPoints.

Parameters:
minChildPointsis the minimum child points to allow for a split (as usual, if one child gets all the points and the number of points in this is >= minChildPoints, the split is allowed).
errorFileNameis a file to which to log messages about nodes which could not be split because of minChildPoints.
Returns:
true if this was able to reshape to exactly the union of this and other, false if the extent to which descendents of this could split to mimic other was limted by minChildPoints.
{
  // indictator for being able to do union exactly
  bool success = true;
  
  if ( other != NULL && !(other->isEmpty()) ) {

    // this is not a leaf, other is a leaf
    if (!isLeaf() && other->isLeaf()) {

      // no need to do anything
    }

    // this is a leaf, other is not a leaf
    if (isLeaf() && !other->isLeaf()) {

      //we need to expand this
      if (isSplittableNode(minChildPoints)) nodeExpand();
      else {
        success = false;
        // log file
        std::string line = "Could not split " + getNodeName() 
         + " because of minChildPoints";
        
        outputFile(errorFilename, line); 
        
      }
      
    }

    // now recurse on the children if both have children
    // note - it won't go here is !success because still isLeaf()
    if (!isLeaf() && !other->isLeaf()) {
      success = getLeftChild()->_reshapeToUnion(
          other->getLeftChild(), minChildPoints, errorFilename);
      success = getRightChild()->_reshapeToUnion(
          other->getRightChild(), minChildPoints, errorFilename)
          && success;
    }
    
    return success;
  }
}
bool SPSnode::_splitAtLeastToShapeSPS ( std::vector< size_t > &  reqDepths,
size_t  myDepth,
size_t  minPoints 
) [protected]

Try to get tree rooted at this into shape at least as deep as described by instruction reqDepths.

Tries to ensures that this has at least shape of reqDepths, but allows the tree rooted at this to also be more split at some or all nodes.

This can only follow the instruction if the nodes it is required to split are splittable nodes, with respect to both width and minPoints (to be split a node must return isSplittableNode(minPoints) = true).

Parameters:
reqDepthsa collection of leaf levels to try to split to.
myDepththe depth this node is in the tree.
minPointsthe minimum child points used to determine whether this is splittable.
Returns:
true if this was able to follow the instruction, false otherwise (ie if instruction could not be followed because nodes were not splittable).
Precondition:
reqDepths is not empty
{
  if(reqDepths.empty()) 
    throw std::invalid_argument(
      "SPSnode::_splitAtLeastToShapeSPS(...): : reqDepths.empty()");
    
  bool success = true;
  
  int depth = reqDepths.back();
  
  if (myDepth < depth) { // need to try to go down more

    if ( isLeaf() ) {
      
      if (isSplittableNode(minPoints)) {
        // split, using the nodeExpand for this subtype if not base
        nodeExpand();
      }
      else success = false;
    }

    // if we are okay, send instruction down RIGHT SIDE FIRST
    if (success) success = getRightChild()->_splitAtLeastToShapeSPS(reqDepths,
            myDepth+1,
            minPoints);
    if (success) success = getLeftChild()->_splitAtLeastToShapeSPS(reqDepths,
            myDepth+1,
            minPoints);
    
  }
  else if (myDepth == depth) {     // split enough

    /* I am not necessarily a leaf though - allow for mcmc situation
     * where I want to be split to at least this much */
  
    // knock the last element out
    reqDepths.erase(reqDepths.begin()+reqDepths.size()-1);
  }
  else { // myDepth is > depth
    throw std::invalid_argument(
      "SPSnode::_splitAtLeastToShapeSPS(...): node depth > instruction depth");
  }

  return success;
}
dotprecision & SPSnode::accumulateLeafCountOverVol ( dotprecision &  sum) const [protected]

Accumulate a sum of node counts divided by node volumes.

Parameters:
suma reference to an accumulation of the sum so far.
Returns:
a reference to the accumulation including the leaf nodes of this.
{
  if (isLeaf()) {  // this is a leaf
    accumulate(sum, 1.0*counter, (1.0/nodeRealVolume()));
  }

  else { // this is not a leaf

    sum = getLeftChild()->accumulateLeafCountOverVol(sum);
    sum = getRightChild()->accumulateLeafCountOverVol(sum); 
  }
  return sum;
  
}
void SPSnode::accumulateNonEmptyBoxSummary ( size_t &  nNonEmptyBoxes,
real &  vNonEmptyBoxVolumes 
) const [protected]

Accumulate summary information on non-empty box numbers and volumes.

Parameters:
nNonEmptyBoxesa reference to an accumulation the number of non-empty boxes so far.
vNonEmptyBoxVolumesa reference to an accumulation the volume of non-empty boxes so far.
{
  if (isLeaf()) {
    if (counter > 0) {
      ++nNonEmptyBoxes;
      vNonEmptyBoxVolumes+=nodeRealVolume();
      
    }
  }
  else { // recurse
    getLeftChild()->accumulateNonEmptyBoxSummary(nNonEmptyBoxes, 
                        vNonEmptyBoxVolumes);
    getRightChild()->accumulateNonEmptyBoxSummary(nNonEmptyBoxes, 
                        vNonEmptyBoxVolumes);
  }
  // we want to count all empty leaves even if parent is empty too
}
void SPSnode::addOptionalStatsOnly ( ) [protected, virtual]

Recalculate the values held to enable the calculation of optional statistics, and set countsOnly to false, in this and the children of this.

{
  if (countsOnly) {
    NodeData container;
    recalculateOptionalStatsAndGatherData(container);
  }
}
bool SPSnode::checkTreeStateLegal ( ) const [virtual]

Check tree rooted at this is legal with respect to isSplittableNode().

'Legal' means that all non-leaf nodes in the tree are splittable, ie return isSplittableNode() = true;

Returns:
true if all non-leaf nodes in the tree rooted at this are splittable, false if any non-leaf node in the tree rooted at this is not splittable.

Reimplemented from subpavings::SPnode.

bool SPSnode::checkTreeStateLegal ( size_t  minChildPoints,
double  minVol 
) const

Check tree rooted at this is legal with respect to isSplittableNode(size_t minChildPoints, double minVol).

'Legal' means that all non-leaf nodes in the tree are splittable, ie return isSplittableNode(size_t minChildPoints, double minVol) = true;

Parameters:
minChildPointsis the minimum number of points that there would be in the children if the node were to be split.
minVolis the minimum node volume to be tested for.
Returns:
true if all non-leaf nodes in the tree rooted at this are splittable with respect to minChildPoints and minVol, false if any non-leaf node in the tree rooted at this is not splittable according to these criteria. eg, a tree with a cherry node with volume < 2*minVol is not legal.
{
  // check current state is legal by looking at everything not a leaf
  bool legal = true;
  if ( !isLeaf() ) {
    // to be splittable, need at least 2*min vol in this node, so each child has min vol
    legal = isSplittableNode(minChildPoints, minVol);
    if (legal && hasLCwithBox() ) {
        legal = 
          getLeftChild()->checkTreeStateLegal(minChildPoints,
                            minVol);
    }
    if (legal && hasRCwithBox() ) {
        legal = 
          getRightChild()->checkTreeStateLegal(minChildPoints,
                            minVol);
    }
  }
  
  return legal;
}
bool SPSnode::checkTreeStateLegal ( size_t  minChildPoints)

Check tree rooted at this is legal with respect to isSplittableNode(size_t minChildPoints).

'Legal' means that all non-leaf nodes in the tree are splittable, ie return isSplittableNode(size_T minChildPoints) = true;

Parameters:
minChildPointsis the minimum number of points that there would be in the children if the node were to be split.
Returns:
true if all non-leaf nodes in the tree rooted at this are splittable with respect to minChildPoints, false otherwise.
{
  // check current state is legal by looking at everything not a leaf
  bool legal = true;
  if ( !isLeaf() ) {
    legal = isSplittableNode(minChildPoints);
    if (legal && hasLCwithBox() ) {
        legal = 
          getLeftChild()->checkTreeStateLegal(minChildPoints);
    }
    if (legal && hasRCwithBox() ) {
        legal = 
          getRightChild()->checkTreeStateLegal(minChildPoints);
    }
  }
  return legal;
}
void SPSnode::clearAllDataHeld ( ) const [virtual]

Clear all the data associated with the tree rooted at this.

Calling this method resets counters to 0, discards values held to enable the computation of optional statistics, and discards data associated with root nodes. The structure of the tree rooted at this is unchanged.

Throws a NonRootNode_Error if this is not a root node.

Precondition:
this is a root node.
Postcondition:
For all nodes in the tree, getCounter() = 0 and the optional statistcs are not available. Leaf nodes of the tree have no data associated with them. The indicator countsOnly is unchanged for all nodes in the tree and the tree structure is unchanged (ie each node previously in the tree is still there).
{
  if (getParent() != NULL) {
    throw NonRootNode_Error("SPSnode::clearAllDataHeld()");
  }
  stripData(); // clears recursively on this and children
}
void SPSnode::clearNodeData ( ) const [protected, virtual]

Clears the node's data collection.

Note - only clears the data collection in this node, and does nothing to counters or stats - used when expanding a node.

{ NodeData().swap (dataItrs); }
std::string SPSnode::doubleCheckStringSummary ( ) const [virtual]

Get a string summary of the properties of nodes in the tree rooted at this which can be used for checking and debugging.

Includes the descendents of this node.

Shows the addresses of the values pointed to by the dataItrs associated with leaf nodes, rather than values themselves.

Returns:
the string summary.
{
  std::ostringstream oss;
  
  oss << "I am " << getNodeName() << "(address " << this << "),\n";
  oss << "Dimension is " << getDimension() << ", address of box is " << theBox << "\n"; 
  oss << "spaceIndication is " << spaceIndication << ", splitDim is " << splitDim << ", splitValue is " << splitValue << "\n"; 
  oss << "countsOnly is " << countsOnly << ", counter is " << counter << "\n";
  oss << "address of dataItrs is " << &dataItrs;
  if (isLeaf() ) {
    
    oss << " and dataItrs (giving addresses pointed to by iterators) is " << endl;
  
    for (NodeDataConstItr cit = dataItrs.begin(); cit < dataItrs.end(); ++cit) {
      
      oss << &(**cit) << "\t";
    }
  }
  oss << "\n" << endl;
  
  if ( getLeftChild() ) oss << getLeftChild()->doubleCheckStringSummary() << endl;
  if ( getRightChild() ) oss << getRightChild()->doubleCheckStringSummary() << endl;
  
  return oss.str();
  
}
const SPSnode * SPSnode::findContainingNode ( const rvector &  pt,
OPERATIONS_ON  childInd = ON_PARENT 
) const

Find the leaf node which would contain a data point.

No data is actually inserted.

Throws a NoBox_Error if this has no box.

Note:
for efficiency, there is no check that the data point given has the same dimension as this; the results of trying this operation with a point with incompatible dimensions are undefined.
Parameters:
pta data point.
childIndan indicator for whether the current node is a treated as a left or right child or a root. Defaults to ON_PARENT.
Returns:
a pointer to the node which would contain the point, or NULL if no node contains the point.
Precondition:
This should have a box to check for containment against.
{
  // start at the top
  if (isEmpty()) {
    throw NoBox_Error(
      "SPSnode::findContainingNode(const rvector&, OPERATIONS_ON)");
  }

  const SPSnode* retObj = NULL;

  if(nodeContains(pt, childInd)) {

    if(isLeaf()) {

      // give this node as return value
      retObj = this;

    } // end of isLeaf

    // if not a leaf and contains data
    // recurse on the children if any
    else {

      if(rightChild!=NULL && !rightChild->isEmpty()){

        retObj =
        (getRightChild())->findContainingNode(
          pt, ON_RIGHT);
      }
      // only try left if we did not find on the right
      if(retObj == NULL && leftChild!=NULL &&
                !leftChild->isEmpty()) {

        retObj =
        (getLeftChild())->findContainingNode(
          pt, ON_LEFT);
      }
    }

  } // end if node contains

  // will return null if does not contain the data

  return retObj;
}
dotprecision SPSnode::getBestMergeChangeEMPAIC ( ) const

Get best change in EMP under AIC from merging any subleaf.

n, the value to use for scaling, cancels out of change.

Finding the best merge change only makes sense for a node that is not already a leaf: this method throws an UnfulfillableRequest_Error if called on a leaf node.

Returns:
best (lowest or most negative) scaled change in AIC from merging any of the subleaves.
{
  if (isLeaf() ) {
    throw subpavings::UnfulfillableRequest_Error(
          "SPSnode::getBestMergeChangeEMPAIC()");
  }
  
  if (isSubLeaf() ) {
    return getMergeChangeEMPAIC();
  }
  else {
    
    if (getLeftChild()->isLeaf()) {
      return getRightChild()->getBestMergeChangeEMPAIC();
    }
    else if (getRightChild()->isLeaf()) {
      return getLeftChild()->getBestMergeChangeEMPAIC();
    }
    else {
      dotprecision myChangeL = getLeftChild()->getBestMergeChangeEMPAIC();
      dotprecision myChangeR = getRightChild()->getBestMergeChangeEMPAIC();
      return (myChangeL < myChangeR ? myChangeL : myChangeR);
    }
  }

}
dotprecision SPSnode::getBestMergeChangeEMPCOPERR ( const size_t  n) const

Get best change in EMP under COPERR from merging any subleaf.

Finding the best merge change only makes sense for a node that is not already a leaf: this method throws an UnfulfillableRequest_Error if called on a leaf node.

Parameters:
nthe value to use for scaling, the total number of points.
Returns:
best (lowest or most negative) scaled change in COPERR from merging any of the subleaves.
{
  
  if (isLeaf() ) {
    throw subpavings::UnfulfillableRequest_Error(
        "SPSnode::getBestMergeChangeEMPCOPERR(const size_t)");
  }
  if (isSubLeaf() ) {
    return getMergeChangeEMPCOPERR(n);
  }
  else {
    if (getLeftChild()->isLeaf()) {
      return getRightChild()->getBestMergeChangeEMPCOPERR(n);
    }
    else if (getRightChild()->isLeaf()) {
      return getLeftChild()->getBestMergeChangeEMPCOPERR(n);
    }
    else {
      dotprecision myChangeL = getLeftChild()->getBestMergeChangeEMPCOPERR(n);
      dotprecision myChangeR = getRightChild()->getBestMergeChangeEMPCOPERR(n);
      return (myChangeL < myChangeR ? myChangeL : myChangeR);
    }
  }
  
}
dotprecision SPSnode::getBestSplitChangeEMPAIC ( ) const

Get best change in EMP under AIC from splitting any leaf.

n, the value to use for scaling, cancels out of change.

Returns:
best (lowest or most negative) scaled change in AIC from splitting any of the leaves.
{
  if (isLeaf() ) {
    return getSplitChangeEMPAIC();
  }
  else {
    dotprecision myChangeL = getLeftChild()->getBestSplitChangeEMPAIC();
    dotprecision myChangeR = getRightChild()->getBestSplitChangeEMPAIC();
    return (myChangeL < myChangeR ? myChangeL : myChangeR);
  }
}
dotprecision SPSnode::getBestSplitChangeEMPCOPERR ( const size_t  n) const

Get best change in EMP under COPERR from splitting any leaf.

Parameters:
nthe value to use for scaling, the total number of points.
Returns:
best (lowest or most negative) scaled change in COPERR from splitting any of the leaves.
{
  if (isLeaf() ) {
    return getSplitChangeEMPCOPERR(n);
  }
  else {
    dotprecision myChangeL = getLeftChild()->getBestSplitChangeEMPCOPERR(n);
    dotprecision myChangeR = getRightChild()->getBestSplitChangeEMPCOPERR(n);
    return (myChangeL < myChangeR ? myChangeL : myChangeR);
  }
}

return a container of counts for prospective grandchildren.

Should be called only on leaf nodes.

returns an indexable container of the number of points the prospective children of each prospective child (ie all four prospective grandchildren) would be associated with, indexed like this [0] = left child's left child count, [1] = left child's rght child count, [2] = rght child's left child count, [3] = rght child's rght child count,

Parameters:
grandchildCountsa reference to a container to be filled with the prospective grandchild counts
Returns:
grandchildCounts filled with the prospective grandchild counts.
{
  // first find what the children's boxes would be would be
  int splitMe; // variable to hold first longest dimension
  ivector box = getBox();
  double temp1 = MaxDiam(box, splitMe);

  // ivectors to be new boxes for new children
  ivector rCBox;
  ivector lCBox;
  
  // Call Upper() to get what would be the right hand child box
  Upper(box, rCBox, splitMe);
  // Call Lower() to get what would be the left hand child box
  Lower(box, lCBox, splitMe);
  
  // mid point of my box on first longest dimension
  cxsc::real midSplit = cxsc::mid(box[splitMe]);

  // and if those children were split
  // left Child 
  int splitChildren = MaxDiamComp(lCBox);
  
  cxsc::real midSplitLC = cxsc::mid(lCBox[splitChildren]);
  
  // right child 
  // will split on the same dimension as LC
  
  cxsc::real midSplitRC = cxsc::mid(rCBox[splitChildren]);
  
  // now find how many of this node's data points would go right
  // and left children of left and right children
  size_t rightRightCount = 0;
  size_t rightLeftCount = 0;
  size_t leftRightCount = 0;
  size_t leftLeftCount = 0;
  NodeDataItr it;

  for (it = dataItrs.begin(); it < dataItrs.end(); it++) {
    // DataItrs is a container of iterators to a BigDataCollection
    rvector p = **it;
    // increment left child?
    if ( p[splitMe] < midSplit ) {
      if ( p[splitChildren] < midSplitLC ) leftLeftCount++;
      else rightLeftCount++;
    }
    else { // on right of me
      if ( p[splitChildren] < midSplitRC) leftRightCount++;
      else rightRightCount++;
    }
  }

  grandchildCounts.push_back(leftLeftCount);
  grandchildCounts.push_back(rightLeftCount);
  grandchildCounts.push_back(leftRightCount);
  grandchildCounts.push_back(rightRightCount);

  return grandchildCounts;
  
}

Accessor for the node's data collection.

Returns a copy of the node's collection of iterators to the big data set.

{ return dataItrs; }
real SPSnode::getEMPContributionAIC ( const size_t  n) const

Get this node's scaled contribution to EMP under AIC.

Under AIC, EMP is -1 x sum over leaves of (counts in leaf x ln(count in leaf / (n * vol of leaf))) where n is the total number of data points in the histogram. And this is -1 * the node's contribution to the loglikelihood of the data given the current state.

The EMP of an entire tree of nodes comes only from the contributions of the current leaf nodes, but this method will not throw an exception if called on a non-leaf node. This allows 'what-if' type calculations to be carried out (eg what if the node reabsorbed its children)...

Parameters:
nthe value to use for scaling, the total number of points.
Returns:
counter * (-ln(counter/(n*volume))) for this node.
{
  #ifdef EMPSDEBUG
    cout << "in getEMPContributionAIC, n = " << n << endl;
    cout << "EMPContributionAIC = " << -getLogLik(n) << endl;
  #endif
  return -getLogLik(n);
}
real SPSnode::getEMPContributionCOPERR ( const size_t  n) const

Get this node's scaled contribution to EMP under COPERR.

Under COPERR, EMP is -1/n^2 x sum over leaves of (counts in leaf squared / volume of leaf) where n is the total number of data points in the histogram

The EMP of an entire tree of nodes comes only from the contributions of the current leaf nodes, but this method will not throw an exception if called on a non-leaf node. This allows 'what-if' type calculations to be carried out (eg what if the node reabsorbed its children)...

Parameters:
nthe value to use for scaling, the total number of points.
Returns:
-counter^2/(n*volume) for this node.
{
  // current number of data points associated to node is counter
  // current node volume from nodeVolume, and each child will have half

  #ifdef EMPSDEBUG
    cout << "in getEMPContributionCOPERR, n = " << n << endl;
  #endif

  dotprecision contribution(0.0);
  if ((n > 0) && (counter > 0)) {
    accumulate(contribution, -(1.0*counter)/(1.0*n),
        (1.0*counter)/((n*1.0)*nodeRealVolume()));
  }

  // contribution is -counter^2/(n^2 * vol)
  // default cxsc rounding to nearest

  #ifdef EMPSDEBUG
    cout << "EMPContributionCOPERR = " << rnd(contribution) << endl;
  #endif
  
  //return contribution;
  return rnd(contribution);
}
dotprecision SPSnode::getEMPSumAIC ( const size_t  n) const

Get the unscaled EMP sum under AIC for tree rooted at this.

Parameters:
nthe total number of points in the histogram, for scaling
{
  dotprecision retValue;
  retValue = 0.0;

  // uses  member function getEMPContributionAIC for leaf result
  if (!(isEmpty()) && isLeaf()) { // this is a non-empty leaf
    retValue = getEMPContributionAIC(n);
  }

  //recurse on the children
  if (getLeftChild()!=NULL) {
    retValue+=getLeftChild()->getEMPSumAIC(n);
  }

  if (getRightChild()!=NULL) {
    retValue+=getRightChild()->getEMPSumAIC(n);
  }
  return retValue;
}
dotprecision SPSnode::getEMPSumCOPERR ( const size_t  n) const

Get scaled EMP sum under COPERR for tree rooted at this.

Parameters:
nthe total number of points in the histogram, for scaling
{
  dotprecision retValue;
  retValue = 0.0;

  // uses  member function getEMPContributionCOPERR for leaf value
  if (!(isEmpty()) && isLeaf()) { // this is a non-empty leaf
    retValue = getEMPContributionCOPERR(n);
  }

  //recurse on the children
  if (getLeftChild()!=NULL) {
    retValue = retValue + getLeftChild()->getEMPSumCOPERR(n);
  }

  if (getRightChild()!=NULL) {
    retValue = retValue + getRightChild()->getEMPSumCOPERR(n);
  }
  return retValue;

}
cxsc::real SPSnode::getL1Distance ( const SPSnode *const  other) const

Gets the L1 distance between this and another subpaving.

The L1 distance is defined as the sum of the absolute values of the differences in 'area' represented by the leaf nodes of this and the other paving. The 'area' represented by a leaf node is the proportion of the total count of the tree the leaf is part of that is in that leaf node.

Throws the following exceptions:

Parameters:
othera pointer to the root of SPSnode tree to calculate the L1 distance against.
Precondition:
other must be a non-NULL pointer. Both this and the node pointed to by other must have boxes and those boxes must be the same.
Postcondition:
this will be unchanged.
{
  cxsc::real retValue(0.0);
  
  if ( other == NULL )
  {
    throw NullSubpavingPointer_Error(
        "SPSnode::getL1Distance(const SPSnode * const)");
  }
  
  if ( isEmpty() || other->isEmpty() ) {
    throw NoBox_Error(
        "SPSnode::getL1Distance(const SPSnode * const)");
  }
  
  if ( getBox() != other->getBox() )
    {
    throw IncompatibleDimensions_Error(
        "SPSnode::getL1Distance(const SPSnode * const)");
    }
  
  std::size_t thisBigN = getCounter();
  std::size_t otherBigN = other->getCounter();
  
  // if both BigN's are zero, L1 distance is 0
  // else if thisBigN is 0 or other bigN is 0 , L1 distance will be 1
  
  if( (thisBigN + otherBigN)) { // at least one of them has some data
  
    if (thisBigN * otherBigN) { // both have some data
    
      cxsc::dotprecision retDP(retValue);
      retDP = _getL1distance(retDP, other, thisBigN, otherBigN);
      retValue = cxsc::rnd(retDP);
    
    }
    else { // only one has some data
    
      retValue += 1.0;
    
    }
  }
  
  return retValue;  
    
}
size_t SPSnode::getLargestLeafCount ( ) const

Get the count in the leaf with the smallest count.

Returns the count of the largest (by count) leaf node.

{
  if (isLeaf() ) {
    return getCounter();
  }
  else {
    size_t myL = getLeftChild()->getLargestLeafCount();
    size_t myR = getRightChild()->getLargestLeafCount();
    return (myL > myR ? myL : myR);
  }
}

Fills in container of leaf counts, left to right.

Traverses the leaves left to right, puts the leaf counts into container.

Parameters:
countsis reference to the container to fill in.
Returns:
the container filled in with leaf counts.
{
  if (getLeftChild() != NULL) {
    getLeftChild()->getLeafNodeCounts(counts);
  }
  if (getRightChild() != NULL) {
    getRightChild()->getLeafNodeCounts(counts);
  }
  if (isLeaf()) {

    counts.push_back(counter);
  }
  return counts;
}

Returns pointers to non-const nodes.

Todo:
This is bad bad bad and should not happen. Change when need for it for testing is over...
{
  //if children, recurse on the children
  if (hasLCwithBox()) {
    getLeftChild()->getLeaves(leaves);
  }

  if (hasRCwithBox()) {
    getRightChild()->getLeaves(leaves);
  }

  if ( isLeaf() ) { // this is a leaf
    leaves.push_back(this);
  }
  return leaves;
}

Accessor for the left child of a node.

Returns a pointer to leftChild node.

Reimplemented from subpavings::SPnode.

{ return (SPSnode*) leftChild; }
size_t SPSnode::getLeftCountIfSplit ( ) const

The count the left child would have if this node was split.

Does not split the nodes, just calculates how many of the data points currently associated with this node would go to the left child if the node were to be split.

Note that the left child's interval on the split dimension would be an open interval.

{
  size_t leftCount = 0;
  
  if (isLeaf()) {
    // first find what the dimension for the split would be 
    // if the split were made
    // right hand child's box would be if that child
    // were to be created
    cxsc::ivector box = getBox();
    
    int split = MaxDiamComp(box);
    
    cxsc::real midSplit = cxsc::mid(box[split]);

    // left child would have everything up to but not including
    // midSplit, on the split dimension
    NodeDataItr it;

    for (it = dataItrs.begin(); it < dataItrs.end(); it++) {
      // DataItrs is a container of iterators to a BigDataCollection
      // increment rightCount if the point is in rC
      if(  (**it)[split] < midSplit ) leftCount++;
    }
  }
  else { // already split
    leftCount = getLeftChild()->getCounter();
  
  }

  return leftCount;
}
real SPSnode::getLogLik ( const size_t  n) const [virtual]

Get this node's contribution to loglikelihood.

A leaf node's contribution to the log likelihood of overall state given the data is (count in leaf x ln(count in leaf / (n * vol of leaf))) where n is the total number of data points in the histogram.

The loglikelihood of an entire tree of nodes comes only from the contributions of the current leaf nodes, but this method will not throw an exception if called on a non-leaf node. This allows 'what-if' type calculations to be carried out (eg what if the node reabsorbed its children)...

Parameters:
nthe value to use for scaling, the total number of points.
Returns:
counter * (ln(counter/(n*volume))) for this node.
{
  // contribution to loglikelihood is counter*log(counter/(n * vol))

  dotprecision contribution(0.0);
  if ((n > 0) && (counter > 0)) {

    accumulate(contribution, 1.0*counter, log(1.0*counter));
    accumulate(contribution, -1.0*counter, log(1.0*n));
    real nv = nodeRealVolume();
    if (nv < cxsc::MinReal) {
      accumulate(contribution, -1.0*counter, log(nodeVolume()));
    }
    else {
      
      accumulate(contribution, -1.0*counter, ln(nv));
    }
  }
  // default cxsc rounding to nearest
  return rnd(contribution);
  
}
rvector SPSnode::getMean ( ) const

Get the sample mean.

Returns:
If means are held (see countsOnly) and there is at least one data point (see getCounter()) then return the sample mean. Otherwise return an rvector of cxsc::SignalingNaN values.
{
  int dimension = getDimension();
  // set up an rvector retMean of the correct dimensions
  rvector retMean(dimension);
  // loop through the elements in the dpSums vector
  for (int i = 0; i< dimension; i++) {

    // if no data elements each element or if only counts are held,
    // that element of the mean is cxsc::SignalingNaN
    if (countsOnly || (counter == 0)) {
      // cxsc::rvector is indexed 1 to n
      retMean[i+1] = cxsc::SignalingNaN;
    }
    // if data elements, find the element-by-element mean
    else {
      // default cxsc rounding dotprecision rnd_next
      retMean[i+1] = rnd(dpSums[i])/(1.0*counter);
    }
  }// end loop through the elements in dpSums

  return retMean;
}
dotprecision SPSnode::getMergeChangeEMPAIC ( ) const

Get change in sum term in EMP under AIC on merge.

Under AIC, EMP is -1 x sum over leaves of (counts in leaf x ln(count in leaf / (n * volume of leaf))) where n is the total number of data points in the histogram.

The merge change only makes sense for a node that is not already a leaf: this method throws an UnfulfillableRequest_Error if called on a leaf node.

Returns:
the change in the sum over leaves of (counts in leaf squared over volume of leaf) which would result if this node were merged.
{
  if (isLeaf() ) {
    throw subpavings::UnfulfillableRequest_Error(
          "SPSnode::getMergeChangeEMPAIC()");
  }
  return -getMergeChangeLogLik();
}
dotprecision SPSnode::getMergeChangeEMPCOPERR ( const size_t  n) const

Get scaled change in sum term in EMP under COPERR on merge.

Under COPERR, EMP is -1/n^2 x sum over leaves of (counts in leaf squared / (n * volume of leaf)) where n is the total number of data points in the histogram.

The merge change only makes sense for a node that is not already a leaf: this method throws an UnfulfillableRequest_Error if called on a leaf node.

Parameters:
nthe value to use for scaling, the total number of points.
Returns:
the change in the sum over leaves of (counts in leaf squared over (n * volume of leaf) which would result if this node were merged.
{
  if (isLeaf() ) {
    throw subpavings::UnfulfillableRequest_Error(
          "SPSnode::getMergeChangeEMPCOPERR(const size_t)");
  }
  
  // change is 1/(n^2 * vol) x (2(lc_count^2 + rc_count^2) - counter^2)
  // Change is scaled by n, total points in histogram
  dotprecision change;
  change = 0.0;
  
  if ((n > 0) && (counter > 0)) {
    
    real nv = nodeRealVolume();
          
    if (nv < cxsc::MinReal) {
      change = cxsc::Infinity;
    }
    else {
      
      // first find what the left hand child's counter is
      size_t leftCount = getLeftChild()->getCounter();

      // and right child
      size_t rightCount = counter - leftCount;

      // current number of data points associated to node is counter
      // current node volume from nodeVolume, and each child will have half

      accumulate(change, (1.0*leftCount)/(1.0*n),
                (2.0*leftCount)/(n*nodeVolume()));
      accumulate(change, (1.0*(rightCount))/(1.0*n),
                (2.0*(rightCount))/(n*nodeVolume()));
      accumulate(change, -(1.0*counter)/(n*nodeVolume()),
                  (1.0*counter)/(1.0*n));
    }
  }
  return change;
}
dotprecision SPSnode::getMergeChangeLogLik ( ) const [virtual]

Get change in log likelihood on merge of this' leaf chidren.

log likelihood is sum over leaves of (counts in leaf x ln(count in leaf / (n * volume of leaf))) where n is the total number of data points in the histogram.

The merge change only makes sense for a node that is not already a leaf: this method throws an UnfulfillableRequest_Error if called on a leaf node.

Returns:
the change in the sum over leaves of (counts in leaf squared over volume of leaf) which would result if this node's leaf chilren were merged back into this.
{
  if (isLeaf() ) {
    throw subpavings::UnfulfillableRequest_Error(
          "SPSnode::getMergeChangeLogLik()");
  }
  dotprecision change(0.0);
  
  // if counter is 0 there can be no change on merging
  if (counter > 0) {

    // first find what the left hand child's counter is
    size_t leftCount = getLeftChild()->getCounter();

    // and right child
    size_t rightCount = getRightChild()->getCounter();

    // change is (count*ln(count)
    //      - (lc_count*ln(lc_count) + rc_count*ln(rc_count) +
    //          count*ln(2))
    // note that the terms involving the total count in the histogram
    // and the volume of this node cancel so this change
    // is effectively scaled and does not need to use n

    dotprecision currentEMP(0.0);
    dotprecision childEMP(0.0);
    if (leftCount > 0) accumulate(childEMP, 1.0*leftCount,
             log(1.0*leftCount));
  
    if (rightCount > 0) accumulate(childEMP, 1.0*rightCount,
             log(1.0*rightCount));
  
    accumulate(childEMP, 1.0*counter, log(2.0));

    accumulate(currentEMP, 1.0*counter, log(1.0*counter));

    change = currentEMP - childEMP;
  }
  return change;
}

Smallest number of points in either child if this was split.

Does not split the nodes, just calculates how many of the data points currently associated with this node would go to the left and right child if the node were to be split.

{
  size_t min = getLeftCountIfSplit();
  if ((counter - min) < min) min = counter - min;
  return min;
  
}
std::pair< size_t, cxsc::real > SPSnode::getNonEmptyBoxSummary ( ) const

Get summary information on non-empty leaf box numbers and volumes.

Returns:
A pair, where the first value in the pair is the total number of non-empty leaf boxes in the tree rooted at this, and the second value is the proportion of the total volume of this that is in the non-empty leaf boxes in the tree rooted at this.
{
  if(isEmpty()) 
    throw NoBox_Error("SPSnode::getNonEmptyBoxSummary");
    
  size_t nNonEmptyBoxes = 0;
  real vNonEmptyBoxVolumes(0.0);
  
  accumulateNonEmptyBoxSummary(nNonEmptyBoxes, vNonEmptyBoxVolumes);
  
  return std::pair <size_t, real >(nNonEmptyBoxes, 
                vNonEmptyBoxVolumes/nodeRealVolume());
  
}

Accessor for the parent of a node.

Returns a pointer to parent node.

Reimplemented from subpavings::SPnode.

{ return (SPSnode*) parent; }

Accessor for the right child of a node.

Returns a pointer to rightChild node.

Reimplemented from subpavings::SPnode.

{ return (SPSnode*) rightChild; }

The count the right child would have if this node was split.

Does not split the nodes, just calculates how many of the data points currently associated with this node would go to the right child if the node were to be split.

Note that the left child's interval on the split dimension would be a closed interval.

{
  size_t rightCount = 0;
  
  if (isLeaf()) {
    rightCount = counter - getLeftCountIfSplit();
  }
  else {
    rightCount = getRightChild()->getCounter();
  }
  
}

Get the count of the leaf with the smallest count.

Returns the count in the smallest (by count) leaf node.

{
  if (isLeaf() ) {
    return getCounter();
  }
  else {
    size_t myL = getLeftChild()->getSmallestLeafCount();
    size_t myR = getRightChild()->getSmallestLeafCount();
    return (myL < myR ? myL : myR);
  }
}
dotprecision SPSnode::getSplitChangeEMPAIC ( ) const

Get change in sum term in EMP under AIC on split.

Under AIC, EMP is -1 x sum over leaves of (counts in leaf x ln(count in leaf / (n * volume of leaf))) where n is the total number of data points in the histogram.

The split change only makes sense for a leaf node, but the calculation can be carried out for a non-leaf and so this method will not throw an exception if called on a non-leaf node.

Returns:
the change in the sum over leaves of (counts in leaf squared over volume of leaf) which would result if this node were split.
{
  return -getSplitChangeLogLik();

}
dotprecision SPSnode::getSplitChangeEMPCOPERR ( const size_t  n) const

Get scaled change in sum term in EMP under COPERR on split.

Under COPERR, EMP is -1/n^2 x sum over leaves of (counts in leaf squared / (n * volume of leaf)) where n is the total number of data points in the histogram.

The split change only makes sense for a leaf node, but the calculation can be carried out for a non-leaf and so this method will not throw an exception if called on a non-leaf node.

Parameters:
nthe value to use for scaling, the total number of points.
Returns:
the change in the sum over leaves of (counts in leaf squared over (n * volume of leaf) which would result if this node were split.
{
  #ifdef EMPSDEBUG
    cout << "in getSplitChangeEMPCOPERR, n = " << n << endl;
  #endif
  
  // change is 1/(n^2 * vol) x (counter^2 - 2(lc_count^2 + rc_count^2))
  // if we split and lc_count, rc_count were the new counts in
  // left and right children respectively
  // Change is scaled by n, total points in histogram
  dotprecision change;
  change = 0.0;
  
  #ifdef EMPSDEBUG
    cout << "counter = = " << counter << endl;
  #endif
  
  if ((n > 0) && (counter > 0)) {
    // first find what the left hand child's counter would be if that child
    // were to be created
    size_t leftCount = getLeftCountIfSplit();

    #ifdef EMPSDEBUG
      cout << "leftCount = " << leftCount << endl;
    #endif
    // current number of data points associated to node is counter
    // current node volume from nodeVolume, and each child will have half

    real nv = nodeRealVolume();
    #ifdef EMPSDEBUG
      cout << "node volume = " << nv << endl;
      cout << "1.0/((n*1.0)*nv) = " << (10.0/((n*1.0)*nv)) << endl;
      cout << "1.0/((n*1.0)*MinReal) = " << (10.0/((n*1.0)*MinReal)) << endl;
    #endif
      
    if (nv < cxsc::MinReal) {
      change = -cxsc::Infinity;
    }
    else {
      accumulate(change, (1.0*counter)/((n*1.0)*nv),
                (1.0*counter)/(1.0*n));
      #ifdef EMPSDEBUG
        cout << "change stage 1 rnd(change) = " << rnd(change) << endl;
      #endif
      accumulate(change, (1.0*leftCount)/(1.0*n),
                -(2.0*leftCount)/((n*1.0)*nv));
      #ifdef EMPSDEBUG
        cout << "change stage 2 = " << rnd(change) << endl;
      #endif
      accumulate(change, (1.0*(counter - leftCount))/(1.0*n),
                -(2.0*(counter - leftCount))/((1.0*n)*nv));
      #ifdef EMPSDEBUG
        cout << "change stage 3 = " << rnd(change) << endl;
      #endif
    }
  }
  #ifdef EMPSDEBUG
    cout << "returning rnd(change) = " << rnd(change) << endl;
  #endif
  return change;
}
dotprecision SPSnode::getSplitChangeLogLik ( ) const [virtual]

Get change in log likelihood on split of this node.

log likelihood is sum over leaves of (counts in leaf x ln(count in leaf / (n * volume of leaf))) where n is the total number of data points in the histogram.

The split change loglikelihood only makes sense for a leaf node, but the calculation can be carried out for a non-leaf and so this method will not throw an exception if called on a non-leaf node.

Returns:
the change in the sum over leaves of (counts in leaf squared over volume of leaf) which would result if this node were split.
{
  
  dotprecision change(0.0);
  
  // if counter is 0 there can be no change on splitting
  if (counter > 0) {

    // first find what the left hand child's counter would be if
    // that child were to be created
    size_t leftCount = getLeftCountIfSplit();

    // current number of data points associated to node is counter
    size_t rightCount = counter-leftCount;

    // current node volume from nodeVolume; each child will have half

    // change is
    //      (lc_count*ln(lc_count) + rc_count*ln(rc_count) +
    //          count*ln(2)) - (count*ln(count)
    // if we split and lc_count, rc_count were the new counts in
    // left and right children respectively
    // note that the terms involving the total count in the histogram
    // and the volume of this node cancel so this change
    // is effectively scaled and does not need to use n

    dotprecision currentEMP(0.0);
    dotprecision childEMP(0.0);

    if (leftCount > 0) accumulate(childEMP, 1.0*leftCount,
             log(1.0*leftCount));

    if (rightCount > 0) accumulate(childEMP, 1.0*rightCount,
             log(1.0*rightCount));

    accumulate(childEMP, 1.0*counter, log(2.0));

    accumulate(currentEMP, 1.0*counter, log(1.0*counter));

    change = childEMP - currentEMP;
  }
  return change;
}

Returns pointers to non-const nodes.

Todo:
This is bad bad bad and should not happen. Change when need for it for testing is over...
{
  if (isSubLeaf()) { // this is a subleaf
    subleaves.push_back(this);
  }
  
  //else if children, recurse on the children
  else if (!isLeaf()) {
    getLeftChild()->getSubLeaves(subleaves);

    getRightChild()->getSubLeaves(subleaves);
  }

  return subleaves;
  
}
cxsc::real SPSnode::getUnscaledTreeLogLik ( ) const [virtual]

Get the unscaled log likihood for the tree rooted at this.

The unscaled log likelihood is sum over leaves of ( count in leaf x ( ln(count in leaf) + depth of leaf * ln2) )

To get the scaled log likelihood we would add ln (1/(n x vol)^n ) = -n x ln (n x vol) where n is the counter for this and vol is the volume of the box associated with this. If we are taking ratios of likelihoods for trees rooted at nodes with the same vol and n (eg different states of a tree rooted at this) we can often ignore this scaling factor.

Returns:
the unscaled log likihood for the tree rooted at this.
Precondition:
this has a box.
{
  if(isEmpty()) 
    throw NoBox_Error("SPSnode::getUnscaledTreeLogLik()");
    
  dotprecision nlogn(0.0);
  int ndepth = 0.0;
  int depth = 0;
  
  _getUnscaledTreeLogLik(nlogn, ndepth, depth);
  // this accumulates values in nlogn, ndepth
  
  dotprecision ndlog2(0.0);
  accumulate (ndlog2, cxsc::Ln2_real, 1.0*ndepth);
  
  return (cxsc::rnd(nlogn + ndlog2));
  

}
SPSnode * SPSnode::insertOneFind ( BigDataItr  newItr,
OPERATIONS_ON  childInd,
const SplitDecisionObj boolTest 
)

Tries to insert data into the leaves of the tree rooted at this.

Passes down the tree, seeking the leaf node whose box contains the data point.

If the box that this represents does not contain the data point then the return value is NULL.

If the box that this represents contains the data point, statistics held by this are updated (subject to the value of countsOnly for this).

If this also a leaf node then the data is associated with this leaf and the address of this becomes the return value. If the SplitDecisionObj boolTest determines that this should expand following the insertion of the data, this is expanded.

If the box that this represents contains the data point but this is not a leaf node, the return value will be the address of the ultimate leaf descendent of this that contains the data point.

Note:
For efficiency, there is no check that this has a box to check for containment against. The results of trying this operation on a node with no box are undefined. There is also no check that the data point given has the same dimension as this; the results of trying this operation with a point with incompatible dimensions are undefined.
Parameters:
newItran iterator to the data in big data collection.
childIndan indicator for whether the current node is a treated as a left or right child or a root.
boolTestis a reference to an object determining whether to split a node when a data point arrives.
Returns:
a pointer to the node the data was 'inserted' into, (before it was split, if insertion of the data was followed by expansion of the node), or NULL if no insert.
{
  rvector newData = *newItr;

  // start at the top

  SPSnode* retObj = NULL;

  if(nodeContains(newData, childInd)) {
    
    recalculateStats(newData);

    bool wasLeaf = (isLeaf());
    
    // if it is a leaf, add the data and return this object
    if(wasLeaf) {

      dataItrs.push_back(newItr);

      // give this node as return value
      retObj = this;

      // split if we need to
      if (boolTest(this)) {
        // expand and split data to children

        nodeExpand(boolTest);

      } // end if we need to split
      
    } // end of wasLeaf

    // if not a leaf before we had split, and contains data
    // recurse on the children if any
    if (!wasLeaf) {
      
      if( rightChild != NULL && !rightChild->isEmpty() ){
        
        retObj =
        (getRightChild())->insertOneFind(
          newItr, ON_RIGHT, boolTest);
      }
      // only try left if we did not find on the right
      if(retObj == NULL && leftChild!=NULL &&
                !leftChild->isEmpty()) {
                  
        retObj =
        (getLeftChild())->insertOneFind(newItr,
        ON_LEFT, boolTest);
      }
    }

  } // end if node contains

  // will return null if does not contain the data

  return retObj;
}
bool SPSnode::isSplittableNode ( ) const [virtual]

Return boolean to indicate if node is splittable.

A node is splittable if the node volume is >= 2 * cxsc::MinReal (the smallest representable real number).

Reimplemented from subpavings::SPnode.

bool SPSnode::isSplittableNode ( size_t  minChildPoints,
double  minVol 
) const

Method to check whether a node is splittable.

Decides whether a node is splittable based on checking volume and number of points that would result in child nodes on split.

Node must satisfy the basic isSplittableNode() test and volume must be >=2*minVol to split and if minChildPoints > 0, then

  • either the node must have at least minChildPoints and all the points go to one of the children (the other getting none)
  • or the smallest number of points which would go to the either of the prospective new children must be >= minChildPoints
Note:
if minVol is < minimum for passing isSplittableNode() then node will fail test even if node volume > minVol, ie isSplittableNode() overrides all other criteria.

Thus in general the method will only return true if the given node satisfies both the minVol test and, if it were to be split, both children would have at least minChildPoints data points, but if all the data points would go to one child (none to the other), this will also satisfy the minChildPoints test.

If the node has already been split, the test will use the actual numbers of points in the children; if the node is a leaf (ie not split) then the test will consider the number of points that would go to the each child if it were to be split.

Parameters:
minChildPointsis the minimum number of points that there would be in the children if the node were to be split.
minVolis the minimum node volume to be tested for. This is the minimum volume that each child is permitted to have, i.e. to be splittable the node volume itself must be >= 2*minVol.
Returns:
true if has been a test conditions satisfied, false otherwise.
{
    bool retValue = true;
  
  if (minVol > 0.0) retValue = (!(nodeVolume() < 2*minVol));
  
  if (retValue) {
    retValue = isSplittableNode(minChildPoints);
  }
  else {
    #ifdef DEBUG_CHECK_NODE_COUNT
      cout << "isSplittableNode: node failed vol test" << endl;
    #endif
  }
  return retValue;
}
bool SPSnode::isSplittableNode ( size_t  minChildPoints) const

Method to check whether a node is splittable.

Decides whether a node is splittable based number of points that would result in child nodes on split.

Node must satisfy the basic isSplittableNode() test and if minChildPoints > 0, then

  • either the node must have at least minChildPoints and all the points go to one of the children (the other getting none)
  • or the smallest number of points which would go to the either of the prospective new children must be >= minChildPoints

Thus in general the method will only return true if, if it were to be split, both children would have at least minChildPoints data points, but if all the data points would go to one child (none to the other), this will also satisfy the minChildPoints test.

If the node has already been split, the test will use the actual numbers of points in the children; if the node is a leaf (ie not split) then the test will consider the number of points that would go to the each child if it were to be split.

Parameters:
minChildPointsis the minimum number of points that there would be in the children if the node were to be split.
Returns:
true if has been a test conditions satisfied, false otherwise.
{
    bool retValue = isSplittableNode();  //basic check
  if (!retValue) {
    #ifdef DEBUG_CHECK_NODE_COUNT
      cout << "isSplittableNode: node failed basic is splittable test" << endl;
    #endif
    #ifdef DEBUG_MCMC_SPLIT_FAIL
      cout << "Failed isSplittableNode: I am " << nodeName << endl;
      {
        ivector box = getBox();
        interval maxD = box[MaxDiamComp(box)];
  
        cout << cxsc::SaveOpt;
        cout << Scientific << SetPrecision(35,30);
        cout << "interval to be split is " << maxD << endl;
        cout << cxsc::RestoreOpt;
      }
    #endif
  }
  #ifdef DEBUG_CHECK_NODE_COUNT
    cout << "isSplittableNode minChildPoints = " << minChildPoints << endl;
  #endif
  if (retValue && minChildPoints > 0) {
    retValue = false; // need to retest
  
    size_t  minChildCount = getMinChildCountIfSplitNEW();
    
        
    if ( (counter >= minChildPoints) &&
      ((minChildCount == 0) || (minChildCount >= minChildPoints)) ) {
        retValue = true;
      }
    #ifdef DEBUG_CHECK_NODE_COUNT
      cout << "isSplittableNode minChildCount = " << minChildCount << endl;
      cout << "(minChildCount >= minChildPoints) = " << (minChildCount >= minChildPoints) << endl;
  
      cout << "isSplittable = " << retValue << endl;
    #endif
  }
    return retValue;
}
std::ostream & SPSnode::leafOutputTabs ( std::ostream &  os) const [protected, virtual]

Output for a node in a binary tree, tab-delimited.

Output intended for a txt file, in numeric form only.

Replaces the format that that the cxsc::<< operator produces for interval vectors. The format used here produces alpha-numeric tab-delimited data. The format for an n-dimensional interval vector is:

nodeName [tab] volume [tabl] counter [tab] Inf(ivector[1]) [tab] Sup(ivector[1]) [tab] ... [tab] Inf(ivector[n]) [tab] Sup(ivector[n])

Parameters:
osis the stream to send to.

Reimplemented from subpavings::SPnode.

{
  if(theBox != NULL) { // do nothing if there is no box
  
    ivector thisBox = *theBox; // copy of theBox
  
    // output the node name, nodeVolume, counter
    os << nodeName;
    
    os << "\t" << nodeRealVolume();
    os << "\t" << counter;
    // followed by the intervals of box using Inf and Sup
    // ie unlike cxsc output, there is no [  ] around them

    for (int i= Lb(thisBox); i <= Ub(thisBox) ; i++) {

      os << "\t" << Inf(thisBox[i])
        << "\t" << Sup(thisBox[i]);
    }

  }
}
std::ostream & SPSnode::leafOutputTabsWithEMPs ( const size_t  bigN,
std::ostream &  os,
int  prec = 5 
) const [protected, virtual]

Output for a node in a binary tree, tab-delimited.

Output intended for a txt file, in numeric form only.

Replaces the format that that the cxsc::<< operator produces for interval vectors. The format used here produces alpha-numeric tab-delimited data.

The format for a d-dimensional interval vector is

nodeName [tab] volume [tab] counter [tab] scaled EMP contribution COPERR [tab] change in scaled EMP contribution COPERR if split [tab] scaled EMP contribution AIC [tab] change in scaled EMP contribution AIC if split [tab] Inf(ivector[1]) [tab] Sup(ivector[1].[tab] . . [tab] Inf(ivector[d]) [tab] Sup(ivector[d]

Parameters:
bigNis total datapoints, used by the emps calculation
osis the stream to send to
precis the precision used for printing, defaulting to 5
{
  if(theBox != NULL) { // do nothing if there is no box

    
    ivector thisBox = *theBox; // copy of theBox

    // output the name, nodeVolume, counter
    os << nodeName;
    
    os << cxsc::SaveOpt;
    os << cxsc::Variable << cxsc::SetPrecision(prec+2,prec);
    
    os << "\t" << nodeRealVolume();
    os << "\t" << counter;
    // EMP contributions and changes if split
    os << "\t" << getEMPContributionCOPERR(bigN);
    os << "\t" << rnd(getSplitChangeEMPCOPERR(bigN));
    os << "\t" << getEMPContributionAIC(bigN);
    os << "\t" << rnd(getSplitChangeEMPAIC());
    
    // followed by the intervals of box using Inf and Sup
    // ie unlike cxsc output, there is no [  ] around them
    for (int i= Lb(thisBox); i <= Ub(thisBox) ; i++) {

      os << "\t" << Inf(thisBox[i])
        << "\t" << Sup(thisBox[i]);
    }
    os << cxsc::RestoreOpt;

  }
}
std::ostream & SPSnode::leafOutputTabsWithHistHeight ( const size_t  bigN,
std::ostream &  os,
int  prec = 5 
) const [protected, virtual]

Output for a node in a binary tree, tab-delimited.

Output intended for a txt file, in numeric form only.

Includes the height of a bin represented by this leaf node for a normalised histogram, ie counter/(node volume * total count in tree)

Replaces the format that that the cxsc::<< operator produces for interval vectors. The format used here produces alpha-numeric tab-delimited data. The format for a d-dimensional interval vector is:

nodeName [tab] volume [tab] counter [tab] counter/(volume*total count) [tab] Inf(ivector[1]) [tab] Sup(ivector[1]) [tab] ... [tab] Inf(ivector[d]) [tab] Sup(ivector[d])

Parameters:
bigNis total data points, used by the height calculation
osis the stream to send to
precis the precision used for printing, defaulting to 5
{
  if(theBox != NULL) { // do nothing if there is no box

    ivector thisBox = *theBox; // copy of theBox
    
    // output the node name, nodeVolume, counter, counter/(bigN * vol)
    os << cxsc::SaveOpt;
    os << cxsc::Variable << cxsc::SetPrecision(prec+2,prec);
    
    cxsc::real vol = nodeRealVolume();
    
    os << setprecision(prec);
    os << "\t" << vol;
    os << "\t" << counter;
    os << "\t" << (1.0 * counter)/(vol * (1.0* bigN));
    // followed by the intervals of box using Inf and Sup
    // ie unlike cxsc output, there is no [  ] around them
    
    for (int i= Lb(thisBox); i <= Ub(thisBox) ; ++i) {

      os << "\t" << Inf(thisBox[i])
        << "\t" << Sup(thisBox[i]);
    }
    os << cxsc::RestoreOpt;

  }
}
std::ostream & SPSnode::leafOutputTabsWithHistHeightAndEMPs ( const size_t  bigN,
std::ostream &  os,
int  prec = 5 
) const [protected]

Output for a node in a binary tree, tab-delimited.

Output intended for a txt file, in numeric form only.

Replaces the format that that the cxsc::<< operator produces for interval vectors. The format used here produces alpha-numeric tab-delimited data.

Includes the height of a bin represented by this leaf node for a normalised histogram, ie counter/(node volume * total count in tree)

The format for a d-dimensional interval vector is

nodeName [tab] volume [tabl] counter [tab] counter/(volume*total count) scaled EMP contribution COPERR [tab] change in scaled EMP contribution COPERR if split [tab] scaled EMP contribution AIC [tab] change in scaled EMP contribution AIC if split [tab] Inf(ivector[1]) [tab] Sup(ivector[1].[tab] . . [tab] Inf(ivector[d]) [tab] Sup(ivector[d]

Parameters:
bigNis total data points, used by emps and height calculations
osis the stream to send to
precis the precision used for printing, defaulting to 5
{
  if(theBox != NULL) { // do nothing if there is no box

    ivector thisBox = *theBox; // copy of theBox
    
    // output the node name, nodeVolume, counter, counter/(n * vol)
    os << nodeName;
    
    os << cxsc::SaveOpt;
    os << cxsc::Variable << cxsc::SetPrecision(prec+2,prec);
    
    cxsc::real vol = nodeRealVolume();
    
    os << "\t" << vol;
    os << "\t" << counter;
    os << "\t" << (1.0 * counter)/(vol * (1.0*bigN));
    // EMP contributions and changes if split
    os << "\t" << getEMPContributionCOPERR(bigN);
    os << "\t" << rnd(getSplitChangeEMPCOPERR(bigN));
    os << "\t" << getEMPContributionAIC(bigN);
    os << "\t" << rnd(getSplitChangeEMPAIC());

    // followed by the intervals of box using Inf and Sup
    // ie unlike cxsc output, there is no [  ] around them
    for (int i= Lb(thisBox); i <= Ub(thisBox) ; i++) {

      os << "\t" << Inf(thisBox[i])
        << "\t" << Sup(thisBox[i]);
    }
    os << cxsc::RestoreOpt;

  }
}
std::ostream & SPSnode::leavesOutputTabsWithEMPs ( const size_t  bigN,
std::ostream &  os,
int  prec = 5 
) const [virtual]

Output for for all leaves of a binary tree.

Output intended for a txt file, in numeric form only.

Includes scaled COPERR and AIC contributions and changes if split.

Parameters:
bigNis total data points, used by emps and height calculations
osis the stream to send to
precis the precision used for printing, defaulting to 5
{
  if (parent == NULL) { // root
    std::string headers = "node \t vol \t count \t EMP COPERR ";
    headers += "\t &change \t EMP AIC \t &change \t dimensions \n";
    os << headers;
  }

  // uses  member function leafOutputTabsWithEMPs to generate node output
  if (!(isEmpty()) && isLeaf()) { // this is a non-empty leaf
    leafOutputTabsWithEMPs(bigN, os, prec);
    return (os << "\n");

  }

  //recurse on the children
  if (getLeftChild()!=NULL) {
    getLeftChild()->leavesOutputTabsWithEMPs(bigN, os, prec);
  }

  if (getRightChild()!=NULL) {
    getRightChild()->leavesOutputTabsWithEMPs(bigN, os, prec);
  }

}
std::ostream & SPSnode::leavesOutputTabsWithHistHeight ( std::ostream &  os,
int  prec = 5 
) const [virtual]

Output for for all leaves of a binary tree.

Output intended for a txt file, in numeric form only.

Includes output for the height of histogram bins for a normalised histogram based on tree with total number of data points of this root.

Parameters:
osis the stream to send to
precis the precision used for printing, defaulting to 5
{
  leavesOutputTabsWithHistHeight(counter, os, prec);
  return (os);
}
std::ostream & SPSnode::leavesOutputTabsWithHistHeight ( const size_t  bigN,
std::ostream &  os,
int  prec = 5 
) const [virtual]

Output for for all leaves of a binary tree.

Output intended for a txt file, in numeric form only.

Includes output for the height of histogram bins for a normalised histogram based on tree with total number of data points of this root.

Parameters:
bigNis total data points, used by emps and height calculations
osis the stream to send to
precis the precision used for printing, defaulting to 5
{
  // uses  member function leafOutputTabs to generate node output
  if (!(isEmpty()) && isLeaf()) { // this is a non-empty leaf
    leafOutputTabsWithHistHeight(bigN, os, prec);
    return (os << "\n");

  }

  //recurse on the children
  if (getLeftChild()!=NULL) {
    getLeftChild()->leavesOutputTabsWithHistHeight(bigN, os, prec);
  }

  if (getRightChild()!=NULL) {
    getRightChild()->leavesOutputTabsWithHistHeight(bigN, os, prec);
  }

}
std::ostream & SPSnode::leavesOutputTabsWithHistHeightAndEMPs ( const size_t  bigN,
std::ostream &  os,
int  prec = 5 
) const [virtual]

Output for for all leaves of a binary tree.

Output intended for a txt file, in numeric form only.

Recursively uses leafOutputTabsWithHistHeightAndEMPs() to output information for each leaf node. Includes output for the height of histogram bins for a normalised histogram based on tree with total number of data points n.

Parameters:
bigNis total data points, used by emps and height calculations
osis the stream to send to
precis the precision used for printing, defaulting to 5
{
  if (parent == NULL) { // root
    std::string headers = "node \t vol \t count \t height ";
    headers += "\t EMP COPERR \t &change \t EMP AIC \t &change ";
    headers += "\t dimensions \n";
    os << headers;
  }

  // uses  member function leafOutputTabsWithEMPs to generate node output
  if (!(isEmpty()) && isLeaf()) { // this is a non-empty leaf
    leafOutputTabsWithHistHeightAndEMPs(bigN, os, prec);
    return (os << "\n");

  }

  //recurse on the children
  if (getLeftChild()!=NULL) {
    getLeftChild()->leavesOutputTabsWithHistHeightAndEMPs(bigN,
                                os, prec);
  }

  if (getRightChild()!=NULL) {
    getRightChild()->leavesOutputTabsWithHistHeightAndEMPs(bigN,
                                os, prec);
  }

}
std::ostream & SPSnode::nodeDataPrint ( std::ostream &  os) const [protected, virtual]

Print the data in a specified format.

Replaces the format that the cxsc::<< operator produces for vectors. The format used here produces numeric tab-delimited data. The format for an n-dimensional real vector data point is:

rvector[1] [tab] . . . [tab] rvector[n]

{
  if (!dataItrs.empty()) {

    NodeDataItr dataItr;
    
    int dimension = getDimension();
    
    for (dataItr = dataItrs.begin();
      dataItr!= dataItrs.end(); dataItr++) {

      BigDataItr bigIt = *dataItr;
      rvector theData = *bigIt;

      for (int i = 1; i < dimension + 1; i++) {
        os << theData[i] << "  "; // print data
      }   // end loop through data elements

    } // end loop through data container
  } // end if counter > 0
  // if no data, ie counter = 0, then just return os unaltered

  return os;
}
void SPSnode::nodeExpand ( int  comp) [virtual]

Expand a leaf node.

Expand a leaf node to have two children and pass data down to the children with no further splitting.

Equivalent to bisecting a box in a regular subpaving. Makes two new sibling child nodes of this.

Parameters:
compis the dimension on which to to bisect theBox.

Reimplemented from subpavings::SPnode.

{
  nodeExpansionOnly(comp);    // expand the node
  SplitNever sn;              // dummy split decision maker
  splitData(sn);            // split the data with no further splitting


}
void SPSnode::nodeExpand ( const SplitDecisionObj boolTest,
int  comp 
) [virtual]

Expand a leaf node.

Expand a leaf node to have two children and pass data down to the children, allowing for further splitting.

Parameters:
boolTestis a reference to an object providing a function operator determining whether to split a node when a data point arrives.
compis the dimension on which to to bisect theBox.
{
  nodeExpansionOnly(comp);    // expand the node
  // split the data, allowing for further splitting

  splitData(boolTest);
}
void SPSnode::nodeExpand ( ) [virtual]

Expand a leaf node.

Expand a leaf node to have two children and pass data down to the children with no further splitting.

Finds the splitting dimension.

Reimplemented from subpavings::SPnode.

{
  int maxdiamcomp; // variable to hold first longest dimension
  double temp = MaxDiam(getBox(), maxdiamcomp);
  nodeExpand(maxdiamcomp); // complete nodeExpand

}
void SPSnode::nodeExpand ( const SplitDecisionObj boolTest) [virtual]

Expand a leaf node.

Expand the leaf node to have two children and pass data down to the children, allowing for further splitting.

Finds the dimension to split on.

Parameters:
boolTestis a reference to an object providing a function operator determining whether to split a node when a data point arrives.
{
  int maxdiamcomp; // variable to hold first longest dimension
  double temp = MaxDiam(getBox(), maxdiamcomp);
  nodeExpand(boolTest, maxdiamcomp); // complete nodeExpand
}
void SPSnode::nodeExpansionOnly ( int  comp) [protected, virtual]

Expand the node with no reallocation of data.

Bisect box, make two new nodes (one for each half box) and graft onto this node provided that this node is a leaf. Equivalent to bisecting a box in a regular subpaving.

{
  if ( isEmpty() ) {
    throw NoBox_Error("SPSnode::nodeExpansionOnly(int )");
  }
  // only do something if this SPSnode is a leaf
  if (isLeaf()) {
    
    SPSnode* newLC = NULL;
    SPSnode* newRC = NULL;
    
    try {
      // ivectors to become boxes for new children
      ivector lC, rC;

      // Call Lower() and Upper() to put the split
      // boxes into lC and rC respectively
      Lower(getBox(), lC, comp);
      Upper(getBox(), rC, comp);
      
      // when making new children, use constructor
      // that will give space indication (for data)
      // of the size of this node's dataItrs
      size_t space = dataItrs.size();

      newLC = new SPSnode(lC, space, countsOnly);
      newRC = new SPSnode(rC, space, countsOnly);
      
      nodeAddLeft(newLC);
      nodeAddRight(newRC);

      //name the new children
      getLeftChild()->setNodeName(nodeName + "L");
      getRightChild()->setNodeName(nodeName + "R");

      // store the split dimension in this
      splitDim = comp;

      // store the split value in this
      // the split value is the infinum of interval
      // of right child box for dimension split on
      splitValue = _double(Inf(
        ((getRightChild())->getBox())[comp]));
    }
    catch(std::exception& e) {
      // overkill, but try to deal with all eventualities...
      try {
        if (getLeftChild() != NULL) {
          delete (getLeftChild());
          leftChild = NULL;
        }
        
        if (getRightChild() != NULL) {
          delete (getRightChild());
          rightChild = NULL;
        }
        if (newLC != NULL) {
          delete newLC;
          newLC = NULL;
        }
        if (newRC != NULL) {
          delete newRC;
          newRC = NULL;
        }
      }
      catch(std::exception& e) {} //catch and swallow
      
      throw; // rethrow original exception
    }
  }
}
std::ostream & SPSnode::nodePrint ( std::ostream &  os) const [virtual]

Output details of a specific node.

This is intended for console output or output to a mixed alpha and numeric file.

Parameters:
osis the stream to send to

Reimplemented from subpavings::SPnode.

{
  // output for box in form:
  // box, volume, counter, mean, variance covariance, and data
  
  if(theBox != NULL) { // do nothing if there is no box
  
    ivector thisBox = *theBox; // copy theBox

    os << "Box is :";

    for (int i = Lb(thisBox); i <= Ub(thisBox) ; i++) {
      // c-xsc default output for intervals
      os << "  " << thisBox[i];
    }

    os << std::endl;
    os << "Box volume is " << nodeRealVolume() << std::endl;
    os << "Counter = " << counter << std::endl;

    os << "Mean is ";
    nodeMeanPrint(os);
    os << "\nVariance-covariance matrix is" << std::endl;
    nodeVarCovarPrint(os);
    os << "Data is" << std::endl;
    nodeDataPrint(os);
    os << std::endl;

    os << std::endl;
  }

  return os;

}
void SPSnode::nodeReabsorbChildren ( ) [virtual]

Reabsorbs both children of the node.

Effectively reverses any split of the node.

Data associated with the children is pushed back up to this and the splitDim and splitValue reset to leaf defaults.

Works even if the children are not leaves.

Reimplemented from subpavings::SPnode.

{
  // first recursively deal with the children of the children
  if (hasLCwithBox())
    getLeftChild()->nodeReabsorbChildren();
  if (hasRCwithBox())
    getRightChild()->nodeReabsorbChildren();

  if (hasLCwithBox()) {
    getLeftChild()->gatherData(dataItrs);
    delete getLeftChild();
    
  }

  if (hasRCwithBox()) {
    getRightChild()->gatherData(dataItrs);
    delete getRightChild();
    
  }

  // reset splitDim and splitValue to their defaults
  splitDim = -1;
  splitValue = 0.0;

  leftChild = NULL;
  rightChild = NULL;
}
std::string SPSnode::nodeStringSummary ( ) const [virtual]

Get a string summary of this node's properties.

Just this node, not this and its descendents.

Returns:
the string summary.

Reimplemented from subpavings::SPnode.

{
  std::ostringstream oss;
  
  oss << "I am " << getNodeName() << "(address " << this << "),\n";
  oss << "Dimension is " << getDimension(); 
  if (isEmpty() ) {
    oss << ", the box is NULL\n"; 
  }
  else {
    oss << ", address of box is " << theBox << "\n"; 
  }
  oss << "spaceIndication is " << spaceIndication << ", splitDim is " << splitDim << ", splitValue is " << splitValue << "\n"; 
  oss << "address of dataItrs is " << &dataItrs << ", countsOnly is " << countsOnly << ", counter is " << counter << "\n";
  oss << "getMean is ";
  nodeMeanPrint(oss);
  oss << "\ngetVarCovar is\n";
  nodeVarCovarPrint(oss);
  oss << "my parent is ";
  if (getParent() != NULL) oss << getParent()->getNodeName();
  else oss << "NULL"; 
  oss << ", my left child is ";
  if (getLeftChild() != NULL) oss << getLeftChild()->getNodeName();
  else oss << "NULL"; 
  oss << ", my right child is ";
  if (getRightChild() != NULL) oss << getRightChild()->getNodeName();
  else oss << "NULL";
  
  return oss.str();
  
}
void SPSnode::recalculateOptionalStatsAndGatherData ( NodeData container) [protected, virtual]

Recalculates optional summary statistics associated with node from scratch, using all the data currently held.

Recalculates sums and sumproducts.

Uses only the data currently associated with the node in the the recalculation.

Does not recalculate non-optional summaries, like counter, because these should be right already

{
  countsOnly = false;
  
  // make sure the current containers are empty
  VecDotPrec().swap (dpSums);
  VecDotPrec().swap (dpSumProducts);
  
  NodeData myContainer;

  if (!isLeaf()) {
    
    if (hasLCwithBox()) {
      getLeftChild()->recalculateOptionalStatsAndGatherData(myContainer);
    }
    if (hasRCwithBox()) {
      getRightChild()->recalculateOptionalStatsAndGatherData(myContainer);
    
    }
  }
  else { // is a leaf
    // just my data
    
    myContainer = dataItrs;
    
  }
  
  // now do my stats
  assert(myContainer.size() == getCounter() );
  recalculateOptionalStatsForData(myContainer);
  
  // add my data to the container
  container.insert(container.end(),
          myContainer.begin(),
          myContainer.end());
  
}
void SPSnode::recalculateStats ( const rvector &  newdata) const [protected, virtual]

Recalculate summary statistics associated with node.

Recalculates counter and sums (used for mean) and sumproducts (used for variance-covariance).

{
  counter++;  // update the counter

  if (!countsOnly) {

    recalculateSums(newdata); // update the sums

    recalculateSumProducts(newdata); // update the sumproducts
  }
}
void SPSnode::recalculateSumProducts ( const rvector &  newdata) const [protected, virtual]

Recalculate summary statistics associated with node.

Recalculates sumproducts (used for variance-covariance).

{
  /* the sumproducts can be thought of as an nxn matrix,
  which is implemented here as a nxn element vector of
  dotprecision variables, using row-major order.
  Ie the m-th element (m = 0, . . . nxn-1) is in row floor(m/n)
  and column m-rowxn in the matrix configuration.
  Or, the sumproduct of elements i and j in an rvector,
  i,j = 0,...,n-1, is element m=(ixn+j) of the sumproducts
  vector. */
  int dimension = getDimension();
    
  if (dpSumProducts.empty()) {    //nothing there yet
    // reserve space for all elements
    dpSumProducts.reserve(dimension*dimension);

    // for each dimnsn^2 of data, initialise element
    for (int i = 0; i< (dimension*dimension); i++) {
      dotprecision dp;
      dp = 0.0;
      dpSumProducts.push_back(dp);
    }
  }

  // make a dot precision variable out of the ith element
  // and jth element of the of the rvector of new data and
  // store in dpSumProducts.
  for (int i = 1; i < dimension + 1; i++) {
    // only need to do columns 1 to i because of symmetry
    for (int j = 1; j< i + 1; j++) {

      int index = (i-1)*dimension + (j-1);
      // rvectors indexed 1 to n
      accumulate(dpSumProducts[index],
          newdata[i], newdata[j]);

      //if not on the diagonal of the matrix,
      // we can also fill in the symmetric element
      if (i!=j) {
        int sym_index = (j-1)*dimension
          + (i-1);
        dpSumProducts[sym_index] =
          dpSumProducts[index];
      } // end if
    }// end j-loop
  }// end i-loop

  // sumproducts has been updated for new datapoint
}
void SPSnode::recalculateSums ( const rvector &  newdata) const [protected, virtual]

Recalculate summary statistics associated with node.

Recalculates sums (used for mean).

{
  int dimension = getDimension();
    
  if (dpSums.empty()) {   //nothing in the sums yet
    // reserve space in dpSums for all elements of the mean
    
    dpSums.reserve(dimension);

    // for each dimnsn of data, initialise element
    for (int i = 0; i< dimension; i++) {
      dotprecision dp;
      dp = 0.0;
      dpSums.push_back(dp);
    }
    
  }

  // make a dot precision variable out of the ith element
  // of the rvector of new data and store in dpSums
  for (int i = 1; i< dimension + 1; i++) {
    // rvectors indexed 1 to n, vectors indexed 0 to n-1
    
    accumulate(dpSums[i-1], newdata[i], 1.0);
  }

}
void SPSnode::reshapeToUnion ( const SPnode other) [virtual]

Reshape so that the tree rooted at this has shape that is the union of this shape and the shape of another tree.

Throws a NoBox_Error if this has no box or if other has no box.

Throws an IncompatibleDimensions_Error if boxes of this and other are not the same.

Throws an std::runtime_error if other has an illegal state (see checkTreeStateLegal()).

Parameters:
otheris the tree to make the union against.
Precondition:
This has a box and that box is identical to the box of other.
Postcondition:
the tree rooted at this has shape that is the union of the shape of this before the operation and the shape of other. other is unchanged.

Reimplemented from subpavings::SPnode.

void SPSnode::reshapeToUnion ( const SPnode other,
size_t  minChildPoints 
)

Reshape so that the tree rooted at this has shape that is as close as possible to the union of this shape and the shape of another tree.

If other is more split than this, this will not exactly follow the shape of other if the resulting nodes would not splittable according to isSplittableNode(size_t minChildPoints). If any node cannot be split to follow the shape of other due to minChildPoints, a message will be printed to std::cerr.

Throws a NoBox_Error if this has no box or if other has no box.

Throws an IncompatibleDimensions_Error if boxes of this and other are not the same.

Throws an std::runtime_error if other has an illegal state (see checkTreeStateLegal()).

Parameters:
otheris the tree to make the union against.
minChildPointsis the minumum child points to use to check if this can be split in order to follow other.
Precondition:
This has a box and that box is identical to the box of other.
Postcondition:
the tree rooted at this has shape that is the union of the shape of this before the operation and the shape of other. other is unchanged.
{
  if (isEmpty() || other.isEmpty()) {
    throw NoBox_Error(
    "SPnode::reshapeToUnion(const SPnode&, size_t)");
  }
  if ( getBox() != other.getBox() )  {
    throw IncompatibleDimensions_Error(
    "SPnode::reshapeToUnion(const SPnode&, size_t)");
  }
  if ( !other.checkTreeStateLegal() )
    throw runtime_error(
    "SPnode::reshapeToUnion(const SPnode&, size_t) : other has illegal tree state");
  
  std::string baseErrorFilename("ReshapeErrors");
  std::string errorFilename = getUniqueFilename(baseErrorFilename);
  
  bool success = this->_reshapeToUnion(
          &other, minChildPoints, errorFilename);
  
  // if we returned success there should be no file with that name
  if(!success) {
    std::cerr << "\nCould not exactly reshape this to the union:"
      << " check " << errorFilename << " for errors\n" << endl;
  }
  
}
void SPSnode::setCountsOnly ( bool  setTo) [virtual]

Set an indicator controlling whether this maintains all available statistics or not.

If the node is changed from holding all available statistics to not holding all available statistics, the existing values held to enable calculation of the unwanted statistics are discarded.

If the node is changed from not holding all available statistics to holding all available statistics, values enabling the calculation of the optional statistics are recalculated.

Throws a NonRootNode_Error if this is not a root node.

Parameters:
setTothe indicator for whether only counts are maintained (setTo = true) or whether all available statistics are maintained (setTo = false).
Precondition:
this is a root node.
Postcondition:
countsOnly = setTo. If setTo is true, values enabling the calculation of optional statistics are not held. If setTo is false, values enabling the calculation of optional statistics are held.
{
  if (getParent() != NULL) {
    throw NonRootNode_Error("SPSnode::setCountsOnly(bool)");
  }
  
  if (setTo) { // change to keeping counts only
    stripOptionalStatsOnly(); // clears recursively on this and children
  }
  else { // change to not keeping counts only, ie to keeping all stats
  
    addOptionalStatsOnly(); // adds recursively on this and children
  }
}
void SPSnode::splitData ( const SplitDecisionObj boolTest) [protected, virtual]

Send the data associated with this down to children.

Children may then be resplit using boolTest.

{

  // check that both children exist
  if (!hasLCwithBox() || !hasRCwithBox()) {
    throw UnfulfillableRequest_Error(
            "SPSnode::splitData(const SplitDecisionObj&)");
  }

  NodeDataItr dataItr; // iterator

  //divvie the data up amongst the children
  for (dataItr = dataItrs.begin();
    dataItr!= dataItrs.end(); dataItr++) {
    BigDataItr newItr = *dataItr;

    //calls insertOneFind on the children of this node
    // so stats are not recalculated for this node itself
    SPSnode* reinsertedInto = NULL;

    if(rightChild!=NULL && !rightChild->isEmpty()) {

      reinsertedInto =
        (getRightChild())->insertOneFind(
        newItr, ON_RIGHT, boolTest);
    }

    // only try the left if it's not on the right
    if(reinsertedInto==NULL && leftChild!=NULL
    && !leftChild->isEmpty()) {

      reinsertedInto =
        (getLeftChild())->insertOneFind(
        newItr, ON_LEFT, boolTest);
    }

  }

  clearNodeData();         //clear the data in this node
}
bool SPSnode::splitRootAtLeastToShapeSPS ( std::vector< size_t >  reqDepths,
size_t  minPoints = 0 
)

Try to get tree rooted at this into shape at least as deep as described by instruction reqDepths.

Tries to ensure that this has at least shape of reqDepths, but allows the tree rooted at this to also be more split at some or all nodes.

This can only follow the instruction if the nodes it is required to split are splittable nodes, with respect to both width and minPoints (to be split a node must return isSplittableNode(minPoints) = true).

Parameters:
reqDepthsa collection of leaf levels to try to split to.
minPointsthe minimum child points used to determine whether a node is splittable (defaults to 0).
Returns:
true if this was able to follow the instruction, false otherwise (ie if instruction could not be followed because nodes were not splittable).
Precondition:
this has a box.
this is a root node.
reqDepths describes a valid string (including is not empty).
Postcondition:
If returns true, the tree rooted at this has at least the shape described by reqDepths; if returns false, the tree rooted at this will have been split according to reqDepths, working from the right, until a node that cannot be split needs to be split (all attempts to split will then have ceased).
{
  if ( isEmpty() ) {
    throw NoBox_Error("SPSnode::splitRootAtLeastToShape(...)");
  }
  if (getParent() != NULL) {
    throw NonRootNode_Error("SPSnode::splitRootAtLeastToShape(...)");
  }
  
  size_t myDepth = 0;
  
  bool success = _splitAtLeastToShapeSPS(reqDepths,
                myDepth,
                minPoints);
  
  if (success && !reqDepths.empty()) 
    throw std::invalid_argument("SPSnode::splitRootAtLeastToShape(...)");
  
  return success;
  
}
void SPSnode::stripData ( ) const [protected, virtual]

Strip the data from this node and its children, seting counters to 0 and discarding other statistics held, and discarding data held by leaf nodes.

{
  //strip me
  counter = 0;
  VecDotPrec().swap (dpSums);
  VecDotPrec().swap (dpSumProducts);
  NodeData().swap (dataItrs);

  // and children
  if (getLeftChild() != NULL) {
    getLeftChild()->stripData();
  }
  if (getRightChild() != NULL) {
    getRightChild()->stripData();
  }
}
void SPSnode::stripOptionalStatsOnly ( ) [protected, virtual]

Discard the values held to enable the calculation of optional statistics, and set countsOnly to true, in this and the children of this.

{
  if (!countsOnly) {
    //strip me of the sum collectors only
    VecDotPrec().swap (dpSums);
    VecDotPrec().swap (dpSumProducts);
    countsOnly = true;
  }
  
  // and children
  if (getLeftChild() != NULL) {
    getLeftChild()->stripOptionalStatsOnly();
  }
  if (getRightChild() != NULL) {
    getRightChild()->stripOptionalStatsOnly();
  }
}
void SPSnode::swapSPS ( SPSnode spn)

Swap this and another node.

Swaps all the data members of this with the other node.

Parameters:
spna reference to the node to swap with
Postcondition:
this is identical,in terms of its data members, to spn before the swap, and spn is identical to this after the swap.
{
  /* theBox, parent, leftChild,
  rightChild and nodeName are inherited from base class */
  SPnode::swap(spn); // use the base version
  
  std::swap(spaceIndication, spn.spaceIndication);
  std::swap(counter, spn.counter);
  std::swap(dpSums, spn.dpSums);
  std::swap(dpSumProducts, spn.dpSumProducts);
  std::swap(dataItrs, spn.dataItrs);
  std::swap(splitDim, spn.splitDim);
  std::swap(splitValue, spn.splitValue);
  std::swap(countsOnly, spn.countsOnly);
  
}
void SPSnode::unionTreeStructure ( const SPSnode *const  rhs)

Makes the non-minimal union of the tree structures of this and another node.

Discards the actual data. The resulting structure returned has no data associated with it.

Throws the following exceptions:

  • Throws a NoBox_Error if this has no box.
  • Throws a IncompatibleDimensions_Error if the node pointed to by rhs has a box but the dimensions and sizes of the boxes of this and the node pointed to by rhs are not the same ( however, it is acceptable for rhs to be a NULL pointer or to point to a node having no box).
Parameters:
rhsa pointer to the root of SPSnode tree to make a union against, which can be a NULL pointer. rhs is unchanged by the operation.
Precondition:
This must have a box. If the node pointed to by rhs has a box, the size and dimensions of that box must be the same as those for this's box.
Postcondition:
the tree rooted at this will have the the non-minimal union of its original paving structure and that of the tree tooted at the node pointed to by rhs but no data: all counts will be 0, means and variance-covariances will be undefined, and all data containers will be empty.
{
  if ( isEmpty() ) {
    throw NoBox_Error(
      "SPSnode::unionTreeStructure(const SPSnode * const)");
  }
  if ( rhs != NULL && !rhs->isEmpty() && (getBox() != rhs->getBox()) )
    {
      throw IncompatibleDimensions_Error(
        "SPSnode::unionTreeStructure(const SPSnode * const)");
    }
  // if rhs is null or empty we carry on
  
  unionNoData(rhs);
  
}

Member Data Documentation

Determines the amount of statistical summary data in node.

If true, only counts are maintained in the node. If false, counts and other statistics (sums, sumproducts) are also maintained.

Setting countsOnly to true can help to conserve memory when means (from sums) and covariances (from sumproducts) will not be needed.

A container for the association of data with a node.

Data is associated with a node via this container of iterators. The iterators can, very loosely, in the sense in which they are used here, be thought of as pointers to a big data collection of all data points. Only leaf nodes can have anything in this container. However, not all leaf nodes will necessarily have something in this container: the container will be empty if no data points are covered by the box represented by a leaf node.

A container representing the sumproduct matrix of the data points covered by theBox.

The sumproducts matrix is used to obtain the sample variance-covariance matrix.

The for n-dimensional data the sample variance-covariance matrix is an nxn matrix where the element in row i, column j is the sample covariance between the ith-dimension and jth-dimension of the data, which is [sumproduct(i,j)-sum(i)sum(j)/counter]/(counter-1).

So by keeping the sum product and sums up to date, we can calculate a covariance on demand.

The sumproducts can be thought of as an nxn matrix where the element in row i, column j is the sum over all the datapoints associated with that box of the products of the ith element and jth element in the datapoints. ie for each datapoint, we take the product of the ith and jth elements and then sum the products over all the datapoints.

Data points are rvectors so each element is a real, and the the accumulation (sum) of products of reals is implemented here with a dotprecision accumulator.

The sumproduct matrix is stored here as a nxn element vector of dotprecision variables (where n is the dimensions of the rvectors or data points), using row-major order.

Ie the m-th element (m = 0, . . . nxn-1) is in row floor(m/n) and column m-rowxn in the matrix configuration.

Or, the sumproduct of elements i and j in an rvector, i,j = 0,...,n-1, is element m=(ixn+j) of the sumproducts vector.

A container representing the sum of the data points covered by theBox.

The sums are used for calculating the mean and also the sample variance-covariance matrix for the data associated with a node.

cxsc::dotprecision accumulators are used to maintain the sum of the data in each dimension of the data because floating point arithmetic can result in inaccuracies during summation, especially in large boxes.

We could use Kahan summation instead with a lot more work. Kahan summation relies on adding a number of points in a sequence and recovering data lost in one summation during the next one. When we simply add two numbers, Kahan summation has no chance to recover the lost part. We would have to implement this by keeping the lost part, say having a vector of pairs, and trying to re-add the lost part each time. See http://en.wikipedia.org/wiki/Kahan_summation_algorithm . The same may apply to arguments for using gsl_mean using a simpler reccurence relation. Speed comparisons have not been performed on the three alternative possible implementations of the recursively computable sample sum or sample mean. However we get most relable and accurate sums using cxsc::dotprecision accumulators.

An indication of the maximum number of data points a node needs to carry.

This is used for efficiency only to reserve vector space and a node can have more than this maximum number of data points associated with it. Defaults to defaultMaxPts.

Dimension the node's box has been split along.

This is needed to accurately divide data between a node's left and right children. Defaults to -1 if the node is a leaf.

The value, on split dimension, where node's box was split.

This is needed to accurately divide data between a node's left and right children. Defaults to 0.0 if the node is a leaf.


The documentation for this class was generated from the following files:
 All Classes Namespaces Functions Variables Typedefs Enumerations Friends