MRS  1.0
A C++ Class Library for Statistical Set Processing
subpavings::FunctionEstimatorInterval Class Reference

A wrapper or manager for an IntervalMappedSPnode tree to be used for function estimation using interval enclosures over subpaving boxes. More...

List of all members.

Classes

class  IntervalBandAreaGainOnSplitMeasurer
class  IntervalBandAreaLossOnMergeMeasurer
class  NodePtrMeasurePair

Public Member Functions

 FunctionEstimatorInterval (const ivector &v, const MappedFobj &f, int lab=0)
 Initialised constructor.
 FunctionEstimatorInterval (const SPnode &spn, const MappedFobj &f, int lab=0)
 Initialised constructor.
 FunctionEstimatorInterval (const FunctionEstimatorInterval &other)
 Copy constructor.
 ~FunctionEstimatorInterval ()
 Destructor.
const MappedFobjgetFobjReference () const
 Get the reference to the function object used by this.
int getLabel () const
 Get the label.
void setLabel (int lab)
 Set the label.
bool hasSubPaving () const
 Get whether this has a subpaving to manage.
cxsc::ivector getRootBox () const
 Get the box of the subpaving managed by this.
cxsc::real getRootRangeDiameter () const
 Get diameter of the interval image of the root box of the subpaving managed by this.
int getDimensions () const
 get the dimensions of the subpaving this manages.
cxsc::real getDomainVolume () const
 get volume of the root box of the subpaving this manages.
size_t getRootLeaves () const
 Gets number of leaf nodes in the root paving.
IntVec getLeafLevels () const
std::string getLeafLevelsString () const
subpavings::PiecewiseConstantFunction makePiecewiseConstantFunction () const
 Make a PiecewiseConstantFunction out of this estimator.
void bruteForceEstimate (cxsc::real tolerance)
 Create an estimate of the function described by fobj using a depth-first brute force approach.
void breadthFirstBruteForceEstimate (const SPCheckVisitor &checker, size_t maxLeaves, long unsigned int seed=1234)
 Create an estimate of the described by fobj using a breadth-first brute force approach with a maximum number of leaves.
void breadthFirstBruteForceEstimate (const SPCheckVisitor &nodeChecker, const FEIEvalObj &fe, long unsigned int seed=1234)
void breadthFirstBruteForceEstimate (size_t maxLeaves, long unsigned int seed=1234)
void breadthFirstBruteForceEstimate (const FEIEvalObj &fe, long unsigned int seed=1234)
void hullPropagation ()
 Tighten up the range enclosures of the estimate.
bool prioritySplitOnGain (const FEIEvalObj &fe, LOGGING_LEVEL logging, long unsigned int seed=1234)
 prioritySplitOnGain.
bool prioritySplitOnGain (size_t maxLeaves, LOGGING_LEVEL logging, long unsigned int seed=1234)
 prioritySplitOnGain.
bool priorityMerge (const IntervalMappedSPnode::Measurer &measure, size_t minLeaves, LOGGING_LEVEL logging, long unsigned int seed=1234)
 Measure for ordering of priority queue supplied by the user and a seed for the random number generator supplied by the user.
std::ostream & outputRootToStreamTabs (std::ostream &os, int prec=5) const
 Output all nodes of the subpaving managed by this to a given stream.
void outputLog (const std::string &s, int i, int prec=5) const
 Append current state of estimator to a txt log file.
std::string stringSummary () const
 Get a string summary of this estimator's properties.
prioritySplit methods.

These methods takes an estimator and progressively split using a priority queue of splittable nodes to determine which node to split first. The ordering for the queue is referred to here as the measure.

Note:
These overloadings of the prioritySplit function are supplied because it is much more efficient to check for number of leaves directly than by using a function object.

The default measure used is the 'area' of the the interval band on each piece of the current function estimate, where the area is the diameter of the interval image of the piece multiplied by the area of the piece (leaf) box. This is effectively the difference of the Reimann integral to the top of the interval enclosure of the box against the Reimann integral to the bottom of the interval enclosure of the box (the 'Reimann Difference'. Pieces with the largest areas will be split first.

Note that the Reimann Difference does not provide a globally optimal ordering for the queue in the sense that it will not necessarily provide the tightest function enclosure for a given number of leaves. This measure may tend to prioritise the bisection of pieces with large volumes but small interval enclosures (ie large pieces where the function is almost flat) over pieces where the Reimann Difference is smaller but the function is sufficiently variable within the piece for there to be the potential for considerable tightening of the enclosure by bisection.

Splitting continues until there are maxLeaves leaves, or there are no more splittable nodes.

Each node in the subpaving managed by this decides for itself whether it is splittable, using isSplittableNode().

If more than one splittable node is equally 'large', on the basis of the measure used, then a random choice is made between all equally large nodes to find the node which will be split.

The seed for the random number generator used for random selection between equally 'large' nodes can be specified by the user or set by this. If you are looking at distributions of results across multiple estimates, supply the random number generator seed to the priority queue to ensure that each estimator will make different random choices each time; use of the internally set seed will give the same results each time.

Throws a NullSubpavings_Error if the subpaving that this manages is a NULL pointer.

Throws an std::logic_error if the state of this is not 'legal', ie if this contains cherries that do not pass isSplittableNode().

Throws an std::logic_error if the split becomes muddled because of some failure within the logic of the algorithm itself.

Aborts if there are no splittable leaves left (or none at the start).

Parameters:
measureis an instance of a class providing a function for comparing spsnodes, to order the nodes to prioritise splitting.
maxLeavesis number of leaves to aim for in the estimator.
loggingan enum controlling whether estimator creation output is sent to a log file.
seedis a seed for the random number generator.
Returns:
true if the priority split was successful, false otherwise (returns false if the split could not be started , ie none of the leaves of the initial state is splittable, or if the split had to be aborted).
Precondition:
hasSubPaving() == true.
The state of this is legal ie this does not include cherries that should not have been split according to isSplittableNode().
Postcondition:
if the method returned true, then this has maxLeaves leaves; if the method returned false then splitting had to be aborted before maxLeaves leaves was reached.
bool prioritySplit (const IntervalMappedSPnode::Measurer &measure, const FEIEvalObj &fe, LOGGING_LEVEL logging, long unsigned int seed=1234)
 Measure for ordering of priority queue supplied by the user, and seed for the random number generator supplied by the user.
bool prioritySplit (const FEIEvalObj &fe, LOGGING_LEVEL logging, long unsigned int seed=1234)
 Uses the default measure (the Reimann difference for each piece of the function estimate) and a seed for the random number generator supplied by the user.
bool prioritySplit (const IntervalMappedSPnode::Measurer &measure, size_t maxLeaves, LOGGING_LEVEL logging, long unsigned int seed=1234)
 Version where measure is supplied by the user.
bool prioritySplit (size_t maxLeaves, LOGGING_LEVEL logging, long unsigned int seed=1234)
 Version using the default measure, the 'Reimann Difference'.
priorityMergeOnLoss methods.

These methods takes an estimator and progressively merge using a priority queue of mergeable nodes to determine which node to merge first. The ordering for the queue is referred to here as the measure.

The measure used is the increase in the 'area' of the the interval band on each cherry piece of the current function estimate resulting from a merge of two halves of the piece. This is seen as a loss in the sense of being a loss in the tightness of the function enclosure, or an increase in the Reimann Difference. The area of a piece is the diameter of the interval image of the piece multiplied by the area of the piece (leaf) box. The loss of tightness resulting from a merge of two halves of a piece is (area of band for merged piece) - (sum of areas of bands for the two halves of the piece). Pieces with the smallest 'loss' measured in this way will be merged first. Note that this does not provide a globally optimal ordering for the queue.

Merging continues until some criteria specified by the function evaluator fe (applying either to individual nodes or to the estimator as a whole) is satisfied, or there are no more mergeable nodes.

If more than one mergeable node is equally 'small', on the basis of the measure used, then a random choice is made between all equally small nodes to find the node which will be merged.

The seed for the random number generator used for random selection between equally 'small' nodes can be specified by the user or set by this. If you are looking at distributions of results across multiple estimates, supply the random number generator seed to the priority queue to ensure that each estimator will make different random choices each time; use of the internally set seed will give the same results each time.

Throws a NullSubpavings_Error if the subpaving that this manages is a NULL pointer.

Throws an std::logic_error if the state of this is not 'legal', ie if this contains cherries that do not pass isSplittableNode().

Aborts if there are no mergeable cherries left.

Parameters:
feis the function evaluator, an instance of a class which provides a criterion to determine when to stop merging.
loggingan enum controlling whether estimator creation output is sent to a log file.
seedis a seed for the random number generator.
Returns:
true if the priority merge was successful, false otherwise (returns false if the merge could not be started , ie there were no cherries in the initial state, or if the merge had to be aborted before fe is satisfied).
Precondition:
hasSubPaving() == true.
The state of this is legal ie this does not include cherries that should not have been split according to isSplittableNode().
Postcondition:
if the method returned true, then the state of this satisfies fe; if the method returned false then merging had to be aborted before fe was satisfied.
bool priorityMergeOnLoss (const FEIEvalObj &fe, LOGGING_LEVEL logging, long unsigned int seed=1234)
 Uses the loss in tightness of the function enclosure as measured by the Reimann Difference on a merge as the priority queue ordering measure and a seed for the random number generator supplied by the user.
bool priorityMergeOnLoss (size_t minLeaves, LOGGING_LEVEL logging, long unsigned int seed=1234)
 Uses the loss in tightness of the function enclosure as measured by the Reimann Difference on a merge as the priority queue ordering measure and a seed for the random number generator supplied by the user.
priorityMerge methods.

These methods takes an estimator and progressively merge using a priority queue of mergeable nodes to determine which node to merge first. The ordering for the queue is referred to here as the measure.

The default measure used is the 'area' of the the interval band on each piece of the current function estimate, where the area is the diameter of the interval image of the piece multiplied by the area of the piece (leaf) box. This is effectively the difference of the Reimann integral to the top of the interval enclosure of the box against the Reimann integral to the bottom of the interval enclosure of the box (the 'Reimann Difference'. Children of cherry node pieces with the smallest areas will be merged first.

Note that the Reimann Difference does not provide a globally optimal ordering for the queue.

Merging continues until there are at most minLeaves nodes, or there are no more mergeable nodes.

If more than one mergeable node is equally 'small', on the basis of the measure used, then a random choice is made between all equally small nodes to find the node which will be merged.

The seed for the random number generator used for random selection between equally 'small' nodes can be specified by the user or set by this. If you are looking at distributions of results across multiple estimates, supply the random number generator seed to the priority queue to ensure that each estimator will make different random choices each time; use of the internally set seed will give the same results each time.

Throws a NullSubpavings_Error if the subpaving that this manages is a NULL pointer.

Throws an std::logic_error if the state of this is not 'legal', ie if this contains cherries that do not pass isSplittableNode().

Aborts if there are no mergeable cherries left.

Parameters:
measureis an instance of a class providing a function for comparing spsnodes, to order the nodes to prioitise merging.
minLeavesis the number of leaves to aim for.
loggingan enum controlling whether estimator creation output is sent to a log file.
seedis a seed for the random number generator.
Returns:
true if the priority merge was successful, false otherwise (returns false if the merge could not be started , ie there were no cherries in the initial state, or if the merge had to be aborted before there were minLeaves).
Precondition:
hasSubPaving() == true.
The state of this is legal ie this does not include cherries that should not have been split according to isSplittableNode().
Postcondition:
if the method returned true, then this has minLeaves leaves; if the method returned false then merging had to be aborted before the number of leaves was reduced to minLeaves.
bool priorityMerge (const IntervalMappedSPnode::Measurer &measure, const FEIEvalObj &fe, LOGGING_LEVEL logging, long unsigned int seed=1234)
 Measure for ordering of priority queue supplied by the user and a seed for the random number generator supplied by the user.
bool priorityMerge (const FEIEvalObj &fe, LOGGING_LEVEL logging, long unsigned int seed=1234)
 Uses the Reimann Difference of the cherries as the priority queue ordering measure and a seed for the random number generator supplied by the user.
bool priorityMerge (size_t minLeaves, LOGGING_LEVEL logging, long unsigned int seed=1234)
 Uses the Reimann Difference of the cherries as the priority queue ordering measure and a seed for the random number generator supplied by the user.
bool splitToShape (std::string instruction)
 Split an estimator to a specified shape.
cxsc::real getTotalAreaOfIntervalBand () const
 Get the total "area" of the interval bands estimating the function represented by this.
cxsc::real getMaximumAreaOfIntervalBand () const
 Get the maximum "area" of the interval band over all the leaves of the subpaving managed by this.
cxsc::real getMaximumIntervalTolerance () const
 Get the maximum interval tolerance over all the leaves of the subpaving managed by this.
std::ostream & outputToStreamTabs (std::ostream &os, int prec=5) const
 Output the subpaving managed by this to a given stream.
void outputToTxtTabs (const std::string &s, int prec=5) const
 Output the subpaving managed by this to a txt file.
void outputToTxtTabs (const std::string &s, int prec, bool confirm) const
void outputRootToTxt (const std::string &s, int prec=5) const
 Output details of full sample (from root) to txt file.
void outputRootToTxt (const std::string &s, int prec, bool confirm) const

Detailed Description

A wrapper or manager for an IntervalMappedSPnode tree to be used for function estimation using interval enclosures over subpaving boxes.

The box of the root of the subpaving managed by a FunctionEstimatorInterval can be thought of as the domain and the boxes of the leaves of the subpaving can be thought of as the pieces in the partition of that domain for the current function estimate. The interval on each piece is an interval enclosure of the function to be estimated over that piece.

Todo:
should this class? also have the ability to simulate in the mrs way, ie squeeze etc: only this one knows about the top of the interval or the fobj target. But we could also use a piecewise constant function?

Constructor & Destructor Documentation

FunctionEstimatorInterval::FunctionEstimatorInterval ( const ivector &  v,
const MappedFobj f,
int  lab = 0 
)

Initialised constructor.

Initialised with domain box.

Throws a MalconstructedBox_Error if the box is not suitable as the basis of a subpaving (eg, box has no dimensions, or the box has a thin interval on at least one dimension).

Ideal constructor when the support domain of the function is set a priori.

Parameters:
vThe box to use for the subpaving to be managed.
fThe function to be estimated by this.
labThe label for this (defaults to 0).
Postcondition:
The estimator constructed has subpaving that consists of single leaf node (the root) with a box like v) and the range on that single node is the the interval image of v under the function represented by f.
         : rootPaving(NULL), fobj(f), label(lab)
{
    try {
        // check the box here
        if (!checkBox(v)) {
      throw subpavings::MalconstructedBox_Error(
      "FunctionEstimatorInterval::FunctionEstimatorInterval(const ivector&, const MappedFobj&, int lab)");
    }
        rootPaving = new IntervalMappedSPnode(v);
    
    IntervalEstimator estimator(fobj);
    rootPaving->acceptSPValueVisitor(estimator);
  }
    catch (exception const& e) {
    constructor_error_handler();
    }
}
FunctionEstimatorInterval::FunctionEstimatorInterval ( const SPnode spn,
const MappedFobj f,
int  lab = 0 
)

Initialised constructor.

Initialised with a subpaving.

Parameters:
spnA subpaving to copy as the subpaving to be managed.
fThe function to be estimated by this.
labThe label for this (defaults to 0).
Precondition:
spn has a box.
Postcondition:
The estimator constructed has label lab, a subpaving that that is a copy of spn and the range on each box in that subpaving is the interval image of that box under the function represented by f.
         : rootPaving(NULL), fobj(f), label(lab)
{
    try {
        // check spn has box
    if (spn.isEmpty()) {
      throw subpavings::NoBox_Error(
      "FunctionEstimatorInterval::FunctionEstimatorInterval(const SPnode&, MappedFobj&, int lab");
    }
        rootPaving = new IntervalMappedSPnode(spn);
    
    IntervalEstimator estimator(fobj);
    rootPaving->acceptSPValueVisitor(estimator);
  }
    catch (exception const& e) {
    constructor_error_handler();
    }
}

Member Function Documentation

void FunctionEstimatorInterval::breadthFirstBruteForceEstimate ( const SPCheckVisitor checker,
size_t  maxLeaves,
long unsigned int  seed = 1234 
)

Create an estimate of the described by fobj using a breadth-first brute force approach with a maximum number of leaves.

The over all aim to create an estimate of the described by fobj such that each of the leaf nodes of the subpaving managed by this meets whatever requirments are specified in checker. If this is not possible given maxLeaves, an estimate with at most maxLeaves leaves is created which is approximately as close as possible to the requirments of checker. It is only approximately as close as possible because although the method works across the current leaves of the same depth or level first before descending (ie level by level), if the number of leaves reaches maxLeaves mid-way through a 'level' the process will stop, leaving the remainder of the leaves of that level unvisited.

After the operation, the subpaving managed by this may have leaf nodes which do not meet the requirements specified in checker because the maximum number of leaves was was reached before those nodes were split.

This operation works breadth first, ie working from a leaf root it will successively split the root then one of the new children and then the other, and only then the children of the children. The child split first is randomly selected to avoid a strict left to right (or right to left) traversal of each level.

Note:
If this operation is applied to this when the subpaving managed by this already has leaf nodes which exceed the requirements of checker, those leaf nodes will be left unaltered by this operation.
Parameters:
checkeran instance of type SPCheckVisitor which determines if a leaf node needs to be split further.
maxLeavesthe maximum number of leaves the function estimate should have at the end of the process.
seeda seed for the random number generator used to determine which of a node's children is revisited first.
Precondition:
hasSubPaving() == true.
The state of this is legal ie this does not include cherries that should not have been split according to isSplittableNode().
Postcondition:
the leaf nodes of the subpaving managed by this have a range which is the interval image of the box of the node under the function represented by this and either the estimate has maxLeaves leaves and the requirements specified by checker are not met for all leaf nodes or the estimate has less than or equal to maxLeaves leaves and the requirements specified by checker are met for all leaf nodes.
{
  string errorMsg(
    "FunctionEstimatorInterval::breadthFirstBruteForceEstimate(...)");
  if (!hasSubPaving()) {
    throw NullSubpavingPointer_Error(errorMsg);
  }

  gsl_rng * r = NULL;
  
  try {
    
    NodeQueueT nq;
    
    _setupBreadthFirstQueue(nq, nodeChecker);
          
    r = gsl_rng_alloc (gsl_rng_mt19937);
          
    gsl_rng_set(r, seed);
    
    // actual number of leaves in subpaving
    size_t numLeaves = getRootLeaves();
      
    #ifdef MYDEBUGBFMIN
      clock_t start = clock();
    #endif
  
    IntervalEstimator estimator(fobj);
    
    /* split until there are no more leaves needing splitting 
    or there are enough leaves */
    while (!nq.empty() && numLeaves < maxLeaves) {
      
      _breadthFirstQueueLoop(
          nodeChecker, nq, estimator, r);
      
      /* actual number of leaves has increased by one 
      irrespective of whether any have actually gone into the queue*/
      numLeaves++;
      
      #ifdef MYDEBUGBFMIN
        if (numLeaves%10000 == 0) {
          double time = static_cast<double>(clock()-start)/CLOCKS_PER_SEC;
          cout << numLeaves << "\t" << time << "\t" << getTotalAreaOfIntervalBand() << endl;
          start = clock();
        }
      #endif
      #ifdef MYDEBUGBF
          cout << "current estimator is " << endl;
          outputToStreamTabs(cout, 10);
          cout << "getTotalAreaOfIntervalBand() = " << (getTotalAreaOfIntervalBand()) << "\n\n" << endl;
      #endif
    }// loop
    
    try {
      if (NULL != r) {
        gsl_rng_free(r);
        r = NULL;
      } 
      // free the random number generator
    }
    catch (exception const& ee) {} // catch and swallow
  
    #ifdef MYDEBUGBF
      cout << "\nEnd of queue " << endl;
      if (nq.empty()) cout << "ran out of nodes to expand" << endl;
      else cout << "number of leaves is " << getRootLeaves() << endl;

    #endif
  }
  
    catch (exception const& e) {
        try {
      if (NULL != r) {
        gsl_rng_free(r);
        r = NULL;
      } 
      // free the random number generator
    }
    catch (exception const& ee) {} // catch and swallow
    throw; // rethrow original exception
    }

}
void FunctionEstimatorInterval::bruteForceEstimate ( cxsc::real  tolerance)

Create an estimate of the function described by fobj using a depth-first brute force approach.

The over all aim to create an estimate of the described by fobj such that each of the leaf nodes of the subpaving managed by this has a box associated with it that meets the interval image tolerance requirement specified by tolerance.

The interval image tolerance requirement is that the diameter of the interval image of the box associated with any leaf node of of the subpaving managed by this should be less than or equal to the tolerance.

But in some cases this can result in boxes being split beyond the limits of the real number screen. If a split of a leaf node that does not meet the interval image tolerance requirement would result in the child nodes' volumes being < the value of the cxsc::MinReal (the smallest representable real number) then that leaf will not be split.

This means that after the operation, the subpaving managed by this may have leaf nodes where the interval image tolerance requirement is not met.

This operation works depth first, ie working from a leaf root it will successively split the root then the root's left child and that child's, left child, etc. It will only start splitting the right child of any node already split when all the descendents of the left child meet the interval image tolerance requirement (or they cannot be split further). This may cause the machine to run out of memory before the process is completed.

Note:
If this operation is applied to this when the subpaving managed by this already has leaf nodes which exceed the interval image tolerance requirement, those leaf nodes will be left unaltered by this operation.
Parameters:
tolerancedescribes the tolerance to be used in making the estimate.
Precondition:
hasSubPaving() == true.
tol >= cxsc::MinReal.
Postcondition:
the leaf nodes of the subpaving managed by this have a range which is the interval image of the box of the node under the function represented by this and either the interval image tolerance requirement is met for each leaf node or that leaf node is not splittable.
{
  string errorMsg(
    "FunctionEstimatorInterval::bruteForceEstimate(cxsc::real)");
  if (!hasSubPaving()) {
    throw NullSubpavingPointer_Error(errorMsg);
  }
  
  IntervalExpanderEstimator estimator(fobj, tolerance);

  getSubPaving()->acceptSPExpandVisitor(estimator);

}

get the dimensions of the subpaving this manages.

Returns:
0 if this does not have a subpaving, else returns the dimensions of the subpaving.
{
  int retValue = 0;
  if (hasSubPaving()) {
    retValue = getSubPaving()->getDimension();
  }
  return retValue;
}

get volume of the root box of the subpaving this manages.

Returns:
volume of the root box of the subpaving this manages, or 0.0 if this has no subpaving.
{
  real retValue(0.0);
  if (hasSubPaving()) {
    retValue = getSubPaving()->nodeRealVolume();
  }
  return retValue;
}

Get the reference to the function object used by this.

Returns:
the function object reference used by this.
{return fobj;}

Get the label.

Returns:
the label for this.
{return label;}

Get a vector of the leaf node levels.

Root is level 0, next level down is 1, etc.

Returns:
a vector of leaf levels, left to right order,
{
    IntVec levels; // empty container

    if (hasSubPaving()) {
        getSubPaving()->getLeafNodeLevels(levels, 0);
        //levels has now been filled in
    }
    return levels;
}

Get a string of the leaf node levels.

Root is level 0, next level down is 1, etc. Example return string "3,3,2,1"

Returns:
a comma separated string of leaf levels, left to right order
{
    string retValue = "";
    if (hasSubPaving())
        retValue = getSubPaving()->getLeafNodeLevelsString();

    return retValue;
}

Get the maximum "area" of the interval band over all the leaves of the subpaving managed by this.

The "area" of the interval band on a leaf is the diameter of the interval on the leaf multiplied by the volume of the box represented by the leaf.

Returns:
maximum area of the interval band over all the leaves of the subpaving managed by this.
Precondition:
This must have a subpaving to manage.
{
  if (!hasSubPaving()) {
    throw NullSubpavingPointer_Error(
    "FunctionEstimatorInterval::getMaximumAreaOfIntervalBand()");
  }
  
  IntervalMappedSPnode::ConstPtrs leaves;
  getSubPaving()->getConstLeaves(leaves);
  
  cxsc::real maxArea(0.0);
  
  for (IntervalMappedSPnode::ConstPtrsItr it = leaves.begin();
      it < leaves.end();
      ++it) {
        
    cxsc::real thisArea = (*it)->getIntervalAreaOnBox();
    if (thisArea > maxArea) maxArea = thisArea;
        
  }
  return maxArea;
  
}

Get the maximum interval tolerance over all the leaves of the subpaving managed by this.

For any leaf, the interval tolerance is the maximum distance between the ends of the interval on the leaf and the image of the midpoint of the leaf's box under the function being estimated by this.

Returns:
maximum interval tolerance over all the leaves of the subpaving managed by this.
Precondition:
This must have a subpaving to manage.
{
  if (!hasSubPaving()) {
    throw NullSubpavingPointer_Error(
    "FunctionEstimatorInterval::getMaximumIntervalTolerance()");
  }
  
  IntervalMappedSPnode::ConstPtrs leaves;
  getSubPaving()->getConstLeaves(leaves);
  
  cxsc::real maxTol(0.0);
  
  for (IntervalMappedSPnode::ConstPtrsItr it = leaves.begin();
      it < leaves.end();
      ++it) {
        
    interval thisRange = (*it)->getRange();
    real thisMidImage = fobj.imageMid((*it)->getBox());
    
    cxsc::real thisTol 
        = cxsc::max(cxsc::abs(Sup(thisRange) - thisMidImage),
          cxsc::abs(thisMidImage - Inf(thisRange)));
    if (thisTol > maxTol) maxTol = thisTol;
        
  }
  return maxTol;
  
}
cxsc::ivector FunctionEstimatorInterval::getRootBox ( ) const

Get the box of the subpaving managed by this.

Note:
with the present constructors, it is impossible for this to have a subpaving but for the subpaving to have no box.
Returns:
copy of the box of the subpaving managed by this.
Precondition:
hasSubPaving() == true.
{
  if (!hasSubPaving()) {
    throw NullSubpavingPointer_Error(
          "FunctionEstimatorInterval::getRootBox()");
  }
  return getSubPaving()->getBox();
}

Gets number of leaf nodes in the root paving.

Throws NullSubpavingPointer_Error is the subpaving that this manages is a NULL pointer.

Returns:
the total number of leaves in the subpaving managed.
Precondition:
hasSubPaving() == true.
{
  if (!hasSubPaving()) {
    throw NullSubpavingPointer_Error("FunctionEstimatorInterval::getRootLeaves()");
    
  }
  return getSubPaving()->getNumberLeaves();
}

Get diameter of the interval image of the root box of the subpaving managed by this.

Returns:
copy of the box of the subpaving managed by this.
Precondition:
hasSubPaving() == true.
{
  if (!hasSubPaving()) {
    throw NullSubpavingPointer_Error(
          "FunctionEstimatorInterval::getRootRangeDiameter()");
  }
  return getSubPaving()->getRangeDiameter();
}

Get the total "area" of the interval bands estimating the function represented by this.

The "area" of the interval bands is calculated as the sum over all the leaves of the subpaving managed by this of the diameter of the interval on the leaf multiplied by the volume of the box represented by the leaf.

Returns:
total area of the interval band estimating the function.
Precondition:
This must have a subpaving to manage.
{
  if (!hasSubPaving()) {
    throw NullSubpavingPointer_Error(
    "FunctionEstimatorInterval::getTotalAreaOfIntervalBand()");
  }
  return getSubPaving()->getTotalLeafIntervalAreaOnBox();
}

Get whether this has a subpaving to manage.

Note:
with the present constructors, it is impossible for this to have a subpaving but for the subpaving to have no box.
Returns:
true if this has a subpaving to manage. false otherwise.
{
    return ( getSubPaving() != NULL );
}

Tighten up the range enclosures of the estimate.

Precondition:
This must have a subpaving to manage.
Postcondition:
The interval range on any non-leaf node of the subpaving managed by this will be the interval hull of the ranges of its children.
{
  if (!hasSubPaving()) {
    throw NullSubpavingPointer_Error(
    "FunctionEstimatorInterval::hullPropagation(...)");
  }
  getSubPaving()->hullPropagation();
}

Make a PiecewiseConstantFunction out of this estimator.

Returns:
A PiecewiseConstantFunction with the same subpaving as this with. The function values mapped onto each node of the for the returned object are the real mid-images under the function represented by this of the box associated with the node. The PiecewiseConstantFunction created will have the same label as this.
{
  /* make a rmspnode as a copy of this's subpaving and then put the 
  ranges on as the mid-image of the boxes */
  RealMappedSPnode tmp(*getSubPaving());
  
  RealEstimator estimator(fobj);
  tmp.acceptSPValueVisitor(estimator);
  return PiecewiseConstantFunction(tmp, getLabel());
}
void FunctionEstimatorInterval::outputLog ( const std::string &  s,
int  i,
int  prec = 5 
) const

Append current state of estimator to a txt log file.

Format is a tab-delimited file of numeric data. Output includes node contributions to unscaled EMP under COPERR and AIC and the changes in EMP that would result from splitting the node.

Parameters:
sthe name of the txt file to send output to.
ithe number of pass (ie, 0, 1, 2, 3 etc) in process.
precthe precision for output formatting. ie, number of decimal places.
{
    // To add output of the FunctionEstimatorInterval object to file
    ofstream os(s.c_str(), ios::app);         // append
    if (os.is_open()) {
    
    os << std::endl;
        os << "Pass " << i << std::endl; // numbering
        getSubPaving()->leavesOutputTabs(os, prec); // the output
    
    os.close();
    }
    else {
        std::cerr << "Error: could not open file named "
            << s << std::endl << std::endl;
    }
}
std::ostream & FunctionEstimatorInterval::outputRootToStreamTabs ( std::ostream &  os,
int  prec = 5 
) const

Output all nodes of the subpaving managed by this to a given stream.

Format is a tab-delimited data giving details of all nodes.

Parameters:
osis a reference to the stream to output to.
precthe precision for output formatting. ie, number of decimal places, defaulting to 5.
Returns:
a reference to the given stream.
{
  if (hasSubPaving()) {
    
    getSubPaving()->nodesAllOutput(os, 1, prec); // the output
    
  }
  
    return os;
}
void FunctionEstimatorInterval::outputRootToTxt ( const std::string &  s,
int  prec = 5 
) const

Output details of full sample (from root) to txt file.

Format is a mixture of alpha and numeric data.

Parameters:
sthe name of the txt file to send output to.
precthe precision for output formatting. ie, number of decimal places.
confirmis a boolean controlling whether confirmation goes to console output.
{
  outputRootToTxt(s, prec, false);
}
std::ostream & FunctionEstimatorInterval::outputToStreamTabs ( std::ostream &  os,
int  prec = 5 
) const

Output the subpaving managed by this to a given stream.

Format is a tab-delimited data giving details of leaf nodes.

Parameters:
osis a reference to the stream to output the histogramm to.
precthe precision for output formatting. ie, number of decimal places.
Returns:
a reference to the given stream.
{
  if (hasSubPaving()) {
    
    // have to use cxsc io manipulators
    os << cxsc::SaveOpt;
    os << cxsc::Variable << cxsc::SetPrecision(prec+2,prec);

    getSubPaving()->leavesOutputTabs(os); // the output
    
    os << cxsc::RestoreOpt;
  }
  
    return os;
}
void FunctionEstimatorInterval::outputToTxtTabs ( const std::string &  s,
int  prec = 5 
) const

Output the subpaving managed by this to a txt file.

Format is a tab-delimited file of numeric data starting with nodeName, then the node box volume, then the node counter, then the description of the node box as a tab-delimited list of interval upper and lower bounds.

Parameters:
sthe name of the txt file to send output to.
precthe precision for output formatting. ie, number of decimal places.
confirmis a boolean controlling whether confirmation goes to console output.
{
  outputToTxtTabs(s, prec, false);
}
bool FunctionEstimatorInterval::prioritySplitOnGain ( const FEIEvalObj fe,
LOGGING_LEVEL  logging,
long unsigned int  seed = 1234 
)

prioritySplitOnGain.

These methods takes an estimator and progressively split using a priority queue of splittable nodes to determine which node to split first. The ordering for the queue is referred to here as the measure.

The measure used is the reduction in the 'area' of the the interval band on each piece of the current function estimate resulting from a division of the piece into two halves. The area of a piece is the diameter of the interval image of the piece multiplied by the area of the piece (leaf) box. The reduction of this area resulting from a split of a piece is (area of band for current piece) - (sum of areas of bands for the two halves of the piece that would result from dividing that piece in half). Pieces with the largest 'gain' measured in this way will be split first. Note that this does not provide a globally optimal ordering for the queue: this ordering only looks one step ahead and does not take account of large potential gains that might result from further bisections of the current pieces even if the first bisection does not provide a large immediate gain. In particular, if the function over some piece is completely symmetrical in two consecutive dimensions then the potential gain on bisection on the first of these will be calculated as 0 (because the interval enclosures of both halves of the current piece will be the same as the interval enclosure of the current piece itself). The bivariate Gaussian on a symmetric domain box provides a good example of this completely non-globally optimal behaviour.

Splitting continues until some criteria specified by the function evaluator fe (applying either to individual nodes or to the estimator as a whole) is satisfied, or there are no more splittable nodes.

Each node in the subpaving managed by this decides for itself whether it is splittable, using isSplittableNode().

If more than one splittable node is equally 'large', on the basis of the measure used, then a random choice is made between all equally large nodes to find the node which will be split.

The seed for the random number generator used for random selection between equally 'large' nodes can be specified by the user or set by this. If you are looking at distributions of results across multiple histograms, supply the random number generator seed to the priority queue to ensure that each estimator will make different random choices each time: use of the internally set seed will give the same results each time.

Throws a NullSubpavings_Error if the subpaving that this manages is a NULL pointer.

Throws an std::logic_error if the state of this does not 'legal', ie if this contains cherries that do not pass isSplittableNode().

Throws an std::logic_error if the split becomes muddled because of some failure within the logic of the algorithm itself.

Aborts if there are no splittable leaves left (or none at the start).

Parameters:
feis the function evaluator, an instance of a class which provides a criterion to determine when to stop splitting.
loggingan enum controlling whether estimator creation output is sent to a log file.
seedis a seed for the random number generator.
Returns:
true if the priority split was successful, false otherwise (returns false if the split could not be started , ie none of the leaves of the initial state is splittable, or if the split had to be aborted).
Precondition:
hasSubPaving() == true.
The state of this is legal ie this does not include cherries that should not have been split according to isSplittableNode().
Postcondition:
if the method returned true, then the state of this satisfies fe; if the method returned false then splitting had to be aborted before fe was satisfied.
{
  IntervalBandAreaGainOnSplitMeasurer measure(fobj);
  return prioritySplit(measure, fe, logging, seed);
}
bool FunctionEstimatorInterval::prioritySplitOnGain ( size_t  maxLeaves,
LOGGING_LEVEL  logging,
long unsigned int  seed = 1234 
)

prioritySplitOnGain.

These methods takes an estimator and progressively split using a priority queue of splittable nodes to determine which node to split first. The ordering for the queue is referred to here as the measure.

Note:
This overloading of the prioritySplitOnGain function is supplied because it is much more efficient to check for number of leaves directly than by using a function object.

The measure used is the reduction in the 'area' of the the interval band on each piece of the current function estimate resulting from a division of the piece into two halves. The area of a piece is the diameter of the interval image of the piece multiplied by the area of the piece (leaf) box. The reduction of this area resulting from a split of a piece is (area of band for current piece) - (sum of areas of bands for the two halves of the piece that would result from dividing that piece in half). Pieces with the largest 'gain' measured in this way will be split first. Note that this does not provide a globally optimal ordering for the queue: this ordering only looks one step ahead and does not take account of large potential gains that might result from further bisections of the current pieces even if the first bisection does not provide a large immediate gain. In particular, if the function over some piece is completely symmetrical in two consecutive dimensions then the potential gain on bisection on the first of these will be calculated as 0 (because the interval enclosures of both halves of the current piece will be the same as the interval enclosure of the current piece itself). The bivariate Gaussian on a symmetric domain box provides a good example of this completely non-globally optimal behaviour.

Splitting continues until there are maxLeaves leaves, or there are no more splittable nodes.

Each node in the subpaving managed by this decides for itself whether it is splittable, using isSplittableNode().

If more than one splittable node is equally 'large', on the basis of the measure used, then a random choice is made between all equally large nodes to find the node which will be split.

The seed for the random number generator used for random selection between equally 'large' nodes can be specified by the user or set by this. If you are looking at distributions of results across multiple histograms, supply the random number generator seed to the priority queue to ensure that each estimator will make different random choices each time: use of the internally set seed will give the same results each time.

Throws a NullSubpavings_Error if the subpaving that this manages is a NULL pointer.

Throws an std::logic_error if the state of this does not 'legal', ie if this contains cherries that do not pass isSplittableNode().

Throws an std::logic_error if the split becomes muddled because of some failure within the logic of the algorithm itself.

Aborts if there are no splittable leaves left (or none at the start).

Parameters:
maxLeavesis number of leaves to aim for in the final estimator.
loggingan enum controlling whether estimator creation output is sent to a log file.
seedis a seed for the random number generator.
Returns:
true if the priority split was successful, false otherwise (returns false if the split could not be started , ie none of the leaves of the initial state is splittable, or if the split had to be aborted).
Precondition:
hasSubPaving() == true.
The state of this is legal ie this does not include cherries that should not have been split according to isSplittableNode().
Postcondition:
if the method returned true, then this has maxLeaves leaves; if the method returned false then splitting had to be aborted before the number of leaves reached maxLeaves.
{
  IntervalBandAreaGainOnSplitMeasurer measure(fobj);
  return prioritySplit(measure, maxLeaves, logging, seed);
}
bool FunctionEstimatorInterval::splitToShape ( std::string  instruction)

Split an estimator to a specified shape.

Throws a NullSubpavings_Error if the subpaving that this manages is a NULL pointer.

Throws a NoBox_Error if the subpaving box is empty.

Prints a message to the standard error output if the instruction could not be carried out.

Parameters:
instructionspecifies the required shape, eg "3, 3, 2, 1"
Returns:
true if the split was successful, false otherwise
Precondition:
hasSubPaving() == true.
Postcondition:
this has subpaving with shape specified by instruction and the value mapped to each node in the subpaving is the interval image of that node's box under the function represented by this.
{
  
  // checks:  is there a root paving, is the string properly formed?
  if (!hasSubPaving()) {
    throw NullSubpavingPointer_Error(
        "FunctionEstimatorInterval::splitToShape()");
  }
  bool success = false;
  IntervalMappedSPnode temp(*getSubPaving()); // copy to temp
  try {
    if (instruction.length() == 0) {
      throw std::invalid_argument(
        "FunctionEstimatorInterval::splitToShape() : No instruction");
    }

    std::string legal(", 0123456789");
    if (instruction.find_first_not_of(legal) != std::string::npos) {
      throw std::invalid_argument(
        "FunctionEstimatorInterval::splitToShape() : Illegal character");
    }

    // all seems to be okay, we can start splitting the root paving
    
    success = getSubPaving()->splitRootToShape(instruction);
    
    
    /* ALSO NEED to set the interval ranges using fobj */
    IntervalEstimator estimator(fobj);
    rootPaving->acceptSPValueVisitor(estimator);

    if (!success) {
      
      handleSplitToShapeError(temp);
     }
     
  }
  catch (std::invalid_argument const& ia) {
    cerr << ia.what() << endl;
    handleSplitToShapeError(temp);
    success = false;
  }
  catch (std::logic_error const& le) {
    cerr << le.what() << endl;
    handleSplitToShapeError(temp);
    success = false;
  }
  return success;
  // any other exceptions are unhandled
}

Get a string summary of this estimator's properties.

A string description of this. Includes the address of the subpaving managed but not details of that subpaving.

Returns:
the string summary.
{
  std::ostringstream oss;
  
  oss << "This address = " << (this) << endl;
  oss << "Reference to function object is  = " << (&fobj) << endl;
  
  if (hasSubPaving()) oss << "Address of subpaving is " << getSubPaving() << endl;
  else oss << "Subpaving is NULL" << endl;
  
  return oss.str();
}

The documentation for this class was generated from the following files:
 All Classes Namespaces Functions Variables Typedefs Enumerations Friends