A wrapper or manager for an IntervalMappedSPnode tree to be used for function estimation using interval enclosures over subpaving boxes. More...
Classes | |||||||||
class | IntervalBandAreaGainOnSplitMeasurer | ||||||||
class | IntervalBandAreaLossOnMergeMeasurer | ||||||||
class | NodePtrMeasurePair | ||||||||
Public Member Functions | |||||||||
FunctionEstimatorInterval (const ivector &v, const MappedFobj &f, int lab=0) | |||||||||
Initialised constructor. | |||||||||
FunctionEstimatorInterval (const SPnode &spn, const MappedFobj &f, int lab=0) | |||||||||
Initialised constructor. | |||||||||
FunctionEstimatorInterval (const FunctionEstimatorInterval &other) | |||||||||
Copy constructor. | |||||||||
~FunctionEstimatorInterval () | |||||||||
Destructor. | |||||||||
const MappedFobj & | getFobjReference () const | ||||||||
Get the reference to the function object used by this. | |||||||||
int | getLabel () const | ||||||||
Get the label. | |||||||||
void | setLabel (int lab) | ||||||||
Set the label. | |||||||||
bool | hasSubPaving () const | ||||||||
Get whether this has a subpaving to manage. | |||||||||
cxsc::ivector | getRootBox () const | ||||||||
Get the box of the subpaving managed by this. | |||||||||
cxsc::real | getRootRangeDiameter () const | ||||||||
Get diameter of the interval image of the root box of the subpaving managed by this. | |||||||||
int | getDimensions () const | ||||||||
get the dimensions of the subpaving this manages. | |||||||||
cxsc::real | getDomainVolume () const | ||||||||
get volume of the root box of the subpaving this manages. | |||||||||
size_t | getRootLeaves () const | ||||||||
Gets number of leaf nodes in the root paving. | |||||||||
IntVec | getLeafLevels () const | ||||||||
std::string | getLeafLevelsString () const | ||||||||
subpavings::PiecewiseConstantFunction | makePiecewiseConstantFunction () const | ||||||||
Make a PiecewiseConstantFunction out of this estimator. | |||||||||
void | bruteForceEstimate (cxsc::real tolerance) | ||||||||
Create an estimate of the function described by fobj using a depth-first brute force approach. | |||||||||
void | breadthFirstBruteForceEstimate (const SPCheckVisitor &checker, size_t maxLeaves, long unsigned int seed=1234) | ||||||||
Create an estimate of the described by fobj using a breadth-first brute force approach with a maximum number of leaves. | |||||||||
void | breadthFirstBruteForceEstimate (const SPCheckVisitor &nodeChecker, const FEIEvalObj &fe, long unsigned int seed=1234) | ||||||||
void | breadthFirstBruteForceEstimate (size_t maxLeaves, long unsigned int seed=1234) | ||||||||
void | breadthFirstBruteForceEstimate (const FEIEvalObj &fe, long unsigned int seed=1234) | ||||||||
void | hullPropagation () | ||||||||
Tighten up the range enclosures of the estimate. | |||||||||
bool | prioritySplitOnGain (const FEIEvalObj &fe, LOGGING_LEVEL logging, long unsigned int seed=1234) | ||||||||
prioritySplitOnGain. | |||||||||
bool | prioritySplitOnGain (size_t maxLeaves, LOGGING_LEVEL logging, long unsigned int seed=1234) | ||||||||
prioritySplitOnGain. | |||||||||
bool | priorityMerge (const IntervalMappedSPnode::Measurer &measure, size_t minLeaves, LOGGING_LEVEL logging, long unsigned int seed=1234) | ||||||||
Measure for ordering of priority queue supplied by the user and a seed for the random number generator supplied by the user. | |||||||||
std::ostream & | outputRootToStreamTabs (std::ostream &os, int prec=5) const | ||||||||
Output all nodes of the subpaving managed by this to a given stream. | |||||||||
void | outputLog (const std::string &s, int i, int prec=5) const | ||||||||
Append current state of estimator to a txt log file. | |||||||||
std::string | stringSummary () const | ||||||||
Get a string summary of this estimator's properties. | |||||||||
prioritySplit methods. | |||||||||
These methods takes an estimator and progressively split using a priority queue of splittable nodes to determine which node to split first. The ordering for the queue is referred to here as the measure.
The default measure used is the 'area' of the the interval band on each piece of the current function estimate, where the area is the diameter of the interval image of the piece multiplied by the area of the piece (leaf) box. This is effectively the difference of the Reimann integral to the top of the interval enclosure of the box against the Reimann integral to the bottom of the interval enclosure of the box (the 'Reimann Difference'. Pieces with the largest areas will be split first. Note that the Reimann Difference does not provide a globally optimal ordering for the queue in the sense that it will not necessarily provide the tightest function enclosure for a given number of leaves. This measure may tend to prioritise the bisection of pieces with large volumes but small interval enclosures (ie large pieces where the function is almost flat) over pieces where the Reimann Difference is smaller but the function is sufficiently variable within the piece for there to be the potential for considerable tightening of the enclosure by bisection. Splitting continues until there are maxLeaves leaves, or there are no more splittable nodes. Each node in the subpaving managed by this decides for itself whether it is splittable, using isSplittableNode(). If more than one splittable node is equally 'large', on the basis of the measure used, then a random choice is made between all equally large nodes to find the node which will be split. The seed for the random number generator used for random selection between equally 'large' nodes can be specified by the user or set by this. If you are looking at distributions of results across multiple estimates, supply the random number generator seed to the priority queue to ensure that each estimator will make different random choices each time; use of the internally set seed will give the same results each time. Throws a NullSubpavings_Error if the subpaving that this manages is a NULL pointer. Throws an std::logic_error if the state of this is not 'legal', ie if this contains cherries that do not pass isSplittableNode(). Throws an std::logic_error if the split becomes muddled because of some failure within the logic of the algorithm itself. Aborts if there are no splittable leaves left (or none at the start).
| |||||||||
bool | prioritySplit (const IntervalMappedSPnode::Measurer &measure, const FEIEvalObj &fe, LOGGING_LEVEL logging, long unsigned int seed=1234) | ||||||||
Measure for ordering of priority queue supplied by the user, and seed for the random number generator supplied by the user. | |||||||||
bool | prioritySplit (const FEIEvalObj &fe, LOGGING_LEVEL logging, long unsigned int seed=1234) | ||||||||
Uses the default measure (the Reimann difference for each piece of the function estimate) and a seed for the random number generator supplied by the user. | |||||||||
bool | prioritySplit (const IntervalMappedSPnode::Measurer &measure, size_t maxLeaves, LOGGING_LEVEL logging, long unsigned int seed=1234) | ||||||||
Version where measure is supplied by the user. | |||||||||
bool | prioritySplit (size_t maxLeaves, LOGGING_LEVEL logging, long unsigned int seed=1234) | ||||||||
Version using the default measure, the 'Reimann Difference'. | |||||||||
priorityMergeOnLoss methods. | |||||||||
These methods takes an estimator and progressively merge using a priority queue of mergeable nodes to determine which node to merge first. The ordering for the queue is referred to here as the measure. The measure used is the increase in the 'area' of the the interval band on each cherry piece of the current function estimate resulting from a merge of two halves of the piece. This is seen as a loss in the sense of being a loss in the tightness of the function enclosure, or an increase in the Reimann Difference. The area of a piece is the diameter of the interval image of the piece multiplied by the area of the piece (leaf) box. The loss of tightness resulting from a merge of two halves of a piece is (area of band for merged piece) - (sum of areas of bands for the two halves of the piece). Pieces with the smallest 'loss' measured in this way will be merged first. Note that this does not provide a globally optimal ordering for the queue. Merging continues until some criteria specified by the function evaluator fe (applying either to individual nodes or to the estimator as a whole) is satisfied, or there are no more mergeable nodes. If more than one mergeable node is equally 'small', on the basis of the measure used, then a random choice is made between all equally small nodes to find the node which will be merged. The seed for the random number generator used for random selection between equally 'small' nodes can be specified by the user or set by this. If you are looking at distributions of results across multiple estimates, supply the random number generator seed to the priority queue to ensure that each estimator will make different random choices each time; use of the internally set seed will give the same results each time. Throws a NullSubpavings_Error if the subpaving that this manages is a NULL pointer. Throws an std::logic_error if the state of this is not 'legal', ie if this contains cherries that do not pass isSplittableNode(). Aborts if there are no mergeable cherries left.
| |||||||||
bool | priorityMergeOnLoss (const FEIEvalObj &fe, LOGGING_LEVEL logging, long unsigned int seed=1234) | ||||||||
Uses the loss in tightness of the function enclosure as measured by the Reimann Difference on a merge as the priority queue ordering measure and a seed for the random number generator supplied by the user. | |||||||||
bool | priorityMergeOnLoss (size_t minLeaves, LOGGING_LEVEL logging, long unsigned int seed=1234) | ||||||||
Uses the loss in tightness of the function enclosure as measured by the Reimann Difference on a merge as the priority queue ordering measure and a seed for the random number generator supplied by the user. | |||||||||
priorityMerge methods. | |||||||||
These methods takes an estimator and progressively merge using a priority queue of mergeable nodes to determine which node to merge first. The ordering for the queue is referred to here as the measure. The default measure used is the 'area' of the the interval band on each piece of the current function estimate, where the area is the diameter of the interval image of the piece multiplied by the area of the piece (leaf) box. This is effectively the difference of the Reimann integral to the top of the interval enclosure of the box against the Reimann integral to the bottom of the interval enclosure of the box (the 'Reimann Difference'. Children of cherry node pieces with the smallest areas will be merged first. Note that the Reimann Difference does not provide a globally optimal ordering for the queue. Merging continues until there are at most minLeaves nodes, or there are no more mergeable nodes. If more than one mergeable node is equally 'small', on the basis of the measure used, then a random choice is made between all equally small nodes to find the node which will be merged. The seed for the random number generator used for random selection between equally 'small' nodes can be specified by the user or set by this. If you are looking at distributions of results across multiple estimates, supply the random number generator seed to the priority queue to ensure that each estimator will make different random choices each time; use of the internally set seed will give the same results each time. Throws a NullSubpavings_Error if the subpaving that this manages is a NULL pointer. Throws an std::logic_error if the state of this is not 'legal', ie if this contains cherries that do not pass isSplittableNode(). Aborts if there are no mergeable cherries left.
| |||||||||
bool | priorityMerge (const IntervalMappedSPnode::Measurer &measure, const FEIEvalObj &fe, LOGGING_LEVEL logging, long unsigned int seed=1234) | ||||||||
Measure for ordering of priority queue supplied by the user and a seed for the random number generator supplied by the user. | |||||||||
bool | priorityMerge (const FEIEvalObj &fe, LOGGING_LEVEL logging, long unsigned int seed=1234) | ||||||||
Uses the Reimann Difference of the cherries as the priority queue ordering measure and a seed for the random number generator supplied by the user. | |||||||||
bool | priorityMerge (size_t minLeaves, LOGGING_LEVEL logging, long unsigned int seed=1234) | ||||||||
Uses the Reimann Difference of the cherries as the priority queue ordering measure and a seed for the random number generator supplied by the user. | |||||||||
bool | splitToShape (std::string instruction) | ||||||||
Split an estimator to a specified shape. | |||||||||
cxsc::real | getTotalAreaOfIntervalBand () const | ||||||||
Get the total "area" of the interval bands estimating the function represented by this. | |||||||||
cxsc::real | getMaximumAreaOfIntervalBand () const | ||||||||
Get the maximum "area" of the interval band over all the leaves of the subpaving managed by this. | |||||||||
cxsc::real | getMaximumIntervalTolerance () const | ||||||||
Get the maximum interval tolerance over all the leaves of the subpaving managed by this. | |||||||||
std::ostream & | outputToStreamTabs (std::ostream &os, int prec=5) const | ||||||||
Output the subpaving managed by this to a given stream. | |||||||||
void | outputToTxtTabs (const std::string &s, int prec=5) const | ||||||||
Output the subpaving managed by this to a txt file. | |||||||||
void | outputToTxtTabs (const std::string &s, int prec, bool confirm) const | ||||||||
void | outputRootToTxt (const std::string &s, int prec=5) const | ||||||||
Output details of full sample (from root) to txt file. | |||||||||
void | outputRootToTxt (const std::string &s, int prec, bool confirm) const |
A wrapper or manager for an IntervalMappedSPnode tree to be used for function estimation using interval enclosures over subpaving boxes.
The box of the root of the subpaving managed by a FunctionEstimatorInterval can be thought of as the domain and the boxes of the leaves of the subpaving can be thought of as the pieces in the partition of that domain for the current function estimate. The interval on each piece is an interval enclosure of the function to be estimated over that piece.
FunctionEstimatorInterval::FunctionEstimatorInterval | ( | const ivector & | v, |
const MappedFobj & | f, | ||
int | lab = 0 |
||
) |
Initialised constructor.
Initialised with domain box.
Throws a MalconstructedBox_Error if the box is not suitable as the basis of a subpaving (eg, box has no dimensions, or the box has a thin interval on at least one dimension).
Ideal constructor when the support domain of the function is set a priori.
v | The box to use for the subpaving to be managed. |
f | The function to be estimated by this. |
lab | The label for this (defaults to 0). |
: rootPaving(NULL), fobj(f), label(lab) { try { // check the box here if (!checkBox(v)) { throw subpavings::MalconstructedBox_Error( "FunctionEstimatorInterval::FunctionEstimatorInterval(const ivector&, const MappedFobj&, int lab)"); } rootPaving = new IntervalMappedSPnode(v); IntervalEstimator estimator(fobj); rootPaving->acceptSPValueVisitor(estimator); } catch (exception const& e) { constructor_error_handler(); } }
FunctionEstimatorInterval::FunctionEstimatorInterval | ( | const SPnode & | spn, |
const MappedFobj & | f, | ||
int | lab = 0 |
||
) |
Initialised constructor.
Initialised with a subpaving.
spn | A subpaving to copy as the subpaving to be managed. |
f | The function to be estimated by this. |
lab | The label for this (defaults to 0). |
: rootPaving(NULL), fobj(f), label(lab) { try { // check spn has box if (spn.isEmpty()) { throw subpavings::NoBox_Error( "FunctionEstimatorInterval::FunctionEstimatorInterval(const SPnode&, MappedFobj&, int lab"); } rootPaving = new IntervalMappedSPnode(spn); IntervalEstimator estimator(fobj); rootPaving->acceptSPValueVisitor(estimator); } catch (exception const& e) { constructor_error_handler(); } }
void FunctionEstimatorInterval::breadthFirstBruteForceEstimate | ( | const SPCheckVisitor & | checker, |
size_t | maxLeaves, | ||
long unsigned int | seed = 1234 |
||
) |
Create an estimate of the described by fobj using a breadth-first brute force approach with a maximum number of leaves.
The over all aim to create an estimate of the described by fobj such that each of the leaf nodes of the subpaving managed by this meets whatever requirments are specified in checker. If this is not possible given maxLeaves, an estimate with at most maxLeaves leaves is created which is approximately as close as possible to the requirments of checker. It is only approximately as close as possible because although the method works across the current leaves of the same depth or level first before descending (ie level by level), if the number of leaves reaches maxLeaves mid-way through a 'level' the process will stop, leaving the remainder of the leaves of that level unvisited.
After the operation, the subpaving managed by this may have leaf nodes which do not meet the requirements specified in checker because the maximum number of leaves was was reached before those nodes were split.
This operation works breadth first, ie working from a leaf root it will successively split the root then one of the new children and then the other, and only then the children of the children. The child split first is randomly selected to avoid a strict left to right (or right to left) traversal of each level.
checker | an instance of type SPCheckVisitor which determines if a leaf node needs to be split further. |
maxLeaves | the maximum number of leaves the function estimate should have at the end of the process. |
seed | a seed for the random number generator used to determine which of a node's children is revisited first. |
{ string errorMsg( "FunctionEstimatorInterval::breadthFirstBruteForceEstimate(...)"); if (!hasSubPaving()) { throw NullSubpavingPointer_Error(errorMsg); } gsl_rng * r = NULL; try { NodeQueueT nq; _setupBreadthFirstQueue(nq, nodeChecker); r = gsl_rng_alloc (gsl_rng_mt19937); gsl_rng_set(r, seed); // actual number of leaves in subpaving size_t numLeaves = getRootLeaves(); #ifdef MYDEBUGBFMIN clock_t start = clock(); #endif IntervalEstimator estimator(fobj); /* split until there are no more leaves needing splitting or there are enough leaves */ while (!nq.empty() && numLeaves < maxLeaves) { _breadthFirstQueueLoop( nodeChecker, nq, estimator, r); /* actual number of leaves has increased by one irrespective of whether any have actually gone into the queue*/ numLeaves++; #ifdef MYDEBUGBFMIN if (numLeaves%10000 == 0) { double time = static_cast<double>(clock()-start)/CLOCKS_PER_SEC; cout << numLeaves << "\t" << time << "\t" << getTotalAreaOfIntervalBand() << endl; start = clock(); } #endif #ifdef MYDEBUGBF cout << "current estimator is " << endl; outputToStreamTabs(cout, 10); cout << "getTotalAreaOfIntervalBand() = " << (getTotalAreaOfIntervalBand()) << "\n\n" << endl; #endif }// loop try { if (NULL != r) { gsl_rng_free(r); r = NULL; } // free the random number generator } catch (exception const& ee) {} // catch and swallow #ifdef MYDEBUGBF cout << "\nEnd of queue " << endl; if (nq.empty()) cout << "ran out of nodes to expand" << endl; else cout << "number of leaves is " << getRootLeaves() << endl; #endif } catch (exception const& e) { try { if (NULL != r) { gsl_rng_free(r); r = NULL; } // free the random number generator } catch (exception const& ee) {} // catch and swallow throw; // rethrow original exception } }
void FunctionEstimatorInterval::bruteForceEstimate | ( | cxsc::real | tolerance | ) |
Create an estimate of the function described by fobj using a depth-first brute force approach.
The over all aim to create an estimate of the described by fobj such that each of the leaf nodes of the subpaving managed by this has a box associated with it that meets the interval image tolerance requirement specified by tolerance.
The interval image tolerance requirement is that the diameter of the interval image of the box associated with any leaf node of of the subpaving managed by this should be less than or equal to the tolerance.
But in some cases this can result in boxes being split beyond the limits of the real number screen. If a split of a leaf node that does not meet the interval image tolerance requirement would result in the child nodes' volumes being < the value of the cxsc::MinReal (the smallest representable real number) then that leaf will not be split.
This means that after the operation, the subpaving managed by this may have leaf nodes where the interval image tolerance requirement is not met.
This operation works depth first, ie working from a leaf root it will successively split the root then the root's left child and that child's, left child, etc. It will only start splitting the right child of any node already split when all the descendents of the left child meet the interval image tolerance requirement (or they cannot be split further). This may cause the machine to run out of memory before the process is completed.
tolerance | describes the tolerance to be used in making the estimate. |
{ string errorMsg( "FunctionEstimatorInterval::bruteForceEstimate(cxsc::real)"); if (!hasSubPaving()) { throw NullSubpavingPointer_Error(errorMsg); } IntervalExpanderEstimator estimator(fobj, tolerance); getSubPaving()->acceptSPExpandVisitor(estimator); }
int FunctionEstimatorInterval::getDimensions | ( | ) | const |
get the dimensions of the subpaving this manages.
{ int retValue = 0; if (hasSubPaving()) { retValue = getSubPaving()->getDimension(); } return retValue; }
cxsc::real FunctionEstimatorInterval::getDomainVolume | ( | ) | const |
get volume of the root box of the subpaving this manages.
{ real retValue(0.0); if (hasSubPaving()) { retValue = getSubPaving()->nodeRealVolume(); } return retValue; }
const MappedFobj & FunctionEstimatorInterval::getFobjReference | ( | ) | const |
Get the reference to the function object used by this.
{return fobj;}
int FunctionEstimatorInterval::getLabel | ( | ) | const |
Get the label.
{return label;}
Get a vector of the leaf node levels.
Root is level 0, next level down is 1, etc.
{ IntVec levels; // empty container if (hasSubPaving()) { getSubPaving()->getLeafNodeLevels(levels, 0); //levels has now been filled in } return levels; }
std::string FunctionEstimatorInterval::getLeafLevelsString | ( | ) | const |
Get a string of the leaf node levels.
Root is level 0, next level down is 1, etc. Example return string "3,3,2,1"
{ string retValue = ""; if (hasSubPaving()) retValue = getSubPaving()->getLeafNodeLevelsString(); return retValue; }
cxsc::real FunctionEstimatorInterval::getMaximumAreaOfIntervalBand | ( | ) | const |
Get the maximum "area" of the interval band over all the leaves of the subpaving managed by this.
The "area" of the interval band on a leaf is the diameter of the interval on the leaf multiplied by the volume of the box represented by the leaf.
{ if (!hasSubPaving()) { throw NullSubpavingPointer_Error( "FunctionEstimatorInterval::getMaximumAreaOfIntervalBand()"); } IntervalMappedSPnode::ConstPtrs leaves; getSubPaving()->getConstLeaves(leaves); cxsc::real maxArea(0.0); for (IntervalMappedSPnode::ConstPtrsItr it = leaves.begin(); it < leaves.end(); ++it) { cxsc::real thisArea = (*it)->getIntervalAreaOnBox(); if (thisArea > maxArea) maxArea = thisArea; } return maxArea; }
cxsc::real FunctionEstimatorInterval::getMaximumIntervalTolerance | ( | ) | const |
Get the maximum interval tolerance over all the leaves of the subpaving managed by this.
For any leaf, the interval tolerance is the maximum distance between the ends of the interval on the leaf and the image of the midpoint of the leaf's box under the function being estimated by this.
{ if (!hasSubPaving()) { throw NullSubpavingPointer_Error( "FunctionEstimatorInterval::getMaximumIntervalTolerance()"); } IntervalMappedSPnode::ConstPtrs leaves; getSubPaving()->getConstLeaves(leaves); cxsc::real maxTol(0.0); for (IntervalMappedSPnode::ConstPtrsItr it = leaves.begin(); it < leaves.end(); ++it) { interval thisRange = (*it)->getRange(); real thisMidImage = fobj.imageMid((*it)->getBox()); cxsc::real thisTol = cxsc::max(cxsc::abs(Sup(thisRange) - thisMidImage), cxsc::abs(thisMidImage - Inf(thisRange))); if (thisTol > maxTol) maxTol = thisTol; } return maxTol; }
cxsc::ivector FunctionEstimatorInterval::getRootBox | ( | ) | const |
Get the box of the subpaving managed by this.
{ if (!hasSubPaving()) { throw NullSubpavingPointer_Error( "FunctionEstimatorInterval::getRootBox()"); } return getSubPaving()->getBox(); }
size_t FunctionEstimatorInterval::getRootLeaves | ( | ) | const |
Gets number of leaf nodes in the root paving.
Throws NullSubpavingPointer_Error is the subpaving that this manages is a NULL pointer.
{ if (!hasSubPaving()) { throw NullSubpavingPointer_Error("FunctionEstimatorInterval::getRootLeaves()"); } return getSubPaving()->getNumberLeaves(); }
cxsc::real FunctionEstimatorInterval::getRootRangeDiameter | ( | ) | const |
Get diameter of the interval image of the root box of the subpaving managed by this.
{ if (!hasSubPaving()) { throw NullSubpavingPointer_Error( "FunctionEstimatorInterval::getRootRangeDiameter()"); } return getSubPaving()->getRangeDiameter(); }
cxsc::real FunctionEstimatorInterval::getTotalAreaOfIntervalBand | ( | ) | const |
Get the total "area" of the interval bands estimating the function represented by this.
The "area" of the interval bands is calculated as the sum over all the leaves of the subpaving managed by this of the diameter of the interval on the leaf multiplied by the volume of the box represented by the leaf.
{ if (!hasSubPaving()) { throw NullSubpavingPointer_Error( "FunctionEstimatorInterval::getTotalAreaOfIntervalBand()"); } return getSubPaving()->getTotalLeafIntervalAreaOnBox(); }
bool FunctionEstimatorInterval::hasSubPaving | ( | ) | const |
Get whether this has a subpaving to manage.
{
return ( getSubPaving() != NULL );
}
Tighten up the range enclosures of the estimate.
{ if (!hasSubPaving()) { throw NullSubpavingPointer_Error( "FunctionEstimatorInterval::hullPropagation(...)"); } getSubPaving()->hullPropagation(); }
subpavings::PiecewiseConstantFunction FunctionEstimatorInterval::makePiecewiseConstantFunction | ( | ) | const |
Make a PiecewiseConstantFunction out of this estimator.
{ /* make a rmspnode as a copy of this's subpaving and then put the ranges on as the mid-image of the boxes */ RealMappedSPnode tmp(*getSubPaving()); RealEstimator estimator(fobj); tmp.acceptSPValueVisitor(estimator); return PiecewiseConstantFunction(tmp, getLabel()); }
void FunctionEstimatorInterval::outputLog | ( | const std::string & | s, |
int | i, | ||
int | prec = 5 |
||
) | const |
Append current state of estimator to a txt log file.
Format is a tab-delimited file of numeric data. Output includes node contributions to unscaled EMP under COPERR and AIC and the changes in EMP that would result from splitting the node.
s | the name of the txt file to send output to. |
i | the number of pass (ie, 0, 1, 2, 3 etc) in process. |
prec | the precision for output formatting. ie, number of decimal places. |
{ // To add output of the FunctionEstimatorInterval object to file ofstream os(s.c_str(), ios::app); // append if (os.is_open()) { os << std::endl; os << "Pass " << i << std::endl; // numbering getSubPaving()->leavesOutputTabs(os, prec); // the output os.close(); } else { std::cerr << "Error: could not open file named " << s << std::endl << std::endl; } }
std::ostream & FunctionEstimatorInterval::outputRootToStreamTabs | ( | std::ostream & | os, |
int | prec = 5 |
||
) | const |
Output all nodes of the subpaving managed by this to a given stream.
Format is a tab-delimited data giving details of all nodes.
os | is a reference to the stream to output to. |
prec | the precision for output formatting. ie, number of decimal places, defaulting to 5. |
{ if (hasSubPaving()) { getSubPaving()->nodesAllOutput(os, 1, prec); // the output } return os; }
void FunctionEstimatorInterval::outputRootToTxt | ( | const std::string & | s, |
int | prec = 5 |
||
) | const |
Output details of full sample (from root) to txt file.
Format is a mixture of alpha and numeric data.
s | the name of the txt file to send output to. |
prec | the precision for output formatting. ie, number of decimal places. |
confirm | is a boolean controlling whether confirmation goes to console output. |
{ outputRootToTxt(s, prec, false); }
std::ostream & FunctionEstimatorInterval::outputToStreamTabs | ( | std::ostream & | os, |
int | prec = 5 |
||
) | const |
Output the subpaving managed by this to a given stream.
Format is a tab-delimited data giving details of leaf nodes.
os | is a reference to the stream to output the histogramm to. |
prec | the precision for output formatting. ie, number of decimal places. |
{ if (hasSubPaving()) { // have to use cxsc io manipulators os << cxsc::SaveOpt; os << cxsc::Variable << cxsc::SetPrecision(prec+2,prec); getSubPaving()->leavesOutputTabs(os); // the output os << cxsc::RestoreOpt; } return os; }
void FunctionEstimatorInterval::outputToTxtTabs | ( | const std::string & | s, |
int | prec = 5 |
||
) | const |
Output the subpaving managed by this to a txt file.
Format is a tab-delimited file of numeric data starting with nodeName, then the node box volume, then the node counter, then the description of the node box as a tab-delimited list of interval upper and lower bounds.
s | the name of the txt file to send output to. |
prec | the precision for output formatting. ie, number of decimal places. |
confirm | is a boolean controlling whether confirmation goes to console output. |
{ outputToTxtTabs(s, prec, false); }
bool FunctionEstimatorInterval::prioritySplitOnGain | ( | const FEIEvalObj & | fe, |
LOGGING_LEVEL | logging, | ||
long unsigned int | seed = 1234 |
||
) |
prioritySplitOnGain.
These methods takes an estimator and progressively split using a priority queue of splittable nodes to determine which node to split first. The ordering for the queue is referred to here as the measure.
The measure used is the reduction in the 'area' of the the interval band on each piece of the current function estimate resulting from a division of the piece into two halves. The area of a piece is the diameter of the interval image of the piece multiplied by the area of the piece (leaf) box. The reduction of this area resulting from a split of a piece is (area of band for current piece) - (sum of areas of bands for the two halves of the piece that would result from dividing that piece in half). Pieces with the largest 'gain' measured in this way will be split first. Note that this does not provide a globally optimal ordering for the queue: this ordering only looks one step ahead and does not take account of large potential gains that might result from further bisections of the current pieces even if the first bisection does not provide a large immediate gain. In particular, if the function over some piece is completely symmetrical in two consecutive dimensions then the potential gain on bisection on the first of these will be calculated as 0 (because the interval enclosures of both halves of the current piece will be the same as the interval enclosure of the current piece itself). The bivariate Gaussian on a symmetric domain box provides a good example of this completely non-globally optimal behaviour.
Splitting continues until some criteria specified by the function evaluator fe (applying either to individual nodes or to the estimator as a whole) is satisfied, or there are no more splittable nodes.
Each node in the subpaving managed by this decides for itself whether it is splittable, using isSplittableNode().
If more than one splittable node is equally 'large', on the basis of the measure used, then a random choice is made between all equally large nodes to find the node which will be split.
The seed for the random number generator used for random selection between equally 'large' nodes can be specified by the user or set by this. If you are looking at distributions of results across multiple histograms, supply the random number generator seed to the priority queue to ensure that each estimator will make different random choices each time: use of the internally set seed will give the same results each time.
Throws a NullSubpavings_Error if the subpaving that this manages is a NULL pointer.
Throws an std::logic_error if the state of this does not 'legal', ie if this contains cherries that do not pass isSplittableNode().
Throws an std::logic_error if the split becomes muddled because of some failure within the logic of the algorithm itself.
Aborts if there are no splittable leaves left (or none at the start).
fe | is the function evaluator, an instance of a class which provides a criterion to determine when to stop splitting. |
logging | an enum controlling whether estimator creation output is sent to a log file. |
seed | is a seed for the random number generator. |
{ IntervalBandAreaGainOnSplitMeasurer measure(fobj); return prioritySplit(measure, fe, logging, seed); }
bool FunctionEstimatorInterval::prioritySplitOnGain | ( | size_t | maxLeaves, |
LOGGING_LEVEL | logging, | ||
long unsigned int | seed = 1234 |
||
) |
prioritySplitOnGain.
These methods takes an estimator and progressively split using a priority queue of splittable nodes to determine which node to split first. The ordering for the queue is referred to here as the measure.
The measure used is the reduction in the 'area' of the the interval band on each piece of the current function estimate resulting from a division of the piece into two halves. The area of a piece is the diameter of the interval image of the piece multiplied by the area of the piece (leaf) box. The reduction of this area resulting from a split of a piece is (area of band for current piece) - (sum of areas of bands for the two halves of the piece that would result from dividing that piece in half). Pieces with the largest 'gain' measured in this way will be split first. Note that this does not provide a globally optimal ordering for the queue: this ordering only looks one step ahead and does not take account of large potential gains that might result from further bisections of the current pieces even if the first bisection does not provide a large immediate gain. In particular, if the function over some piece is completely symmetrical in two consecutive dimensions then the potential gain on bisection on the first of these will be calculated as 0 (because the interval enclosures of both halves of the current piece will be the same as the interval enclosure of the current piece itself). The bivariate Gaussian on a symmetric domain box provides a good example of this completely non-globally optimal behaviour.
Splitting continues until there are maxLeaves leaves, or there are no more splittable nodes.
Each node in the subpaving managed by this decides for itself whether it is splittable, using isSplittableNode().
If more than one splittable node is equally 'large', on the basis of the measure used, then a random choice is made between all equally large nodes to find the node which will be split.
The seed for the random number generator used for random selection between equally 'large' nodes can be specified by the user or set by this. If you are looking at distributions of results across multiple histograms, supply the random number generator seed to the priority queue to ensure that each estimator will make different random choices each time: use of the internally set seed will give the same results each time.
Throws a NullSubpavings_Error if the subpaving that this manages is a NULL pointer.
Throws an std::logic_error if the state of this does not 'legal', ie if this contains cherries that do not pass isSplittableNode().
Throws an std::logic_error if the split becomes muddled because of some failure within the logic of the algorithm itself.
Aborts if there are no splittable leaves left (or none at the start).
maxLeaves | is number of leaves to aim for in the final estimator. |
logging | an enum controlling whether estimator creation output is sent to a log file. |
seed | is a seed for the random number generator. |
{ IntervalBandAreaGainOnSplitMeasurer measure(fobj); return prioritySplit(measure, maxLeaves, logging, seed); }
bool FunctionEstimatorInterval::splitToShape | ( | std::string | instruction | ) |
Split an estimator to a specified shape.
Throws a NullSubpavings_Error if the subpaving that this manages is a NULL pointer.
Throws a NoBox_Error if the subpaving box is empty.
Prints a message to the standard error output if the instruction could not be carried out.
instruction | specifies the required shape, eg "3, 3, 2, 1" |
{ // checks: is there a root paving, is the string properly formed? if (!hasSubPaving()) { throw NullSubpavingPointer_Error( "FunctionEstimatorInterval::splitToShape()"); } bool success = false; IntervalMappedSPnode temp(*getSubPaving()); // copy to temp try { if (instruction.length() == 0) { throw std::invalid_argument( "FunctionEstimatorInterval::splitToShape() : No instruction"); } std::string legal(", 0123456789"); if (instruction.find_first_not_of(legal) != std::string::npos) { throw std::invalid_argument( "FunctionEstimatorInterval::splitToShape() : Illegal character"); } // all seems to be okay, we can start splitting the root paving success = getSubPaving()->splitRootToShape(instruction); /* ALSO NEED to set the interval ranges using fobj */ IntervalEstimator estimator(fobj); rootPaving->acceptSPValueVisitor(estimator); if (!success) { handleSplitToShapeError(temp); } } catch (std::invalid_argument const& ia) { cerr << ia.what() << endl; handleSplitToShapeError(temp); success = false; } catch (std::logic_error const& le) { cerr << le.what() << endl; handleSplitToShapeError(temp); success = false; } return success; // any other exceptions are unhandled }
std::string FunctionEstimatorInterval::stringSummary | ( | ) | const |
Get a string summary of this estimator's properties.
A string description of this. Includes the address of the subpaving managed but not details of that subpaving.
{ std::ostringstream oss; oss << "This address = " << (this) << endl; oss << "Reference to function object is = " << (&fobj) << endl; if (hasSubPaving()) oss << "Address of subpaving is " << getSubPaving() << endl; else oss << "Subpaving is NULL" << endl; return oss.str(); }