forpy  2
forpy::Tree Class Reference

The main tree class for the forpy framework.

#include <tree.h>

Inheritance diagram for forpy::Tree:
forpy::ClassificationTree forpy::RegressionTree

Public Member Functions

 Tree (const uint &max_depth=std::numeric_limits< uint >::max(), const uint &min_samples_at_leaf=1, const uint &min_samples_at_node=2, const std::shared_ptr< IDecider > &decider=nullptr, const std::shared_ptr< ILeaf > &leaf_manager=nullptr, const uint &random_seed=1)
 The standard constructor for the forpy trees.
 
 Tree (std::string filename)
 Deserialization constructor for the forpy trees.
 
void make_node (const IDataProvider *data_provider, Desk *d)
 Handle the creation of one tree node.
 
void DFS (const IDataProvider *data_provider, const ECompletionLevel &completion, Desk *d)
 Do one DFS step with given completion level.
 
void parallel_DFS (Desk *d, TodoMark &mark, IDataProvider *data_provider, const bool &finalize=true)
 
void DFS_and_store (Desk *d, TodoMark &mark, const IDataProvider *dprov, const ECompletionLevel &comp)
 
size_t get_depth () const
 
Tree * fit (const Data< MatCRef > &data_v, const Data< MatCRef > &annotation_v, const size_t &n_threads, const bool &complete_dfs=true, const std::vector< float > &weights=std::vector< float >())
 Standard fitting function.
 
Tree * fit_dprov (std::shared_ptr< IDataProvider > data_provider, const bool &complete_dfs=true)
 The fitting function for a single tree.
 
id_t predict_leaf (const Data< MatCRef > &data, const id_t &start_node=0, const std::function< void(void *)> &dptf=nullptr) const
 Get the leaf id of the leaf where the given data will arrive.
 
Data< Mat > predict (const Data< MatCRef > &data_v, const int &num_threads=1, const bool &use_fast_prediction_if_available=true, const bool &predict_proba=false, const bool &for_forest=false)
 
Data< Mat > predict_proba (const Data< MatCRef > &data_v, const int &num_threads=1, const bool &use_fast_prediction_if_available=true)
 Overload for consistency with the sklearn interface.
 
Data< Mat > predict_leaf_result (const Data< MatCRef > &data, const id_t &start_node=0, const std::function< void(void *)> &dptf=nullptr) const
 Get the data prediction result for the given data.
 
Data< Mat > combine_leaf_results (const std::vector< Data< Mat >> &leaf_results, const Vec< float > &weights=Vec< float >(), const bool &predict_proba=false) const
 
bool is_initialized () const
 Whether the tree's fit method has been called and its DFS and BFS methods can now be used.
 
float get_weight () const
 The tree weight.
 
size_t get_n_nodes () const
 The number of tree nodes.
 
void set_weight (const float &new_weight)
 Sets the tree weight.
 
size_t get_input_data_dimensions () const
 The data dimension that is required by this tree.
 
std::shared_ptr< const IDecider > get_decider () const
 The classifier manager used by this tree.
 
std::shared_ptr< const ILeaf > get_leaf_manager () const
 The leaf manager used by this tree.
 
size_t get_samples_stored () const
 The number of samples stored in leafs.
 
const std::vector< std::pair< id_t, id_t > > get_tree () const
 
void enable_fast_prediction ()
 
void disable_fast_prediction ()
 
bool operator== (Tree const &rhs) const
 
void save (const std::string &filename) const
 Save the tree.
 

Private Member Functions

template<class Archive >
void serialize (Archive &ar, const uint &)
 
 DISALLOW_COPY_AND_ASSIGN (Tree)
 

Private Attributes

uint max_depth
 
bool is_initialized_for_training
 
unsigned int min_samples_at_node
 
unsigned int min_samples_at_leaf
 
float weight
 
std::atomic< size_t > stored_in_leafs
 
std::shared_ptr< IDecider > decider
 
std::shared_ptr< ILeaf > leaf_manager
 
std::vector< std::pair< id_t, id_t > > tree
 
std::unique_ptr< mu::variant< std::vector< std::tuple< size_t, float, size_t, size_t > >, std::vector< std::tuple< size_t, double, size_t, size_t > >, std::vector< std::tuple< size_t, uint32_t, size_t, size_t > >, std::vector< std::tuple< size_t, uint8_t, size_t, size_t > > > > fast_tree
 
std::vector< std::future< void > > futures
 
std::mutex fut_mtx
 
std::atomic< id_t > next_id
 
uint random_seed
 

Friends

class forpy::Forest
 
class cereal::access
 
std::ostream & operator<< (std::ostream &stream, const Tree &self)
 

Detailed Description

The main tree class for the forpy framework.

This class is the core element of the framework. It can be used as a standalone tree or to form a forest.

Definition at line 36 of file tree.h.

Constructor & Destructor Documentation

◆ Tree() [1/2]

forpy::Tree::Tree ( const uint &  max_depth = std::numeric_limits< uint >::max(),
const uint &  min_samples_at_leaf = 1,
const uint &  min_samples_at_node = 2,
const std::shared_ptr< IDecider > &  decider = nullptr,
const std::shared_ptr< ILeaf > &  leaf_manager = nullptr,
const uint &  random_seed = 1 
)

The standard constructor for the forpy trees.

Parameters
max_depth            uint > 0. The maximum tree depth, including leafs (inclusive upper bound).
min_samples_at_leaf  uint > 0. The minimum number of samples at a leaf (inclusive lower bound).
min_samples_at_node  uint >= 2*min_samples_at_leaf. The minimum number of samples at a node (inclusive lower bound).
decider              IDecider. The decider that stores, optimizes and applies the decision rules for each inner tree node.
leaf_manager         The leaf manager generates, stores and handles the return values of the leaf nodes.
random_seed          uint > 0. Seed for the random engine.
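
The constraints listed above can be checked before the values are passed to the constructor. The following is a minimal sketch; TreeParams and params_valid are hypothetical names introduced for illustration and are not part of forpy. Whether the constructor itself rejects invalid values is not stated in this documentation.

```cpp
#include <limits>

// Hypothetical container for the numeric constructor arguments (not part of
// forpy). The defaults mirror the defaults in the documented signature.
struct TreeParams {
  unsigned int max_depth = std::numeric_limits<unsigned int>::max();
  unsigned int min_samples_at_leaf = 1;
  unsigned int min_samples_at_node = 2;
  unsigned int random_seed = 1;
};

// Returns true if the values satisfy the documented constraints.
inline bool params_valid(const TreeParams &p) {
  if (p.max_depth == 0) return false;
  if (p.min_samples_at_leaf == 0) return false;
  // Compare in a wider type so that 2 * min_samples_at_leaf cannot overflow.
  if (static_cast<unsigned long long>(p.min_samples_at_node) <
      2ULL * p.min_samples_at_leaf) {
    return false;
  }
  if (p.random_seed == 0) return false;
  return true;
}
```

With the defaults the check passes; raising min_samples_at_leaf to 3 while leaving min_samples_at_node at 2 makes it fail until min_samples_at_node is raised to at least 6.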

◆ Tree() [2/2]

forpy::Tree::Tree ( std::string  filename)

Deserialization constructor for the forpy trees.

Parameters
filename  string. The filename to deserialize the tree from.

Member Function Documentation

◆ combine_leaf_results()

Data<Mat> forpy::Tree::combine_leaf_results ( const std::vector< Data< Mat >> &  leaf_results,
const Vec< float > &  weights = Vec<float>(),
const bool &  predict_proba = false 
) const
inline

Combine the leaf results of several trees to the forest result.

Definition at line 227 of file tree.h.
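
How the leaf results are combined is not described here. One plausible scheme, shown purely as an assumption, is a weighted average of the per-tree results, with an empty weight vector meaning equal weights. The sketch uses plain std::vector<float> instead of Data<Mat>, and combine_sketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted average of one result vector per tree. An empty weight vector is
// treated as "all trees have weight 1". This is an assumed scheme, not
// necessarily what forpy does.
inline std::vector<float> combine_sketch(
    const std::vector<std::vector<float>> &leaf_results,
    const std::vector<float> &weights = std::vector<float>()) {
  assert(!leaf_results.empty());
  assert(weights.empty() || weights.size() == leaf_results.size());
  std::vector<float> combined(leaf_results.front().size(), 0.f);
  float weight_sum = 0.f;
  for (std::size_t t = 0; t < leaf_results.size(); ++t) {
    const float w = weights.empty() ? 1.f : weights[t];
    weight_sum += w;
    for (std::size_t i = 0; i < combined.size(); ++i) {
      combined[i] += w * leaf_results[t][i];
    }
  }
  for (float &v : combined) v /= weight_sum;
  return combined;
}
```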

◆ DFS()

void forpy::Tree::DFS ( const IDataProvider *  data_provider,
const ECompletionLevel &  completion,
Desk *  d 
)

Do one DFS step with given completion level.

For CompletionLevel::Level, the branch of the tree below the currently marked node is completed.

The function is to be used within a thread (see forpy::Tree::parallel_DFS).

Parameters
data_provider  forpy::IDataProvider*. The data provider to use to get the samples with the relevant ids.
completion     CompletionLevel. The ECompletionLevel to reach before returning from the function.
d              Desk. Desk to use thread local memory from.

◆ DFS_and_store()

void forpy::Tree::DFS_and_store ( Desk *  d,
TodoMark &  mark,
const IDataProvider *  dprov,
const ECompletionLevel &  comp 
)

◆ disable_fast_prediction()

void forpy::Tree::disable_fast_prediction ( )
inline

Frees the memory from the unpacked trees for fast predictions.

Definition at line 297 of file tree.h.

◆ DISALLOW_COPY_AND_ASSIGN()

forpy::Tree::DISALLOW_COPY_AND_ASSIGN ( Tree  )
private
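
The definition of this macro is not part of this page. Macros with this name conventionally delete the copy constructor and the copy assignment operator; the sketch below shows that common pattern under a different name (SKETCH_DISALLOW_COPY_AND_ASSIGN, hypothetical) and should not be read as forpy's actual definition.

```cpp
#include <type_traits>

// A common way to define such a macro. forpy's own definition may differ.
#define SKETCH_DISALLOW_COPY_AND_ASSIGN(TypeName) \
  TypeName(const TypeName &) = delete;            \
  TypeName &operator=(const TypeName &) = delete

// Hypothetical class using the macro in its private section.
class Widget {
 public:
  Widget() = default;

 private:
  SKETCH_DISALLOW_COPY_AND_ASSIGN(Widget);
};

// Copying is rejected at compile time.
static_assert(!std::is_copy_constructible<Widget>::value,
              "copy construction is disabled");
static_assert(!std::is_copy_assignable<Widget>::value,
              "copy assignment is disabled");
```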

◆ enable_fast_prediction()

void forpy::Tree::enable_fast_prediction ( )

Unpack the hash maps for thresholds and feature IDs for fast predictions.

This only works for trees with threshold deciders and AlignedSurfaceCalculators for the features. It requires more memory than the default trees, but is significantly faster.

◆ fit()

Tree* forpy::Tree::fit ( const Data< MatCRef > &  data_v,
const Data< MatCRef > &  annotation_v,
const size_t &  n_threads,
const bool &  complete_dfs = true,
const std::vector< float > &  weights = std::vector< float >() 
)

Standard fitting function.

Fits this tree to the data given by the data provider. If complete_dfs is true, the tree is completely fitted to the data. Otherwise, only a node todo for the root node is added, and training can then be performed step by step by calling the BFS or DFS functions.

Releases the GIL in Python!

Parameters
data_v        Variant of 2D array, col-major contiguous. Col-wise data points.
annotation_v  Variant of 2D array, row-major contiguous. Row-wise annotations.
n_threads     size_t. The number of threads to use. If set to 0, use all hardware threads.
complete_dfs  bool. If set to true, finishes training the tree. Otherwise, the training is just set up, and make_node must be called. Default: true.
weights       vector<float>. A vector with positive weights for each sample or an empty vector.
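
The "0 means all hardware threads" convention for n_threads can be illustrated with the standard library. This is a sketch of the convention only; resolve_n_threads is a hypothetical helper, and how forpy resolves the value internally is not shown in this documentation.

```cpp
#include <cstddef>
#include <thread>

// Maps a requested thread count to the count actually used: 0 selects the
// number of hardware threads reported by the standard library.
inline std::size_t resolve_n_threads(std::size_t requested) {
  if (requested != 0) return requested;
  // hardware_concurrency() may return 0 if the value cannot be determined,
  // so fall back to a single thread in that case.
  const unsigned int hw = std::thread::hardware_concurrency();
  return hw == 0 ? 1 : hw;
}
```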

◆ fit_dprov()

Tree* forpy::Tree::fit_dprov ( std::shared_ptr< IDataProvider >  data_provider,
const bool &  complete_dfs = true 
)

The fitting function for a single tree.

Fits this tree to the data given by the data provider. If complete_dfs is true, the tree is completely fitted to the data. Otherwise, only a node todo for the root node is added, and training can then be performed step by step by calling the BFS or DFS functions.

Parameters
data_provider  shared(IDataProvider). The data provider for the fitting process.
complete_dfs   bool. If true, complete the fitting process.

◆ get_decider()

std::shared_ptr<const IDecider> forpy::Tree::get_decider ( ) const
inline

The classifier manager used by this tree.

Definition at line 265 of file tree.h.

◆ get_depth()

size_t forpy::Tree::get_depth ( ) const

Get the tree depth.

The depth is defined to be 0 for an "empty" tree (only a leaf/root node) and as the number of edges on the longest path in the tree otherwise.
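
The definition can be made concrete with a small sketch over a child table shaped like the one returned by get_tree(), a vector of (left, right) id pairs. Two assumptions are made that this page does not confirm: node 0 is the root, and the pair (0, 0) marks a leaf. node_id, ChildTable and depth_of are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using node_id = std::size_t;  // stand-in for forpy's id_t
using ChildTable = std::vector<std::pair<node_id, node_id>>;

// Number of edges on the longest path from the root (node 0) to a leaf.
// Assumption for this sketch: the pair (0, 0) marks a leaf.
inline std::size_t depth_of(const ChildTable &tree) {
  if (tree.empty()) return 0;
  std::size_t max_depth = 0;
  // Explicit stack of (node, depth) pairs instead of recursion.
  std::vector<std::pair<node_id, std::size_t>> todo;
  todo.emplace_back(node_id{0}, std::size_t{0});
  while (!todo.empty()) {
    const node_id node = todo.back().first;
    const std::size_t depth = todo.back().second;
    todo.pop_back();
    max_depth = std::max(max_depth, depth);
    const std::pair<node_id, node_id> &children = tree[node];
    if (children.first == 0 && children.second == 0) continue;  // leaf
    todo.emplace_back(children.first, depth + 1);
    todo.emplace_back(children.second, depth + 1);
  }
  return max_depth;
}
```

A table with a single (0, 0) entry has depth 0. A root with children 1 and 2, where node 1 has two leaf children, has depth 2.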

◆ get_input_data_dimensions()

size_t forpy::Tree::get_input_data_dimensions ( ) const
inline

The data dimension that is required by this tree.

Definition at line 258 of file tree.h.

◆ get_leaf_manager()

std::shared_ptr<const ILeaf> forpy::Tree::get_leaf_manager ( ) const
inline

The leaf manager used by this tree.

Definition at line 272 of file tree.h.

◆ get_n_nodes()

size_t forpy::Tree::get_n_nodes ( ) const
inline

The number of tree nodes.

Definition at line 248 of file tree.h.

◆ get_samples_stored()

size_t forpy::Tree::get_samples_stored ( ) const
inline

The number of samples stored in leafs.

Definition at line 279 of file tree.h.

◆ get_tree()

const std::vector<std::pair<id_t, id_t> > forpy::Tree::get_tree ( ) const
inline

Definition at line 281 of file tree.h.

◆ get_weight()

float forpy::Tree::get_weight ( ) const
inline

The tree weight.

Definition at line 243 of file tree.h.

◆ is_initialized()

bool forpy::Tree::is_initialized ( ) const
inline

Whether the tree's fit method has been called and its DFS and BFS methods can now be used.

Definition at line 238 of file tree.h.

◆ make_node()

void forpy::Tree::make_node ( const IDataProvider *  data_provider,
Desk *  d 
)

Handle the creation of one tree node.

Takes the next one of the list of marked nodes and fits it to the data. If necessary, creates two child nodes and a split criterion, otherwise makes it a leaf.

The function is to be used within a thread (see forpy::Tree::parallel_DFS). It is marked const so as to avoid concurrent writes to member elements. Everything that is written to must be available in a forpy::Desk.

Parameters
data_provider  shared(IDataProvider). The data provider to use.
d              Desk. Desk to use thread local memory from.
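
The control flow described above (take the next marked node, then either split it or make it a leaf) can be sketched as follows. Everything in this block is a simplified stand-in: SketchTodoMark, GrowingTree and make_node_sketch are hypothetical, the split simply halves the sample ids instead of optimizing a decision rule, depth is counted in edges, and (0, 0) is assumed to mark a leaf.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using node_id = std::size_t;

// Hypothetical stand-in for a marked node: id, depth, and its sample ids.
struct SketchTodoMark {
  node_id node;
  std::size_t depth;
  std::vector<std::size_t> sample_ids;
};

struct GrowingTree {
  std::vector<std::pair<node_id, node_id>> children{{0, 0}};  // root only
  std::vector<SketchTodoMark> todo;
  std::size_t max_depth = 2;            // counted in edges in this sketch
  std::size_t min_samples_at_node = 2;
  std::size_t stored_in_leafs = 0;
};

// Processes one marked node: split it into two children or make it a leaf.
inline void make_node_sketch(GrowingTree &t) {
  assert(!t.todo.empty());
  SketchTodoMark mark = std::move(t.todo.back());
  t.todo.pop_back();
  const std::size_t n = mark.sample_ids.size();
  if (mark.depth >= t.max_depth || n < t.min_samples_at_node) {
    t.stored_in_leafs += n;  // the node stays a leaf and keeps its samples
    return;
  }
  // Placeholder split: first half goes left, second half goes right.
  const node_id left = t.children.size();
  const node_id right = left + 1;
  t.children.emplace_back(0, 0);
  t.children.emplace_back(0, 0);
  t.children[mark.node] = std::make_pair(left, right);
  const auto mid =
      mark.sample_ids.begin() + static_cast<std::ptrdiff_t>(n / 2);
  t.todo.push_back(SketchTodoMark{
      left, mark.depth + 1,
      std::vector<std::size_t>(mark.sample_ids.begin(), mid)});
  t.todo.push_back(SketchTodoMark{
      right, mark.depth + 1,
      std::vector<std::size_t>(mid, mark.sample_ids.end())});
}
```

Calling make_node_sketch in a loop until the todo list is empty grows the whole tree. Because a node is only split when it holds at least min_samples_at_node samples, the halving split leaves each child with at least half that many, which is how the constructor constraint min_samples_at_node >= 2*min_samples_at_leaf can be read.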

◆ operator==()

bool forpy::Tree::operator== ( Tree const &  rhs) const

◆ parallel_DFS()

void forpy::Tree::parallel_DFS ( Desk *  d,
TodoMark &  mark,
IDataProvider *  data_provider,
const bool &  finalize = true 
)

◆ predict()

Data<Mat> forpy::Tree::predict ( const Data< MatCRef > &  data_v,
const int &  num_threads = 1,
const bool &  use_fast_prediction_if_available = true,
const bool &  predict_proba = false,
const bool &  for_forest = false 
)

Predicts new data points.

Releases the GIL in Python!

Parameters
data_v                            Variant of 2D data, row-major contiguous. The data to predict, with one sample per row.
num_threads                       int > 0. The number of threads to use for prediction. The number of samples should be at least three times larger than the number of threads to observe good parallelization behavior. Currently disabled.
use_fast_prediction_if_available  bool. If set to true (default), this will create a compressed version of the tree that has particularly favorable properties for fast access and use it for predictions. You can trigger the creation manually by calling Tree::enable_fast_prediction.
predict_proba                     bool. If enabled, will ask the leaf manager to provide probability information in addition to the prediction output.
for_forest                        bool. If set to true, will create an intermediate result that can be fused to a whole forest result. Not relevant for end-users.

◆ predict_leaf()

id_t forpy::Tree::predict_leaf ( const Data< MatCRef > &  data,
const id_t &  start_node = 0,
const std::function< void(void *)> &  dptf = nullptr 
) const

Get the leaf id of the leaf where the given data will arrive.

Parameters
data        The data to propagate through the tree.
start_node  The node to start from, doesn't have to be the root.
dptf        Feature mapping function; disabled at the moment.
Returns
The node id of the leaf.
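
Propagating a sample from a start node down to a leaf can be sketched as below. The node layout is hypothetical: the sketch assumes threshold decisions on a single feature per node, that values less than or equal to the threshold go to the left child, and that (0, 0) marks a leaf. None of these details are confirmed by this page; SketchTree and predict_leaf_sketch are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using node_id = std::size_t;  // stand-in for forpy's id_t

// Hypothetical tree layout: per node a child pair, a feature index and a
// threshold. A child pair of (0, 0) marks a leaf.
struct SketchTree {
  std::vector<std::pair<node_id, node_id>> children;
  std::vector<std::size_t> feature;
  std::vector<float> threshold;
};

// Follows the decisions from start_node until a leaf is reached and returns
// the id of that leaf.
inline node_id predict_leaf_sketch(const SketchTree &t,
                                   const std::vector<float> &sample,
                                   node_id start_node = 0) {
  assert(start_node < t.children.size());
  node_id node = start_node;
  while (t.children[node].first != 0 || t.children[node].second != 0) {
    const bool go_left = sample[t.feature[node]] <= t.threshold[node];
    node = go_left ? t.children[node].first : t.children[node].second;
  }
  return node;
}
```

Starting from a non-root node simply restricts the walk to the subtree below that node.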

◆ predict_leaf_result()

Data<Mat> forpy::Tree::predict_leaf_result ( const Data< MatCRef > &  data,
const id_t &  start_node = 0,
const std::function< void(void *)> &  dptf = nullptr 
) const
inline

Get the data prediction result for the given data.

Definition at line 218 of file tree.h.

◆ predict_proba()

Data<Mat> forpy::Tree::predict_proba ( const Data< MatCRef > &  data_v,
const int &  num_threads = 1,
const bool &  use_fast_prediction_if_available = true 
)

Overload for consistency with the sklearn interface.

See Tree::predict.

◆ save()

void forpy::Tree::save ( const std::string &  filename) const

Save the tree.

Parameters
filename  string. The filename of the file to store the tree in.

◆ serialize()

template<class Archive >
void forpy::Tree::serialize ( Archive &  ar,
const uint &  
)
inlineprivate

Definition at line 322 of file tree.h.

◆ set_weight()

void forpy::Tree::set_weight ( const float &  new_weight)
inline

Sets the tree weight.

Definition at line 253 of file tree.h.

Friends And Related Function Documentation

◆ cereal::access

friend class cereal::access
friend

Definition at line 320 of file tree.h.

◆ forpy::Forest

friend class forpy::Forest
friend

Definition at line 316 of file tree.h.

◆ operator<<

std::ostream& operator<< ( std::ostream &  stream,
const Tree &  self 
)
friend

Definition at line 312 of file tree.h.

Member Data Documentation

◆ decider

std::shared_ptr<IDecider> forpy::Tree::decider
private

The associated classifier manager.

Definition at line 349 of file tree.h.

◆ fast_tree

std::unique_ptr< mu::variant<std::vector<std::tuple<size_t, float, size_t, size_t> >, std::vector<std::tuple<size_t, double, size_t, size_t> >, std::vector<std::tuple<size_t, uint32_t, size_t, size_t> >, std::vector<std::tuple<size_t, uint8_t, size_t, size_t> > > > forpy::Tree::fast_tree
private

Pointer to a structure that can be used for fast predictions.

Vector ids are node ids. The first value in the tuple is the threshold value at that node. If the first and second tuple elements are the same, they contain a leaf ID.

Definition at line 365 of file tree.h.
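
The variant-of-vectors shape of fast_tree lends itself to a single traversal routine that is dispatched on the threshold type. The sketch below uses std::variant from C++17 as a stand-in for mu::variant. The tuple layout is an assumption made for illustration, since the description above does not pin it down unambiguously: (feature index, threshold, left child, right child), with equal child entries encoding a leaf ID. FastNodes, FastTree and fast_leaf are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <variant>
#include <vector>

// Assumed tuple layout: (feature index, threshold, left child, right child).
template <typename T>
using FastNodes =
    std::vector<std::tuple<std::size_t, T, std::size_t, std::size_t>>;

// std::variant stands in for mu::variant in this sketch.
using FastTree = std::variant<FastNodes<float>, FastNodes<double>,
                              FastNodes<std::uint32_t>,
                              FastNodes<std::uint8_t>>;

// Walks the packed nodes for one sample. The same generic lambda serves all
// four threshold types. Assumption: left == right encodes a leaf ID.
inline std::size_t fast_leaf(const FastTree &tree,
                             const std::vector<double> &sample) {
  return std::visit(
      [&sample](const auto &nodes) {
        std::size_t node = 0;
        for (;;) {
          const auto &[feature, threshold, left, right] = nodes[node];
          if (left == right) return left;  // assumed leaf encoding
          node = sample[feature] <= static_cast<double>(threshold) ? left
                                                                   : right;
        }
      },
      tree);
}
```

Keeping all nodes in one contiguous vector avoids hash map lookups per node, which is consistent with the memory-for-speed trade-off mentioned for enable_fast_prediction.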

◆ fut_mtx

std::mutex forpy::Tree::fut_mtx
private

Definition at line 367 of file tree.h.

◆ futures

std::vector<std::future<void> > forpy::Tree::futures
private

Definition at line 366 of file tree.h.

◆ is_initialized_for_training

bool forpy::Tree::is_initialized_for_training
private

Whether the fit method has been called and the DFS and BFS methods can now be used for training.

Definition at line 339 of file tree.h.

◆ leaf_manager

std::shared_ptr<ILeaf> forpy::Tree::leaf_manager
private

The associated leaf manager.

Definition at line 351 of file tree.h.

◆ max_depth

uint forpy::Tree::max_depth
private

The maximum depth of the tree. Non-const for serialization purposes only.

Definition at line 328 of file tree.h.

◆ min_samples_at_leaf

unsigned int forpy::Tree::min_samples_at_leaf
private

The minimum number of samples that must arrive at a leaf.

Definition at line 343 of file tree.h.

◆ min_samples_at_node

unsigned int forpy::Tree::min_samples_at_node
private

The minimum number of samples that must arrive at an inner node.

Definition at line 341 of file tree.h.

◆ next_id

std::atomic<id_t> forpy::Tree::next_id
private

Definition at line 368 of file tree.h.

◆ random_seed

uint forpy::Tree::random_seed
private

Definition at line 369 of file tree.h.

◆ stored_in_leafs

std::atomic<size_t> forpy::Tree::stored_in_leafs
private

The number of samples stored in leafs so far.

Definition at line 347 of file tree.h.

◆ tree

std::vector<std::pair<id_t, id_t> > forpy::Tree::tree
private

Holds the entire tree structure.

Definition at line 353 of file tree.h.

◆ weight

float forpy::Tree::weight
private

A weight assigned to this tree. Can be used by learning algorithms.

Definition at line 345 of file tree.h.


The documentation for this class was generated from the following file: