Forest covertypes
The samples in this dataset correspond to 30×30 m patches of forest in the US, collected for the task of predicting each patch’s cover type, i.e. the dominant species of tree. There are seven cover types, making this a multiclass classification problem. Each sample has 54 features, described on the dataset’s homepage. Some of the features are boolean indicators, while others are discrete or continuous measurements.
Data Set Characteristics:
Classes | 7
Samples total | 581012
Dimensionality | 54
Features | int
sklearn.datasets.fetch_covtype() will load the covertype dataset; it returns a dictionary-like “Bunch” object with the feature matrix in the data member and the target values in target. If the optional argument as_frame is set to True, it will return data and target as pandas data structures, and there will be an additional member, frame, as well.
The dataset will be downloaded from the web if necessary.
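As a rough illustration of the loading call described above, the following sketch fetches the dataset and prints its shape and per-class sample counts. The helper summarize() and the main() wrapper are hypothetical names introduced only for this example and are not part of scikit-learn; the sketch also assumes NumPy is installed and that the first call is allowed to download the data.

```python
import numpy as np


def summarize(X, y):
    """Return (n_samples, n_features, {class_label: count}) for a dataset.

    Hypothetical helper for this example; not a scikit-learn function.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    class_counts = {int(label): int(count) for label, count in zip(labels, counts)}
    return X.shape[0], X.shape[1], class_counts


def main():
    # Imported here so that summarize() can be used without scikit-learn.
    from sklearn.datasets import fetch_covtype

    # Downloads the data on first use, then reads it from the local cache.
    covtype = fetch_covtype()
    n_samples, n_features, class_counts = summarize(covtype.data, covtype.target)
    print(n_samples, n_features)  # the characteristics table lists 581012 and 54
    print(class_counts)

    # With as_frame=True the Bunch also carries a combined frame member.
    covtype_df = fetch_covtype(as_frame=True)
    print(covtype_df.frame.shape)


if __name__ == "__main__":
    main()
```

Keeping the download inside main() means the summary helper can be tried on any small array without network access.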