imputers module
apply_depth_trend_imputation | Apply imputation models to impute curves in given dataframe
enable_iterative_imputer | Enables IterativeImputer
generate_imputation_models | Generates 3rd order polynomial regression models with the DEPTH column as the target y variable and each curve in the provided curves keyword argument as the x variable (i.e. a model per curve).
impute_depth_trend | Imputation of curves based on polynomial regression models of the curve based on DEPTH
individual_imputation_models | Determines whether an individual or global model would be best for a given list of curves to check and generates individual models if the checks are passed.
iterative_impute | Imputes all numerical columns with sklearn's iterative imputer using a Bayesian Ridge as the estimator.
simple_impute | Imputes missing values in specified columns with sklearn's SimpleImputer using the mean strategy for numeric columns and the most_frequent strategy for categorical columns
- akerbp.mlpet.imputers.simple_impute(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame
Imputes missing values in specified columns with sklearn’s SimpleImputer using the mean strategy for numeric columns and the most_frequent strategy for categorical columns
- Parameters
df (pd.DataFrame) – dataframe with columns to impute
- Keyword Arguments
categorical_curves – List of column names that should be considered as categorical. If not provided, defaults to trying to determine these using the get_col_types utility function
depth_column – The name of the depth column to be excluded from imputation if desired. Defaults to None
- Returns
dataframe with imputed values and a dictionary containing the fitted imputers
- Return type
tuple
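The strategy described above can be sketched directly with scikit-learn. This is a minimal illustration and not the mlpet implementation: the function name simple_impute_sketch, the "numeric"/"categorical" keys of the returned dictionary, and the example columns (GR, and LITH as a numeric lithology code) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def simple_impute_sketch(df, categorical_curves, depth_column=None):
    """Hypothetical stand-in: mean for numeric, most_frequent for categorical."""
    out = df.copy()
    cat_cols = [c for c in categorical_curves if c in out.columns]
    # Numeric columns that are neither categorical nor the excluded depth column.
    num_cols = [
        c
        for c in out.select_dtypes(include="number").columns
        if c not in cat_cols and c != depth_column
    ]
    imputers = {}
    if num_cols:
        imputers["numeric"] = SimpleImputer(strategy="mean")
        out[num_cols] = imputers["numeric"].fit_transform(out[num_cols])
    if cat_cols:
        imputers["categorical"] = SimpleImputer(strategy="most_frequent")
        out[cat_cols] = imputers["categorical"].fit_transform(out[cat_cols])
    return out, imputers


example = pd.DataFrame(
    {
        "DEPTH": [100.0, 101.0, 102.0, 103.0],
        "GR": [50.0, np.nan, 70.0, 60.0],
        "LITH": [1.0, 1.0, np.nan, 2.0],  # lithology codes, treated as categorical
    }
)
imputed, fitted = simple_impute_sketch(
    example, categorical_curves=["LITH"], depth_column="DEPTH"
)
```

In this example the missing GR sample is filled with the mean of the observed GR values and the missing LITH sample with the most frequent code, while DEPTH is left out of the imputation.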
- akerbp.mlpet.imputers.iterative_impute(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame
Imputes all numerical columns with sklearn’s iterative imputer using a Bayesian Ridge as the estimator.
- Parameters
df (pd.DataFrame) – dataframe with columns to impute
- Keyword Arguments
imputer (_BaseImputer, optional) – This kwarg is NOT YET IMPLEMENTED. Defaults to None.
depth_column – The name of the depth column to be excluded from imputation if desired. Defaults to None
- Returns
dataframe with imputed values
- Return type
pd.DataFrame
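For orientation, the underlying scikit-learn calls can be sketched as follows. This is a minimal example under stated assumptions, not the mlpet code: the helper name iterative_impute_sketch, the fixed random_state and the synthetic GR/RHOB data are made up for illustration. Note that IterativeImputer is experimental in scikit-learn and must be enabled explicitly before it can be imported.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge


def iterative_impute_sketch(df, depth_column=None):
    """Hypothetical stand-in: impute numeric columns with a Bayesian Ridge."""
    out = df.copy()
    cols = [
        c for c in out.select_dtypes(include="number").columns if c != depth_column
    ]
    imputer = IterativeImputer(estimator=BayesianRidge(), random_state=0)
    out[cols] = imputer.fit_transform(out[cols])
    return out


# Synthetic, correlated curves with one gap each.
depth = np.arange(1000.0, 1010.0)
df = pd.DataFrame(
    {
        "DEPTH": depth,
        "GR": 50.0 + 2.0 * (depth - 1000.0),
        "RHOB": 2.0 + 0.01 * (depth - 1000.0),
    }
)
df.loc[3, "GR"] = np.nan
df.loc[6, "RHOB"] = np.nan

imputed = iterative_impute_sketch(df, depth_column="DEPTH")
```

Each column with missing values is modelled from the other columns, so observed samples are kept as they are and only the gaps are filled.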
- akerbp.mlpet.imputers.generate_imputation_models(df: pandas.core.frame.DataFrame, **kwargs) → Dict[str, Dict[str, Any]]
Generates 3rd order polynomial regression models with the DEPTH column as the target y variable and each curve in the provided curves keyword argument as the x variable (i.e. a model per curve).
- Parameters
df (pd.DataFrame) – dataframe containing the data used to fit the models
- Keyword Arguments
curves (list) – list of curve names to generate models for. If this argument is not provided, no models are generated because it defaults to an empty list.
depth_column – the name of the column that holds the depth values
- Returns
dictionary with models for each curve based on DEPTH
- Return type
dict
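A per-curve 3rd order polynomial fit can be sketched with a small scikit-learn pipeline. The sketch assumes the usual depth-trend setup, with depth as the predictor and the curve as the response; that role assignment, the StandardScaler step (added only to keep the cubic terms well conditioned), the helper name fit_depth_trend_models and the flat curve-to-model dictionary are assumptions of this example, not the structure returned by the library.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def fit_depth_trend_models(df, curves, depth_column="DEPTH"):
    """Hypothetical helper: one cubic depth-trend model per curve."""
    models = {}
    for curve in curves:
        # Fit only on rows where both depth and the curve are observed.
        known = df[[depth_column, curve]].dropna()
        model = make_pipeline(
            StandardScaler(),
            PolynomialFeatures(degree=3, include_bias=False),
            LinearRegression(),
        )
        model.fit(known[[depth_column]], known[curve])
        models[curve] = model
    return models


# Synthetic curve that follows an exact cubic trend in depth, with two gaps.
depth = np.linspace(1000.0, 1100.0, 21)
z = (depth - 1050.0) / 50.0
df = pd.DataFrame({"DEPTH": depth, "GR": 60.0 + 10.0 * z - 5.0 * z**3})
df.loc[[3, 10], "GR"] = np.nan

models = fit_depth_trend_models(df, curves=["GR"])
gr_at_1050 = models["GR"].predict(pd.DataFrame({"DEPTH": [1050.0]}))[0]
```

With an empty curves list the loop never runs and the returned dictionary is empty, which mirrors the documented default.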
- akerbp.mlpet.imputers.individual_imputation_models(df: pandas.core.frame.DataFrame, **kwargs) → Dict[str, Dict[str, Any]]
Determines whether an individual or global model would be best for a given list of curves to check and generates individual models if the checks are passed. We check the percentage of missing data and the spread of actual data with some thresholds to decide if we should use an individual model. If the spread of the data is greater than 0.7 and the percentage of missing data is less than 60%, an individual model is created. These thresholds can be changed via the kwargs.
- Parameters
df (pd.DataFrame) – dataframe with data
- Keyword Arguments
curves (list) – list of curves to create individual models for provided they pass the relevant thresholds
imputation_models (dict) – models given for each curve (usually global models). If not provided, defaults to an empty dict
data_spread_threshold (float) – The data spread threshold: an individual model is only created for a curve whose data spread is greater than this value (0.7 in the description above).
missing_data_threshold (float) – The missing data threshold: an individual model is only created for a curve whose fraction of missing data is less than this value (60% in the description above).
- Returns
updated imputation models dictionary (if provided via kwargs) with individual models replacing existing models (where applicable)
- Return type
dict
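The two checks can be illustrated with a small helper. How the library measures the "spread" of the data is not stated here, so this sketch assumes it is the fraction of the total depth interval spanned by the observed samples; that definition, the helper name passes_individual_model_checks and the example data are hypothetical.

```python
import numpy as np
import pandas as pd


def passes_individual_model_checks(
    df,
    curve,
    depth_column="DEPTH",
    data_spread_threshold=0.7,
    missing_data_threshold=0.6,
):
    """Hypothetical check; 'spread' is an assumed definition (see lead-in)."""
    missing_fraction = df[curve].isna().mean()
    known_depths = df.loc[df[curve].notna(), depth_column]
    total_range = df[depth_column].max() - df[depth_column].min()
    if known_depths.empty or total_range == 0:
        return False
    # Assumed spread: share of the depth interval covered by observed samples.
    spread = (known_depths.max() - known_depths.min()) / total_range
    return bool(
        spread > data_spread_threshold and missing_fraction < missing_data_threshold
    )


depth = np.arange(10.0)
gr = np.linspace(40.0, 85.0, 10)
gr[[4, 5]] = np.nan  # 20% missing, observed samples span the whole interval
rhob = np.full(10, np.nan)
rhob[:3] = [2.1, 2.2, 2.3]  # 70% missing, observed samples only at the top
df = pd.DataFrame({"DEPTH": depth, "GR": gr, "RHOB": rhob})

gr_ok = passes_individual_model_checks(df, "GR")
rhob_ok = passes_individual_model_checks(df, "RHOB")
```

Under these assumptions GR qualifies for an individual model, while RHOB fails both checks and would keep whatever model was supplied via imputation_models.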
- akerbp.mlpet.imputers.apply_depth_trend_imputation(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame
Apply imputation models to impute curves in given dataframe
- Parameters
df (pd.DataFrame) – dataframe in which to impute values
- Keyword Arguments
curves (list) – list of curves to apply the imputation to.
imputation_models (dict) – imputation models for each curve. If a model is not provided for each curve, a KeyError is raised
- Returns
dataframe with imputed values based on depth trend
- Return type
pd.DataFrame
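Applying fitted models amounts to predicting the curve at the depths where it is missing. The sketch below is an illustration only: the helper name apply_depth_trend_sketch and the depth_column argument are assumptions, and a plain LinearRegression stands in for the cubic models to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def apply_depth_trend_sketch(df, curves, imputation_models, depth_column="DEPTH"):
    """Hypothetical helper: fill gaps in each curve from its depth model."""
    out = df.copy()
    for curve in curves:
        model = imputation_models[curve]  # raises KeyError if no model exists
        missing = out[curve].isna()
        if missing.any():
            out.loc[missing, curve] = model.predict(out.loc[missing, [depth_column]])
    return out


df = pd.DataFrame(
    {"DEPTH": [0.0, 1.0, 2.0, 3.0], "GR": [10.0, np.nan, 30.0, 40.0]}
)
known = df.dropna()
# Stand-in model: the observed samples lie exactly on GR = 10 + 10 * DEPTH.
models = {"GR": LinearRegression().fit(known[["DEPTH"]], known["GR"])}

filled = apply_depth_trend_sketch(df, curves=["GR"], imputation_models=models)
```

Observed samples are left untouched; only rows where the curve is missing receive a predicted value.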
- akerbp.mlpet.imputers.impute_depth_trend(df: pandas.core.frame.DataFrame, **kwargs) → Union[pandas.core.frame.DataFrame, Tuple[pandas.core.frame.DataFrame, Dict[str, Any]]]
Imputation of curves based on polynomial regression models of the curve based on DEPTH
- Parameters
df (pd.DataFrame) – dataframe whose curves should be imputed
- Keyword Arguments
curves_to_impute (list) – list of curves to depth impute
imputation_models (dict) – dictionary with curves as keys and the sklearn model as value
save_imputation_models (bool) – whether to save the models in the folder_path
folder_path (str) – The path to the folder where the imputation models should be saved.
allow_individual_models (bool) – whether to allow individual models for curves that are seen to have enough data to do so
curves_mappings (dict) – A mapping dictionary to allow mapping curve names to more standardized names. Defaults to {} (i.e. no standardization).
- Returns
dataframe with curves imputed, and the imputation models that were used to impute the curves stored in a dict
- Return type
tuple(pd.DataFrame, dict)
- akerbp.mlpet.imputers.fillna_callibration_values(df: pandas.core.frame.DataFrame, curves: List[str], calib_values: Dict[str, pandas.core.frame.DataFrame], level: str, id_column: str, standardize_level_names: bool = True) → pandas.core.frame.DataFrame
Imputes missing values with calibration values from the closest wells. The calibration value is whatever statistic the user chose when computing calib_values (e.g. mode, mean or median). Calibration values can be obtained with the function utilities.get_calibration_values.
- Parameters
df (pd.DataFrame) – dataframe to impute
curves (List[str]) – curves to impute missing values
calib_values (Dict[str, pd.DataFrame]) – dictionary with keys being well id
level (str) – grouping chosen by the user for the values (e.g. group/formation)
id_column (str) – well id name in df
standardize_level_names (bool, optional) – whether to standardize formation or group names. Defaults to True.
- Returns
imputed dataframe
- Return type
pd.DataFrame
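The core fill step can be sketched as a per-well lookup. The layout of each calibration DataFrame is not documented here, so this example assumes it is indexed by level name with one column per curve; that layout, the helper name fillna_from_calibration and the example data are hypothetical, and the closest-well selection and level-name standardization are left out.

```python
import numpy as np
import pandas as pd


def fillna_from_calibration(df, curves, calib_values, level, id_column):
    """Hypothetical helper: fill gaps with per-level calibration values."""
    out = df.copy()
    for well, calib in calib_values.items():
        in_well = out[id_column] == well
        for curve in curves:
            mask = in_well & out[curve].isna()
            # Look up each missing row's level in the well's calibration table.
            out.loc[mask, curve] = out.loc[mask, level].map(calib[curve])
    return out


df = pd.DataFrame(
    {
        "well_name": ["W1", "W1", "W1"],
        "FORMATION": ["A", "A", "B"],
        "GR": [50.0, np.nan, np.nan],
    }
)
calib_values = {"W1": pd.DataFrame({"GR": [55.0, 80.0]}, index=["A", "B"])}

filled = fillna_from_calibration(
    df, curves=["GR"], calib_values=calib_values, level="FORMATION", id_column="well_name"
)
```

Rows whose level has no entry in the calibration table stay missing in this sketch, and observed samples are never overwritten.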