imputers module

akerbp.mlpet.imputers.apply_depth_trend_imputation(df, ...)

Apply imputation models to impute curves in given dataframe

akerbp.mlpet.imputers.enable_iterative_imputer

Enables IterativeImputer

akerbp.mlpet.imputers.generate_imputation_models(df, ...)

Generates 3rd order polynomial regression models with the DEPTH column as the target y variable and each curve in the provided curves keyword argument as the x variable (i.e. a model per curve).

akerbp.mlpet.imputers.impute_depth_trend(df, ...)

Imputation of curves based on polynomial regression models of the curve based on DEPTH

akerbp.mlpet.imputers.individual_imputation_models(df, ...)

Determines whether an individual or global model would be best for a given list of curves to check and generates individual models if the checks are passed.

akerbp.mlpet.imputers.iterative_impute(df, ...)

Imputes all numerical columns with sklearn's iterative imputer using a Bayesian Ridge as the estimator.

akerbp.mlpet.imputers.simple_impute(df, **kwargs)

Imputes missing values in specified columns with sklearn's SimpleImputer using the mean strategy for numeric columns and the most_frequent strategy for categorical columns

akerbp.mlpet.imputers.simple_impute(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Imputes missing values in specified columns with sklearn’s SimpleImputer using the mean strategy for numeric columns and the most_frequent strategy for categorical columns

Parameters

df (pd.DataFrame) – dataframe with columns to impute

Keyword Arguments
  • categorical_curves – List of column names that should be considered as categorical. If not provided, defaults to trying to determine these using the get_col_types utility function

  • depth_column – The name of the depth column to be excluded from imputation if desired. Defaults to None

Returns

dataframe with imputed values and a dictionary containing the fitted imputers

Return type

tuple
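
The strategy described above can be sketched with scikit-learn directly. This is a minimal illustration under stated assumptions, not the function's actual implementation: the column names (GR, LITHOLOGY), the explicit split into numeric and categorical columns, and the keys of the imputers dictionary are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "DEPTH": [1000.0, 1001.0, 1002.0, 1003.0],
        "GR": [50.0, np.nan, 70.0, 60.0],
        "LITHOLOGY": ["sand", "shale", np.nan, "shale"],
    }
)

numeric_cols = ["GR"]             # DEPTH is deliberately left out of imputation
categorical_cols = ["LITHOLOGY"]  # assumed to be known up front in this sketch

# Mean for numeric columns, most frequent value for categorical columns.
num_imputer = SimpleImputer(strategy="mean")
cat_imputer = SimpleImputer(strategy="most_frequent")

imputed = df.copy()
imputed[numeric_cols] = num_imputer.fit_transform(df[numeric_cols])
imputed[categorical_cols] = cat_imputer.fit_transform(
    df[categorical_cols].astype(object)
)

# Hypothetical layout for the returned dictionary of fitted imputers.
imputers = {"numeric": num_imputer, "categorical": cat_imputer}
```

Going by the documented keyword arguments, the corresponding library call would pass categorical_curves and depth_column to simple_impute and unpack the returned tuple into the dataframe and the imputers dictionary.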

akerbp.mlpet.imputers.iterative_impute(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Imputes all numerical columns with sklearn’s iterative imputer using a Bayesian Ridge as the estimator.

Parameters

df (pd.DataFrame) – dataframe with columns to impute

Keyword Arguments
  • imputer (_BaseImputer, optional) – This kwarg is NOT YET IMPLEMENTED. Defaults to None.

  • depth_column – The name of the depth column to be excluded from imputation if desired. Defaults to None

Returns

dataframe with imputed values

Return type

pd.DataFrame
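
As a rough sketch of the approach described above, the snippet below uses scikit-learn's IterativeImputer with a BayesianRidge estimator on the numerical columns, leaving the depth column out. It calls scikit-learn directly rather than iterative_impute, and the column names, the toy values and the random_state setting are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental in scikit-learn and must be enabled first.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# Toy data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "DEPTH": [1000.0, 1001.0, 1002.0, 1003.0, 1004.0, 1005.0],
        "GR": [50.0, 55.0, np.nan, 65.0, 70.0, 75.0],
        "RHOB": [2.30, np.nan, 2.40, 2.45, 2.50, 2.55],
    }
)

# Exclude the depth column, mirroring the depth_column keyword argument.
curves = [col for col in df.columns if col != "DEPTH"]

imputer = IterativeImputer(estimator=BayesianRidge(), random_state=0)

imputed = df.copy()
imputed[curves] = imputer.fit_transform(df[curves])
```

Each column with missing values is modelled from the other columns in turn, so observed values are left unchanged and only the gaps are filled.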

akerbp.mlpet.imputers.generate_imputation_models(df: pandas.core.frame.DataFrame, **kwargs) → Dict[str, Dict[str, Any]]

Generates 3rd order polynomial regression models with the DEPTH column as the target y variable and each curve in the provided curves keyword argument as the x variable (i.e. a model per curve).

Parameters

df (pd.DataFrame) – dataframe to get the data from

Keyword Arguments
  • curves (list) – list of curve names to generate models for. If this argument is not provided, no models are generated because it defaults to an empty list.

  • depth_column – the curve that indicates the depth

Returns

dictionary with models for each curve based on DEPTH

Return type

dict

akerbp.mlpet.imputers.individual_imputation_models(df: pandas.core.frame.DataFrame, **kwargs) → Dict[str, Dict[str, Any]]

Determines whether an individual or global model would be best for a given list of curves to check and generates individual models if the checks are passed. We check the percentage of missing data and the spread of actual data with some thresholds to decide if we should use an individual model. If the spread of the data is greater than 0.7 and the percentage of missing data is less than 60%, an individual model is created. These thresholds can be changed via the kwargs.

Parameters

df (pd.DataFrame) – dataframe with data

Keyword Arguments
  • curves (list) – list of curves to create individual models for provided they pass the relevant thresholds

  • imputation_models (dict) – models given for each curve (usually global models). If not provided, defaults to an empty dict

  • data_spread_threshold (float) – The data spread threshold that determines whether or not an individual model for the curve should be created.

  • missing_data_threshold (float) – The missing data threshold that determines whether or not an individual model for the curve should be created.

Returns

updated imputation models dictionary (if provided via kwargs) with individual models replacing existing models (where applicable)

Return type

dict
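
The threshold check described above can be written as a small predicate. This is a sketch only: the helper names are hypothetical, the data spread is taken as an input because this page does not define how it is computed, and the default values simply restate the 0.7 and 60% figures from the description.

```python
def missing_fraction(values):
    """Fraction of entries that are missing (represented here as None)."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def should_use_individual_model(
    data_spread,
    missing,
    data_spread_threshold=0.7,
    missing_data_threshold=0.6,
):
    """Return True when both documented conditions for an individual model hold."""
    return data_spread > data_spread_threshold and missing < missing_data_threshold


# Hypothetical curve with one missing sample out of four.
curve_values = [2.31, None, 2.40, 2.45]
frac = missing_fraction(curve_values)

# The spread of 0.8 is an assumed, precomputed number.
decision = should_use_individual_model(data_spread=0.8, missing=frac)
```

Curves that fail either condition would keep whatever model is already present in imputation_models.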

akerbp.mlpet.imputers.apply_depth_trend_imputation(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Apply imputation models to impute curves in given dataframe

Parameters

df (pd.DataFrame) – dataframe in which to impute values

Keyword Arguments
  • curves (list) – list of curves to apply the imputation to.

  • imputation_models (dict) – imputation models for each curve. If a model is not provided for each curve, a KeyError is raised

Returns

dataframe with imputed values based on depth trend

Return type

pd.DataFrame

akerbp.mlpet.imputers.impute_depth_trend(df: pandas.core.frame.DataFrame, **kwargs) → Union[pandas.core.frame.DataFrame, Tuple[pandas.core.frame.DataFrame, Dict[str, Any]]]

Imputation of curves based on polynomial regression models of the curve based on DEPTH

Parameters

df (pd.DataFrame) – dataframe in which to impute curves

Keyword Arguments
  • curves_to_impute (list) – list of curves to depth impute

  • imputation_models (dict) – dictionary with curves as keys and the sklearn model as value

  • save_imputation_models (bool) – whether to save the models in the folder_path

  • folder_path (str) – The path to the folder where the imputation models should be saved.

  • allow_individual_models (bool) – whether to allow individual models when a curve has enough data to do so

  • curves_mappings (dict) – A mapping dictionary to allow mapping curve names to more standardized names. Defaults to {} (ie. no standardization).

Returns

dataframe with curves imputed, and the imputation models that were used to impute the curves stored in a dict

Return type

tuple(pd.DataFrame, dict)
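
To make the idea of a depth trend concrete, the sketch below fits a 3rd order polynomial of a curve against depth and uses it to fill the gaps. It is not the library's implementation: it swaps in NumPy's Polynomial.fit for the sklearn models mentioned above, and the helper name fit_depth_trend, the DEN curve and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd
from numpy.polynomial import Polynomial

# Synthetic curve that follows an exact cubic trend with depth.
df = pd.DataFrame({"DEPTH": np.linspace(1000.0, 1900.0, 10)})
z = (df["DEPTH"] - 1000.0) / 1000.0
true_den = 2.0 + 0.5 * z - 0.2 * z**2 + 0.1 * z**3
df["DEN"] = true_den.copy()
df.loc[[3, 7], "DEN"] = np.nan  # knock out two samples to impute


def fit_depth_trend(frame, curve, depth_column="DEPTH", degree=3):
    """Fit a polynomial of `curve` as a function of depth on the known samples."""
    known = frame[[depth_column, curve]].dropna()
    return Polynomial.fit(
        known[depth_column].to_numpy(), known[curve].to_numpy(), deg=degree
    )


# One model per curve, keyed by curve name.
models = {"DEN": fit_depth_trend(df, "DEN")}

imputed = df.copy()
for curve, model in models.items():
    missing = imputed[curve].isna()
    imputed.loc[missing, curve] = model(imputed.loc[missing, "DEPTH"].to_numpy())
```

Polynomial.fit rescales the depth values internally, which keeps the cubic fit well conditioned even when depths are in the thousands.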

akerbp.mlpet.imputers.fillna_callibration_values(df: pandas.core.frame.DataFrame, curves: List[str], calib_values: Dict[str, pandas.core.frame.DataFrame], level: str, id_column: str, standardize_level_names: bool = True) → pandas.core.frame.DataFrame

Imputes missing values with values from the closest wells. The fill value is whatever statistic the user has chosen (e.g. mode, mean or median), as stored in the given calib_values. Calibration values can be obtained with the function utilities.get_calibration_values.

Parameters
  • df (pd.DataFrame) – dataframe to impute

  • curves (List[str]) – curves to impute missing values

  • calib_values (Dict[str, pd.DataFrame]) – dictionary with keys being well id

  • level (str) – grouping chosen by the user for the values (e.g. group/formation)

  • id_column (str) – well id name in df

  • standardize_level_names (bool, optional) – whether to standardize formation or group names. Defaults to True.

Returns

imputed dataframe

Return type

pd.DataFrame
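
The lookup-style fill described above might look roughly like the following. This is a sketch under strong assumptions, not the function's actual behaviour: it assumes each calibration table is indexed by the level name and has one column per curve, it skips the closest-well search and the level name standardization, and the helper name fill_from_calibration and all data are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data; the column names and values are illustrative only.
df = pd.DataFrame(
    {
        "well_name": ["W1", "W1", "W2"],
        "GROUP": ["A", "B", "A"],
        "GR": [np.nan, 80.0, np.nan],
    }
)

# Assumed layout: one table per well id, indexed by level, one column per curve.
calib_values = {
    "W1": pd.DataFrame({"GR": [55.0, 85.0]}, index=["A", "B"]),
    "W2": pd.DataFrame({"GR": [60.0]}, index=["A"]),
}


def fill_from_calibration(frame, curves, calib, level, id_column):
    """Fill missing curve values from a per-well, per-level lookup table."""
    out = frame.copy()
    for well_id, table in calib.items():
        in_well = out[id_column] == well_id
        for curve in curves:
            # Map each row's level (e.g. group name) to its calibration value.
            lookup = out.loc[in_well, level].map(table[curve])
            out.loc[in_well, curve] = out.loc[in_well, curve].fillna(lookup)
    return out


filled = fill_from_calibration(
    df, curves=["GR"], calib=calib_values, level="GROUP", id_column="well_name"
)
```

Only missing entries are touched, so the observed GR value in the second row is preserved.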