feature_engineering module

akerbp.mlpet.feature_engineering.add_formations_and_groups(df, ...)

Adds FORMATION and GROUP columns to the dataframe based on the well formation tops metadata and the depth column.

akerbp.mlpet.feature_engineering.add_gradient_features(df, ...)

Creates columns with gradient of curves.

akerbp.mlpet.feature_engineering.add_log_features(df, ...)

Creates columns with log10 of curves.

akerbp.mlpet.feature_engineering.add_petrophysical_features(df, ...)

Creates petrophysical features according to relevant heuristics/formulas.

akerbp.mlpet.feature_engineering.add_rolling_features(df, ...)

Creates columns with window/rolling features of curves.

akerbp.mlpet.feature_engineering.add_sequential_features(df, ...)

Adds n past values of columns (for sequential modelling).

akerbp.mlpet.feature_engineering.calculate_AI(df)

Calculates AI from DEN and AC according to the following formula.

akerbp.mlpet.feature_engineering.calculate_CALI_BS(df)

Calculates CALI-BS assuming at least CALI is provided in the dataframe argument.

akerbp.mlpet.feature_engineering.calculate_FI(df)

Calculates FI from LFI according to the following formula.

akerbp.mlpet.feature_engineering.calculate_LFI(df)

Calculates LFI from NEU and DEN according to the following formula.

akerbp.mlpet.feature_engineering.calculate_LI(df)

Calculates LI from LFI according to the following formula.

akerbp.mlpet.feature_engineering.calculate_PR(df)

Calculates PR from VP and VS or ACS and AC (if VP and VS are not found) according to the following formula.

akerbp.mlpet.feature_engineering.calculate_RAVG(df)

Calculates RAVG from RDEP, RMED, RSHA according to the following formula.

akerbp.mlpet.feature_engineering.calculate_VPVS(df)

Calculates VPVS from ACS and AC according to the following formula.

akerbp.mlpet.feature_engineering.calculate_VSH(df, ...)

Calculates the VSH curve based off the GR curve and the type of formation defined in the GROUP column, as follows.

akerbp.mlpet.feature_engineering.guess_BS_from_CALI(df)

Guess bitsize from CALI, given the standard bitsizes

akerbp.mlpet.feature_engineering.add_log_features(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Creates columns with log10 of curves. All created columns are suffixed with ‘_log’. All negative values are set to zero and 1 is added to all values before the logarithm is taken. In other words, this function is a base-10 analogue of numpy’s log1p (which uses the natural logarithm).

Parameters

df (pd.DataFrame) – dataframe with columns to calculate log10 from

Keyword Arguments

log_features (list, optional) – list of column names for the columns that should be log-transformed. Defaults to None.

Returns

New dataframe with calculated log columns

Return type

pd.DataFrame
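
The arithmetic described above can be sketched in plain Python as follows. This is only an illustration on a list of values, not the library's implementation (which works on dataframe columns); the helper name `log10_plus_one` is hypothetical.

```python
import math


def log10_plus_one(values):
    """Clip negatives to zero, add 1, then take log10 (hypothetical helper)."""
    return [math.log10(max(v, 0.0) + 1.0) for v in values]


curve = [-5.0, 0.0, 9.0, 99.0]
curve_log = log10_plus_one(curve)  # negative and zero inputs both map to 0.0
```

Because of the `+ 1` shift, a raw value of 0 maps to 0 and the result is never negative.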

akerbp.mlpet.feature_engineering.add_gradient_features(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Creates columns with gradient of curves. All created columns are suffixed with ‘_gradient’.

Parameters

df (pd.DataFrame) – dataframe with columns to calculate gradient from

Keyword Arguments

gradient_features (list, optional) – list of column names for the columns that gradient features should be calculated for. Defaults to None.

Returns

New dataframe with calculated gradient feature columns

Return type

pd.DataFrame

akerbp.mlpet.feature_engineering.add_rolling_features(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Creates columns with window/rolling features of curves. All created columns are suffixed with ‘_window_mean’ / ‘_window_max’ / ‘_window_min’.

Parameters

df (pd.DataFrame) – dataframe with columns to calculate rolling features from

Keyword Arguments
  • rolling_features (list) – columns to apply rolling features to. Defaults to None.

  • depth_column (str) – The name of the column to use to determine the sampling rate. Without this kwarg no rolling features are calculated.

  • window (float) – The window size to use for calculating the rolling features. The window size is defined in distance! The sampling rate is determined from the depth_column kwarg and used to transform the window size into an index based window. If this is not provided, no rolling features are calculated.

Returns

New dataframe with calculated rolling feature columns

Return type

pd.DataFrame
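
The conversion from a distance-based window to an index-based window can be sketched as below. This is a plain-Python illustration under stated assumptions, not the library's code: the sampling rate is taken as the median spacing of the depth values, the window is trailing, and partial windows at the start are allowed. The function name `rolling_features` is hypothetical.

```python
import statistics


def rolling_features(depth, values, window):
    """Trailing rolling mean/max/min using a window given in depth units."""
    # Assumption: sampling rate = median spacing between consecutive depths.
    step = statistics.median(b - a for a, b in zip(depth, depth[1:]))
    n = max(1, round(window / step))  # window size expressed in samples
    out = {"mean": [], "max": [], "min": []}
    for i in range(len(values)):
        chunk = values[max(0, i - n + 1): i + 1]  # trailing window (assumption)
        out["mean"].append(sum(chunk) / len(chunk))
        out["max"].append(max(chunk))
        out["min"].append(min(chunk))
    return n, out


depth = [100.0, 100.5, 101.0, 101.5, 102.0]
gr = [10.0, 20.0, 30.0, 40.0, 50.0]
n_samples, feats = rolling_features(depth, gr, window=1.0)  # 1.0 / 0.5 -> 2 samples
```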

akerbp.mlpet.feature_engineering.add_sequential_features(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Adds n past values of columns (for sequential modelling). All created columns are suffixed with ‘_1’ / ‘_2’ / … / ‘_n’.

Parameters

df (pd.DataFrame) – dataframe to add sequential features to

Keyword Arguments
  • sequential_features (list, optional) – columns to apply shifting to. Defaults to None.

  • shift_size (int, optional) – Size of the shifts to calculate. In other words, number of past values to include. If this is not provided, no sequential features are calculated.

Returns

New dataframe with sequential feature columns

Return type

pd.DataFrame
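
The shifting described above can be sketched in plain Python. This is a hedged illustration on a dictionary of lists rather than a dataframe; the helper `add_shifted` is hypothetical, and filling the first rows with `None` is an assumption about how missing past values are represented.

```python
def add_shifted(columns, name, shift_size):
    """Add name_1 ... name_n holding the 1..n previous values of a column."""
    values = columns[name]
    for i in range(1, shift_size + 1):
        # The first i rows have no past value; None marks the gap (assumption).
        columns[f"{name}_{i}"] = ([None] * i + values)[: len(values)]
    return columns


table = add_shifted({"GR": [1.0, 2.0, 3.0, 4.0]}, "GR", shift_size=2)
```

With `shift_size=2`, each row gains the values from one and two samples earlier as `GR_1` and `GR_2`.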

akerbp.mlpet.feature_engineering.add_petrophysical_features(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Creates petrophysical features according to relevant heuristics/formulas.

The features created are as follows (each one can be toggled on/off via the ‘petrophysical_features’ kwarg):

- VPVS = ACS / AC
- PR = (VP ** 2 - 2 * VS ** 2) / (2 * (VP ** 2 - VS ** 2)) where
    - VP = 304.8 / AC
    - VS = 304.8 / ACS
- RAVG = AVG(RDEP, RMED, RSHA), if at least two of those are present
- LFI = 2.95 - ((NEU + 0.15) / 0.6) - DEN, and
    - LFI < -0.9 = 0
    - NaNs are filled with 0
- FI = (ABS(LFI) + LFI) / 2
- LI = ABS(ABS(LFI) - LFI) / 2
- AI = DEN * ((304.8 / AC) ** 2)
- CALI-BS = CALI - BS, where
    - BS is calculated using the guess_BS_from_CALI function from this
      module if it is not found in the passed dataframe
- VSH = Refer to the calculate_VSH docstring for more info on this

Parameters

df (pd.DataFrame) – dataframe to read the input curves from and add the features to

Keyword Arguments

petrophysical_features (list) – A list of all the petrophysical features that should be created (see above for all the potential features this method can create). This defaults to an empty list (i.e. no features created).

Returns

dataframe with added features

Return type

pd.DataFrame

akerbp.mlpet.feature_engineering.add_well_metadata(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Adds well metadata columns to the provided dataframe from the provided well metadata dictionary (kwarg)

Warning

This method will not work without the three kwargs listed below! It will return the df untouched and print a warning if kwargs are missing.

Parameters

df (pd.DataFrame) – The dataframe in which the well metadata columns will be added

Keyword Arguments
  • metadata_dict (dict) – The dictionary containing the relevant metadata per well (usually generated with the get_well_metadata function from akerbp.mlpet.utilities).

  • metadata_columns (list) – List of metadata columns to add (each entry must correspond to a metadata key in the provided metadata_dict kwarg)

  • id_column (str) – The name of the column containing the well names (to be matched with the keys in the provided metadata_dict)

Returns

Return the passed dataframe with the requested columns added

Return type

pd.DataFrame
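
How the three kwargs interact can be sketched in plain Python. This is an illustration on a list of row dictionaries, not the library's dataframe implementation; the helper `add_metadata`, the metadata keys `LAT` and `LON`, and their values are made up, and leaving `None` for wells missing from the dictionary is an assumption.

```python
def add_metadata(rows, metadata_dict, metadata_columns, id_column):
    """Copy the requested metadata keys onto each row, matched on the well ID."""
    for row in rows:
        well_meta = metadata_dict.get(row[id_column], {})
        for col in metadata_columns:
            # Wells or keys missing from the dictionary yield None (assumption).
            row[col] = well_meta.get(col)
    return rows


rows = [{"well_name": "WELL_A", "GR": 55.0}, {"well_name": "WELL_B", "GR": 80.0}]
metadata = {"WELL_A": {"LAT": 60.1, "LON": 2.3}}  # made-up values
rows = add_metadata(rows, metadata, ["LAT"], "well_name")
```

Only the keys listed in `metadata_columns` are added, so `LON` is not copied here.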

akerbp.mlpet.feature_engineering.add_formations_and_groups(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Adds FORMATION and GROUP columns to the dataframe based on the well formation tops metadata and the depth column.

Note

This function requires several kwargs to be able to run. If they are not provided, a warning is raised and the df is returned untouched.

Note

If the well is not found in formation_tops_mapping, the code will print a warning and continue to the next well.

Example

An example mapper dictionary that would classify all depths in WELL_A between 120 & 879 as NORDLAND GP and all depths between 879 and 2014 as HORDALAND GP, would look like this:

formation_tops_mapper = {
    "WELL_A": {
        "labels": ["NORDLAND GP", "HORDALAND GP"],
        "levels": [120.0, 879.0, 2014.0]
    }
    ...
}

It can be generated by using the get_formation_tops function from akerbp.mlpet.utilities.
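
The way such a mapper classifies depths can be sketched with the standard `bisect` module. This is a hedged illustration rather than the library's implementation: the helper `label_depths` is hypothetical, and treating each interval as top-inclusive and base-exclusive, with `None` outside the mapped range, is an assumption.

```python
from bisect import bisect_right


def label_depths(depths, labels, levels):
    """Return the label whose [top, base) interval contains each depth."""
    out = []
    for d in depths:
        i = bisect_right(levels, d) - 1  # index of the interval's top
        # Depths outside the first/last level get None (assumption).
        out.append(labels[i] if 0 <= i < len(labels) else None)
    return out


mapper = {"WELL_A": {"labels": ["NORDLAND GP", "HORDALAND GP"],
                     "levels": [120.0, 879.0, 2014.0]}}
well = mapper["WELL_A"]
groups = label_depths([50.0, 120.0, 500.0, 879.0, 2500.0],
                      well["labels"], well["levels"])
```

Note that n labels need n + 1 levels, since each label is bounded by a top and a base.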

Parameters

df (pd.DataFrame) – The dataframe in which the formation tops label column should be added

Keyword Arguments
  • id_column (str) – The name of the column of well IDs

  • depth_column (str) – The name of the depth column to use for applying the mappings.

  • formation_tops_mapper (dict) –

    A dictionary mapping the well IDs to the formation tops labels, chronostrat and depth levels. For example:

    formation_tops_mapper = {
        "31/6-6": {
            "group_labels": ['Nordland Group', 'Hordaland Group', ...],
            "group_labels_chronostrat": ['Cenozoic', 'Paleogene', ...],
            "group_levels": [336.0, 531.0, 650.0, ...],
            "formation_labels": ['Balder Formation', 'Sele Formation', ...],
            "formation_labels_chronostrat": ['Eocene', 'Paleocene', ...],
            "formation_levels": [650.0, 798.0, 949.0, ...]
        }
        ...
    }
    

    The above example would classify all depths in well 31/6-6 between 336 and 531 as belonging to the Nordland Group, whose corresponding chronostrat is Cenozoic. Depths between 650 and 798 are classified as belonging to the Balder Formation, whose corresponding chronostrat is Eocene.

  • client (CogniteClient) – client to query CDF for formation tops if a mapping dictionary is not provided. Defaults to None.

Returns

dataframe with additional columns for FORMATION and GROUP

Return type

pd.DataFrame

akerbp.mlpet.feature_engineering.guess_BS_from_CALI(df: pandas.core.frame.DataFrame, standard_BS_values: Optional[List[float]] = None) → pandas.core.frame.DataFrame

Guess bitsize from CALI, given the standard bitsizes

Parameters

df (pd.DataFrame) – dataframe to preprocess

Keyword Arguments

standard_BS_values (ndarray) –

Numpy array of standardized bitsizes to consider. Defaults to:

np.array([6, 8.5, 9.875, 12.25, 17.5, 26])

Returns

preprocessed dataframe

Return type

pd.DataFrame
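
One simple way to guess a bit size from a caliper reading is to pick the nearest standard value, as sketched below. The nearest-value rule is an assumption made for illustration; the library's actual rule is not described here and may differ. The helper `nearest_bit_size` is hypothetical.

```python
STANDARD_BS = [6, 8.5, 9.875, 12.25, 17.5, 26]  # default values listed above


def nearest_bit_size(cali, standard=STANDARD_BS):
    """Return the standard bit size closest to a caliper reading (assumed rule)."""
    return min(standard, key=lambda bs: abs(bs - cali))


guessed = [nearest_bit_size(c) for c in [8.7, 12.0, 18.1]]
```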

akerbp.mlpet.feature_engineering.calculate_CALI_BS(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Calculates CALI-BS assuming at least CALI is provided in the dataframe argument. If BS is not provided, it is estimated using the guess_BS_from_CALI method from this module.

Parameters

df (pd.DataFrame) – The dataframe to which CALI-BS should be added.

Raises

ValueError – Raises an error if CALI is not provided

Returns

Returns the dataframe with CALI-BS as a new column

Return type

pd.DataFrame

akerbp.mlpet.feature_engineering.calculate_AI(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Calculates AI from DEN and AC according to the following formula:

AI = DEN * ((304.8 / AC) ** 2)

Parameters

df (pd.DataFrame) – The dataframe to which AI should be added.

Raises

ValueError – Raises an error if neither DEN nor AC are provided

Returns

Returns the dataframe with AI as a new column

Return type

pd.DataFrame
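
As a quick numeric check of the AI formula above, here is a minimal sketch on scalar values. The function name `acoustic_index` and the input values are made up for illustration; the library applies the formula to dataframe columns.

```python
def acoustic_index(den, ac):
    """AI = DEN * ((304.8 / AC) ** 2), as documented above."""
    return den * (304.8 / ac) ** 2


ai = acoustic_index(den=2.5, ac=152.4)  # 304.8 / 152.4 = 2, so AI = 2.5 * 4
```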

akerbp.mlpet.feature_engineering.calculate_LI(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Calculates LI from LFI according to the following formula:

LI = ABS(ABS(LFI) - LFI) / 2

If LFI is not in the provided dataframe, it is calculated using the calculate_LFI method of this module.

Parameters

df (pd.DataFrame) – The dataframe to which LI should be added.

Raises

ValueError – Raises an error if LFI is not provided and cannot be calculated because NEU or DEN is missing

Returns

Returns the dataframe with LI as a new column

Return type

pd.DataFrame

akerbp.mlpet.feature_engineering.calculate_FI(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Calculates FI from LFI according to the following formula:

FI = (ABS(LFI) + LFI) / 2

If LFI is not in the provided dataframe, it is calculated using the calculate_LFI method of this module.

Parameters

df (pd.DataFrame) – The dataframe to which FI should be added.

Raises

ValueError – Raises an error if LFI is not provided and cannot be calculated because NEU or DEN is missing

Returns

Returns the dataframe with FI as a new column

Return type

pd.DataFrame
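
The FI and LI formulas split LFI into its positive and negative parts: FI equals max(LFI, 0) and LI equals max(-LFI, 0). The sketch below checks this on a few scalar values; the helper `fi_li` and the inputs are hypothetical.

```python
def fi_li(lfi):
    """Return (FI, LI) for a single LFI value using the documented formulas."""
    fi = (abs(lfi) + lfi) / 2       # equals max(lfi, 0)
    li = abs(abs(lfi) - lfi) / 2    # equals max(-lfi, 0)
    return fi, li


pairs = [fi_li(x) for x in [0.4, -0.3, 0.0]]
```

For any input, at most one of the two values is non-zero.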

akerbp.mlpet.feature_engineering.calculate_LFI(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Calculates LFI from NEU and DEN according to the following formula:

LFI = 2.95 - ((NEU + 0.15) / 0.6) - DEN

where:

  • LFI < -0.9 = 0

  • NaNs are filled with 0

Parameters

df (pd.DataFrame) – The dataframe to which LFI should be added.

Raises

ValueError – Raises an error if neither NEU nor DEN are provided

Returns

Returns the dataframe with LFI as a new column

Return type

pd.DataFrame
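
The LFI formula and its two special cases can be sketched for scalar inputs as follows. This is an illustration only: the function name `lfi` is hypothetical, and treating a NaN in either input as a NaN result (and therefore 0) is an assumption about where the NaNs originate.

```python
import math


def lfi(neu, den):
    """LFI = 2.95 - ((NEU + 0.15) / 0.6) - DEN with the documented clean-up."""
    if math.isnan(neu) or math.isnan(den):
        return 0.0  # NaNs are filled with 0
    value = 2.95 - ((neu + 0.15) / 0.6) - den
    return 0.0 if value < -0.9 else value  # values below -0.9 are set to 0


ordinary = lfi(0.15, 2.65)          # 2.95 - 0.5 - 2.65, roughly -0.2
clipped = lfi(0.45, 3.0)            # raw value is about -1.05, so it becomes 0
missing = lfi(float("nan"), 2.5)    # NaN input becomes 0
```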

akerbp.mlpet.feature_engineering.calculate_RAVG(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Calculates RAVG from RDEP, RMED, RSHA according to the following formula:

RAVG = AVG(RDEP, RMED, RSHA), if at least two of those are present

Parameters

df (pd.DataFrame) – The dataframe to which RAVG should be added.

Raises

ValueError – Raises an error if one or fewer resistivity curves are found in the provided dataframe

Returns

Returns the dataframe with RAVG as a new column

Return type

pd.DataFrame
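
The "at least two curves" rule can be sketched for a single row as follows. This is a plain-Python illustration in which dictionary keys stand in for dataframe columns; the helper `ravg` and the values are hypothetical.

```python
def ravg(row):
    """Average whichever of RDEP, RMED and RSHA are present (at least two)."""
    curves = [row[k] for k in ("RDEP", "RMED", "RSHA") if k in row]
    if len(curves) < 2:
        raise ValueError("At least two of RDEP, RMED and RSHA are required")
    return sum(curves) / len(curves)


two_curves = ravg({"RDEP": 2.0, "RMED": 4.0})
three_curves = ravg({"RDEP": 2.0, "RMED": 4.0, "RSHA": 6.0})
```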

akerbp.mlpet.feature_engineering.calculate_VPVS(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Calculates VPVS from ACS and AC according to the following formula:

VPVS = ACS / AC

Parameters

df (pd.DataFrame) – The dataframe to which VPVS should be added.

Raises

ValueError – Raises an error if neither ACS nor AC are found in the provided dataframe

Returns

Returns the dataframe with VPVS as a new column

Return type

pd.DataFrame

akerbp.mlpet.feature_engineering.calculate_PR(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Calculates PR from VP and VS or ACS and AC (if VP and VS are not found) according to the following formula:

PR = (VP ** 2 - 2 * VS ** 2) / (2 * (VP ** 2 - VS ** 2))

where:

  • VP = 304.8 / AC

  • VS = 304.8 / ACS

Parameters

df (pd.DataFrame) – The dataframe to which PR should be added.

Raises

ValueError – Raises an error if none of AC, ACS, VP or VS are found in the provided dataframe

Returns

Returns the dataframe with PR as a new column

Return type

pd.DataFrame
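
A worked example of the PR formula, starting from AC and ACS, is sketched below. The function name `poisson_ratio` and the input values are made up for illustration. With ACS = 2 * AC the velocity ratio VP / VS is 2, which gives PR = (4 - 2) / (2 * (4 - 1)) = 1/3.

```python
def poisson_ratio(ac, acs):
    """PR from AC and ACS via VP = 304.8 / AC and VS = 304.8 / ACS."""
    vp = 304.8 / ac
    vs = 304.8 / acs
    return (vp ** 2 - 2 * vs ** 2) / (2 * (vp ** 2 - vs ** 2))


pr = poisson_ratio(ac=100.0, acs=200.0)
vpvs = 200.0 / 100.0  # VPVS = ACS / AC
```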

akerbp.mlpet.feature_engineering.calculate_VP(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Calculates VP (if AC is found) according to the following formula:

VP = 304.8 / AC

Parameters

df (pd.DataFrame) – The dataframe to which VP should be added.

Raises

ValueError – Raises an error if AC is not found in the provided dataframe

Returns

Returns the dataframe with VP as a new column

Return type

pd.DataFrame

akerbp.mlpet.feature_engineering.calculate_VS(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Calculates VS (if ACS is found) according to the following formula:

VS = 304.8 / ACS

Parameters

df (pd.DataFrame) – The dataframe to which VS should be added.

Raises

ValueError – Raises an error if ACS is not found in the provided dataframe

Returns

Returns the dataframe with VS as a new column

Return type

pd.DataFrame

akerbp.mlpet.feature_engineering.calculate_VSH(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Calculates the VSH curve based off the GR curve and the type of formation defined in the GROUP column, as follows:

VSH = (GR - GR_ss) / (GR_sh_Gp_f - GR_ss)

where:

  • GR_ss = The 5th quantile (quant_ss - value can be changed via the kwargs) of each defined system (some systems are grouped if relevant)

  • GR_sh_Gp_f = Shale formation groups are grouped by GROUP and a rolling window calculation is applied to each group (window size is determined by the ‘window’ kwarg and quantile is determined by the quant_sh kwarg - these default to 2500 and 0.95 respectively). A savgol filter of window length min(501, number_of_non_nans // 2) and polynomial order 3 is then applied to the rolling quantile group. Note that the filter is ONLY applied if there is enough non NaN data present in the rolling quantiles. This limit is currently set to 10. If after this filter is applied the group still has np.NaNs, linear interpolation is applied to fill the gaps (provided there is data that can be used to interpolate). GR_sh_Gp_f represents this final result for all groups.
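
The final normalisation step of the VSH formula can be sketched as below. This covers only the last line of the calculation: the sand point and the shale curve are passed in as given numbers, whereas the library derives them from quantiles, a rolling window and a savgol filter as described above. The function name `vsh` and all values are hypothetical.

```python
def vsh(gr, gr_ss, gr_sh):
    """VSH = (GR - GR_ss) / (GR_sh_Gp_f - GR_ss), evaluated sample by sample."""
    # gr_sh is a per-sample shale curve; gr_ss is one sand point for the system.
    return [(g - gr_ss) / (sh - gr_ss) for g, sh in zip(gr, gr_sh)]


gr = [20.0, 70.0, 120.0]
shale_curve = [120.0, 120.0, 120.0]  # constant here only to keep the example small
vsh_values = vsh(gr, gr_ss=20.0, gr_sh=shale_curve)
```

A sample at the sand point maps to 0 and a sample on the shale curve maps to 1.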

Note

This calculation is performed per well! Formation tops column in input df is forced into upper case for generalization.

Warning

If a calculation fails for one well, the well will be skipped and the calculation continues with the next well.

Note

If no mapping could be made to the pre-defined systems, the GROUP will be labeled as ‘other’.

Parameters

df (pd.DataFrame) – The dataframe to which VSH should be added.

Keyword Arguments
  • groups_column_name (str) – The name of the column containing group names. Defaults to ‘GROUP’

  • formations_column_name (str) – The name of the column containing formation names. Defaults to ‘FORMATION’

  • id_column (str) – The name of the well ID column to use for grouping the dataset by well. Defaults to ‘well_name’

  • rolling_window_size (int) – The size of the window to use for the rolling quantile calculation of the shale formation groups. Defaults to 2000 or len(group_df) // 2 if less than 2000 where group_df is the dataframe for the specific shale formation group.

  • filter_window_size (int) – The size of the window to use for the savgol filtering. Defaults to 501 or odd(len(filter_series) // 2) if less than 501 where filter_series is the series of rolling quantiles to be filtered by the savgol filter. MUST be odd (if an even int is provided, the code automatically converts it to an odd window size)

  • quant_ss (float) – The quantile to use for each age group in the sand formation groups calculation (GR_ss). Defaults to 0.02

  • quant_sh (float) – The quantile to use in the rolling quantile calculation of the shale formation groups. Defaults to 0.95

  • NHR_ss_threshold (float) –

    The sand point threshold above which the Nordland, Hordaland & Rogaland (NHR) groups should be merged. The threshold is represented as the ratio between the group specific sandpoint (quant_ss) and the NHR system sand point (quant_ss calculated across all three groups - N, H & R). If this ratio is greater than this threshold the groups are merged according to the following strategy:

    1. Nordland’s sandpoint is set to Hordaland’s sandpoint. If there is no Hordaland group present in the well it falls back to being set to the NHR system sandpoint.

    2. Hordaland’s sandpoint is set to the average of Nordland and Rogaland’s sandpoints

    3. Rogaland’s sandpoint is set to Hordaland’s sandpoint. If there is no Hordaland group present in the well it falls back to being set to the NHR system sandpoint.

  • non_shale_window_threshold (float) –

    A threshold for the following ratio:

    NSWT = GR_ss / (GR_sh_Gp_f * (GR_sh_Gp_f - GR_ss))
    

    This threshold causes the VSH_AUTO calculation to linearly interpolate between local minima in the GR_sh_Gp_f curve whenever the above ratio goes above the user-provided threshold. Initial user testing suggests a threshold of 0.015 is a good starting point.

Returns

Returns the dataframe with VSH as a new column

Return type

pd.DataFrame

akerbp.mlpet.feature_engineering.add_vertical_depths(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Adds vertical depths, i.e. TVDKB, TVDSS and TVDBML, to the input dataframe. This function relies on a keyword argument for a vertical depth mapper dictionary, created by querying CDF at discrete points along the wellbore for each well. To map the vertical depths along the entire wellbore, the data in the dictionary is interpolated using the measured depth.

Parameters

df (pd.DataFrame) – pandas dataframe to add vertical depths to

Keyword Arguments
  • md_column (str) – identifier for the measured depth column in the provided dataframe. Defaults to None.

  • id_column (str) – identifier for the well column in the provided dataframe. Defaults to None.

  • vertical_depths_mapper (dict) –

    dictionary containing vertical and measured depths queried from CDF at discrete points along the wellbore for each well. For example:

    vertical_depths_mapper = {
        "25/6-2": {
            "TVDKB": [0.0, 145.0, 149.9998, ...],
            "TVDSS": [-26.0, 119.0, 123.9998, ...],
            "TVDBML": [-145.0, 0.0, 4.999799999999993, ...],
            "MD": [0.0, 145.0, 150.0, ...]
        }
    }
    

    Defaults to an empty dictionary, i.e. {}

  • client (CogniteClient) – client for querying vertical depths from CDF if a mapping dictionary is not provided. Defaults to None.

Returns

dataframe with additional columns for TVDKB, TVDSS and TVDBML

Return type

pd.DataFrame
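
The interpolation along measured depth can be sketched as piecewise-linear interpolation between the discrete points in the mapper. This is an illustration under assumptions: the helper `interpolate` is hypothetical, only the first three points of the example above are used, and returning `None` outside the surveyed range (no extrapolation) is an assumption.

```python
def interpolate(md, md_points, tvd_points):
    """Piecewise-linear interpolation of a vertical depth at measured depth md."""
    for i in range(len(md_points) - 1):
        m0, m1 = md_points[i], md_points[i + 1]
        t0, t1 = tvd_points[i], tvd_points[i + 1]
        if m0 <= md <= m1:
            return t0 + (t1 - t0) * (md - m0) / (m1 - m0)
    return None  # outside the surveyed range (assumption: no extrapolation)


well = {"TVDKB": [0.0, 145.0, 149.9998], "MD": [0.0, 145.0, 150.0]}
tvdkb = interpolate(147.5, well["MD"], well["TVDKB"])  # halfway between 145 and 150
```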