abacusai.feature_group

Module Contents

Classes

FeatureGroup

A feature group

class abacusai.feature_group.FeatureGroup(client, modificationLock=None, featureGroupId=None, name=None, featureGroupSourceType=None, tableName=None, sql=None, datasetId=None, functionSourceCode=None, functionName=None, sourceTables=None, createdAt=None, description=None, featureGroupType=None, useForTraining=None, sqlError=None, latestVersionOutdated=None, tags=None, primaryKey=None, updateTimestampKey=None, lookupKeys=None, featureGroupUse=None, isIncremental=None, mergeConfig=None, features={}, duplicateFeatures={}, latestFeatureGroupVersion={})

Bases: abacusai.return_class.AbstractApiClass

A feature group

Parameters
  • client (ApiClient) – An authenticated API Client instance

  • modificationLock (bool) – Whether the feature group is locked against modification

  • featureGroupId (str) – The unique identifier for this feature group

  • name (str) – [DEPRECATED] A user friendly name for the feature group

  • featureGroupSourceType (str) – One of SQL, PYTHON, DATASET, BATCH_PREDICTION

  • tableName (str) – The unique table name of this feature group

  • sql (str) – The SQL definition creating this feature group

  • datasetId (str) – The datasetId the feature group is sourced from

  • functionSourceCode (str) – The source code of the function that creates this feature group

  • functionName (str) – The function name to execute from the source code

  • sourceTables (list of string) – The source tables for this feature group

  • createdAt (str) – The timestamp at which the feature group was created.

  • description (str) – Description of the feature group

  • featureGroupType (str) – The Project Dataset Type when the Feature Group is used in the context of a project

  • useForTraining (bool) – Whether the feature group is used for training

  • sqlError (str) – The SQL error message associated with this feature group, if any

  • latestVersionOutdated (bool) – Whether the latest materialized feature group version is outdated

  • tags (list of string) – Tags added to this feature group

  • primaryKey (str) – The primary index feature

  • updateTimestampKey (str) – The primary timestamp feature

  • lookupKeys (list of string) – Additional indexed features for this feature group

  • featureGroupUse (str) – The user assigned feature group use which allows for organizing feature groups in a project

  • isIncremental (bool) – Whether the feature group corresponds to an incremental dataset.

  • mergeConfig (dict) – The merge configuration settings for the feature group.

  • features (list of Feature) – List of resolved features

  • duplicateFeatures (list of Feature) – List of duplicate features

  • latestFeatureGroupVersion (FeatureGroupVersion) – The latest feature group version

__repr__(self)

Return repr(self).

to_dict(self)

Get a dict representation of the parameters in this class

Returns

The dict value representation of the class parameters

Return type

dict

add_to_project(self, project_id, feature_group_type='CUSTOM_TABLE', feature_group_use=None)

Adds the feature group to a project.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • feature_group_type (str) – The feature group type of the feature group. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under the personalized recommendation use case.

  • feature_group_use (str) – The user-assigned feature group use, which allows for organizing project feature groups, such as DATA_WRANGLING, TRAINING_INPUT, or BATCH_PREDICTION_INPUT.

remove_from_project(self, project_id)

Removes a feature group from a project.

Parameters

project_id (str) – The unique ID associated with the project.

set_type(self, project_id, feature_group_type='CUSTOM_TABLE')

Update the feature group type in a project. The feature group must already be added to the project.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • feature_group_type (str) – The feature group type to set the feature group as. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under the personalized recommendation use case.

use_for_training(self, project_id, use_for_training=True)

Includes or excludes the feature group as model training input.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • use_for_training (bool) – Whether to include (True) or exclude (False) the feature group from a model’s training. Only one feature group per type can be used for training.

create_sampling(self, table_name, sampling_config, description=None)

Creates a new feature group defined as a sample of rows from another feature group.

For efficiency, sampling is approximate unless otherwise specified; for example, the number of rows may vary slightly from what was requested.

Parameters
  • table_name (str) – The unique name to be given to this sampling feature group.

  • sampling_config (dict) – JSON object (aka map) defining the sampling method and its parameters.

  • description (str) – A human-readable description of this feature group.

Returns

The created feature group.

Return type

FeatureGroup
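
Why approximate sampling returns a slightly different row count is easiest to see with independent per-row (Bernoulli) sampling: each row is kept with probability p, so the sample size only averages p × N. The sketch below is a generic illustration with an arbitrary seed and fraction; the bernoulli_sample helper is hypothetical and does not reproduce Abacus.AI's sampling methods or the keys accepted in sampling_config.

```python
import random

def bernoulli_sample(rows, fraction, seed=0):
    """Keep each row independently with probability `fraction` (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [row for row in rows if rng.random() < fraction]

rows = list(range(10_000))
sample = bernoulli_sample(rows, fraction=0.1)
print(len(sample))  # close to, but usually not exactly, 1000
```

Requesting 10% of 10,000 rows yields a count near 1,000 rather than exactly 1,000, which is the kind of variation the note above describes.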

set_sampling_config(self, sampling_config)

Set a FeatureGroup’s sampling to the config values provided, so that the rows the FeatureGroup returns will be a sample of those it would otherwise have returned.

Currently, sampling is only supported for sampling feature groups, so this method can only be called on that kind of feature group.

Parameters

sampling_config (dict) – A JSON object specifying the sampling method and the parameters specific to that method. An empty sampling_config means no sampling.

Returns

The updated feature group.

Return type

FeatureGroup

set_merge_config(self, merge_config)

Set a MergeFeatureGroup’s merge config to the values provided, so that the feature group only returns a bounded range of an incremental dataset.

Parameters

merge_config (dict) – A JSON object specifying the merge rule. An empty merge_config defaults to including only the latest dataset version.

set_schema(self, schema)

Creates a new schema and points the feature group to the new feature group schema id.

Parameters

schema (list) – An array of json objects with ‘name’ and ‘dataType’ properties.

get_schema(self, project_id=None)

Returns the schema of this feature group in a project.

Parameters

project_id (str) – The unique ID associated with the project.

Returns

An array of objects for each column in the specified feature group.

Return type

Feature

create_feature(self, name, select_expression)

Creates a new feature in the feature group from a SQL select expression.

Parameters
  • name (str) – The name of the feature to add

  • select_expression (str) – SQL select expression to create the feature

Returns

A feature group object with the newly added feature.

Return type

FeatureGroup
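
Conceptually, the select expression defines the new column as a function of the existing columns in each row. The sketch below uses Python's built-in sqlite3 module to show that idea on a toy in-memory table; the orders table, its columns, and the revenue expression are hypothetical, and Abacus.AI evaluates expressions in its own SQL engine, whose dialect may differ from SQLite's.

```python
import sqlite3

# Toy stand-in for a feature group: an in-memory SQLite table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, price REAL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 2.5, 4), (2, 10.0, 1), (3, 1.5, 2)],
)

# The pair that would be passed as create_feature(name, select_expression).
name = "revenue"
select_expression = "price * quantity"

# Every row gets a new column computed from the select expression.
rows = conn.execute(
    f"SELECT order_id, {select_expression} AS {name} FROM orders ORDER BY order_id"
).fetchall()
print(rows)  # [(1, 10.0), (2, 10.0), (3, 3.0)]
```

With an authenticated client, the same pair would be passed as fg.create_feature(name="revenue", select_expression="price * quantity"), where fg is an existing FeatureGroup.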

add_tag(self, tag)

Adds a tag to the feature group

Parameters

tag (str) – The tag to add to the feature group

remove_tag(self, tag)

Removes a tag from the feature group

Parameters

tag (str) – The tag to remove from the feature group

create_nested_feature(self, nested_feature_name, table_name, using_clause, where_clause=None, order_clause=None)

Creates a new nested feature in the feature group from SQL clauses that join, filter, and order rows from another feature group.

Parameters
  • nested_feature_name (str) – The name of the feature.

  • table_name (str) – The table name of the feature group to nest

  • using_clause (str) – The SQL join column or logic to join the nested table with the parent

  • where_clause (str) – A SQL where statement to filter the nested rows

  • order_clause (str) – A SQL clause to order the nested rows

Returns

A feature group object with the newly added nested feature.

Return type

FeatureGroup
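
A nested feature collects, for each row of the parent feature group, the matching rows of another table as a list. The plain-Python sketch below illustrates that idea with hypothetical customers and orders rows: a join key stands in for using_clause, a predicate for where_clause, and a sort key for order_clause. The nest helper is a conceptual illustration, not how Abacus.AI evaluates these clauses.

```python
from collections import defaultdict

customers = [{"customer_id": "a"}, {"customer_id": "b"}, {"customer_id": "c"}]
orders = [
    {"customer_id": "a", "order_id": 3, "amount": 12.0},
    {"customer_id": "a", "order_id": 1, "amount": 5.0},
    {"customer_id": "b", "order_id": 2, "amount": 0.0},
    {"customer_id": "a", "order_id": 4, "amount": 7.5},
]

def nest(parent_rows, child_rows, join_key, where=None, order_by=None, name="nested"):
    """Attach matching child rows to each parent row as a list stored under `name`."""
    groups = defaultdict(list)
    for row in child_rows:
        if where is None or where(row):  # where_clause analogue: filter nested rows
            groups[row[join_key]].append(row)
    result = []
    for parent in parent_rows:
        children = groups.get(parent[join_key], [])  # using_clause analogue: join on key
        if order_by is not None:
            children = sorted(children, key=order_by)  # order_clause analogue
        result.append({**parent, name: children})
    return result

nested = nest(
    customers,
    orders,
    join_key="customer_id",
    where=lambda r: r["amount"] > 0,
    order_by=lambda r: r["order_id"],
    name="orders",
)
print([[o["order_id"] for o in row["orders"]] for row in nested])
# [[1, 3, 4], [], []]
```

Parent rows without matches (customer "c") and parents whose matches are all filtered out (customer "b") still appear, with an empty nested list.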

update_nested_feature(self, nested_feature_name, table_name=None, using_clause=None, where_clause=None, order_clause=None, new_nested_feature_name=None)

Updates a previously existing nested feature in a feature group.

Parameters
  • nested_feature_name (str) – The name of the feature to be updated.

  • table_name (str) – The table name of the feature group to nest.

  • using_clause (str) – The SQL join column or logic to join the nested table with the parent

  • where_clause (str) – A SQL where statement to filter the nested rows

  • order_clause (str) – A SQL clause to order the nested rows

  • new_nested_feature_name (str) – New name for the nested feature.

Returns

A feature group object with the updated nested feature.

Return type

FeatureGroup

delete_nested_feature(self, nested_feature_name)

Deletes a nested feature.

Parameters

nested_feature_name (str) – The name of the feature to be deleted.

Returns

A feature group object without the deleted nested feature.

Return type

FeatureGroup

create_point_in_time_feature(self, feature_name, history_table_name, aggregation_keys, timestamp_key, historical_timestamp_key, expression, lookback_window_seconds=None, lookback_window_lag_seconds=0, lookback_count=None, lookback_until_position=0)

Creates a new point in time feature in a feature group using another historical feature group, window spec and aggregate expression.

The aggregation keys, together with either lookback_window_seconds or lookback_count, define a window aggregation that is performed for every row in the current feature group. If the window is specified in seconds, all rows in the history table that match the aggregation keys and whose historical timestamp is at or after the start of the window (the current row’s timestamp minus lookback_window_seconds) and strictly before the current row’s timestamp are considered. The optional lookback_window_lag_seconds (positive or negative) offsets the current row’s timestamp. If this value is negative, future rows in the history table are used, so make sure those rows are available in the online context when performing a lookup on this feature group. If the window is specified as a count, the history table rows are ordered by time and the window consists of the rows whose rank order is >= lookback_count, up to and including the row just before the current one. In that case, the lag is specified in positions using lookback_until_position.

Parameters
  • feature_name (str) – The name of the feature to create

  • history_table_name (str) – The table name of the history table.

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature

  • historical_timestamp_key (str) – Name of feature which contains the historical timestamp.

  • expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.

  • lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.

  • lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.

Returns

A feature group object with the newly added point in time feature.

Return type

FeatureGroup
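
The time-based window described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Abacus.AI's implementation: timestamps are plain numbers of seconds, sum stands in for the SQL aggregate expression, the point_in_time_sum helper and value_key parameter are hypothetical, and the lag is modelled as shifting the current row's timestamp before the window [t - lag - lookback_window_seconds, t - lag) is applied.

```python
def point_in_time_sum(rows, history, aggregation_keys, timestamp_key,
                      historical_timestamp_key, value_key,
                      lookback_window_seconds, lookback_window_lag_seconds=0):
    """For each row, sum `value_key` over matching history rows inside the time window."""
    results = []
    for row in rows:
        # Assumption: the lag offsets the current timestamp (a negative lag looks ahead).
        end = row[timestamp_key] - lookback_window_lag_seconds
        start = end - lookback_window_seconds
        total = 0
        for h in history:
            same_keys = all(h[k] == row[k] for k in aggregation_keys)
            # Window is inclusive at the start and exclusive at the end.
            if same_keys and start <= h[historical_timestamp_key] < end:
                total += h[value_key]
        results.append(total)
    return results

history = [
    {"user": "u1", "ts": 100, "clicks": 1},
    {"user": "u1", "ts": 150, "clicks": 2},
    {"user": "u1", "ts": 190, "clicks": 4},
    {"user": "u2", "ts": 180, "clicks": 8},
]
rows = [{"user": "u1", "event_ts": 200}, {"user": "u2", "event_ts": 200}]

sums = point_in_time_sum(rows, history, ["user"], "event_ts", "ts", "clicks",
                         lookback_window_seconds=60)
print(sums)  # [6, 8]
```

With a 60-second window ending at 200, user u1 picks up the rows at 150 and 190 but not the one at 100; passing lookback_window_lag_seconds=20 shifts the window to [120, 180) under the assumed lag semantics.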

update_point_in_time_feature(self, feature_name, history_table_name=None, aggregation_keys=None, timestamp_key=None, historical_timestamp_key=None, expression=None, lookback_window_seconds=None, lookback_window_lag_seconds=None, lookback_count=None, lookback_until_position=None, new_feature_name=None)

Updates an existing point in time feature in a feature group. See create_point_in_time_feature for detailed semantics.

Parameters
  • feature_name (str) – The name of the feature.

  • history_table_name (str) – The table name of the history table. If not specified, we use the current table to do a self join.

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature

  • historical_timestamp_key (str) – Name of feature which contains the historical timestamp.

  • expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.

  • lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.

  • lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.

  • new_feature_name (str) – New name for the point in time feature.

Returns

A feature group object with the updated point in time feature.

Return type

FeatureGroup

set_feature_type(self, feature, feature_type)

Sets a feature’s type in the feature group. Specify the feature name and feature type, and the method returns the updated schema with the change reflected.

Parameters
  • feature (str) – The name of the feature.

  • feature_type (str) – The machine learning type of the data in the feature: CATEGORICAL, CATEGORICAL_LIST, NUMERICAL, TIMESTAMP, TEXT, EMAIL, LABEL_LIST, JSON, or OBJECT_REFERENCE. Refer to the guide on feature types (https://api.abacus.ai/app/help/class/FeatureType) for more information. Note: Some FeatureMappings will restrict the options or explicitly set the FeatureType.

Returns

The feature group schema after the feature type is applied.

Return type

Schema

invalidate_streaming_data(self, invalid_before_timestamp)

Invalidates all streaming data with a timestamp before invalid_before_timestamp.

Parameters

invalid_before_timestamp (int) – The Unix timestamp; any data with a timestamp before this time will be deleted.

concatenate_data(self, source_feature_group_id, merge_type='UNION', replace_until_timestamp=None, skip_materialize=False)

Concatenates data from one feature group to another. Feature groups can be merged if their schemas are compatible, they have the special updateTimestampKey column, and, if set, the primaryKey column. The second operand in the concatenate operation (the source feature group) is appended to the first operand (the merge target).

Parameters
  • source_feature_group_id (str) – The feature group to concatenate with the destination feature group.

  • merge_type (str) – UNION or INTERSECTION

  • replace_until_timestamp (int) – The Unix timestamp specifying the point until which data is replaced from the source feature group.

  • skip_materialize (bool) – If True, the concatenated feature group is not materialized.
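
One way to picture a UNION concatenation followed by primary-key deduplication (which set_indexing_config describes as a role of the update timestamp key) is sketched below. The union_dedup helper, the row layout, the id and updated_at column names, and the keep-the-latest-row rule are assumptions for illustration; they are not taken from the Abacus.AI implementation.

```python
def union_dedup(target_rows, source_rows, primary_key, update_timestamp_key):
    """Append source rows to the target rows, then keep the latest row per primary key."""
    latest = {}
    for row in target_rows + source_rows:  # the source is appended to the merge target
        key = row[primary_key]
        # On equal timestamps the later row (from the source) wins.
        if key not in latest or row[update_timestamp_key] >= latest[key][update_timestamp_key]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r[primary_key])

target = [{"id": 1, "updated_at": 10, "v": "old"}, {"id": 2, "updated_at": 10, "v": "x"}]
source = [{"id": 1, "updated_at": 20, "v": "new"}, {"id": 3, "updated_at": 5, "v": "y"}]

merged = union_dedup(target, source, "id", "updated_at")
print([(r["id"], r["v"]) for r in merged])
# [(1, 'new'), (2, 'x'), (3, 'y')]
```

Record 1 appears in both inputs, so only its newer version survives; records 2 and 3 pass through unchanged.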

refresh(self)

Calls describe and refreshes the current object’s fields

Returns

The current object

Return type

FeatureGroup

describe(self)

Describes the feature group.

Returns

The feature group object.

Return type

FeatureGroup

set_indexing_config(self, primary_key=None, update_timestamp_key=None, lookup_keys=None)

Sets various attributes of the feature group used for deployment lookups and streaming updates.

Parameters
  • primary_key (str) – Name of feature which defines the primary key of the feature group.

  • update_timestamp_key (str) – Name of feature which defines the update timestamp of the feature group - used in concatenation and primary key deduplication.

  • lookup_keys (list) – List of feature names which can be used in the lookup api to restrict the computation to a set of dataset rows. These feature names have to correspond to underlying dataset columns.

update(self, description=None)

Modifies an existing feature group

Parameters

description (str) – The description about the feature group.

Returns

The updated feature group object.

Return type

FeatureGroup

update_sql_definition(self, sql)

Updates the SQL statement for a feature group.

Parameters

sql (str) – Input SQL statement for the feature group.

Returns

The updated feature group

Return type

FeatureGroup

update_function_definition(self, function_source_code=None, function_name=None, input_feature_groups=None)

Updates the function definition for a feature group created using createFeatureGroupFromFunction

Parameters
  • function_source_code (str) – Contents of a valid source code file in a supported Feature Group specification language (currently only Python). The source code should contain a function called function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.

  • input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).

Returns

The updated feature group

Return type

FeatureGroup

update_feature(self, name, select_expression=None, new_name=None)

Modifies an existing feature in the feature group. Specify the name of the feature and either a SQL select expression or a new name to update the feature.

Parameters
  • name (str) – The name of the feature to be updated.

  • select_expression (str) – Input SQL statement for modifying the feature.

  • new_name (str) – The new name of the feature.

Returns

The updated feature group object.

Return type

FeatureGroup

list_exports(self)

Lists all of the feature group exports for this feature group.

Returns

The feature group exports

Return type

FeatureGroupExport

set_modifier_lock(self, locked=True)

Locks or unlocks the feature group to control whether it can be modified.

Parameters

locked (bool) – True to lock the feature group against modification, False to unlock it.

list_modifiers(self)

Lists the users who can modify the feature group.

Returns

Modification lock status and groups and organizations added to the feature group.

Return type

ModificationLockInfo

add_user_to_modifiers(self, email)

Adds a user to the feature group’s modifiers.

Parameters

email (str) – The email address of the user to be added.

add_organization_group_to_modifiers(self, organization_group_id)

Adds an organization group to the feature group’s modifiers.

Parameters

organization_group_id (str) – The unique ID associated with the organization group.

remove_user_from_modifiers(self, email)

Removes a user from the feature group’s modifiers.

Parameters

email (str) – The email address of the user to be removed.

remove_organization_group_from_modifiers(self, organization_group_id)

Removes an organization group from the feature group’s modifiers.

Parameters

organization_group_id (str) – The unique ID associated with the organization group.

delete_feature(self, name)

Removes an existing feature from the feature group. Specify the name of the feature to be deleted.

Parameters

name (str) – The name of the feature to be deleted.

Returns

The updated feature group object.

Return type

FeatureGroup

delete(self)

Removes an existing feature group.

create_version(self)

Creates a snapshot (version) of the feature group.

Returns

A feature group version.

Return type

FeatureGroupVersion

list_versions(self, limit=100, start_after_version=None)

Retrieves a list of all feature group versions for the specified feature group.

Parameters
  • limit (int) – The maximum number of versions to return.

  • start_after_version (str) – Results will start after this version.

Returns

A list of feature group versions.

Return type

FeatureGroupVersion
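
Because list_versions returns at most limit versions per call, retrieving the full history means paging with start_after_version. The iter_all_versions helper below is a hypothetical sketch of that loop against any callable taking the same two keyword parameters, assuming that a page shorter than limit means no more versions remain. The in-memory fake_list_versions and the version_id accessor are stand-ins; with a real client you would pass fg.list_versions and an accessor for whichever attribute of FeatureGroupVersion holds the version ID.

```python
def iter_all_versions(list_versions, version_id, page_size=100):
    """Yield every version by calling list_versions(limit=..., start_after_version=...)."""
    start_after = None
    while True:
        page = list_versions(limit=page_size, start_after_version=start_after)
        yield from page
        if len(page) < page_size:  # assumed: a short page means nothing is left to fetch
            return
        start_after = version_id(page[-1])  # continue after the last version seen

# Hypothetical in-memory stand-in for the API, returning version IDs as strings.
ALL_VERSIONS = [f"v{i}" for i in range(7)]

def fake_list_versions(limit=100, start_after_version=None):
    start = 0 if start_after_version is None else ALL_VERSIONS.index(start_after_version) + 1
    return ALL_VERSIONS[start:start + limit]

versions = list(iter_all_versions(fake_list_versions, version_id=lambda v: v, page_size=3))
print(versions)  # ['v0', 'v1', 'v2', 'v3', 'v4', 'v5', 'v6']
```

When the total is an exact multiple of the page size, the loop makes one extra call that returns an empty page and then stops.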

get_recent_streamed_data(self)

Returns data recently streamed to this streaming feature group.

upsert_data(self, streaming_token, data)

Updates the data in the feature group for a given lookup key recordId if that recordId is found; otherwise, inserts the data into the feature group.

Parameters
  • streaming_token (str) – The streaming token for authenticating requests

  • data (dict) – The data to record

append_data(self, streaming_token, data)

Appends new data into the feature group for a given lookup key recordId.

Parameters
  • streaming_token (str) – The streaming token for authenticating requests

  • data (dict) – The data to record
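
The difference between upsert_data and append_data can be pictured with an in-memory stand-in keyed by record ID. The RecordStore class below is a hypothetical illustration of the documented semantics (update-or-insert versus always append), not the streaming service itself; with a real client the calls would be fg.upsert_data(streaming_token, data) and fg.append_data(streaming_token, data).

```python
class RecordStore:
    """Hypothetical in-memory model of a streaming feature group keyed by recordId."""

    def __init__(self, key="recordId"):
        self.key = key
        self.rows = []

    def upsert(self, data):
        # Replace the existing row with the same record ID, or insert a new one.
        for i, row in enumerate(self.rows):
            if row[self.key] == data[self.key]:
                self.rows[i] = dict(data)
                return
        self.rows.append(dict(data))

    def append(self, data):
        # Always add a new row, even if the record ID already exists.
        self.rows.append(dict(data))

upserted = RecordStore()
upserted.upsert({"recordId": "r1", "score": 1})
upserted.upsert({"recordId": "r1", "score": 2})

appended = RecordStore()
appended.append({"recordId": "r1", "score": 1})
appended.append({"recordId": "r1", "score": 2})

print(upserted.rows)  # [{'recordId': 'r1', 'score': 2}]
print(appended.rows)  # [{'recordId': 'r1', 'score': 1}, {'recordId': 'r1', 'score': 2}]
```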

wait_for_materialization(self, timeout=7200)

Waits until the feature group is materialized.

Parameters

timeout (int, optional) – The maximum time to wait, in seconds. If materialization does not finish within this time, the call times out. Defaults to 7200 seconds (2 hours).
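
Waiting for materialization amounts to polling the status until it leaves an in-progress state or the timeout expires. The generic loop below sketches that behaviour with a status callable and a timeout in seconds; the wait_until_done name, the in-progress status strings, and the poll interval are assumptions, and get_status on a real FeatureGroup would be the natural callable to pass.

```python
import time

def wait_until_done(get_status, in_progress=("PENDING", "GENERATING"),
                    timeout=7200, poll_interval=15):
    """Poll get_status() until it leaves `in_progress`; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout  # monotonic clock is immune to wall-clock changes
    status = get_status()
    while status in in_progress:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        time.sleep(poll_interval)
        status = get_status()
    return status

# Hypothetical status sequence standing in for repeated get_status() calls.
statuses = iter(["PENDING", "GENERATING", "COMPLETE"])
final = wait_until_done(lambda: next(statuses), poll_interval=0)
print(final)  # COMPLETE
```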

get_status(self)

Gets the status of the feature group.

Returns

A string describing the status of a feature group (pending, complete, etc.).

Return type

str

load_as_pandas(self)

Loads the feature group into a pandas DataFrame.

Returns

A pandas DataFrame containing the feature group’s data.

Return type

DataFrame

describe_dataset(self)

Displays the dataset attached to a feature group.

Returns

A dataset object with all the relevant information about the dataset.

Return type

Dataset