abacusai.feature_group
Module Contents
Classes
FeatureGroup – A feature group
- class abacusai.feature_group.FeatureGroup(client, modificationLock=None, featureGroupId=None, name=None, featureGroupSourceType=None, tableName=None, sql=None, datasetId=None, functionSourceCode=None, functionName=None, sourceTables=None, createdAt=None, description=None, featureGroupType=None, sqlError=None, latestVersionOutdated=None, tags=None, primaryKey=None, updateTimestampKey=None, lookupKeys=None, streamingEnabled=None, featureGroupUse=None, incremental=None, mergeConfig=None, transformConfig=None, cpuSize=None, memory=None, streamingReady=None, featureTags=None, moduleName=None, isAboveDataLimitingThreshold=None, isCdsAvailable=None, isCdsActive=None, features={}, duplicateFeatures={}, pointInTimeGroups={}, concatenationConfig={}, indexingConfig={}, latestFeatureGroupVersion={})
Bases: abacusai.return_class.AbstractApiClass
A feature group
- Parameters
client (ApiClient) – An authenticated API Client instance
modificationLock (bool) – Whether the feature group is locked against modification
featureGroupId (str) – The unique identifier for this feature group
name (str) – [DEPRECATED] A user friendly name for the feature group
featureGroupSourceType (str) – One of SQL, PYTHON, DATASET, BATCH_PREDICTION
tableName (str) – The unique table name of this feature group
sql (str) – The SQL definition creating this feature group
datasetId (str) – The ID of the dataset the feature group is sourced from
functionSourceCode (str) – The source code of the function creating this feature group
functionName (str) – The function name to execute from the source code
sourceTables (list of string) – The source tables for this feature group
createdAt (str) – The timestamp at which the feature group was created.
description (str) – Description of the feature group
featureGroupType (str) – The Project Dataset Type when the Feature Group is used in the context of a project
sqlError (str) – The SQL error message associated with this feature group
latestVersionOutdated (bool) – Whether the latest materialized feature group version is outdated
tags (list of string) – Tags added to this feature group
primaryKey (str) – The primary index feature
updateTimestampKey (str) – The primary timestamp feature
lookupKeys (list of string) – Additional indexed features for this feature group
streamingEnabled (bool) – If true, the feature group can have data streamed to it
featureGroupUse (str) – The user assigned feature group use which allows for organizing feature groups in a project
incremental (bool) – Whether the feature group corresponds to an incremental dataset.
mergeConfig (dict) – The merge configuration settings for the feature group.
transformConfig (dict) – The transform configuration settings for the feature group.
cpuSize (str) – CPU size specified for the Python feature group.
memory (int) – Memory in GB specified for the Python feature group.
streamingReady (bool) – If true, the feature group is ready to receive streaming data
featureTags (dict) –
moduleName (str) – The path to the file with the feature group function.
isAboveDataLimitingThreshold (bool) – Whether any of the source dataset dependencies of this feature group has more data (rows) than the threshold limit (1 million)
isCdsAvailable (bool) – Boolean indicating whether a custom dataserver (CDS) is available to be deployed
isCdsActive (bool) – Boolean indicating whether a custom dataserver (CDS) is present
features (Feature) – List of resolved features
duplicateFeatures (Feature) – List of duplicate features
pointInTimeGroups (PointInTimeGroup) – List of Point In Time Groups
latestFeatureGroupVersion (FeatureGroupVersion) – The latest feature group version
concatenationConfig (ConcatenationConfig) – The concatenation configuration, which identifies the feature group whose data will be concatenated into this feature group
indexingConfig (IndexingConfig) –
- __repr__(self)
Return repr(self).
- to_dict(self)
Get a dict representation of the parameters in this class
- Returns
The dict value representation of the class parameters
- Return type
dict
- add_to_project(self, project_id, feature_group_type='CUSTOM_TABLE', feature_group_use=None)
Adds the feature group to a project.
- Parameters
project_id (str) – The unique ID associated with the project.
feature_group_type (str) – The feature group type of the feature group. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under personalized recommendation use case.
feature_group_use (str) – The user-assigned feature group use, which allows for organizing feature groups within a project, such as DATA_WRANGLING, TRAINING_INPUT, or BATCH_PREDICTION_INPUT
- remove_from_project(self, project_id)
Removes a feature group from a project.
- Parameters
project_id (str) – The unique ID associated with the project.
- set_type(self, project_id, feature_group_type='CUSTOM_TABLE')
Update the feature group type in a project. The feature group must already be added to the project.
- Parameters
project_id (str) – The unique ID associated with the project.
feature_group_type (str) – The feature group type to set the feature group as. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under personalized recommendation use case.
- use_for_training(self, project_id, use_for_training=True)
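Uses the feature group for model training input.
- Parameters
project_id (str) – The unique ID associated with the project.
use_for_training (bool) – Whether to use the feature group for model training input.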
- create_sampling(self, table_name, sampling_config, description=None)
Creates a new feature group defined as a sample of rows from another feature group.
For efficiency, sampling is approximate unless otherwise specified; for example, the number of rows may vary slightly from what was requested.
- Parameters
- Returns
The created feature group.
- Return type
FeatureGroup
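The approximate row count mentioned for create_sampling is typical of per-row Bernoulli sampling, a common technique in which each row is kept independently with probability p, so the number of kept rows only averages p times the input size. The sketch below illustrates that general technique; it is not Abacus.AI's implementation, the helper `bernoulli_sample` is hypothetical, and the sampling_config format is not modeled.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def bernoulli_sample(rows: Sequence[T], fraction: float, seed: Optional[int] = None) -> List[T]:
    """Keep each row independently with probability `fraction` (hypothetical helper)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # a seeded generator makes the sample reproducible
    # Every row is an independent coin flip, so the sample size varies around
    # fraction * len(rows) rather than matching it exactly.
    return [row for row in rows if rng.random() < fraction]


for seed in range(3):
    # Each printed size is close to, but usually not exactly, 1,000.
    print(len(bernoulli_sample(range(10_000), 0.1, seed=seed)))
```

Exact-size sampling would need a different method (for example, reservoir sampling), at the cost of seeing every row before deciding.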
- set_sampling_config(self, sampling_config)
Set a FeatureGroup’s sampling to the config values provided, so that the rows the FeatureGroup returns will be a sample of those it would otherwise have returned.
Currently, sampling is supported only for Sampling FeatureGroups, so this method can only be called on that kind of FeatureGroup.
- Parameters
sampling_config (dict) – A JSON object specifying the sampling method and the parameters specific to that method. An empty sampling_config means no sampling.
- Returns
The updated feature group.
- Return type
FeatureGroup
- set_merge_config(self, merge_config)
Set a MergeFeatureGroup’s merge config to the values provided, so that the feature group only returns a bounded range of an incremental dataset.
- Parameters
merge_config (dict) – A JSON object specifying the merge rule. An empty merge_config defaults to including only the latest dataset version.
- set_transform_config(self, transform_config)
Set a TransformFeatureGroup’s transform config to the values provided.
- Parameters
transform_config (dict) – A JSON object specifying the predefined transformation.
- set_schema(self, schema)
Creates a new schema and points the feature group to the new schema ID.
- Parameters
schema (list) – An array of json objects with ‘name’ and ‘dataType’ properties.
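Because set_schema expects a list of objects with 'name' and 'dataType' keys, a small helper can build and check that list from a column-to-type mapping before the call. This is a sketch: `build_schema` and `apply_schema` are hypothetical names, `feature_group` is assumed to be a FeatureGroup obtained from an authenticated client, and the dataType values in the usage line are illustrative assumptions rather than a documented list.

```python
from typing import Dict, List


def build_schema(column_types: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert {column: dataType} into the list-of-objects shape set_schema expects."""
    schema = []
    for name, data_type in column_types.items():
        if not name or not data_type:
            raise ValueError(f"empty name or dataType for column {name!r}")
        schema.append({"name": name, "dataType": data_type})
    return schema


def apply_schema(feature_group, column_types: Dict[str, str]) -> List[Dict[str, str]]:
    """Validate locally, then call the documented set_schema method."""
    schema = build_schema(column_types)
    feature_group.set_schema(schema)
    return schema


# Hypothetical column names and dataType values, for illustration only.
print(build_schema({"user_id": "STRING", "amount": "FLOAT"}))
```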
- get_schema(self, project_id=None)
Returns a schema given a specific FeatureGroup in a project.
- create_feature(self, name, select_expression)
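Creates a new feature in the feature group from a SQL select expression.
- Parameters
name (str) – The name of the new feature.
select_expression (str) – The SQL select expression that defines the new feature.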
- Returns
A feature group object with the newly added feature.
- Return type
FeatureGroup
- add_tag(self, tag)
Adds a tag to the feature group
- Parameters
tag (str) – The tag to add to the feature group
- remove_tag(self, tag)
Removes a tag from the feature group
- Parameters
tag (str) – The tag to remove from the feature group
- create_nested_feature(self, nested_feature_name, table_name, using_clause, where_clause=None, order_clause=None)
Creates a new nested feature in the feature group from SQL statements.
- Parameters
nested_feature_name (str) – The name of the feature.
table_name (str) – The table name of the feature group to nest
using_clause (str) – The SQL join column or logic to join the nested table with the parent
where_clause (str) – A SQL where statement to filter the nested rows
order_clause (str) – A SQL clause to order the nested rows
- Returns
A feature group object with the newly added nested feature.
- Return type
FeatureGroup
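Conceptually, a nested feature attaches to each parent row the list of rows from the nested table that join to it, optionally filtered by the where clause and ordered by the order clause. The pure-Python sketch below illustrates that reading, with callables standing in for the SQL using, where, and order clauses; it is not how Abacus.AI evaluates the SQL, and `nest_rows` is a hypothetical name.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def nest_rows(
    parents: List[Row],
    children: List[Row],
    nested_feature_name: str,
    using: Callable[[Row, Row], bool],                   # stands in for using_clause
    where: Optional[Callable[[Row], bool]] = None,       # stands in for where_clause
    order_by: Optional[Callable[[Row], object]] = None,  # stands in for order_clause
) -> List[Row]:
    """Return copies of `parents`, each carrying the list of its matching child rows."""
    result = []
    for parent in parents:
        nested = [c for c in children if using(parent, c) and (where is None or where(c))]
        if order_by is not None:
            nested.sort(key=order_by)
        result.append({**parent, nested_feature_name: nested})
    return result


users = [{"user_id": 1}, {"user_id": 2}]
orders = [
    {"user_id": 1, "ts": 20, "amount": 5},
    {"user_id": 1, "ts": 10, "amount": 0},
    {"user_id": 1, "ts": 30, "amount": 8},
]
out = nest_rows(users, orders, "orders",
                using=lambda p, c: p["user_id"] == c["user_id"],
                where=lambda c: c["amount"] > 0,
                order_by=lambda c: c["ts"])
print(out[0]["orders"])  # user 1: the two non-zero orders, sorted by ts
```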
- update_nested_feature(self, nested_feature_name, table_name=None, using_clause=None, where_clause=None, order_clause=None, new_nested_feature_name=None)
Updates a previously existing nested feature in a feature group.
- Parameters
nested_feature_name (str) – The name of the feature to be updated.
table_name (str) – The table name of the feature group to nest.
using_clause (str) – The SQL join column or logic to join the nested table with the parent
where_clause (str) – A SQL where statement to filter the nested rows
order_clause (str) – A SQL clause to order the nested rows
new_nested_feature_name (str) – New name for the nested feature.
- Returns
A feature group object with the updated nested feature.
- Return type
FeatureGroup
- delete_nested_feature(self, nested_feature_name)
Delete a nested feature.
- Parameters
nested_feature_name (str) – The name of the feature to be deleted.
- Returns
A feature group object without the deleted nested feature.
- Return type
FeatureGroup
- create_point_in_time_feature(self, feature_name, history_table_name, aggregation_keys, timestamp_key, historical_timestamp_key, expression, lookback_window_seconds=None, lookback_window_lag_seconds=0, lookback_count=None, lookback_until_position=0)
Creates a new point in time feature in a feature group using another historical feature group, window spec and aggregate expression.
We use the aggregation keys and either the lookbackWindowSeconds or the lookbackCount value to perform the window aggregation for every row in the current feature group. If the window is specified in seconds, all rows in the history table that match the aggregation keys and whose historicalTimeFeature is >= the start of the window (lookbackWindowSeconds before the current row’s timeFeature) and < the current row’s timeFeature are considered. The optional lookbackWindowLagSeconds (positive or negative) can be used to offset the current value of the timeFeature. If this value is negative, we look at future rows in the history table, so make sure that these rows are available in the online context when performing a lookup on this feature group.
If the window is specified in counts, we order the history table rows by time and consider the rows whose rank order is >= lookbackCount, up to and including the row just prior to the current one. The lag is specified in terms of positions using lookbackUntilPosition.
- Parameters
feature_name (str) – The name of the feature to create
history_table_name (str) – The table name of the history table.
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature
historical_timestamp_key (str) – Name of feature which contains the historical timestamp.
expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.
lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.
lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.
- Returns
A feature group object with the newly added point in time feature.
- Return type
FeatureGroup
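The time-based window described for create_point_in_time_feature can be sketched in pure Python: join history rows on the aggregation keys, keep those whose historical timestamp falls in [window start, window end), and aggregate them. This is an illustration under stated assumptions, not Abacus.AI's engine: a Python callable stands in for the SQL aggregate expression, `time_window_feature` is a hypothetical name, and the lag handling (shifting the window's closest point back by the lag, so a negative lag admits "future" rows) is our reading of the parameter descriptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, object]


def time_window_feature(
    rows: List[Row],
    history: List[Row],
    aggregation_keys: Sequence[str],
    timestamp_key: str,
    historical_timestamp_key: str,
    value_key: str,
    aggregate: Callable[[List[object]], object],
    lookback_window_seconds: float,
    lookback_window_lag_seconds: float = 0.0,
) -> List[Optional[object]]:
    """Aggregate `value_key` over matching history rows inside each row's time window."""
    # Group history rows by their aggregation-key values (the join step).
    by_key: Dict[tuple, List[Row]] = defaultdict(list)
    for h in history:
        by_key[tuple(h[k] for k in aggregation_keys)].append(h)

    results: List[Optional[object]] = []
    for row in rows:
        # Assumed reading of the lag: it moves the window's closest point back in
        # time; a negative lag moves it forward, admitting "future" history rows.
        window_end = row[timestamp_key] - lookback_window_lag_seconds
        window_start = window_end - lookback_window_seconds
        values = [
            h[value_key]
            for h in by_key.get(tuple(row[k] for k in aggregation_keys), [])
            if window_start <= h[historical_timestamp_key] < window_end
        ]
        # Like SQL SUM or AVG, aggregating zero rows yields NULL (None here).
        results.append(aggregate(values) if values else None)
    return results


history = [
    {"user": "a", "ts": 100, "amount": 5.0},
    {"user": "a", "ts": 150, "amount": 7.0},
    {"user": "a", "ts": 400, "amount": 1.0},
    {"user": "b", "ts": 120, "amount": 3.0},
]
rows = [{"user": "a", "ts": 200}, {"user": "b", "ts": 200}, {"user": "a", "ts": 1000}]
print(time_window_feature(rows, history, ["user"], "ts", "ts", "amount", sum, 120))
# [12.0, 3.0, None]
```

With a 120-second window, the row for user "a" at ts=200 sees history rows at ts=100 and ts=150, while the row at ts=1000 sees none.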
- update_point_in_time_feature(self, feature_name, history_table_name=None, aggregation_keys=None, timestamp_key=None, historical_timestamp_key=None, expression=None, lookback_window_seconds=None, lookback_window_lag_seconds=None, lookback_count=None, lookback_until_position=None, new_feature_name=None)
Updates an existing point in time feature in a feature group. See createPointInTimeFeature for detailed semantics.
- Parameters
feature_name (str) – The name of the feature.
history_table_name (str) – The table name of the history table. If not specified, we use the current table to do a self join.
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature
historical_timestamp_key (str) – Name of feature which contains the historical timestamp.
expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.
lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.
lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.
new_feature_name (str) – New name for the point in time feature.
- Returns
A feature group object with the updated point in time feature.
- Return type
FeatureGroup
- create_point_in_time_group(self, group_name, window_key, aggregation_keys, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=0, lookback_count=None, lookback_until_position=0)
Creates a point in time group.
- Parameters
group_name (str) – The name of the point in time group
window_key (str) – Name of feature to use for ordering the rows on the source table
aggregation_keys (list) – List of keys on the source table to use for the window aggregation.
history_table_name (str) – The table to use for aggregating. If not provided, the source table is used.
history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table is used.
history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table are used. Must be the same length and order as the source table’s aggregationKeys.
lookback_window (float) – Number of seconds in the past from the current time for start of the window. If 0, the lookback will include all rows.
lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.
- Returns
The feature group after the point in time group has been created
- Return type
FeatureGroup
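Point in time groups also accept lookback_count and lookback_until_position, the count-based window described under create_point_in_time_feature: order the joined history rows by time, end the window just before the current row, and take the preceding lookback_count rows. The sketch below illustrates that reading in pure Python; it is not Abacus.AI's implementation, `count_window_feature` is a hypothetical name, a callable stands in for the SQL expression, and treating a positive lookback_until_position as ending the window that many rows earlier (negative reaching into "future" rows) is an assumption drawn from the parameter descriptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, object]


def count_window_feature(
    rows: List[Row],
    history: List[Row],
    aggregation_keys: Sequence[str],
    window_key: str,
    value_key: str,
    aggregate: Callable[[List[object]], object],
    lookback_count: int,
    lookback_until_position: int = 0,
    history_window_key: Optional[str] = None,
) -> List[Optional[object]]:
    """Aggregate `value_key` over the `lookback_count` history rows preceding each row."""
    hist_key = history_window_key or window_key  # fall back to the source window key
    # Group history rows by aggregation keys, then order each group by time.
    by_key: Dict[tuple, List[Row]] = defaultdict(list)
    for h in history:
        by_key[tuple(h[k] for k in aggregation_keys)].append(h)
    for group in by_key.values():
        group.sort(key=lambda h: h[hist_key])

    results: List[Optional[object]] = []
    for row in rows:
        ordered = by_key.get(tuple(row[k] for k in aggregation_keys), [])
        # Number of history rows strictly before the current row's time.
        n_prior = sum(1 for h in ordered if h[hist_key] < row[window_key])
        # Assumed reading: a positive lookback_until_position ends the window that
        # many rows earlier; a negative one extends it into "future" rows.
        end = min(max(n_prior - lookback_until_position, 0), len(ordered))
        start = max(end - lookback_count, 0)
        values = [h[value_key] for h in ordered[start:end]]
        results.append(aggregate(values) if values else None)
    return results


history = [{"user": "a", "ts": t, "amount": float(i)}
           for i, t in enumerate([10, 20, 30, 40, 50], start=1)]
current = [{"user": "a", "ts": 45}]
print(count_window_feature(current, history, ["user"], "ts", "amount", sum, lookback_count=2))
# [7.0]
```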
- update_point_in_time_group(self, group_name, window_key=None, aggregation_keys=None, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=None, lookback_count=None, lookback_until_position=None)
Updates a point in time group.
- Parameters
group_name (str) – The name of the point in time group
window_key (str) – Name of feature which contains the timestamp value for the point in time feature
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
history_table_name (str) – The table to use for aggregating. If not provided, the source table is used.
history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table is used.
history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table are used. Must be the same length and order as the source table’s aggregationKeys.
lookback_window (float) – Number of seconds in the past from the current time for start of the window.
lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.
- Returns
The feature group after the update has been applied
- Return type
FeatureGroup
- delete_point_in_time_group(self, group_name)
Deletes a point in time group.
- Parameters
group_name (str) – The name of the point in time group
- Returns
The feature group after the point in time group has been deleted
- Return type
FeatureGroup
- create_point_in_time_group_feature(self, group_name, name, expression)
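Creates a feature in a point in time group.
- Parameters
group_name (str) – The name of the point in time group.
name (str) – The name of the feature to add to the point in time group.
expression (str) – The SQL aggregate expression that defines the feature.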
- Returns
The feature group after the update has been applied
- Return type
FeatureGroup
- update_point_in_time_group_feature(self, group_name, name, expression)
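Updates a feature’s SQL expression in a point in time group.
- Parameters
group_name (str) – The name of the point in time group.
name (str) – The name of the feature to update.
expression (str) – The new SQL aggregate expression for the feature.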
- Returns
The feature group after the update has been applied
- Return type
FeatureGroup
- set_feature_type(self, feature, feature_type)
Sets a feature’s type in the feature group. Specify the feature name and feature type, and the method returns the feature group with the resulting changes reflected.
- Parameters
feature (str) – The name of the feature.
feature_type (str) – The machine learning type of the data in the feature: CATEGORICAL, CATEGORICAL_LIST, NUMERICAL, TIMESTAMP, TEXT, EMAIL, LABEL_LIST, JSON, or OBJECT_REFERENCE. Refer to the guide on feature types (https://api.abacus.ai/app/help/class/FeatureType) for more information. Note: Some FeatureMappings will restrict the options or explicitly set the FeatureType.
- Returns
The feature group after the feature type is applied
- Return type
FeatureGroup
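When several feature types are set at once, checking every requested type before the first call avoids leaving the group half-updated by a typo. This sketch assumes `feature_group` is a FeatureGroup from an authenticated client; the allowed set is copied from the feature_type list on this page (the linked guide may list more, and FeatureMappings can restrict it further), and `set_feature_types` is a hypothetical helper.

```python
from typing import Dict

# Copied from the feature_type list on this page; it may not be exhaustive.
FEATURE_TYPES = frozenset({
    "CATEGORICAL", "CATEGORICAL_LIST", "NUMERICAL", "TIMESTAMP", "TEXT",
    "EMAIL", "LABEL_LIST", "JSON", "OBJECT_REFERENCE",
})


def set_feature_types(feature_group, types: Dict[str, str]):
    """Validate every requested type first, then apply them one feature at a time."""
    unknown = {name: t for name, t in types.items() if t not in FEATURE_TYPES}
    if unknown:
        # Fail before any call so a typo cannot leave the group half-updated.
        raise ValueError(f"unsupported feature types: {unknown}")
    for name, feature_type in types.items():
        # set_feature_type returns the feature group after the type is applied.
        feature_group = feature_group.set_feature_type(name, feature_type)
    return feature_group
```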
- invalidate_streaming_data(self, invalid_before_timestamp)
Invalidates all streaming data with a timestamp before invalid_before_timestamp.
- Parameters
invalid_before_timestamp (int) – A Unix timestamp; any data with a timestamp before this time will be deleted.
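Since invalid_before_timestamp is an integer Unix timestamp, a cutoff expressed as a datetime has to be converted first. This sketch assumes the value is in whole seconds (the usual meaning of "Unix timestamp"; this page does not say) and that `feature_group` is a FeatureGroup from an authenticated client; `to_unix_seconds` and `invalidate_before` are hypothetical helpers.

```python
from datetime import datetime, timezone


def to_unix_seconds(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole Unix seconds."""
    if moment.tzinfo is None:
        # A naive datetime would be interpreted in local time; refuse it.
        raise ValueError("pass a timezone-aware datetime")
    return int(moment.timestamp())


def invalidate_before(feature_group, moment: datetime) -> int:
    """Invalidate streamed rows older than `moment` and return the cutoff used."""
    cutoff = to_unix_seconds(moment)
    feature_group.invalidate_streaming_data(cutoff)
    return cutoff


print(to_unix_seconds(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 1704067200
```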
- concatenate_data(self, source_feature_group_id, merge_type='UNION', replace_until_timestamp=None, skip_materialize=False)
Concatenates data from one feature group to another. Feature groups can be merged if their schemas are compatible, they have the special updateTimestampKey column, and, if set, the primaryKey column. The second operand in the concatenate operation is appended to the first operand (the merge target).
- Parameters
source_feature_group_id (str) – The feature group to concatenate with the destination feature group.
merge_type (str) – UNION or INTERSECTION
replace_until_timestamp (int) – The Unix timestamp specifying the point until which we will replace data from the source feature group.
skip_materialize (bool) – If true, the concatenated feature group will not be materialized.
- remove_concatenation_config(self)
Removes the concatenation config on a destination feature group.
- refresh(self)
Calls describe and refreshes the current object’s fields
- Returns
The current object
- Return type
FeatureGroup
- describe(self)
Describes the feature group.
- Returns
The feature group object.
- Return type
FeatureGroup
- set_indexing_config(self, primary_key=None, update_timestamp_key=None, lookup_keys=None)
Sets various attributes of the feature group used for deployment lookups and streaming updates.
- Parameters
primary_key (str) – Name of feature which defines the primary key of the feature group.
update_timestamp_key (str) – Name of feature which defines the update timestamp of the feature group - used in concatenation and primary key deduplication.
lookup_keys (list) – List of feature names which can be used in the lookup api to restrict the computation to a set of dataset rows. These feature names have to correspond to underlying dataset columns.
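update_timestamp_key is described as being used for primary key deduplication; the usual meaning is that, among rows sharing a primary key, the one with the newest update timestamp wins. The pure-Python sketch below shows that rule; it is an illustration, not Abacus.AI's implementation, `dedupe_by_primary_key` is a hypothetical name, and letting the later input row win on a timestamp tie is an assumption.

```python
from typing import Dict, List

Row = Dict[str, object]


def dedupe_by_primary_key(rows: List[Row], primary_key: str, update_timestamp_key: str) -> List[Row]:
    """Keep one row per primary key: the one with the newest update timestamp."""
    latest: Dict[object, Row] = {}
    for row in rows:
        current = latest.get(row[primary_key])
        # `>=` means that on a timestamp tie the later row in the input wins (an assumption).
        if current is None or row[update_timestamp_key] >= current[update_timestamp_key]:
            latest[row[primary_key]] = row
    return list(latest.values())


events = [
    {"id": 1, "updated_at": 1, "status": "new"},
    {"id": 2, "updated_at": 5, "status": "new"},
    {"id": 1, "updated_at": 3, "status": "shipped"},
    {"id": 1, "updated_at": 2, "status": "packed"},  # arrives late but is older
]
print(dedupe_by_primary_key(events, "id", "updated_at"))
```

Rows whose key already appeared keep their original position in the output, because replacing a dict value does not change insertion order.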
- update(self, description=None)
Modifies an existing feature group.
- Parameters
description (str) – The description of the feature group.
- Returns
The updated feature group object.
- Return type
FeatureGroup
- update_sql_definition(self, sql)
Updates the SQL statement for a feature group.
- Parameters
sql (str) – Input SQL statement for the feature group.
- Returns
The updated feature group
- Return type
FeatureGroup
- update_function_definition(self, function_source_code=None, function_name=None, input_feature_groups=None, cpu_size=None, memory=None)
Updates the function definition for a feature group created using createFeatureGroupFromFunction
- Parameters
function_source_code (str) – Contents of a valid source code file in a supported Feature Group specification language (currently only Python). The source code should contain a function called function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.
input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).
cpu_size (str) – CPU size for the feature group function
memory (int) – Memory (in GB) for the feature group function
- Returns
The updated feature group
- Return type
FeatureGroup
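Because function_source_code must contain a function named function_name, that can be checked locally with Python's `ast` module before the source is sent. This is a sketch: `defines_function` and `update_function` are hypothetical helpers, `feature_group` is assumed to be a FeatureGroup created from a function, and the example source (one input feature group arriving as a DataFrame) is illustrative only.

```python
import ast


def defines_function(source: str, function_name: str) -> bool:
    """True if `source` parses and defines a top-level function `function_name`."""
    tree = ast.parse(source)  # raises SyntaxError for invalid source
    return any(
        isinstance(node, ast.FunctionDef) and node.name == function_name
        for node in tree.body
    )


def update_function(feature_group, source: str, function_name: str,
                    input_feature_groups=None, memory=None):
    """Check the source locally, then call update_function_definition."""
    if not defines_function(source, function_name):
        raise ValueError(f"no top-level function named {function_name!r} in source")
    return feature_group.update_function_definition(
        function_source_code=source,
        function_name=function_name,
        input_feature_groups=input_feature_groups,
        memory=memory,
    )


# Hypothetical feature group function: its single input arrives as a DataFrame.
SOURCE = '''
def enrich(orders):
    orders["amount_usd"] = orders["amount_cents"] / 100
    return orders
'''
print(defines_function(SOURCE, "enrich"))  # True
```

The check only confirms the function exists; it does not execute the code, so errors inside the function body still surface at materialization time.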
- update_zip(self, function_name, module_name, input_feature_groups=None, cpu_size=None, memory=None)
Updates the zip for a feature group created using createFeatureGroupFromZip
- Parameters
function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.
module_name (str) – Path to the file with the feature group function.
input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).
cpu_size (str) – CPU size for the feature group function
memory (int) – Memory (in GB) for the feature group function
- Returns
The Upload object to upload the zip file to
- Return type
Upload
- update_feature(self, name, select_expression=None, new_name=None)
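Modifies an existing feature in the feature group. Specify the name of the feature and either a SQL select expression or a new name to update the feature.
- Parameters
name (str) – The name of the feature to be updated.
select_expression (str) – The new SQL select expression for the feature.
new_name (str) – The new name for the feature.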
- Returns
The updated feature group object.
- Return type
FeatureGroup
- list_exports(self)
Lists all of the exports of this feature group.
- Returns
The feature group exports
- Return type
- set_modifier_lock(self, locked=True)
Locks the feature group to prevent it from being modified.
- Parameters
locked (bool) – True to lock the feature group (disable modification), or False to unlock it (enable modification).
- list_modifiers(self)
Lists the users who can modify the feature group.
- Returns
The modification lock status, and the groups and organizations added to the feature group.
- Return type
- add_user_to_modifiers(self, email)
Adds a user to the feature group’s modifiers.
- Parameters
email (str) – The email address of the user to be added.
- add_organization_group_to_modifiers(self, organization_group_id)
Adds an organization group to the feature group’s modifiers.
- Parameters
organization_group_id (str) – The unique ID associated with the organization group.
- remove_user_from_modifiers(self, email)
Removes a user from the feature group’s modifiers.
- Parameters
email (str) – The email address of the user to be removed.
- remove_organization_group_from_modifiers(self, organization_group_id)
Removes an organization group from the feature group’s modifiers.
- Parameters
organization_group_id (str) – The unique ID associated with the organization group.
- delete_feature(self, name)
Removes an existing feature from the feature group. Specify the name of the feature to be deleted.
- Parameters
name (str) – The name of the feature to be deleted.
- Returns
The updated feature group object.
- Return type
FeatureGroup
- delete(self)
Removes an existing feature group.
- create_version(self, variable_bindings=None)
Creates a snapshot (version) of the feature group.
- Parameters
variable_bindings (dict) – A JSON object (map) defining variable bindings that override parent feature group values.
- Returns
A feature group version.
- Return type
FeatureGroupVersion
- list_versions(self, limit=100, start_after_version=None)
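Retrieves a list of all versions of this feature group.
- Parameters
limit (int) – The maximum number of feature group versions to return.
start_after_version (str) – The ID of the version after which the list starts.
- Returns
A list of feature group versions.
- Return type
list of FeatureGroupVersion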
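The limit and start_after_version parameters suggest cursor-style paging: request a page, then ask for the versions after the last one received. This sketch assumes start_after_version takes the ID of the last version from the previous page and that a page shorter than limit is the final one; because this page does not name the attribute holding a version's ID, the caller supplies `version_id` to extract it. `iter_all_versions` is a hypothetical helper.

```python
from typing import Any, Callable, Iterator, Optional


def iter_all_versions(feature_group, version_id: Callable[[Any], str],
                      page_size: int = 100) -> Iterator[Any]:
    """Yield every version by paging with limit / start_after_version."""
    start_after: Optional[str] = None
    while True:
        page = feature_group.list_versions(limit=page_size, start_after_version=start_after)
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing more to fetch
        start_after = version_id(page[-1])  # resume after the last version seen
```

Being a generator, it fetches pages lazily, so a caller that stops early (for example, after finding one version) makes no further requests.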
- get_recent_streamed_data(self)
Returns recently streamed data to a streaming feature group.
- upsert_data(self, streaming_token, data)
Updates the data in the feature group for a given lookup key recordId if that recordId is found; otherwise, inserts the new data into the feature group.
- append_data(self, streaming_token, data)
Appends new data into the feature group for a given lookup key recordId.
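The difference between upsert_data and append_data can be shown with a toy in-memory table keyed by recordId. This is a hypothetical model, not the streaming service: `InMemoryStream` is an illustrative name, streaming_token and the real data format are not modeled, and replacing the whole record on upsert (rather than merging fields) is an assumption.

```python
from typing import Dict, List

Row = Dict[str, object]


class InMemoryStream:
    """Toy model contrasting upsert and append, keyed by a lookup key (hypothetical)."""

    def __init__(self, lookup_key: str = "recordId") -> None:
        self.lookup_key = lookup_key
        self.rows: List[Row] = []

    def upsert(self, data: Row) -> None:
        for i, row in enumerate(self.rows):
            if row[self.lookup_key] == data[self.lookup_key]:
                self.rows[i] = data  # record found: replace it in place
                return
        self.rows.append(data)  # record not found: insert it

    def append(self, data: Row) -> None:
        self.rows.append(data)  # always insert, even if the key already exists


upserted, appended = InMemoryStream(), InMemoryStream()
for event in ({"recordId": "u1", "score": 1}, {"recordId": "u1", "score": 2}):
    upserted.upsert(event)
    appended.append(event)
print(upserted.rows)  # [{'recordId': 'u1', 'score': 2}]
print(appended.rows)  # [{'recordId': 'u1', 'score': 1}, {'recordId': 'u1', 'score': 2}]
```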
- wait_for_dataset(self, timeout=7200)
Waits until the feature group’s dataset, if any, is ready for use.
- Parameters
timeout (int, optional) – The maximum time, in seconds, to wait for the call to finish; if it does not finish in that time, the call times out. Defaults to 7200 seconds.
- wait_for_upload(self, timeout=7200)
Waits for a feature group created from a dataframe to be ready for materialization and version creation.
- Parameters
timeout (int, optional) – The maximum time, in seconds, to wait for the call to finish; if it does not finish in that time, the call times out. Defaults to 7200 seconds.
- wait_for_materialization(self, timeout=7200)
Waits until the feature group is materialized.
- Parameters
timeout (int, optional) – The maximum time, in seconds, to wait for the call to finish; if it does not finish in that time, the call times out. Defaults to 7200 seconds.
- wait_for_streaming_ready(self, timeout=600)
Waits for the feature group indexing config to be applied for streaming.
- Parameters
timeout (int, optional) – The maximum time, in seconds, to wait for the call to finish; if it does not finish in that time, the call times out. Defaults to 600 seconds.
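The wait_for_* methods share a timeout contract: keep checking until the resource is ready, and give up after timeout seconds. Their internals are not documented here, so the following is only a generic polling sketch of that contract: `wait_until` is a hypothetical helper, `get_status` could wrap something like `lambda: fg.get_status()` if that returns a status string (an assumption), and the status names are placeholders rather than documented values.

```python
import time
from typing import Callable, Collection


def wait_until(get_status: Callable[[], str], done: Collection[str],
               failed: Collection[str] = (), timeout: float = 7200,
               poll_interval: float = 5.0) -> str:
    """Poll `get_status` until it reports a done status, a failed status, or time runs out."""
    deadline = time.monotonic() + timeout  # monotonic clock ignores wall-clock jumps
    while True:
        status = get_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"operation failed with status {status!r}")
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        time.sleep(min(poll_interval, remaining))  # never sleep past the deadline


statuses = iter(["PENDING", "PENDING", "COMPLETE"])  # placeholder status names
print(wait_until(lambda: next(statuses), done={"COMPLETE"}, poll_interval=0))  # COMPLETE
```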
- get_status(self, streaming_status=False)
Gets the status of the feature group.
- load_as_pandas(self)
Loads the feature group into a pandas DataFrame.
- Returns
A pandas dataframe with annotations and text_snippet columns.
- Return type
DataFrame