abacusai.client
Module Contents
Classes
ClientOptions – Options for configuring the ApiClient
BaseApiClient – Abstract Base API Client
ApiClient – Abacus.AI API Client
Functions
- abacusai.client._requests_retry_session(retries=5, backoff_factor=0.1, status_forcelist=(502, 504), session=None)
- abacusai.client._discover_service_url(service_discovery_url, client_version, deployment_id, deployment_token)
- abacusai.client._get_service_discovery_url()
- class abacusai.client.ClientOptions(exception_on_404=True, server='https://api.abacus.ai')
Options for configuring the ApiClient
- exception abacusai.client.ApiException(message, http_status, exception=None)
Bases:
Exception
Default ApiException raised by APIs
- Parameters
- __str__(self)
Return str(self).
- class abacusai.client.BaseApiClient(api_key=None, server=None, client_options=None, skip_version_check=False)
Abstract Base API Client
- Parameters
api_key (str) – The api key to use as authentication to the server
server (str) – The base server url to use to send API requests to
client_options (ClientOptions) – Optional API client configurations
skip_version_check (bool) – If true, will skip checking the server’s current API version on initializing the client
- client_version = 0.34.1
- _clean_api_objects(self, obj)
- _call_api(self, action, method, query_params=None, body=None, files=None, parse_type=None, streamable_response=False, server_override=None)
- _build_class(self, return_class, values)
- _request(self, url, method, query_params=None, headers=None, body=None, files=None, stream=False)
- _poll(self, obj, wait_states, delay=5, timeout=300, poll_args={})
- _upload_from_df(self, upload, df)
- class abacusai.client.ApiClient(api_key=None, server=None, client_options=None, skip_version_check=False)
Bases:
BaseApiClient
Abacus.AI API Client
- Parameters
api_key (str) – The api key to use as authentication to the server
server (str) – The base server url to use to send API requests to
client_options (ClientOptions) – Optional API client configurations
skip_version_check (bool) – If true, will skip checking the server’s current API version on initializing the client
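For orientation, a minimal sketch of instantiating the client; the API key and any server override shown here are placeholder values:

    from abacusai.client import ApiClient, ClientOptions

    # Instantiate the client with an API key (placeholder value shown here).
    client = ApiClient(api_key='YOUR_API_KEY')

    # Optionally customize behavior via ClientOptions, e.g. disable raising on 404s
    # or point at a different server.
    options = ClientOptions(exception_on_404=False, server='https://api.abacus.ai')
    client = ApiClient(api_key='YOUR_API_KEY', client_options=options)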
- create_dataset_from_pandas(self, feature_group_table_name, df, name=None)
Creates a Dataset from a pandas dataframe
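A rough illustration (the table and column names are invented) of registering an in-memory dataframe as a dataset:

    import pandas as pd
    # client is an ApiClient instance, as in the instantiation sketch above.

    df = pd.DataFrame({'user_id': [1, 2, 3], 'amount': [9.99, 4.50, 12.00]})

    # Uploads the dataframe and creates a dataset backing the given feature group table name.
    dataset = client.create_dataset_from_pandas(
        feature_group_table_name='demo_transactions', df=df)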
- create_dataset_version_from_pandas(self, table_name_or_id, df)
Updates an existing dataset from a pandas dataframe
- create_model_from_functions(self, project_id, train_function, predict_function, training_input_tables=None)
Creates a model from a python function
- Parameters
project_id (str) – The project to create the model in
train_function (callable) – The training function callable to serialize and upload
predict_function (callable) – The predict function callable to serialize and upload
training_input_tables (list) – The input table names of the feature groups to pass to the train function
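A shape sketch only, assuming a single input table named 'demo_transactions' and a placeholder project ID; how the platform invokes these callables (argument order, model serialization) is not spelled out here, so treat the function bodies as illustrative:

    def train(demo_transactions):
        # demo_transactions is assumed to arrive as a materialized pandas DataFrame.
        # Return any serializable object representing the trained model.
        return {'mean_amount': demo_transactions['amount'].mean()}

    def predict(model, query):
        # model is whatever train() returned; query is the prediction request payload.
        return {'predicted_amount': model['mean_amount']}

    model = client.create_model_from_functions(
        project_id='PROJECT_ID',  # placeholder
        train_function=train,
        predict_function=predict,
        training_input_tables=['demo_transactions'])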
- add_user_to_organization(self, email)
Invites a user to your organization. This method will send the specified email address an invitation link to join your organization.
- Parameters
email (str) – The email address to invite to your Organization.
- list_api_keys(self)
Lists all of the user’s API keys in the user’s organization.
- Returns
List of API Keys for this user.
- Return type
- list_organization_users(self)
Retrieves a list of all users in the organization.
This method will retrieve a list containing all the users in the organization. The list includes pending users who have been invited to the organization.
- Returns
Array of all of the users in the Organization
- Return type
- describe_user(self)
Get the current user’s information, such as their name, email, admin status, etc.
- Returns
Information about the current User
- Return type
- list_organization_groups(self)
Lists all Organization Groups within this Organization
- Returns
List of Groups in this Organization
- Return type
- create_organization_group(self, group_name, permissions, default_group=False)
Creates a new Organization Group.
- Parameters
- Returns
Information about the created Organization Group
- Return type
- describe_organization_group(self, organization_group_id)
Returns the specific organization group passed in by the user.
- Parameters
organization_group_id (str) – The unique ID of the organization group that needs to be described.
- Returns
Information about a specific Organization Group
- Return type
- add_organization_group_permission(self, organization_group_id, permission)
Adds a permission to the specified Organization Group
- remove_organization_group_permission(self, organization_group_id, permission)
Removes a permission from the specified Organization Group
- delete_organization_group(self, organization_group_id)
Deletes the specified Organization Group from the organization.
- Parameters
organization_group_id (str) – The ID of the Organization Group
- add_user_to_organization_group(self, organization_group_id, email)
Adds a user to the specified Organization Group
- remove_user_from_organization_group(self, organization_group_id, email)
Removes a user from an Organization Group
- set_default_organization_group(self, organization_group_id)
Sets the default Organization Group to which all new users joining the organization are automatically added
- Parameters
organization_group_id (str) – The ID of the Organization Group
- delete_api_key(self, api_key_id)
Delete a specified API Key. You can use the “listApiKeys” method to find the list of all API Key IDs.
- Parameters
api_key_id (str) – The ID of the API key to delete.
- remove_user_from_organization(self, email)
Removes the specified user from the Organization. You can remove yourself; otherwise, you must be an Organization Administrator to use this method to remove other users from the organization.
- Parameters
email (str) – The email address of the user to remove from the Organization.
- create_project(self, name, use_case)
Creates a project with your specified project name and use case. Creating a project creates a container for all of the datasets and the models that are associated with a particular problem/project that you would like to work on. For example, if you want to create a model to detect fraud, you have to first create a project, upload datasets, create feature groups, and then create one or more models to get predictions for your use case.
- Parameters
name (str) – The project’s name
use_case (str) – The use case that the project solves. You can refer to our (guide on use cases)[https://api.abacus.ai/app/help/useCases] for further details of each use case. The following enums are currently available for you to choose from: NLP_HYBRID, LANGUAGE_DETECTION, NLP_SENTIMENT, NLP_QA, NLP_SEARCH, NLP_SENTENCE_BOUNDARY_DETECTION, NLP_CLASSIFICATION, NLP_DOCUMENT_VISUALIZATION, ANOMALYEVENTSTREAM, UCPLUGANDPLAY, EMBEDDINGS_ONLY, MODEL_WITH_EMBEDDINGS, TORCH_MODEL_WITH_EMBEDDINGS, PYTHON_MODEL, DOCKER_MODEL, DOCKER_MODEL_WITH_EMBEDDINGS, CUSTOMER_CHURN, ENERGY, FINANCIAL_METRICS, FRAUD_ACCOUNT, FRAUD_THREAT, FRAUD_TRANSACTIONS, OPERATIONS_CLOUD, CLOUD_SPEND, TIMESERIES_ANOMALY_DETECTION, OPERATIONS_MAINTENANCE, OPERATIONS_INCIDENT, PERS_PROMOTIONS, PREDICTING, FEATURE_STORE, RETAIL, SALES_FORECASTING, SALES_SCORING, FEED_RECOMMEND, USER_RANKINGS, NAMED_ENTITY_RECOGNITION, USER_ITEM_AFFINITY, USER_RECOMMENDATIONS, USER_RELATED, DECO_REALTIME, DECO_HOSTMON, DECO_VECTOR, DECO_EXPDEB, VISION_SEGMENTATION, VISION, VISION_HYBRID, FEATURE_DRIFT.
- Returns
This object represents the newly created project. For details refer to
- Return type
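For example (the project name is arbitrary; the use case enum must come from the list above):

    # Create a container project for a forecasting use case.
    project = client.create_project(name='Demand Forecasting', use_case='SALES_FORECASTING')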
- list_use_cases(self)
Retrieves a list of all use cases with descriptions. Use the given mappings to specify a use case when needed.
- Returns
A list of UseCase objects describing all the use cases addressed by the platform. For details, please refer to
- Return type
- describe_use_case_requirements(self, use_case)
This API call returns the feature requirements for a specified use case
- Parameters
use_case (str) – This will contain the Enum String for the use case whose dataset requirements are needed.
- Returns
The feature requirements of the use case are returned. This includes all the feature groups required for the use case along with their descriptions and feature mapping details.
- Return type
- describe_project(self, project_id)
Returns a description of a project.
- list_projects(self, limit=100, start_after_id=None)
Retrieves a list of all projects in the current organization.
- list_project_datasets(self, project_id)
Retrieves all dataset(s) attached to a specified project. This API returns all attributes of each dataset, such as its name, type, and ID.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array representing all of the datasets attached to the project.
- Return type
- get_schema(self, project_id, dataset_id)
[DEPRECATED] Returns a schema given a specific dataset in a project. The schema of the dataset consists of the columns in the dataset, the data type of the column, and the column’s column mapping.
- rename_project(self, project_id, name)
This method renames a project after it is created.
- delete_project(self, project_id)
Deletes a specified project from your organization.
This method deletes the project, trained models and deployments in the specified project. The datasets attached to the specified project remain available for use with other projects in the organization.
This method will not delete a project that contains active deployments. Be sure to stop all active deployments before you use the delete option.
Note: All projects, models, and deployments cannot be recovered once they are deleted.
- Parameters
project_id (str) – The unique ID of the project to delete.
- add_feature_group_to_project(self, feature_group_id, project_id, feature_group_type='CUSTOM_TABLE', feature_group_use=None)
Adds a feature group to a project.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
project_id (str) – The unique ID associated with the project.
feature_group_type (str) – The feature group type of the feature group. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under personalized recommendation use case.
feature_group_use (str) – The user-assigned feature group use, which allows for organizing project feature groups. Allowed values: DATA_WRANGLING, TRAINING_INPUT, BATCH_PREDICTION_INPUT
- remove_feature_group_from_project(self, feature_group_id, project_id)
Removes a feature group from a project.
- set_feature_group_type(self, feature_group_id, project_id, feature_group_type='CUSTOM_TABLE')
Update the feature group type in a project. The feature group must already be added to the project.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
project_id (str) – The unique ID associated with the project.
feature_group_type (str) – The feature group type to set the feature group as. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under personalized recommendation use case.
- use_feature_group_for_training(self, feature_group_id, project_id, use_for_training=True)
Use the feature group for model training input
- Parameters
- set_feature_mapping(self, project_id, feature_group_id, feature_name, feature_mapping, nested_column_name=None)
Set a column’s feature mapping. If the column mapping is single-use and already set in another column in this feature group, this call will first remove the other column’s mapping and move it to this column.
- Parameters
project_id (str) – The unique ID associated with the project.
feature_group_id (str) – The unique ID associated with the feature group.
feature_name (str) – The name of the feature.
feature_mapping (str) – The mapping of the feature in the feature group.
nested_column_name (str) – The name of the nested column.
- Returns
A list of objects that describes the resulting feature group’s schema after the feature’s featureMapping is set.
- Return type
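A minimal sketch with placeholder IDs; 'TARGET' is used purely as an example mapping, since the valid feature mappings depend on the project’s use case:

    schema = client.set_feature_mapping(
        project_id='PROJECT_ID',              # placeholder
        feature_group_id='FEATURE_GROUP_ID',  # placeholder
        feature_name='amount',
        feature_mapping='TARGET')             # example mapping; varies by use case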
- validate_project(self, project_id)
Validates that the specified project has all required feature group types for its use case and that all required feature columns are set.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
The project validation. If the specified project is missing required columns or feature groups, the response includes an array of objects for each missing required feature group and the missing required features in each feature group.
- Return type
- set_column_data_type(self, project_id, dataset_id, column, data_type)
Set a dataset’s column type.
- Parameters
project_id (str) – The unique ID associated with the project.
dataset_id (str) – The unique ID associated with the dataset.
column (str) – The name of the column.
data_type (str) – The type of the data in the column. CATEGORICAL, CATEGORICAL_LIST, NUMERICAL, TIMESTAMP, TEXT, EMAIL, LABEL_LIST, JSON, OBJECT_REFERENCE Refer to the (guide on feature types)[https://api.abacus.ai/app/help/class/FeatureType] for more information. Note: Some ColumnMappings will restrict the options or explicitly set the DataType.
- Returns
A list of objects that describes the resulting dataset’s schema after the column’s dataType is set.
- Return type
- set_column_mapping(self, project_id, dataset_id, column, column_mapping)
Set a dataset’s column mapping. If the column mapping is single-use and already set in another column in this dataset, this call will first remove the other column’s mapping and move it to this column.
- Parameters
- Returns
A list of columns that describes the resulting dataset’s schema after the column’s columnMapping is set.
- Return type
- remove_column_mapping(self, project_id, dataset_id, column)
Removes a column mapping from a column in the dataset. Returns a list of all columns with their mappings once the change is made.
- Parameters
- Returns
A list of objects that describes the resulting dataset’s schema after the column’s columnMapping is set.
- Return type
- create_feature_group(self, table_name, sql, description=None)
Creates a new feature group from a SQL statement.
- Parameters
- Returns
The created feature group
- Return type
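For instance, assuming a source table named 'demo_transactions' already exists in the organization:

    fg = client.create_feature_group(
        table_name='user_spend_summary',
        sql='SELECT user_id, SUM(amount) AS total_spend FROM demo_transactions GROUP BY user_id',
        description='Total spend per user')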
- create_feature_group_from_function(self, table_name, function_source_code, function_name, input_feature_groups=[], description=None)
Creates a new Feature Group from user-provided code. Code language currently supported is Python.
If a list of input feature groups is supplied, the materialized feature groups for those inputs will be provided as arguments to the function as DataFrames (pandas in the case of Python).
This method expects function_source_code to be a valid language source file which contains a function named function_name. This function needs to return a DataFrame when it is executed, and this DataFrame will be used as the materialized version of this feature group table.
- Parameters
table_name (str) – The unique name to be given to the feature group.
function_source_code (str) – Contents of a valid source code file in a supported Feature Group specification language (currently only Python). The source code should contain a function called function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.
input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each of the parameters are materialized Dataframes (same type as the functions return value).
description (str) – The description for this feature group.
- Returns
The created feature group
- Return type
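A sketch of the expected call shape, assuming an existing input table named 'demo_transactions'; the allowed imports inside the uploaded source are governed by the user functions documentation, so the pandas usage here is illustrative:

    import textwrap

    function_code = textwrap.dedent('''
        import pandas as pd

        def build_features(demo_transactions):
            # demo_transactions is a materialized pandas DataFrame.
            out = demo_transactions.groupby('user_id', as_index=False)['amount'].sum()
            return out.rename(columns={'amount': 'total_spend'})
    ''')

    fg = client.create_feature_group_from_function(
        table_name='user_spend_from_function',
        function_source_code=function_code,
        function_name='build_features',
        input_feature_groups=['demo_transactions'],
        description='Aggregated spend computed in Python')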
- create_sampling_feature_group(self, feature_group_id, table_name, sampling_config, description=None)
Creates a new feature group defined as a sample of rows from another feature group.
For efficiency, sampling is approximate unless otherwise specified (e.g., the number of rows may vary slightly from what was requested).
- Parameters
feature_group_id (str) – The unique ID associated with the pre-existing feature group that will be sampled by this new feature group. I.e. the input for sampling.
table_name (str) – The unique name to be given to this sampling feature group.
sampling_config (dict) – JSON object (aka map) defining the sampling method and its parameters.
description (str) – A human-readable description of this feature group.
- Returns
The created feature group.
- Return type
- create_merge_feature_group(self, source_feature_group_id, table_name, merge_config, description=None)
Creates a new feature group defined as the union of other feature group versions.
- Parameters
- Returns
The created feature group.
- Return type
- set_feature_group_sampling_config(self, feature_group_id, sampling_config)
Set a FeatureGroup’s sampling to the config values provided, so that the rows the FeatureGroup returns will be a sample of those it would otherwise have returned.
Currently, sampling is supported only for Sampling FeatureGroups, so this API can only be called on that kind of FeatureGroup.
- Parameters
- Returns
The updated feature group.
- Return type
- set_feature_group_merge_config(self, feature_group_id, merge_config)
Set a MergeFeatureGroup’s merge config to the values provided, so that the feature group only returns a bounded range of an incremental dataset.
- set_feature_group_schema(self, feature_group_id, schema)
Creates a new schema and points the feature group to the new feature group schema id.
- get_feature_group_schema(self, feature_group_id, project_id=None)
Returns a schema given a specific FeatureGroup in a project.
- create_feature(self, feature_group_id, name, select_expression)
Creates a new feature in a Feature Group from a SQL select statement
- Parameters
- Returns
A feature group object with the newly added feature.
- Return type
- add_feature_group_tag(self, feature_group_id, tag)
Adds a tag to the feature group
- remove_feature_group_tag(self, feature_group_id, tag)
Removes a tag from the feature group
- create_nested_feature(self, feature_group_id, nested_feature_name, table_name, using_clause, where_clause=None, order_clause=None)
Creates a new nested feature in a feature group from a SQL statement.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
nested_feature_name (str) – The name of the feature.
table_name (str) – The table name of the feature group to nest
using_clause (str) – The SQL join column or logic to join the nested table with the parent
where_clause (str) – A SQL where statement to filter the nested rows
order_clause (str) – A SQL clause to order the nested rows
- Returns
A feature group object with the newly added nested feature.
- Return type
- update_nested_feature(self, feature_group_id, nested_feature_name, table_name=None, using_clause=None, where_clause=None, order_clause=None, new_nested_feature_name=None)
Updates a previously existing nested feature in a feature group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
nested_feature_name (str) – The name of the feature to be updated.
table_name (str) – The name of the table.
using_clause (str) – The SQL join column or logic to join the nested table with the parent
where_clause (str) – A SQL where statement to filter the nested rows
order_clause (str) – A SQL clause to order the nested rows
new_nested_feature_name (str) – New name for the nested feature.
- Returns
A feature group object with the updated nested feature.
- Return type
- delete_nested_feature(self, feature_group_id, nested_feature_name)
Delete a nested feature.
- Parameters
- Returns
A feature group object without the deleted nested feature.
- Return type
- create_point_in_time_feature(self, feature_group_id, feature_name, history_table_name, aggregation_keys, timestamp_key, historical_timestamp_key, expression, lookback_window_seconds=None, lookback_window_lag_seconds=0, lookback_count=None, lookback_until_position=0)
Creates a new point in time feature in a feature group using another historical feature group, window spec and aggregate expression.
We use the aggregation keys and either the lookbackWindowSeconds or the lookbackCount values to perform the window aggregation for every row in the current feature group. If the window is specified in seconds, then all rows in the history table which match the aggregation keys and whose historicalTimeFeature is >= lookbackStartCount and < the value of the current row’s timeFeature are considered. An optional lookbackWindowLagSeconds (positive or negative) can be used to offset the current value of the timeFeature. If this value is negative, we will look at the future rows in the history table, so care must be taken to make sure that these rows are available in the online context when we are performing a lookup on this feature group. If the window is specified in counts, then we order the historical table rows, aligning by time, and consider rows from the window where the rank order is >= lookbackCount, including the row just prior to the current one. The lag is specified in terms of positions using lookbackUntilPosition.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
feature_name (str) – The name of the feature to create
history_table_name (str) – The table name of the history table.
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature
historical_timestamp_key (str) – Name of feature which contains the historical timestamp.
expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.
lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.
lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.
- Returns
A feature group object with the newly added nested feature.
- Return type
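For illustration (all IDs, table names, and feature names are placeholders), a 30-day rolling sum keyed by user:

    fg = client.create_point_in_time_feature(
        feature_group_id='FEATURE_GROUP_ID',   # placeholder
        feature_name='spend_last_30d',
        history_table_name='demo_transactions',
        aggregation_keys=['user_id'],
        timestamp_key='event_time',
        historical_timestamp_key='event_time',
        expression='SUM(amount)',
        lookback_window_seconds=30 * 24 * 3600)  # 30-day window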
- update_point_in_time_feature(self, feature_group_id, feature_name, history_table_name=None, aggregation_keys=None, timestamp_key=None, historical_timestamp_key=None, expression=None, lookback_window_seconds=None, lookback_window_lag_seconds=None, lookback_count=None, lookback_until_position=None, new_feature_name=None)
Updates an existing point in time feature in a feature group. See createPointInTimeFeature for detailed semantics.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
feature_name (str) – The name of the feature.
history_table_name (str) – The table name of the history table. If not specified, we use the current table to do a self join.
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature
historical_timestamp_key (str) – Name of feature which contains the historical timestamp.
expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.
lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.
lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.
new_feature_name (str) – New name for the point in time feature.
- Returns
A feature group object with the newly added nested feature.
- Return type
- set_feature_type(self, feature_group_id, feature, feature_type)
Set a feature’s type in a feature group. Specify the feature group ID, feature name, and feature type, and the method will return the new column with the resulting changes reflected.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
feature (str) – The name of the feature.
feature_type (str) – The machine learning type of the data in the feature. CATEGORICAL, CATEGORICAL_LIST, NUMERICAL, TIMESTAMP, TEXT, EMAIL, LABEL_LIST, JSON, OBJECT_REFERENCE Refer to the (guide on feature types)[https://api.abacus.ai/app/help/class/FeatureType] for more information. Note: Some FeatureMappings will restrict the options or explicitly set the FeatureType.
- Returns
The feature group after the data_type is applied
- Return type
- invalidate_streaming_feature_group_data(self, feature_group_id, invalid_before_timestamp)
Invalidates all streaming data with timestamp before invalidBeforeTimestamp
- concatenate_feature_group_data(self, feature_group_id, source_feature_group_id, merge_type='UNION', replace_until_timestamp=None, skip_materialize=False)
Concatenates data from one feature group to another. Feature groups can be merged if their schemas are compatible, they have the special updateTimestampKey column, and, if set, the primaryKey column. The second operand in the concatenate operation will be appended to the first operand (merge target).
- Parameters
feature_group_id (str) – The destination feature group.
source_feature_group_id (str) – The feature group to concatenate with the destination feature group.
merge_type (str) – UNION or INTERSECTION
replace_until_timestamp (int) – The Unix timestamp specifying the point up to which we will replace data from the source feature group.
skip_materialize (bool) – If true, will not materialize the concatenated feature group
- describe_feature_group(self, feature_group_id)
Describe a Feature Group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
- Returns
The feature group object.
- Return type
- describe_feature_group_by_table_name(self, table_name)
Describe a Feature Group by the feature group’s table name
- Parameters
table_name (str) – The unique table name of the Feature Group to lookup
- Returns
The Feature Group
- Return type
- set_feature_group_indexing_config(self, feature_group_id, primary_key=None, update_timestamp_key=None, lookup_keys=None)
Sets various attributes of the feature group used for deployment lookups and streaming updates.
- Parameters
feature_group_id (str) – The feature group
primary_key (str) – Name of feature which defines the primary key of the feature group.
update_timestamp_key (str) – Name of feature which defines the update timestamp of the feature group - used in concatenation and primary key deduplication.
lookup_keys (list) – List of feature names which can be used in the lookup api to restrict the computation to a set of dataset rows. These feature names have to correspond to underlying dataset columns.
- list_feature_groups(self, limit=100, start_after_id=None)
Retrieves a list of all feature groups in the organization.
- Parameters
- Returns
All the feature groups in the organization
- Return type
- list_project_feature_groups(self, project_id, filter_feature_group_use=None)
List all the feature groups associated with a project
- Parameters
project_id (str) – The unique ID associated with the project.
filter_feature_group_use (str) – The feature group use filter; when given as an argument, only feature groups in this project with the given use are returned. Allowed values: DATA_WRANGLING, TRAINING_INPUT, BATCH_PREDICTION_INPUT, BATCH_PREDICTION_OUTPUT
- Returns
All the Feature Groups in the Organization
- Return type
- update_feature_group(self, feature_group_id, description=None)
Modifies an existing feature group
- Parameters
- Returns
The updated feature group object.
- Return type
- update_feature_group_sql_definition(self, feature_group_id, sql)
Updates the SQL statement for a feature group.
- Parameters
- Returns
The updated feature group
- Return type
- update_feature_group_function_definition(self, feature_group_id, function_source_code=None, function_name=None, input_feature_groups=None)
Updates the function definition for a feature group created using createFeatureGroupFromFunction
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
function_source_code (str) – Contents of a valid source code file in a supported Feature Group specification language (currently only Python). The source code should contain a function called function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.
input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each of the parameters are materialized Dataframes (same type as the functions return value).
- Returns
The updated feature group
- Return type
- update_feature(self, feature_group_id, name, select_expression=None, new_name=None)
Modifies an existing feature in a feature group. A user needs to specify the name and feature group ID and either a SQL statement or new name to update the feature.
- Parameters
- Returns
The updated feature group object.
- Return type
- export_feature_group_version_to_file_connector(self, feature_group_version, location, export_file_format, overwrite=False)
Export Feature group to File Connector.
- Parameters
- Returns
The FeatureGroupExport instance
- Return type
- export_feature_group_version_to_database_connector(self, feature_group_version, database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None)
Export Feature group to Database Connector.
- Parameters
feature_group_version (str) – The Feature Group instance id to export.
database_connector_id (str) – Database connector to export to.
object_name (str) – The database object to write to
write_mode (str) – Either INSERT or UPSERT
database_feature_mapping (dict) – A key/value pair JSON Object of “database connector column” -> “feature name” pairs.
id_column (str) – Required if mode is UPSERT. Indicates which database column should be used as the lookup key for UPSERT
- Returns
The FeatureGroupExport instance
- Return type
- describe_feature_group_export(self, feature_group_export_id)
Describes a feature group export.
- Parameters
feature_group_export_id (str) – The ID of the feature group export.
- Returns
The feature group export
- Return type
- list_feature_group_exports(self, feature_group_id)
Lists all of the feature group exports for a given feature group
- Parameters
feature_group_id (str) – The ID of the feature group
- Returns
The feature group exports
- Return type
- set_feature_group_modifier_lock(self, feature_group_id, locked=True)
Locks a feature group to prevent it from being modified.
- list_feature_group_modifiers(self, feature_group_id)
Lists the users who can modify a feature group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
- Returns
Modification lock status and groups and organizations added to the feature group.
- Return type
- add_user_to_feature_group_modifiers(self, feature_group_id, email)
Adds a user to a feature group’s modifiers.
- add_organization_group_to_feature_group_modifiers(self, feature_group_id, organization_group_id)
Adds an Organization Group to a feature group’s modifiers.
- remove_user_from_feature_group_modifiers(self, feature_group_id, email)
Removes a user from a feature group’s modifiers.
- remove_organization_group_from_feature_group_modifiers(self, feature_group_id, organization_group_id)
Removes an Organization Group from a feature group’s modifiers.
- delete_feature(self, feature_group_id, name)
Removes an existing feature from a feature group. A user needs to specify the name of the feature to be deleted and the feature group ID.
- Parameters
- Returns
The updated feature group object.
- Return type
- delete_feature_group(self, feature_group_id)
Removes an existing feature group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
- create_feature_group_version(self, feature_group_id)
Creates a snapshot for a specified feature group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
- Returns
A feature group version.
- Return type
- get_materialization_logs(self, feature_group_version, stdout=False, stderr=False)
Returns logs for a materialized feature group version.
- Parameters
- Returns
The function logs.
- Return type
- list_feature_group_versions(self, feature_group_id, limit=100, start_after_version=None)
Retrieves a list of all feature group versions for the specified feature group.
- Parameters
- Returns
An array of feature group versions.
- Return type
- describe_feature_group_version(self, feature_group_version)
Get a specific feature group version.
- Parameters
feature_group_version (str) – The unique ID associated with the feature group version.
- Returns
A feature group version.
- Return type
- upload_part(self, upload_id, part_number, part_data)
Uploads a part of a large dataset file from your bucket to our system. Our system currently supports a size of up to 5GB for a part of a full file and a size of up to 5TB for the full file. Note that each part must be >=5MB in size, unless it is the last part in the sequence of parts for the full file.
- Parameters
upload_id (str) – A unique identifier for this upload
part_number (int) – The 1-indexed number denoting the position of the file part in the sequence of parts for the full file.
part_data (io.TextIOBase) – The multipart/form-data for the current part of the full file.
- Returns
The object ‘UploadPart’ which encapsulates the hash and the etag for the part that got uploaded.
- Return type
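A rough multipart upload loop, assuming an upload ID obtained from createDatasetFromUpload and a local file path (both placeholders); each chunk is wrapped in a BytesIO to provide a file-like object, which is an assumption about the accepted part_data type:

    import io

    CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB parts; every part except the last must be >= 5 MB

    part_number = 1
    with open('/path/to/large_file.csv', 'rb') as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            client.upload_part(upload_id='UPLOAD_ID',      # placeholder
                               part_number=part_number,
                               part_data=io.BytesIO(chunk))
            part_number += 1

    client.mark_upload_complete(upload_id='UPLOAD_ID')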
- mark_upload_complete(self, upload_id)
Marks an upload process as complete.
- create_dataset_from_file_connector(self, name, table_name, location, file_format=None, refresh_schedule=None, csv_delimiter=None, filename_column=None, start_prefix=None, until_prefix=None, location_date_format=None, date_format_lookback_days=None, merge_config=None)
Creates a dataset from a file located in a cloud storage, such as Amazon AWS S3, using the specified dataset name and location.
- Parameters
name (str) – The name for the dataset.
table_name (str) – Organization-unique table name or the name of the feature group table to create using the source table.
location (str) – The URI location format of the dataset source. The URI location format needs to match the location_date_format when location_date_format is specified, e.g. location = s3://bucket1/dir1/dir2/event_date=YYYY-MM-DD/*. The URI location format needs to include both the start_prefix and until_prefix when both are specified, e.g. location s3://bucket1/dir1/* includes both s3://bucket1/dir1/dir2/event_date=2021-08-02/* and s3://bucket1/dir1/dir2/event_date=2021-08-08/*
file_format (str) – The file format of the dataset.
refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.
csv_delimiter (str) – If the file format is CSV, use a specific csv delimiter.
filename_column (str) – Adds a new column to the dataset with the external URI path.
start_prefix (str) – The start prefix (inclusive) for a range based search on a cloud storage location URI.
until_prefix (str) – The end prefix (exclusive) for a range based search on a cloud storage location URI.
location_date_format (str) – The date format in which the data is partitioned in the cloud storage location. E.g., if the data is partitioned as s3://bucket1/dir1/dir2/event_date=YYYY-MM-DD/dir4/filename.parquet, then the location_date_format is YYYY-MM-DD This format needs to be consistent across all files within the specified location.
date_format_lookback_days (int) – The number of days to look back from the current day for import locations that are date partitioned. E.g., import date, 2021-06-04, with date_format_lookback_days = 3 will retrieve data for all the dates in the range [2021-06-02, 2021-06-04].
merge_config (dict) – Struct for specifying an incremental dataset and the merge rule associated with it.
- Returns
The dataset created.
- Return type
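For example, importing CSV files from an already-verified bucket (the URI and names are placeholders):

    dataset = client.create_dataset_from_file_connector(
        name='Transactions',
        table_name='demo_transactions',
        location='s3://my-bucket/transactions/*.csv',
        file_format='CSV',
        csv_delimiter=',',
        refresh_schedule='0 6 * * *')  # re-import daily at 06:00 UTC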
- create_dataset_version_from_file_connector(self, dataset_id, location=None, file_format=None, csv_delimiter=None)
Creates a new version of the specified dataset.
- Parameters
dataset_id (str) – The unique ID associated with the dataset.
location (str) – A new external URI to import the dataset from. If not specified, the last location will be used.
file_format (str) – The fileFormat to be used. If not specified, the service will try to detect the file format.
csv_delimiter (str) – If the file format is CSV, use a specific csv delimiter.
- Returns
The new Dataset Version created.
- Return type
- create_dataset_from_database_connector(self, name, table_name, database_connector_id, object_name=None, columns=None, query_arguments=None, refresh_schedule=None, sql_query=None)
Creates a dataset from a Database Connector
- Parameters
name (str) – The name for the dataset to be attached.
table_name (str) – Organization-unique table name
database_connector_id (str) – The Database Connector to import the dataset from
object_name (str) – If applicable, the name/id of the object in the service to query.
columns (str) – The columns to query from the external service object.
query_arguments (str) – Additional query arguments to filter the data
refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.
sql_query (str) – The full SQL query to use when fetching data. If present, this parameter will override objectName, columns, and queryArguments
- Returns
The created dataset.
- Return type
- create_dataset_from_application_connector(self, name, table_name, application_connector_id, object_id=None, start_timestamp=None, end_timestamp=None, refresh_schedule=None)
Creates a dataset from an Application Connector
- Parameters
name (str) – The name for the dataset
table_name (str) – Organization-unique table name
application_connector_id (str) – The unique application connector to download data from
object_id (str) – If applicable, the id of the object in the service to query.
start_timestamp (int) – The Unix timestamp of the start of the period that will be queried.
end_timestamp (int) – The Unix timestamp of the end of the period that will be queried.
refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.
- Returns
The created dataset.
- Return type
- create_dataset_version_from_database_connector(self, dataset_id, object_name=None, columns=None, query_arguments=None, sql_query=None)
Creates a new version of the specified dataset
- Parameters
dataset_id (str) – The unique ID associated with the dataset.
object_name (str) – If applicable, the name/id of the object in the service to query. If not specified, the last name will be used.
columns (str) – The columns to query from the external service object. If not specified, the last columns will be used.
query_arguments (str) – Additional query arguments to filter the data. If not specified, the last arguments will be used.
sql_query (str) – The full SQL query to use when fetching data. If present, this parameter will override objectName, columns, and queryArguments
- Returns
The new Dataset Version created.
- Return type
- create_dataset_version_from_application_connector(self, dataset_id, object_id=None, start_timestamp=None, end_timestamp=None)
Creates a new version of the specified dataset
- Parameters
dataset_id (str) – The unique ID associated with the dataset.
object_id (str) – If applicable, the id of the object in the service to query. If not specified, the last name will be used.
start_timestamp (int) – The Unix timestamp of the start of the period that will be queried.
end_timestamp (int) – The Unix timestamp of the end of the period that will be queried.
- Returns
The new Dataset Version created.
- Return type
- create_dataset_from_upload(self, name, table_name, file_format=None, csv_delimiter=None)
Creates a dataset and returns an upload ID that can be used to upload a file.
- Parameters
- Returns
A reference to be used when uploading file parts.
- Return type
- create_dataset_version_from_upload(self, dataset_id, file_format=None)
Creates a new version of the specified dataset using a local file upload.
- create_streaming_dataset(self, name, table_name, project_id=None, dataset_type=None)
Creates a streaming dataset. Use a streaming dataset if your dataset is receiving information from multiple sources over an extended period of time.
- Parameters
name (str) – The name for the dataset.
table_name (str) – The feature group table name to create for this dataset
project_id (str) – The project to create the streaming dataset for.
dataset_type (str) – The dataset has to be a type that is associated with the use case of your project. Please see (Use Case Documentation)[https://api.abacus.ai/app/help/useCases] for the datasetTypes that are supported per use case.
- Returns
The streaming dataset created.
- Return type
- snapshot_streaming_data(self, dataset_id)
Snapshots the current data in the streaming dataset for training.
- Parameters
dataset_id (str) – The unique ID associated with the dataset.
- Returns
The new Dataset Version created.
- Return type
- set_dataset_column_data_type(self, dataset_id, column, data_type)
Set a column’s type in a specified dataset.
- Parameters
dataset_id (str) – The unique ID associated with the dataset.
column (str) – The name of the column.
data_type (str) – The type of the data in the column. INTEGER, FLOAT, STRING, DATE, DATETIME, BOOLEAN, LIST, STRUCT Refer to the (guide on data types)[https://api.abacus.ai/app/help/class/DataType] for more information. Note: Some ColumnMappings will restrict the options or explicitly set the DataType.
- Returns
The dataset and schema after the data_type has been set
- Return type
- create_dataset_from_streaming_connector(self, name, table_name, streaming_connector_id, streaming_args=None, refresh_schedule=None)
Creates a dataset from a Streaming Connector
- Parameters
name (str) – The name for the dataset to be attached.
table_name (str) – Organization-unique table name
streaming_connector_id (str) – The Streaming Connector to import the dataset from
streaming_args (dict) – Dict of arguments to read data from the streaming connector
refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.
- Returns
The created dataset.
- Return type
- set_streaming_retention_policy(self, dataset_id, retention_hours=None, retention_row_count=None)
Sets the streaming retention policy
- get_dataset_schema(self, dataset_id)
Retrieves the column schema of a dataset
- Parameters
dataset_id (str) – The Dataset schema to lookup.
- Returns
List of Column schema definitions
- Return type
- get_file_connector_instructions(self, bucket, write_permission=False)
Retrieves verification information to create a data connector to a cloud storage bucket.
- Parameters
- Returns
An object with full description of the cloud storage bucket authentication options and bucket policy. Returns an error message if the parameters are invalid.
- Return type
- list_database_connectors(self)
Retrieves a list of all of the database connectors along with all their attributes.
- Returns
A list of database connectors.
- Return type
- list_file_connectors(self)
Retrieves a list of all connected services in the organization and their current verification status.
- Returns
An array of cloud storage buckets connected to the organization.
- Return type
- list_database_connector_objects(self, database_connector_id)
Lists queryable objects in the database connector.
- get_database_connector_object_schema(self, database_connector_id, object_name=None)
Get the schema of an object in a database connector.
- rename_database_connector(self, database_connector_id, name)
Renames a Database Connector
- rename_application_connector(self, application_connector_id, name)
Renames an Application Connector
- verify_database_connector(self, database_connector_id)
Checks to see if Abacus.AI can access the database.
- Parameters
database_connector_id (str) – The unique identifier for the database connector.
- verify_file_connector(self, bucket)
Checks to see if Abacus.AI can access the bucket.
- Parameters
bucket (str) – The bucket to test.
- Returns
The Result of the verification.
- Return type
- delete_database_connector(self, database_connector_id)
Delete a database connector.
- Parameters
database_connector_id (str) – The unique identifier for the database connector.
- delete_application_connector(self, application_connector_id)
Delete an application connector.
- Parameters
application_connector_id (str) – The unique identifier for the application connector.
- delete_file_connector(self, bucket)
Removes a connected service from the specified organization.
- Parameters
bucket (str) – The fully qualified URI of the bucket to remove.
- list_application_connectors(self)
Retrieves a list of all of the application connectors along with all their attributes.
- Returns
A list of application connectors.
- Return type
- list_application_connector_objects(self, application_connector_id)
Lists queryable objects in the application connector.
- verify_application_connector(self, application_connector_id)
Checks to see if Abacus.AI can access the Application.
- Parameters
application_connector_id (str) – The unique identifier for the application connector.
- set_azure_blob_connection_string(self, bucket, connection_string)
Authenticates specified Azure Blob Storage bucket using an authenticated Connection String.
- Parameters
- Returns
An object with the roleArn and verification status for the specified bucket.
- Return type
- list_streaming_connectors(self)
Retrieves a list of all of the streaming connectors along with all their attributes.
- Returns
A list of streaming connectors.
- Return type
- create_streaming_token(self)
Creates a streaming token for the specified project. Streaming tokens are used to authenticate requests to append data to streaming datasets.
- Returns
The streaming token.
- Return type
- list_streaming_tokens(self)
Retrieves a list of all streaming tokens along with their attributes.
- Returns
An array of streaming tokens.
- Return type
- delete_streaming_token(self, streaming_token)
Deletes the specified streaming token.
- Parameters
streaming_token (str) – The streaming token to delete.
- get_recent_feature_group_streamed_data(self, feature_group_id)
Returns recently streamed data to a streaming feature group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
- list_uploads(self)
Lists all ongoing uploads in the organization
- Returns
An array of uploads.
- Return type
- describe_upload(self, upload_id)
Retrieves the current upload status (complete or inspecting) and the list of file parts uploaded for a specified dataset upload.
- list_datasets(self, limit=100, start_after_id=None, exclude_streaming=False)
Retrieves a list of all of the datasets in the organization.
- describe_dataset(self, dataset_id)
Retrieves a full description of the specified dataset, with attributes such as its ID, name, source type, etc.
- list_dataset_versions(self, dataset_id, limit=100, start_after_version=None)
Retrieves a list of all dataset versions for the specified dataset.
- Parameters
- Returns
A list of dataset versions.
- Return type
- attach_dataset_to_project(self, dataset_id, project_id, dataset_type)
[DEPRECATED] Attaches the dataset to the project.
Use this method to attach a dataset that is already in the organization to another project. The dataset type is required to let the AI engine know what type of schema should be used.
- Parameters
dataset_id (str) – The dataset to attach.
project_id (str) – The project to attach the dataset to.
dataset_type (str) – The dataset has to be a type that is associated with the use case of your project. Please see (Use Case Documentation)[https://api.abacus.ai/app/help/useCases] for the datasetTypes that are supported per use case.
- Returns
An array of column descriptions.
- Return type
- remove_dataset_from_project(self, dataset_id, project_id)
[DEPRECATED] Removes a dataset from a project.
- rename_dataset(self, dataset_id, name)
Rename a dataset.
- delete_dataset(self, dataset_id)
Deletes the specified dataset from the organization.
The dataset cannot be deleted if it is currently attached to a project.
- Parameters
dataset_id (str) – The dataset to delete.
- get_training_config_options(self, project_id)
Retrieves the full description of the model training configuration options available for the specified project.
The configuration options available are determined by the use case associated with the specified project. Refer to the (Use Case Documentation)[https://api.abacus.ai/app/help/useCases] for more information on use cases and use case specific configuration options.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array of options that can be specified when training a model in this project.
- Return type
- train_model(self, project_id, name=None, training_config={}, refresh_schedule=None)
Trains a model for the specified project.
Use this method to train a model in this project. This method supports user-specified training configurations defined in the getTrainingConfigOptions method.
- Parameters
project_id (str) – The unique ID associated with the project.
name (str) – The name you want your model to have. Defaults to “<Project Name> Model”.
training_config (dict) – The training config key/value pairs used to train this model.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model.
- Returns
The new model which is being trained.
- Return type
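A minimal sketch; the training_config key shown is illustrative only, since the valid keys come from getTrainingConfigOptions for the project’s use case:

    model = client.train_model(
        project_id='PROJECT_ID',              # placeholder
        name='Baseline Model',
        training_config={'TEST_SPLIT': 0.2})  # illustrative option key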
- create_model_from_python(self, project_id, function_source_code, train_function_name, predict_function_name, training_input_tables, name=None)
Initializes a new Model from user-provided Python code. If a list of input feature groups is supplied, the materialized feature groups for those inputs will be provided as arguments to the train and predict functions.
This method expects functionSourceCode to be a valid language source file which contains the functions named trainFunctionName and predictFunctionName. trainFunctionName returns the ModelVersion that is the result of training the model, while predictFunctionName has no well-defined return type, as it returns whatever prediction is made by the predict function, which can be anything.
- Parameters
project_id (str) – The unique ID associated with the project.
function_source_code (str) – Contents of a valid python source code file. The source code should contain the functions named trainFunctionName and predictFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters are materialized Dataframes (same type as the functions return value).
name (str) – The name you want your model to have. Defaults to “<Project Name> Model”
- Returns
The new model, which has not been trained.
- Return type
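A shape sketch with placeholder names; the argument conventions of the uploaded train/predict functions follow the description above, and the function bodies here are purely illustrative:

    import textwrap

    source = textwrap.dedent('''
        def train(demo_transactions):
            # demo_transactions is a materialized pandas DataFrame.
            return {'mean_amount': demo_transactions['amount'].mean()}

        def predict(model, query):
            return {'predicted_amount': model['mean_amount']}
    ''')

    model = client.create_model_from_python(
        project_id='PROJECT_ID',  # placeholder
        function_source_code=source,
        train_function_name='train',
        predict_function_name='predict',
        training_input_tables=['demo_transactions'],
        name='Python Model')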
- list_models(self, project_id)
Retrieves the list of models in the specified project.
- describe_model(self, model_id)
Retrieves a full description of the specified model.
- rename_model(self, model_id, name)
Renames a model
- update_python_model(self, model_id, function_source_code=None, train_function_name=None, predict_function_name=None, training_input_tables=None)
Updates an existing Python Model using user-provided Python code. If a list of input feature groups is supplied, the materialized feature groups for those inputs will be provided as arguments to the train and predict functions.
This method expects functionSourceCode to be a valid language source file which contains the functions named trainFunctionName and predictFunctionName. trainFunctionName returns the ModelVersion that is the result of training the model, while predictFunctionName has no well-defined return type, as it returns whatever prediction is made by the predict function, which can be anything.
- Parameters
model_id (str) – The unique ID associated with the Python model to be changed.
function_source_code (str) – Contents of a valid python source code file. The source code should contain the functions named trainFunctionName and predictFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters are materialized Dataframes (same type as the functions return value).
- Returns
The updated model
- Return type
- set_model_training_config(self, model_id, training_config)
Edits the default model training config
- set_model_prediction_params(self, model_id, prediction_config)
Sets the model prediction config for the model
- get_model_metrics(self, model_id, model_version=None, baseline_metrics=False)
Retrieves a full list of the metrics for the specified model.
If only the model’s unique identifier (modelId) is specified, the latest trained version of the model (modelVersion) is used.
- Parameters
- Returns
An object to show the model metrics and explanations for what each metric means.
- Return type
- list_model_versions(self, model_id, limit=100, start_after_version=None)
Retrieves a list of the versions for a given model.
- Parameters
- Returns
An array of model versions.
- Return type
- retrain_model(self, model_id, deployment_ids=[])
Retrains the specified model. Gives you an option to choose the deployments you want the retraining to be deployed to.
- delete_model(self, model_id)
Deletes the specified model and all its versions. Models which are currently used in deployments cannot be deleted.
- Parameters
model_id (str) – The ID of the model to delete.
- delete_model_version(self, model_version)
Deletes the specified model version. Model Versions which are currently used in deployments cannot be deleted.
- Parameters
model_version (str) – The ID of the model version to delete.
- describe_model_version(self, model_version)
Retrieves a full description of the specified model version
- Parameters
model_version (str) – The unique version ID of the model version
- Returns
A model version.
- Return type
- get_training_logs(self, model_version, stdout=False, stderr=False)
Returns training logs for the model.
- Parameters
- Returns
The training logs.
- Return type
- create_model_monitor(self, project_id, training_feature_group_id=None, prediction_feature_group_id=None, name=None, refresh_schedule=None)
Runs a model monitor for the specified project.
- Parameters
project_id (str) – The unique ID associated with the project.
training_feature_group_id (str) – The unique ID of the training data feature group
prediction_feature_group_id (str) – The unique ID of the prediction data feature group
name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model monitor
- Returns
The new model monitor that was created.
- Return type
- rerun_model_monitor(self, model_monitor_id)
Reruns the specified model monitor.
- Parameters
model_monitor_id (str) – The model monitor to rerun.
- Returns
The model monitor that is being rerun.
- Return type
- list_model_monitors(self, project_id)
Retrieves the list of model monitors in the specified project.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array of model monitors.
- Return type
- describe_model_monitor(self, model_monitor_id)
Retrieves a full description of the specified model monitor.
- Parameters
model_monitor_id (str) – The unique ID associated with the model monitor.
- Returns
The description of the model monitor.
- Return type
- list_model_monitor_versions(self, model_monitor_id, limit=100, start_after_version=None)
Retrieves a list of the versions for a given model monitor.
- Parameters
- Returns
An array of model monitor versions.
- Return type
- describe_model_monitor_version(self, model_monitor_version)
Retrieves a full description of the specified model monitor version
- Parameters
model_monitor_version (str) – The unique version ID of the model monitor version
- Returns
A model monitor version.
- Return type
- rename_model_monitor(self, model_monitor_id, name)
Renames a model monitor
- delete_model_monitor(self, model_monitor_id)
Deletes the specified model monitor and all its versions.
- Parameters
model_monitor_id (str) – The ID of the model monitor to delete.
- delete_model_monitor_version(self, model_monitor_version)
Deletes the specified model monitor version.
- Parameters
model_monitor_version (str) – The ID of the model monitor version to delete.
- get_model_monitoring_logs(self, model_monitor_version, stdout=False, stderr=False)
Returns monitoring logs for the model.
- Parameters
- Returns
The monitoring logs.
- Return type
- get_drift_for_feature(self, model_monitor_version, feature_name)
Gets the feature drift associated with a single feature in an output feature group from a prediction.
- get_outliers_for_feature(self, model_monitor_version, feature_name=None)
Gets a list of outliers measured by a single feature (or overall) in an output feature group from a prediction.
- create_deployment(self, name=None, model_id=None, feature_group_id=None, project_id=None, description=None, calls_per_second=None, auto_deploy=True, start=True)
Creates a deployment with the specified name and description for the specified model or feature group.
A Deployment makes the trained model or feature group available for prediction requests.
- Parameters
name (str) – The name of the deployment.
model_id (str) – The unique ID associated with the model.
feature_group_id (str) – The unique ID associated with a feature group.
project_id (str) – The unique ID associated with a project.
description (str) – The description for the deployment.
calls_per_second (int) – The number of calls per second the deployment could handle.
auto_deploy (bool) – Flag to enable the automatic deployment when a new Model Version finishes training.
start (bool) – If true, will start the deployment once it is created.
- Returns
The new model or feature group deployment.
- Return type
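A minimal sketch of deploying a trained model and minting a deployment token for client-side prediction calls (all IDs and names are placeholders; client is an authenticated ApiClient as in the earlier example):
deployment = client.create_deployment(
    name='Churn model deployment',      # placeholder name
    model_id='MODEL_ID',                # deploy the latest trained version of this model
    description='Serves online churn predictions',
    auto_deploy=True,                   # promote new model versions automatically
    start=True,
)

# Deployment tokens are project-scoped and safe to embed in client applications.
token = client.create_deployment_token(project_id='PROJECT_ID')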
- create_deployment_token(self, project_id)
Creates a deployment token for the specified project.
Deployment tokens are used to authenticate requests to the prediction APIs and are scoped on the project level.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
The deployment token.
- Return type
- describe_deployment(self, deployment_id)
Retrieves a full description of the specified deployment.
- Parameters
deployment_id (str) – The unique ID associated with the deployment.
- Returns
The description of the deployment.
- Return type
- list_deployments(self, project_id)
Retrieves a list of all deployments in the specified project.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array of deployments.
- Return type
- list_deployment_tokens(self, project_id)
Retrieves a list of all deployment tokens in the specified project.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array of deployment tokens.
- Return type
- update_deployment(self, deployment_id, description=None)
Updates a deployment’s description.
- rename_deployment(self, deployment_id, name)
Updates a deployment’s name.
- set_auto_deployment(self, deployment_id, enable=None)
Enable/Disable auto deployment for the specified deployment.
When a model is scheduled to retrain, deployments with this enabled will be marked to automatically promote the new model version. After the newly trained model completes, a check on its metrics in comparison to the currently deployed model version will be performed. If the metrics are comparable or better, the newly trained model version is automatically promoted. If not, it will be marked as a failed model version promotion with an error indicating poor metrics performance.
- set_deployment_model_version(self, deployment_id, model_version)
Promotes a Model Version to be served in the Deployment
- set_deployment_feature_group_version(self, deployment_id, feature_group_version)
Promotes a Feature Group Version to be served in the Deployment
- start_deployment(self, deployment_id)
Restarts the specified deployment that was previously suspended.
- Parameters
deployment_id (str) – The unique ID associated with the deployment.
- stop_deployment(self, deployment_id)
Stops the specified deployment.
- Parameters
deployment_id (str) – The Deployment ID
- delete_deployment(self, deployment_id)
Deletes the specified deployment. The deployment’s models will not be affected. Note that the deployments are not recoverable after they are deleted.
- Parameters
deployment_id (str) – The ID of the deployment to delete.
- delete_deployment_token(self, deployment_token)
Deletes the specified deployment token.
- Parameters
deployment_token (str) – The deployment token to delete.
- set_deployment_feature_group_export_file_connector_output(self, deployment_id, output_format=None, output_location=None)
Sets the export output for the Feature Group Deployment to be a file connector.
- set_deployment_feature_group_export_database_connector_output(self, deployment_id, database_connector_id=None, object_name=None, write_mode=None, database_feature_mapping=None, id_column=None)
Sets the export output for the Feature Group Deployment to be a Database connector.
- Parameters
deployment_id (str) – The deployment for which the export type is set
database_connector_id (str) – The database connector ID used
object_name (str) – The database connector’s object to write to
write_mode (str) – UPSERT or INSERT for writing to the database connector
database_feature_mapping (dict) – The column/feature pairs mapping the features to the database columns
id_column (str) – The id column to use as the upsert key
- remove_deployment_feature_group_export_output(self, deployment_id)
Removes the export type that is set for the Feature Group Deployment
- Parameters
deployment_id (str) – The deployment for which the export type is set
- create_refresh_policy(self, name, cron, refresh_type, project_id=None, dataset_ids=[], model_ids=[], deployment_ids=[], batch_prediction_ids=[], prediction_metric_ids=[])
Creates a refresh policy with a particular cron pattern and refresh type.
A refresh policy allows for the scheduling of a particular set of actions at regular intervals. This can be useful for periodically updated data which needs to be re-imported into the project for re-training.
- Parameters
name (str) – The name for the refresh policy
cron (str) – A cron-like string specifying the frequency of a refresh policy
refresh_type (str) – The Refresh Type is used to determine what is being refreshed, whether it’s a single dataset, a dataset and a model, or more.
project_id (str) – Optionally, a Project ID can be specified so that all datasets, models and deployments are captured at the instant this policy was created
dataset_ids (list) – Comma separated list of Dataset IDs
model_ids (list) – Comma separated list of Model IDs
deployment_ids (list) – Comma separated list of Deployment IDs
batch_prediction_ids (list) – Comma separated list of Batch Predictions
prediction_metric_ids (list) – Comma separated list of Prediction Metrics
- Returns
The refresh policy created
- Return type
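For example, a sketch of a nightly dataset refresh; the refresh_type value and all IDs below are assumptions/placeholders, so check the accepted refresh types in the API documentation before using them:
policy = client.create_refresh_policy(
    name='Nightly dataset refresh',
    cron='0 6 * * *',                               # every day at 06:00 UTC
    refresh_type='DATASET',                         # assumed value for a dataset-only refresh
    dataset_ids=['DATASET_ID_1', 'DATASET_ID_2'],   # placeholder dataset IDs
)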
- delete_refresh_policy(self, refresh_policy_id)
Delete a refresh policy
- Parameters
refresh_policy_id (str) – The unique ID associated with this refresh policy
- describe_refresh_policy(self, refresh_policy_id)
Retrieve a single refresh policy
- Parameters
refresh_policy_id (str) – The unique ID associated with this refresh policy
- Returns
A refresh policy object
- Return type
- describe_refresh_pipeline_run(self, refresh_pipeline_run_id)
Retrieve a single refresh pipeline run
- Parameters
refresh_pipeline_run_id (str) – The unique ID associated with this refresh pipeline_run
- Returns
A refresh pipeline run object
- Return type
- list_refresh_policies(self, project_id=None, dataset_ids=[], model_ids=[], deployment_ids=[], batch_prediction_ids=[], model_monitor_ids=[])
List the refresh policies for the organization
- Parameters
project_id (str) – Optionally, a Project ID can be specified so that all datasets, models and deployments are captured at the instant this policy was created
dataset_ids (list) – Comma separated list of Dataset IDs
model_ids (list) – Comma separated list of Model IDs
deployment_ids (list) – Comma separated list of Deployment IDs
batch_prediction_ids (list) – Comma separated list of Batch Predictions
model_monitor_ids (list) – Comma separated list of Model Monitor IDs.
- Returns
List of all refresh policies in the organization
- Return type
- list_refresh_pipeline_runs(self, refresh_policy_id)
List the times that the refresh policy has been run
- Parameters
refresh_policy_id (str) – The unique ID associated with this refresh policy
- Returns
A list of refresh pipeline runs for the given refresh policy id
- Return type
- pause_refresh_policy(self, refresh_policy_id)
Pauses a refresh policy
- Parameters
refresh_policy_id (str) – The unique ID associated with this refresh policy
- resume_refresh_policy(self, refresh_policy_id)
Resumes a refresh policy
- Parameters
refresh_policy_id (str) – The unique ID associated with this refresh policy
- run_refresh_policy(self, refresh_policy_id)
Force a run of the refresh policy.
- Parameters
refresh_policy_id (str) – The unique ID associated with this refresh policy
- update_refresh_policy(self, refresh_policy_id, name=None, cron=None)
Update the name or cron string of a refresh policy
- Parameters
- Returns
The updated refresh policy
- Return type
- lookup_features(self, deployment_token, deployment_id, query_data={})
Returns the feature group deployed in the feature store project.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
- Return type
Dict
- predict(self, deployment_token, deployment_id, query_data={})
Returns a prediction for Predictive Modeling
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
- Return type
Dict
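A minimal sketch of an online prediction call; the token, deployment ID, and the column/value pair are placeholders, and client is an authenticated ApiClient as in the earlier example:
result = client.predict(
    deployment_token='DEPLOYMENT_TOKEN',    # project-scoped deployment token
    deployment_id='DEPLOYMENT_ID',
    query_data={'user_id': 'u_12345'},      # dataset column mapped to USER_ID -> entity value
)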
- predict_multiple(self, deployment_token, deployment_id, query_data={})
Returns a list of predictions for Predictive Modeling
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (list) – This will be a list of dictionaries where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
- Return type
Dict
- predict_from_datasets(self, deployment_token, deployment_id, query_data={})
Returns a list of predictions for Predictive Modeling
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the source dataset name and ‘Value’ will be a list of records corresponding to the dataset rows
- Return type
Dict
- predict_lead(self, deployment_token, deployment_id, query_data)
Returns the probability of a user being a lead based on their interactions with the service/product and the user’s own attributes (e.g. income, assets, credit score, etc.). Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘user_id’ mapped to mapping ‘LEAD_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary containing user attributes and/or user’s interaction data with the product/service (e.g. number of click, items in cart, etc.).
- Return type
Dict
- predict_churn(self, deployment_token, deployment_id, query_data)
Returns the probability of a user churning in response to their interactions with the item/product/service. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘churn_result’ mapped to mapping ‘CHURNED_YN’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
- Return type
Dict
- predict_takeover(self, deployment_token, deployment_id, query_data)
Returns a probability for each class label associated with the types of fraud or a ‘yes’ or ‘no’ type label for the possibility of fraud. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary containing account activity characteristics (e.g. login id, login duration, login type, ip address, etc.).
- Return type
Dict
- predict_fraud(self, deployment_token, deployment_id, query_data)
Returns the probability that a transaction performed under a specific account is fraudulent. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_number’ mapped to the mapping ‘ACCOUNT_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary containing transaction attributes (e.g. credit card type, transaction location, transaction amount, etc.).
- Return type
Dict
- predict_class(self, deployment_token, deployment_id, query_data={}, threshold=None, threshold_class=None, explain_predictions=False, fixed_features=None, nested=None)
Returns a classification prediction
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
threshold (float) – float value that is applied on the popular class label.
threshold_class (str) – label upon which the threshold is added (Binary labels only)
explain_predictions (bool) – If true, returns the SHAP explanations for all input features.
fixed_features (list) – Set of input features to treat as constant for explanations.
nested (str) – If specified generates prediction delta for each index of the specified nested feature.
- Return type
Dict
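A sketch of a classification call with a custom threshold and SHAP explanations; all identifiers, the threshold value, and the label name are placeholders:
result = client.predict_class(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'user_id': 'u_12345'},
    threshold=0.7,                 # cutoff applied to the class label's probability
    threshold_class='churned',     # hypothetical binary label the threshold applies to
    explain_predictions=True,      # include SHAP explanations for the input features
)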
- predict_target(self, deployment_token, deployment_id, query_data={}, explain_predictions=False, fixed_features=None, nested=None)
Returns a prediction from a classification or regression model. Optionally, includes explanations.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
explain_predictions (bool) – If true, returns the SHAP explanations for all input features.
fixed_features (list) – Set of input features to treat as constant for explanations.
nested (str) – If specified generates prediction delta for each index of the specified nested feature.
- Return type
Dict
- get_anomalies(self, deployment_token, deployment_id, threshold=None, histogram=False)
Returns a list of anomalies from the training dataset
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
threshold (float) – The threshold score of what is an anomaly. Valid values are between 0.8 and 0.99.
histogram (bool) – If True, will return a histogram of the distribution of all points
- Return type
- is_anomaly(self, deployment_token, deployment_id, query_data=None)
Returns a list of anomaly attributes based on login information for a specified account. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – The input data for the prediction.
- Return type
Dict
- get_forecast(self, deployment_token, deployment_id, query_data, future_data=None, num_predictions=None, prediction_start=None)
Returns a list of forecasts for a given entity under the specified project deployment. Note that the inputs to the deployed model will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘holiday_yn’ mapped to mapping ‘FUTURE’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘store_id’ in your dataset) mapped to the column mapping ITEM_ID that uniquely identifies the entity against which forecasting is performed and ‘Value’ will be the unique value of the same entity.
future_data (dict) – This will be a dictionary of values known ahead of time that are relevant for forecasting (e.g. State Holidays, National Holidays, etc.). The key and the value both will be of type ‘String’. For example future data entered for a Store may be {“Holiday”:”No”, “Promo”:”Yes”}.
num_predictions (int) – The number of timestamps to predict in the future.
prediction_start (str) – The start date for predictions (e.g., “2015-08-01T00:00:00” as input for mid-night of 2015-08-01).
- Return type
Dict
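A sketch of a forecasting call; the entity column, future data keys, and dates mirror the examples in the parameter descriptions above and are placeholders:
forecast = client.get_forecast(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'store_id': 'store_42'},            # dataset column mapped to ITEM_ID -> entity value
    future_data={'Holiday': 'No', 'Promo': 'Yes'},  # values known ahead of time
    num_predictions=14,                             # forecast 14 future timestamps
    prediction_start='2015-08-01T00:00:00',         # start predictions at midnight of 2015-08-01
)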
- get_k_nearest(self, deployment_token, deployment_id, vector, k=None, distance=None, include_score=False)
Returns the k nearest neighbors for the provided embedding vector.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
vector (list) – Input vector to perform the k nearest neighbors with.
k (int) – Overrideable number of items to return
distance (str) – Specify the distance function to use when finding nearest neighbors
include_score (bool) – If True, will return the score alongside the resulting embedding value
- Return type
Dict
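A sketch of a nearest-neighbor lookup against a deployed embedding catalog; the vector and identifiers are placeholders, and the distance value follows the query format shown for getKNearest below:
neighbors = client.get_k_nearest(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    vector=[0.12, -0.40, 0.88, 0.05],   # placeholder embedding vector
    k=20,                               # return the 20 nearest items
    distance='euclidean',               # distance function to use
    include_score=True,                 # return the score alongside each neighbor
)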
- get_multiple_k_nearest(self, deployment_token, deployment_id, queries)
Returns the k nearest neighbors for the queries provided
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
queries (list) – List of Mappings of format {“catalogId”: “cat0”, “vectors”: […], “k”: 20, “distance”: “euclidean”}. See getKNearest for additional information about the supported parameters
- get_labels(self, deployment_token, deployment_id, query_data, threshold=0.5)
Returns a list of scored labels extracted from the given text.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.
threshold (float) – The minimum output probability that a predicted label must have in order for us to report it.
- Return type
Dict
- get_recommendations(self, deployment_token, deployment_id, query_data, num_items=50, page=1, exclude_item_ids=[], score_field='', scaling_factors=[], restrict_items=[], exclude_items=[], explore_fraction=0.0)
Returns a list of recommendations for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘time’ mapped to mapping ‘TIMESTAMP’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which recommendations are made and ‘Value’ will be the unique value of the same item. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.
num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.
page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.
exclude_item_ids (list) – [DEPRECATED]
score_field (str) – The relative item scores are returned in a separate field named with the same name as the key (score_field) for this argument.
scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” in reference to which the model recommendations need to be biased; and the key, “factor” takes the factor by which the item scores are adjusted. Let’s take an example where the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}]. After we apply the model to get item probabilities, for every SUV and Sedan in the list, we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it or there’s an item that always comes up and you want to demote it.
restrict_items (list) – It allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, “value3”, …]}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”, “value3”, …]” to which to restrict the recommendations. Let’s take an example where the input to restrict_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. This input will restrict the recommendations to SUVs and Sedans. This type of restriction is particularly useful if there’s a list of items that you know is of use in some particular scenario and you want to restrict the recommendations only to that list.
exclude_items (list) – It allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, …]}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” to exclude from the recommendations. Let’s take an example where the input to exclude_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. The resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful if there’s a list of items that you know is of no use in some particular scenario and you don’t want to show those items present in that list.
explore_fraction (float) – The fraction of recommendations that is to be new items.
- Return type
Dict
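Tying the arguments together, a sketch that requests the second page of recommendations while boosting and restricting vehicle types; all values are placeholders taken from the examples in the parameter descriptions above:
recs = client.get_recommendations(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'user_name': 'John Doe'},   # dataset column mapped to USER_ID -> user value
    num_items=10,
    page=2,                                 # items ranked 11th through 20th
    scaling_factors=[{'column': 'VehicleType', 'values': ['SUV', 'Sedan'], 'factor': 1.4}],
    restrict_items=[{'column': 'VehicleType', 'values': ['SUV', 'Sedan', 'Hatchback']}],
)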
- get_personalized_ranking(self, deployment_token, deployment_id, query_data, preserve_ranks=[], scaling_factors=[])
Returns a list of items with personalized promotions on them for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column name (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identifies those items.
preserve_ranks (list) – List of dictionaries of format {“column”: “col0”, “values”: [“value0, value1”]}, where the ranks of items in query_data is preserved for all the items in “col0” with values, “value0” and “value1”. This option is useful when the desired items are being recommended in the desired order and the ranks for those items need to be kept unchanged during recommendation generation.
scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” in reference to which the model recommendations need to be biased; and the key, “factor” takes the factor by which the item scores are adjusted. Let’s take an example where the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}]. After we apply the model to get item probabilities, for every SUV and Sedan in the list, we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it or there’s an item that always comes up and you want to demote it.
- Return type
Dict
- get_ranked_items(self, deployment_token, deployment_id, query_data, preserve_ranks=[], scaling_factors=[])
Returns a list of re-ranked items for a selected user when a list of items is required to be reranked according to the user’s preferences. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column name (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identifies those items.
preserve_ranks (list) – List of dictionaries of format {“column”: “col0”, “values”: [“value0, value1”]}, where the ranks of items in query_data is preserved for all the items in “col0” with values, “value0” and “value1”. This option is useful when the desired items are being recommended in the desired order and the ranks for those items need to be kept unchanged during recommendation generation.
scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” in reference to which the model recommendations need to be biased; and the key, “factor” takes the factor by which the item scores are adjusted. Let’s take an example where the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}]. After we apply the model to get item probabilities, for every SUV and Sedan in the list, we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it or there’s an item that always comes up and you want to demote it.
- Return type
Dict
- get_related_items(self, deployment_token, deployment_id, query_data, num_items=50, page=1, scaling_factors=[], restrict_items=[], exclude_items=[])
Returns a list of related items for a given item under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which related items are determined and ‘Value’ will be the unique value of the same item. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.
num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.
page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.
scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” in reference to which the model recommendations need to be biased; and the key, “factor” takes the factor by which the item scores are adjusted. Let’s take an example where the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}]. After we apply the model to get item probabilities, for every SUV and Sedan in the list, we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it or there’s an item that always comes up and you want to demote it.
restrict_items (list) – It allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, “value3”, …]}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”, “value3”, …]” to which to restrict the recommendations. Let’s take an example where the input to restrict_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. This input will restrict the recommendations to SUVs and Sedans. This type of restriction is particularly useful if there’s a list of items that you know is of use in some particular scenario and you want to restrict the recommendations only to that list.
exclude_items (list) – It allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, …]}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” to exclude from the recommendations. Let’s take an example where the input to exclude_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. The resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful if there’s a list of items that you know is of no use in some particular scenario and you don’t want to show those items present in that list.
- Return type
Dict
- get_feature_group_rows(self, deployment_token, deployment_id, query_data)
- get_search_results(self, deployment_token, deployment_id, query_data)
TODO
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.
- Return type
Dict
- get_sentiment(self, deployment_token, deployment_id, document)
TODO
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
document (str) – # TODO
- Returns
Dict of labels and their probabilities
- Return type
- predict_language(self, deployment_token, deployment_id, query_data)
TODO
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (str) – # TODO
- Returns
Dict of labels and their probabilities
- Return type
- create_batch_prediction(self, deployment_id, table_name=None, name=None, global_prediction_args=None, explanations=False, output_format=None, output_location=None, database_connector_id=None, database_output_config=None, refresh_schedule=None, csv_input_prefix=None, csv_prediction_prefix=None, csv_explanations_prefix=None)
Creates a batch prediction job description for the given deployment.
- Parameters
deployment_id (str) – The unique identifier to a deployment.
table_name (str) – If specified, the name of the feature group table to write the results of the batch prediction. Can only be specified if outputLocation and databaseConnectorId are not specified. If table_name is specified, the outputType will be enforced as CSV.
name (str) – The name of the batch prediction job.
global_prediction_args (dict) – Argument(s) to pass on every prediction call.
explanations (bool) – If true, will provide SHAP Explanations for each prediction, if supported by the use case.
output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON)
output_location (str) – If specified, the location to write the prediction results. Otherwise, results will be stored in Abacus.AI.
database_connector_id (str) – The unique identifier of a Database Connection to write predictions to. Cannot be specified in conjunction with outputLocation.
database_output_config (dict) – A key-value pair of columns/values to write to the database connector. Only available if databaseConnectorId is specified.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically run the batch prediction.
csv_input_prefix (str) – A prefix to prepend to the input columns, only applies when output format is CSV
csv_prediction_prefix (str) – A prefix to prepend to the prediction columns, only applies when output format is CSV
csv_explanations_prefix (str) – A prefix to prepend to the explanation columns, only applies when output format is CSV
- Returns
The batch prediction description.
- Return type
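A sketch of setting up a weekly batch prediction that writes CSV results to a file connector location; the S3 path, schedule, and IDs are placeholders, and client is an authenticated ApiClient as in the earlier example:
batch = client.create_batch_prediction(
    deployment_id='DEPLOYMENT_ID',
    name='Weekly scoring run',
    output_format='CSV',
    output_location='s3://my-bucket/predictions/',   # hypothetical file connector path
    csv_prediction_prefix='pred_',                   # prefix prediction columns in the CSV output
    refresh_schedule='0 4 * * 1',                    # every Monday at 04:00 UTC
)

# Kick off a batch prediction version immediately instead of waiting for the schedule.
version = client.start_batch_prediction(batch_prediction_id='BATCH_PREDICTION_ID')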
- start_batch_prediction(self, batch_prediction_id)
Creates a new batch prediction version job for a given batch prediction job description
- Parameters
batch_prediction_id (str) – The unique identifier of the batch prediction to create a new version of
- Returns
The batch prediction version started by this method call.
- Return type
- download_batch_prediction_result_chunk(self, batch_prediction_version, offset=0, chunk_size=10485760)
Returns a stream containing the batch prediction results
- Parameters
- Return type
- get_batch_prediction_connector_errors(self, batch_prediction_version)
Returns a stream containing the batch prediction database connection write errors, if any writes failed to the database connector
- Parameters
batch_prediction_version (str) – The unique identifier of the batch prediction job to get the errors for
- Return type
- list_batch_predictions(self, project_id)
Retrieves a list for the batch predictions in the project
- Parameters
project_id (str) – The unique identifier of the project
- Returns
A list of batch prediction jobs.
- Return type
- describe_batch_prediction(self, batch_prediction_id)
Describes the batch prediction
- Parameters
batch_prediction_id (str) – The unique ID associated with the batch prediction.
- Returns
The batch prediction description.
- Return type
- list_batch_prediction_versions(self, batch_prediction_id, limit=100, start_after_version=None)
Retrieves a list of versions of a given batch prediction
- Parameters
- Returns
A list of batch prediction versions.
- Return type
- describe_batch_prediction_version(self, batch_prediction_version)
Describes a batch prediction version
- Parameters
batch_prediction_version (str) – The unique identifier of the batch prediction version
- Returns
The batch prediction version.
- Return type
- update_batch_prediction(self, batch_prediction_id, deployment_id=None, global_prediction_args=None, explanations=None, output_format=None, csv_input_prefix=None, csv_prediction_prefix=None, csv_explanations_prefix=None)
Updates a batch prediction job description
- Parameters
batch_prediction_id (str) – The unique identifier of the batch prediction.
deployment_id (str) – The unique identifier to a deployment.
global_prediction_args (dict) – Argument(s) to pass on every prediction call.
explanations (bool) – If true, will provide SHAP Explanations for each prediction, if supported by the use case.
output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON).
csv_input_prefix (str) – A prefix to prepend to the input columns, only applies when output format is CSV
csv_prediction_prefix (str) – A prefix to prepend to the prediction columns, only applies when output format is CSV
csv_explanations_prefix (str) – A prefix to prepend to the explanation columns, only applies when output format is CSV
- Returns
The batch prediction description.
- Return type
- set_batch_prediction_file_connector_output(self, batch_prediction_id, output_format=None, output_location=None)
Updates the file connector output configuration of the batch prediction
- Parameters
batch_prediction_id (str) – The unique identifier of the batch prediction.
output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON).
output_location (str) – If specified, the location to write the prediction results. Otherwise, results will be stored in Abacus.AI.
- Returns
The batch prediction description.
- Return type
- set_batch_prediction_database_connector_output(self, batch_prediction_id, database_connector_id=None, database_output_config=None)
Updates the database connector output configuration of the batch prediction
- Parameters
- Returns
The batch prediction description.
- Return type
- set_batch_prediction_feature_group_output(self, batch_prediction_id, table_name)
Creates a feature group and sets it to be the batch prediction output
- Parameters
- Returns
The batch prediction after the output has been applied
- Return type
- set_batch_prediction_output_to_console(self, batch_prediction_id)
Sets the batch prediction output to the console, clearing both the file connector and database connector config
- Parameters
batch_prediction_id (str) – The unique identifier of the batch prediction
- Returns
The batch prediction description.
- Return type
- set_batch_prediction_dataset(self, batch_prediction_id, dataset_type, dataset_id=None)
[Deprecated] Sets the batch prediction input dataset. Only applicable for legacy dataset-based projects
- Parameters
- Returns
The batch prediction description.
- Return type
- set_batch_prediction_feature_group(self, batch_prediction_id, feature_group_type, feature_group_id=None)
Sets the batch prediction input feature group.
- Parameters
batch_prediction_id (str) – The unique identifier of the batch prediction
feature_group_type (str) – The feature group type to set. The feature group type of the feature group. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under personalized recommendation use case.
feature_group_id (str) – The feature group to set as input to the batch prediction
- Returns
The batch prediction description.
- Return type
- set_batch_prediction_dataset_remap(self, batch_prediction_id, dataset_id_remap)
Swaps out datasets in the input feature groups for the purposes of this batch prediction.
- Parameters
- Returns
Batch Prediction object
- Return type
- delete_batch_prediction(self, batch_prediction_id)
Deletes a batch prediction
- Parameters
batch_prediction_id (str) – The unique identifier of the batch prediction
- add_user_item_interaction(self, streaming_token, dataset_id, timestamp, user_id, item_id, event_type, additional_attributes)
Adds a user-item interaction record (data row) to a streaming dataset.
- Parameters
streaming_token (str) – The streaming token for authenticating requests to the dataset.
dataset_id (str) – The streaming dataset to record data to.
timestamp (int) – The unix timestamp of the event.
user_id (str) – The unique identifier for the user.
item_id (list) – The unique identifier for the items
event_type (str) – The event type.
additional_attributes (dict) – Attributes of the user interaction.
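A sketch of streaming a single interaction row; the streaming token, dataset ID, and event fields are placeholders, and client is an authenticated ApiClient as in the earlier example:
import time

client.add_user_item_interaction(
    streaming_token='STREAMING_TOKEN',
    dataset_id='STREAMING_DATASET_ID',
    timestamp=int(time.time()),                # unix timestamp of the event
    user_id='u_12345',
    item_id=['sku_001'],                       # list of item identifiers
    event_type='purchase',                     # placeholder event type
    additional_attributes={'quantity': 2, 'channel': 'web'},
)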
- upsert_user_attributes(self, streaming_token, dataset_id, user_id, user_attributes)
Adds a user attributes record (data row) to a streaming dataset.
Either the streaming dataset ID or the project ID is required.
- upsert_item_attributes(self, streaming_token, dataset_id, item_id, item_attributes)
Adds an item attributes record (data row) to a streaming dataset.
Either the streaming dataset ID or the project ID is required.
- add_multiple_user_item_interactions(self, streaming_token, dataset_id, interactions)
Adds multiple user-item interaction records (data rows) to a streaming dataset.
- Parameters
streaming_token (str) – The streaming token for authenticating requests to the dataset.
dataset_id (str) – The streaming dataset to record data to.
interactions (list) – List of interactions, each interaction of format {‘userId’: userId, ‘timestamp’: timestamp, ‘itemId’: itemId, ‘eventType’: eventType, ‘additionalAttributes’: {‘attribute1’: ‘abc’, ‘attribute2’: 123}}
- upsert_multiple_user_attributes(self, streaming_token, dataset_id, upserts)
Adds multiple user attribute records (data rows) to a streaming dataset.
The streaming dataset ID is required.
- upsert_multiple_item_attributes(self, streaming_token, dataset_id, upserts)
Adds multiple item attribute records (data rows) to a streaming dataset.
The streaming dataset ID is required.
- upsert_item_embeddings(self, streaming_token, model_id, item_id, vector, catalog_id=None)
Upserts an embedding vector for an item id for a model_id.
- Parameters
streaming_token (str) – The streaming token for authenticating requests to the model.
model_id (str) – The model id to upsert item embeddings to.
item_id (str) – The item id for which its embeddings will be upserted.
vector (list) – The embedding vector.
catalog_id (str) – Optional name to specify which catalog in a model to update.
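A sketch of upserting one item embedding into a model's catalog; the token, model ID, item ID, vector, and catalog name are placeholders:
client.upsert_item_embeddings(
    streaming_token='STREAMING_TOKEN',
    model_id='MODEL_ID',
    item_id='sku_001',
    vector=[0.12, -0.40, 0.88, 0.05],   # placeholder embedding vector
    catalog_id='cat0',                  # optional catalog name within the model
)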
- delete_item_embeddings(self, streaming_token, model_id, item_ids, catalog_id=None)
Deletes knn embeddings for a list of item ids for a model_id.
- Parameters
streaming_token (str) – The streaming token for authenticating requests to the model.
model_id (str) – The model id to delete item embeddings from.
item_ids (list) – A list of item ids for which its embeddings will be deleted.
catalog_id (str) – Optional name to specify which catalog in a model to update.
- upsert_multiple_item_embeddings(self, streaming_token, model_id, upserts, catalog_id=None)
Upserts a knn embedding for multiple item ids for a model_id.
- Parameters
streaming_token (str) – The streaming token for authenticating requests to the model.
model_id (str) – The model id to upsert item embeddings to.
upserts (list) – A list of {‘itemId’: …, ‘vector’: […]} dicts for each upsert.
catalog_id (str) – Optional name to specify which catalog in a model to update.
- upsert_data(self, feature_group_id, streaming_token, data)
Updates data in the feature group for a given lookup key recordId if the recordId is found; otherwise, inserts new data into the feature group.
- append_data(self, feature_group_id, streaming_token, data)
Appends new data into the feature group for a given lookup key recordId.