abacusai

Submodules

Package Contents

Classes

ApiClient

Abacus.AI API Client

ClientOptions

Options for configuring the ApiClient

ReadOnlyClient

Abacus.AI Read Only API Client. Only contains GET methods

PredictionClient

Abacus.AI Prediction API Client. Does not utilize authentication and only contains public prediction methods

Attributes

__version__

class abacusai.ApiClient(api_key=None, server=None, client_options=None, skip_version_check=False)

Bases: ReadOnlyClient

Abacus.AI API Client

Parameters:
  • api_key (str) – The api key to use as authentication to the server

  • server (str) – The base server URL to use to send API requests to

  • client_options (ClientOptions) – Optional API client configurations

  • skip_version_check (bool) – If true, will skip checking the server’s current API version on initializing the client
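
A minimal instantiation sketch (the API key value is a placeholder); the examples below reuse this client object:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')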

create_dataset_from_pandas(feature_group_table_name, df, clean_column_names=False)

[Deprecated] Creates a Dataset from a pandas dataframe

Parameters:
  • feature_group_table_name (str) – The table name to assign to the feature group created by this call

  • df (pandas.DataFrame) – The dataframe to upload

  • clean_column_names (bool) – If true, the dataframe’s column names will be automatically cleaned to be compliant with Abacus.AI’s column requirements. Otherwise it will raise a ValueError.

Returns:

The dataset object created

Return type:

Dataset

create_dataset_version_from_pandas(table_name_or_id, df, clean_column_names=False)

[Deprecated] Updates an existing dataset from a pandas dataframe

Parameters:
  • table_name_or_id (str) – The table name of the feature group or the ID of the dataset to update

  • df (pandas.DataFrame) – The dataframe to upload

  • clean_column_names (bool) – If true, the dataframe’s column names will be automatically cleaned to be compliant with Abacus.AI’s column requirements. Otherwise it will raise a ValueError.

Returns:

The dataset updated

Return type:

Dataset

create_feature_group_from_pandas_df(table_name, df, clean_column_names=False)

Create a Feature Group from a local Pandas DataFrame.

Parameters:
  • table_name (str) – The table name to assign to the feature group created by this call

  • df (pandas.DataFrame) – The dataframe to upload

  • clean_column_names (bool) – If true, the dataframe’s column names will be automatically cleaned to be compliant with Abacus.AI’s column requirements. Otherwise it will raise a ValueError.

Return type:

abacusai.feature_group.FeatureGroup
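
A short usage sketch, assuming the client object from above; the table and column names are illustrative:

    import pandas as pd

    df = pd.DataFrame({'user id': [1, 2], 'Age': [34, 27]})
    # clean_column_names=True renames non-compliant columns such as 'user id'
    # instead of raising a ValueError.
    fg = client.create_feature_group_from_pandas_df(
        table_name='users_demo', df=df, clean_column_names=True)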

update_feature_group_from_pandas_df(table_name, df, clean_column_names=False)

Updates a DATASET Feature Group from a local Pandas DataFrame.

Parameters:
  • table_name (str) – The table name of the feature group to update

  • df (pandas.DataFrame) – The dataframe to upload

  • clean_column_names (bool) – If true, the dataframe’s column names will be automatically cleaned to be compliant with Abacus.AI’s column requirements. Otherwise it will raise a ValueError.

Return type:

abacusai.feature_group.FeatureGroup

create_feature_group_from_spark_df(table_name, df)

Create a Feature Group from a local Spark DataFrame.

Parameters:
  • df (pyspark.sql.DataFrame) – The dataframe to upload

  • table_name (str) – The table name to assign to the feature group created by this call

Return type:

abacusai.feature_group.FeatureGroup

update_feature_group_from_spark_df(table_name, df)

Updates a DATASET Feature Group from a local Spark DataFrame.

Parameters:
  • df (pyspark.sql.DataFrame) – The dataframe to upload

  • table_name (str) – The table name to assign to the feature group created by this call

  • should_wait_for_upload (bool) – Wait for dataframe to upload before returning. Some FeatureGroup methods, like materialization, may not work until upload is complete.

  • timeout (int, optional) – If waiting for upload, time out after this limit.

Return type:

abacusai.feature_group.FeatureGroup

create_spark_df_from_feature_group_version(session, feature_group_version)

Create a Spark DataFrame in the provided Spark Session context, for a materialized Abacus Feature Group Version.

Parameters:
  • session (pyspark.sql.SparkSession) – Spark session

  • feature_group_version (str) – Feature group version to load from

Returns:

pyspark.sql.DataFrame
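
A usage sketch, assuming an active Spark environment; the feature group version ID is a placeholder:

    from pyspark.sql import SparkSession

    session = SparkSession.builder.getOrCreate()
    df = client.create_spark_df_from_feature_group_version(
        session=session, feature_group_version='YOUR_FEATURE_GROUP_VERSION')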

create_model_from_functions(project_id, train_function, predict_function=None, training_input_tables=None, predict_many_function=None, initialize_function=None, cpu_size=None, memory=None, training_config=None, exclusive_run=False)

Creates a model from a python function

Parameters:
  • project_id (str) – The project to create the model in

  • train_function (callable) – The training function callable to serialize and upload

  • predict_function (callable) – The predict function callable to serialize and upload

  • predict_many_function (callable) – The predict many function callable to serialize and upload

  • initialize_function (callable) – The initialize function callable to serialize and upload

  • training_input_tables (list) – The input table names of the feature groups to pass to the train function

  • cpu_size (str) – Size of the cpu for the training function

  • memory (int) – Memory (in GB) for the training function

  • training_config (dict) –

  • exclusive_run (bool) –
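
A sketch of the expected shape of the callables, with illustrative table names and function bodies; the exact arguments each callable receives depend on the configured training_input_tables:

    def train(users):
        # Receives a materialized DataFrame per training input table.
        return users.groupby('category')['amount'].mean().to_dict()

    def predict(model, query):
        # Receives the object returned by train plus the prediction request.
        return {'prediction': model.get(query.get('category'), 0.0)}

    model = client.create_model_from_functions(
        project_id='YOUR_PROJECT_ID',
        train_function=train,
        predict_function=predict,
        training_input_tables=['users'])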

create_feature_group_from_python_function(function, table_name, input_tables=None, python_function_name=None, python_function_bindings=None, cpu_size=None, memory=None)

Creates a feature group from a python function

Parameters:
  • function (callable) – The function callable for the feature group

  • table_name (str) – The table name to give the feature group

  • input_tables (list) – The input table names of the feature groups as input to the feature group function

  • python_function_name (str) – The name of the python function to create a feature group from.

  • python_function_bindings (List<PythonFunctionArguments>) – List of python function arguments

  • cpu_size (str) – Size of the cpu for the feature group function

  • memory (int) – Memory (in GB) for the feature group function
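
A sketch with an illustrative transform; the function receives one materialized DataFrame per entry in input_tables and must return a DataFrame:

    def transform(users):
        users['age_bucket'] = users['age'] // 10
        return users

    fg = client.create_feature_group_from_python_function(
        function=transform,
        table_name='users_with_age_bucket',
        input_tables=['users'])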

update_python_function_code(name, function=None, function_variable_mappings=None)

Update custom python function with user inputs for the given python function.

Parameters:
  • name (String) – The unique name to identify the python function in an organization.

  • function (callable) – The function callable to serialize and upload.

  • function_variable_mappings (List<PythonFunctionArguments>) – List of python function arguments

Returns:

The python_function object.

Return type:

PythonFunction

create_algorithm_from_function(name, problem_type, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function=None, predict_function=None, predict_many_function=None, initialize_function=None, config_options=None, is_default_enabled=False, project_id=None, use_gpu=False)

Create a new algorithm, or update an existing algorithm if the name already exists

Parameters:
  • name (String) – The name to identify the algorithm, only uppercase letters, numbers and underscore allowed

  • problem_type (Enum string) – The type of the problem this algorithm will work on

  • train_function (callable) – The training function callable to serialize and upload

  • predict_function (callable) – The predict function callable to serialize and upload

  • predict_many_function (callable) – The predict many function callable to serialize and upload

  • initialize_function (callable) – The initialize function callable to serialize and upload

  • training_data_parameter_names_mapping (Dict) – The mapping from feature group types to training data parameter names in the train function

  • training_config_parameter_name (string) – The train config parameter name in the train function

  • config_options (Dict) – Map dataset types and configs to train function parameter names

  • is_default_enabled (bool) – Whether to train with the algorithm by default

  • project_id (Unique String Identifier) – The unique ID of the project

  • use_gpu (Boolean) – Whether this algorithm needs to run on GPU

update_algorithm_from_function(algorithm, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function=None, predict_function=None, predict_many_function=None, initialize_function=None, config_options=None, is_default_enabled=None, use_gpu=None)

Updates the existing algorithm with the given name

Parameters:
  • algorithm (String) – The name to identify the algorithm, only uppercase letters, numbers and underscore allowed

  • train_function (callable) – The training function callable to serialize and upload

  • predict_function (callable) – The predict function callable to serialize and upload

  • predict_many_function (callable) – The predict many function callable to serialize and upload

  • initialize_function (callable) – The initialize function callable to serialize and upload

  • training_data_parameter_names_mapping (Dict) – The mapping from feature group types to training data parameter names in the train function

  • training_config_parameter_name (string) – The train config parameter name in the train function

  • config_options (Dict) – Map dataset types and configs to train function parameter names

  • is_default_enabled (Boolean) – Whether to train with the algorithm by default

  • use_gpu (Boolean) – Whether this algorithm needs to run on GPU

get_train_function_input(project_id, training_table_names=None, training_data_parameter_name_override=None, training_config_parameter_name_override=None, training_config=None, custom_algorithm_config=None)

Get the input data for the train function to test locally.

Parameters:
  • project_id (String) – The id of the project

  • training_table_names (List) – A list of feature group tables used for training

  • training_data_parameter_name_override (Dict) – The mapping from feature group types to training data parameter names in the train function

  • training_config_parameter_name_override (String) – The train config parameter name in the train function

  • training_config (Dict) – A dictionary for Abacus.AI defined training options and values

  • custom_algorithm_config (Any) – User-defined config that can be serialized by JSON

Returns:

A dictionary that maps train function parameter names to their values.

create_custom_loss_function(name, loss_function_type, loss_function)

Registers a new custom loss function which can be used as an objective function during model training.

Parameters:
  • name (String) – A name for the loss. Should be unique per organization. Limit - 50 chars. Only underscores, numbers, and uppercase letters allowed

  • loss_function_type (String) – The category of problems that this loss would be applicable to. Ex - REGRESSION_DL_TF, CLASSIFICATION_DL_TF, etc.

  • loss_function (Callable) – A python functor which can take required arguments (Ex - (y_true, y_pred)) and returns loss value(s) (Ex - An array of loss values of size batch size)

Returns:

A description of the registered custom loss function

Return type:

CustomLossFunction

Raises:
  • InvalidParameterError – If either loss function name or type or the passed function is invalid/incompatible

  • AlreadyExistsError – If the loss function with the same name already exists in the organization
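
A sketch registering a plain mean-squared-error loss for a TensorFlow regression model; the loss name is illustrative:

    def mse_loss(y_true, y_pred):
        import tensorflow as tf
        # One loss value per example in the batch.
        return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

    loss_fn = client.create_custom_loss_function(
        name='MSE_LOSS_V1',  # uppercase letters, numbers, underscores only
        loss_function_type='REGRESSION_DL_TF',
        loss_function=mse_loss)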

update_custom_loss_function(name, loss_function)

Updates a previously registered custom loss function with a new function implementation.

Parameters:
  • name (String) – name of the registered custom loss.

  • loss_function (Callable) – A python functor which can take required arguments (Ex - (y_true, y_pred)) and returns loss value(s) (Ex - An array of loss values of size batch size)

Returns:

A description of the updated custom loss function

Return type:

CustomLossFunction

Raises:
  • InvalidParameterError – If either loss function name or type or the passed function is invalid/incompatible

  • DataNotFoundError – If a loss function with given name is not found in the organization

add_user_to_organization(email)

Invites a user to your organization. This method will send the specified email address an invitation link to join your organization.

Parameters:

email (str) – The email address to invite to your Organization.

create_organization_group(group_name, permissions, default_group=False)

Creates a new Organization Group.

Parameters:
  • group_name (str) – The name of the group

  • permissions (list) – The list of permissions to initialize the group with

  • default_group (bool) – If true, this group will replace the current default group

Returns:

Information about the created Organization Group

Return type:

OrganizationGroup

add_organization_group_permission(organization_group_id, permission)

Adds a permission to the specified Organization Group

Parameters:
  • organization_group_id (str) – The ID of the Organization Group

  • permission (str) – The permission to add to the Organization Group

remove_organization_group_permission(organization_group_id, permission)

Removes a permission from the specified Organization Group

Parameters:
  • organization_group_id (str) – The ID of the Organization Group

  • permission (str) – The permission to remove from the Organization Group

delete_organization_group(organization_group_id)

Deletes the specified Organization Group from the organization.

Parameters:

organization_group_id (str) – The ID of the Organization Group

add_user_to_organization_group(organization_group_id, email)

Adds a user to the specified Organization Group

Parameters:
  • organization_group_id (str) – The ID of the Organization Group

  • email (str) – The email of the user that is added to the group

remove_user_from_organization_group(organization_group_id, email)

Removes a user from an Organization Group

Parameters:
  • organization_group_id (str) – The ID of the Organization Group

  • email (str) – The email of the user to remove

set_default_organization_group(organization_group_id)

Sets the default Organization Group that all new users who join the organization are automatically added to

Parameters:

organization_group_id (str) – The ID of the Organization Group

delete_api_key(api_key_id)

Delete a specified API Key. You can use the “listApiKeys” method to find the list of all API Key IDs.

Parameters:

api_key_id (str) – The ID of the API key to delete.

remove_user_from_organization(email)

Removes the specified user from the Organization. You can remove yourself; otherwise, you must be an Organization Administrator to use this method to remove other users from the organization.

Parameters:

email (str) – The email address of the user to remove from the Organization.

create_deployment_webhook(deployment_id, endpoint, webhook_event_type, payload_template=None)

Create a webhook attached to a given deployment id.

Parameters:
  • deployment_id (str) – ID of the deployment this webhook will attach to.

  • endpoint (str) – URI that the webhook will send HTTP POST requests to.

  • webhook_event_type (str) – One of ‘DEPLOYMENT_START’, ‘DEPLOYMENT_SUCCESS’, ‘DEPLOYMENT_FAILED’

  • payload_template (dict) – Template for the body of the HTTP POST requests. Defaults to {}.

Returns:

The Webhook attached to the deployment

Return type:

Webhook
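
A usage sketch; the deployment ID and endpoint are placeholders:

    webhook = client.create_deployment_webhook(
        deployment_id='YOUR_DEPLOYMENT_ID',
        endpoint='https://example.com/hooks/abacus',
        webhook_event_type='DEPLOYMENT_SUCCESS')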

update_webhook(webhook_id, endpoint=None, webhook_event_type=None, payload_template=None)

Update the webhook associated with a given webhook id.

Parameters:
  • webhook_id (str) – ID of the webhook to be updated.

  • endpoint (str) – If set, changes the webhook’s endpoint.

  • webhook_event_type (str) – If set, changes event type.

  • payload_template (dict) – If set, changes payload template.

delete_webhook(webhook_id)

Delete the webhook with a given id.

Parameters:

webhook_id (str) – ID of target webhook.

create_project(name, use_case)

Creates a project with your specified project name and use case. Creating a project creates a container for all of the datasets and the models that are associated with a particular problem/project that you would like to work on. For example, if you want to create a model to detect fraud, you have to first create a project, upload datasets, create feature groups, and then create one or more models to get predictions for your use case.

Parameters:
  • name (str) – The project’s name

  • use_case (str) – The use case that the project solves. You can refer to our [guide on use cases](https://api.abacus.ai/app/help/useCases) for further details of each use case. The following enums are currently available for you to choose from: LANGUAGE_DETECTION, NLP_SENTIMENT, NLP_QA, NLP_SEARCH, NLP_SENTENCE_BOUNDARY_DETECTION, NLP_CLASSIFICATION, NLP_SUMMARIZATION, NLP_DOCUMENT_VISUALIZATION, EMBEDDINGS_ONLY, MODEL_WITH_EMBEDDINGS, TORCH_MODEL, TORCH_MODEL_WITH_EMBEDDINGS, PYTHON_MODEL, NOTEBOOK_PYTHON_MODEL, DOCKER_MODEL, DOCKER_MODEL_WITH_EMBEDDINGS, CUSTOMER_CHURN, ENERGY, FINANCIAL_METRICS, CUMULATIVE_FORECASTING, FRAUD_ACCOUNT, FRAUD_THREAT, FRAUD_TRANSACTIONS, OPERATIONS_CLOUD, CLOUD_SPEND, TIMESERIES_ANOMALY_DETECTION, OPERATIONS_MAINTENANCE, OPERATIONS_INCIDENT, PERS_PROMOTIONS, PREDICTING, FEATURE_STORE, RETAIL, SALES_FORECASTING, SALES_SCORING, FEED_RECOMMEND, USER_RANKINGS, NAMED_ENTITY_RECOGNITION, USER_ITEM_AFFINITY, USER_RECOMMENDATIONS, USER_RELATED, VISION, FEATURE_DRIFT, SCHEDULING, GENERIC_FORECASTING.

Returns:

This object represents the newly created project.

Return type:

Project
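
A usage sketch with one of the use case enums listed above:

    project = client.create_project(
        name='Customer Churn Model', use_case='CUSTOMER_CHURN')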

rename_project(project_id, name)

This method renames a project after it is created.

Parameters:
  • project_id (str) – The unique ID for the project.

  • name (str) – The new name for the project.

delete_project(project_id)

Deletes a specified project from your organization.

This method deletes the project, trained models and deployments in the specified project. The datasets attached to the specified project remain available for use with other projects in the organization.

This method will not delete a project that contains active deployments. Be sure to stop all active deployments before you use the delete option.

Note: All projects, models, and deployments cannot be recovered once they are deleted.

Parameters:

project_id (str) – The unique ID of the project to delete.

add_feature_group_to_project(feature_group_id, project_id, feature_group_type='CUSTOM_TABLE', feature_group_use=None)

Adds a feature group to a project.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • project_id (str) – The unique ID associated with the project.

  • feature_group_type (str) – The feature group type of the feature group. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under personalized recommendation use case.

  • feature_group_use (str) – The user assigned feature group use which allows for organizing project feature groups: DATA_WRANGLING, TRAINING_INPUT, BATCH_PREDICTION_INPUT

remove_feature_group_from_project(feature_group_id, project_id)

Removes a feature group from a project.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • project_id (str) – The unique ID associated with the project.

set_feature_group_type(feature_group_id, project_id, feature_group_type='CUSTOM_TABLE')

Update the feature group type in a project. The feature group must already be added to the project.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • project_id (str) – The unique ID associated with the project.

  • feature_group_type (str) – The feature group type to set the feature group as. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under personalized recommendation use case.

use_feature_group_for_training(feature_group_id, project_id, use_for_training=True)

Use the feature group for model training input

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • project_id (str) – The unique ID associated with the project.

  • use_for_training (bool) – Boolean variable to include or exclude a feature group from a model’s training. Only one feature group per type can be used for training

set_feature_mapping(project_id, feature_group_id, feature_name, feature_mapping, nested_column_name=None)

Set a column’s feature mapping. If the column mapping is single-use and already set in another column in this feature group, this call will first remove the other column’s mapping and move it to this column.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • feature_group_id (str) – The unique ID associated with the feature group.

  • feature_name (str) – The name of the feature.

  • feature_mapping (str) – The mapping of the feature in the feature group.

  • nested_column_name (str) – The name of the nested column.

Returns:

A list of objects that describes the resulting feature group’s schema after the feature’s featureMapping is set.

Return type:

Feature

set_column_data_type(project_id, dataset_id, column, data_type)

Set a dataset’s column type.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • dataset_id (str) – The unique ID associated with the dataset.

  • column (str) – The name of the column.

  • data_type (str) – The type of the data in the column. CATEGORICAL, CATEGORICAL_LIST, NUMERICAL, TIMESTAMP, TEXT, EMAIL, LABEL_LIST, JSON, OBJECT_REFERENCE, MULTICATEGORICAL_LIST, COORDINATE_LIST, NUMERICAL_LIST, TIMESTAMP_LIST. Refer to the [guide on feature types](https://api.abacus.ai/app/help/class/FeatureType) for more information. Note: Some ColumnMappings will restrict the options or explicitly set the DataType.

Returns:

A list of objects that describes the resulting dataset’s schema after the column’s dataType is set.

Return type:

Schema

set_column_mapping(project_id, dataset_id, column, column_mapping)

Set a dataset’s column mapping. If the column mapping is single-use and already set in another column in this dataset, this call will first remove the other column’s mapping and move it to this column.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • dataset_id (str) – The unique ID associated with the dataset.

  • column (str) – The name of the column.

  • column_mapping (str) – The mapping of the column in the dataset. See the list of column mapping enums for allowed values.

Returns:

A list of columns that describes the resulting dataset’s schema after the column’s columnMapping is set.

Return type:

Schema

remove_column_mapping(project_id, dataset_id, column)

Removes a column mapping from a column in the dataset. Returns a list of all columns with their mappings once the change is made.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • dataset_id (str) – The unique ID associated with the dataset.

  • column (str) – The name of the column.

Returns:

A list of objects that describes the resulting dataset’s schema after the column’s columnMapping is removed.

Return type:

Schema

add_annotation(annotation, feature_group_id, feature_name, doc_id=None, feature_group_row_identifier=None, annotation_source='ui')

Add an annotation entry to the database.

Parameters:
  • annotation (dict) – The annotation to add. Format of the annotation is determined by its annotation type.

  • feature_group_id (str) – The ID of the feature group the annotation is on.

  • feature_name (str) – The name of the feature the annotation is on.

  • doc_id (str) – The ID of the primary document the annotation is on.

  • feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group primary key value.

  • annotation_source (str) – Indicator of whether the annotation came from the UI, bulk upload, etc.

Returns:

The annotation entry that was added

Return type:

AnnotationEntry

describe_annotation(feature_group_id, feature_name=None, doc_id=None, feature_group_row_identifier=None)

Get the latest annotation entry for a given feature group, feature, and document.

Parameters:
  • feature_group_id (str) – The ID of the feature group the annotation is on.

  • feature_name (str) – The name of the feature the annotation is on.

  • doc_id (str) – The ID of the primary document the annotation is on.

  • feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the primary key value. At least one of the doc_id or key value must be provided so that the correct annotation can be identified.

Returns:

The latest annotation entry for the given feature group, feature, and document and/or annotation key value

Return type:

AnnotationEntry

create_feature_group(table_name, sql, description=None)

Creates a new feature group from a SQL statement.

Parameters:
  • table_name (str) – The unique name to be given to the feature group.

  • sql (str) – Input SQL statement for forming the feature group.

  • description (str) – The description about the feature group.

Returns:

The created feature group

Return type:

FeatureGroup
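
A usage sketch; the SQL and table names are illustrative:

    fg = client.create_feature_group(
        table_name='active_users',
        sql='SELECT user_id, last_login FROM users WHERE active = 1',
        description='Users currently marked active')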

create_feature_group_from_template(table_name, feature_group_template_id, template_bindings=None, should_attach_feature_group_to_template=True, description=None)

Creates a new feature group from a SQL statement.

Parameters:
  • table_name (str) – The unique name to be given to the feature group.

  • feature_group_template_id (str) – The unique ID associated with the template that will be used to create this feature group.

  • template_bindings (list) – Variable bindings that override the template’s variable values.

  • should_attach_feature_group_to_template (bool) – Set to False to create a feature group but not leave it attached to the template that created it.

  • description (str) – A user-friendly description of this feature group.

Returns:

The created feature group

Return type:

FeatureGroup

create_feature_group_from_function(table_name, function_source_code=None, function_name=None, input_feature_groups=None, description=None, cpu_size=None, memory=None, package_requirements=None, use_original_csv_names=False, python_function_name=None, python_function_bindings=None)

Creates a new feature group from user provided code. Code language currently supported is Python.

If a list of input feature groups is supplied, the materialized feature groups will be provided as arguments to the function as DataFrames (pandas in the case of Python).

This method expects the source code to be a valid language source file which contains a function. This function must return a DataFrame when it is executed, and that DataFrame will be used as the materialized version of this feature group table.

Parameters:
  • table_name (str) – The unique name to be given to the feature group.

  • function_source_code (str) – Contents of a valid source code file in a supported Feature Group specification language (currently only Python). The source code should contain a function called function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.

  • input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).

  • description (str) – The description for this feature group.

  • cpu_size (str) – Size of the cpu for the feature group function

  • memory (int) – Memory (in GB) for the feature group function

  • package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency

  • use_original_csv_names (bool) – Defaults to False; if set, the original column names are used for input feature groups from CSV datasets.

  • python_function_name (str) – Name of Python Function that contains the source code and function arguments.

  • python_function_bindings (list) – List of arguments to be supplied to the function as parameters in the format [{‘name’: ‘function_argument’, ‘variable_type’: ‘FEATURE_GROUP’, ‘value’: ‘name_of_feature_group’}].

Returns:

The created feature group

Return type:

FeatureGroup
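
A sketch passing the function as source code; the table names and transform are illustrative:

    source = '''
    def build_table(users):
        users['age_bucket'] = users['age'] // 10
        return users
    '''

    fg = client.create_feature_group_from_function(
        table_name='users_bucketed',
        function_source_code=source,
        function_name='build_table',
        input_feature_groups=['users'])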

create_sampling_feature_group(feature_group_id, table_name, sampling_config, description=None)

Creates a new feature group defined as a sample of rows from another feature group.

For efficiency, sampling is approximate unless otherwise specified (e.g., the number of rows may vary slightly from what was requested).

Parameters:
  • feature_group_id (str) – The unique ID associated with the pre-existing feature group that will be sampled by this new feature group. I.e. the input for sampling.

  • table_name (str) – The unique name to be given to this sampling feature group.

  • sampling_config (dict) – JSON object (aka map) defining the sampling method and its parameters.

  • description (str) – A human-readable description of this feature group.

Returns:

The created feature group.

Return type:

FeatureGroup
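
A usage sketch. The sampling_config keys shown are an assumed shape; consult the sampling documentation for the supported methods and their parameters:

    fg = client.create_sampling_feature_group(
        feature_group_id='SOURCE_FEATURE_GROUP_ID',
        table_name='users_sample',
        sampling_config={'sampling_method': 'PERCENT_SAMPLING',  # assumed method name
                         'sample_percent': 10})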

create_merge_feature_group(source_feature_group_id, table_name, merge_config, description=None)

Creates a new feature group defined as the union of other feature group versions.

Parameters:
  • source_feature_group_id (str) – ID corresponding to the dataset feature group that will have its versions merged into this feature group.

  • table_name (str) – The unique name to be given to this merge feature group.

  • merge_config (dict) – JSON object (aka map) defining the merging method and its parameters.

  • description (str) – A human-readable description of this feature group.

Returns:

The created feature group.

Return type:

FeatureGroup

create_transform_feature_group(source_feature_group_id, table_name, transform_config, description=None)

Creates a new feature group defined as a pre-defined transform on another feature group.

Parameters:
  • source_feature_group_id (str) – ID corresponding to the feature group that will have the transformation applied.

  • table_name (str) – The unique name to be given to this transform feature group.

  • transform_config (dict) – JSON object (aka map) defining the transform and its parameters.

  • description (str) – A human-readable description of this feature group.

Returns:

The created feature group.

Return type:

FeatureGroup

create_snapshot_feature_group(feature_group_version, table_name)

Creates a Snapshot Feature Group corresponding to a specific feature group version.

Parameters:
  • feature_group_version (str) – The unique ID associated with the feature group version being snapshotted.

  • table_name (str) – The name for the newly created Snapshot Feature Group table.

Returns:

Feature Group corresponding to the newly created Snapshot.

Return type:

FeatureGroup

set_feature_group_sampling_config(feature_group_id, sampling_config)

Set a FeatureGroup’s sampling to the config values provided, so that the rows the FeatureGroup returns will be a sample of those it would otherwise have returned.

Currently, sampling is only for Sampling FeatureGroups, so this API only allows calling on that kind of FeatureGroup.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • sampling_config (dict) – A json object string specifying the sampling method and parameters specific to that sampling method. Empty sampling_config means no sampling.

Returns:

The updated feature group.

Return type:

FeatureGroup

set_feature_group_merge_config(feature_group_id, merge_config)

Set a MergeFeatureGroup’s merge config to the values provided, so that the feature group only returns a bounded range of an incremental dataset.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • merge_config (dict) – A json object string specifying the merge rule. An empty mergeConfig will default to only including the latest Dataset Version.

Return type:

None

set_feature_group_transform_config(feature_group_id, transform_config)

Set a TransformFeatureGroup’s transform config to the values provided.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • transform_config (dict) – A json object string specifying the pre-defined transformation.

Return type:

None

set_feature_group_schema(feature_group_id, schema)

Creates a new schema and points the feature group to the new feature group schema id.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • schema (list) – An array of json objects with ‘name’ and ‘dataType’ properties.

create_feature(feature_group_id, name, select_expression)

Creates a new feature in a Feature Group from a SQL select statement

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • name (str) – The name of the feature to add

  • select_expression (str) – SQL select expression to create the feature

Returns:

A feature group object with the newly added feature.

Return type:

FeatureGroup

add_feature_group_tag(feature_group_id, tag)

Adds a tag to the feature group

Parameters:
  • feature_group_id (str) – The feature group

  • tag (str) – The tag to add to the feature group

remove_feature_group_tag(feature_group_id, tag)

Removes a tag from the feature group

Parameters:
  • feature_group_id (str) – The feature group

  • tag (str) – The tag to remove from the feature group

add_annotatable_feature(feature_group_id, name, annotation_type)
Parameters:
  • feature_group_id (str) –

  • name (str) –

  • annotation_type (str) –

Returns:

None

Return type:

FeatureGroup

set_feature_as_annotatable_feature(feature_group_id, feature_name, annotation_type, feature_group_row_identifier_feature=None, doc_id_feature=None)
Parameters:
  • feature_group_id (str) –

  • feature_name (str) –

  • annotation_type (str) –

  • feature_group_row_identifier_feature (str) –

  • doc_id_feature (str) –

Returns:

None

Return type:

FeatureGroup

unset_feature_as_annotatable_feature(feature_group_id, feature_name)
Parameters:
  • feature_group_id (str) –

  • feature_name (str) –

Returns:

None

Return type:

FeatureGroup

add_feature_group_annotation_label(feature_group_id, label_name, annotation_type, label_definition=None)
Parameters:
  • feature_group_id (str) –

  • label_name (str) –

  • annotation_type (str) –

  • label_definition (str) –

Returns:

None

Return type:

FeatureGroup

remove_feature_group_annotation_label(feature_group_id, label_name)
Parameters:
  • feature_group_id (str) –

  • label_name (str) –

Returns:

None

Return type:

FeatureGroup

add_feature_tag(feature_group_id, feature, tag)
Parameters:
  • feature_group_id (str) –

  • feature (str) –

  • tag (str) –

remove_feature_tag(feature_group_id, feature, tag)
Parameters:
  • feature_group_id (str) –

  • feature (str) –

  • tag (str) –

create_nested_feature(feature_group_id, nested_feature_name, table_name, using_clause, where_clause=None, order_clause=None)

Creates a new nested feature in a feature group from SQL statements.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • nested_feature_name (str) – The name of the feature.

  • table_name (str) – The table name of the feature group to nest

  • using_clause (str) – The SQL join column or logic to join the nested table with the parent

  • where_clause (str) – A SQL where statement to filter the nested rows

  • order_clause (str) – A SQL clause to order the nested rows

Returns:

A feature group object with the newly added nested feature.

Return type:

FeatureGroup

update_nested_feature(feature_group_id, nested_feature_name, table_name=None, using_clause=None, where_clause=None, order_clause=None, new_nested_feature_name=None)

Updates a previously existing nested feature in a feature group.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • nested_feature_name (str) – The name of the feature to be updated.

  • table_name (str) – The name of the table.

  • using_clause (str) – The SQL join column or logic to join the nested table with the parent

  • where_clause (str) – A SQL where statement to filter the nested rows

  • order_clause (str) – A SQL clause to order the nested rows

  • new_nested_feature_name (str) – New name for the nested feature.

Returns:

A feature group object with the updated nested feature.

Return type:

FeatureGroup

delete_nested_feature(feature_group_id, nested_feature_name)

Delete a nested feature.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • nested_feature_name (str) – The name of the feature to be deleted.

Returns:

A feature group object without the deleted nested feature.

Return type:

FeatureGroup

create_point_in_time_feature(feature_group_id, feature_name, history_table_name, aggregation_keys, timestamp_key, historical_timestamp_key, expression, lookback_window_seconds=None, lookback_window_lag_seconds=0, lookback_count=None, lookback_until_position=0)

Creates a new point in time feature in a feature group using another historical feature group, a window spec, and an aggregate expression.

We use the aggregation keys, and either the lookbackWindowSeconds or the lookbackCount values, to perform the window aggregation for every row in the current feature group. If the window is specified in seconds, then all rows in the history table which match the aggregation keys and whose historicalTimeFeature is >= the window start time and < the value of the current row’s timeFeature are considered. An optional lookbackWindowLagSeconds (positive or negative) can be used to offset the current value of the timeFeature. If this value is negative, we will look at future rows in the history table, so care must be taken to make sure that these rows are available in the online context when we are performing a lookup on this feature group. If the window is specified in counts, then we order the history table rows, aligning by time, and consider rows from the window where the rank order is >= lookbackCount, including the row just prior to the current one. The lag is specified in terms of positions using lookbackUntilPosition.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • feature_name (str) – The name of the feature to create

  • history_table_name (str) – The table name of the history table.

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature

  • historical_timestamp_key (str) – Name of feature which contains the historical timestamp.

  • expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.

  • lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.

  • lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.

Returns:

A feature group object with the newly added point in time feature.

Return type:

FeatureGroup
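
A sketch of a count-of-purchases-in-the-last-7-days feature; the table, key, and feature names are illustrative:

    fg = client.create_point_in_time_feature(
        feature_group_id='EVENTS_FEATURE_GROUP_ID',
        feature_name='purchases_last_7d',
        history_table_name='purchase_history',
        aggregation_keys=['user_id'],
        timestamp_key='event_time',
        historical_timestamp_key='purchase_time',
        expression='COUNT(1)',
        lookback_window_seconds=7 * 24 * 3600)  # 7-day window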

update_point_in_time_feature(feature_group_id, feature_name, history_table_name=None, aggregation_keys=None, timestamp_key=None, historical_timestamp_key=None, expression=None, lookback_window_seconds=None, lookback_window_lag_seconds=None, lookback_count=None, lookback_until_position=None, new_feature_name=None)

Updates an existing point in time feature in a feature group. See createPointInTimeFeature for detailed semantics.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • feature_name (str) – The name of the feature.

  • history_table_name (str) – The table name of the history table. If not specified, we use the current table to do a self join.

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature

  • historical_timestamp_key (str) – Name of feature which contains the historical timestamp.

  • expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.

  • lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.

  • lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.

  • new_feature_name (str) – New name for the point in time feature.

Returns:

A feature group object with the updated point in time feature.

Return type:

FeatureGroup

create_point_in_time_group(feature_group_id, group_name, window_key, aggregation_keys, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=0, lookback_count=None, lookback_until_position=0)

Create point in time group

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group to add the point in time group to.

  • group_name (str) – The name of the point in time group

  • window_key (str) – Name of feature to use for ordering the rows on the source table

  • aggregation_keys (list) – List of keys to use for performing the window aggregation on the source table.

  • history_table_name (str) – The table to use for aggregating, if not provided, the source table will be used

  • history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used

  • history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys

  • lookback_window (float) – Number of seconds in the past from the current time for start of the window. If 0, the lookback will include all rows.

  • lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.

Returns:

The feature group after the point in time group has been created

Return type:

FeatureGroup

update_point_in_time_group(feature_group_id, group_name, window_key=None, aggregation_keys=None, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=None, lookback_count=None, lookback_until_position=None)

Update point in time group

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • group_name (str) – The name of the point in time group

  • window_key (str) – Name of feature which contains the timestamp value for the point in time feature

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • history_table_name (str) – The table to use for aggregating, if not provided, the source table will be used

  • history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used

  • history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys

  • lookback_window (float) – Number of seconds in the past from the current time for start of the window.

  • lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.

Returns:

The feature group after the update has been applied

Return type:

FeatureGroup

delete_point_in_time_group(feature_group_id, group_name)

Delete point in time group

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • group_name (str) – The name of the point in time group

Returns:

The feature group after the point in time group has been deleted

Return type:

FeatureGroup

create_point_in_time_group_feature(feature_group_id, group_name, name, expression)

Create point in time group feature

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • group_name (str) – The name of the point in time group

  • name (str) – The name of the feature to add to the point in time group

  • expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.

Returns:

The feature group after the update has been applied

Return type:

FeatureGroup

update_point_in_time_group_feature(feature_group_id, group_name, name, expression)

Update a feature’s SQL expression in a point in time group

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • group_name (str) – The name of the point in time group

  • name (str) – The name of the feature to add to the point in time group

  • expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.

Returns:

The feature group after the update has been applied

Return type:

FeatureGroup

set_feature_type(feature_group_id, feature, feature_type)

Set a feature’s type in a feature group. Specify the feature group ID, feature name, and feature type, and the method will return the new column with the resulting changes reflected.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • feature (str) – The name of the feature.

  • feature_type (str) – The machine learning type of the data in the feature. CATEGORICAL, CATEGORICAL_LIST, NUMERICAL, TIMESTAMP, TEXT, EMAIL, LABEL_LIST, JSON, OBJECT_REFERENCE, MULTICATEGORICAL_LIST, COORDINATE_LIST, NUMERICAL_LIST, TIMESTAMP_LIST. Refer to the [guide on feature types](https://api.abacus.ai/app/help/class/FeatureType) for more information. Note: Some FeatureMappings will restrict the options or explicitly set the FeatureType.

Returns:

The feature group after the feature_type is applied

Return type:

Schema

invalidate_streaming_feature_group_data(feature_group_id, invalid_before_timestamp)

Invalidates all streaming data with timestamp before invalidBeforeTimestamp

Parameters:
  • feature_group_id (str) – The Streaming feature group to record data to

  • invalid_before_timestamp (int) – The Unix timestamp; any data with a timestamp before this time will be deleted

concatenate_feature_group_data(feature_group_id, source_feature_group_id, merge_type='UNION', replace_until_timestamp=None, skip_materialize=False)

Concatenates data from one feature group to another. Feature groups can be merged if their schemas are compatible, they have the special updateTimestampKey column, and (if set) the primaryKey column. The second operand in the concatenate operation will be appended to the first operand (merge target).

Parameters:
  • feature_group_id (str) – The destination feature group.

  • source_feature_group_id (str) – The feature group to concatenate with the destination feature group.

  • merge_type (str) – UNION or INTERSECTION

  • replace_until_timestamp (int) – The Unix timestamp specifying the point up to which data will be replaced from the source feature group.

  • skip_materialize (bool) – If true, will not materialize the concatenated feature group

remove_concatenation_config(feature_group_id)

Removes the concatenation config on a destination feature group.

Parameters:

feature_group_id (str) – The unique ID of the destination feature group whose concatenation configuration will be removed

set_feature_group_indexing_config(feature_group_id, primary_key=None, update_timestamp_key=None, lookup_keys=None)

Sets various attributes of the feature group used for deployment lookups and streaming updates.

Parameters:
  • feature_group_id (str) – The feature group

  • primary_key (str) – Name of feature which defines the primary key of the feature group.

  • update_timestamp_key (str) – Name of feature which defines the update timestamp of the feature group - used in concatenation and primary key deduplication.

  • lookup_keys (list) – List of feature names which can be used in the lookup api to restrict the computation to a set of dataset rows. These feature names have to correspond to underlying dataset columns.
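
A usage sketch; the key names are illustrative:

    client.set_feature_group_indexing_config(
        feature_group_id='YOUR_FEATURE_GROUP_ID',
        primary_key='user_id',
        update_timestamp_key='updated_at',
        lookup_keys=['email'])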

update_feature_group(feature_group_id, description=None)

Modifies an existing feature group

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • description (str) – The description about the feature group.

Returns:

The updated feature group object.

Return type:

FeatureGroup

detach_feature_group_from_template(feature_group_id)

Update a feature group to detach it from a template.

Currently, this converts the feature group into a SQL feature group rather than a template feature group.

Parameters:

feature_group_id (str) – The unique ID associated with the feature group.

Returns:

The updated feature group

Return type:

FeatureGroup

update_feature_group_template_bindings(feature_group_id, template_bindings=None)

Update the feature group template bindings for a template feature group.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • template_bindings (list) – Values in these bindings override values set in the template.

Returns:

The updated feature group

Return type:

FeatureGroup

update_feature_group_python_function_bindings(feature_group_id, python_function_bindings)

Updates an existing Feature Group’s python function bindings from a user provided Python Function. If a list of feature groups is supplied within the python function bindings, the materialized feature groups will be provided as arguments to the function as DataFrames (pandas in the case of Python).

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • python_function_bindings (list) – List of arguments to be supplied to the function as parameters in the format [{‘name’: ‘function_argument’, ‘variable_type’: ‘FEATURE_GROUP’, ‘value’: ‘name_of_feature_group’}].

update_feature_group_sql_definition(feature_group_id, sql)

Updates the SQL statement for a feature group.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • sql (str) – Input SQL statement for the feature group.

Returns:

The updated feature group

Return type:

FeatureGroup

update_dataset_feature_group_feature_expression(feature_group_id, feature_expression)

Updates the SQL feature expression for a dataset feature group’s custom features

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • feature_expression (str) – Input SQL statement for the feature group.

Returns:

The updated feature group

Return type:

FeatureGroup

update_feature_group_function_definition(feature_group_id, function_source_code=None, function_name=None, input_feature_groups=None, cpu_size=None, memory=None, package_requirements=None, use_original_csv_names=False)

Updates the function definition for a feature group created using createFeatureGroupFromFunction

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • function_source_code (str) – Contents of a valid source code file in a supported Feature Group specification language (currently only Python). The source code should contain a function called function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.

  • input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).

  • cpu_size (str) – Size of the cpu for the feature group function

  • memory (int) – Memory (in GB) for the feature group function

  • package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency

  • use_original_csv_names (bool) – If set to true, the feature group uses the original column names for input feature groups from CSV datasets.

Returns:

The updated feature group

Return type:

FeatureGroup

update_feature_group_zip(feature_group_id, function_name, module_name, input_feature_groups=None, cpu_size=None, memory=None, package_requirements=None)

Updates the zip for a feature group created using createFeatureGroupFromZip

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.

  • module_name (str) – Path to the file with the feature group function.

  • input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).

  • cpu_size (str) – Size of the cpu for the feature group function

  • memory (int) – Memory (in GB) for the feature group function

  • package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency

Returns:

The Upload to upload the zip file to

Return type:

Upload

update_feature_group_git(feature_group_id, application_connector_id=None, branch_name=None, python_root=None, function_name=None, module_name=None, input_feature_groups=None, cpu_size=None, memory=None, package_requirements=None)

Updates a feature group created using createFeatureGroupFromGit

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • application_connector_id (str) – The unique ID associated with the git application connector.

  • branch_name (str) – Name of the branch in the git repository to be used for training.

  • python_root (str) – Path from the top level of the git repository to the directory containing the Python source code. If not provided, the default is the root of the git repository.

  • function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.

  • module_name (str) – Path to the file with the feature group function.

  • input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function's return value).

  • cpu_size (str) – Size of the cpu for the feature group function

  • memory (int) – Memory (in GB) for the feature group function

  • package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency

Returns:

The updated FeatureGroup

Return type:

FeatureGroup

update_feature(feature_group_id, name, select_expression=None, new_name=None)

Modifies an existing feature in a feature group. A user needs to specify the name and feature group ID and either a SQL statement or new name to update the feature.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • name (str) – The name of the feature to be updated.

  • select_expression (str) – Input SQL statement for modifying the feature.

  • new_name (str) – The new name of the feature.

Returns:

The updated feature group object.

Return type:

FeatureGroup
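
A minimal sketch of renaming a feature and changing its SQL in one call (the client setup, ID, and SQL are placeholders):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    fg = client.update_feature(
        feature_group_id='FEATURE_GROUP_ID',  # placeholder ID
        name='total_spend',
        select_expression='SUM(amount)',      # placeholder SQL
        new_name='lifetime_spend',
    )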

export_feature_group_version_to_file_connector(feature_group_version, location, export_file_format, overwrite=False)

Export Feature group to File Connector.

Parameters:
  • feature_group_version (str) – The ID of the Feature Group Version to export.

  • location (str) – Cloud file location to export to.

  • export_file_format (str) – File format to export to.

  • overwrite (bool) – If true and a file exists at this location, this process will overwrite the file.

Returns:

The FeatureGroupExport instance

Return type:

FeatureGroupExport

export_feature_group_version_to_database_connector(feature_group_version, database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None, additional_id_columns=None)

Export Feature group to Database Connector.

Parameters:
  • feature_group_version (str) – The ID of the Feature Group Version to export.

  • database_connector_id (str) – Database connector to export to.

  • object_name (str) – The database object to write to

  • write_mode (str) – Either INSERT or UPSERT

  • database_feature_mapping (dict) – A key/value pair JSON Object of “database connector column” -> “feature name” pairs.

  • id_column (str) – Required if mode is UPSERT. Indicates which database column should be used as the lookup key for UPSERT

  • additional_id_columns (list) – For database connectors which support it, additional ID columns to use as a complex key for upserting

Returns:

The FeatureGroupExport instance

Return type:

FeatureGroupExport
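
A minimal sketch of an UPSERT export (connector IDs, object name, and column/feature names are placeholders):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    export = client.export_feature_group_version_to_database_connector(
        feature_group_version='FEATURE_GROUP_VERSION_ID',  # placeholder ID
        database_connector_id='DATABASE_CONNECTOR_ID',     # placeholder ID
        object_name='analytics.user_features',             # placeholder table
        write_mode='UPSERT',
        # "database connector column" -> "feature name" pairs
        database_feature_mapping={'user_id_col': 'user_id', 'spend_col': 'total_spend'},
        id_column='user_id_col',  # required because write_mode is UPSERT
    )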

export_feature_group_version_to_console(feature_group_version, export_file_format)

Export Feature group to console.

Parameters:
  • feature_group_version (str) – The ID of the Feature Group Version to export.

  • export_file_format (str) – File format to export to.

Returns:

The FeatureGroupExport instance

Return type:

FeatureGroupExport

set_feature_group_modifier_lock(feature_group_id, locked=True)

Sets the modifier lock on a feature group to prevent it from being modified.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • locked (bool) – Set to True to lock the feature group (disabling modification), or False to unlock it.

add_user_to_feature_group_modifiers(feature_group_id, email)

Adds a user to the modifiers of a feature group.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • email (str) – The email address of the user to be added.

add_organization_group_to_feature_group_modifiers(feature_group_id, organization_group_id)

Adds an organization group to the modifiers of a feature group.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • organization_group_id (str) – The unique ID associated with the organization group.

remove_user_from_feature_group_modifiers(feature_group_id, email)

Removes a user from the modifiers of a feature group.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • email (str) – The email address of the user to be removed.

remove_organization_group_from_feature_group_modifiers(feature_group_id, organization_group_id)

Removes an organization group from the modifiers of a feature group.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • organization_group_id (str) – The unique ID associated with the organization group.
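
A minimal sketch combining the modifier-lock and modifier-permission calls above (the ID and email are placeholders):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    # Lock the feature group, then grant one collaborator modify access.
    client.set_feature_group_modifier_lock(feature_group_id='FEATURE_GROUP_ID', locked=True)
    client.add_user_to_feature_group_modifiers(
        feature_group_id='FEATURE_GROUP_ID',  # placeholder ID
        email='teammate@example.com',
    )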

delete_feature(feature_group_id, name)

Removes an existing feature from a feature group. A user needs to specify the name of the feature to be deleted and the feature group ID.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • name (str) – The name of the feature to be deleted.

Returns:

The updated feature group object.

Return type:

FeatureGroup

delete_feature_group(feature_group_id)

Removes an existing feature group.

Parameters:

feature_group_id (str) – The unique ID associated with the feature group.

create_feature_group_version(feature_group_id, variable_bindings=None)

Creates a snapshot for a specified feature group.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • variable_bindings (dict) – JSON object (map) defining variable bindings that override parent feature group values.

Returns:

A feature group version.

Return type:

FeatureGroupVersion

create_feature_group_template(feature_group_id, name, template_sql, template_variables, description=None, template_bindings=None, should_attach_feature_group_to_template=False)

Create a feature group template.

Parameters:
  • feature_group_id (str) – Identifier of the feature group this template was created from.

  • name (str) – The user-friendly name for this feature group template.

  • template_sql (str) – The template sql that will be resolved by applying values from the template variables to generate sql for a feature group.

  • template_variables (list) – The template variables for resolving the template.

  • description (str) – A description of this feature group template

  • template_bindings (list) – If the feature group will be attached to the newly created template, set these variable bindings on that feature group.

  • should_attach_feature_group_to_template (bool) – Set to True to convert the feature group to a template feature group and attach it to the newly created template.

Returns:

The created feature group template

Return type:

FeatureGroupTemplate
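
A minimal sketch (the ID and SQL are placeholders; the element structure of template_variables is illustrative only, not a schema confirmed by this documentation):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    template = client.create_feature_group_template(
        feature_group_id='FEATURE_GROUP_ID',  # placeholder ID
        name='daily_events_template',
        template_sql='SELECT * FROM events WHERE event_date > {start_date}',
        # Illustrative structure only; consult the template variable docs.
        template_variables=[{'name': 'start_date', 'value': '2021-01-01'}],
        should_attach_feature_group_to_template=True,
    )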

delete_feature_group_template(feature_group_template_id)

Delete an existing feature group template.

Parameters:

feature_group_template_id (str) – The unique ID associated with the feature group template.

update_feature_group_template(feature_group_template_id, template_sql=None, template_variables=None, description=None, name=None)

Update a feature group template.

Parameters:
  • feature_group_template_id (str) – Identifier of the feature group template to update.

  • template_sql (str) – If provided, the new value to use for the template sql.

  • template_variables (list) – If provided, the new value to use for the template variables.

  • description (str) – A description of this feature group template

  • name (str) – The user-friendly name for this feature group template.

Returns:

The updated feature group template.

Return type:

FeatureGroupTemplate

preview_feature_group_template_resolution(feature_group_template_id=None, template_bindings=None, template_sql=None, template_variables=None, should_validate=True)

Resolve template sql using template variables and template bindings.

Parameters:
  • feature_group_template_id (str) – If specified, use this template, otherwise assume an empty template.

  • template_bindings (list) – Values that override the template variable values specified by the template.

  • template_sql (str) – If specified, use this as the template sql instead of the feature group template’s sql.

  • template_variables (list) – Template variables to use. If a template is provided, this overrides the template’s template variables.

  • should_validate (bool) – If True, validates the resolved SQL.

Returns:

The resolved feature group template.

Return type:

ResolvedFeatureGroupTemplate
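
A minimal sketch of previewing a template resolution (the ID is a placeholder, and the binding structure is illustrative, not a confirmed schema):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    resolved = client.preview_feature_group_template_resolution(
        feature_group_template_id='FEATURE_GROUP_TEMPLATE_ID',  # placeholder ID
        template_bindings=[{'name': 'start_date', 'value': '2021-06-01'}],  # illustrative
        should_validate=True,
    )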

cancel_upload(upload_id)

Cancels an upload

Parameters:

upload_id (str) – The Upload ID

upload_part(upload_id, part_number, part_data)

Uploads a part of a large dataset file from your bucket to our system. Our system currently supports a size of up to 5GB for a part of a full file and a size of up to 5TB for the full file. Note that each part must be >=5MB in size, unless it is the last part in the sequence of parts for the full file.

Parameters:
  • upload_id (str) – A unique identifier for this upload

  • part_number (int) – The 1-indexed number denoting the position of the file part in the sequence of parts for the full file.

  • part_data (io.TextIOBase) – The multipart/form-data for the current part of the full file.

Returns:

An ‘UploadPart’ object that encapsulates the hash and the ETag of the uploaded part.

Return type:

UploadPart

mark_upload_complete(upload_id)

Marks an upload process as complete.

Parameters:

upload_id (str) – A unique identifier for this upload

Returns:

The upload object associated with the upload process for the full file.

Return type:

Upload

create_dataset_from_file_connector(table_name, location, file_format=None, refresh_schedule=None, csv_delimiter=None, filename_column=None, start_prefix=None, until_prefix=None, location_date_format=None, date_format_lookback_days=None, incremental=False)

Creates a dataset from a file located in cloud storage, such as Amazon S3, using the specified dataset name and location.

Parameters:
  • table_name (str) – Organization-unique table name or the name of the feature group table to create using the source table.

  • location (str) – The URI location format of the dataset source. When location_date_format is specified, the URI location format needs to match it, e.g. location = s3://bucket1/dir1/dir2/event_date=YYYY-MM-DD/*. When both start_prefix and until_prefix are specified, the URI location format needs to include both, e.g. location = s3://bucket1/dir1/* includes both s3://bucket1/dir1/dir2/event_date=2021-08-02/* and s3://bucket1/dir1/dir2/event_date=2021-08-08/*.

  • file_format (str) – The file format of the dataset.

  • refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.

  • csv_delimiter (str) – If the file format is CSV, use a specific csv delimiter.

  • filename_column (str) – Adds a new column to the dataset with the external URI path.

  • start_prefix (str) – The start prefix (inclusive) for a range based search on a cloud storage location URI.

  • until_prefix (str) – The end prefix (exclusive) for a range based search on a cloud storage location URI.

  • location_date_format (str) – The date format in which the data is partitioned in the cloud storage location. E.g., if the data is partitioned as s3://bucket1/dir1/dir2/event_date=YYYY-MM-DD/dir4/filename.parquet, then the location_date_format is YYYY-MM-DD. This format needs to be consistent across all files within the specified location.

  • date_format_lookback_days (int) – The number of days to look back from the current day for import locations that are date partitioned. E.g., import date, 2021-06-04, with date_format_lookback_days = 3 will retrieve data for all the dates in the range [2021-06-02, 2021-06-04].

  • incremental (bool) – Signifies if the dataset is an incremental dataset.

Returns:

The dataset created.

Return type:

Dataset
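
A minimal sketch of a date-partitioned import, reusing the example paths from the parameter descriptions above (the table name is a placeholder):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    dataset = client.create_dataset_from_file_connector(
        table_name='events',  # placeholder name
        location='s3://bucket1/dir1/dir2/event_date=YYYY-MM-DD/*',
        file_format='PARQUET',
        location_date_format='YYYY-MM-DD',
        date_format_lookback_days=3,
        refresh_schedule='0 12 * * *',  # daily at 12:00 UTC
    )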

create_dataset_version_from_file_connector(dataset_id, location=None, file_format=None, csv_delimiter=None)

Creates a new version of the specified dataset.

Parameters:
  • dataset_id (str) – The unique ID associated with the dataset.

  • location (str) – A new external URI to import the dataset from. If not specified, the last location will be used.

  • file_format (str) – The file format to be used. If not specified, the service will try to detect the file format.

  • csv_delimiter (str) – If the file format is CSV, use a specific csv delimiter.

Returns:

The new Dataset Version created.

Return type:

DatasetVersion

create_dataset_from_database_connector(table_name, database_connector_id, object_name=None, columns=None, query_arguments=None, refresh_schedule=None, sql_query=None, incremental=False, timestamp_column=None)

Creates a dataset from a Database Connector

Parameters:
  • table_name (str) – Organization-unique table name

  • database_connector_id (str) – The Database Connector to import the dataset from

  • object_name (str) – If applicable, the name/id of the object in the service to query.

  • columns (str) – The columns to query from the external service object.

  • query_arguments (str) – Additional query arguments to filter the data

  • refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.

  • sql_query (str) – The full SQL query to use when fetching data. If present, this parameter will override objectName, columns, timestampColumn, and queryArguments

  • incremental (bool) – Signifies if the dataset is an incremental dataset.

  • timestamp_column (str) – If dataset is incremental, this is the column name of the required column in the dataset. This column must contain timestamps in descending order which are used to determine the increments of the incremental dataset.

Returns:

The created dataset.

Return type:

Dataset
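
A minimal sketch using a full SQL query, which overrides object_name, columns, and query_arguments (the name, ID, and SQL are placeholders):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    dataset = client.create_dataset_from_database_connector(
        table_name='orders',                            # placeholder name
        database_connector_id='DATABASE_CONNECTOR_ID',  # placeholder ID
        sql_query='SELECT * FROM orders',               # placeholder SQL
        refresh_schedule='0 0 * * 0',                   # weekly at midnight UTC
    )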

create_dataset_from_application_connector(table_name, application_connector_id, object_id=None, start_timestamp=None, end_timestamp=None, refresh_schedule=None)

Creates a dataset from an Application Connector

Parameters:
  • table_name (str) – Organization-unique table name

  • application_connector_id (str) – The unique application connector to download data from

  • object_id (str) – If applicable, the id of the object in the service to query.

  • start_timestamp (int) – The Unix timestamp of the start of the period that will be queried.

  • end_timestamp (int) – The Unix timestamp of the end of the period that will be queried.

  • refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.

Returns:

The created dataset.

Return type:

Dataset

create_dataset_version_from_database_connector(dataset_id, object_name=None, columns=None, query_arguments=None, sql_query=None)

Creates a new version of the specified dataset

Parameters:
  • dataset_id (str) – The unique ID associated with the dataset.

  • object_name (str) – If applicable, the name/id of the object in the service to query. If not specified, the previously used name will be used.

  • columns (str) – The columns to query from the external service object. If not specified, the last columns will be used.

  • query_arguments (str) – Additional query arguments to filter the data. If not specified, the last arguments will be used.

  • sql_query (str) – The full SQL query to use when fetching data. If present, this parameter will override objectName, columns, and queryArguments

Returns:

The new Dataset Version created.

Return type:

DatasetVersion

create_dataset_version_from_application_connector(dataset_id, object_id=None, start_timestamp=None, end_timestamp=None)

Creates a new version of the specified dataset

Parameters:
  • dataset_id (str) – The unique ID associated with the dataset.

  • object_id (str) – If applicable, the id of the object in the service to query. If not specified, the previously used ID will be used.

  • start_timestamp (int) – The Unix timestamp of the start of the period that will be queried.

  • end_timestamp (int) – The Unix timestamp of the end of the period that will be queried.

Returns:

The new Dataset Version created.

Return type:

DatasetVersion

create_dataset_from_upload(table_name, file_format=None, csv_delimiter=None)

Creates a dataset and returns an upload ID that can be used to upload a file.

Parameters:
  • table_name (str) – Organization-unique table name for this dataset.

  • file_format (str) – The file format of the dataset.

  • csv_delimiter (str) – If the file format is CSV, use a specific csv delimiter.

Returns:

A reference to be used when uploading file parts.

Return type:

Upload
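
A minimal sketch of the full upload flow using this method together with upload_part and mark_upload_complete above. The file name is a placeholder, and this assumes the returned Upload exposes its ID as upload_id and that upload_part accepts a binary file-like object for part_data:

    import io

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    upload = client.create_dataset_from_upload(table_name='my_table', file_format='CSV')

    CHUNK_SIZE = 5 * 1024 * 1024  # each part must be >= 5MB, except the last
    with open('data.csv', 'rb') as f:  # placeholder local file
        part_number = 1
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            client.upload_part(upload.upload_id, part_number, io.BytesIO(chunk))
            part_number += 1

    client.mark_upload_complete(upload.upload_id)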

create_dataset_version_from_upload(dataset_id, file_format=None)

Creates a new version of the specified dataset using a local file upload.

Parameters:
  • dataset_id (str) – The unique ID associated with the dataset.

  • file_format (str) – The file_format to be used. If not specified, the service will try to detect the file format.

Returns:

A token to be used when uploading file parts.

Return type:

Upload

create_streaming_dataset(table_name, project_id=None, dataset_type=None)

Creates a streaming dataset. Use a streaming dataset if your dataset is receiving information from multiple sources over an extended period of time.

Parameters:
  • table_name (str) – The feature group table name to create for this dataset

  • project_id (str) – The project to create the streaming dataset for.

  • dataset_type (str) – The dataset has to be a type that is associated with the use case of your project. Please see (Use Case Documentation)[https://api.abacus.ai/app/help/useCases] for the datasetTypes that are supported per use case.

Returns:

The streaming dataset created.

Return type:

Dataset

snapshot_streaming_data(dataset_id)

Snapshots the current data in the streaming dataset for training.

Parameters:

dataset_id (str) – The unique ID associated with the dataset.

Returns:

The new Dataset Version created.

Return type:

DatasetVersion

set_dataset_column_data_type(dataset_id, column, data_type)

Set a column’s type in a specified dataset.

Parameters:
  • dataset_id (str) – The unique ID associated with the dataset.

  • column (str) – The name of the column.

  • data_type (str) – The type of the data in the column. One of INTEGER, FLOAT, STRING, DATE, DATETIME, BOOLEAN, LIST, STRUCT. Refer to the (guide on data types)[https://api.abacus.ai/app/help/class/DataType] for more information. Note: Some ColumnMappings will restrict the options or explicitly set the DataType.

Returns:

The dataset and schema after the data_type has been set

Return type:

Dataset
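
A minimal sketch (the dataset ID and column are placeholders; DATETIME is one of the listed data types):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    dataset = client.set_dataset_column_data_type(
        dataset_id='DATASET_ID',  # placeholder ID
        column='signup_date',
        data_type='DATETIME',
    )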

create_dataset_from_streaming_connector(table_name, streaming_connector_id, streaming_args=None, refresh_schedule=None)

Creates a dataset from a Streaming Connector

Parameters:
  • table_name (str) – Organization-unique table name

  • streaming_connector_id (str) – The Streaming Connector to import the dataset from

  • streaming_args (dict) – Dict of arguments to read data from the streaming connector

  • refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.

Returns:

The created dataset.

Return type:

Dataset

set_streaming_retention_policy(dataset_id, retention_hours=None, retention_row_count=None)

Sets the streaming retention policy

Parameters:
  • dataset_id (str) – The Streaming dataset

  • retention_hours (int) – The number of hours to retain streamed data in memory

  • retention_row_count (int) – The number of rows to retain streamed data in memory
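
A minimal sketch creating a streaming dataset and bounding how much streamed data is retained. The names, IDs, and the dataset_type value are placeholders (dataset types are use-case specific), and this assumes the returned Dataset exposes its ID as dataset_id:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    dataset = client.create_streaming_dataset(
        table_name='click_stream',    # placeholder name
        project_id='PROJECT_ID',      # placeholder ID
        dataset_type='DATASET_TYPE',  # use-case specific; see the Use Case Documentation
    )

    client.set_streaming_retention_policy(
        dataset_id=dataset.dataset_id,
        retention_hours=48,
        retention_row_count=1000000,
    )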

rename_database_connector(database_connector_id, name)

Renames a Database Connector

Parameters:
  • database_connector_id (str) – The unique identifier for the database connector.

  • name (str) – The new name for the Database Connector

rename_application_connector(application_connector_id, name)

Renames an Application Connector

Parameters:
  • application_connector_id (str) – The unique identifier for the application connector.

  • name (str) – A new name for the application connector

verify_database_connector(database_connector_id)

Checks to see if Abacus.AI can access the database.

Parameters:

database_connector_id (str) – The unique identifier for the database connector.

verify_file_connector(bucket)

Checks to see if Abacus.AI can access the bucket.

Parameters:

bucket (str) – The bucket to test.

Returns:

The Result of the verification.

Return type:

FileConnectorVerification

delete_database_connector(database_connector_id)

Delete a database connector.

Parameters:

database_connector_id (str) – The unique identifier for the database connector.

delete_application_connector(application_connector_id)

Delete an application connector.

Parameters:

application_connector_id (str) – The unique identifier for the application connector.

delete_file_connector(bucket)

Removes a connected service from the specified organization.

Parameters:

bucket (str) – The fully qualified URI of the bucket to remove.

verify_application_connector(application_connector_id)

Checks to see if Abacus.AI can access the Application.

Parameters:

application_connector_id (str) – The unique identifier for the application connector.

set_azure_blob_connection_string(bucket, connection_string)

Authenticates the specified Azure Blob Storage bucket using a Connection String.

Parameters:
  • bucket (str) – The fully qualified Azure Blob Storage Bucket URI

  • connection_string (str) – The Connection String Abacus.AI should use to authenticate when accessing this bucket

Returns:

An object with the roleArn and verification status for the specified bucket.

Return type:

FileConnectorVerification

verify_streaming_connector(streaming_connector_id)

Checks to see if Abacus.AI can access the streaming connector.

Parameters:

streaming_connector_id (str) – The unique identifier for the streaming connector.

rename_streaming_connector(streaming_connector_id, name)

Renames a Streaming Connector

Parameters:
  • streaming_connector_id (str) – The unique identifier for the streaming connector.

  • name (str) – A new name for the streaming connector

delete_streaming_connector(streaming_connector_id)

Delete a streaming connector.

Parameters:

streaming_connector_id (str) – The unique identifier for the streaming connector.

create_streaming_token()

Creates a streaming token. Streaming tokens are used to authenticate requests to append data to streaming datasets.

Returns:

The streaming token.

Return type:

StreamingAuthToken

delete_streaming_token(streaming_token)

Deletes the specified streaming token.

Parameters:

streaming_token (str) – The streaming token to delete.

attach_dataset_to_project(dataset_id, project_id, dataset_type)

[DEPRECATED] Attaches the dataset to the project.

Use this method to attach a dataset that is already in the organization to another project. The dataset type is required to let the AI engine know what type of schema should be used.

Parameters:
  • dataset_id (str) – The dataset to attach.

  • project_id (str) – The project to attach the dataset to.

  • dataset_type (str) – The dataset has to be a type that is associated with the use case of your project. Please see (Use Case Documentation)[https://api.abacus.ai/app/help/useCases] for the datasetTypes that are supported per use case.

Returns:

An array of column descriptions.

Return type:

Schema

remove_dataset_from_project(dataset_id, project_id)

[DEPRECATED] Removes a dataset from a project.

Parameters:
  • dataset_id (str) – The unique ID associated with the dataset.

  • project_id (str) – The unique ID associated with the project.

delete_dataset(dataset_id)

Deletes the specified dataset from the organization.

The dataset cannot be deleted if it is currently attached to a project.

Parameters:

dataset_id (str) – The dataset to delete.

get_training_config_options(project_id, feature_group_ids=None, for_retrain=False, current_training_config=None)

Retrieves the full initial description of the model training configuration options available for the specified project.

The configuration options available are determined by the use case associated with the specified project. Refer to the (Use Case Documentation)[https://api.abacus.ai/app/help/useCases] for more information on use cases and use case specific configuration options.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • feature_group_ids (list) – The feature group IDs to be used for training

  • for_retrain (bool) – If training config options are used for retrain

  • current_training_config (dict) – Defaults to None; represents the current state of the training config, with some options already set, and is used to fetch the updated set of options after a refresh.

Returns:

An array of options that can be specified when training a model in this project.

Return type:

TrainingConfigOptions

create_train_test_data_split_feature_group(project_id, training_config, feature_group_ids)

Get the train and test data split without training the model. Only supported for models with custom algorithms.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • training_config (dict) – The training config key/value pairs used to influence how split is calculated.

  • feature_group_ids (list) – List of feature group ids provided by the user, including the required one for data split and others to influence how to split.

Returns:

The feature group containing the training data and folds information.

Return type:

FeatureGroup

train_model(project_id, name=None, training_config=None, feature_group_ids=None, refresh_schedule=None, custom_algorithms=None, custom_algorithms_only=False, custom_algorithm_configs=None, builtin_algorithms=None, cpu_size=None, memory=None)

Trains a model for the specified project.

Use this method to train a model in this project. This method supports user-specified training configurations defined in the getTrainingConfigOptions method.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • name (str) – The name you want your model to have. Defaults to “<Project Name> Model”.

  • training_config (dict) – The training config key/value pairs used to train this model.

  • feature_group_ids (list) – List of feature group ids provided by the user to train the model on.

  • refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model.

  • custom_algorithms (list) – List of user-defined algorithms to train. If not set, will run default enabled custom algorithms.

  • custom_algorithms_only (bool) – Whether only run custom algorithms.

  • custom_algorithm_configs (dict) – Configs for each user-defined algorithm, key is algorithm name, value is the config serialized to json

  • builtin_algorithms (list) – List of the builtin algorithms provided by Abacus.AI to train. If not set, will try all applicable builtin algorithms.

  • cpu_size (str) – Size of the cpu for the user-defined algorithms during train.

  • memory (int) – Memory (in GB) for the user-defined algorithms during train.

Returns:

The new model which is being trained.

Return type:

Model
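
A minimal sketch (the IDs are placeholders, and the training_config keys shown are placeholders; valid keys come from get_training_config_options, described above):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    model = client.train_model(
        project_id='PROJECT_ID',                          # placeholder ID
        name='Churn Model',
        training_config={'OPTION_NAME': 'OPTION_VALUE'},  # placeholder options
        feature_group_ids=['FEATURE_GROUP_ID'],           # placeholder ID
        refresh_schedule='0 0 1 * *',                     # retrain monthly, UTC
    )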

create_model_from_python(project_id, function_source_code, train_function_name, training_input_tables, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, name=None, cpu_size=None, memory=None, training_config=None, exclusive_run=False, package_requirements=None)

Initializes a new Model from user-provided Python code. If a list of input feature groups is supplied, the materialized feature groups for those inputs will be passed as arguments to the train and predict functions.

This method expects functionSourceCode to be a valid source file containing the functions named trainFunctionName and predictFunctionName. trainFunctionName returns the ModelVersion that results from training the model, while predictFunctionName has no well-defined return type, as it returns the prediction made, which can be anything.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • function_source_code (str) – Contents of a valid python source code file. The source code should contain the functions named trainFunctionName and predictFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function's return value).

  • predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.

  • initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model

  • name (str) – The name you want your model to have. Defaults to “<Project Name> Model”

  • cpu_size (str) – Size of the cpu for the model training function

  • memory (int) – Memory (in GB) for the model training function

  • training_config (dict) – Training configuration

  • exclusive_run (bool) – Decides if this model will be run exclusively OR along with other Abacus.ai algorithms

  • package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency

Returns:

The new model, which has not been trained.

Return type:

Model
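
A minimal sketch with inline source code. The IDs and table names are placeholders, and the function signatures (and use of sklearn) inside the source string are illustrative assumptions, not signatures or libraries confirmed by this documentation; the allowed libraries are listed in the user functions documentation:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    source = '''
    def train(transactions):
        # `transactions` arrives as a materialized DataFrame
        from sklearn.linear_model import LogisticRegression
        model = LogisticRegression()
        model.fit(transactions[['amount']], transactions['is_fraud'])
        return model

    def predict(model, query):
        # The return type is unconstrained; here, a fraud probability
        return model.predict_proba([[query['amount']]])[0][1]
    '''

    model = client.create_model_from_python(
        project_id='PROJECT_ID',  # placeholder ID
        function_source_code=source,
        train_function_name='train',
        predict_function_name='predict',
        training_input_tables=['transactions'],  # placeholder table name
        memory=16,                # GB
    )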

rename_model(model_id, name)

Renames a model

Parameters:
  • model_id (str) – The ID of the model to rename

  • name (str) – The name to apply to the model

update_python_model(model_id, function_source_code=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, training_input_tables=None, cpu_size=None, memory=None, package_requirements=None)

Updates an existing Python model using user-provided Python code. If a list of input feature groups is supplied, the materialized feature groups for those inputs will be passed as arguments to the train and predict functions.

This method expects functionSourceCode to be a valid source file containing the functions named trainFunctionName and predictFunctionName. trainFunctionName returns the ModelVersion that results from training the model, while predictFunctionName has no well-defined return type, as it returns the prediction made, which can be anything.

Parameters:
  • model_id (str) – The unique ID associated with the Python model to be changed.

  • function_source_code (str) – Contents of a valid python source code file. The source code should contain the functions named trainFunctionName and predictFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the source code that will be executed to run batch predictions through the model. It is not executed when this function is run.

  • initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function's return value).

  • cpu_size (str) – Size of the cpu for the model training function

  • memory (int) – Memory (in GB) for the model training function

  • package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency

Returns:

The updated model

Return type:

Model

update_python_model_zip(model_id, train_function_name=None, predict_function_name=None, predict_many_function_name=None, train_module_name=None, predict_module_name=None, training_input_tables=None, cpu_size=None, memory=None, package_requirements=None)

Updates an existing Python model using a provided zip file. If a list of input feature groups is supplied, the materialized feature groups for those inputs will be passed as arguments to the train and predict functions.

This method expects trainModuleName and predictModuleName to be valid source files containing the functions named trainFunctionName and predictFunctionName, respectively. trainFunctionName returns the ModelVersion that results from training the model, while predictFunctionName has no well-defined return type, as it returns the prediction made, which can be anything.

Parameters:
  • model_id (str) – The unique ID associated with the Python model to be changed.

  • train_function_name (str) – Name of the function found in train module that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.

  • train_module_name (str) – Full path of the module that contains the train function from the root of the zip.

  • predict_module_name (str) – Full path of the module that contains the predict function from the root of the zip.

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function's return value).

  • cpu_size (str) – Size of the cpu for the model training function

  • memory (int) – Memory (in GB) for the model training function

  • package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency

Returns:

An Upload object to be used for uploading the new zip file

Return type:

Upload

update_python_model_git(model_id, application_connector_id=None, branch_name=None, python_root=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, train_module_name=None, predict_module_name=None, training_input_tables=None, cpu_size=None, memory=None)

Updates an existing Python model using an existing git application connector. If a list of input feature groups is supplied, the materialized feature groups for those inputs will be passed as arguments to the train and predict functions.

This method expects trainModuleName and predictModuleName to be valid source files containing the functions named trainFunctionName and predictFunctionName, respectively. trainFunctionName returns the ModelVersion that results from training the model, while predictFunctionName has no well-defined return type, as it returns the prediction made, which can be anything.

Parameters:
  • model_id (str) – The unique ID associated with the Python model to be changed.

  • application_connector_id (str) – The unique ID associated with the git application connector.

  • branch_name (str) – Name of the branch in the git repository to be used for training.

  • python_root (str) – Path from the top level of the git repository to the directory containing the Python source code. If not provided, the default is the root of the git repository.

  • train_function_name (str) – Name of the function found in train module that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.

  • train_module_name (str) – Full path of the module that contains the train function from the root of the zip.

  • predict_module_name (str) – Full path of the module that contains the predict function from the root of the zip.

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function's return value).

  • cpu_size (str) – Size of the cpu for the model training function

  • memory (int) – Memory (in GB) for the model training function

Returns:

The updated model

Return type:

Model

set_model_training_config(model_id, training_config, feature_group_ids=None)

Edits the default model training config

Parameters:
  • model_id (str) – The unique ID of the model to update

  • training_config (dict) – The training config key/value pairs used to train this model.

  • feature_group_ids (list) – The feature group IDs provided by the user to train the model on.

Returns:

The model object after the training config is applied

Return type:

Model

set_model_prediction_params(model_id, prediction_config)

Sets the model prediction config for the model

Parameters:
  • model_id (str) – The unique ID of the model to update

  • prediction_config (dict) – The prediction config for the model

Returns:

The model object after the prediction config is applied

Return type:

Model

retrain_model(model_id, deployment_ids=[], feature_group_ids=None, custom_algorithms=None, builtin_algorithms=None, custom_algorithm_configs=None, cpu_size=None, memory=None, training_config=None)

Retrains the specified model. Optionally, you can choose the deployments to which the retrained model should automatically be deployed.

Parameters:
  • model_id (str) – The model to retrain.

  • deployment_ids (list) – List of deployments to automatically deploy to.

  • feature_group_ids (list) – List of feature group ids provided by the user to train the model on.

  • custom_algorithms (list) – List of user-defined algorithms to train. If not set, will honor the runs from last time and applicable new custom algorithms.

  • builtin_algorithms (list) – List of the builtin algorithms provided by Abacus.AI to train. If not set, honor the runs from last time and applicable new builtin algorithms.

  • custom_algorithm_configs (dict) – The user-defined training configs for each custom algorithm.

  • cpu_size (str) – Size of the cpu for the user-defined algorithms during train.

  • memory (int) – Memory (in GB) for the user-defined algorithms during train.

  • training_config (dict) – The training config key/value pairs used to train this model.

Returns:

The model that is being retrained.

Return type:

Model

delete_model(model_id)

Deletes the specified model and all its versions. Models which are currently used in deployments cannot be deleted.

Parameters:

model_id (str) – The ID of the model to delete.

delete_model_version(model_version)

Deletes the specified model version. Model Versions which are currently used in deployments cannot be deleted.

Parameters:

model_version (str) – The ID of the model version to delete.

export_model_artifact_as_feature_group(model_version, table_name, artifact_type)

Exports metric artifact data for a model as a feature group.

Parameters:
  • model_version (str) – The version of the model.

  • table_name (str) – The name of the feature group table to create.

  • artifact_type (str) – An EvalArtifact enum of which artifact to export.

Returns:

The created feature group.

Return type:

FeatureGroup

get_custom_train_function_info(project_id, feature_group_names_for_training=None, training_data_parameter_name_override=None, training_config=None, custom_algorithm_config=None)

Returns the information about how to call the custom train function.

Parameters:
  • project_id (str) – The unique version ID of the project

  • feature_group_names_for_training (list) – A list of feature group table names that will be used for training

  • training_data_parameter_name_override (dict) – Override from feature group type to parameter name in train function.

  • training_config (dict) – Training config names to values for the options supported by Abacus.ai platform.

  • custom_algorithm_config (dict) – User-defined config that can be serialized by JSON.

Returns:

Information about how to call the customer provided train function.

Return type:

CustomTrainFunctionInfo

create_model_monitor(project_id, prediction_feature_group_id, training_feature_group_id=None, name=None, refresh_schedule=None, target_value=None, target_value_bias=None, target_value_performance=None, feature_mappings=None, model_id=None, training_feature_mappings=None, feature_group_base_monitor_config=None, feature_group_comparison_monitor_config=None)

Runs a model monitor for the specified project.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • prediction_feature_group_id (str) – The unique ID of the prediction data feature group

  • training_feature_group_id (str) – The unique ID of the training data feature group

  • name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.

  • refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model monitor

  • target_value (str) – A target positive value for the label, used to compute bias and the PR curve/AUC for the performance page

  • target_value_bias (str) – A target positive value for the label to compute bias

  • target_value_performance (str) – A target positive value for the label to compute the PR curve/AUC for the performance page

  • feature_mappings (dict) – A json map to override features for prediction_feature_group, where keys are column names and the values are feature data use types.

  • model_id (str) – The Unique ID of the Model

  • training_feature_mappings (dict) – A json map to override features for training_feature_group, where keys are column names and the values are feature data use types.

  • feature_group_base_monitor_config (dict) –

  • feature_group_comparison_monitor_config (dict) –

Returns:

The new model monitor that was created.

Return type:

ModelMonitor

rerun_model_monitor(model_monitor_id)

Reruns the specified model monitor.

Parameters:

model_monitor_id (str) – The model monitor to rerun.

Returns:

The model monitor that is being rerun.

Return type:

ModelMonitor

rename_model_monitor(model_monitor_id, name)

Renames a model monitor

Parameters:
  • model_monitor_id (str) – The ID of the model monitor to rename

  • name (str) – The name to apply to the model monitor

delete_model_monitor(model_monitor_id)

Deletes the specified model monitor and all its versions.

Parameters:

model_monitor_id (str) – The ID of the model monitor to delete.

delete_model_monitor_version(model_monitor_version)

Deletes the specified model monitor version.

Parameters:

model_monitor_version (str) – The ID of the model monitor version to delete.

create_monitor_alert(project_id, model_monitor_id, alert_name, condition_config, action_config)

Create a monitor alert for the given conditions and monitor

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • model_monitor_id (str) – The unique identifier to a model monitor created under the project.

  • alert_name (str) – The alert name.

  • condition_config (dict) – The condition to run the actions for the alert.

  • action_config (dict) – The configuration for the action of the alert

Returns:

An object describing the monitor alert

Return type:

MonitorAlert

update_monitor_alert(monitor_alert_id, alert_name=None, condition_config=None, action_config=None)

Updates the monitor alert's name, condition configuration, or action configuration.

Parameters:
  • monitor_alert_id (str) – The unique identifier of the monitor alert to update.

  • alert_name (str) – The new alert name.

  • condition_config (dict) – The condition to run the actions for the alert.

  • action_config (dict) – The configuration for the action of the alert.

Returns:

The updated monitor alert.

Return type:

MonitorAlert

run_monitor_alert(monitor_alert_id)

Reruns a given monitor alert from latest monitor instance

Parameters:

monitor_alert_id (str) – The unique identifier to a monitor alert

Returns:

An object describing the monitor alert

Return type:

MonitorAlert

delete_monitor_alert(monitor_alert_id)

Deletes a monitor alert.

Parameters:

monitor_alert_id (str) – The unique identifier of the monitor alert to delete.

create_deployment(name=None, model_id=None, model_version=None, algorithm=None, feature_group_id=None, project_id=None, description=None, calls_per_second=None, auto_deploy=True, start=True, enable_batch_streaming_updates=False, model_deployment_config=None)

Creates a deployment with the specified name and description for the specified model or feature group.

A Deployment makes the trained model or feature group available for prediction requests.

Parameters:
  • name (str) – The name of the deployment.

  • model_id (str) – The unique ID associated with the model.

  • model_version (str) – The unique ID associated with the model version to deploy.

  • algorithm (str) – The unique ID associated with the algorithm to deploy.

  • feature_group_id (str) – The unique ID associated with a feature group.

  • project_id (str) – The unique ID associated with a project.

  • description (str) – The description for the deployment.

  • calls_per_second (int) – The number of calls per second the deployment could handle.

  • auto_deploy (bool) – Flag to enable the automatic deployment when a new Model Version finishes training.

  • start (bool) – If True, the deployment will be started once it is created.

  • enable_batch_streaming_updates (bool) – Flag to enable marking the feature group deployment to have a background process cache streamed in rows for quicker lookup

  • model_deployment_config (dict) – The deployment config for model to deploy

Returns:

The new model or feature group deployment.

Return type:

Deployment

create_deployment_token(project_id, name=None)

Creates a deployment token for the specified project.

Deployment tokens are used to authenticate requests to the prediction APIs and are scoped on the project level.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • name (str) – The name of the deployment token

Returns:

The deployment token.

Return type:

DeploymentAuthToken
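
A minimal sketch creating a deployment for a trained model, then a token to call it with (the IDs are placeholders):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    deployment = client.create_deployment(
        name='Churn Model Deployment',
        model_id='MODEL_ID',      # placeholder ID
        project_id='PROJECT_ID',  # placeholder ID
        auto_deploy=True,
    )

    token = client.create_deployment_token(project_id='PROJECT_ID', name='web-app-token')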

update_deployment(deployment_id, description=None)

Updates a deployment’s description.

Parameters:
  • deployment_id (str) – The deployment to update.

  • description (str) – The new deployment description.

rename_deployment(deployment_id, name)

Updates a deployment’s name.

Parameters:
  • deployment_id (str) – The deployment to update.

  • name (str) – The new deployment name.

set_auto_deployment(deployment_id, enable=None)

Enable/Disable auto deployment for the specified deployment.

When a model is scheduled to retrain, deployments with this enabled will be marked to automatically promote the new model version. After the newly trained model completes, a check on its metrics in comparison to the currently deployed model version will be performed. If the metrics are comparable or better, the newly trained model version is automatically promoted. If not, it will be marked as a failed model version promotion with an error indicating poor metrics performance.

Parameters:
  • deployment_id (str) – The unique ID associated with the deployment

  • enable (bool) – Enable/disable the autoDeploy property of the Deployment.

set_deployment_model_version(deployment_id, model_version, algorithm=None)

Promotes a Model Version to be served in the Deployment

Parameters:
  • deployment_id (str) – The unique ID for the Deployment

  • model_version (str) – The unique ID for the Model Version

  • algorithm (str) – The unique ID associated with the algorithm to serve for this model version.

set_deployment_feature_group_version(deployment_id, feature_group_version)

Promotes a Feature Group Version to be served in the Deployment

Parameters:
  • deployment_id (str) – The unique ID for the Deployment

  • feature_group_version (str) – The unique ID for the Feature Group Version

start_deployment(deployment_id)

Restarts the specified deployment that was previously suspended.

Parameters:

deployment_id (str) – The unique ID associated with the deployment.

stop_deployment(deployment_id)

Stops the specified deployment.

Parameters:

deployment_id (str) – The Deployment ID

delete_deployment(deployment_id)

Deletes the specified deployment. The deployment’s models will not be affected. Note that the deployments are not recoverable after they are deleted.

Parameters:

deployment_id (str) – The ID of the deployment to delete.

delete_deployment_token(deployment_token)

Deletes the specified deployment token.

Parameters:

deployment_token (str) – The deployment token to delete.

set_deployment_feature_group_export_file_connector_output(deployment_id, file_format=None, output_location=None)

Sets the export output for the Feature Group Deployment to be a file connector.

Parameters:
  • deployment_id (str) – The deployment for which the export type is set

  • file_format (str) – The file format to export to.

  • output_location (str) – the file connector (cloud) location of where to export

set_deployment_feature_group_export_database_connector_output(deployment_id, database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None, additional_id_columns=None)

Sets the export output for the Feature Group Deployment to be a Database connector.

Parameters:
  • deployment_id (str) – The deployment for which the export type is set

  • database_connector_id (str) – The database connector ID used

  • object_name (str) – The database connector’s object to write to

  • write_mode (str) – UPSERT or INSERT for writing to the database connector

  • database_feature_mapping (dict) – The column/feature pairs mapping the features to the database columns

  • id_column (str) – The id column to use as the upsert key

  • additional_id_columns (list) – For database connectors which support it, additional ID columns to use as a complex key for upserting

remove_deployment_feature_group_export_output(deployment_id)

Removes the export type that is set for the Feature Group Deployment

Parameters:

deployment_id (str) – The deployment for which the export type is set

create_refresh_policy(name, cron, refresh_type, project_id=None, dataset_ids=[], model_ids=[], deployment_ids=[], batch_prediction_ids=[], prediction_metric_ids=[])

Creates a refresh policy with a particular cron pattern and refresh type.

A refresh policy allows for the scheduling of a particular set of actions at regular intervals. This can be useful for periodically updated data which needs to be re-imported into the project for re-training.

Parameters:
  • name (str) – The name for the refresh policy

  • cron (str) – A cron-like string specifying the frequency of a refresh policy

  • refresh_type (str) – The Refresh Type is used to determine what is being refreshed, whether it's a single dataset, a dataset and a model, or more.

  • project_id (str) – Optionally, a Project ID can be specified so that all datasets, models and deployments are captured at the instant this policy was created

  • dataset_ids (list) – Comma separated list of Dataset IDs

  • model_ids (list) – Comma separated list of Model IDs

  • deployment_ids (list) – Comma separated list of Deployment IDs

  • batch_prediction_ids (list) – Comma separated list of Batch Predictions

  • prediction_metric_ids (list) – Comma separated list of Prediction Metrics

Returns:

The refresh policy created

Return type:

RefreshPolicy
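
A minimal sketch (the dataset ID is a placeholder, and the refresh_type value shown is an assumed enum value; confirm the accepted values against the API):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    policy = client.create_refresh_policy(
        name='Nightly dataset refresh',
        cron='0 4 * * *',            # 04:00 UTC daily
        refresh_type='DATASET',      # assumed value
        dataset_ids=['DATASET_ID'],  # placeholder ID
    )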

delete_refresh_policy(refresh_policy_id)

Delete a refresh policy

Parameters:

refresh_policy_id (str) – The unique ID associated with this refresh policy

pause_refresh_policy(refresh_policy_id)

Pauses a refresh policy

Parameters:

refresh_policy_id (str) – The unique ID associated with this refresh policy

resume_refresh_policy(refresh_policy_id)

Resumes a refresh policy

Parameters:

refresh_policy_id (str) – The unique ID associated with this refresh policy

run_refresh_policy(refresh_policy_id)

Force a run of the refresh policy.

Parameters:

refresh_policy_id (str) – The unique ID associated with this refresh policy

update_refresh_policy(refresh_policy_id, name=None, cron=None)

Update the name or cron string of a refresh policy

Parameters:
  • refresh_policy_id (str) – The unique ID associated with this refresh policy

  • name (str) – Optional, specify to update the name of the refresh policy

  • cron (str) – Optional, specify to update the cron string describing the schedule from the refresh policy

Returns:

The updated refresh policy

Return type:

RefreshPolicy

lookup_features(deployment_token, deployment_id, query_data={}, limit_results=None, result_columns=None)

Returns the feature group deployed in the feature store project.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – A dictionary where each key is a column name (e.g. a column named ‘user_id’ in your dataset, mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed) and each value is the unique value of that entity.

  • limit_results (int) – If present, will limit the number of results to the value provided.

  • result_columns (list) – If present, will limit the columns present in each result to the columns specified in this list

Return type:

Dict

predict(deployment_token, deployment_id, query_data={})

Returns a prediction for Predictive Modeling

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – A dictionary where each key is a column name (e.g. a column named ‘user_id’ in your dataset, mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed) and each value is the unique value of that entity.

Return type:

Dict
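
A minimal sketch (the token, deployment ID, and query column/value are placeholders; the deployment token comes from create_deployment_token):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # placeholder credentials

    prediction = client.predict(
        deployment_token='DEPLOYMENT_TOKEN',  # placeholder token
        deployment_id='DEPLOYMENT_ID',        # placeholder ID
        query_data={'user_id': '12345'},      # column name -> entity value
    )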

predict_multiple(deployment_token, deployment_id, query_data={})

Returns a list of predictions for Predictive Modeling

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (list) – A list of dictionaries, where each key is a column name (e.g. a column named ‘user_id’ in your dataset, mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed) and each value is the unique value of that entity.

Return type:

Dict

predict_from_datasets(deployment_token, deployment_id, query_data={})

Returns a list of predictions for Predictive Modeling

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the source dataset name and ‘Value’ will be a list of records corresponding to the dataset rows

Return type:

Dict

predict_lead(deployment_token, deployment_id, query_data, explain_predictions=False, explainer_type=None)

Returns the probability that a user will become a lead, based on their interactions with the service/product and their own attributes (e.g. income, assets, credit score, etc.). Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘user_id’ mapped to mapping ‘LEAD_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary containing user attributes and/or user’s interaction data with the product/service (e.g. number of clicks, items in cart, etc.).

  • explain_predictions (bool) – Will explain predictions for lead

  • explainer_type (str) – Type of explainer to use for explanations

Return type:

Dict

predict_churn(deployment_token, deployment_id, query_data)

Returns the probability that a user will churn, based on their interactions with the item/product/service. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘churn_result’ mapped to mapping ‘CHURNED_YN’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

Return type:

Dict

predict_takeover(deployment_token, deployment_id, query_data)

Returns a probability for each class label associated with the types of fraud, or a ‘yes’/‘no’ label for the possibility of fraud. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary containing account activity characteristics (e.g. login id, login duration, login type, ip address, etc.).

Return type:

Dict

predict_fraud(deployment_token, deployment_id, query_data)

Returns the probability that a transaction performed under a specific account is fraudulent. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_number’ mapped to the mapping ‘ACCOUNT_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary containing transaction attributes (e.g. credit card type, transaction location, transaction amount, etc.).

Return type:

Dict

predict_class(deployment_token, deployment_id, query_data={}, threshold=None, threshold_class=None, thresholds=None, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)

Returns a classification prediction

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

  • threshold (float) – A float value that is applied on the popular class label.

  • threshold_class (str) – The label upon which the threshold is applied (binary labels only)

  • thresholds (list) – Maps labels to thresholds (multi-label classification only). Defaults to the F1-optimal threshold if computed for the given class, else uses 0.5

  • explain_predictions (bool) – If true, returns the SHAP explanations for all input features.

  • fixed_features (list) – Set of input features to treat as constant for explanations.

  • nested (str) – If specified, generates a prediction delta for each index of the specified nested feature.

  • explainer_type (str) – Type of explainer to use for explanations

Return type:

Dict
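
Example (a minimal sketch; the ‘transaction_id’ column and the ‘fraud’ label are hypothetical):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # Classify one entity, applying a 0.7 decision threshold to the 'fraud' label
    # and requesting SHAP explanations for the input features.
    result = client.predict_class(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'transaction_id': 't_42'},
        threshold=0.7,
        threshold_class='fraud',
        explain_predictions=True,
    )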

predict_target(deployment_token, deployment_id, query_data={}, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)

Returns a prediction from a classification or regression model. Optionally, includes explanations.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

  • explain_predictions (bool) – If true, returns the SHAP explanations for all input features.

  • fixed_features (list) – Set of input features to treat as constant for explanations.

  • nested (str) – If specified, generates a prediction delta for each index of the specified nested feature.

  • explainer_type (str) – Type of explainer to use for explanations

Return type:

Dict

get_anomalies(deployment_token, deployment_id, threshold=None, histogram=False)

Returns a list of anomalies from the training dataset

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • threshold (float) – The threshold score of what is an anomaly. Valid values are between 0.8 and 0.99.

  • histogram (bool) – If True, will return a histogram of the distribution of all points

Return type:

io.BytesIO

is_anomaly(deployment_token, deployment_id, query_data=None)

Returns a list of anomaly attributes based on login information for a specified account. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – The input data for the prediction.

Return type:

Dict

get_forecast(deployment_token, deployment_id, query_data, future_data=None, num_predictions=None, prediction_start=None, explain_predictions=False, explainer_type=None)

Returns a list of forecasts for a given entity under the specified project deployment. Note that the inputs to the deployed model will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘holiday_yn’ mapped to mapping ‘FUTURE’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘store_id’ in your dataset) mapped to the column mapping ITEM_ID that uniquely identifies the entity against which forecasting is performed and ‘Value’ will be the unique value of the same entity.

  • future_data (dict) – This will be a dictionary of values known ahead of time that are relevant for forecasting (e.g. State Holidays, National Holidays, etc.). The key and the value both will be of type ‘String’. For example future data entered for a Store may be {“Holiday”:”No”, “Promo”:”Yes”}.

  • num_predictions (int) – The number of timestamps to predict in the future.

  • prediction_start (str) – The start date for predictions (e.g., “2015-08-01T00:00:00” as input for midnight of 2015-08-01).

  • explain_predictions (bool) – Will explain predictions for forecasting

  • explainer_type (str) – Type of explainer to use for explanations

Return type:

Dict
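
Example (a minimal sketch; the ‘store_id’ column and future-data keys are hypothetical):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # Forecast the next 14 timestamps for one store, supplying known future values.
    forecast = client.get_forecast(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'store_id': 'store_7'},
        future_data={'Holiday': 'No', 'Promo': 'Yes'},
        num_predictions=14,
        prediction_start='2015-08-01T00:00:00',
    )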

get_k_nearest(deployment_token, deployment_id, vector, k=None, distance=None, include_score=False)

Returns the k nearest neighbors for the provided embedding vector.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • vector (list) – Input vector to perform the k nearest neighbors with.

  • k (int) – Overrideable number of items to return

  • distance (str) – Specify the distance function to use when finding nearest neighbors

  • include_score (bool) – If True, will return the score alongside the resulting embedding value

Return type:

Dict
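
Example (a minimal sketch; the vector values are placeholders):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # Retrieve the 10 nearest neighbors of an embedding vector, with scores.
    neighbors = client.get_k_nearest(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        vector=[0.12, -0.53, 0.77, 0.05],
        k=10,
        distance='euclidean',
        include_score=True,
    )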

get_multiple_k_nearest(deployment_token, deployment_id, queries)

Returns the k nearest neighbors for the queries provided

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • queries (list) – List of Mappings of format {“catalogId”: “cat0”, “vectors”: […], “k”: 20, “distance”: “euclidean”}. See getKNearest for additional information about the supported parameters

get_labels(deployment_token, deployment_id, query_data, threshold=None)

Returns a list of scored labels extracted from the input text.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.

  • threshold (None) – Deprecated

Return type:

Dict

get_recommendations(deployment_token, deployment_id, query_data, num_items=50, page=1, exclude_item_ids=[], score_field='', scaling_factors=[], restrict_items=[], exclude_items=[], explore_fraction=0.0)

Returns a list of recommendations for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘time’ mapped to mapping ‘TIMESTAMP’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which recommendations are made and ‘Value’ will be the unique value of the same item. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.

  • num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.

  • page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.

  • exclude_item_ids (list) – [DEPRECATED]

  • score_field (str) – If specified, the relative item scores are returned in a separate field whose name is the value passed for this argument.

  • scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries, each of the format {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key “column” takes the name of the column (“col0”); the key “values” takes the list of items ([“value0”, “value1”]) in reference to which the model recommendations need to be biased; and the key “factor” takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}], then after we apply the model to get item probabilities, for every SUV and Sedan in the list we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or there’s an item that always comes up and you want to demote it.

  • restrict_items (list) – It allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries, each of the format {“column”: “col0”, “values”: [“value0”, “value1”, “value3”, …]}. The key “column” takes the name of the column (“col0”), and the key “values” takes the list of items ([“value0”, “value1”, “value3”, …]) to which the recommendations are restricted. For example, if the input to restrict_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}], the recommendations will be restricted to SUVs and Sedans. This type of restriction is particularly useful if there’s a list of items that you know is of use in some particular scenario and you want to restrict the recommendations only to that list.

  • exclude_items (list) – It allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries, each of the format {“column”: “col0”, “values”: [“value0”, “value1”, …]}. The key “column” takes the name of the column (“col0”), and the key “values” takes the list of items ([“value0”, “value1”]) to exclude from the recommendations. For example, if the input to exclude_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}], the resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful if there’s a list of items that you know is of no use in some particular scenario and you don’t want to show those items.

  • explore_fraction (float) – The fraction of recommendations that should be new items.

Return type:

Dict
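
Example (a minimal sketch; the column names and values are hypothetical):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # Fetch the first page of 10 recommendations for a user, boosting SUVs and
    # Sedans by a factor of 1.4 and excluding trucks from the results.
    recs = client.get_recommendations(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'user_name': 'John Doe'},
        num_items=10,
        page=1,
        scaling_factors=[{'column': 'VehicleType', 'values': ['SUV', 'Sedan'], 'factor': 1.4}],
        exclude_items=[{'column': 'VehicleType', 'values': ['Truck']}],
    )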

get_personalized_ranking(deployment_token, deployment_id, query_data, preserve_ranks=[], preserve_unknown_items=False, scaling_factors=[])

Returns a list of items with personalized promotions on them for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identifies those items.

  • preserve_ranks (list) – List of dictionaries of format {“column”: “col0”, “values”: [“value0”, “value1”]}, where the ranks of items in query_data are preserved for all the items in “col0” with values “value0” and “value1”. This option is useful when the desired items are being recommended in the desired order and the ranks for those items need to be kept unchanged during recommendation generation.

  • preserve_unknown_items (bool) – If true, any items that are unknown to the model will not be reranked, and their original position in the query will be preserved.

  • scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries, each of the format {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key “column” takes the name of the column (“col0”); the key “values” takes the list of items ([“value0”, “value1”]) in reference to which the model recommendations need to be biased; and the key “factor” takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}], then after we apply the model to get item probabilities, for every SUV and Sedan in the list we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or there’s an item that always comes up and you want to demote it.

Return type:

Dict

get_ranked_items(deployment_token, deployment_id, query_data, preserve_ranks=[], preserve_unknown_items=False, scaling_factors=[])

Returns a list of re-ranked items for a selected user when a list of items is required to be reranked according to the user’s preferences. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identifies those items.

  • preserve_ranks (list) – List of dictionaries of format {“column”: “col0”, “values”: [“value0”, “value1”]}, where the ranks of items in query_data are preserved for all the items in “col0” with values “value0” and “value1”. This option is useful when the desired items are being recommended in the desired order and the ranks for those items need to be kept unchanged during recommendation generation.

  • preserve_unknown_items (bool) – If true, any items that are unknown to the model will not be reranked, and their original position in the query will be preserved.

  • scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries, each of the format {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key “column” takes the name of the column (“col0”); the key “values” takes the list of items ([“value0”, “value1”]) in reference to which the model recommendations need to be biased; and the key “factor” takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}], then after we apply the model to get item probabilities, for every SUV and Sedan in the list we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or there’s an item that always comes up and you want to demote it.

Return type:

Dict

get_related_items(deployment_token, deployment_id, query_data, num_items=50, page=1, scaling_factors=[], restrict_items=[], exclude_items=[])

Returns a list of related items for a given item under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which related items are determined and ‘Value’ will be the unique value of the same item. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.

  • num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.

  • page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.

  • scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries, each of the format {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key “column” takes the name of the column (“col0”); the key “values” takes the list of items ([“value0”, “value1”]) in reference to which the model recommendations need to be biased; and the key “factor” takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}], then after we apply the model to get item probabilities, for every SUV and Sedan in the list we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or there’s an item that always comes up and you want to demote it.

  • restrict_items (list) – It allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries, each of the format {“column”: “col0”, “values”: [“value0”, “value1”, “value3”, …]}. The key “column” takes the name of the column (“col0”), and the key “values” takes the list of items ([“value0”, “value1”, “value3”, …]) to which the recommendations are restricted. For example, if the input to restrict_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}], the recommendations will be restricted to SUVs and Sedans. This type of restriction is particularly useful if there’s a list of items that you know is of use in some particular scenario and you want to restrict the recommendations only to that list.

  • exclude_items (list) – It allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries, each of the format {“column”: “col0”, “values”: [“value0”, “value1”, …]}. The key “column” takes the name of the column (“col0”), and the key “values” takes the list of items ([“value0”, “value1”]) to exclude from the recommendations. For example, if the input to exclude_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}], the resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful if there’s a list of items that you know is of no use in some particular scenario and you don’t want to show those items.

Return type:

Dict

get_feature_group_rows(deployment_token, deployment_id, query_data)

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) –

get_search_results(deployment_token, deployment_id, query_data)

Returns search results for the given query data.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.

Return type:

Dict

get_sentiment(deployment_token, deployment_id, document)

Returns a sentiment prediction for the given document.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • document (str) – The document to predict sentiment on.

Return type:

Dict

get_entailment(deployment_token, deployment_id, document)

Returns an entailment prediction for the given document.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • document (str) – The document to predict entailment for.

Return type:

Dict

get_classification(deployment_token, deployment_id, document)

Returns a classification prediction for the given document.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • document (str) – The document to classify.

Return type:

Dict

get_summary(deployment_token, deployment_id, query_data)

Returns a JSON of the predicted summary for the given document. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘text’ mapped to mapping ‘DOCUMENT’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Raw data dictionary containing the required document data - must have a key ‘document’ corresponding to a DOCUMENT type text as value.

Return type:

Dict

predict_language(deployment_token, deployment_id, query_data)

Predicts the language of the given text.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (str) – The input text whose language is to be predicted.

Return type:

Dict

get_assignments(deployment_token, deployment_id, query_data, forced_assignments=None)

Get all positive assignments that match a query.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Specifies the set of assignments being requested.

  • forced_assignments (dict) – Set of assignments to force and resolve before returning query results.

Return type:

Dict

check_constraints(deployment_token, deployment_id, query_data)

Check for any constraints violated by the overrides.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Assignment overrides to the solution.

Return type:

Dict

predict_with_binary_data(deployment_token, deployment_id, blob, blob_key_name='blob')

Makes predictions for a given blob of data, e.g. an image or audio file.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • blob (io.TextIOBase) – The multipart/form-data of the data

  • blob_key_name (str) – the key to access this blob data in the model query data

Return type:

Dict
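
Example (a minimal sketch; the file name and blob key are hypothetical):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # Send an image as binary data; the model sees it under the key 'image'.
    with open('example.png', 'rb') as blob:
        result = client.predict_with_binary_data(
            deployment_token='YOUR_DEPLOYMENT_TOKEN',
            deployment_id='YOUR_DEPLOYMENT_ID',
            blob=blob,
            blob_key_name='image',
        )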

create_prediction_metric(feature_group_id, prediction_metric_config, project_id=None)

Create a prediction metric job description for the given prediction and actual-labels data.

Parameters:
  • feature_group_id (str) – The feature group to use as input to the prediction metrics.

  • prediction_metric_config (dict) – Specification for prediction metric to run in this job.

  • project_id (str) – Project to use for the prediction metrics. Defaults to the project for the input feature_group, if the feature_group has exactly one project.

Returns:

The Prediction Metric job description.

Return type:

PredictionMetric

describe_prediction_metric(prediction_metric_id, should_include_latest_version_description=True)

Describe a Prediction Metric.

Parameters:
  • prediction_metric_id (str) – The unique ID associated with the prediction metric.

  • should_include_latest_version_description (bool) – If true, includes the description of the latest prediction metric version

Returns:

The prediction metric object.

Return type:

PredictionMetric

delete_prediction_metric(prediction_metric_id)

Removes an existing PredictionMetric.

Parameters:

prediction_metric_id (str) – The unique ID associated with the prediction metric.

run_prediction_metric(prediction_metric_id)

Creates a new prediction metrics job run for the given prediction metric job description, and starts that job.

Configures and starts the computations running to compute the prediction metric.

Parameters:

prediction_metric_id (str) – The prediction metric job description to apply for configuring a prediction metric job.

Returns:

The prediction metric version started by this method call.

Return type:

PredictionMetricVersion

delete_prediction_metric_version(prediction_metric_version)

Removes an existing prediction metric version.

Parameters:

prediction_metric_version (str) – The unique identifier of the prediction metric version to delete.

list_prediction_metric_versions(prediction_metric_id, limit=100, start_after_id=None)

List the prediction metric versions for a prediction metric.

Parameters:
  • prediction_metric_id (str) – The prediction metric whose instances will be listed.

  • limit (int) – The number of prediction metric instances to be retrieved.

  • start_after_id (str) – An offset parameter to exclude all prediction metric versions up to the specified prediction metric version ID.

Returns:

The prediction metric instances for this prediction metric.

Return type:

PredictionMetricVersion

create_batch_prediction(deployment_id, table_name=None, name=None, global_prediction_args=None, explanations=False, output_format=None, output_location=None, database_connector_id=None, database_output_config=None, refresh_schedule=None, csv_input_prefix=None, csv_prediction_prefix=None, csv_explanations_prefix=None, output_includes_metadata=None, result_input_columns=None)

Creates a batch prediction job description for the given deployment.

Parameters:
  • deployment_id (str) – The unique identifier to a deployment.

  • table_name (str) – If specified, the name of the feature group table to write the results of the batch prediction to. Can only be specified if outputLocation and databaseConnectorId are not specified. If tableName is specified, the outputType will be enforced as CSV

  • name (str) – The name of the batch prediction job.

  • global_prediction_args (dict) – Argument(s) to pass on every prediction call.

  • explanations (bool) – If true, will provide SHAP Explanations for each prediction, if supported by the use case.

  • output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON)

  • output_location (str) – If specified, the location to write the prediction results. Otherwise, results will be stored in Abacus.AI.

  • database_connector_id (str) – The unique identifier of a Database Connector to write predictions to. Cannot be specified in conjunction with outputLocation.

  • database_output_config (dict) – A key-value pair of columns/values to write to the database connector. Only available if databaseConnectorId is specified.

  • refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically run the batch prediction.

  • csv_input_prefix (str) – A prefix to prepend to the input columns, only applies when output format is CSV

  • csv_prediction_prefix (str) – A prefix to prepend to the prediction columns, only applies when output format is CSV

  • csv_explanations_prefix (str) – A prefix to prepend to the explanation columns, only applies when output format is CSV

  • output_includes_metadata (bool) – If true, output will contain columns including prediction start time, batch prediction version, and model version

  • result_input_columns (list) – If present, will limit result files or feature groups to only include columns present in this list

Returns:

The batch prediction description.

Return type:

BatchPrediction

start_batch_prediction(batch_prediction_id)

Creates a new batch prediction version job for a given batch prediction job description

Parameters:

batch_prediction_id (str) – The unique identifier of the batch prediction to create a new version of

Returns:

The batch prediction version started by this method call.

Return type:

BatchPredictionVersion
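
Example (a minimal sketch; the names and schedule are hypothetical, and it assumes the returned BatchPrediction object exposes a batch_prediction_id attribute):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # Describe a nightly batch prediction job that writes to a feature group
    # table, then kick off the first run.
    batch = client.create_batch_prediction(
        deployment_id='YOUR_DEPLOYMENT_ID',
        table_name='daily_predictions',
        name='Nightly scoring run',
        refresh_schedule='0 4 * * *',  # every day at 04:00 UTC
    )
    version = client.start_batch_prediction(batch_prediction_id=batch.batch_prediction_id)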

update_batch_prediction(batch_prediction_id, deployment_id=None, global_prediction_args=None, explanations=None, output_format=None, csv_input_prefix=None, csv_prediction_prefix=None, csv_explanations_prefix=None, output_includes_metadata=None, result_input_columns=None)

Updates a batch prediction job description

Parameters:
  • batch_prediction_id (str) – The unique identifier of the batch prediction.

  • deployment_id (str) – The unique identifier to a deployment.

  • global_prediction_args (dict) – Argument(s) to pass on every prediction call.

  • explanations (bool) – If true, will provide SHAP Explanations for each prediction, if supported by the use case.

  • output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON).

  • csv_input_prefix (str) – A prefix to prepend to the input columns, only applies when output format is CSV

  • csv_prediction_prefix (str) – A prefix to prepend to the prediction columns, only applies when output format is CSV

  • csv_explanations_prefix (str) – A prefix to prepend to the explanation columns, only applies when output format is CSV

  • output_includes_metadata (bool) – If true, output will contain columns including prediction start time, batch prediction version, and model version

  • result_input_columns (list) – If present, will limit result files or feature groups to only include columns present in this list

Returns:

The batch prediction description.

Return type:

BatchPrediction

set_batch_prediction_file_connector_output(batch_prediction_id, output_format=None, output_location=None)

Updates the file connector output configuration of the batch prediction

Parameters:
  • batch_prediction_id (str) – The unique identifier of the batch prediction.

  • output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON).

  • output_location (str) – If specified, the location to write the prediction results. Otherwise, results will be stored in Abacus.AI.

Returns:

The batch prediction description.

Return type:

BatchPrediction

set_batch_prediction_database_connector_output(batch_prediction_id, database_connector_id=None, database_output_config=None)

Updates the database connector output configuration of the batch prediction

Parameters:
  • batch_prediction_id (str) – The unique identifier of the batch prediction

  • database_connector_id (str) – The unique identifier of a Database Connector to write predictions to.

  • database_output_config (dict) – A key-value pair of columns/values to write to the database connector

Returns:

The batch prediction description.

Return type:

BatchPrediction

set_batch_prediction_feature_group_output(batch_prediction_id, table_name)

Creates a feature group and sets it to be the batch prediction output

Parameters:
  • batch_prediction_id (str) – The unique identifier of the batch prediction

  • table_name (str) – The name of the feature group table to create

Returns:

The batch prediction after the output has been applied

Return type:

BatchPrediction

set_batch_prediction_output_to_console(batch_prediction_id)

Sets the batch prediction output to the console, clearing both the file connector and database connector config

Parameters:

batch_prediction_id (str) – The unique identifier of the batch prediction

Returns:

The batch prediction description.

Return type:

BatchPrediction

set_batch_prediction_dataset(batch_prediction_id, dataset_type, dataset_id=None)

[Deprecated] Sets the batch prediction input dataset. Only applicable for legacy dataset-based projects

Parameters:
  • batch_prediction_id (str) – The unique identifier of the batch prediction

  • dataset_type (str) – The dataset type to set

  • dataset_id (str) – The dataset to set

Returns:

The batch prediction description.

Return type:

BatchPrediction

set_batch_prediction_feature_group(batch_prediction_id, feature_group_type, feature_group_id=None)

Sets the batch prediction input feature group.

Parameters:
  • batch_prediction_id (str) – The unique identifier of the batch prediction

  • feature_group_type (str) – The feature group type to set. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under the personalized recommendation use case.

  • feature_group_id (str) – The feature group to set as input to the batch prediction

Returns:

The batch prediction description.

Return type:

BatchPrediction

set_batch_prediction_dataset_remap(batch_prediction_id, dataset_id_remap)

For the purpose of this batch prediction, swaps out datasets in the input feature groups

Parameters:
  • batch_prediction_id (str) – The unique identifier of the batch prediction

  • dataset_id_remap (dict) – Key/value pairs of dataset_ids to replace during batch predictions

Returns:

Batch Prediction object

Return type:

BatchPrediction

delete_batch_prediction(batch_prediction_id)

Deletes a batch prediction and associated data such as associated monitors.

Parameters:

batch_prediction_id (str) – The unique identifier of the batch prediction

add_user_item_interaction(streaming_token, dataset_id, timestamp, user_id, item_id, event_type, additional_attributes)

Adds a user-item interaction record (data row) to a streaming dataset.

Parameters:
  • streaming_token (str) – The streaming token for authenticating requests to the dataset.

  • dataset_id (str) – The streaming dataset to record data to.

  • timestamp (int) – The unix timestamp of the event.

  • user_id (str) – The unique identifier for the user.

  • item_id (list) – The unique identifiers for the items

  • event_type (str) – The event type.

  • additional_attributes (dict) – Attributes of the user interaction.
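
Example (a minimal sketch; the tokens, IDs, and attributes are placeholders):

    import time

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # Stream one purchase event into the dataset.
    client.add_user_item_interaction(
        streaming_token='YOUR_STREAMING_TOKEN',
        dataset_id='YOUR_DATASET_ID',
        timestamp=int(time.time()),
        user_id='u_1001',
        item_id=['item_42'],
        event_type='purchase',
        additional_attributes={'price': 19.99, 'currency': 'USD'},
    )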

upsert_user_attributes(streaming_token, dataset_id, user_id, user_attributes)

Adds a user attributes record (data row) to a streaming dataset.

Either the streaming dataset ID or the project ID is required.

Parameters:
  • streaming_token (str) – The streaming token for authenticating requests to the dataset.

  • dataset_id (str) – The streaming dataset to record data to.

  • user_id (str) – The unique identifier for the user.

  • user_attributes (dict) – Attributes of the user interaction.

upsert_item_attributes(streaming_token, dataset_id, item_id, item_attributes)

Adds an item attributes record (data row) to a streaming dataset.

Either the streaming dataset ID or the project ID is required.

Parameters:
  • streaming_token (str) – The streaming token for authenticating requests to the dataset.

  • dataset_id (str) – The streaming dataset to record data to.

  • item_id (str) – The unique identifier for the item.

  • item_attributes (dict) – Attributes of the item interaction.

add_multiple_user_item_interactions(streaming_token, dataset_id, interactions)

Adds multiple user-item interaction records (data rows) to a streaming dataset.

Parameters:
  • streaming_token (str) – The streaming token for authenticating requests to the dataset.

  • dataset_id (str) – The streaming dataset to record data to.

  • interactions (list) – List of interactions, each interaction of format {‘userId’: userId, ‘timestamp’: timestamp, ‘itemId’: itemId, ‘eventType’: eventType, ‘additionalAttributes’: {‘attribute1’: ‘abc’, ‘attribute2’: 123}}

upsert_multiple_user_attributes(streaming_token, dataset_id, upserts)

Adds multiple user attributes records (data rows) to a streaming dataset.

The streaming dataset ID is required.

Parameters:
  • streaming_token (str) – The streaming token for authenticating requests to the dataset.

  • dataset_id (str) – The streaming dataset to record data to.

  • upserts (list) – List of upserts, each upsert of format {‘userId’: userId, ‘userAttributes’: {‘attribute1’: ‘abc’, ‘attribute2’: 123}}.

upsert_multiple_item_attributes(streaming_token, dataset_id, upserts)

Adds multiple item attributes records (data rows) to a streaming dataset.

The streaming dataset ID is required.

Parameters:
  • streaming_token (str) – The streaming token for authenticating requests to the dataset.

  • dataset_id (str) – The streaming dataset to record data to.

  • upserts (list) – List of upserts, each upsert of format {‘itemId’: itemId, ‘itemAttributes’: {‘attribute1’: ‘abc’, ‘attribute2’: 123}}.

upsert_item_embeddings(streaming_token, model_id, item_id, vector, catalog_id=None)

Upserts an embedding vector for an item id for a model_id.

Parameters:
  • streaming_token (str) – The streaming token for authenticating requests to the model.

  • model_id (str) – The model id to upsert item embeddings to.

  • item_id (str) – The item id whose embeddings will be upserted.

  • vector (list) – The embedding vector.

  • catalog_id (str) – Optional name to specify which catalog in a model to update.

delete_item_embeddings(streaming_token, model_id, item_ids, catalog_id=None)

Deletes knn embeddings for a list of item ids for a model_id.

Parameters:
  • streaming_token (str) – The streaming token for authenticating requests to the model.

  • model_id (str) – The model id to delete item embeddings from.

  • item_ids (list) – A list of item ids whose embeddings will be deleted.

  • catalog_id (str) – Optional name to specify which catalog in a model to update.

upsert_multiple_item_embeddings(streaming_token, model_id, upserts, catalog_id=None)

Upserts knn embeddings for multiple item ids for a model_id.

Parameters:
  • streaming_token (str) – The streaming token for authenticating requests to the model.

  • model_id (str) – The model id to upsert item embeddings to.

  • upserts (list) – A list of {‘itemId’: …, ‘vector’: […]} dicts for each upsert.

  • catalog_id (str) – Optional name to specify which catalog in a model to update.

upsert_data(feature_group_id, streaming_token, data)

Updates data in the feature group for a given lookup key recordId if the recordId is found; otherwise, inserts new data into the feature group.

Parameters:
  • feature_group_id (str) – The Streaming feature group to record data to

  • streaming_token (str) – The streaming token for authenticating requests

  • data (dict) – The data to record
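
Example (a minimal sketch; the IDs and record fields are placeholders):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # Insert the record, or update it in place if 'record_id' already exists.
    client.upsert_data(
        feature_group_id='YOUR_FEATURE_GROUP_ID',
        streaming_token='YOUR_STREAMING_TOKEN',
        data={'record_id': 'u_1001', 'plan': 'pro', 'monthly_spend': 42.5},
    )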

append_data(feature_group_id, streaming_token, data)

Appends new data into the feature group for a given lookup key recordId.

Parameters:
  • feature_group_id (str) – The Streaming feature group to record data to

  • streaming_token (str) – The streaming token for authenticating requests

  • data (dict) – The data to record

upsert_multiple_data(feature_group_id, streaming_token, data)

Updates data in the feature group for the given lookup key recordIds if the recordIds are found; otherwise, inserts new data into the feature group.

Parameters:
  • feature_group_id (str) – The Streaming feature group to record data to

  • streaming_token (str) – The streaming token for authenticating requests

  • data (list) – The data to record, as an array of JSON objects

append_multiple_data(feature_group_id, streaming_token, data)

Appends new data into the feature group for a given lookup key recordId.

Parameters:
  • feature_group_id (str) – The Streaming feature group to record data to

  • streaming_token (str) – The streaming token for authenticating requests

  • data (list) – The data to record, as an array of JSON objects

create_python_function(name, source_code=None, function_name=None, function_variable_mappings=None, package_requirements=None)

Creates a custom python function that’s reusable

Parameters:
  • name (str) – The name to identify the python function

  • source_code (str) – Contents of a valid python source code file. The source code should contain the transform feature group functions. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • function_name (str) – The name of the python function.

  • function_variable_mappings (list) – List of python function arguments.

  • package_requirements (dict) – JSON with key-value pairs corresponding to package: version for each dependency

Returns:

The python function that can be used (i.e. for feature group transform)

Return type:

PythonFunction
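
Example (a minimal sketch; the function name and its body are hypothetical):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # Register a small transform function from a source-code string.
    source_code = (
        "def keep_positive_rows(df):\n"
        "    return df[df['amount'] > 0]\n"
    )
    fn = client.create_python_function(
        name='keep_positive_rows',
        source_code=source_code,
        function_name='keep_positive_rows',
    )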

update_python_function(name, source_code=None, function_name=None, function_variable_mappings=None, package_requirements=None)

Updates the custom python function with user inputs for the given function name.

Parameters:
  • name (str) – The name to identify the python function

  • source_code (str) – Contents of a valid python source code file. The source code should contain the transform feature group functions. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • function_name (str) – The name of the python function.

  • function_variable_mappings (list) – List of python function arguments

  • package_requirements (dict) – JSON with key-value pairs corresponding to package: version for each dependency

Returns:

The python_function object.

Return type:

PythonFunction

delete_python_function(name)

Removes an existing python function.

Parameters:

name (str) – The name to identify the python function

create_algorithm(name, problem_type, source_code=None, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, config_options=None, is_default_enabled=False, project_id=None, use_gpu=False)

Creates a custom algorithm that’s reusable for model training

Parameters:
  • name (str) – The name to identify the algorithm, only uppercase letters, numbers and underscore allowed

  • problem_type (str) – The type of the problem this algorithm will work on

  • source_code (str) – Contents of a valid python source code file. The source code should contain the train/predict/predict_many/initialize functions. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • training_data_parameter_names_mapping (dict) – The mapping from feature group types to training data parameter names in the train function

  • training_config_parameter_name (str) – The train config parameter name in the train function

  • train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.

  • initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model

  • config_options (dict) – Map dataset types and configs to train function parameter names

  • is_default_enabled (bool) – Whether to train with the algorithm by default

  • project_id (str) – The unique ID of the project

  • use_gpu (bool) – Whether this algorithm needs to run on GPU

Returns:

The new custom algorithm that can be used for training

Return type:

Algorithm
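
Example (a minimal sketch; the algorithm name, problem type value, feature group type, and source file are hypothetical):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # Register a custom algorithm whose train/predict functions live in a local file.
    with open('my_algorithm.py') as f:
        source_code = f.read()

    algorithm = client.create_algorithm(
        name='MY_CUSTOM_REGRESSOR',
        problem_type='REGRESSION',
        source_code=source_code,
        training_data_parameter_names_mapping={'CUSTOM_TABLE': 'training_df'},
        training_config_parameter_name='training_config',
        train_function_name='train',
        predict_function_name='predict',
        is_default_enabled=True,
        project_id='YOUR_PROJECT_ID',
    )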

delete_algorithm(algorithm)

Deletes the specified custom algorithm.

Parameters:

algorithm (str) – The name of the algorithm to delete.

update_algorithm(algorithm, source_code=None, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, config_options=None, is_default_enabled=None, use_gpu=None)

Updates the custom algorithm with the given name. If source_code is provided, all the function names in the source code also need to be provided.

Parameters:
  • algorithm (str) – The name to identify the algorithm, only uppercase letters, numbers and underscore allowed

  • source_code (str) – Contents of a valid python source code file. The source code should contain the train/predict/predict_many/initialize functions. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • training_data_parameter_names_mapping (dict) – The mapping from feature group types to training data parameter names in the train function

  • training_config_parameter_name (str) – The train config parameter name in the train function

  • train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.

  • initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model

  • config_options (dict) – Map dataset types and configs to train function parameter names

  • is_default_enabled (bool) – Whether to train with the algorithm by default

  • use_gpu (bool) – Whether this algorithm needs to run on GPU

Returns:

The new custom algorithm that can be used for training

Return type:

Algorithm

create_custom_loss_function_with_source_code(name, loss_function_type, loss_function_name, loss_function_source_code)

Registers a new custom loss function which can be used as an objective function during model training.

Parameters:
  • name (str) – A name for the loss. Should be unique per organization. Limit: 50 characters. Only underscores, numbers, and uppercase letters are allowed

  • loss_function_type (str) – The category of problems that this loss would be applicable to. Ex - REGRESSION_DL_TF, CLASSIFICATION_DL_TF, etc

  • loss_function_name (str) – The name of the function whose full source code is passed in loss_function_source_code

  • loss_function_source_code (str) – Python source code string of the function

Returns:

A description of the registered custom loss function

Return type:

CustomLossFunction
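
Example (a minimal sketch; the loss name and function body are hypothetical, and the loss_function_type value follows the REGRESSION_DL_TF example listed above):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    loss_source = (
        "import tensorflow as tf\n"
        "\n"
        "def weighted_mse(y_true, y_pred):\n"
        "    return tf.reduce_mean(2.0 * tf.square(y_true - y_pred))\n"
    )
    loss = client.create_custom_loss_function_with_source_code(
        name='WEIGHTED_MSE_LOSS',
        loss_function_type='REGRESSION_DL_TF',
        loss_function_name='weighted_mse',
        loss_function_source_code=loss_source,
    )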

update_custom_loss_function_with_source_code(name, loss_function_name, loss_function_source_code)

Updates a previously registered custom loss function with a new function implementation.

Parameters:
  • name (str) – name of the registered custom loss.

  • loss_function_name (str) – The name of the function whose full source code is passed in loss_function_source_code

  • loss_function_source_code (str) – Python source code string of the function

Returns:

A description of the updated custom loss function

Return type:

CustomLossFunction

delete_custom_loss_function(name)

Deletes a previously registered custom loss function.

Parameters:

name (str) – The name of the custom loss function to be deleted

exception abacusai.ApiException(message, http_status, exception=None)

Bases: Exception

Default ApiException raised by APIs

Parameters:
  • message (str) – The error message

  • http_status (int) – The HTTP status code returned by the server

  • exception (str) – The exception class raised by the server

__str__()

Return str(self).
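
A minimal sketch of handling this exception; the API key and project ID are hypothetical placeholders:

    from abacusai import ApiClient, ApiException

    client = ApiClient(api_key='YOUR_API_KEY')  # hypothetical API key
    try:
        project = client.describe_project('bad_project_id')  # hypothetical ID
    except ApiException as e:
        # message, http_status, and exception are set per the parameters above
        print(e.http_status, e)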

class abacusai.ClientOptions(exception_on_404=True, server=DEFAULT_SERVER)

Options for configuring the ApiClient

Parameters:
  • exception_on_404 (bool) – If true, will raise an exception on a 404 from the server, else will return None.

  • server (str) – The default server endpoint to use for API requests
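
A minimal sketch of passing options to the client; the API key is a placeholder and the server URL is illustrative:

    from abacusai import ApiClient, ClientOptions

    # Return None on 404s instead of raising ApiException, and pin the server.
    options = ClientOptions(exception_on_404=False, server='https://api.abacus.ai')
    client = ApiClient(api_key='YOUR_API_KEY', client_options=options)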

class abacusai.ReadOnlyClient(api_key=None, server=None, client_options=None, skip_version_check=False)

Bases: BaseApiClient

Abacus.AI Read Only API Client. Only contains GET methods

Parameters:
  • api_key (str) – The api key to use as authentication to the server

  • server (str) – The base server url to use to send API requets to

  • client_options (ClientOptions) – Optional API client configurations

  • skip_version_check (bool) – If true, will skip checking the server’s current API version on initializing the client

list_api_keys()

Lists all of the user’s API keys in the user’s organization.

Returns:

List of API Keys for this user.

Return type:

ApiKey

list_organization_users()

Retrieves a list of all users in the organization.

This method will retrieve a list containing all the users in the organization. The list includes pending users who have been invited to the organization.

Returns:

Array of all of the users in the Organization

Return type:

User

describe_user()

Get the current user’s information, such as their name, email, admin status, etc.

Returns:

Information about the current User

Return type:

User

list_organization_groups()

Lists all Organization Groups within this Organization

Returns:

List of Groups in this Organization

Return type:

OrganizationGroup

describe_organization_group(organization_group_id)

Returns the specific organization group passed in by the user.

Parameters:

organization_group_id (str) – The unique ID of the organization group that needs to be described.

Returns:

Information about a specific Organization Group

Return type:

OrganizationGroup

describe_webhook(webhook_id)

Describe the webhook with a given id.

Parameters:

webhook_id (str) – ID of target webhook.

Returns:

The Webhook with the given id.

Return type:

Webhook

list_deployment_webhooks(deployment_id)

List and describe all the webhooks attached to a given deployment ID.

Parameters:

deployment_id (str) – ID of target deployment.

Returns:

The webhooks attached to the given deployment id.

Return type:

Webhook

list_use_cases()

Retrieves a list of all use cases with descriptions. Use the given mappings to specify a use case when needed.

Returns:

A list of UseCase objects describing all the use cases addressed by the platform.

Return type:

UseCase

describe_problem_type(problem_type)
Parameters:

problem_type (str) –

Returns:

None

Return type:

ProblemType

describe_use_case_requirements(use_case)

This API call returns the feature requirements for a specified use case

Parameters:

use_case (str) – This will contain the Enum String for the use case whose dataset requirements are needed.

Returns:

The feature requirements of the use case are returned. This includes all the feature groups required for the use case along with their descriptions and feature mapping details.

Return type:

UseCaseRequirements

describe_project(project_id)

Returns a description of a project.

Parameters:

project_id (str) – The unique project ID

Returns:

The project description is returned.

Return type:

Project

list_projects(limit=100, start_after_id=None)

Retrieves a list of all projects in the current organization.

Parameters:
  • limit (int) – The max length of the list of projects.

  • start_after_id (str) – The ID of the project after which the list starts.

Returns:

An array of all projects in the Organization the user is currently logged in to.

Return type:

Project
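
For example, a minimal pagination sketch. The API key is a placeholder, and it assumes each returned Project exposes a project_id attribute:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # hypothetical API key

    projects, last_id = [], None
    while True:
        page = client.list_projects(limit=100, start_after_id=last_id)
        projects.extend(page)
        if len(page) < 100:  # a short page means we reached the end
            break
        last_id = page[-1].project_id  # assumed attribute on Project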

list_project_datasets(project_id)

Retrieves all dataset(s) attached to a specified project. This API returns all attributes of each dataset, such as its name, type, and ID.

Parameters:

project_id (str) – The unique ID associated with the project.

Returns:

An array representing all of the datasets attached to the project.

Return type:

ProjectDataset

get_schema(project_id, dataset_id)

[DEPRECATED] Returns a schema given a specific dataset in a project. The schema of the dataset consists of the columns in the dataset, each column’s data type, and each column’s column mapping.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • dataset_id (str) – The unique ID associated with the dataset.

Returns:

An array of objects for each column in the specified dataset.

Return type:

Schema

validate_project(project_id, feature_group_ids=None)

Validates that the specified project has all required feature group types for its use case and that all required feature columns are set.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • feature_group_ids (list) – The feature group IDs to validate

Returns:

The project validation. If the specified project is missing required columns or feature groups, the response includes an array of objects for each missing required feature group and the missing required features in each feature group.

Return type:

ProjectValidation
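
A short sketch of checking a project before training. The project ID is a placeholder, and the valid flag on ProjectValidation is an assumed attribute:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # hypothetical API key

    validation = client.validate_project('my_project_id')  # hypothetical ID
    if not validation.valid:  # assumed attribute on ProjectValidation
        # Inspect which required feature groups or columns are missing.
        print(validation)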

get_feature_group_schema(feature_group_id, project_id=None)

Returns a schema given a specific FeatureGroup in a project.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • project_id (str) – The unique ID associated with the project.

Returns:

An array of objects for each column in the specified feature group.

Return type:

Feature

describe_feature_group(feature_group_id)

Describe a Feature Group.

Parameters:

feature_group_id (str) – The unique ID associated with the feature group.

Returns:

The feature group object.

Return type:

FeatureGroup

describe_feature_group_by_table_name(table_name)

Describe a Feature Group by the feature group’s table name

Parameters:

table_name (str) – The unique table name of the Feature Group to lookup

Returns:

The Feature Group

Return type:

FeatureGroup

list_feature_groups(limit=100, start_after_id=None, feature_group_template_id=None, is_including_detached_from_template=False)

Lists all the feature groups in the organization, optionally filtered by feature group template.

Parameters:
  • limit (int) – The maximum number of feature groups to be retrieved.

  • start_after_id (str) – An offset parameter to exclude all feature groups up to the specified ID.

  • feature_group_template_id (str) – If specified, limit results to feature groups attached to this template id.

  • is_including_detached_from_template (bool) – When feature_group_template_id is specified, include feature groups that were detached from that template id.

Returns:

All the feature groups in the organization

Return type:

FeatureGroup

list_project_feature_groups(project_id, filter_feature_group_use=None)

List all the feature groups associated with a project

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • filter_feature_group_use (str) – If specified, only feature groups in this project with the given use are returned. Valid values: DATA_WRANGLING, TRAINING_INPUT, BATCH_PREDICTION_INPUT, BATCH_PREDICTION_OUTPUT

Returns:

All the Feature Groups in the Organization

Return type:

FeatureGroup

list_python_function_feature_groups(name, limit=100)

Lists all the feature groups associated with a Python function, identified by the given name.

Parameters:
  • name (str) – The name to identify the python function.

  • limit (int) – The number of feature groups to be retrieved.

Returns:

All the feature groups associated with a python function id.

Return type:

FeatureGroup

get_feature_group_version_export_download_url(feature_group_export_id)

Get a link to download the feature group version.

Parameters:

feature_group_export_id (str) – The unique ID of the Feature Group Export to get a signed URL for.

Returns:

The FeatureGroupExportDownloadUrl instance, which contains the download URL and expiration time.

Return type:

FeatureGroupExportDownloadUrl
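
A sketch of downloading an export via the signed URL using the third-party requests package. The export ID is a placeholder, and download_url is an assumed attribute on FeatureGroupExportDownloadUrl:

    import requests

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # hypothetical API key

    url_info = client.get_feature_group_version_export_download_url('my_export_id')
    response = requests.get(url_info.download_url)  # assumed attribute
    with open('export.csv', 'wb') as f:
        f.write(response.content)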

describe_feature_group_export(feature_group_export_id)

Describes a feature group export.

Parameters:

feature_group_export_id (str) – The ID of the feature group export.

Returns:

The feature group export

Return type:

FeatureGroupExport

list_feature_group_exports(feature_group_id)

Lists all of the feature group exports for a given feature group

Parameters:

feature_group_id (str) – The ID of the feature group

Returns:

The feature group exports

Return type:

FeatureGroupExport

get_feature_group_export_connector_errors(feature_group_export_id)

Returns a stream containing the feature group export database connection write errors, if any writes failed to the database connector

Parameters:

feature_group_export_id (str) – The ID of the feature group export to get the errors for

Return type:

io.BytesIO

list_feature_group_modifiers(feature_group_id)

Lists the users who can modify a given feature group.

Parameters:

feature_group_id (str) – The unique ID associated with the feature group.

Returns:

Modification lock status and groups and organizations added to the feature group.

Return type:

ModificationLockInfo

get_materialization_logs(feature_group_version, stdout=False, stderr=False)

Returns logs for materialized feature group version.

Parameters:
  • feature_group_version (str) – The unique version ID of the feature group version to get logs for

  • stdout (bool) – Set True to get info logs

  • stderr (bool) – Set True to get error logs

Returns:

The function logs.

Return type:

FunctionLogs
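
For example, a sketch of fetching both log streams. The version ID is a placeholder, and stdout/stderr are assumed attributes on FunctionLogs:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # hypothetical API key

    logs = client.get_materialization_logs('my_fg_version', stdout=True, stderr=True)
    print(logs.stdout)  # assumed attribute: info log text
    print(logs.stderr)  # assumed attribute: error log text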

list_feature_group_versions(feature_group_id, limit=100, start_after_version=None)

Retrieves a list of all feature group versions for the specified feature group.

Parameters:
  • feature_group_id (str) – The unique ID associated with the feature group.

  • limit (int) – The max length of the returned versions

  • start_after_version (str) – Results will start after this version

Returns:

An array of feature group versions.

Return type:

FeatureGroupVersion

describe_feature_group_version(feature_group_version)

Get a specific feature group version.

Parameters:

feature_group_version (str) – The unique ID associated with the feature group version.

Returns:

A feature group version.

Return type:

FeatureGroupVersion

describe_feature_group_template(feature_group_template_id)

Describe a Feature Group Template.

Parameters:

feature_group_template_id (str) – The unique ID of a feature group template.

Returns:

The feature group template object.

Return type:

FeatureGroupTemplate

list_feature_group_templates(limit=100, start_after_id=None, feature_group_id=None, should_include_system_templates=False)

List feature group templates, optionally scoped by the feature group that created the templates.

Parameters:
  • limit (int) – The maximum number of templates to be retrieved.

  • start_after_id (str) – An offset parameter to exclude all templates till the specified feature group template ID.

  • feature_group_id (str) – If specified, limit to templates created from this feature group.

  • should_include_system_templates (bool) –

Returns:

All the feature groups in the organization, optionally limited by the feature group that created the template(s).

Return type:

FeatureGroupTemplate

list_project_feature_group_templates(project_id, limit=100, start_after_id=None, should_include_all_system_templates=False)

List feature group templates for feature groups associated with the project.

Parameters:
  • project_id (str) – Limit to templates associated with this project, e.g. templates associated with feature groups in this project.

  • limit (int) – The maximum number of templates to be retrieved.

  • start_after_id (str) – An offset parameter to exclude all templates till the specified feature group template ID.

  • should_include_all_system_templates (bool) –

Returns:

All the feature groups in the organization, optionally limited by the feature group that created the template(s).

Return type:

FeatureGroupTemplate

suggest_feature_group_template_for_feature_group(feature_group_id)

Suggests values for a feature group template, based on a feature group.

Parameters:

feature_group_id (str) – The unique ID associated with the feature group to use for suggesting values to use for the template.

Returns:

None

Return type:

FeatureGroupTemplate

get_dataset_schema(dataset_id)

Retrieves the column schema of a dataset

Parameters:

dataset_id (str) – The unique ID of the dataset whose schema is to be looked up.

Returns:

List of Column schema definitions

Return type:

DatasetColumn

get_file_connector_instructions(bucket, write_permission=False)

Retrieves verification information to create a data connector to a cloud storage bucket.

Parameters:
  • bucket (str) – The fully qualified URI of the storage bucket to verify.

  • write_permission (bool) – If true, instructions will include steps for allowing Abacus.AI to write to this service.

Returns:

An object with full description of the cloud storage bucket authentication options and bucket policy. Returns an error message if the parameters are invalid.

Return type:

FileConnectorInstructions

list_database_connectors()

Retrieves a list of all of the database connectors along with all their attributes.

Returns:

The database connectors.

Return type:

DatabaseConnector

list_file_connectors()

Retrieves a list of all connected services in the organization and their current verification status.

Returns:

An array of cloud storage buckets connected to the organization.

Return type:

FileConnector

list_database_connector_objects(database_connector_id)

Lists queryable objects in the database connector.

Parameters:

database_connector_id (str) – The unique identifier for the database connector.

Return type:

List[str]

get_database_connector_object_schema(database_connector_id, object_name=None)

Get the schema of an object in a database connector.

Parameters:
  • database_connector_id (str) – The unique identifier for the database connector.

  • object_name (str) – The unique identifier for the object in the external system.

Return type:

List[str]

list_application_connectors()

Retrieves a list of all of the application connectors along with all their attributes.

Returns:

The application connectors.

Return type:

ApplicationConnector

list_application_connector_objects(application_connector_id)

Lists queryable objects in the application connector.

Parameters:

application_connector_id (str) – The unique identifier for the application connector.

Return type:

List[str]

list_streaming_connectors()

Retrieves a list of all of the streaming connectors along with all their attributes.

Returns:

The streaming connectors.

Return type:

StreamingConnector

list_streaming_tokens()

Retrieves a list of all streaming tokens along with their attributes.

Returns:

An array of streaming tokens.

Return type:

StreamingAuthToken

get_recent_feature_group_streamed_data(feature_group_id)

Returns recently streamed data to a streaming feature group.

Parameters:

feature_group_id (str) – The unique ID associated with the feature group.

list_uploads()

Lists all ongoing uploads in the organization

Returns:

An array of uploads.

Return type:

Upload

describe_upload(upload_id)

Retrieves the current upload status (complete or inspecting) and the list of file parts uploaded for a specified dataset upload.

Parameters:

upload_id (str) – The unique ID associated with the file uploaded or being uploaded in parts.

Returns:

The details associated with the large dataset file uploaded in parts.

Return type:

Upload

list_datasets(limit=100, start_after_id=None, exclude_streaming=False)

Retrieves a list of all of the datasets in the organization.

Parameters:
  • limit (int) – The max length of the list of datasets.

  • start_after_id (str) – The ID of the dataset after which the list starts.

  • exclude_streaming (bool) – Exclude streaming datasets from result.

Returns:

A list of datasets.

Return type:

Dataset

describe_dataset(dataset_id)

Retrieves a full description of the specified dataset, with attributes such as its ID, name, source type, etc.

Parameters:

dataset_id (str) – The unique ID associated with the dataset.

Returns:

The dataset.

Return type:

Dataset

describe_dataset_version(dataset_version)

Retrieves a full description of the specified dataset version, with attributes such as its ID, name, source type, etc.

Parameters:

dataset_version (str) – The unique ID associated with the dataset version.

Returns:

The dataset version.

Return type:

DatasetVersion

list_dataset_versions(dataset_id, limit=100, start_after_version=None)

Retrieves a list of all dataset versions for the specified dataset.

Parameters:
  • dataset_id (str) – The unique ID associated with the dataset.

  • limit (int) – The max length of the list of all dataset versions.

  • start_after_version (str) – The id of the version after which the list starts.

Returns:

A list of dataset versions.

Return type:

DatasetVersion

describe_train_test_data_split_feature_group(model_id)

Get the train and test data split for a trained model by model id. Only supported for models with custom algorithms.

Parameters:

model_id (str) – The unique ID of the model. By default, results are returned for the latest model version.

Returns:

The feature group containing the training data and folds information.

Return type:

FeatureGroup

describe_train_test_data_split_feature_group_version(model_version)

Get the train and test data split for a trained model by model_version. Only supported for models with custom algorithms.

Parameters:

model_version (str) – The unique version ID of the model version

Returns:

The feature group version containing the training data and folds information.

Return type:

FeatureGroupVersion

list_models(project_id)

Retrieves the list of models in the specified project.

Parameters:

project_id (str) – The unique ID associated with the project.

Returns:

An array of models.

Return type:

Model

describe_model(model_id)

Retrieves a full description of the specified model.

Parameters:

model_id (str) – The unique ID associated with the model.

Returns:

The description of the model.

Return type:

Model

get_model_metrics(model_id, model_version=None, baseline_metrics=False)

Retrieves a full list of the metrics for the specified model.

If only the model’s unique identifier (modelId) is specified, the latest trained version of the model (modelVersion) is used.

Parameters:
  • model_id (str) – The unique ID associated with the model.

  • model_version (str) – The version of the model.

  • baseline_metrics (bool) – If true, will also return the baseline model metrics for comparison.

Returns:

An object to show the model metrics and explanations for what each metric means.

Return type:

ModelMetrics
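
A sketch of comparing a model’s metrics against the baseline; the model ID is a hypothetical placeholder:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # hypothetical API key

    # Omitting model_version uses the latest trained version of the model.
    metrics = client.get_model_metrics('my_model_id', baseline_metrics=True)
    print(metrics)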

list_model_versions(model_id, limit=100, start_after_version=None)

Retrieves a list of the versions for a given model.

Parameters:
  • model_id (str) – The unique ID associated with the model.

  • limit (int) – The max length of the list of all model versions.

  • start_after_version (str) – The id of the version after which the list starts.

Returns:

An array of model versions.

Return type:

ModelVersion

describe_model_version(model_version)

Retrieves a full description of the specified model version

Parameters:

model_version (str) – The unique version ID of the model version

Returns:

A model version.

Return type:

ModelVersion

get_training_data_logs(model_version)

Retrieves the data preparation logs during model training.

Parameters:

model_version (str) – The unique version ID of the model version

Returns:

A list of logs.

Return type:

DataPrepLogs

set_default_model_algorithm(model_id=None, algorithm=None)

Sets the model’s algorithm to default for all new deployments

Parameters:
  • model_id (str) – The model to set

  • algorithm (str) – The algorithm to pin in the model

get_training_logs(model_version, stdout=False, stderr=False)

Returns training logs for the model.

Parameters:
  • model_version (str) – The unique version ID of the model version

  • stdout (bool) – Set True to get info logs

  • stderr (bool) – Set True to get error logs

Returns:

The function logs.

Return type:

FunctionLogs

ignore_lofo_features(model_version, threshold=None, top_n=0)
Parameters:
  • model_version (str) –

  • threshold (float) –

  • top_n (int) –

list_model_monitors(project_id)

Retrieves the list of model monitors in the specified project.

Parameters:

project_id (str) – The unique ID associated with the project.

Returns:

An array of model monitors.

Return type:

ModelMonitor

describe_model_monitor(model_monitor_id)

Retrieves a full description of the specified model monitor.

Parameters:

model_monitor_id (str) – The unique ID associated with the model monitor.

Returns:

The description of the model monitor.

Return type:

ModelMonitor

get_prediction_drift(model_monitor_version)

Gets the label and prediction drifts for a model monitor.

Parameters:

model_monitor_version (str) – The unique identifier to a model monitor version created under the project.

Returns:

An object describing training and prediction output label and prediction distributions.

Return type:

DriftDistributions

get_model_monitor_summary(model_monitor_id)

Gets the summary of a model monitor across versions.

Parameters:

model_monitor_id (str) – The unique ID associated with the model monitor.

Returns:

An object describing integrity, bias violations, model accuracy, and drift for a model monitor.

Return type:

ModelMonitorSummary

list_model_monitor_versions(model_monitor_id, limit=100, start_after_version=None)

Retrieves a list of the versions for a given model monitor.

Parameters:
  • model_monitor_id (str) – The unique ID associated with the model monitor.

  • limit (int) – The max length of the list of all model monitor versions.

  • start_after_version (str) – The id of the version after which the list starts.

Returns:

An array of model monitor versions.

Return type:

ModelMonitorVersion

describe_model_monitor_version(model_monitor_version)

Retrieves a full description of the specified model monitor version

Parameters:

model_monitor_version (str) – The unique version ID of the model monitor version

Returns:

A model monitor version.

Return type:

ModelMonitorVersion

model_monitor_version_metric_data(model_monitor_version, metric_type, actual_values_to_detail=None)

Provides the data needed for decile metrics associated with the model monitor.

Parameters:
  • model_monitor_version (str) – Model monitor version id.

  • metric_type (str) – The metric type to get data for.

  • actual_values_to_detail (list) –

Returns:

Data associated with the metric.

Return type:

ModelMonitorVersionMetricData

list_organization_model_monitors(only_starred=False)

Gets a list of model monitors for an organization.

Parameters:

only_starred (bool) – Return only starred model monitors. Defaults to False.

Returns:

An array of model monitors.

Return type:

ModelMonitor

get_model_monitor_chart_from_organization(organization_id, chart_type, limit=15)

Gets a list of model monitor summaries across monitors for an organization.

Parameters:
  • organization_id (str) – The unique ID associated with the organization.

  • chart_type (str) – The type of chart (model_accuracy, bias_violations, data_integrity, or model_drift) to return.

  • limit (int) – The max length of the model monitors.

Returns:

A list of ModelMonitorSummaryForOrganization objects describing accuracy, bias, drift, or integrity for all model monitors in an organization.

Return type:

ModelMonitorSummaryFromOrg

get_model_monitor_summary_from_organization()

Gets a consolidated summary of model monitors for an organization.

Returns:

A list of ModelMonitorSummaryForOrganization objects describing accuracy, bias, drift, and integrity for all model monitors in an organization.

Return type:

ModelMonitorOrgSummary

describe_monitor_alert(monitor_alert_id)

Describes a given monitor alert id

Parameters:

monitor_alert_id (str) – The unique identifier to a monitor alert

Returns:

An object describing the monitor alert

Return type:

MonitorAlert

describe_monitor_alert_version(monitor_alert_version)

Describes a given monitor alert version id

Parameters:

monitor_alert_version (str) – The unique identifier to a monitor alert version

Returns:

An object describing the monitor alert version

Return type:

MonitorAlertVersion

list_monitor_alerts_for_monitor(model_monitor_id)

Retrieves the list of monitor alerts for a specified monitor

Parameters:

model_monitor_id (str) – The unique ID associated with the model monitor.

Returns:

An array of monitor alerts.

Return type:

MonitorAlert

list_monitor_alert_versions_for_monitor_version(model_monitor_version)

Retrieves the list of monitor alert versions for a specified monitor version

Parameters:

model_monitor_version (str) – The unique ID associated with the model monitor version.

Returns:

An array of monitor alerts.

Return type:

MonitorAlertVersion

get_model_monitoring_logs(model_monitor_version, stdout=False, stderr=False)

Returns monitoring logs for the model.

Parameters:
  • model_monitor_version (str) – The unique version ID of the model monitor version

  • stdout (bool) – Set True to get info logs

  • stderr (bool) – Set True to get error logs

Returns:

The function logs.

Return type:

FunctionLogs

get_drift_for_feature(model_monitor_version, feature_name)

Gets the feature drift associated with a single feature in an output feature group from a prediction.

Parameters:
  • model_monitor_version (str) – The unique identifier to a model monitor version created under the project.

  • feature_name (str) – Name of the feature to view the distribution of.

Return type:

Dict

get_outliers_for_feature(model_monitor_version, feature_name=None)

Gets a list of outliers measured by a single feature (or overall) in an output feature group from a prediction.

Parameters:
  • model_monitor_version (str) – The unique identifier to a model monitor version created under the project.

  • feature_name (str) – Name of the feature to view the distribution of.

Return type:

Dict

get_outliers_for_batch_prediction_feature(batch_prediction_version, feature_name=None)

Gets a list of outliers measured by a single feature (or overall) in an output feature group from a prediction.

Parameters:
  • batch_prediction_version (str) – The unique identifier to a batch prediction version created under the project.

  • feature_name (str) – Name of the feature to view the distribution of.

Return type:

Dict

describe_deployment(deployment_id)

Retrieves a full description of the specified deployment.

Parameters:

deployment_id (str) – The unique ID associated with the deployment.

Returns:

The description of the deployment.

Return type:

Deployment

list_deployments(project_id)

Retrieves a list of all deployments in the specified project.

Parameters:

project_id (str) – The unique ID associated with the project.

Returns:

An array of deployments.

Return type:

Deployment

list_deployment_tokens(project_id)

Retrieves a list of all deployment tokens in the specified project.

Parameters:

project_id (str) – The unique ID associated with the project.

Returns:

An array of deployment tokens.

Return type:

DeploymentAuthToken

get_model_training_types_for_deployment(model_id, model_version=None, algorithm=None)

Returns the types of models that can be deployed for a given model instance ID

Parameters:
  • model_id (str) – The unique ID associated with the model.

  • model_version (str) – The unique ID associated with the model version to deploy.

  • algorithm (str) – The unique ID associated with the algorithm to deploy.

Returns:

Model training types for deployment

Return type:

ModelTrainingTypeForDeployment

describe_refresh_policy(refresh_policy_id)

Retrieve a single refresh policy

Parameters:

refresh_policy_id (str) – The unique ID associated with this refresh policy

Returns:

A refresh policy object

Return type:

RefreshPolicy

describe_refresh_pipeline_run(refresh_pipeline_run_id)

Retrieve a single refresh pipeline run

Parameters:

refresh_pipeline_run_id (str) – The unique ID associated with this refresh pipeline_run

Returns:

A refresh pipeline run object

Return type:

RefreshPipelineRun

list_refresh_policies(project_id=None, dataset_ids=[], model_ids=[], deployment_ids=[], batch_prediction_ids=[], model_monitor_ids=[], prediction_metric_ids=[])

List the refresh policies for the organization

Parameters:
  • project_id (str) – Optionally, a Project ID can be specified so that all datasets, models and deployments are captured at the instant this policy was created

  • dataset_ids (list) – Comma separated list of Dataset IDs

  • model_ids (list) – Comma separated list of Model IDs

  • deployment_ids (list) – Comma separated list of Deployment IDs

  • batch_prediction_ids (list) – Comma separated list of Batch Prediction IDs

  • model_monitor_ids (list) – Comma separated list of Model Monitor IDs.

  • prediction_metric_ids (list) – Comma separated list of Prediction Metric IDs.

Returns:

List of all refresh policies in the organization

Return type:

RefreshPolicy

list_refresh_pipeline_runs(refresh_policy_id)

Lists the times that the refresh policy has been run

Parameters:

refresh_policy_id (str) – The unique ID associated with this refresh policy

Returns:

A list of refresh pipeline runs for the given refresh policy id

Return type:

RefreshPipelineRun

list_prediction_metrics(feature_group_id, limit=100, should_include_latest_version_description=True, start_after_id=None)

List the prediction metrics for a feature group.

Parameters:
  • feature_group_id (str) – The feature group used as input to this prediction metric.

  • limit (int) – The number of prediction metrics to be retrieved.

  • should_include_latest_version_description (bool) – include the description of the latest prediction metric version for each prediction metric

  • start_after_id (str) – An offset parameter to exclude all prediction metrics till the specified prediction metric ID.

Returns:

The prediction metrics for this feature group.

Return type:

PredictionMetric

query_prediction_metrics(feature_group_id=None, project_id=None, limit=100, should_include_latest_version_description=True, start_after_id=None)

Query and return prediction metrics and extra data needed by the UI, constrained by the parameters provided.

Parameters:
  • feature_group_id (str) – [optional] The feature group used as input to the prediction metrics.

  • project_id (str) – [optional] The project_id of the prediction metrics.

  • limit (int) – The number of prediction metrics to be retrieved.

  • should_include_latest_version_description (bool) – Include the description of the latest prediction metric version for each prediction metric.

  • start_after_id (str) – An offset parameter to exclude all prediction metrics up to the specified prediction metric ID.

Returns:

The prediction metrics for this feature group.

Return type:

PredictionMetric

describe_prediction_metric_version(prediction_metric_version)

Retrieves a full description of the specified prediction metric version

Parameters:

prediction_metric_version (str) – The unique version ID of the prediction metric version

Returns:

A prediction metric version. For more information, please refer to the details on the object (below).

Return type:

PredictionMetricVersion

download_batch_prediction_result_chunk(batch_prediction_version, offset=0, chunk_size=10485760)

Returns a stream containing the batch prediction results

Parameters:
  • batch_prediction_version (str) – The unique identifier of the batch prediction version to get the results from

  • offset (int) – The offset to read from

  • chunk_size (int) – The max amount of data to read

Return type:

io.BytesIO
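
A sketch of streaming a full result to disk in chunks. The version ID is a placeholder, and it assumes a read past the end of the result yields an empty chunk:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')  # hypothetical API key

    offset, chunk_size = 0, 10 * 1024 * 1024  # 10 MiB, the default chunk size
    with open('results.csv', 'wb') as f:
        while True:
            chunk = client.download_batch_prediction_result_chunk(
                'my_bp_version', offset=offset, chunk_size=chunk_size)
            data = chunk.read()
            if not data:  # an empty chunk means the stream is exhausted
                break
            f.write(data)
            offset += len(data)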

get_batch_prediction_connector_errors(batch_prediction_version)

Returns a stream containing the batch prediction database connection write errors, if any writes failed to the database connector

Parameters:

batch_prediction_version (str) – The unique identifier of the batch prediction job to get the errors for

Return type:

io.BytesIO

list_batch_predictions(project_id)

Retrieves a list for the batch predictions in the project

Parameters:

project_id (str) – The unique identifier of the project

Returns:

A list of batch prediction jobs.

Return type:

BatchPrediction

describe_batch_prediction(batch_prediction_id)

Describes the batch prediction

Parameters:

batch_prediction_id (str) – The unique ID associated with the batch prediction.

Returns:

The batch prediction description.

Return type:

BatchPrediction

list_batch_prediction_versions(batch_prediction_id, limit=100, start_after_version=None)

Retrieves a list of versions of a given batch prediction

Parameters:
  • batch_prediction_id (str) – The unique identifier of the batch prediction

  • limit (int) – The number of versions to list

  • start_after_version (str) – The version to start after

Returns:

A list of batch prediction versions.

Return type:

BatchPredictionVersion

describe_batch_prediction_version(batch_prediction_version)

Describes a batch prediction version

Parameters:

batch_prediction_version (str) – The unique identifier of the batch prediction version

Returns:

The batch prediction version.

Return type:

BatchPredictionVersion

describe_python_function(name)

Describe a Python Function.

Parameters:

name (str) – The name to identify the python function

Returns:

The python_function object.

Return type:

PythonFunction

list_python_functions()

List all python functions within the organization.

Returns:

A list of python functions.

Return type:

PythonFunction

describe_algorithm(algorithm)

Retrieves a full description of the specified algorithm.

Parameters:

algorithm (str) – The name of the algorithm.

Returns:

The description of the Algorithm.

Return type:

Algorithm

list_algorithms(problem_type=None, project_id=None)

Lists all custom algorithms within the organization, optionally filtered by problem_type and project_id

Parameters:
  • problem_type (str) – The problem type to query. Returns all algorithms in the organization if problem_type is None.

  • project_id (str) – The ID of the project.

Returns:

A list of algorithms

Return type:

Algorithm

list_builtin_algorithms(project_id, feature_group_ids=None, training_config=None)

Return list of builtin algorithms based on given input.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • feature_group_ids (list) – List of feature group ids applied to the algorithms.

  • training_config (dict) – The training config key/value pairs used to train with the algorithm.

Returns:

A list of applicable builtin algorithms.

Return type:

Algorithm

describe_custom_loss_function(name)

Retrieves a full description of a previously registered custom loss function.

Parameters:

name (str) – Registered name of the custom loss function.

Returns:

The description of the custom loss function with given name.

Return type:

CustomLossFunction

list_custom_loss_functions(name_prefix=None, loss_function_type=None)

Retrieves a list of registered custom loss functions’ descriptions

Parameters:
  • name_prefix (str) – The prefix of the names of the loss functions to list

  • loss_function_type (str) – The category of loss functions to search in.

Returns:

The descriptions of the registered custom loss functions matching the given criteria.

Return type:

CustomLossFunction

class abacusai.PredictionClient(client_options=None)

Bases: abacusai.client.BaseApiClient

Abacus.AI Prediction API Client. Does not utilize authentication and only contains public prediction methods

Parameters:

client_options (ClientOptions) – Optional API client configurations

predict_raw(deployment_token, deployment_id, **kwargs)

Raw interface for returning predictions from Plug and Play deployments.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • **kwargs (dict) – Arbitrary key/value pairs may be passed in and are sent as part of the request body.

lookup_features(deployment_token, deployment_id, query_data={}, limit_results=None, result_columns=None)

Returns the feature group deployed in the feature store project.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

  • limit_results (int) – If present, will limit the number of results to the value provided.

  • result_columns (list) – If present, will limit the columns present in each result to the columns specified in this list

Return type:

Dict

predict(deployment_token, deployment_id, query_data={})

Returns a prediction for Predictive Modeling

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

Return type:

Dict
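
A minimal sketch of an unauthenticated prediction call; the deployment token, deployment ID, and column mapping are hypothetical placeholders:

    from abacusai import PredictionClient

    client = PredictionClient()  # no API key; the deployment token scopes access

    result = client.predict(
        deployment_token='MY_DEPLOYMENT_TOKEN',
        deployment_id='my_deployment_id',
        query_data={'user_id': 'u_12345'},  # column mapped to USER_ID
    )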

predict_multiple(deployment_token, deployment_id, query_data={})

Returns a list of predictions for Predictive Modeling

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (list) – This will be a list of dictionaries where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

Return type:

Dict

predict_from_datasets(deployment_token, deployment_id, query_data={})

Returns a list of predictions for Predictive Modeling

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the source dataset name and ‘Value’ will be a list of records corresponding to the dataset rows

Return type:

Dict

predict_lead(deployment_token, deployment_id, query_data, explain_predictions=False, explainer_type=None)

Returns the probability of a user becoming a lead based on their interactions with the service/product and their own attributes (e.g. income, assets, credit score, etc.). Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘user_id’ mapped to mapping ‘LEAD_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary containing user attributes and/or user’s interaction data with the product/service (e.g. number of click, items in cart, etc.).

  • explain_predictions (bool) – Will explain predictions for lead

  • explainer_type (str) – Type of explainer to use for explanations

Return type:

Dict

predict_churn(deployment_token, deployment_id, query_data)

Returns the probability of a user churning as a result of their interactions with the item/product/service. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘churn_result’ mapped to mapping ‘CHURNED_YN’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

Return type:

Dict

predict_takeover(deployment_token, deployment_id, query_data)

Returns a probability for each class label associated with the types of fraud or a ‘yes’ or ‘no’ type label for the possibility of fraud. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary containing account activity characteristics (e.g. login id, login duration, login type, ip address, etc.).

Return type:

Dict

predict_fraud(deployment_token, deployment_id, query_data)

Returns a probability of a transaction performed under a specific account as being a fraud or not. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_number’ mapped to the mapping ‘ACCOUNT_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary containing transaction attributes (e.g. credit card type, transaction location, transaction amount, etc.).

Return type:

Dict

predict_class(deployment_token, deployment_id, query_data={}, threshold=None, threshold_class=None, thresholds=None, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)

Returns a classification prediction

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

  • threshold (float) – float value that is applied on the popular class label.

  • threshold_class (str) – label upon which the threshold is added (Binary labels only)

  • thresholds (list) – maps labels to thresholds (Multi label classification only). Defaults to F1 optimal threshold if computed for the given class, else uses 0.5

  • explain_predictions (bool) – If true, returns the SHAP explanations for all input features.

  • fixed_features (list) – Set of input features to treat as constant for explanations.

  • nested (str) – If specified generates prediction delta for each index of the specified nested feature.

  • explainer_type (str) –

Return type:

Dict
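
A sketch of a thresholded binary classification call; the deployment token, deployment ID, column name, and label are hypothetical placeholders:

    from abacusai import PredictionClient

    client = PredictionClient()

    result = client.predict_class(
        deployment_token='MY_DEPLOYMENT_TOKEN',
        deployment_id='my_deployment_id',
        query_data={'transaction_id': 't_67890'},
        threshold=0.7,            # applied on the popular class label
        threshold_class='FRAUD',  # hypothetical binary label
        explain_predictions=True,
    )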

predict_target(deployment_token, deployment_id, query_data={}, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)

Returns a prediction from a classification or regression model. Optionally, includes explanations.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

  • explain_predictions (bool) – If true, returns the SHAP explanations for all input features.

  • fixed_features (list) – Set of input features to treat as constant for explanations.

  • nested (str) – If specified generates prediction delta for each index of the specified nested feature.

  • explainer_type (str) –

Return type:

Dict

get_anomalies(deployment_token, deployment_id, threshold=None, histogram=False)

Returns a list of anomalies from the training dataset

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • threshold (float) – The threshold score of what is an anomaly. Valid values are between 0.8 and 0.99.

  • histogram (bool) – If True, will return a histogram of the distribution of all points

Return type:

io.BytesIO

is_anomaly(deployment_token, deployment_id, query_data=None)

Returns a list of anomaly attributes based on login information for a specified account. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – The input data for the prediction.

Return type:

Dict

get_forecast(deployment_token, deployment_id, query_data, future_data=None, num_predictions=None, prediction_start=None, explain_predictions=False, explainer_type=None)

Returns a list of forecasts for a given entity under the specified project deployment. Note that the inputs to the deployed model will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘holiday_yn’ mapped to mapping ‘FUTURE’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘store_id’ in your dataset) mapped to the column mapping ITEM_ID that uniquely identifies the entity against which forecasting is performed and ‘Value’ will be the unique value of the same entity.

  • future_data (dict) – This will be a dictionary of values known ahead of time that are relevant for forecasting (e.g. State Holidays, National Holidays, etc.). The key and the value both will be of type ‘String’. For example future data entered for a Store may be {“Holiday”:”No”, “Promo”:”Yes”}.

  • num_predictions (int) – The number of timestamps to predict in the future.

  • prediction_start (str) – The start date for predictions (e.g., “2015-08-01T00:00:00” as input for mid-night of 2015-08-01).

  • explain_predictions (bool) – Will explain predictions for forecasting

  • explainer_type (str) – Type of explainer to use for explanations

Return type:

Dict
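
A sketch of a forecast request using the future_data example from the parameter description above; the deployment token and IDs are hypothetical placeholders:

    from abacusai import PredictionClient

    client = PredictionClient()

    forecast = client.get_forecast(
        deployment_token='MY_DEPLOYMENT_TOKEN',
        deployment_id='my_deployment_id',
        query_data={'store_id': 'store_1'},  # column mapped to ITEM_ID
        future_data={'Holiday': 'No', 'Promo': 'Yes'},
        num_predictions=14,
        prediction_start='2015-08-01T00:00:00',
    )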

get_k_nearest(deployment_token, deployment_id, vector, k=None, distance=None, include_score=False)

Returns the k nearest neighbors for the provided embedding vector.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • vector (list) – Input vector to perform the k nearest neighbors with.

  • k (int) – Overridable number of items to return

  • distance (str) – Specify the distance function to use when finding nearest neighbors

  • include_score (bool) – If True, will return the score alongside the resulting embedding value

Return type:

Dict

get_multiple_k_nearest(deployment_token, deployment_id, queries)

Returns the k nearest neighbors for the queries provided

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • queries (list) – List of Mappings of format {“catalogId”: “cat0”, “vectors”: […], “k”: 20, “distance”: “euclidean”}. See getKNearest for additional information about the supported parameters

get_labels(deployment_token, deployment_id, query_data, threshold=None)

Returns a list of scored labels extracted from the provided text.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.

  • threshold (None) – Deprecated

Return type:

Dict

get_recommendations(deployment_token, deployment_id, query_data, num_items=50, page=1, exclude_item_ids=[], score_field='', scaling_factors=[], restrict_items=[], exclude_items=[], explore_fraction=0.0)

Returns a list of recommendations for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘time’ mapped to mapping ‘TIMESTAMP’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which recommendations are made and ‘Value’ will be the unique value of the same item. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.

  • num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.

  • page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.

  • exclude_item_ids (list) – [DEPRECATED]

  • score_field (str) – The relative item scores are returned in a separate field named with the same name as the key (score_field) for this argument.

  • scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” in reference to which the model recommendations need to be biased; and the key, “factor” takes the factor by which the item scores are adjusted. Let’s take an example where the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}]. After we apply the model to get item probabilities, for every SUV and Sedan in the list, we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it or there’s an item that always comes up and you want to demote it.

  • restrict_items (list) – It allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, “value3”, …]}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”, “value3”, …]” to which to restrict the recommendations. Let’s take an example where the input to restrict_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. This input will restrict the recommendations to SUVs and Sedans. This type of restriction is particularly useful if there’s a list of items that you know is of use in some particular scenario and you want to restrict the recommendations only to that list.

  • exclude_items (list) – Allows you to exclude certain items from the list of recommendations. The input is a list of dictionaries, each of the form {"column": "col0", "values": ["value0", "value1", …]}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1"]) to exclude from the recommendations. For example, if the input to exclude_items is [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], the resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful when you know a list of items is of no use in a particular scenario and you do not want to show them.

  • explore_fraction (float) – The fraction of recommendations that should consist of new items.

Return type:

Dict
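
For illustration, a minimal sketch of a call; the deployment token, deployment ID, and column names are placeholders, and the query assumes a dataset whose ‘user_name’ column is mapped to USER_ID:

    from abacusai import PredictionClient

    client = PredictionClient()

    # Hypothetical deployment credentials and column names.
    recommendations = client.get_recommendations(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'user_name': 'John Doe'},  # column mapped to USER_ID
        num_items=10,
        page=1,
        # Multiply SUV and Sedan scores by 1.4 before sorting.
        scaling_factors=[{'column': 'VehicleType', 'values': ['SUV', 'Sedan'], 'factor': 1.4}],
        # Drop motorcycles from the results entirely.
        exclude_items=[{'column': 'VehicleType', 'values': ['Motorcycle']}],
    )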

get_personalized_ranking(deployment_token, deployment_id, query_data, preserve_ranks=[], preserve_unknown_items=False, scaling_factors=[])

Returns a list of items with personalized promotions on them for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – A dictionary with two key-value pairs. The first key is the column name (e.g. a column named ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user for whom a prediction is made, and its value is that user’s identifier. The second key is the column name (e.g. movie_name) mapped to ITEM_ID (the unique item identifier), and its value is the list of identifiers of the items to be ranked.

  • preserve_ranks (list) – List of dictionaries of the form {"column": "col0", "values": ["value0", "value1"]}, where the ranks of the items in query_data are preserved for all items in "col0" with values "value0" and "value1". This option is useful when the desired items are already in the desired order and their ranks must be kept unchanged while the rest of the list is reranked.

  • preserve_unknown_items (bool) – If true, items that are unknown to the model will not be reranked, and their original positions in the query will be preserved.

  • scaling_factors (list) – Allows you to bias the model towards certain items. The input is a list of dictionaries, each of the form {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1"]) towards which the recommendations should be biased; and the key "factor" takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], then after the model computes item probabilities, the probability of every SUV and Sedan in the list is multiplied by 1.4 before sorting. This is particularly useful if there is a type of item that is less popular but you want to promote it, or an item that always comes up and you want to demote it.

Return type:

Dict
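
A minimal sketch of a call, reusing the client from the get_recommendations example; the identifiers are placeholders, and the sketch assumes ‘user_id’ is mapped to USER_ID and ‘movie_name’ to ITEM_ID:

    ranked = client.get_personalized_ranking(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={
            'user_id': 'u_123',                               # column mapped to USER_ID
            'movie_name': ['Movie A', 'Movie B', 'Movie C'],  # column mapped to ITEM_ID
        },
        # Keep 'Movie A' pinned to the rank it holds in query_data.
        preserve_ranks=[{'column': 'movie_name', 'values': ['Movie A']}],
    )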

get_ranked_items(deployment_token, deployment_id, query_data, preserve_ranks=[], preserve_unknown_items=False, scaling_factors=[])

Returns a list of re-ranked items for a selected user when a list of items is required to be reranked according to the user’s preferences. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – A dictionary with two key-value pairs. The first key is the column name (e.g. a column named ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user for whom a prediction is made, and its value is that user’s identifier. The second key is the column name (e.g. movie_name) mapped to ITEM_ID (the unique item identifier), and its value is the list of identifiers of the items to be ranked.

  • preserve_ranks (list) – List of dictionaries of the form {"column": "col0", "values": ["value0", "value1"]}, where the ranks of the items in query_data are preserved for all items in "col0" with values "value0" and "value1". This option is useful when the desired items are already in the desired order and their ranks must be kept unchanged while the rest of the list is reranked.

  • preserve_unknown_items (bool) – If true, items that are unknown to the model will not be reranked, and their original positions in the query will be preserved.

  • scaling_factors (list) – Allows you to bias the model towards certain items. The input is a list of dictionaries, each of the form {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1"]) towards which the recommendations should be biased; and the key "factor" takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], then after the model computes item probabilities, the probability of every SUV and Sedan in the list is multiplied by 1.4 before sorting. This is particularly useful if there is a type of item that is less popular but you want to promote it, or an item that always comes up and you want to demote it.

Return type:

Dict
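
The call shape mirrors get_personalized_ranking; a minimal sketch (placeholder identifiers, same client as above) highlighting preserve_unknown_items:

    reranked = client.get_ranked_items(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={
            'user_id': 'u_123',
            'movie_name': ['Movie A', 'Brand New Movie'],
        },
        # Items the model has never seen keep their original positions.
        preserve_unknown_items=True,
    )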

get_related_items(deployment_token, deployment_id, query_data, num_items=50, page=1, scaling_factors=[], restrict_items=[], exclude_items=[])

Returns a list of related items for a given item under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – A dictionary whose key is the column name (e.g. a column named ‘item_code’ in your dataset) mapped to the column mapping ITEM_ID that uniquely identifies the item for which related items are determined, and whose value is the unique identifier of that item. For example, if the column ‘item_code’ is mapped to the column mapping ‘ITEM_ID’, the query must use the exact same column name (item_code) as the key and the item’s identifier as the value.

  • num_items (int) – The number of items to return per page. Defaults to 50.

  • page (int) – The page number to be displayed. For example, if num_items is set to 10 and there are 50 related items in total, a page value of 2 returns the items ranked 11th through 20th.

  • scaling_factors (list) – Allows you to bias the model towards certain items. The input is a list of dictionaries, each of the form {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1"]) towards which the recommendations should be biased; and the key "factor" takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], then after the model computes item probabilities, the probability of every SUV and Sedan in the list is multiplied by 1.4 before sorting. This is particularly useful if there is a type of item that is less popular but you want to promote it, or an item that always comes up and you want to demote it.

  • restrict_items (list) – Allows you to restrict the recommendations to certain items. The input is a list of dictionaries, each of the form {"column": "col0", "values": ["value0", "value1", "value3", …]}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1", "value3", …]) to which the recommendations are restricted. For example, if the input to restrict_items is [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], the recommendations are restricted to SUVs and Sedans. This type of restriction is particularly useful when you know a list of items is relevant in a particular scenario and you want the recommendations drawn only from that list.

  • exclude_items (list) – Allows you to exclude certain items from the list of recommendations. The input is a list of dictionaries, each of the form {"column": "col0", "values": ["value0", "value1", …]}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1"]) to exclude from the recommendations. For example, if the input to exclude_items is [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], the resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful when you know a list of items is of no use in a particular scenario and you do not want to show them.

Return type:

Dict
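
A minimal sketch of a call (placeholders throughout, same client as above; assumes the ‘item_code’ column is mapped to ITEM_ID and a hypothetical ‘Category’ column exists):

    related = client.get_related_items(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'item_code': 'SKU-001'},  # column mapped to ITEM_ID
        num_items=10,
        # Only surface accessories as related items (hypothetical column).
        restrict_items=[{'column': 'Category', 'values': ['Accessory']}],
    )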

get_feature_group_rows(deployment_token, deployment_id, query_data)

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) –

get_search_results(deployment_token, deployment_id, query_data)

Returns search results for the given query under the specified deployment.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Dictionary where the key is ‘Content’ and the value is the search query text.

Return type:

Dict

get_sentiment(deployment_token, deployment_id, document)

Returns the predicted sentiment for the given document.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • document (str) – The document text for which sentiment is to be predicted.

Return type:

Dict

get_entailment(deployment_token, deployment_id, document)

Returns the predicted entailment for the given document.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • document (str) – The document text for which entailment is to be predicted.

Return type:

Dict

get_classification(deployment_token, deployment_id, document)

Returns the predicted classification for the given document.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • document (str) – The document text to classify.

Return type:

Dict
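
The three document-scoring methods above share the same call shape; a minimal sketch with a placeholder token and ID, reusing the client from the earlier examples:

    text = 'The new model shipped ahead of schedule and the team is thrilled.'

    sentiment = client.get_sentiment('YOUR_DEPLOYMENT_TOKEN', 'YOUR_DEPLOYMENT_ID', text)
    entailment = client.get_entailment('YOUR_DEPLOYMENT_TOKEN', 'YOUR_DEPLOYMENT_ID', text)
    labels = client.get_classification('YOUR_DEPLOYMENT_TOKEN', 'YOUR_DEPLOYMENT_ID', text)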

get_summary(deployment_token, deployment_id, query_data)

Returns a JSON of the predicted summary for the given document. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘text’ mapped to mapping ‘DOCUMENT’ in our system).

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Raw data dictionary containing the required document data; it must have a key ‘document’ whose value is the DOCUMENT-type text to summarize.

Return type:

Dict
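
A minimal sketch of a summarization call (placeholder token, ID, and text; the ‘document’ key is the one required by query_data):

    article = 'Full text of the article to be summarized ...'

    summary = client.get_summary(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'document': article},  # the 'document' key is required
    )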

predict_language(deployment_token, deployment_id, query_data)

Predicts the language of the given text.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (str) – The text whose language is to be predicted.

Return type:

Dict

get_assignments(deployment_token, deployment_id, query_data, forced_assignments=None)

Get all positive assignments that match a query.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Specifies the set of assignments being requested.

  • forced_assignments (dict) – The set of assignments to force and resolve before returning query results.

Return type:

Dict
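
A minimal sketch of an assignments query, reusing the client from the earlier examples; the keys inside query_data and forced_assignments depend entirely on your project’s assignment schema, so the ones below are hypothetical:

    assignments = client.get_assignments(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'delivery_date': '2023-01-15'},  # hypothetical query key
        # Force one assignment and resolve the rest around it (hypothetical).
        forced_assignments={'driver_7': 'route_3'},
    )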

check_constraints(deployment_token, deployment_id, query_data)

Check for any constraints violated by the overrides.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Assignment overrides to the solution.

Return type:

Dict

predict_with_binary_data(deployment_token, deployment_id, blob, blob_key_name='blob')

Make predictions for a given blob of binary data, e.g. an image or audio file.

Parameters:
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • blob (io.TextIOBase) – The binary data to send as the multipart/form-data payload.

  • blob_key_name (str) – The key under which this blob is accessible in the model’s query data.

Return type:

Dict
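
A minimal sketch of a binary prediction call (placeholder file path, token, ID, and key name), reusing the client from the earlier examples; the file is opened in binary mode and passed as the blob:

    # Hypothetical image file; any binary payload the model expects will do.
    with open('cat_photo.jpg', 'rb') as blob:
        prediction = client.predict_with_binary_data(
            deployment_token='YOUR_DEPLOYMENT_TOKEN',
            deployment_id='YOUR_DEPLOYMENT_ID',
            blob=blob,
            blob_key_name='image',  # hypothetical key name
        )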

abacusai.__version__ = 0.40.2