abacusai.upload

Module Contents

Classes

Upload

An Upload reference for uploading file parts.

class abacusai.upload.Upload(client, uploadId=None, datasetUploadId=None, status=None, datasetId=None, datasetVersion=None, modelId=None, modelVersion=None, batchPredictionId=None, parts=None, createdAt=None)

Bases: abacusai.return_class.AbstractApiClass

An Upload reference for uploading file parts.

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • uploadId (str) – The unique ID generated when the process of uploading a large file in smaller parts is initiated.

  • datasetUploadId (str) – Same as uploadId; kept for backward compatibility.

  • status (str) – The current status of the upload.

  • datasetId (str) – A reference to the dataset this upload is adding data to.

  • datasetVersion (str) – A reference to the dataset version the upload is adding data to.

  • modelId (str) – A reference to the model the upload is creating a version for.

  • modelVersion (str) – A reference to the model version the upload is creating.

  • batchPredictionId (str) – A reference to the batch prediction the upload is creating.

  • parts (list of json objects) – A list describing the file parts that have been uploaded and their order.

  • createdAt (str) – The timestamp at which the upload was created.

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class.

Returns:

A dict representation of the class parameters.

Return type:

dict

cancel()

Cancels an upload.

Parameters:

upload_id (str) – The Upload ID

part(part_number, part_data)

Uploads a part of a large dataset file from your bucket to our system. Each part can be up to 5GB, and the full file can be up to 5TB. Every part must be at least 5MB, except the last part in the sequence of parts for the full file.

Parameters:
  • part_number (int) – The 1-indexed number denoting the position of the file part in the sequence of parts for the full file.

  • part_data (io.TextIOBase) – The multipart/form-data for the current part of the full file.

Returns:

An UploadPart object that encapsulates the hash and the ETag of the uploaded part.

Return type:

UploadPart
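The size rules for part() can be checked before any data is sent. The sketch below uses a hypothetical helper, check_part_sizes, which is not part of the abacusai SDK. It assumes the limits are binary units (1 MB = 1024 * 1024 bytes); this reference does not say which convention applies.

```python
# Limits as stated in the part() documentation.
# Assumption: binary units (1 MB = 1024 * 1024 bytes).
MB = 1024 * 1024
GB = 1024 * MB
TB = 1024 * GB

MIN_PART_SIZE = 5 * MB   # applies to every part except the last
MAX_PART_SIZE = 5 * GB   # applies to any single part
MAX_FILE_SIZE = 5 * TB   # applies to the full file


def check_part_sizes(part_sizes):
    """Return a list of problems with a planned sequence of part sizes in bytes.

    An empty list means the plan satisfies the documented limits.
    """
    problems = []
    if sum(part_sizes) > MAX_FILE_SIZE:
        problems.append("full file exceeds 5 TB")
    last_index = len(part_sizes) - 1
    for index, size in enumerate(part_sizes):
        part_number = index + 1  # part numbers are 1-indexed
        if size > MAX_PART_SIZE:
            problems.append(f"part {part_number} exceeds 5 GB")
        if size < MIN_PART_SIZE and index != last_index:
            problems.append(f"part {part_number} is smaller than 5 MB")
    return problems


if __name__ == "__main__":
    print(check_part_sizes([10 * MB, 10 * MB, 1 * MB]))  # small last part is allowed
    print(check_part_sizes([1 * MB, 10 * MB]))  # small first part is not
```

For example, check_part_sizes([1 * MB, 10 * MB]) reports that part 1 is smaller than 5 MB, because only the last part may fall below the minimum.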

mark_complete()

Marks an upload process as complete.

Parameters:

upload_id (str) – A unique identifier for this upload

Returns:

The upload object associated with the upload process for the full file.

Return type:

Upload

refresh()

Calls describe() and refreshes the current object’s fields.

Returns:

The current object

Return type:

Upload

describe()

Retrieves the current upload status (complete or inspecting) and the list of file parts uploaded for a specified dataset upload.

Parameters:

upload_id (str) – The unique ID associated with the file uploaded or being uploaded in parts.

Returns:

The details associated with the large dataset file uploaded in parts.

Return type:

Upload

upload_part(upload_args)

Uploads a file part. If the upload fails, it will retry up to 3 times with a short backoff before raising an exception.

Returns:

An UploadPart object that encapsulates the hash and the ETag of the uploaded part.

Return type:

UploadPart
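The retry behavior described for upload_part() follows the common retry-with-backoff pattern. The following is a generic sketch of that pattern, not the SDK's implementation. The name call_with_retries is hypothetical, and the doubling delay, the 0.5-second starting delay, and the reading of "up to 3 times" as three retries after the first attempt are all assumptions.

```python
import time


def call_with_retries(func, *args, retries=3, backoff_seconds=0.5, **kwargs):
    """Call func(*args, **kwargs), retrying up to `retries` times on an exception.

    The delay doubles after each failed attempt. Once the retries are
    exhausted, the last exception is re-raised to the caller.
    """
    delay = backoff_seconds
    for attempt in range(retries + 1):  # one initial attempt plus the retries
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == retries:
                raise  # out of retries: propagate the last error
            time.sleep(delay)
            delay *= 2


if __name__ == "__main__":
    failures = [ConnectionError("transient"), ConnectionError("transient")]

    def flaky_upload():
        # Hypothetical stand-in for a part upload: fails twice, then succeeds.
        if failures:
            raise failures.pop()
        return "part uploaded"

    print(call_with_retries(flaky_upload, backoff_seconds=0.01))
```

A function that fails twice with a transient error and then succeeds returns its result on the third call. A function that always fails is attempted four times before its exception propagates.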

upload_file(file, threads=10, chunksize=1024 * 1024 * 10, wait_timeout=600)

Uploads the file in the specified chunk size using the specified number of workers.

Parameters:
  • file (IOBase) – A BytesIO or StringIO object to upload to Abacus.AI.

  • threads (int, optional) – The maximum number of workers to use while uploading the file. Defaults to 10.

  • chunksize (int, optional) – The number of bytes to use for each chunk while uploading the file. Defaults to 10 MB (1024 * 1024 * 10 bytes).

  • wait_timeout (int, optional) – The maximum number of seconds to wait for the file parts to be joined on Abacus.AI. Defaults to 600.

Returns:

The Upload object for the uploaded file.

Return type:

Upload
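The chunked, multi-threaded approach of upload_file() can be illustrated with the standard library alone. In this sketch, iter_parts and upload_in_parts are hypothetical helpers, and send_part stands in for a per-part upload call such as part(). Whether upload_file() or its private helper _yield_upload_part works this way internally is an assumption. When the file is a StringIO, chunksize counts characters rather than bytes.

```python
import io
from concurrent.futures import ThreadPoolExecutor

DEFAULT_CHUNKSIZE = 1024 * 1024 * 10  # same value as upload_file's default


def iter_parts(file, chunksize=DEFAULT_CHUNKSIZE):
    """Yield (part_number, chunk) pairs read from `file`; numbering starts at 1."""
    part_number = 1
    while True:
        chunk = file.read(chunksize)
        if not chunk:  # b"" or "" signals end of file
            break
        yield part_number, chunk
        part_number += 1


def upload_in_parts(file, send_part, threads=10, chunksize=DEFAULT_CHUNKSIZE):
    """Send every chunk of `file` through send_part(part_number, chunk) in a thread pool.

    Results are returned in part order. Note: all chunks are submitted up
    front, so the whole file is held in memory at once.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(send_part, number, chunk)
                   for number, chunk in iter_parts(file, chunksize)]
        return [future.result() for future in futures]


if __name__ == "__main__":
    data = io.BytesIO(b"x" * 25)
    # Stand-in "upload" that just reports each part's number and length.
    print(upload_in_parts(data, lambda n, chunk: (n, len(chunk)), threads=2, chunksize=10))
    # [(1, 10), (2, 10), (3, 5)]
```

Submitting every chunk up front keeps the sketch short, but a production uploader would bound the number of chunks in flight to limit memory use.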

_yield_upload_part(file, chunksize)

wait_for_join(timeout=600)

Waits until the upload parts are joined.

Parameters:

timeout (int, optional) – The maximum number of seconds to wait for the parts to be joined. If the join does not finish within this time, the call times out. Defaults to 600.
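wait_for_join() and get_status() suggest a poll-until-done loop. The sketch below, under the hypothetical name wait_until, polls any zero-argument status callable (for example, an Upload object's get_status method) until it returns one of the given statuses. The status strings in the example, the 5-second poll interval, and raising TimeoutError are assumptions. This reference does not state which exception, if any, wait_for_join() raises on timeout.

```python
import time


def wait_until(get_status, done_statuses, timeout=600, poll_interval=5):
    """Poll get_status() until it returns a status in done_statuses.

    Returns the final status, or raises TimeoutError once `timeout`
    seconds have elapsed without reaching a done status.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout} s")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Hypothetical status sequence; the service's real status strings may differ.
    statuses = iter(["PENDING", "PENDING", "COMPLETE"])
    print(wait_until(lambda: next(statuses), {"COMPLETE"}, poll_interval=0))
```

Using time.monotonic() for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.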

get_status()

Gets the status of the upload.

Returns:

A string describing the status of the upload (pending, complete, etc.).

Return type:

str