AWS S3 BlobPath

Core Interfaces

class blob_path.backends.s3.S3BlobPath(bucket: str, region: str, object_key: PurePath)

BlobPath modeling AWS S3.

Properties:

  • Globally Unique: True

An S3 blob path is located by three parameters: bucket, object_key and region. You can pass this path around anywhere (any server, lambda, container, etc.) and the correct S3 location will always be uniquely identified. __eq__, serialise and deserialise also behave sanely here: no matter the location, the same serialised representation points to the same object, globally and uniquely.

Implements: BlobPath, Presigned

Apart from the interface exposed by BlobPath and Presigned, this class provides some extension points users can use to tweak how communication with S3 is done (you should be able to tweak all performance and security parameters). It is advised to override only the methods below when extending the functionality of a path. Methods that are safe to inherit and override: download, upload and session.

Usage:

from blob_path.backends.s3 import S3BlobPath

# the generic way is to use this constructor for defining your path
p = S3BlobPath(bucket, region, key)
with p.open("r") as f:
    print(f.read())

# the class also provides a factory `create_default` which can be used as follows:
# `bucket` and `region` are injected using implicit variables
p = S3BlobPath.create_default(PurePath("hello") / "world.txt")

# generate a pre-signed url
url = p.presigned_url(expiry_seconds=300)

Other than providing the create_default factory function, this class does not use any implicit variables.

property bucket: str

bucket getter, useful while extending this class

cp(destination: BlobPath) None

Copy the content of the current file to the destination.

This method overrides the default implementation to provide some performance benefits. If the destination is an S3BlobPath, a direct copy is performed without downloading the object to the local system.
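As a sketch of how this fast path might be used, the hypothetical helper below copies an object into a backup bucket; `backup_object` and its parameters are illustrative names, not part of the library:

```python
def backup_object(src, backup_bucket: str, region: str):
    """Hypothetical helper: copy `src` (an S3BlobPath) into a backup bucket.

    Because both source and destination are S3BlobPath instances, `cp`
    copies the object directly inside S3 without downloading it locally.
    """
    # Deferred import: requires blob_path to be installed.
    from blob_path.backends.s3 import S3BlobPath

    dst = S3BlobPath(backup_bucket, region, src.object_key)
    src.cp(dst)
    return dst
```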

classmethod create_default(p: PurePath) Self

Create a new S3BlobPath; the bucket and region are injected from implicit variables.

Parameters:

p – A PurePath which represents the “object_key” that you want to use

Returns:

An S3BlobPath

Implicit variables:
  • bucket: IMPLICIT_BLOB_PATH_GEN_S3_BUCKET

  • region: IMPLICIT_BLOB_PATH_GEN_S3_REGION
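A minimal sketch of using the factory, assuming the implicit variables are supplied through the process environment (the bucket and region values here are placeholders):

```python
import os
from pathlib import PurePath

# Assumption: the implicit variables are read from the environment.
os.environ["IMPLICIT_BLOB_PATH_GEN_S3_BUCKET"] = "my-app-bucket"
os.environ["IMPLICIT_BLOB_PATH_GEN_S3_REGION"] = "us-east-1"

def default_path(key: PurePath):
    # Deferred import: requires blob_path to be installed.
    from blob_path.backends.s3 import S3BlobPath

    # bucket and region come from the implicit variables above,
    # not from function arguments.
    return S3BlobPath.create_default(key)
```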

delete() bool

Delete the file if it exists.

How the delete happens is based on the underlying storage and is not important. The file might be accessible through other means if the underlying storage keeps some sort of archive (like S3 versioned buckets), but exists should return False once delete is called, no matter how the underlying storage works. A read on the file using open will raise DoesNotExist after the file is deleted.

Returns:

True if the file existed and was deleted, else False

final classmethod deserialise(data: SerialisedBlobPath) Self

Deserialise a given serialised representation.

Do not use this method directly in your code; use blob_path.deserialise.deserialise instead.

Parameters:

data – A SerialisedBlobPath whose kind should always be equal to self.kind

Returns:

A new BlobPath instance

download(handle: IO[bytes]) None

Download data for the given S3 path and write it to the provided binary handle.

Users can override this method if they want to change how the download is done. This is recommended if you want to tweak performance, etc.

Parameters:

handle – An IO byte stream to which the downloaded content is written

Raises:

DoesNotExist – Raised when the path does not point to any object in S3
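One plausible reason to override download is to tune boto3's managed transfer settings. The sketch below is an assumption about how such an override could look (it uses boto3's real `download_fileobj` and `TransferConfig`, but omits translating a missing-object error into DoesNotExist, which a real override would need to keep):

```python
def make_tuned_download_class():
    """Hypothetical S3BlobPath subclass tuning multipart download settings."""
    # Deferred imports: require boto3 and blob_path to be installed.
    from boto3.s3.transfer import TransferConfig
    from blob_path.backends.s3 import S3BlobPath

    class TunedS3BlobPath(S3BlobPath):
        def download(self, handle):
            client = self.session().client("s3", region_name=self.region)
            config = TransferConfig(
                multipart_threshold=16 * 1024 * 1024,  # multipart above 16 MiB
                max_concurrency=8,                     # parallel part downloads
            )
            client.download_fileobj(
                self.bucket, str(self.object_key), handle, Config=config
            )

    return TunedS3BlobPath
```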

exists() bool

Check whether the path points to an existing object in S3.

Returns:

A boolean representing whether the file exists or not

kind = 'blob-path-aws'

kind is a class variable which uniquely identifies a subtype of BlobPath.

Each subtype defines its own kind, which should never clash with any other implementation; kind is used for serialisation.

property object_key: PurePath

object_key getter, useful while extending this class

open(mode: str = 'r')

Open the underlying file in the given mode.

This function mimics the builtin open function. It fetches the file from the underlying storage and opens it, returning a file handle to the downloaded file. If the file is opened in write mode, it is uploaded back to the cloud when the handle is closed. Currently this function can only be used with a context manager (you can't manually call close right now). If the file is opened in w mode, it does not need to exist in the underlying storage.

Parameters:

mode – the mode in which the file should be opened; currently all modes except a are supported

Returns:

A file handle where the user can read/write data. Once the context exits, the file is uploaded to the backend if it was opened in w mode

Raises:

blob_path.interface.DoesNotExist – The file does not exist
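A small sketch of the read/write round trip described above; `roundtrip` is an illustrative helper, and `p` may be any BlobPath:

```python
def roundtrip(p):
    """Sketch: write an object, then read it back through `open`."""
    # "w" mode: the object need not exist yet; the upload happens
    # when the context manager exits.
    with p.open("w") as f:
        f.write("hello world")
    # "r" mode: raises DoesNotExist if the object is missing.
    with p.open("r") as f:
        return f.read()
```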

property parent: S3BlobPath

The logical parent of the path.

Behavior is consistent with pathlib.PurePath.parent. In case of an empty path/root path, the current path is returned as is.

Returns:

A new BlobPath which is the parent of the current path
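Since the behavior mirrors pathlib.PurePath.parent applied to the object key, the stdlib equivalent shows what to expect:

```python
from pathlib import PurePath

# `S3BlobPath.parent` mirrors this behavior on the object key.
key = PurePath("hello") / "world.txt"
assert key.parent == PurePath("hello")

# An empty/root key is its own parent, so `parent` never fails.
assert PurePath(".").parent == PurePath(".")
```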

presigned_url(expiry_seconds: int) str

Generate a pre-signed URL for the underlying file.

Users should not assume the structure of the pre-signed URL (since this can change between different storage backends).

Parameters:

expiry_seconds – Seconds after which the URL might expire. This is optional behavior; a subclass might ignore expiry_seconds and provide URLs that never expire. Read the subclass's documentation for caveats

Returns:

A URL where an HTTP GET would download a file

Raises:

blob_path.core.interface.DoesNotExist – Raised if the file does not exist

property region: str

region getter, useful while extending this class

final serialise() SerialisedBlobPath

Serialise a BlobPath to a JSON-able dict which can be passed around.

Generally, if a BlobPath is deserialised from some serialised representation, it should be perfectly reproducible. That is, two paths deserialised from the same serialisation anywhere (different process, different server, etc.) should point to the same file, if it is accessible. This might not always be true (depending on what storage backend you are using); read the documentation of the underlying backend for caveats. That said, the library tries to follow this requirement diligently: all paths which can be uniquely addressed from anywhere in the world (S3, Azure Blob Store, etc.) always follow it.

Returns:

A JSON-able dict

Return type:

blob_path.interface.SerialisedBlobPath

classmethod session() boto3.session.Session

Get a boto3 session to use for BlobPath.

Override this if you want to change how your session is created.
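As an assumed example of such an override, the factory below builds a subclass that authenticates through a named AWS credentials profile (`make_profile_class` and `ProfileS3BlobPath` are hypothetical names; `boto3.session.Session(profile_name=...)` is a real boto3 API):

```python
def make_profile_class(profile_name: str):
    """Hypothetical: build an S3BlobPath subclass using a named AWS profile."""
    # Deferred imports: require boto3 and blob_path to be installed.
    import boto3
    from blob_path.backends.s3 import S3BlobPath

    class ProfileS3BlobPath(S3BlobPath):
        @classmethod
        def session(cls):
            # Replace the default credential chain with a specific profile.
            return boto3.session.Session(profile_name=profile_name)

    return ProfileS3BlobPath
```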

upload(handle: IO[bytes])

Upload data produced by reading from the given binary file handle to S3.

Users can override this method if they want to change how the upload is done. This is recommended if you want to tweak performance, etc.

Parameters:

handle – An IO byte stream from which content is read and uploaded to S3

Serialisation

class blob_path.backends.s3.Payload(*, bucket: str, region: str, object_key: list[str])

The serialised representation for the payload of a blob_path.backends.s3.S3BlobPath.
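Given the fields in the signature above, the payload plausibly looks like the dict below; the bucket/region values are placeholders, and the exact envelope wrapping this payload (e.g. where kind lives) is an assumption, not documented here:

```python
from pathlib import PurePath

key = PurePath("hello") / "world.txt"

# Sketch of the payload fields from the Payload signature.
payload = {
    "bucket": "my-app-bucket",
    "region": "us-east-1",
    # object_key is stored as a list of path parts, which is
    # JSON-able and platform-independent.
    "object_key": list(key.parts),
}
```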