AWS S3 BlobPath¶
Core Interfaces¶
- class blob_path.backends.s3.S3BlobPath(bucket: str, region: str, object_key: PurePath)¶
A BlobPath modeling AWS S3.

Properties:
Globally Unique: True
An S3 blob path is located by three parameters: bucket, object_key, and region. You can pass this path around anywhere (any server, lambda, container, etc.) and the correct S3 location will always be uniquely identified. (__eq__, serialise, and deserialise also behave sanely here: no matter the location, the same serialised representation points to the same location, globally and uniquely.)

Implements: BlobPath, Presigned
Apart from the interface exposed by BlobPath and Presigned, this class provides some extension points users can use to tweak how communication with S3 is done (you should be able to tweak all performance and security parameters). It is advised to only override the methods below when extending the functionality of a path.
Methods that are safe to inherit and override: download, upload, and session
Usage:

```python
from pathlib import PurePath

from blob_path.backends.s3 import S3BlobPath

# the generic way is to use this constructor for defining your path
p = S3BlobPath(bucket, region, key)
with p.open("r") as f:
    print(f.read())

# the class also provides a factory `create_default` which can be used as follows:
# `bucket` and `region` are injected using implicit variables
p = S3BlobPath.create_default(PurePath("hello") / "world.txt")

# generate a pre-signed url
url = p.presigned_url()
```
This class does not use any implicit variables, other than for providing the create_default factory function.
- property bucket: str¶
bucket getter, useful while extending this class
- cp(destination: BlobPath) → None ¶
Copy the content of the current file to the destination.
This method overrides the default implementation to provide some performance benefits. If the destination is an S3BlobPath, a direct copy is done without downloading the object to the local system.
- classmethod create_default(p: PurePath) → Self ¶
Create a new S3BlobPath, the bucket and region would be injected from implicit variables.
- Parameters:
p – A PurePath which represents the “object_key” that you want to use
- Returns:
An S3BlobPath
- Implicit variables:
bucket: IMPLICIT_BLOB_PATH_GEN_S3_BUCKET
region: IMPLICIT_BLOB_PATH_GEN_S3_REGION
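If the implicit variables are sourced from the process environment (an assumption here; check the library's implicit-variables documentation for how they are actually resolved in your setup), using create_default could look like this sketch, with illustrative bucket and region values:

```python
import os
from pathlib import PurePath

# Assumption: the implicit variables are read from the process environment.
# The bucket/region values below are illustrative only.
os.environ["IMPLICIT_BLOB_PATH_GEN_S3_BUCKET"] = "my-bucket"
os.environ["IMPLICIT_BLOB_PATH_GEN_S3_REGION"] = "us-east-1"

key = PurePath("hello") / "world.txt"
# With the variables set, S3BlobPath.create_default(key) would behave like
# S3BlobPath("my-bucket", "us-east-1", key).
print(key.parts)
```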
- delete() → bool ¶
Delete the file if it exists.
How the delete happens depends on the underlying storage and is not important. The file might still be accessible through other means if the underlying storage keeps some sort of archive (like S3 versioned buckets), but an exists call should return False once delete is called, no matter how the underlying storage works. A read on the file using open will raise DoesNotExist after the file is deleted.
- Returns:
True if the file existed and was deleted, else False
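The return semantics can be illustrated with a minimal in-memory stand-in (purely illustrative; this is not the real S3 backend):

```python
# A minimal in-memory stand-in, illustrating only the delete()/exists()
# contract described above; it is not part of the library.
class InMemoryBlob:
    def __init__(self):
        self._data = None

    def exists(self) -> bool:
        return self._data is not None

    def delete(self) -> bool:
        existed = self.exists()
        self._data = None
        return existed

blob = InMemoryBlob()
blob._data = b"content"
first = blob.delete()    # True: the file existed and was deleted
second = blob.delete()   # False: already gone
print(first, second, blob.exists())
```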
- final classmethod deserialise(data: SerialisedBlobPath) → Self ¶
Deserialise a given serialised representation.
Do not use this method directly in your code; use blob_path.deserialise.deserialise instead.
- Parameters:
data – A SerialisedBlobPath whose kind should always be equal to self.kind
- Returns:
A new BlobPath instance
- download(handle: IO[bytes]) → None ¶
Download data for the given S3 path and write it to the provided binary handle.
Users can extend this method if they want to change how the download is done. This is recommended if you want to tweak performance, etc.
- Parameters:
handle – An IO byte stream where the downloaded content should be written to
- Raises:
DoesNotExist – Raised when the path does not point to any object in S3
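The handle-based convention download follows can be sketched with a stub standing in for the real S3 transfer (fake_download is illustrative only and not part of the library):

```python
import io

def fake_download(handle):
    # A stub: the real method would stream the S3 object's bytes into the
    # provided binary handle (boto3's download_fileobj is one such API;
    # whether this class uses it internally is not specified here).
    handle.write(b"hello from s3")

buf = io.BytesIO()
fake_download(buf)
print(buf.getvalue())
```

An override of download would receive the handle the same way and is free to change buffer sizes, concurrency, or the transfer API, as long as the object's bytes end up written to the handle.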
- exists() → bool ¶
Check if the path points to a valid existing object in S3.
- Returns:
A boolean representing whether the file exists or not
- kind = 'blob-path-aws'¶
kind is a globally unique class variable which uniquely identifies a subtype of BlobPath.
Each subtype defines its kind, which should never clash with any other implementation. kind is used for serialisation.
- property object_key: PurePath¶
object_key getter, useful while extending this class
- open(mode: str = 'r')¶
Open the underlying file in the given mode.
This function mimics the builtin open function. It fetches the file from the underlying storage and opens it, returning a file handle to the downloaded file. If the file is opened in write mode, it is uploaded back to the cloud when the handle is closed. Currently this function can only be used with a context manager (you can't manually call close right now). If the file is opened using "w" mode, the file does not need to exist in the underlying storage.
- Parameters:
mode – the mode in which the file should be opened. Currently only "a" is not supported
- Returns:
a file handle where the user can read/write data. Once the context is finished, the file is uploaded to the backend if the file was opened in "w" mode
- Raises:
blob_path.interface.DoesNotExist – The file does not exist
- property parent: S3BlobPath¶
The logical parent of the path.
Behavior is consistent with pathlib.PurePath.parent. In the case of an empty or root path, the current path is returned as is.
- Returns:
A new BlobPath which is the parent of the current path
- presigned_url(expiry_seconds: int) → str ¶
Generate a pre-signed URL for the underlying file.
Users should not assume the structure of the pre-signed URL (since this can change between different storage backends).
- Parameters:
expiry_seconds – Seconds after which the URL might expire. This is optional behavior; a subclass might ignore expiry_seconds and provide URLs that never expire. Read the subclass's documentation for caveats
- Returns:
A URL where an HTTP GET would download a file
- Raises:
blob_path.core.interface.DoesNotExist – Raised if the file does not exist
- property region: str¶
region getter, useful while extending this class
- final serialise() → SerialisedBlobPath ¶
Serialise a BlobPath to a JSON-able dict which can be passed around.
Generally, if a BlobPath is deserialised from some serialised representation, it should be perfectly reproducible. That is, two paths deserialised from the same serialisation anywhere (different process, different server, etc.) should point to the same file, if it is accessible. This might not always be true (depending on which storage backend you are using); read the documentation of the underlying backend for caveats. That said, the library tries to follow this requirement diligently: all paths which can be uniquely pointed to from anywhere in the world (S3, Azure Blob Store, etc.) always follow this.
- Returns:
A JSON-able dict
- Return type:
blob_path.interface.SerialisedBlobPath
- classmethod session() → boto3.session.Session ¶
Get a boto3 session to use for BlobPath.
Override this if you want to change how your session is created.
- upload(handle: IO[bytes])¶
Upload data produced by reading from the given binary file handle to S3.
Users can extend this method if they want to change how the upload is done. This is recommended if you want to tweak performance, etc.
- Parameters:
handle – An IO byte stream from where you should read content and upload to S3
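The upload convention mirrors download: the method reads from the handle you pass in. A stub sketch (fake_upload is illustrative only and not part of the library):

```python
import io

def fake_upload(handle):
    # A stub: the real method would read from the handle and send the
    # bytes to S3 (e.g. via boto3's upload_fileobj or a multipart upload).
    return handle.read()

sent = fake_upload(io.BytesIO(b"payload"))
print(len(sent))
```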
Serialisation¶
- class blob_path.backends.s3.Payload(*, bucket: str, region: str, object_key: list[str])¶
The serialised representation for the payload of a blob_path.backends.s3.S3BlobPath.
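Based on the fields above, the payload can be hand-built for illustration. The assumption here is that the object_key list holds the PurePath's parts; in real code, prefer serialise()/deserialise() over constructing this dict yourself:

```python
import json
from pathlib import PurePath

# Hand-built example of the documented Payload fields. Assumption: the
# object_key list holds the PurePath's parts. Real code should use
# serialise()/deserialise() rather than building this by hand.
payload = {
    "bucket": "my-bucket",
    "region": "us-east-1",
    "object_key": ["hello", "world.txt"],
}
wire = json.dumps(payload)        # JSON-able: safe to pass between services
restored = json.loads(wire)
key = PurePath(*restored["object_key"])
print(key.parts)
```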