- class blob_path.core.BlobPath(*args, **kwargs)
An interface representing a file belonging to any of the supported storage backends.
This is the main interface provided by the library. Any file in any storage (S3, Azure, etc.) can be modelled using this interface. Different storage types are simply implementations of this interface and manage the underlying storage’s intricacies.
There are some strict requirements:
Only functionality supported by every supported backend is added in this interface
BlobPath needs to be strictly JSON serialisable and deserialisable, so that users can pass BlobPath instances between their services and have them just work
BlobPath does not intend to replace pathlib.Path. If a file is only ever stored in the local FS, do not use this class.
It is immutable
Usage:
    from blob_path import BlobPath
    from blob_path.deserialise import deserialise

    def f(p: BlobPath):
        # this method triggers a download from the underlying storage for reading
        # we return a simple file object as returned by python builtin `open`
        # this is the main interface to talk to the file in this path
        # many other methods use this method for providing generic implementations
        with p.open("r") as fileobj:
            content = fileobj.read()

        # you can check if the file exists or not
        # this would generally make some kind of metadata fetch from the underlying storage
        print(p.exists())

        # a serialised path can be turned back into an equal BlobPath
        serialised = p.serialise()
        newp = deserialise(serialised)
        assert newp == p
- cp(destination: BlobPath | Path) → None
Copy the file pointed to by self to destination.
The generic implementation is simple: it opens the current file in read mode and the destination in write mode, then copies the data across.
Storage backends are free to optimise this call for special cases (like copying from one S3 path to another without downloading intermediate data).
- Parameters:
destination – a BlobPath (or Path) to which the data is copied
- Raises:
blob_path.core.interface.DoesNotExist – The current file does not exist
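The generic copy described above can be sketched with the standard library alone. `generic_cp` and the `LocalFile` stand-in are hypothetical names used only for illustration, not part of blob_path; the sketch assumes nothing more than both sides exposing an `open(mode)` method that returns a context manager.

```python
import shutil
import tempfile
from pathlib import Path


class LocalFile:
    """Hypothetical stand-in that exposes only an open(mode) method."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def open(self, mode: str = "r"):
        return self.path.open(mode)


def generic_cp(source, destination) -> None:
    # Open the source for reading and the destination for writing,
    # then stream the data across in chunks.
    with source.open("rb") as src, destination.open("wb") as dst:
        shutil.copyfileobj(src, dst)


with tempfile.TemporaryDirectory() as tmp:
    src = LocalFile(Path(tmp) / "src.txt")
    dst = LocalFile(Path(tmp) / "dst.txt")
    src.path.write_text("hello")
    generic_cp(src, dst)
    copied = dst.path.read_text()
```

A backend-specific override would replace the body of `generic_cp` with a server-side copy when both ends live in the same storage.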
- delete() → bool
Delete the file if it exists.
How the delete happens depends on the underlying storage and is not important. The file might still be accessible through other means if the underlying storage keeps some sort of archive (like S3 versioned buckets), but exists should return False once delete has been called, no matter how the underlying storage works. Reading the file using open will raise DoesNotExist once it has been deleted.
- Returns:
True if the file existed and was deleted, else False
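The contract above (a boolean result, and exists reporting False afterwards) can be illustrated with a toy in-memory backend. `InMemoryStore` is a hypothetical class written for this sketch; it does not reflect how any real blob_path backend is implemented.

```python
class InMemoryStore:
    """Hypothetical backend: a dict mapping keys to bytes."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def exists(self, key: str) -> bool:
        return key in self.objects

    def delete(self, key: str) -> bool:
        # pop() returns the default (None) when the key is missing,
        # so the result is True only if something was actually removed.
        return self.objects.pop(key, None) is not None


store = InMemoryStore()
store.objects["reports/a.txt"] = b"data"
first = store.delete("reports/a.txt")   # the file existed and was deleted
second = store.delete("reports/a.txt")  # nothing left to delete
```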
- classmethod deserialise(data: SerialisedBlobPath) → Self
Deserialise a given serialised representation.
Do not use this method directly in your code; use blob_path.deserialise.deserialise instead.
- Parameters:
data – A SerialisedBlobPath whose kind should always be equal to self.kind
- Returns:
A new BlobPath instance
- exists() → bool
Check if the file exists.
- Returns:
True if the file exists, else False
- kind = 'BlobPath'
kind is a class variable which uniquely identifies a subtype of BlobPath.
Each subtype defines its own kind, which should never clash with that of any other implementation. kind is used for serialisation.
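One way a unique kind can drive deserialisation is a registry keyed by kind. This is a sketch under assumptions: `REGISTRY`, `register`, `lookup` and the two `Fake*` classes are hypothetical, and the source does not say that blob_path dispatches this way internally.

```python
from dataclasses import dataclass

REGISTRY = {}  # hypothetical mapping: kind -> implementation class


def register(cls):
    # Refuse two implementations that claim the same kind.
    if cls.kind in REGISTRY:
        raise ValueError(f"duplicate kind: {cls.kind}")
    REGISTRY[cls.kind] = cls
    return cls


@register
@dataclass(frozen=True)
class FakeS3Path:
    kind = "fake-s3"  # unannotated, so a class variable rather than a field
    bucket: str
    key: str


@register
@dataclass(frozen=True)
class FakeLocalPath:
    kind = "fake-local"
    path: str


def lookup(serialised: dict):
    # Pick the implementation class from the payload's kind.
    return REGISTRY[serialised["kind"]]


chosen = lookup({"kind": "fake-s3"})
```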
- open(mode: str = 'r')
Open the underlying file in the given mode.
This function mimics the builtin open function. It fetches the file from the underlying storage, opens it, and returns a file handle to the downloaded copy. If the file is opened in write mode, it is uploaded back to the underlying storage when the handle is closed. Currently the returned handle can only be used with a context manager (you cannot call close manually). If the file is opened in w mode, it does not need to exist in the underlying storage beforehand.
- Parameters:
mode – the mode in which the file should be opened. Append mode (a) is currently the only unsupported mode
- Returns:
a file handle through which the user can read or write data. Once the context exits, the file is uploaded to the backend if it was opened in w mode
- Raises:
blob_path.core.interface.DoesNotExist – The file does not exist
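The download-on-open, upload-on-close behaviour described for open can be sketched as a context manager over a temporary file. Everything here is hypothetical: `REMOTE` is a plain dict standing in for remote storage, `MissingBlob` stands in for the library's DoesNotExist, and `open_blob` is not a blob_path function.

```python
import contextlib
import os
import tempfile

REMOTE = {}  # hypothetical remote storage: key -> bytes


class MissingBlob(Exception):
    """Stand-in for the library's DoesNotExist."""


@contextlib.contextmanager
def open_blob(key: str, mode: str = "r"):
    if "a" in mode:
        raise ValueError("append mode is not supported")
    fd, local = tempfile.mkstemp()
    os.close(fd)
    try:
        if "r" in mode:
            if key not in REMOTE:
                raise MissingBlob(key)
            # "Download": materialise the remote bytes in a local file.
            with open(local, "wb") as tmp:
                tmp.write(REMOTE[key])
        with open(local, mode) as handle:
            yield handle
        if "w" in mode:
            # "Upload": runs only after the handle has been closed.
            with open(local, "rb") as tmp:
                REMOTE[key] = tmp.read()
    finally:
        os.remove(local)


with open_blob("notes.txt", "w") as f:
    f.write("hello")

with open_blob("notes.txt", "r") as f:
    content = f.read()
```

Tying the upload to the end of the `with` block is what makes a bare `close()` call insufficient in this sketch: the write-back step lives in the context manager, not in the file handle.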
- property parent: BlobPath
The logical parent of the path.
Behavior is consistent with pathlib.PurePath.parent. For an empty path or a root path, the current path is returned as is.
- Returns:
A new BlobPath which is the parent of the current path
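Since the behaviour is stated to match pathlib.PurePath.parent, the standard library shows what to expect, including the empty and root cases. The example paths are made up, and the snippet exercises pathlib only, not blob_path.

```python
from pathlib import PurePosixPath

# The parent of a nested path drops the last component.
nested = PurePosixPath("reports/2024/summary.csv").parent

# The root path and the empty path are their own parents.
root = PurePosixPath("/").parent
empty = PurePosixPath("").parent
```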
- serialise() → SerialisedBlobPath
Serialise a BlobPath to a JSON-able dict which can be passed around.
Generally, a BlobPath deserialised from a serialised representation should be perfectly reproducible. That is, two paths deserialised from the same serialised representation anywhere (different process, different server, etc.) should point to the same file, provided it is accessible. This might not always hold, depending on the storage backend you are using; read the documentation of the underlying backend for caveats. That said, the library tries to follow this requirement diligently: all paths which can be uniquely addressed from anywhere in the world (S3, Azure Blob Store, etc.) always follow it.
- Returns:
A JSON-able dict
- Return type:
blob_path.core.interface.SerialisedBlobPath
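The round trip between services can be sketched with json and a frozen dataclass. `FakeBlobPath` and the `payload` key are assumptions made for this example; the actual shape of SerialisedBlobPath is not given here beyond it carrying a kind.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FakeBlobPath:
    """Hypothetical immutable path; the field names are illustrative."""

    bucket: str
    key: str

    def serialise(self) -> dict:
        return {"kind": "fake", "payload": asdict(self)}

    @classmethod
    def deserialise(cls, data: dict) -> "FakeBlobPath":
        return cls(**data["payload"])


original = FakeBlobPath(bucket="my-bucket", key="reports/a.csv")
wire = json.dumps(original.serialise())  # what one service sends
restored = FakeBlobPath.deserialise(json.loads(wire))  # what another rebuilds
```

Because the dataclass is frozen and compares by value, the restored instance is equal to the original even though it was rebuilt from text.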
- class blob_path.core.Presigned(*args, **kwargs)
Interface for BlobPath implementations that provide pre-signed URLs.
A pre-signed URL is an HTTP URL which allows a user to download the content of a file using a normal HTTP GET request.
- presigned_url(expiry_seconds: int) → str
Generate a pre-signed URL for the underlying file.
Users should not assume the structure of the pre-signed URL (since this can change between different storage backends).
- Parameters:
expiry_seconds – Seconds after which the URL might expire. Honouring this is optional: a subclass might ignore expiry_seconds and provide URLs that never expire. Read the subclass’s documentation for caveats
- Returns:
A URL where an HTTP GET would download a file
- Raises:
blob_path.core.interface.DoesNotExist – Raised if the file does not exist
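Consuming a pre-signed URL needs nothing beyond a plain GET, and the URL is best treated as opaque. In this sketch `download` is a hypothetical helper, and a `data:` URL stands in for a real pre-signed URL so that the example runs without network access.

```python
import urllib.request


def download(url: str) -> bytes:
    # Treat the URL as opaque: no parsing, just a plain GET.
    with urllib.request.urlopen(url) as response:
        return response.read()


# A data: URL stands in for a real pre-signed URL so the sketch runs offline.
body = download("data:text/plain;base64,aGVsbG8=")
```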