Basic Usage¶
[1]:
%load_ext autoreload
%autoreload 2
This notebook goes over the basic usage of the BlobPath type. Note that you would need to install the aws extra to work with S3 paths:
pip install 'blob-path[aws]'
[2]:
from blob_path.backends.s3 import S3BlobPath
from pathlib import PurePath
bucket_name = "narang-public-s3"
object_key = PurePath("hello_world.txt")
region = "us-east-1"
blob_path = S3BlobPath(bucket_name, region, object_key)
Just like pathlib.Path, it is not required that the file actually exists when you create the path object. Let's check whether it exists:
[3]:
blob_path.exists()
[3]:
True
The main function BlobPath provides is open; it mimics the builtin open function to some extent. Let's write something to the object in our bucket:
[4]:
with blob_path.open("w") as f:
f.write("hello world")
# the file would exist in S3 now, you should check it out
blob_path.exists()
[4]:
True
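Since open mimics the builtin, reading the object back follows the same pattern; a quick sketch using the text read mode (nothing here beyond what the notebook already demonstrates):
with blob_path.open("r") as f:
    print(f.read())  # "hello world"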
[5]:
# a single blob path can be serialised using the method `serialise`
blob_path.serialise()
[5]:
{'kind': 'blob-path-aws',
'payload': {'bucket': 'narang-public-s3',
'region': 'us-east-1',
'object_key': ['hello_world.txt']}}
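The serialised form is a plain dict of JSON-friendly values, so you can persist it wherever you like; a small sketch using the standard library json module (not part of blob-path itself):
import json

stored = json.dumps(blob_path.serialise())   # e.g. put this string in a DB row or a message queue
payload = json.loads(stored)                 # later, load it back before deserialising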
[6]:
# let's deserialise it
# `deserialise` is a separate function: you can pass it any kind of serialised blob path and it will deserialise it correctly
from blob_path.deserialise import deserialise
deserialised_s3_blob = deserialise(
{
"kind": "blob-path-aws",
"payload": {
"bucket": "narang-public-s3",
"region": "us-east-1",
"object_key": ["hello_world.txt"],
},
}
)
deserialised_s3_blob
[6]:
kind=blob-path-aws bucket=narang-public-s3 region=us-east-1 object_key=hello_world.txt
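Since deserialise dispatches on the "kind" field, code that receives a serialised path does not need to know which backend produced it; a hypothetical consumer function (the name read_blob is made up for this sketch):
def read_blob(serialised: dict) -> str:
    # `deserialise` picks the right backend class based on the "kind" field
    path = deserialise(serialised)
    with path.open("r") as f:
        return f.read()

print(read_blob(blob_path.serialise()))  # "hello world"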
Let's look at another backend: LocalRelativeBlobPath. This path models a local FS relative path which is always rooted at a single root directory. Instead of pathlib.Path, you could use LocalRelativeBlobPath (this allows you to easily switch between using a cloud storage or a local storage for your files).
[7]:
from blob_path.backends.local_relative import LocalRelativeBlobPath
# PurePath is a simple path representation; it does not care whether the path actually exists in your FS
# It's useful for logically representing various data structures; for example, you could represent S3 object keys as `PurePath`s
from pathlib import PurePath
relpath = PurePath("local") / "storage.txt"
local_blob = LocalRelativeBlobPath(relpath)
[8]:
local_blob.exists()
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
Cell In[8], line 1
----> 1 local_blob.exists()
File ~/Desktop/personal/blob-path/src/blob_path/backends/local_relative.py:74, in LocalRelativeBlobPath.exists(self)
73 def exists(self) -> bool:
---> 74 return (self._p()).exists()
File ~/Desktop/personal/blob-path/src/blob_path/backends/local_relative.py:94, in LocalRelativeBlobPath._p(self)
93 def _p(self) -> Path:
---> 94 return _get_implicit_base_path() / self._relpath
File ~/Desktop/personal/blob-path/src/blob_path/backends/local_relative.py:110, in _get_implicit_base_path()
109 def _get_implicit_base_path() -> Path:
--> 110 base_path = Path(get_implicit_var(BASE_VAR))
111 base_path.mkdir(exist_ok=True, parents=True)
112 return base_path
File ~/Desktop/personal/blob-path/src/blob_path/implicit.py:30, in get_implicit_var(var)
28 result = _PROVIDER(var)
29 if result is None:
---> 30 raise Exception(
31 "tried fetching implicit variable from environment "
32 + f"but the var os.environ['{var}'] does not exist"
33 )
34 return result
Exception: tried fetching implicit variable from environment but the var os.environ['IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR'] does not exist
Uh oh, we got an error, and quite early too ;_; It says that we have not defined IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR in our environment.
This environment variable stores the root directory of your relative paths.
[9]:
from pathlib import Path
import os
os.environ["IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR"] = str(
Path.home() / "tmp" / "local_fs_root"
)
# no exception this time: the path now resolves against the configured root directory
local_blob.exists()
[9]:
True
Why is LocalRelativeBlobPath taking the root directory as an environment variable? Could we pass it in __init__ instead? The reason is that the serialised representation of LocalRelativeBlobPath leaves out the root directory (it's not part of the path representation).

Implicit variables¶

Variables that a BlobPath needs to resolve itself but that are not part of its serialised representation are called implicit variables. They are, by default, picked from the environment. This has a few practical benefits: you could mount the same path between multiple containers at different mount points and still pass around the serialised representation correctly (assuming you provide the implicit variables correctly); the same holds for servers mounted with an NFS. This also works well for presigned URLs, where you can simply start an nginx server and pass that server's base URL as an implicit variable to the path.

Implicit variables follow the naming convention IMPLICIT_BLOB_PATH_<BACKEND>_.... Of the backends used in this notebook, LocalRelativeBlobPath is the one that has implicit variables; the sketch below shows what this means in practice.
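A small sketch of what "implicit" means here: the root directory is read from the environment every time the path is accessed, so the same LocalRelativeBlobPath resolves against whatever root the current process provides (the temporary directories below are just for illustration):
import os
import tempfile
from pathlib import PurePath
from blob_path.backends.local_relative import LocalRelativeBlobPath

root_a = tempfile.mkdtemp()  # pretend this is the mount point in container A
root_b = tempfile.mkdtemp()  # pretend this is the mount point in container B

p = LocalRelativeBlobPath(PurePath("hello.txt"))

os.environ["IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR"] = root_a
with p.open("w") as f:
    f.write("written under root_a")
print(p.exists())  # True: the file lives under root_a

os.environ["IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR"] = root_b
print(p.exists())  # False: the same relative path now resolves under root_b
(If you run this sketch, point IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR back at your real root before continuing with the cells below.)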
Let's do a simple copy operation between an S3 path and a local path:
[10]:
import shutil
# the long way
with deserialised_s3_blob.open("r") as fr:
with local_blob.open("w") as fw:
shutil.copyfileobj(fr, fw)
with local_blob.open("r") as f:
print(f.read())
hello world
[11]:
# delete first for the example
local_blob.delete()
deserialised_s3_blob.cp(local_blob)
with local_blob.open("r") as f:
print("local blob content copied from s3:", f.read())
# using a shortcut from the library
# this shortcut provides more convenience: either `src` or `dest` can be a `pathlib.Path` too
# this makes it easy to deal with normal paths in your FS
from blob_path.shortcuts import cp
local_blob.delete()
cp(deserialised_s3_blob, local_blob)
with local_blob.open("r") as f:
print("copied using shortcut:", f.read())
local blob content copied from s3: hello world
copied using shortcut: hello world
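As the comment above notes, cp also accepts plain pathlib.Path objects on either side; a small sketch downloading the S3 object to an ordinary local file (the target filename is arbitrary):
from pathlib import Path
from blob_path.shortcuts import cp

target = Path.home() / "tmp" / "hello_from_s3.txt"
target.parent.mkdir(parents=True, exist_ok=True)
cp(deserialised_s3_blob, target)  # dest is a plain Path, not a BlobPath
print(target.read_text())         # "hello world"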
Copying also works across clouds. Let's copy the S3 object to Azure Blob Storage; you will need to install the azure extra:
pip install 'blob-path[azure]'
[12]:
from blob_path.backends.azure_blob_storage import AzureBlobPath
from pathlib import PurePath
destination = AzureBlobPath("narang99blobstore", "testcontainer", PurePath("copied") / "from" / "s3.txt")
[13]:
deserialised_s3_blob.cp(destination)
destination.exists()
[13]:
True
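To close, note that every backend used above exposes the same interface (exists, open, serialise, cp), so consuming code can stay backend-agnostic; a hypothetical helper (not part of the library):
def read_text(path) -> str:
    # works for S3, Azure, and local-relative blob paths alike
    with path.open("r") as f:
        return f.read()

print(read_text(deserialised_s3_blob))
print(read_text(local_blob))
print(read_text(destination))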