docs for zanj v0.4.0

ZANJ

Overview

The ZANJ format is meant to be a way of saving arbitrary objects to disk, in a way that is flexible, allows keeping configuration and data together, and is human readable. It is very loosely inspired by HDF5 and the derived exdir format, and the implementation is inspired by npz files.

This library was originally a module in muutils

Installation

Available on PyPI as zanj

pip install zanj

Usage

You can find a runnable example of this in demo.ipynb

Saving a basic object

Any SerializableDataclass of basic types can be saved as zanj:

import numpy as np
import pandas as pd
from muutils.json_serialize import SerializableDataclass, serializable_dataclass, serializable_field
from zanj import ZANJ

@serializable_dataclass
class BasicZanj(SerializableDataclass):
    a: str
    q: int = 42
    c: list[int] = serializable_field(default_factory=list)

# initialize a zanj reader/writer
zj = ZANJ()

# create an instance
instance: BasicZanj = BasicZanj("hello", 42, [1, 2, 3])
path: str = "tests/junk_data/path_to_save_instance.zanj"
zj.save(instance, path)
recovered: BasicZanj = zj.read(path)

ZANJ will intelligently handle nested serializable dataclasses, numpy arrays, pytorch tensors, and pandas dataframes:

import torch
import pandas as pd

@serializable_dataclass
class Complicated(SerializableDataclass):
    name: str
    arr1: np.ndarray
    arr2: np.ndarray
    iris_data: pd.DataFrame
    brain_data: pd.DataFrame
    container: list[BasicZanj]
    torch_tensor: torch.Tensor

For custom classes, you can specify a serialization_fn and loading_fn to handle the logic of converting to and from a json-serializable format:

@serializable_dataclass
class Complicated(SerializableDataclass):
    name: str
    device: torch.device = serializable_field(
        serialization_fn=lambda self: str(self.device),
        loading_fn=lambda data: torch.device(data["device"]),
    )

Note that loading_fn takes the dictionary of the whole class – this is in case you’ve stored data in multiple fields of the dict which are needed to reconstruct the object.
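
To illustrate why the loading hook receives the whole dictionary, here is a minimal standard-library sketch. It does not use muutils: `hooked_field`, `to_dict`, and `from_dict` are hypothetical helpers that only mimic the idea, and in this sketch the serialization hook receives the field value. The `values` field can only be rebuilt by also reading the `typecode` key.

```python
import json
from array import array
from dataclasses import dataclass, field, fields
from typing import Any, Callable, Optional


def hooked_field(
    serialization_fn: Optional[Callable[[Any], Any]] = None,
    loading_fn: Optional[Callable[[dict], Any]] = None,
):
    """Hypothetical helper: attach per-field hooks through dataclass metadata."""
    return field(metadata={"ser": serialization_fn, "load": loading_fn})


@dataclass
class Samples:
    typecode: str
    values: array = hooked_field(
        # the serialization hook sees only this field's value
        serialization_fn=lambda arr: arr.tolist(),
        # the loading hook sees the whole dict, so it can combine two keys
        loading_fn=lambda data: array(data["typecode"], data["values"]),
    )


def to_dict(obj: Any) -> dict:
    """Serialize each field, using its hook when one is defined."""
    out = {}
    for f in fields(obj):
        ser = f.metadata.get("ser")
        value = getattr(obj, f.name)
        out[f.name] = ser(value) if ser else value
    return out


def from_dict(cls: type, data: dict) -> Any:
    """Rebuild an instance, passing the full dict to each loading hook."""
    kwargs = {}
    for f in fields(cls):
        load = f.metadata.get("load")
        kwargs[f.name] = load(data) if load else data[f.name]
    return cls(**kwargs)


original = Samples("i", array("i", [1, 2, 3]))
as_json = json.dumps(to_dict(original))
restored = from_dict(Samples, json.loads(as_json))
```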

Saving Models

First, define a configuration class for your model. This class will hold the parameters for your model and any associated objects (like losses and optimizers). The configuration class should be a subclass of SerializableDataclass and use the serializable_field function to define fields that need special serialization.

Here’s an example that defines the configuration for a small feed-forward network:

from zanj.torchutil import ConfiguredModel, set_config_class

@serializable_dataclass
class MyNNConfig(SerializableDataclass):
    input_dim: int
    hidden_dim: int
    output_dim: int

    # store the activation function by name, reconstruct it by looking it up in torch.nn
    act_fn: torch.nn.Module = serializable_field(
        serialization_fn=lambda x: x.__name__,
        loading_fn=lambda x: getattr(torch.nn, x["act_fn"]),
    )

    # same for the loss function
    loss_kwargs: dict = serializable_field(default_factory=dict)
    loss_factory: torch.nn.modules.loss._Loss = serializable_field(
        default_factory=lambda: torch.nn.CrossEntropyLoss,
        serialization_fn=lambda x: x.__name__,
        loading_fn=lambda x: getattr(torch.nn, x["loss_factory"]),
    )
    loss = property(lambda self: self.loss_factory(**self.loss_kwargs))

Then, define your model class. It should be a subclass of ConfiguredModel, and use the set_config_class decorator to associate it with your configuration class. The __init__ method should take a single argument, which is an instance of your configuration class. You must also call the superclass __init__ method with the configuration instance.

@set_config_class(MyNNConfig)
class MyNN(ConfiguredModel[MyNNConfig]):
    def __init__(self, config: MyNNConfig):
        # call the superclass init!
        # this will store the config in the zanj_model_config field
        super().__init__(config)

        # whatever you want here
        self.net = torch.nn.Sequential(
            torch.nn.Linear(config.input_dim, config.hidden_dim),
            config.act_fn(),
            torch.nn.Linear(config.hidden_dim, config.output_dim),
        )

    def forward(self, x):
        return self.net(x)

You can now create instances of your model, save them to disk, and load them back into memory:

config = MyNNConfig(
    input_dim=10,
    hidden_dim=20,
    output_dim=2,
    act_fn=torch.nn.ReLU,
    loss_kwargs=dict(reduction="mean"),
)

# create your model from the config, and save
model = MyNN(config)
fname = "tests/junk_data/path_to_save_model.zanj"
ZANJ().save(model, fname)
# load by calling the class method `read()`
loaded_model = MyNN.read(fname)
# zanj will actually infer the type of the object in the file 
# -- and will warn you if you don't have the correct package installed
loaded_another_way = ZANJ().read(fname)

Configuration

When initializing a ZANJ object, you can specify some configuration info about saving, such as:

# how big an array or list (including pandas DataFrame) can be before moving it from the core JSON file
external_array_threshold: int = ZANJ_GLOBAL_DEFAULTS.external_array_threshold
external_list_threshold: int = ZANJ_GLOBAL_DEFAULTS.external_list_threshold
# compression settings passed to `zipfile` package
compress: bool | int = ZANJ_GLOBAL_DEFAULTS.compress
# for doing very cursed things in your own custom loading or serialization functions
custom_settings: dict[str, Any] | None = ZANJ_GLOBAL_DEFAULTS.custom_settings
# specify additional serialization handlers
handlers_pre: MonoTuple[SerializerHandler] = tuple()
handlers_default: MonoTuple[SerializerHandler] = DEFAULT_SERIALIZER_HANDLERS_ZANJ
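
As a rough sketch of how a `compress: bool | int` setting could be turned into something the `zipfile` package accepts, the snippet below maps booleans onto `zipfile` constants and passes integers through. This mapping is an assumption made for illustration, not a statement about the zanj source, and `resolve_compression` is a hypothetical name.

```python
import zipfile
from typing import Union


def resolve_compression(compress: Union[bool, int]) -> int:
    """Map a bool-or-int setting onto a zipfile compression constant."""
    # bool must be tested first, because bool is a subclass of int
    if isinstance(compress, bool):
        return zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    # an int is assumed to already be a zipfile constant
    return int(compress)


stored = resolve_compression(False)
deflated = resolve_compression(True)
explicit = resolve_compression(zipfile.ZIP_BZIP2)
```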

Implementation

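On disk, a <filename>.zanj file is a zip archive. Arrays are stored in .npy files and everything else in a JSON file; arrays and lists larger than external_array_threshold and external_list_threshold are moved out of the core JSON file into external files inside the archive.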
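
The layout can be pictured with a small standard-library sketch: one main JSON file in a zip archive, with long lists moved to separate .jsonl members. The member name `main.json`, the `$ref` placeholder, and the threshold value are hypothetical and chosen only for this illustration; they are not the names ZANJ itself uses.

```python
import io
import json
import zipfile

EXTERNAL_LIST_THRESHOLD = 4  # hypothetical, small so the example triggers it


def save_sketch(obj: dict, fp, threshold: int = EXTERNAL_LIST_THRESHOLD) -> None:
    """Write `obj` as a zip: a main JSON file plus .jsonl files for long lists."""
    main = {}
    externals = {}
    for key, value in obj.items():
        if isinstance(value, list) and len(value) > threshold:
            member = f"{key}.jsonl"  # hypothetical member name
            externals[member] = "\n".join(json.dumps(item) for item in value)
            main[key] = {"$ref": member}  # hypothetical placeholder
        else:
            main[key] = value
    with zipfile.ZipFile(fp, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("main.json", json.dumps(main, indent=2))
        for member, text in externals.items():
            zf.writestr(member, text)


def read_sketch(fp) -> dict:
    """Read the main JSON file and put external lists back in place."""
    with zipfile.ZipFile(fp, "r") as zf:
        main = json.loads(zf.read("main.json"))
        for key, value in list(main.items()):
            if isinstance(value, dict) and "$ref" in value:
                lines = zf.read(value["$ref"]).decode("utf-8").splitlines()
                main[key] = [json.loads(line) for line in lines]
    return main


buffer = io.BytesIO()
data = {"name": "demo", "short": [1, 2], "long": list(range(10))}
save_sketch(data, buffer)

buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    members = sorted(archive.namelist())

buffer.seek(0)
restored = read_sketch(buffer)
```

Because the main file stays plain JSON, it can be inspected by unzipping the archive and opening it in a text editor.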

Comparison to other formats

Format Safe Zero-copy Lazy loading No file size limit Layout control Flexibility Bfloat16
pickle (PyTorch)
H5 (Tensorflow) ~ ~
HDF5 ? ~
SavedModel (Tensorflow)
MsgPack (flax)
Protobuf (ONNX)
Cap’n’Proto ~ ~
Numpy (npy,npz) ? ?
SafeTensors
exdir ? ? ? ?
ZANJ ❌* ❌*

* denotes this feature may be coming at a future date :)

(This table was stolen from safetensors)

zanj

def register_loader_handler

(handler: zanj.loading.LoaderHandler)

register a custom loader handler

class ZANJ(muutils.json_serialize.json_serialize.JsonSerializer):

Zip up: Arrays in Numpy, JSON for everything else

given an arbitrary object, throw into a zip file, with arrays stored in .npy files, and everything else stored in a json file

(basically npz file with json)

create a reader/writer via zj = ZANJ(), then save and read objects via zj.save(obj, path) and zj.read(path). be sure to pass an instance of the object, not the class, so that its attributes can be correctly recognized

ZANJ

(
    error_mode: muutils.errormode.ErrorMode = ErrorMode.Except,
    internal_array_mode: Literal['list', 'array_list_meta', 'array_hex_meta', 'array_b64_meta', 'external', 'zero_dim'] = 'array_list_meta',
    external_array_threshold: int = 256,
    external_list_threshold: int = 256,
    compress: bool | int = True,
    custom_settings: dict[str, typing.Any] | None = None,
    handlers_pre: MonoTuple[SerializerHandler] = (),
    handlers_default: MonoTuple[SerializerHandler] = DEFAULT_SERIALIZER_HANDLERS_ZANJ
    # DEFAULT_SERIALIZER_HANDLERS_ZANJ holds, in order, handlers with these uids:
    #   ZANJSerializerHandler (source_pckg='zanj'): 'numpy.ndarray:external', 'torch.Tensor:external',
    #     'list:external', 'tuple:external', 'pandas.DataFrame:external'
    #   SerializerHandler: 'base types', 'dictionaries', '(list, tuple) -> list', '.serialize override',
    #     'namedtuple -> dict', 'dataclass -> dict', 'path -> str', 'obj -> str(obj)', 'numpy.ndarray',
    #     'torch.Tensor', 'pandas.DataFrame', '(set, list, tuple, Iterable) -> list', 'fallback'
)

def externals_info

(self) -> dict[str, dict[str, str | int | list[int]]]

return information about the current externals

def meta

(
    self
) -> Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]

return the metadata of the ZANJ archive

def save

(self, obj: Any, file_path: str | pathlib.Path) -> str

save the object to a ZANJ archive. returns the path to the archive

def read

(self, file_path: Union[str, pathlib.Path]) -> Any

load the object from a ZANJ archive

TODO: load only some part of the zanj file by passing an ObjectPath

zanj.externals

for storing/retrieving an item externally in a ZANJ archive

class ExternalItem(typing.NamedTuple):

ExternalItem(item_type, data, path)

ExternalItem

(
    item_type: Literal['jsonl', 'npy'],
    data: Any,
    path: tuple[typing.Union[str, int], ...]
)

Create new instance of ExternalItem(item_type, data, path)

item_type: Alias for field number 0

data: Alias for field number 1

path: Alias for field number 2

def load_jsonl

(
    zanj: LoadedZANJ,
    fp: IO[bytes]
) -> list[typing.Union[bool, int, float, str, NoneType, typing.List[typing.Union[bool, int, float, str, NoneType, typing.List[typing.Any], typing.Dict[str, typing.Any]]], typing.Dict[str, typing.Union[bool, int, float, str, NoneType, typing.List[typing.Any], typing.Dict[str, typing.Any]]]]]

def load_npy

(zanj: LoadedZANJ, fp: IO[bytes]) -> numpy.ndarray

def GET_EXTERNAL_LOAD_FUNC

(item_type: str) -> Callable[[zanj.zanj.ZANJ, IO[bytes]], Any]

zanj.loading

class LoaderHandler:

handler for loading an object from a json file or a ZANJ archive

LoaderHandler

(
    check: Callable[[Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]], tuple[Union[str, int], ...], Any], bool],
    load: Callable[[Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]], tuple[Union[str, int], ...], Any], Any],
    uid: str,
    source_pckg: str,
    priority: int = 0,
    desc: str = '(no description)'
)

def serialize

(
    self
) -> Dict[str, Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]]

serialize the handler info

def from_formattedclass

(cls, fc: type, priority: int = 0)

create a loader from a class with serialize, load methods and __muutils_format__ attribute

def get_item_loader

(
    json_item: Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]],
    path: tuple[typing.Union[str, int], ...],
    zanj: typing.Any | None = None,
    error_mode: muutils.errormode.ErrorMode = ErrorMode.Warn
) -> zanj.loading.LoaderHandler | None

get the loader for a json item
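
A minimal sketch of how handler lookup of this kind could work, using only the standard library. `HandlerSketch` mirrors the LoaderHandler fields listed above, but the registry, the lookup order (higher priority first), and the `kind` marker key are assumptions made for illustration rather than the library's actual behavior.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

Path = Tuple[Any, ...]


@dataclass
class HandlerSketch:
    """Hypothetical stand-in mirroring the LoaderHandler fields."""

    check: Callable[[Any, Path, Any], bool]
    load: Callable[[Any, Path, Any], Any]
    uid: str
    source_pckg: str
    priority: int = 0
    desc: str = "(no description)"


REGISTRY: Dict[str, HandlerSketch] = {}


def register(handler: HandlerSketch) -> None:
    """Add a handler, keyed by its uid."""
    REGISTRY[handler.uid] = handler


def find_loader(json_item: Any, path: Path, zanj: Any = None) -> Optional[HandlerSketch]:
    """Return the first handler whose check passes, or None."""
    # assumption: handlers with higher priority are tried first
    ordered = sorted(REGISTRY.values(), key=lambda h: h.priority, reverse=True)
    for handler in ordered:
        if handler.check(json_item, path, zanj):
            return handler
    return None


register(
    HandlerSketch(
        # "kind" is a hypothetical marker key for this example
        check=lambda item, path, z: isinstance(item, dict) and item.get("kind") == "complex",
        load=lambda item, path, z: complex(item["re"], item["im"]),
        uid="complex",
        source_pckg="example",
    )
)

item = {"kind": "complex", "re": 1.0, "im": -2.0}
handler = find_loader(item, path=("values", 0))
loaded = handler.load(item, ("values", 0), None)
missing = find_loader({"plain": "dict"}, path=())
```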

def load_item_recursive

(
    json_item: Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]],
    path: tuple[typing.Union[str, int], ...],
    zanj: typing.Any | None = None,
    error_mode: muutils.errormode.ErrorMode = ErrorMode.Warn,
    allow_not_loading: bool = True
) -> Any

class LoadedZANJ:

for loading a zanj file

LoadedZANJ

(path: str | pathlib.Path, zanj: Any)

def populate_externals

(self) -> None

put all external items into the main json data
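
One way to picture this step, assuming each external item is keyed by a path tuple of string and integer keys (as in ExternalItem.path): walk the nested JSON data along the path and replace the item found there. `set_at_path` and `populate_sketch` are hypothetical helpers, not part of zanj.

```python
from typing import Any, Dict, Tuple, Union

PathKey = Tuple[Union[str, int], ...]


def set_at_path(root: Any, path: PathKey, value: Any) -> None:
    """Replace the item found at `path` inside nested dicts and lists."""
    if not path:
        raise ValueError("cannot replace the root object in place")
    parent = root
    for key in path[:-1]:
        parent = parent[key]  # str keys index dicts, int keys index lists
    parent[path[-1]] = value


def populate_sketch(main: Any, externals: Dict[PathKey, Any]) -> None:
    """Put every external item back into the main JSON data."""
    for path, data in externals.items():
        set_at_path(main, path, data)


main = {"cfg": {"n": 3}, "layers": [{"w": "<external>"}, {"w": "<external>"}]}
externals = {
    ("layers", 0, "w"): [0.1, 0.2],
    ("layers", 1, "w"): [0.3, 0.4],
}
populate_sketch(main, externals)
```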

zanj.serializing

def jsonl_metadata

(
    data: list[typing.Dict[str, typing.Union[bool, int, float, str, NoneType, typing.List[typing.Union[bool, int, float, str, NoneType, typing.List[typing.Any], typing.Dict[str, typing.Any]]], typing.Dict[str, typing.Union[bool, int, float, str, NoneType, typing.List[typing.Any], typing.Dict[str, typing.Any]]]]]]
) -> dict

metadata about a jsonl object

def store_npy

(
    self: Any,
    fp: IO[bytes],
    data: muutils.tensor_utils.jaxtype_factory.<locals>._BaseArray
) -> None

store numpy array to given file as .npy

def store_jsonl

(
    self: Any,
    fp: IO[bytes],
    data: Sequence[Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]]
) -> None

store sequence to given file as .jsonl
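
For readers unfamiliar with .jsonl, the following standard-library sketch writes a sequence to a binary file object as one JSON document per line and reads it back. The function names are hypothetical and the real store_jsonl may differ in details such as encoding or separators.

```python
import io
import json
from typing import IO, Any, List, Sequence


def store_jsonl_sketch(fp: IO[bytes], data: Sequence[Any]) -> None:
    """Write each item as one UTF-8 encoded JSON line."""
    for item in data:
        fp.write(json.dumps(item).encode("utf-8"))
        fp.write(b"\n")


def load_jsonl_sketch(fp: IO[bytes]) -> List[Any]:
    """Parse every non-empty line as a JSON document."""
    return [json.loads(line) for line in fp if line.strip()]


rows = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}, [1, 2, 3]]
buffer = io.BytesIO()
store_jsonl_sketch(buffer, rows)
line_count = buffer.getvalue().count(b"\n")

buffer.seek(0)
loaded_rows = load_jsonl_sketch(buffer)
```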

class ZANJSerializerHandler(muutils.json_serialize.json_serialize.SerializerHandler):

a handler for ZANJ serialization

ZANJSerializerHandler

(
    uid: str,
    desc: str,
    *,
    check: Callable[[Any, Any, tuple[Union[str, int], ...]], bool],
    serialize_func: Callable[[Any, Any, tuple[Union[str, int], ...]], Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]],
    source_pckg: str
)

def zanj_external_serialize

(
    jser: Any,
    data: Any,
    path: tuple[typing.Union[str, int], ...],
    item_type: Literal['jsonl', 'npy'],
    _format: str
) -> Union[bool, int, float, str, NoneType, List[Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]], Dict[str, Union[bool, int, float, str, NoneType, List[Any], Dict[str, Any]]]]

stores a numpy array or jsonl externally in a ZANJ object

Parameters:

Returns:

Modifies:

zanj.torchutil

def num_params

(m: torch.nn.modules.module.Module, only_trainable: bool = True)

return total number of parameters in a model

https://stackoverflow.com/questions/49201236/check-the-total-number-of-parameters-in-a-pytorch-model

def get_module_device

(
    m: torch.nn.modules.module.Module
) -> tuple[bool, torch.device | dict[str, torch.device]]

get the current devices

class ConfiguredModel(torch.nn.modules.module.Module, typing.Generic[~T_config]):

a model that has a configuration, for saving with ZANJ

@set_config_class(YourConfig)
class YourModule(ConfiguredModel[YourConfig]):
    def __init__(self, cfg: YourConfig):
        super().__init__(cfg)

__init__() must initialize the model from a config object only, and call super().__init__(zanj_model_config)

If you are inheriting from another class + ConfiguredModel, ConfiguredModel must be the first class in the inheritance list

def serialize

(
    self,
    path: tuple[typing.Union[str, int], ...] = (),
    zanj: zanj.zanj.ZANJ | None = None
) -> dict[str, typing.Any]

def save

(self, file_path: str, zanj: zanj.zanj.ZANJ | None = None)

def load

(
    cls,
    obj: dict[str, typing.Any],
    path: tuple[typing.Union[str, int], ...],
    zanj: zanj.zanj.ZANJ | None = None
) -> zanj.torchutil.ConfiguredModel

load a model from a serialized object

def read

(
    cls,
    file_path: str,
    zanj: zanj.zanj.ZANJ | None = None
) -> zanj.torchutil.ConfiguredModel

read a model from a file

def load_file

(
    cls,
    file_path: str,
    zanj: zanj.zanj.ZANJ | None = None
) -> zanj.torchutil.ConfiguredModel

read a model from a file

def get_handler

(cls) -> zanj.loading.LoaderHandler

def num_params

(self) -> int

def set_config_class

(
    config_class: Type[muutils.json_serialize.serializable_dataclass.SerializableDataclass]
) -> Callable[[Type[zanj.torchutil.ConfiguredModel]], Type[zanj.torchutil.ConfiguredModel]]
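
The signature above is that of a decorator factory: it takes a config class and returns a function that maps a model class to a model class. A minimal sketch of that pattern follows, without torch. All names here are hypothetical, and the type check in `__init__` is an assumption about what such a base class might do, not a description of ConfiguredModel.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, Type, TypeVar

T_config = TypeVar("T_config")


class ConfiguredSketch(Generic[T_config]):
    """Hypothetical stand-in for a config-carrying base class."""

    _config_class: Optional[type] = None  # set by the decorator below

    def __init__(self, config: T_config) -> None:
        expected = type(self)._config_class
        if expected is None:
            raise TypeError("decorate the class with set_config_class_sketch first")
        if not isinstance(config, expected):
            raise TypeError(f"expected {expected.__name__}, got {type(config).__name__}")
        self.config = config


def set_config_class_sketch(
    config_class: type,
) -> Callable[[Type[ConfiguredSketch]], Type[ConfiguredSketch]]:
    """Return a class decorator that records which config class a model uses."""

    def wrapper(cls: Type[ConfiguredSketch]) -> Type[ConfiguredSketch]:
        cls._config_class = config_class
        return cls

    return wrapper


@dataclass
class TinyConfig:
    hidden_dim: int


@set_config_class_sketch(TinyConfig)
class TinyModel(ConfiguredSketch[TinyConfig]):
    def __init__(self, config: TinyConfig) -> None:
        super().__init__(config)
        self.width = config.hidden_dim


model = TinyModel(TinyConfig(hidden_dim=8))
```

Recording the config class on the model class is what lets a loader rebuild the config first and then construct the model from it alone.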

class ConfigMismatchException(builtins.ValueError):

Inappropriate argument value (of correct type).

ConfigMismatchException

(msg: str, diff)

def assert_model_cfg_equality

(
    model_a: zanj.torchutil.ConfiguredModel,
    model_b: zanj.torchutil.ConfiguredModel
)

check both models are correct instances and have the same config

Raises: ConfigMismatchException: if the configs don’t match, e.diff will contain the diff
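
The pattern of an exception that carries a diff can be sketched as follows, with plain dictionaries standing in for configs. `MismatchSketch`, `dict_diff`, and `assert_cfg_equal` are hypothetical names, and the diff format (a mapping from key to a pair of values) is an assumption for illustration only.

```python
from typing import Any, Dict, Tuple


class MismatchSketch(ValueError):
    """Hypothetical exception that keeps the diff available as `e.diff`."""

    def __init__(self, msg: str, diff: Dict[str, Tuple[Any, Any]]) -> None:
        super().__init__(msg)
        self.diff = diff


def dict_diff(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing key to its (value in a, value in b) pair."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def assert_cfg_equal(cfg_a: Dict[str, Any], cfg_b: Dict[str, Any]) -> None:
    """Raise MismatchSketch when the two configs differ."""
    diff = dict_diff(cfg_a, cfg_b)
    if diff:
        raise MismatchSketch(f"configs differ in keys: {list(diff)}", diff)


cfg_a = {"input_dim": 10, "hidden_dim": 20, "output_dim": 2}
cfg_b = {"input_dim": 10, "hidden_dim": 32, "output_dim": 2}

try:
    assert_cfg_equal(cfg_a, cfg_b)
    caught_diff = None
except MismatchSketch as e:
    caught_diff = e.diff
```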

def assert_model_exact_equality

(
    model_a: zanj.torchutil.ConfiguredModel,
    model_b: zanj.torchutil.ConfiguredModel
)

check the models are exactly equal, including state dict contents

zanj.zanj

an HDF5/exdir file alternative, which uses json for attributes, allows serialization of arbitrary data

for large arrays, the output is a zip file with most data in a json file, but with sufficiently large arrays stored in binary .npy files

“ZANJ” is an acronym that the AI tool Elicit came up with for me. not to be confused with:
