📑 API Reference¤

This section provides comprehensive documentation of all classes, methods, and functions in the humbldata package.

humbldata package.

humbldata.cli ¤

humbldata CLI.

humbldata.cli.say ¤

say(message: str = '') -> None

Say a message.

Source code in src\humbldata\cli.py
@app.command()
def say(message: str = "") -> None:
    """Say a message."""
    typer.echo(message)

humbldata.core ¤

The core module to contain logic & functions used in controllers.

This module is intended to contain sub-modules and functions that are not directly utilized from the package, but rather used in building the package itself. This means that the core module should not contain any code that is specific to the package's use case, but rather should be generic and reusable in other contexts.

humbldata.core.standard_models ¤

Models to represent core data structures of the Standardization Framework.

humbldata.core.standard_models.abstract ¤

Abstract core DATA MODELS to be inherited by other models.

humbldata.core.standard_models.abstract.data ¤

A wrapper around the OpenBB Data standardized model, for use with humbldata.

humbldata.core.standard_models.abstract.data.Data ¤

Bases: Data

An abstract standard_model to represent a base Data Model.

The Data Model should be used to define the data that is being collected and analyzed in a context.category.command call.

This Data model is meant to be inherited and built upon by other standard_models for a specific context.

Example
class EquityHistoricalData(Data):

    date: Union[dateType, datetime] = Field(
        description=DATA_DESCRIPTIONS.get("date", "")
    )
    open: float = Field(description=DATA_DESCRIPTIONS.get("open", ""))
    high: float = Field(description=DATA_DESCRIPTIONS.get("high", ""))
    low: float = Field(description=DATA_DESCRIPTIONS.get("low", ""))
    close: float = Field(description=DATA_DESCRIPTIONS.get("close", ""))
    volume: Optional[Union[float, int]] = Field(
        default=None, description=DATA_DESCRIPTIONS.get("volume", "")
    )

    @field_validator("date", mode="before", check_fields=False)
    def date_validate(cls, v):  # pylint: disable=E0213
        # Parse the incoming value; midnight timestamps collapse to a date.
        v = parser.isoparse(str(v))
        if v.hour == 0 and v.minute == 0:
            return v.date()
        return v
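
The `date_validate` step in the example above can be tried in isolation. The following is a minimal sketch, assuming plain ISO 8601 input and using the standard library's `datetime.fromisoformat` as a stand-in for `parser.isoparse`; the function name `normalize_date` is hypothetical and not part of humbldata.

```python
from datetime import date, datetime
from typing import Union


def normalize_date(value: object) -> Union[date, datetime]:
    """Parse an ISO 8601 value; collapse midnight timestamps to a date."""
    parsed = datetime.fromisoformat(str(value))
    # Mirrors the example's check: only hour and minute are inspected.
    if parsed.hour == 0 and parsed.minute == 0:
        return parsed.date()
    return parsed


daily = normalize_date("2024-01-02")
intraday = normalize_date("2024-01-02T09:30:00")
```

Here `daily` is a plain `date`, while `intraday` keeps its time component. Because seconds are not checked, a timestamp such as `00:00:30` would also be reduced to a date.
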
Source code in src\humbldata\core\standard_models\abstract\data.py
class Data(OpenBBData):
    """
    An abstract standard_model to represent a base Data Model.

    The Data Model should be used to define the data that is being
    collected and analyzed in a `context.category.command` call.

    This Data model is meant to be inherited and built upon by other
    standard_models for a specific context.

    Example
    -------
    ```py
    total_time = f"{end_time - start_time:.3f}"
    class EquityHistoricalData(Data):

    date: Union[dateType, datetime] = Field(
        description=DATA_DESCRIPTIONS.get("date", "")
    )
    open: float = Field(description=DATA_DESCRIPTIONS.get("open", ""))
    high: float = Field(description=DATA_DESCRIPTIONS.get("high", ""))
    low: float = Field(description=DATA_DESCRIPTIONS.get("low", ""))
    close: float = Field(description=DATA_DESCRIPTIONS.get("close", ""))
    volume: Optional[Union[float, int]] = Field(
        default=None, description=DATA_DESCRIPTIONS.get("volume", "")
    )

    @field_validator("date", mode="before", check_fields=False)
    def date_validate(cls, v):  # pylint: disable=E0213
        v = parser.isoparse(str(v))
        if v.hour == 0 and v.minute == 0:
            return v.date()
        return v

    ```
    """
humbldata.core.standard_models.abstract.errors ¤

An ABSTRACT DATA MODEL to be inherited by custom errors.

humbldata.core.standard_models.abstract.errors.HumblDataError ¤

Bases: BaseException

Base Error for HumblData logic.

Source code in src\humbldata\core\standard_models\abstract\errors.py
class HumblDataError(BaseException):
    """Base Error for HumblData logic."""

    def __init__(self, original: str | Exception | None = None):
        self.original = original
        super().__init__(str(original))
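
Because `HumblDataError` derives from `BaseException` rather than `Exception`, a generic `except Exception` handler does not catch it. A minimal sketch of that behavior follows; the class body is copied from the listing above, while `ProviderError` and `run` are hypothetical names used only for illustration.

```python
from __future__ import annotations


class HumblDataError(BaseException):
    """Base Error for HumblData logic (copied from the listing above)."""

    def __init__(self, original: str | Exception | None = None):
        self.original = original
        super().__init__(str(original))


class ProviderError(HumblDataError):
    """Hypothetical subclass, for illustration only."""


def run() -> str:
    try:
        try:
            raise ProviderError(ValueError("bad symbol"))
        except Exception:
            # Not reached: BaseException subclasses bypass `except Exception`.
            return "caught as Exception"
    except HumblDataError as err:
        # The wrapped exception stays available on `.original`.
        return f"caught as HumblDataError: {err.original}"


outcome = run()
```

Callers that want to handle these errors therefore need to name `HumblDataError` (or a subclass) explicitly.
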
humbldata.core.standard_models.abstract.query_params ¤

A wrapper around the OpenBB QueryParams standardized model, for use with humbldata.

humbldata.core.standard_models.abstract.query_params.QueryParams ¤

Bases: QueryParams

An abstract standard_model to represent a base QueryParams Data.

The QueryParams model should be used to define the query parameters for a context.category.command call.

This QueryParams model is meant to be inherited and built upon by other standard_models for a specific context.

Examples:

class EquityHistoricalQueryParams(QueryParams):

    symbol: str = Field(description=QUERY_DESCRIPTIONS.get("symbol", ""))
    interval: Optional[str] = Field(
        default="1d",
        description=QUERY_DESCRIPTIONS.get("interval", ""),
    )
    start_date: Optional[dateType] = Field(
        default=None,
        description=QUERY_DESCRIPTIONS.get("start_date", ""),
    )
    end_date: Optional[dateType] = Field(
        default=None,
        description=QUERY_DESCRIPTIONS.get("end_date", ""),
    )

    @field_validator("symbol", mode="before", check_fields=False)
    @classmethod
    def upper_symbol(cls, v: Union[str, List[str], Set[str]]):
        if isinstance(v, str):
            return v.upper()
        return ",".join([symbol.upper() for symbol in list(v)])

This would create a class that would be used to query historical price data for equities from any given command.

This could then be used to create a MandelbrotChannelEquityHistoricalQueryParams that would define what query parameters are needed for the Mandelbrot Channel command.
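
The inheritance step described above can be sketched without Pydantic. This is only an illustration under stated assumptions: dataclasses stand in for the Pydantic models, `__post_init__` plays the role of the `upper_symbol` validator, and the `window` field is hypothetical rather than taken from humbldata.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Union


@dataclass
class EquityHistoricalQueryParams:
    """Dataclass stand-in for the Pydantic model in the example above."""

    symbol: Union[str, Iterable[str]]
    interval: Optional[str] = "1d"
    start_date: Optional[str] = None
    end_date: Optional[str] = None

    def __post_init__(self) -> None:
        # Plays the role of the `upper_symbol` validator.
        if isinstance(self.symbol, str):
            self.symbol = self.symbol.upper()
        else:
            self.symbol = ",".join(s.upper() for s in self.symbol)


@dataclass
class MandelbrotChannelEquityHistoricalQueryParams(EquityHistoricalQueryParams):
    """Adds a command-specific parameter on top of the shared ones."""

    window: str = "1m"  # hypothetical field, not taken from humbldata


params = MandelbrotChannelEquityHistoricalQueryParams(symbol=["aapl", "msft"])
```

The subclass inherits the shared fields and the symbol normalization, and only declares what is specific to its command.
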

Source code in src\humbldata\core\standard_models\abstract\query_params.py
class QueryParams(OpenBBQueryParams):
    """
    An abstract standard_model to represent a base QueryParams Data.

    QueryParams model should be used to define the query parameters for a
    `context.category.command` call.

    This QueryParams model is meant to be inherited and built upon by other
    standard_models for a specific context.

    Examples
    --------
    ```py
    class EquityHistoricalQueryParams(QueryParams):

        symbol: str = Field(description=QUERY_DESCRIPTIONS.get("symbol", ""))
        interval: Optional[str] = Field(
            default="1d",
            description=QUERY_DESCRIPTIONS.get("interval", ""),
        )
        start_date: Optional[dateType] = Field(
            default=None,
            description=QUERY_DESCRIPTIONS.get("start_date", ""),
        )
        end_date: Optional[dateType] = Field(
            default=None,
            description=QUERY_DESCRIPTIONS.get("end_date", ""),
        )

        @field_validator("symbol", mode="before", check_fields=False)
        @classmethod
        def upper_symbol(cls, v: Union[str, List[str], Set[str]]):
            if isinstance(v, str):
                return v.upper()
            return ",".join([symbol.upper() for symbol in list(v)])
    ```

    This would create a class that would be used to query historical price data
    for equities from any given command.

    This could then be used to create a
    `MandelbrotChannelEquityHistoricalQueryParams` that would define what query
    parameters are needed for the Mandelbrot Channel command.
    """
humbldata.core.standard_models.abstract.singleton ¤

An ABSTRACT DATA MODEL, Singleton, to represent a class that should only have one instance.

humbldata.core.standard_models.abstract.singleton.SingletonMeta ¤

Bases: type, Generic[T]

SingletonMeta is a metaclass that creates a Singleton instance of a class.

Singleton design pattern restricts the instantiation of a class to a single instance. This is useful when exactly one object is needed to coordinate actions across the system.
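
The effect of the metaclass can be seen with a small, simplified sketch. It drops the `Generic[T]` base used in humbldata, and the `Settings` class is hypothetical, introduced only to demonstrate the behavior.

```python
from typing import Any, ClassVar, Dict


class SingletonMeta(type):
    """Simplified metaclass: one cached instance per class."""

    _instances: ClassVar[Dict[type, Any]] = {}

    def __call__(cls, *args: Any, **kwargs: Any) -> Any:
        # Build the instance on first use, then always return the cached one.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Settings(metaclass=SingletonMeta):
    """Hypothetical class used only to demonstrate the metaclass."""

    def __init__(self, name: str = "default") -> None:
        self.name = name


first = Settings("first")
second = Settings("second")  # arguments are ignored; the cached instance is returned
```

Note that constructor arguments passed after the first instantiation have no effect, which is worth keeping in mind when a singleton takes configuration.
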

Source code in src\humbldata\core\standard_models\abstract\singleton.py
class SingletonMeta(type, Generic[T]):
    """
    SingletonMeta is a metaclass that creates a Singleton instance of a class.

    Singleton design pattern restricts the instantiation of a class to a single
    instance. This is useful when exactly one object is needed to coordinate
    actions across the system.
    """

    _instances: ClassVar[dict[T, T]] = {}  # type: ignore  # noqa: PGH003

    def __call__(cls, *args, **kwargs) -> T:
        """
        Override the __call__ method.

        Return the existing instance if the class has already been
        instantiated; otherwise create a new instance and store it in the
        _instances dictionary.
        """
        if cls not in cls._instances:
            instance = super().__call__(*args, **kwargs)
            cls._instances[cls] = instance  # type: ignore  # noqa: PGH003

        return cls._instances[cls]  # type: ignore  # noqa: PGH003
humbldata.core.standard_models.abstract.singleton.SingletonMeta.__call__ ¤
__call__(*args, **kwargs) -> T

Override the __call__ method.

Return the existing instance if the class has already been instantiated; otherwise create a new instance and store it in the _instances dictionary.

Source code in src\humbldata\core\standard_models\abstract\singleton.py
def __call__(cls, *args, **kwargs) -> T:
    """
    Override the __call__ method.

    Return the existing instance if the class has already been
    instantiated; otherwise create a new instance and store it in the
    _instances dictionary.
    """
    if cls not in cls._instances:
        instance = super().__call__(*args, **kwargs)
        cls._instances[cls] = instance  # type: ignore  # noqa: PGH003

    return cls._instances[cls]  # type: ignore  # noqa: PGH003
humbldata.core.standard_models.abstract.tagged ¤

An ABSTRACT DATA MODEL, Tagged, to be inherited by other models as an identifier.

humbldata.core.standard_models.abstract.tagged.Tagged ¤

Bases: BaseModel

A class to represent an object tagged with a uuid7.

Source code in src\humbldata\core\standard_models\abstract\tagged.py
class Tagged(BaseModel):
    """A class to represent an object tagged with a uuid7."""

    id: str = Field(default_factory=uuid7str, alias="_id")
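
The `default_factory` argument means every instance receives its own freshly generated identifier. A rough sketch of that idea follows, assuming a dataclass as a stand-in for the Pydantic model and `uuid.uuid4` as a stand-in for `uuid7str`; the helper `new_id` is hypothetical, and the `_id` alias is not reproduced here.

```python
import uuid
from dataclasses import dataclass, field


def new_id() -> str:
    # Stand-in for `uuid7str`; uuid4 is random, not time-ordered like UUIDv7.
    return str(uuid.uuid4())


@dataclass
class Tagged:
    """Dataclass stand-in for the Pydantic model above."""

    id: str = field(default_factory=new_id)


a, b = Tagged(), Tagged()
```

Each call to `Tagged()` invokes the factory again, so `a.id` and `b.id` differ.
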

humbldata.core.standard_models.toolbox ¤

Context: Toolbox || Category: Standardized Framework Model.

This module defines the QueryParams and Data classes for the Toolbox context. This is where all of the contexts of your project go. The STANDARD MODELS for categories and subsequent commands are nested here.

Classes:

Name | Description
ToolboxQueryParams | Query parameters for the ToolboxController.
ToolboxData | A Pydantic model that defines the data returned by the ToolboxController.

Attributes:

Name | Type | Description
symbol | str | The symbol/ticker of the stock.
interval | Optional[str] | The interval of the data. Defaults to '1d'.
start_date | str | The start date of the data.
end_date | str | The end date of the data.

humbldata.core.standard_models.toolbox.technical ¤

Context: Toolbox || Category: Technical.

humbldata.core.standard_models.toolbox.technical.mandelbrotchannel ¤

Mandelbrot Channel Standard Model.

Context: Toolbox || Category: Technical || Command: Mandelbrot Channel.

This module is used to define the QueryParams and Data model for the Mandelbrot Channel command.

humbldata.core.standard_models.toolbox.technical.mandelbrotchannel.MandelbrotChannelQueryParams ¤

Bases: QueryParams

QueryParams for the Mandelbrot Channel command.

Source code in src\humbldata\core\standard_models\toolbox\technical\mandelbrotchannel.py
class MandelbrotChannelQueryParams(QueryParams):
    """
    QueryParams for the Mandelbrot Channel command.


    """
humbldata.core.standard_models.toolbox.technical.mandelbrotchannel.MandelbrotChannelData ¤

Bases: Data

Data model for the Mandelbrot Channel command.

Source code in src\humbldata\core\standard_models\toolbox\technical\mandelbrotchannel.py
class MandelbrotChannelData(Data):
    """
    Data model for the Mandelbrot Channel command.
    """
humbldata.core.standard_models.toolbox.technical.mandelbrotchannel.MandelbrotChannelFetcher ¤

Bases: MandelbrotChannelQueryParams, MandelbrotChannelData

Fetcher for the Mandelbrot Channel command.

Source code in src\humbldata\core\standard_models\toolbox\technical\mandelbrotchannel.py
class MandelbrotChannelFetcher(
    MandelbrotChannelQueryParams, MandelbrotChannelData
):
    """
    Fetcher for the Mandelbrot Channel command.
    """

    def __init__(
        self,
        context_params: ToolboxQueryParams,
        command_params: MandelbrotChannelQueryParams,
    ):
        self._context_params = context_params
        self._command_params = command_params

    def transform_query(self):
        """Transform the params to the command-specific query."""

    def extract_data(self):
        """Extract the data from the provider."""
        # Assuming 'obb' is a predefined object in your context
        df = (
            obb.equity.price.historical(
                symbol=self.context_params.symbol,
                start_date=str(self.context_params.start_date),
                end_date=str(self.context_params.end_date),
                provider=self.command_params.provider,
                verbose=not self.command_params.kwargs.get("silent", False),
                **self.command_params.kwargs,
            ).to_polars()
        ).drop(["dividends", "stock_splits"], axis=1)
        return df

    def transform_data(self):
        """Transform the command-specific data."""
        # Placeholder for data transformation logic

    def fetch_data(self):
        # Call the methods in the desired order
        query = self.transform_query()
        raw_data = (
            self.extract_data()
        )  # This should use 'query' to fetch the data
        transformed_data = (
            self.transform_data()
        )  # This should transform 'raw_data'

        # Validate with MandelbrotChannelData, unpack dict into pydantic row by row
        return transformed_data
humbldata.core.standard_models.toolbox.technical.mandelbrotchannel.MandelbrotChannelFetcher.transform_query ¤
transform_query()

Transform the params to the command-specific query.

Source code in src\humbldata\core\standard_models\toolbox\technical\mandelbrotchannel.py
def transform_query(self):
    """Transform the params to the command-specific query."""
humbldata.core.standard_models.toolbox.technical.mandelbrotchannel.MandelbrotChannelFetcher.extract_data ¤
extract_data()

Extract the data from the provider.

Source code in src\humbldata\core\standard_models\toolbox\technical\mandelbrotchannel.py
def extract_data(self):
    """Extract the data from the provider."""
    # Assuming 'obb' is a predefined object in your context
    df = (
        obb.equity.price.historical(
            symbol=self.context_params.symbol,
            start_date=str(self.context_params.start_date),
            end_date=str(self.context_params.end_date),
            provider=self.command_params.provider,
            verbose=not self.command_params.kwargs.get("silent", False),
            **self.command_params.kwargs,
        ).to_polars()
    ).drop(["dividends", "stock_splits"], axis=1)
    return df
humbldata.core.standard_models.toolbox.technical.mandelbrotchannel.MandelbrotChannelFetcher.transform_data ¤
transform_data()

Transform the command-specific data.

Source code in src\humbldata\core\standard_models\toolbox\technical\mandelbrotchannel.py
def transform_data(self):
    """Transform the command-specific data."""
humbldata.core.standard_models.toolbox.technical.realized_volatility ¤

Realized Volatility Standard Model.

Context: Toolbox || Category: Technical || Command: Realized Volatility.

This module is used to define the QueryParams and Data model for the Realized Volatility command.

humbldata.core.standard_models.toolbox.technical.realized_volatility.RealizedVolatilityQueryParams ¤

Bases: QueryParams

QueryParams for the Realized Volatility command.

Source code in src\humbldata\core\standard_models\toolbox\technical\realized_volatility.py
class RealizedVolatilityQueryParams(QueryParams):
    """
    QueryParams for the Realized Volatility command.
    """
humbldata.core.standard_models.toolbox.technical.realized_volatility.RealizedVolatilityData ¤

Bases: Data

Data model for the Realized Volatility command.

Source code in src\humbldata\core\standard_models\toolbox\technical\realized_volatility.py
class RealizedVolatilityData(Data):
    """
    Data model for the Realized Volatility command.
    """
humbldata.core.standard_models.toolbox.technical.realized_volatility.RealizedVolatilityFetcher ¤

Bases: RealizedVolatilityQueryParams

Fetcher for the Realized Volatility command.

Source code in src\humbldata\core\standard_models\toolbox\technical\realized_volatility.py
class RealizedVolatilityFetcher(RealizedVolatilityQueryParams):
    """
    Fetcher for the Realized Volatility command.
    """

    data_list: ClassVar[list[RealizedVolatilityData]] = []

    def __init__(
        self,
        context_params: ToolboxQueryParams,
        command_params: RealizedVolatilityQueryParams,
    ):
        self._context_params = context_params
        self._command_params = command_params

    def transform_query(self):
        """Transform the params to the command-specific query."""

    def extract_data(self):
        """Extract the data from the provider."""
        # Assuming 'obb' is a predefined object in your context
        df = (
            obb.equity.price.historical(
                symbol=self.context_params.symbol,
                start_date=str(self.context_params.start_date),
                end_date=str(self.context_params.end_date),
                provider=self.command_params.provider,
                verbose=not self.command_params.kwargs.get("silent", False),
                **self.command_params.kwargs,
            )
            .to_df()
            .reset_index()
        )
        return df

    def transform_data(self):
        """Transform the command-specific data."""
        # Placeholder for data transformation logic

    def fetch_data(self):
        """Execute the TET pattern."""
        # Call the methods in the desired order
        query = self.transform_query()
        raw_data = (
            self.extract_data()
        )  # This should use 'query' to fetch the data
        transformed_data = (
            self.transform_data()
        )  # This should transform 'raw_data'

        # Validate with VolatilityData, unpack dict into pydantic row by row
        return transformed_data
humbldata.core.standard_models.toolbox.technical.realized_volatility.RealizedVolatilityFetcher.transform_query ¤
transform_query()

Transform the params to the command-specific query.

Source code in src\humbldata\core\standard_models\toolbox\technical\realized_volatility.py
def transform_query(self):
    """Transform the params to the command-specific query."""
humbldata.core.standard_models.toolbox.technical.realized_volatility.RealizedVolatilityFetcher.extract_data ¤
extract_data()

Extract the data from the provider.

Source code in src\humbldata\core\standard_models\toolbox\technical\realized_volatility.py
def extract_data(self):
    """Extract the data from the provider."""
    # Assuming 'obb' is a predefined object in your context
    df = (
        obb.equity.price.historical(
            symbol=self.context_params.symbol,
            start_date=str(self.context_params.start_date),
            end_date=str(self.context_params.end_date),
            provider=self.command_params.provider,
            verbose=not self.command_params.kwargs.get("silent", False),
            **self.command_params.kwargs,
        )
        .to_df()
        .reset_index()
    )
    return df
humbldata.core.standard_models.toolbox.technical.realized_volatility.RealizedVolatilityFetcher.transform_data ¤
transform_data()

Transform the command-specific data.

Source code in src\humbldata\core\standard_models\toolbox\technical\realized_volatility.py
def transform_data(self):
    """Transform the command-specific data."""
humbldata.core.standard_models.toolbox.technical.realized_volatility.RealizedVolatilityFetcher.fetch_data ¤
fetch_data()

Execute the TET pattern.

Source code in src\humbldata\core\standard_models\toolbox\technical\realized_volatility.py
def fetch_data(self):
    """Execute the TET pattern."""
    # Call the methods in the desired order
    query = self.transform_query()
    raw_data = (
        self.extract_data()
    )  # This should use 'query' to fetch the data
    transformed_data = (
        self.transform_data()
    )  # This should transform 'raw_data'

    # Validate with VolatilityData, unpack dict into pydantic row by row
    return transformed_data
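
The comments in `fetch_data` note that each stage should consume the output of the previous one. A minimal sketch of that chaining follows, assuming TET stands for Transform, Extract, Transform. The class `InMemoryFetcher`, its method signatures, and the hard-coded rows are hypothetical and stand in for a real provider call.

```python
from typing import Any, Dict, List


class InMemoryFetcher:
    """Hypothetical fetcher that chains the three TET stages."""

    # Fake provider data keyed by symbol; a real fetcher would call a provider.
    _ROWS: Dict[str, List[Dict[str, Any]]] = {
        "AAPL": [
            {"date": "2024-01-02", "close": 185.0, "dividends": 0.0},
            {"date": "2024-01-03", "close": 184.0, "dividends": 0.0},
        ]
    }

    def __init__(self, symbol: str) -> None:
        self._symbol = symbol

    def transform_query(self) -> Dict[str, str]:
        """Turn the raw parameters into a provider-ready query."""
        return {"symbol": self._symbol.upper()}

    def extract_data(self, query: Dict[str, str]) -> List[Dict[str, Any]]:
        """Fetch raw rows for the query."""
        return self._ROWS.get(query["symbol"], [])

    def transform_data(self, raw: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Keep only the columns the command needs."""
        return [{"date": r["date"], "close": r["close"]} for r in raw]

    def fetch_data(self) -> List[Dict[str, Any]]:
        # Each stage receives the previous stage's output explicitly.
        query = self.transform_query()
        raw = self.extract_data(query)
        return self.transform_data(raw)


rows = InMemoryFetcher("aapl").fetch_data()
```

Passing `query` and `raw` as arguments, rather than leaving them as unused locals, makes the data flow between stages explicit and easy to test one stage at a time.
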
humbldata.core.standard_models.toolbox.ToolboxQueryParams ¤

Bases: QueryParams

Query parameters for the ToolboxController.

This class defines the query parameters used by the ToolboxController, including the stock symbol, data interval, start date, and end date. It also includes a method to ensure the stock symbol is in uppercase.

Attributes:

Name | Type | Description
symbol | str | The symbol or ticker of the stock.
interval | Optional[str] | The interval of the data. Defaults to '1d'. Can be None.
start_date | str | The start date for the data query.
end_date | str | The end date for the data query.

Methods:

Name | Description
upper_symbol | A Pydantic @field_validator() that converts the stock symbol to uppercase. If a list or set of symbols is provided, each symbol in the collection is converted to uppercase and returned as a comma-separated string.

Source code in src\humbldata\core\standard_models\toolbox\__init__.py
class ToolboxQueryParams(QueryParams):
    """
    Query parameters for the ToolboxController.

    This class defines the query parameters used by the ToolboxController,
    including the stock symbol, data interval, start date, and end date. It also
    includes a method to ensure the stock symbol is in uppercase.

    Attributes
    ----------
    symbol : str
        The symbol or ticker of the stock.
    interval : Optional[str]
        The interval of the data. Defaults to '1d'. Can be None.
    start_date : str
        The start date for the data query.
    end_date : str
        The end date for the data query.

    Methods
    -------
    upper_symbol(cls, v: Union[str, list[str], set[str]]) -> Union[str, list[str]]
        A Pydantic `@field_validator()` that converts the stock symbol to
        uppercase. If a list or set of symbols is provided, each symbol in the
        collection is converted to uppercase and returned as a comma-separated
        string.
    """

    symbol: str = Field(
        default="",
        title="The symbol/ticker of the stock",
        description=QUERY_DESCRIPTIONS.get("symbol", ""),
    )
    interval: str | None = Field(
        default="1d",
        title="The interval of the data",
        description=QUERY_DESCRIPTIONS.get("interval", ""),
    )
    start_date: str = Field(
        default="",
        title="The start date of the data",
        description="The starting date for the data query.",
    )
    end_date: str = Field(
        default="",
        title="The end date of the data",
        description="The ending date for the data query.",
    )

    @field_validator("symbol", mode="before", check_fields=False)
    @classmethod
    def upper_symbol(cls, v: str | list[str] | set[str]) -> str | list[str]:
        """
        Convert the stock symbol to uppercase.

        Parameters
        ----------
        v : Union[str, List[str], Set[str]]
            The stock symbol or collection of symbols to be converted.

        Returns
        -------
        Union[str, List[str]]
            The uppercase stock symbol or a comma-separated string of uppercase
            symbols.
        """
        if isinstance(v, str):
            return v.upper()
        return ",".join([symbol.upper() for symbol in list(v)])
humbldata.core.standard_models.toolbox.ToolboxQueryParams.upper_symbol classmethod ¤
upper_symbol(v: str | list[str] | set[str]) -> str | list[str]

Convert the stock symbol to uppercase.

Parameters:

Name | Type | Description | Default
v | Union[str, List[str], Set[str]] | The stock symbol or collection of symbols to be converted. | required

Returns:

Type | Description
Union[str, List[str]] | The uppercase stock symbol or a comma-separated string of uppercase symbols.

Source code in src\humbldata\core\standard_models\toolbox\__init__.py
@field_validator("symbol", mode="before", check_fields=False)
@classmethod
def upper_symbol(cls, v: str | list[str] | set[str]) -> str | list[str]:
    """
    Convert the stock symbol to uppercase.

    Parameters
    ----------
    v : Union[str, List[str], Set[str]]
        The stock symbol or collection of symbols to be converted.

    Returns
    -------
    Union[str, List[str]]
        The uppercase stock symbol or a comma-separated string of uppercase
        symbols.
    """
    if isinstance(v, str):
        return v.upper()
    return ",".join([symbol.upper() for symbol in list(v)])
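
The normalization logic can be exercised outside of Pydantic. The following standalone sketch repeats the body of `upper_symbol` as a plain function; in this sketch the return type is annotated as `str`, since both branches return a string.

```python
from typing import Iterable, Union


def upper_symbol(v: Union[str, Iterable[str]]) -> str:
    """Uppercase one symbol, or join several into a comma-separated string."""
    if isinstance(v, str):
        return v.upper()
    return ",".join(symbol.upper() for symbol in v)


single = upper_symbol("aapl")
several = upper_symbol(["aapl", "msft"])
```

When a set is passed, the order of symbols in the joined string follows the set's iteration order, which is not guaranteed; a list keeps the order stable.
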
humbldata.core.standard_models.toolbox.ToolboxData ¤

Bases: Data

The Data for the ToolboxController.

WIP: I'm thinking that this is the final layer around which the HumblDataObject will be returned to the user, with all necessary information about the query, command, data and charts that they should want. This HumblDataObject will return values in json/dict format, with methods to allow transformation into polars_df, pandas_df, a list, a dict...

Source code in src\humbldata\core\standard_models\toolbox\__init__.py
class ToolboxData(Data):
    """
    The Data for the ToolboxController.

    WIP: I'm thinking that this is the final layer around which the
    HumblDataObject will be returned to the user, with all necessary information
    about the query, command, data and charts that they should want.
    This HumblDataObject will return values in json/dict format, with methods
    to allow transformation into polars_df, pandas_df, a list, a dict...
    """

humbldata.core.utils ¤

humbldata core utils.

Utils holds helpers, descriptions, constants, and other useful tools.

humbldata.core.utils.constants ¤

A module to contain all project-wide constants.

humbldata.core.utils.core_helpers ¤

A module to contain core helper functions for the program.

humbldata.core.utils.core_helpers.is_debug_mode ¤
is_debug_mode() -> bool

Check if the current system is in debug mode.

Returns:

Type | Description
bool | True if the system is in debug mode, False otherwise.

Source code in src\humbldata\core\utils\core_helpers.py
def is_debug_mode() -> bool:
    """
    Check if the current system is in debug mode.

    Returns
    -------
    bool
        True if the system is in debug mode, False otherwise.
    """
    return False
humbldata.core.utils.core_helpers.log_start_end ¤
log_start_end(func: Callable | None = None, *, log: Logger | None = None) -> Callable

Add logging at the start and end of any function it decorates, including time tracking.

Handles exceptions by logging them and modifies behavior based on the system's debug mode. Logs the total time taken by the function.

Parameters:

Name | Type | Description | Default
func | Optional[Callable] | The function to decorate. | None
log | Optional[Logger] | The logger to use for logging. | None

Returns:

Type | Description
Callable | The decorated function.

Source code in src\humbldata\core\utils\core_helpers.py
def log_start_end(
    func: Callable | None = None, *, log: logging.Logger | None = None
) -> Callable:
    """
    Add logging at the start and end of any function it decorates, including time tracking.

    Handles exceptions by logging them and modifies behavior based on the
    system's debug mode. Logs the total time taken by the function.

    Parameters
    ----------
    func : Optional[Callable]
        The function to decorate.
    log : Optional[logging.Logger]
        The logger to use for logging.

    Returns
    -------
    Callable
        The decorated function.
    """
    assert callable(func) or func is None

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> Any:
            import time  # lazy import

            nonlocal log
            if log is None:
                log = logging.getLogger(func.__module__)

            start_time = time.time()
            log.info("START", extra={"func_name": func.__name__})

            try:
                result = func(*args, **kwargs)
            except KeyboardInterrupt:
                end_time = time.time()
                total_time = end_time - start_time
                log.info(
                    "Interrupted by user",
                    extra={
                        "func_name": func.__name__,
                        "total_time": total_time,
                    },
                )
                return []
            except Exception as e:
                end_time = time.time()
                total_time = end_time - start_time
                log.exception(
                    "Exception in:",
                    extra={
                        "func_name": func.__name__,
                        "exception": e,
                        "total_time": total_time,
                    },
                )
                return []
            else:
                end_time = time.time()
                total_time = end_time - start_time
                log.info(
                    "END ",
                    extra={
                        "func_name": func.__name__,
                        "total_time": total_time,
                    },
                )
                return result

        return wrapper

    return decorator(func) if callable(func) else decorator
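
The final `return` line is what lets the decorator be applied both bare and with keyword arguments. A trimmed-down sketch of that pattern follows. The name `log_timing` and the functions `add` and `mul` are hypothetical, and unlike `log_start_end` this sketch lets exceptions propagate instead of logging them and returning an empty list.

```python
import functools
import logging
import time
from typing import Any, Callable, Optional


def log_timing(
    func: Optional[Callable] = None, *, log: Optional[logging.Logger] = None
) -> Callable:
    """Hypothetical, trimmed-down decorator that logs START/END and elapsed time."""

    def decorator(inner: Callable) -> Callable:
        logger = log or logging.getLogger(inner.__module__)

        @functools.wraps(inner)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            logger.info("START %s", inner.__name__)
            try:
                return inner(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                logger.info("END %s (%.3fs)", inner.__name__, elapsed)

        return wrapper

    # Bare use (@log_timing) passes the function directly;
    # parameterised use (@log_timing(log=...)) returns the decorator.
    return decorator(func) if callable(func) else decorator


@log_timing
def add(a: int, b: int) -> int:
    return a + b


@log_timing(log=logging.getLogger("demo"))
def mul(a: int, b: int) -> int:
    return a * b


total = add(2, 3)
product = mul(2, 3)
```

Thanks to `functools.wraps`, the decorated functions keep their original names and docstrings, which keeps log records and tracebacks readable.
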

humbldata.core.utils.descriptions ¤

Common descriptions for model fields.

humbldata.core.utils.env ¤

The Env Module, to control a single instance of environment variables.

humbldata.core.utils.env.Env ¤

A singleton environment to hold all Environment variables.

Source code in src\humbldata\core\utils\env.py
class Env(metaclass=SingletonMeta):
    """A singleton environment to hold all Environment variables."""

    _environ: dict[str, str]

    def __init__(self) -> None:
        env_path = dotenv.find_dotenv()
        dotenv.load_dotenv(Path(env_path))

        self._environ = os.environ.copy()

    @property
    def OBB_PAT(self) -> str | None:  # noqa: N802
        """OpenBB Personal Access Token."""
        return self._environ.get("OBB_PAT", None)

    @property
    def OBB_LOGGED_IN(self) -> bool:  # noqa: N802
        """Value of `OBB_LOGGED_IN` cast to a bool (False if unset)."""
        return self.str2bool(self._environ.get("OBB_LOGGED_IN", False))

    @staticmethod
    def str2bool(value: str | bool) -> bool:
        """Match a value to its boolean correspondent.

        Args:
            value (str): The string value to be converted to a boolean.

        Returns
        -------
            bool: The boolean value corresponding to the input string.

        Raises
        ------
            ValueError: If the input string does not correspond to a boolean
            value.
        """
        if isinstance(value, bool):
            return value
        if value.lower() in {"false", "f", "0", "no", "n"}:
            return False
        if value.lower() in {"true", "t", "1", "yes", "y"}:
            return True
        msg = f"Failed to cast '{value}' to bool."
        raise ValueError(msg)
humbldata.core.utils.env.Env.OBB_PAT property ¤
OBB_PAT: str | None

OpenBB Personal Access Token.

humbldata.core.utils.env.Env.str2bool staticmethod ¤
str2bool(value: str | bool) -> bool

Match a value to its boolean correspondent.

Parameters:

Name Type Description Default
value str | bool

The string value to be converted to a boolean. A bool is returned unchanged.

required

Returns:

Type Description
bool

The boolean value corresponding to the input string.

Raises:

Type Description
ValueError

If the input string does not correspond to a boolean value.

Source code in src\humbldata\core\utils\env.py
@staticmethod
def str2bool(value: str | bool) -> bool:
    """Match a value to its boolean correspondent.

    Args:
        value (str): The string value to be converted to a boolean.

    Returns
    -------
        bool: The boolean value corresponding to the input string.

    Raises
    ------
        ValueError: If the input string does not correspond to a boolean
        value.
    """
    if isinstance(value, bool):
        return value
    if value.lower() in {"false", "f", "0", "no", "n"}:
        return False
    if value.lower() in {"true", "t", "1", "yes", "y"}:
        return True
    msg = f"Failed to cast '{value}' to bool."
    raise ValueError(msg)
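
The conversion rules in the listing can be exercised without installing the package. The sketch below is a standalone copy of the same logic (type hints omitted), shown only to illustrate which inputs are accepted:

```python
def str2bool(value):
    """Standalone copy of the conversion rules shown in the listing."""
    if isinstance(value, bool):
        return value
    if value.lower() in {"false", "f", "0", "no", "n"}:
        return False
    if value.lower() in {"true", "t", "1", "yes", "y"}:
        return True
    msg = f"Failed to cast '{value}' to bool."
    raise ValueError(msg)


print(str2bool("YES"))   # matching is case-insensitive
print(str2bool("0"))
print(str2bool(True))    # bools pass through unchanged

try:
    str2bool("maybe")
except ValueError as err:
    print(err)
```

Any string outside the two sets raises `ValueError`, so a misspelled value in a `.env` file fails loudly instead of silently becoming `False`.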

humbldata.core.utils.openbb_helpers ¤

Core Module - OpenBB Helpers.

This module contains functions used to interact with OpenBB, or wrap commands to have specific data outputs.

humbldata.core.utils.openbb_helpers.obb_login ¤
obb_login(pat: str | None = None) -> bool

Log into the OpenBB Hub using a Personal Access Token (PAT).

This function wraps the obb.account.login method to provide a simplified interface for logging into OpenBB Hub. It optionally accepts a PAT. If no PAT is provided, it attempts to use the PAT stored in the environment variable OBB_PAT.

Parameters:

Name Type Description Default
pat str | None

The personal access token for authentication. If None, the token is retrieved from the environment variable OBB_PAT. Default is None.

None

Returns:

Type Description
bool

True if login is successful, False otherwise.

Warns:

Type Description
HumblDataWarning

If an error occurs during the login process. The exception is caught and the function returns False instead of raising.

Examples:

>>> # obb_login("your_personal_access_token_here")
True
>>> # obb_login()  # Assumes `OBB_PAT` is set in the environment
True
Source code in src\humbldata\core\utils\openbb_helpers.py
def obb_login(pat: str | None = None) -> bool:
    """
    Log into the OpenBB Hub using a Personal Access Token (PAT).

    This function wraps the `obb.account.login` method to provide a simplified
    interface for logging into OpenBB Hub. It optionally accepts a PAT. If no PAT
    is provided, it attempts to use the PAT stored in the environment variable
    `OBB_PAT`.

    Parameters
    ----------
    pat : str | None, optional
        The personal access token for authentication. If None, the token is
        retrieved from the environment variable `OBB_PAT`. Default is None.

    Returns
    -------
    bool
        True if login is successful, False otherwise.

    Warns
    -----
    HumblDataWarning
        If an error occurs during the login process. The exception is caught
        and the function returns False instead of raising.

    Examples
    --------
    >>> # obb_login("your_personal_access_token_here")
    True

    >>> # obb_login()  # Assumes `OBB_PAT` is set in the environment
    True

    """
    if pat is None:
        pat = Env().OBB_PAT
    try:
        obb.account.login(pat=pat, remember_me=True)
        # obb.account.save()

        # dotenv.set_key(dotenv.find_dotenv(), "OBB_LOGGED_IN", "true")

        return True
    except Exception as e:
        from humbldata.core.standard_models.abstract.warnings import (
            HumblDataWarning,
        )

        # dotenv.set_key(dotenv.find_dotenv(), "OBB_LOGGED_IN", "false")

        warnings.warn(
            "An error occurred while logging into OpenBB. Details below:\n"
            + repr(e),
            category=HumblDataWarning,
            stacklevel=1,
        )
        return False
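
The error handling in the listing follows a "warn and return False" pattern. The sketch below reproduces that pattern in isolation; `LoginWarning`, `safe_login`, and `fake_login` are hypothetical names standing in for `HumblDataWarning`, `obb_login`, and `obb.account.login`:

```python
import warnings


class LoginWarning(UserWarning):
    """Hypothetical stand-in for HumblDataWarning."""


def safe_login(login, pat):
    """Call `login(pat=...)`; warn and return False if it raises."""
    try:
        login(pat=pat)
    except Exception as e:
        warnings.warn(
            "An error occurred while logging in. Details below:\n" + repr(e),
            category=LoginWarning,
            stacklevel=2,
        )
        return False
    return True


def fake_login(pat):
    """Hypothetical login callable that rejects an empty token."""
    if not pat:
        raise ValueError("missing token")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = safe_login(fake_login, None)

print(ok, [w.category.__name__ for w in caught])
```

A caller that wants a hard failure can turn the warning into an exception with `warnings.simplefilter("error", LoginWarning)`.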
humbldata.core.utils.openbb_helpers.get_latest_price ¤
get_latest_price(symbol: str | list[str] | Series, provider: Literal['fmp', 'intrinio'] | None = None) -> LazyFrame

Context: Core || Category: Utils || Subcategory: OpenBB Helpers || Command: get_latest_price.

This function queries the latest stock price data using the specified provider. If no provider is specified, it defaults to using FinancialModelingPrep (fmp). The function returns a LazyFrame containing the stock symbols and their corresponding latest prices.

Parameters:

Name Type Description Default
symbol str | list[str] | Series

The stock symbol(s) for which to fetch the latest price. Can be a single symbol, a list of symbols, or a Polars Series of symbols.

required
provider Literal['fmp', 'intrinio'] | None

The data provider to use for fetching the stock prices. If not specified, a default provider is used.

None

Returns:

Type Description
LazyFrame

A Polars LazyFrame containing columns for the stock symbols ('symbol') and their most recent prices ('recent_price', renamed from 'last_price').

Source code in src\humbldata\core\utils\openbb_helpers.py
def get_latest_price(
    symbol: str | list[str] | pl.Series,
    provider: Literal["fmp", "intrinio"] | None = None,
) -> pl.LazyFrame:
    """
    Context: Core || Category: Utils || Subcategory: OpenBB Helpers || **Command: get_latest_price**.

    This function queries the latest stock price data using the specified
    provider. If no provider is specified, it defaults to using
    FinancialModelingPrep (`fmp`). The function returns a LazyFrame containing
    the stock symbols and their corresponding latest prices.

    Parameters
    ----------
    symbol : str | list[str] | pl.Series
        The stock symbol(s) for which to fetch the latest price. Can be a
        single symbol, a list of symbols, or a Polars Series of symbols.

    provider : Literal["fmp", "intrinio"] | None, optional
        The data provider to use for fetching the stock prices. If not
        specified, a default provider is used.

    Returns
    -------
    pl.LazyFrame
        A Polars LazyFrame containing columns for the stock symbols ('symbol')
        and their most recent prices ('recent_price', renamed from
        'last_price').
    """
    logging.getLogger("openbb_terminal.stocks.stocks_model").setLevel(
        logging.CRITICAL
    )

    latest_prices = (
        obb.equity.price.quote(symbol, provider=provider).to_polars().lazy()
    )
    return latest_prices.select(["symbol", "last_price"]).rename(
        {"last_price": "recent_price"}
    )

humbldata.toolbox ¤

Context: Toolbox.

A category to group all of the technical indicators available in the Toolbox().

Technical indicators rely on statistical transformations of time series data. These are raw math operations.

humbldata.toolbox.toolbox_controller ¤

Context: Toolbox.

The Toolbox Controller Module.

humbldata.toolbox.toolbox_controller.Toolbox ¤

Bases: ToolboxQueryParams

The top-level controller for all data analysis in the humbldata package.

This module serves as the primary controller, routing user-specified ToolboxQueryParams as core arguments that are used to fetch time series data.

The Toolbox controller also gives access to all sub-modules and their functions.

It is designed to facilitate the collection of data across various types such as stocks, options, or alternative time series by requiring minimal input from the user.

Submodules

The Toolbox controller is composed of the following submodules:

  • technical:
  • quantitative:
  • fundamental:

Parameters:

Name Type Description Default
symbol str

The symbol or ticker of the stock.

required
interval str

The interval of the data. Defaults to '1d'.

required
start_date str

The start date for the data query.

required
end_date str

The end date for the data query.

required
Parameter Notes

The Parameters (symbol, interval, start_date, end_date) are the ToolboxQueryParams. They are used for data collection further down the pipeline, to execute operations on core data sets. This approach enables composable and standardized querying while accommodating data-specific collection logic.

Source code in src\humbldata\toolbox\toolbox_controller.py
class Toolbox(ToolboxQueryParams):
    """

    The top-level controller for all data analysis in the `humbldata` package.

    This module serves as the primary controller, routing user-specified
    ToolboxQueryParams as core arguments that are used to fetch time series
    data.

    The `Toolbox` controller also gives access to all sub-modules and their
    functions.

    It is designed to facilitate the collection of data across various types such as
    stocks, options, or alternative time series by requiring minimal input from the user.

    Submodules
    ----------
    The `Toolbox` controller is composed of the following submodules:

    - `technical`:
    - `quantitative`:
    - `fundamental`:

    Parameters
    ----------
    symbol : str
        The symbol or ticker of the stock.
    interval : str, optional
        The interval of the data. Defaults to '1d'.
    start_date : str
        The start date for the data query.
    end_date : str
        The end date for the data query.

    Parameter Notes
    -----
    The Parameters (`symbol`, `interval`, `start_date`, `end_date`)
    are the `ToolboxQueryParams`. They are used for data collection further
    down the pipeline, to execute operations on core data sets.
    This approach enables composable and standardized querying while
    accommodating data-specific collection logic.
    """

    def __init__(self, *args, **kwargs):
        """
        Initialize the Toolbox module.

        Positional and keyword arguments are forwarded to `ToolboxQueryParams`.
        """
        super().__init__(*args, **kwargs)

    @property
    def technical(self):
        """
        The technical submodule of the Toolbox controller.

        Access to all the technical indicators.
        """
        return Technical(self)
humbldata.toolbox.toolbox_controller.Toolbox.__init__ ¤
__init__(*args, **kwargs)

Initialize the Toolbox module.

Positional and keyword arguments are forwarded to ToolboxQueryParams.

Source code in src\humbldata\toolbox\toolbox_controller.py
def __init__(self, *args, **kwargs):
    """
    Initialize the Toolbox module.

    Positional and keyword arguments are forwarded to `ToolboxQueryParams`.
    """
    super().__init__(*args, **kwargs)
humbldata.toolbox.toolbox_controller.Toolbox.technical property ¤
technical

The technical submodule of the Toolbox controller.

Access to all the technical indicators.
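
The controller pattern above (query parameters stored once on the controller, submodules built from it on access) can be sketched with plain classes. `QueryParams`, `TechnicalSketch`, `ToolboxSketch`, and `describe` are hypothetical stand-ins, not the humbldata classes, and the real `ToolboxQueryParams` may validate its fields differently:

```python
from dataclasses import dataclass


@dataclass
class QueryParams:
    """Hypothetical stand-in for ToolboxQueryParams."""

    symbol: str
    interval: str = "1d"
    start_date: str = ""
    end_date: str = ""


class TechnicalSketch:
    """Hypothetical stand-in for the Technical submodule."""

    def __init__(self, context: QueryParams):
        # The submodule keeps a reference to the controller's parameters.
        self.context = context

    def describe(self) -> str:
        c = self.context
        return f"{c.symbol} {c.interval} {c.start_date}..{c.end_date}"


class ToolboxSketch(QueryParams):
    @property
    def technical(self) -> TechnicalSketch:
        # A new submodule object is built on each access, as in the listing.
        return TechnicalSketch(self)


toolbox = ToolboxSketch(
    symbol="AAPL", start_date="2023-01-01", end_date="2023-12-31"
)
print(toolbox.technical.describe())
```

Because the submodule receives the controller itself, every indicator reached through `technical` sees the same query parameters without the user repeating them.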

humbldata.toolbox.toolbox_helpers ¤

Context: Toolbox || Category: Helpers.

These Toolbox() helpers are used in various calculations in the toolbox context. Most of the helpers will be mathematical transformations of data. These functions should be DUMB functions.

humbldata.toolbox.toolbox_helpers.log_returns ¤

log_returns(data: Series | DataFrame | LazyFrame | None = None, _column_name: str = 'adj_close', *, _drop_nulls: bool = True, _sort: bool = True) -> Series | DataFrame | LazyFrame

Context: Toolbox || Category: Helpers || Command: log_returns.

This is a DUMB command. It can be used in any CONTEXT or CATEGORY. Calculates the logarithmic returns for a given Polars Series, DataFrame, or LazyFrame. Logarithmic returns are widely used in the financial industry to measure the rate of return on investments over time. This function supports calculations on both individual series and dataframes containing financial time series data.

Parameters:

Name Type Description Default
data Series | DataFrame | LazyFrame

The input data for which to calculate the log returns. Default is None.

None
_drop_nulls bool

Whether to drop null values from the result. Default is True.

True
_column_name str

The column name to use for log return calculations in DataFrame or LazyFrame. Default is "adj_close".

'adj_close'
_sort bool

If True, sorts the DataFrame or LazyFrame by date and symbol before calculation. If you want a DUMB function, set to False. Default is True.

True

Returns:

Type Description
Series | DataFrame | LazyFrame

The original data, with an extra column of log returns of the input data. The return type matches the input type.

Raises:

Type Description
HumblDataError

If neither a series, DataFrame, nor LazyFrame is provided as input.

Examples:

>>> series = pl.Series([100, 105, 103])
>>> log_returns(data=series)
series([0.048790, -0.019231])
>>> df = pl.DataFrame({"adj_close": [100, 105, 103]})
>>> log_returns(data=df)
shape: (3, 2)
┌───────────┬────────────┐
│ adj_close ┆ log_returns│
│ ---       ┆ ---        │
│ f64       ┆ f64        │
╞═══════════╪════════════╡
│ 100.0     ┆ NaN        │
├───────────┼────────────┤
│ 105.0     ┆ 0.048790   │
├───────────┼────────────┤
│ 103.0     ┆ -0.019231  │
└───────────┴────────────┘
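
For readers who want to check the arithmetic by hand, the log return is the difference of consecutive log prices, r_t = ln(p_t) - ln(p_{t-1}). A pure-Python sketch of that calculation (it does not call `log_returns` or Polars):

```python
import math

prices = [100.0, 105.0, 103.0]

# r_t = ln(p_t) - ln(p_{t-1}); the first price has no predecessor,
# so three prices yield two log returns.
log_rets = [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]
print([round(r, 6) for r in log_rets])

# Log returns are additive: their sum is the log return over the whole span.
total = sum(log_rets)
print(math.isclose(total, math.log(prices[-1] / prices[0])))
```

The additivity shown in the last two lines is the main reason log returns are convenient for the windowed sums used later in the pipeline.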
Improvements

Add a parameter _sort_cols: list[str] | None = None to make the function even dumber. This way you could specify certain columns to sort by instead of using default date and symbol. If _sort_cols=None and _sort=True, then the function will use the default date and symbol columns for sorting.

Source code in src\humbldata\toolbox\toolbox_helpers.py
def log_returns(
    data: pl.Series | pl.DataFrame | pl.LazyFrame | None = None,
    _column_name: str = "adj_close",
    *,
    _drop_nulls: bool = True,
    _sort: bool = True,
) -> pl.Series | pl.DataFrame | pl.LazyFrame:
    """
    Context: Toolbox || Category: Helpers || **Command: log_returns**.

    This is a DUMB command. It can be used in any CONTEXT or CATEGORY.
    Calculates the logarithmic returns for a given Polars Series, DataFrame, or
    LazyFrame. Logarithmic returns are widely used in the financial
    industry to measure the rate of return on investments over time. This
    function supports calculations on both individual series and dataframes
    containing financial time series data.

    Parameters
    ----------
    data : pl.Series | pl.DataFrame | pl.LazyFrame, optional
        The input data for which to calculate the log returns. Default is None.
    _drop_nulls : bool, optional
        Whether to drop null values from the result. Default is True.
    _column_name : str, optional
        The column name to use for log return calculations in DataFrame or
        LazyFrame. Default is "adj_close".
    _sort : bool, optional
        If True, sorts the DataFrame or LazyFrame by `date` and `symbol` before
        calculation. If you want a DUMB function, set to False.
        Default is True.

    Returns
    -------
    pl.Series | pl.DataFrame | pl.LazyFrame
        The original `data`, with an extra column of `log returns` of the input
        data. The return type matches the input type.

    Raises
    ------
    HumblDataError
        If neither a series, DataFrame, nor LazyFrame is provided as input.

    Examples
    --------
    >>> series = pl.Series([100, 105, 103])
    >>> log_returns(data=series)
    series([0.048790, -0.019231])

    >>> df = pl.DataFrame({"adj_close": [100, 105, 103]})
    >>> log_returns(data=df)
    shape: (3, 2)
    ┌───────────┬────────────┐
    │ adj_close ┆ log_returns│
    │ ---       ┆ ---        │
    │ f64       ┆ f64        │
    ╞═══════════╪════════════╡
    │ 100.0     ┆ NaN        │
    ├───────────┼────────────┤
    │ 105.0     ┆ 0.048790   │
    ├───────────┼────────────┤
    │ 103.0     ┆ -0.019231  │
    └───────────┴────────────┘

    Improvements
    -----------
    Add a parameter `_sort_cols: list[str] | None = None` to make the function even
    dumber. This way you could specify certain columns to sort by instead of
    using default `date` and `symbol`. If `_sort_cols=None` and `_sort=True`,
    then the function will use the default `date` and `symbol` columns for
    sorting.

    """
    # Calculation for Polars Series
    if isinstance(data, pl.Series):
        out = data.log().diff()
        if _drop_nulls:
            out = out.drop_nulls()
    # Calculation for Polars DataFrame or LazyFrame
    elif isinstance(data, pl.DataFrame | pl.LazyFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort and sort_cols:
            data = data.sort(sort_cols)
        elif _sort and not sort_cols:
            msg = "Data must contain 'symbol' and 'date' columns for sorting."
            raise HumblDataError(msg)

        if "log_returns" not in data.columns:
            out = data.set_sorted(sort_cols).with_columns(
                pl.col(_column_name).log().diff().alias("log_returns")
            )
        else:
            out = data
        if _drop_nulls:
            out = out.drop_nulls(subset="log_returns")
    else:
        msg = "No valid data type was provided for `log_returns()` calculation."
        raise HumblDataError(msg)

    return out

humbldata.toolbox.toolbox_helpers.detrend ¤

detrend(data: DataFrame | LazyFrame | Series, _detrend_col: str = 'log_returns', _detrend_value_col: str | Series | None = 'window_mean', *, _sort: bool = False) -> DataFrame | LazyFrame | Series

Context: Toolbox || Category: Helpers || Command: detrend.

This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

Detrends a column in a DataFrame, LazyFrame, or Series by subtracting the values of another column from it. Optionally sorts the data by 'symbol' and 'date' before detrending if _sort is True.

Parameters:

Name Type Description Default
data Union[DataFrame, LazyFrame, Series]

The data structure containing the columns to be processed.

required
_detrend_col str

The name of the column from which values will be subtracted.

'log_returns'
_detrend_value_col str | Series | None

The name of the column whose values will be subtracted OR if you pass a pl.Series to the data parameter, then you can use this to pass a second pl.Series to subtract from the first.

'window_mean'
_sort bool

If True, sorts the data by 'symbol' and 'date' before detrending. Default is False.

False

Returns:

Type Description
Union[DataFrame, LazyFrame, Series]

The detrended data structure with the same type as the input, with an added column named f"detrended_{_detrend_col}".

Notes

Function doesn't use .over() in calculation. Once the data is sorted, subtracting _detrend_value_col from _detrend_col is a simple operation that doesn't need to be grouped, because the sorting has already aligned the rows for subtraction.
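
The point made in the Notes, that detrending is a row-wise subtraction once both columns sit on the same row, can be seen in a small pure-Python sketch. The rows and their values are made up for illustration:

```python
# Each row already carries the mean of its own window, so no grouping
# is needed: the subtraction only looks at values on the same row.
rows = [
    {"symbol": "AAA", "log_returns": 0.02, "window_mean": 0.01},
    {"symbol": "AAA", "log_returns": 0.00, "window_mean": 0.01},
    {"symbol": "BBB", "log_returns": -0.03, "window_mean": -0.02},
    {"symbol": "BBB", "log_returns": -0.01, "window_mean": -0.02},
]

for row in rows:
    row["detrended_log_returns"] = row["log_returns"] - row["window_mean"]

print([round(row["detrended_log_returns"], 6) for row in rows])
```

The new key follows the same `detrended_<column>` naming that the function uses for its output column.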

Source code in src\humbldata\toolbox\toolbox_helpers.py
def detrend(
    data: pl.DataFrame | pl.LazyFrame | pl.Series,
    _detrend_col: str = "log_returns",
    _detrend_value_col: str | pl.Series | None = "window_mean",
    *,
    _sort: bool = False,
) -> pl.DataFrame | pl.LazyFrame | pl.Series:
    """
    Context: Toolbox || Category: Helpers || **Command: detrend**.

    This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

    Detrends a column in a DataFrame, LazyFrame, or Series by subtracting the
    values of another column from it. Optionally sorts the data by 'symbol' and
    'date' before detrending if _sort is True.

    Parameters
    ----------
    data : Union[pl.DataFrame, pl.LazyFrame, pl.Series]
        The data structure containing the columns to be processed.
    _detrend_col : str
        The name of the column from which values will be subtracted.
    _detrend_value_col : str | pl.Series | None, optional
        The name of the column whose values will be subtracted OR if you
        pass a pl.Series to the `data` parameter, then you can use this to
        pass a second `pl.Series` to subtract from the first.
    _sort : bool, optional
        If True, sorts the data by 'symbol' and 'date' before detrending.
        Default is False.

    Returns
    -------
    Union[pl.DataFrame, pl.LazyFrame, pl.Series]
        The detrended data structure with the same type as the input,
        with an added column named `f"detrended_{_detrend_col}"`.

    Notes
    -----
    Function doesn't use `.over()` in calculation. Once the data is sorted,
    subtracting _detrend_value_col from _detrend_col is a simple operation
    that doesn't need to be grouped, because the sorting has already aligned
    the rows for subtraction.
    """
    if isinstance(data, pl.DataFrame | pl.LazyFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort and sort_cols:
            data = data.sort(sort_cols)
        elif _sort and not sort_cols:
            msg = "Data must contain 'symbol' and 'date' columns for sorting."
            raise HumblDataError(msg)

    if isinstance(data, pl.DataFrame | pl.LazyFrame):
        if (
            _detrend_value_col not in data.columns
            or _detrend_col not in data.columns
        ):
            msg = f"Both {_detrend_value_col} and {_detrend_col} must be columns in the data."
            raise HumblDataError(msg)
        detrended = data.set_sorted(sort_cols).with_columns(
            (pl.col(_detrend_col) - pl.col(_detrend_value_col)).alias(
                f"detrended_{_detrend_col}"
            )
        )
    elif isinstance(data, pl.Series):
        if not isinstance(_detrend_value_col, pl.Series):
            msg = "When 'data' is a Series, '_detrend_value_col' must also be a Series."
            raise HumblDataError(msg)
        # Series.rename returns a new Series, so the result must be kept.
        detrended = (data - _detrend_value_col).rename(
            f"detrended_{_detrend_col}"
        )

    return detrended

humbldata.toolbox.toolbox_helpers.cum_sum ¤

cum_sum(data: DataFrame | LazyFrame | Series | None = None, _column_name: str = 'detrended_returns', *, _sort: bool = True, _mandelbrot_usage: bool = True) -> LazyFrame | DataFrame | Series

Context: Toolbox || Category: Helpers || Command: cum_sum.

This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

Calculate the cumulative sum of a series or column in a DataFrame or LazyFrame.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame | Series | None

The data to process.

None
_column_name str

The name of the column to calculate the cumulative sum on, applicable if a DataFrame or LazyFrame is provided.

'detrended_returns'
_sort bool

If True, sorts the DataFrame or LazyFrame by date and symbol before calculation. Default is True.

True
_mandelbrot_usage bool

If True, performs additional checks specific to the Mandelbrot Channel calculation. This should be set to True when you have a cumulative deviate series, and False when not. Please check 'Notes' for more information. Default is True.

True

Returns:

Type Description
DataFrame | LazyFrame | Series

The DataFrame or Series with the cumulative deviate series added as a new column or as itself.

Notes

This function is used to calculate the cumulative sum for the deviate series of detrended returns for the data in the pipeline for calc_mandelbrot_channel.

So, although it is calculating a cumulative sum, it is known as a cumulative deviate because it is a cumulative sum on a deviate series, meaning that the cumulative sum should equal 0 at the end of each window. The _mandelbrot_usage parameter allows for checks to ensure the data is suitable for Mandelbrot Channel calculations, i.e., that the deviate series was calculated correctly, with the end of each series being 0, meaning the trend (the mean over the window_index) was successfully removed from the data.
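
The property described above, a cumulative deviate that returns to 0 at the end of every window, can be checked with a short pure-Python sketch. The window contents are invented values, and the check only illustrates the idea behind `_mandelbrot_usage`, not the package's `_cumsum_check`:

```python
from itertools import accumulate

# Toy log returns keyed by window_index.
windows = {0: [0.02, -0.01, 0.03, 0.00], 1: [-0.02, 0.01, 0.04]}

cum_dev = {}
for window_index, rets in windows.items():
    window_mean = sum(rets) / len(rets)
    deviates = [r - window_mean for r in rets]    # detrended returns
    cum_dev[window_index] = list(accumulate(deviates))  # running sum

# Deviations from a mean sum to zero, so each series must end at ~0.
ends_at_zero = all(abs(series[-1]) < 1e-12 for series in cum_dev.values())
print(ends_at_zero)
```

If the mean were taken over the whole dataset instead of per window, the final value of each window would generally be nonzero, which is the mistake the check is meant to catch.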

Source code in src\humbldata\toolbox\toolbox_helpers.py
def cum_sum(
    data: pl.DataFrame | pl.LazyFrame | pl.Series | None = None,
    _column_name: str = "detrended_returns",
    *,
    _sort: bool = True,
    _mandelbrot_usage: bool = True,
) -> pl.LazyFrame | pl.DataFrame | pl.Series:
    """
    Context: Toolbox || Category: Helpers || **Command: cum_sum**.

    This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

    Calculate the cumulative sum of a series or column in a DataFrame or
    LazyFrame.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame | pl.Series | None
        The data to process.
    _column_name : str
        The name of the column to calculate the cumulative sum on,
        applicable if a DataFrame or LazyFrame is provided.
    _sort : bool, optional
        If True, sorts the DataFrame or LazyFrame by date and symbol before
        calculation. Default is True.
    _mandelbrot_usage : bool, optional
        If True, performs additional checks specific to the Mandelbrot Channel
        calculation. This should be set to True when you have a cumulative
        deviate series, and False when not. Please check 'Notes' for more
        information. Default is True.

    Returns
    -------
    pl.DataFrame | pl.LazyFrame | pl.Series
        The DataFrame or Series with the cumulative deviate series added as a
        new column or as itself.

    Notes
    -----
    This function is used to calculate the cumulative sum for the deviate series
    of detrended returns for the data in the pipeline for
    `calc_mandelbrot_channel`.

    So, although it is calculating a cumulative sum, it is known as a cumulative
    deviate because it is a cumulative sum on a deviate series, meaning that the
    cumulative sum should equal 0 at the end of each window. The
    _mandelbrot_usage parameter allows for checks to ensure the data is suitable
    for Mandelbrot Channel calculations, i.e., that the deviate series was
    calculated correctly, with the end of each series being 0, meaning the trend
    (the mean over the window_index) was successfully removed from the data.
    """
    if isinstance(data, pl.DataFrame | pl.LazyFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort:
            data = data.sort(sort_cols)

        over_cols = _set_over_cols(data, "symbol", "window_index")
        if over_cols:
            out = data.set_sorted(sort_cols).with_columns(
                pl.col(_column_name).cum_sum().over(over_cols).alias("cum_sum")
            )
        else:
            out = data.with_columns(
                pl.col(_column_name).cum_sum().alias("cum_sum")
            )
    elif isinstance(data, pl.Series):
        out = data.cum_sum().alias("cum_sum")
    else:
        msg = "No DataFrame/LazyFrame/Series was provided."
        raise HumblDataError(msg)

    if _mandelbrot_usage:
        _cumsum_check(out, _column_name="cum_sum")

    return out

humbldata.toolbox.toolbox_helpers.std ¤

std(data: LazyFrame | DataFrame | Series, _column_name: str = 'cum_sum') -> LazyFrame | DataFrame | Series

Context: Toolbox || Category: Helpers || Command: std.

Calculate the standard deviation of the cumulative deviate series within each window of the dataset.

Parameters:

Name Type Description Default
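data LazyFrame | DataFrame | Series

The data from which to calculate the standard deviation.

required
_column_name str

The name of the column from which to calculate the standard deviation. Default is "cum_sum".

'cum_sum'

Returns:

Type Description
LazyFrame | DataFrame | Series

For a DataFrame or LazyFrame, the input data with the standard deviation added as a new column: f"{_column_name}_std", computed per window, when _set_over_cols returns grouping columns for "symbol" and "window_index", otherwise "S". For a Series, the result of Series.std().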

Improvements

To make this function even dumber, parametrize the .over() call so that it does not always calculate per window_index.
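
As a hedged illustration of the per-window calculation, the sketch below computes one sample standard deviation per `window_index` in pure Python and attaches it to every row of that window. The rows are invented, and `statistics.stdev` uses the n - 1 denominator, which matches the default `ddof=1` of Polars `std`:

```python
from statistics import stdev

# Toy (window_index, cum_sum) rows; the values are made up for illustration.
rows = [
    (0, 0.01), (0, -0.01), (0, 0.01), (0, 0.0),
    (1, -0.03), (1, -0.03), (1, 0.0),
]

by_window = {}
for window_index, value in rows:
    by_window.setdefault(window_index, []).append(value)

# One sample standard deviation (n - 1 denominator) per window.
window_std = {idx: stdev(vals) for idx, vals in by_window.items()}

# Every row keeps its place and carries the value for its own window.
with_std = [(idx, value, window_std[idx]) for idx, value in rows]
print(len(with_std), sorted(window_std))
```

The row count is unchanged, which is the difference between a windowed expression and a group-by aggregation.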

Source code in src\humbldata\toolbox\toolbox_helpers.py
def std(
    data: pl.LazyFrame | pl.DataFrame | pl.Series, _column_name: str = "cum_sum"
) -> pl.LazyFrame | pl.DataFrame | pl.Series:
    """
    Context: Toolbox || Category: Helpers || **Command: std**.

    Calculate the standard deviation of the cumulative deviate series within
    each window of the dataset.

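    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame | pl.Series
        The data from which to calculate the standard deviation.
    _column_name : str, optional
        The name of the column from which to calculate the standard deviation.
        Default is "cum_sum".

    Returns
    -------
    pl.LazyFrame | pl.DataFrame | pl.Series
        For a DataFrame or LazyFrame, the input data with the standard
        deviation added as a new column: `f"{_column_name}_std"`, computed per
        window, when `_set_over_cols` returns grouping columns for "symbol" and
        "window_index", otherwise "S". For a Series, the result of
        `Series.std()`.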

    Improvements
    -----------
    To make this function even dumber, parametrize the `.over()` call so that
    it does not always calculate per `window_index`.
    """
    if isinstance(data, pl.Series):
        out = data.std()
    elif isinstance(data, pl.DataFrame | pl.LazyFrame):
        sort_cols = _set_sort_cols(data, "symbol", "date")
        over_cols = _set_over_cols(data, "symbol", "window_index")

        if over_cols:
            out = data.set_sorted(sort_cols).with_columns(
                [
                    pl.col(_column_name)
                    .std()
                    .over(over_cols)
                    .alias(f"{_column_name}_std"),  # used to be 'S'
                ]
            )
        else:
            out = data.with_columns(
                pl.col(_column_name).std().alias("S"),
            )

    return out

humbldata.toolbox.toolbox_helpers.mean ¤

mean(data: DataFrame | LazyFrame | Series, _column_name: str = 'log_returns', *, _sort: bool = True) -> DataFrame | LazyFrame

Context: Toolbox || Category: Helpers || Function: mean.

This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

This function calculates the mean of a column (<_column_name>) over each window in the dataset, if there are any. This window is intended to be the window that is passed in the calc_mandelbrot_channel() function. The mean calculated is meant to be used as the mean of each window within the time series. This way, each block of windows has their own mean, which can then be used to normalize the data (i.e., remove the mean) from each window section.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The DataFrame or LazyFrame to calculate the mean on.

required
_column_name str

The name of the column to calculate the mean on.

'log_returns'
_sort bool

If True, sorts the DataFrame or LazyFrame by date before calculation. Default is True.

True

Returns:

Type Description
DataFrame | LazyFrame

The original DataFrame or LazyFrame with a window_mean & date column, which contains the mean of 'log_returns' per range/window.

Notes

A plain aggregation would reduce the number of observations in the dataset and drop the original data. To avoid that, the mean is computed with a window expression (.over()), so each window_mean value is mapped back to every row of its window and the original data is kept.
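
The per-window mean can be sketched in plain Python. This is only an illustration under stated assumptions: rows are dicts with `window_index` and `log_returns` keys, and `window_means` is a hypothetical helper that stands in for the Polars `.mean().over(...)` expression used in the source listing; it is not part of humbldata.

```python
from collections import defaultdict
from statistics import fmean


def window_means(rows, column="log_returns", window_key="window_index"):
    """Attach the mean of `column` within each window to every row."""
    # Collect the values that belong to each window.
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[window_key]].append(row[column])

    # Compute one mean per window, then broadcast it back onto the rows.
    means = {window: fmean(values) for window, values in grouped.items()}
    return [{**row, "window_mean": means[row[window_key]]} for row in rows]


rows = [
    {"window_index": 0, "log_returns": 0.02},
    {"window_index": 0, "log_returns": 0.04},
    {"window_index": 1, "log_returns": -0.01},
]
result = window_means(rows)
```

Subtracting `window_mean` from `log_returns` in each row would then remove the mean from every window independently.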

Source code in src\humbldata\toolbox\toolbox_helpers.py
def mean(
    data: pl.DataFrame | pl.LazyFrame | pl.Series,
    _column_name: str = "log_returns",
    *,
    _sort: bool = True,
) -> pl.DataFrame | pl.LazyFrame:
    """
    Context: Toolbox || Category: Helpers || **Function: mean**.

    This is a DUMB command. It can be used in any CONTEXT or CATEGORY.

    This function calculates the mean of a column (<_column_name>) over
    each window in the dataset, if there are any.
    This window is intended to be the `window` that is passed in the
    `calc_mandelbrot_channel()` function. The mean calculated is meant to be
    used as the mean of each `window` within the time series. This
    way, each block of windows has its own mean, which can then be used to
    normalize the data (i.e. remove the mean) within each window section.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The DataFrame or LazyFrame to calculate the mean on.
    _column_name : str
        The name of the column to calculate the mean on.
    _sort : bool
        If True, sorts the DataFrame or LazyFrame by date before calculation.
        Default is True.

    Returns
    -------
    pl.DataFrame | pl.LazyFrame
        The original DataFrame or LazyFrame with a `window_mean` & `date` column,
        which contains the mean of 'log_returns' per range/window.


    Notes
    -----
    A plain aggregation would reduce the number of observations in the
    dataset and drop the original data. To avoid that, the mean is computed
    with a window expression (`.over()`), so each `window_mean` value is
    mapped back to every row of its window and the original data is kept.

    """
    if isinstance(data, pl.Series):
        out = data.mean()
    else:
        if data is None:
            msg = "No DataFrame was passed to the `mean()` function."
            raise HumblDataError(msg)
        sort_cols = _set_sort_cols(data, "symbol", "date")
        over_cols = _set_over_cols(data, "symbol", "window_index")
        if _sort and sort_cols:  # Check if _sort is True
            data = data.sort(sort_cols).set_sorted(sort_cols)
        if over_cols:
            out = data.with_columns(
                pl.col(_column_name).mean().over(over_cols).alias("window_mean")
            )
        else:
            out = data.with_columns(pl.col(_column_name).mean().alias("mean"))
        if sort_cols:
            out = out.sort(sort_cols)
    return out

humbldata.toolbox.toolbox_helpers.range_ ¤

range_(data: LazyFrame | DataFrame | Series, _column_name: str = 'cum_sum', *, _sort: bool = True) -> LazyFrame | DataFrame | Series

Context: Toolbox || Category: Technical || Sub-Category: MandelBrot Channel || Sub-Category: Helpers || Function: mandelbrot_range.

Calculate the range (max - min) of the cumulative deviate values of a specified column in a DataFrame for each window in the dataset, if there are any.

Parameters:

Name Type Description Default
data LazyFrame

The DataFrame to calculate the range from.

required
_column_name str

The column to calculate the range from, by default "cum_sum".

'cum_sum'

Returns:

Type Description
LazyFrame | DataFrame

A DataFrame with the range of the specified column for each window.
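
As a rough illustration of the range calculation, the sketch below computes max, min and range per window in plain Python. It assumes rows are dicts with `window_index` and `cum_sum` keys; `window_ranges` is a hypothetical helper written for this example, and the added key names simply mirror the `_min`, `_max` and `_range` suffixes used in the source listing.

```python
from collections import defaultdict


def window_ranges(rows, column="cum_sum", window_key="window_index"):
    """Attach the min, max and range (max - min) of `column` per window."""
    # Collect the values that belong to each window.
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[window_key]].append(row[column])

    out = []
    for row in rows:
        values = grouped[row[window_key]]
        lo, hi = min(values), max(values)
        out.append(
            {
                **row,
                f"{column}_min": lo,
                f"{column}_max": hi,
                f"{column}_range": hi - lo,
            }
        )
    return out


rows = [
    {"window_index": 0, "cum_sum": 0.5},
    {"window_index": 0, "cum_sum": -1.0},
    {"window_index": 0, "cum_sum": 2.0},
    {"window_index": 1, "cum_sum": 1.0},
    {"window_index": 1, "cum_sum": 1.5},
]
ranges = window_ranges(rows)
```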

Source code in src\humbldata\toolbox\toolbox_helpers.py
def range_(
    data: pl.LazyFrame | pl.DataFrame | pl.Series,
    _column_name: str = "cum_sum",
    *,
    _sort: bool = True,
) -> pl.LazyFrame | pl.DataFrame | pl.Series:
    """
    Context: Toolbox || Category: Technical || Sub-Category: MandelBrot Channel || Sub-Category: Helpers || **Function: mandelbrot_range**.

    Calculate the range (max - min) of the cumulative deviate values of a
    specified column in a DataFrame for each window in the dataset, if there are any.

    Parameters
    ----------
    data : pl.LazyFrame
        The DataFrame to calculate the range from.
    _column_name : str, optional
        The column to calculate the range from, by default "cum_sum".

    Returns
    -------
    pl.LazyFrame | pl.DataFrame
        A DataFrame with the range of the specified column for each window.
    """
    if isinstance(data, pl.Series):
        # A Series has no columns to sort or group over, so return its range directly.
        return data.max() - data.min()

    sort_cols = _set_sort_cols(data, "symbol", "date")
    over_cols = _set_over_cols(data, "symbol", "window_index")
    if _sort:
        data = data.sort(sort_cols)
    if over_cols:
        out = (
            data.set_sorted(sort_cols)
            .with_columns(
                [
                    pl.col(_column_name)
                    .min()
                    .over(over_cols)
                    .alias(f"{_column_name}_min"),
                    pl.col(_column_name)
                    .max()
                    .over(over_cols)
                    .alias(f"{_column_name}_max"),
                ]
            )
            .sort(sort_cols)
            .with_columns(
                (
                    pl.col(f"{_column_name}_max")
                    - pl.col(f"{_column_name}_min")
                ).alias(f"{_column_name}_range"),  # used to be 'R'
            )
        )
    else:
        out = (
            data.with_columns(
                [
                    pl.col(_column_name).min().alias(f"{_column_name}_min"),
                    pl.col(_column_name).max().alias(f"{_column_name}_max"),
                ]
            )
            .sort(sort_cols)
            .with_columns(
                (
                    pl.col(f"{_column_name}_max")
                    - pl.col(f"{_column_name}_min")
                ).alias(f"{_column_name}_range"),
            )
        )

    return out

humbldata.toolbox.fundamental ¤

Context: Toolbox || Category: Fundamental.

A category to group all of the fundamental indicators available in the Toolbox().

Fundamental indicators rely on earnings data, company valuation models, balance sheet metrics, etc.

humbldata.toolbox.quantitative ¤

Context: Toolbox || Category: Quantitative.

Quantitative indicators rely on statistical transformations of time series data.

humbldata.toolbox.technical ¤

humbldata.toolbox.technical.technical_controller ¤

Context: Toolbox || Category: Technical.

A controller to manage and compile all of the technical indicator models available. This will be passed as a @property to the Toolbox() class, giving access to the technical module and its functions.

humbldata.toolbox.technical.technical_controller.Technical ¤

Module for all technical analysis.

Attributes:

Name Type Description
standard_params ToolboxQueryParams

The standard query parameters for toolbox data.

Methods:

Name Description
mandelbrot_channel

Calculate the rescaled range statistics.

Source code in src\humbldata\toolbox\technical\technical_controller.py
class Technical:
    """
    Module for all technical analysis.

    Attributes
    ----------
    standard_params : ToolboxQueryParams
        The standard query parameters for toolbox data.

    Methods
    -------
    mandelbrot_channel(command_params: MandelbrotChannelQueryParams)
        Calculate the rescaled range statistics.

    """

    def __init__(self, context_params):
        self._context_params = context_params

    def mandelbrot_channel(self, command_params: MandelbrotChannelQueryParams):
        """
        Calculate the rescaled range statistics.

        Explain the math...
        """
        from humbldata.core.standard_models.toolbox.technical.mandelbrotchannel import (
            MandelbrotChannelFetcher,
        )

        # Instantiate the Fetcher with the query parameters
        fetcher = MandelbrotChannelFetcher(self._context_params, command_params)

        # Use the fetcher to get the data
        return fetcher.fetch_data()

humbldata.toolbox.technical.technical_controller.Technical.mandelbrot_channel ¤

mandelbrot_channel(command_params: MandelbrotChannelQueryParams)

Calculate the rescaled range statistics.

Explain the math...

Source code in src\humbldata\toolbox\technical\technical_controller.py
def mandelbrot_channel(self, command_params: MandelbrotChannelQueryParams):
    """
    Calculate the rescaled range statistics.

    Explain the math...
    """
    from humbldata.core.standard_models.toolbox.technical.mandelbrotchannel import (
        MandelbrotChannelFetcher,
    )

    # Instantiate the Fetcher with the query parameters
    fetcher = MandelbrotChannelFetcher(self._context_params, command_params)

    # Use the fetcher to get the data
    return fetcher.fetch_data()

humbldata.toolbox.technical.mandelbrot_channel ¤

humbldata.toolbox.technical.mandelbrot_channel.helpers ¤

Context: Toolbox || Category: Technical || Sub-Category: MandelBrot Channel || Sub-Category: Helpers.

These Toolbox() helpers are used in various calculations in the toolbox context. Most of the helpers will be mathematical transformations of data. These functions should be DUMB functions.

humbldata.toolbox.technical.mandelbrot_channel.helpers.add_window_index ¤

add_window_index(data: LazyFrame | DataFrame, window: str) -> LazyFrame | DataFrame

Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: add_window_index.

Add a column to the dataframe indicating the window grouping for each row in a time series.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame

The input data frame or lazy frame to which the window index will be added.

required
window str

The window size as a string, used to determine the grouping of rows into windows.

required

Returns:

Type Description
LazyFrame | DataFrame

The original data frame or lazy frame with an additional column named "window_index" indicating the window grouping for each row.

Notes
  • This function is essential for calculating the Mandelbrot Channel, where the dataset is split into numerous 'windows', and statistics are calculated for each window.
  • The function adds a dummy symbol column if the data contains only one symbol, to avoid errors in the group_by_dynamic() function.
  • It is utilized within the log_mean() function for window-based calculations.

Examples:

>>> data = pl.DataFrame({"date": ["2021-01-01", "2021-01-02"], "symbol": ["AAPL", "AAPL"], "value": [1, 2]})
>>> window = "1d"
>>> add_window_index(data, window)
shape: (2, 4)
┌────────────┬────────┬───────┬──────────────┐
│ date       ┆ symbol ┆ value ┆ window_index │
│ ---        ┆ ---    ┆ ---   ┆ ---          │
│ date       ┆ str    ┆ i64   ┆ i64          │
╞════════════╪════════╪═══════╪══════════════╡
│ 2021-01-01 ┆ AAPL   ┆ 1     ┆ 0            │
├────────────┼────────┼───────┼──────────────┤
│ 2021-01-02 ┆ AAPL   ┆ 2     ┆ 1            │
└────────────┴────────┴───────┴──────────────┘
Source code in src\humbldata\toolbox\technical\mandelbrot_channel\helpers.py
def add_window_index(
    data: pl.LazyFrame | pl.DataFrame, window: str
) -> pl.LazyFrame | pl.DataFrame:
    """
    Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **add_window_index**.

    Add a column to the dataframe indicating the window grouping for each row in
    a time series.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame
        The input data frame or lazy frame to which the window index will be
        added.
    window : str
        The window size as a string, used to determine the grouping of rows into
        windows.

    Returns
    -------
    pl.LazyFrame | pl.DataFrame
        The original data frame or lazy frame with an additional column named
        "window_index" indicating the window grouping for each row.

    Notes
    -----
    - This function is essential for calculating the Mandelbrot Channel, where
      the dataset is split into numerous 'windows', and statistics are
      calculated for each window.
    - The function adds a dummy `symbol` column if the data contains only one
      symbol, to avoid errors in the `group_by_dynamic()` function.
    - It is utilized within the `log_mean()` function for window-based
      calculations.

    Examples
    --------
    >>> data = pl.DataFrame({"date": ["2021-01-01", "2021-01-02"], "symbol": ["AAPL", "AAPL"], "value": [1, 2]})
    >>> window = "1d"
    >>> add_window_index(data, window)
    shape: (2, 4)
    ┌────────────┬────────┬───────┬──────────────┐
    │ date       ┆ symbol ┆ value ┆ window_index │
    │ ---        ┆ ---    ┆ ---   ┆ ---          │
    │ date       ┆ str    ┆ i64   ┆ i64          │
    ╞════════════╪════════╪═══════╪══════════════╡
    │ 2021-01-01 ┆ AAPL   ┆ 1     ┆ 0            │
    ├────────────┼────────┼───────┼──────────────┤
    │ 2021-01-02 ┆ AAPL   ┆ 2     ┆ 1            │
    └────────────┴────────┴───────┴──────────────┘
    """

    def _create_monthly_window_index(col: str, k: int = 1):
        year_diff = pl.col(col).last().dt.year() - pl.col(col).dt.year()
        month_diff = pl.col(col).last().dt.month() - pl.col(col).dt.month()
        day_indicator = pl.col(col).dt.day() > pl.col(col).last().dt.day()
        return (12 * year_diff + month_diff - day_indicator) // k

    # Clean the window into standardized strings (i.e. "1month"/"1 month" = "1mo")
    window = _window_format(window, _return_timedelta=False)  # returns `str`

    if "w" in window or "d" in window:
        msg = "The window cannot include 'd' or 'w', the window needs to be larger than 1 month!"
        raise HumblDataError(msg)

    window_monthly = _window_format_monthly(window)

    # Adding a 'dummy' column if only one symbol is present in data, to avoid
    # errors in the group_by_dynamic() function
    if "symbol" not in data.columns:
        data = data.with_columns(pl.lit("dummy").alias("symbol"))

    data = data.with_columns(
        _create_monthly_window_index(col="date", k=window_monthly)
        .alias("window_index")
        .over("symbol")
    )

    return data
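
The arithmetic inside `_create_monthly_window_index` can be checked with a plain-Python sketch. The function `monthly_window_index` below is a hypothetical stand-in that applies the same formula to `datetime.date` values instead of Polars expressions; `last` plays the role of the last date in the column.

```python
from datetime import date


def monthly_window_index(d: date, last: date, k: int = 1) -> int:
    """Count whole months between `d` and `last`, floor-divided by `k`."""
    year_diff = last.year - d.year
    month_diff = last.month - d.month
    # A later day-of-month than `last` means the final month is not complete,
    # so one month is subtracted (True behaves as 1 in integer arithmetic).
    day_indicator = d.day > last.day
    return (12 * year_diff + month_diff - day_indicator) // k


last = date(2021, 3, 15)
dates = [
    date(2020, 12, 31),
    date(2021, 2, 10),
    date(2021, 2, 20),
    date(2021, 3, 1),
]
indices = [monthly_window_index(d, last) for d in dates]
print(indices)  # [2, 1, 0, 0]
```

With this formula the most recent observations land in window 0 and the index grows going back in time. A larger `k` (for example 3 for a three-month window) merges consecutive monthly indices into one window.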

humbldata.toolbox.technical.mandelbrot_channel.helpers.vol_buckets ¤

vol_buckets(data: DataFrame | LazyFrame, lo_quantile: float = 0.4, hi_quantile: float = 0.8, _column_name_volatility: str = 'realized_volatility', *, _boundary_group_down: bool = False) -> LazyFrame

Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: vol_buckets.

Splitting data observations into 3 volatility buckets: low, mid and high. The function does this for each symbol present in the data.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame

The input dataframe or lazy frame.

required
lo_quantile float

The lower quantile for bucketing. Default is 0.4.

0.4
hi_quantile float

The higher quantile for bucketing. Default is 0.8.

0.8
_column_name_volatility str

The name of the column to apply volatility bucketing. Default is "realized_volatility".

'realized_volatility'
_boundary_group_down bool

If True, then group boundary values down to the lower bucket, using vol_buckets_alt(). If False, then group boundary values up to the higher bucket, using the Polars .qcut() method. Default is False.

False

Returns:

Type Description
LazyFrame

The data with an additional column: vol_bucket

Source code in src\humbldata\toolbox\technical\mandelbrot_channel\helpers.py
def vol_buckets(
    data: pl.DataFrame | pl.LazyFrame,
    lo_quantile: float = 0.4,
    hi_quantile: float = 0.8,
    _column_name_volatility: str = "realized_volatility",
    *,
    _boundary_group_down: bool = False,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **vol_buckets**.

    Splitting data observations into 3 volatility buckets: low, mid and high.
    The function does this for each `symbol` present in the data.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame
        The input dataframe or lazy frame.
    lo_quantile : float
        The lower quantile for bucketing. Default is 0.4.
    hi_quantile : float
        The higher quantile for bucketing. Default is 0.8.
    _column_name_volatility : str
        The name of the column to apply volatility bucketing. Default is
        "realized_volatility".
    _boundary_group_down : bool, default False
        If True, then group boundary values down to the lower bucket, using
        `vol_buckets_alt()`. If False, then group boundary values up to the
        higher bucket, using the Polars `.qcut()` method.
        Default is False.

    Returns
    -------
    pl.LazyFrame
        The `data` with an additional column: `vol_bucket`
    """
    _check_required_columns(data, _column_name_volatility, "symbol")

    if not _boundary_group_down:
        # Grouping Boundary Values in Higher Bucket
        out = data.lazy().with_columns(
            pl.col(_column_name_volatility)
            .qcut(
                [lo_quantile, hi_quantile],
                labels=["low", "mid", "high"],
                left_closed=False,
            )
            .over("symbol")
            .alias("vol_bucket")
            .cast(pl.Utf8)
        )
    else:
        out = vol_buckets_alt(
            data, lo_quantile, hi_quantile, _column_name_volatility
        )

    return out

humbldata.toolbox.technical.mandelbrot_channel.helpers.vol_buckets_alt ¤

vol_buckets_alt(data: DataFrame | LazyFrame, lo_quantile: float = 0.4, hi_quantile: float = 0.8, _column_name_volatility: str = 'realized_volatility') -> LazyFrame

Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: vol_buckets_alt.

This is an alternative implementation of vol_buckets() using expressions, and not using .qcut(). The biggest difference is how the function groups values on the boundaries of quantiles: this function groups boundary values down to the lower bucket. It splits data observations into 3 volatility buckets (low, mid and high) and does this for each symbol present in the data.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame

The input dataframe or lazy frame.

required
lo_quantile float

The lower quantile for bucketing. Default is 0.4.

0.4
hi_quantile float

The higher quantile for bucketing. Default is 0.8.

0.8
_column_name_volatility str

The name of the column to apply volatility bucketing. Default is "realized_volatility".

'realized_volatility'

Returns:

Type Description
LazyFrame

The data with an additional column: vol_bucket

Notes

The biggest difference is how the function groups values on the boundaries of quantiles. This function groups boundary values down to the lower bucket. So, if there is a value that lies on the mid/low border, this function will group it with low, whereas vol_buckets() will group it with mid.

This function is also slightly less performant.
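
The boundary behavior described in these notes can be made concrete with a small sketch. It rests on several assumptions: `bucket` is a hypothetical helper, the two thresholds are passed in directly (the library derives them per symbol from `lo_quantile` and `hi_quantile`), and the `group_down=False` branch only follows the behavior this documentation describes for `vol_buckets()`; it is not a reimplementation of Polars `.qcut()`.

```python
def bucket(value: float, low: float, high: float, *, group_down: bool) -> str:
    """Assign a volatility value to the 'low', 'mid' or 'high' bucket."""
    if group_down:
        # Values exactly on a threshold fall into the lower bucket (<=).
        if value <= low:
            return "low"
        if value <= high:
            return "mid"
        return "high"
    # Values exactly on a threshold move up into the higher bucket (<).
    if value < low:
        return "low"
    if value < high:
        return "mid"
    return "high"


values = [0.05, 0.1, 0.2, 0.3, 0.4]
down = [bucket(v, 0.1, 0.3, group_down=True) for v in values]
up = [bucket(v, 0.1, 0.3, group_down=False) for v in values]
print(down)  # ['low', 'low', 'mid', 'mid', 'high']
print(up)    # ['low', 'mid', 'mid', 'high', 'high']
```

Only the two values sitting exactly on a threshold (0.1 and 0.3) change bucket between the two modes.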

Source code in src\humbldata\toolbox\technical\mandelbrot_channel\helpers.py
def vol_buckets_alt(
    data: pl.DataFrame | pl.LazyFrame,
    lo_quantile: float = 0.4,
    hi_quantile: float = 0.8,
    _column_name_volatility: str = "realized_volatility",
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **vol_buckets_alt**.

    This is an alternative implementation of `vol_buckets()` using expressions,
    and not using `.qcut()`.
    The biggest difference is how the function groups values on the boundaries
    of quantiles: this function groups boundary values down to the lower bucket.
    It splits data observations into 3 volatility buckets: low, mid and high,
    and does this for each `symbol` present in the data.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame
        The input dataframe or lazy frame.
    lo_quantile : float
        The lower quantile for bucketing. Default is 0.4.
    hi_quantile : float
        The higher quantile for bucketing. Default is 0.8.
    _column_name_volatility : str
        The name of the column to apply volatility bucketing. Default is "realized_volatility".

    Returns
    -------
    pl.LazyFrame
        The `data` with an additional column: `vol_bucket`

    Notes
    -----
    The biggest difference is how the function groups values on the boundaries
    of quantiles. This function __groups boundary values down__ to the lower bucket.
    So, if there is a value that lies on the mid/low border, this function will
    group it with `low`, whereas `vol_buckets()` will group it with `mid`.

    This function is also slightly less performant.
    """
    # Calculate low and high quantiles for each symbol
    low_vol = pl.col(_column_name_volatility).quantile(lo_quantile)
    high_vol = pl.col(_column_name_volatility).quantile(hi_quantile)

    # Determine the volatility bucket for each row using expressions
    vol_bucket = (
        pl.when(pl.col(_column_name_volatility) <= low_vol)
        .then(pl.lit("low"))
        .when(pl.col(_column_name_volatility) <= high_vol)
        .then(pl.lit("mid"))
        .otherwise(pl.lit("high"))
        .alias("vol_bucket")
    )

    # Add the volatility bucket column to the data
    out = data.lazy().with_columns(vol_bucket.over("symbol"))

    return out

humbldata.toolbox.technical.mandelbrot_channel.helpers.vol_filter ¤

vol_filter(data: DataFrame | LazyFrame) -> LazyFrame

Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: vol_filter.

If _rv_adjustment is True, then filter the data to only include rows that are in the same vol_bucket as the latest row for each symbol.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The input dataframe or lazy frame. This should be the output of vol_buckets() function in calc_mandelbrot_channel().

required

Returns:

Type Description
LazyFrame

The data with only observations in the same volatility bucket as the most recent data observation.
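
A minimal plain-Python sketch of this filter follows. It assumes rows are dicts with `symbol` and `vol_bucket` keys, already in chronological order; `filter_to_last_bucket` is a hypothetical helper for illustration and not the library's implementation.

```python
def filter_to_last_bucket(rows):
    """Keep rows whose vol_bucket matches the latest bucket of their symbol."""
    # Later rows overwrite earlier ones, so each symbol ends up mapped to the
    # bucket of its most recent observation.
    last_bucket = {row["symbol"]: row["vol_bucket"] for row in rows}
    return [
        row for row in rows if row["vol_bucket"] == last_bucket[row["symbol"]]
    ]


rows = [
    {"symbol": "AAPL", "vol_bucket": "low"},
    {"symbol": "AAPL", "vol_bucket": "high"},
    {"symbol": "AAPL", "vol_bucket": "low"},
    {"symbol": "MSFT", "vol_bucket": "mid"},
    {"symbol": "MSFT", "vol_bucket": "high"},
]
filtered = filter_to_last_bucket(rows)
```

Here the two `low` rows survive for AAPL and only the `high` row survives for MSFT, because those are the buckets of each symbol's last observation.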

Source code in src\humbldata\toolbox\technical\mandelbrot_channel\helpers.py
def vol_filter(
    data: pl.DataFrame | pl.LazyFrame,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **vol_filter**.

    If `_rv_adjustment` is True, then filter the data to only include rows
    that are in the same vol_bucket as the latest row for each symbol.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input dataframe or lazy frame. This should be the output of
        `vol_buckets()` function in `calc_mandelbrot_channel()`.

    Returns
    -------
    pl.LazyFrame
        The data with only observations in the same volatility bucket as the
        most recent data observation.
    """
    _check_required_columns(data, "vol_bucket", "symbol")

    data = data.lazy().with_columns(
        pl.col("vol_bucket").last().over("symbol").alias("last_vol_bucket")
    )

    out = data.filter(
        (pl.col("vol_bucket") == pl.col("last_vol_bucket")).over("symbol")
    ).drop("last_vol_bucket")

    return out

humbldata.toolbox.technical.mandelbrot_channel.helpers.price_range ¤

price_range(data: LazyFrame | DataFrame, recent_price_data: DataFrame | LazyFrame | None = None, rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', _detrended_returns: str = 'detrended_log_returns', _column_name_cum_sum_max: str = 'cum_sum_max', _column_name_cum_sum_min: str = 'cum_sum_min', *, _rv_adjustment: bool = False, _sort: bool = True, **kwargs) -> LazyFrame

Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: price_range.

Calculate the price range for a given dataset using the Mandelbrot method.

This function computes the price range based on the recent price data, cumulative sum max and min, and RS method specified. It supports adjustments for realized volatility and sorting of the data based on symbols and dates.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame

The dataset containing the financial data.

required
recent_price_data DataFrame | LazyFrame | None

The dataset containing the most recent price data. If None, the most recent prices are extracted from data.

None
rs_method Literal['RS', 'RS_mean', 'RS_max', 'RS_min']

The RS value to use. Must be one of 'RS', 'RS_mean', 'RS_max', 'RS_min'. RS is the column that is the Range/STD of the detrended returns.

"RS"
_detrended_returns str

The column name for detrended returns in data

"detrended_log_returns"
_column_name_cum_sum_max str

The column name for cumulative sum max in data

"cum_sum_max"
_column_name_cum_sum_min str

The column name for cumulative sum min in data

"cum_sum_min"
_rv_adjustment bool

If True, calculates the std() for all observations (since they have already been filtered by volatility bucket). If False, then calculates the std() for the most recent window_index and uses that to adjust the price range.

False
_sort bool

If True, sorts the data based on symbols and dates.

True
**kwargs

Arbitrary keyword arguments.

{}

Returns:

Type Description
LazyFrame

The dataset with calculated price range, including columns for top and bottom prices.

Raises:

Type Description
HumblDataError

If the RS method specified is not supported.

Examples:

>>> price_range_data = price_range(data, recent_price_data=None, rs_method="RS")
>>> print(price_range_data.columns)
['symbol', 'bottom_price', 'recent_price', 'top_price']

Notes

For rs_method, you should know how this affects the mandelbrot channel that is produced. Selecting RS uses the most recent RS value to calculate the price range, whereas selecting RS_mean, RS_max, or RS_min uses the mean, max, or min of the RS values, respectively.
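
How `rs_method` picks a single RS value can be sketched as follows. This is an illustration only: `select_rs` is a hypothetical helper operating on a plain list of per-window RS values for one symbol, in chronological order, and it raises `ValueError` as a stand-in for the library's `HumblDataError`.

```python
from statistics import fmean


def select_rs(rs_values, rs_method="RS"):
    """Reduce a sequence of RS values to the one used for the price range."""
    selectors = {
        "RS": lambda values: values[-1],  # most recent RS value
        "RS_mean": fmean,
        "RS_max": max,
        "RS_min": min,
    }
    if rs_method not in selectors:
        msg = "rs_method must be one of 'RS', 'RS_mean', 'RS_max', 'RS_min'"
        raise ValueError(msg)
    return selectors[rs_method](rs_values)


rs_values = [1.0, 2.0, 0.5, 1.5]
chosen = {method: select_rs(rs_values, method)
          for method in ("RS", "RS_mean", "RS_max", "RS_min")}
print(chosen)  # {'RS': 1.5, 'RS_mean': 1.25, 'RS_max': 2.0, 'RS_min': 0.5}
```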

Source code in src\humbldata\toolbox\technical\mandelbrot_channel\helpers.py
def price_range(
    data: pl.LazyFrame | pl.DataFrame,
    recent_price_data: pl.DataFrame | pl.LazyFrame | None = None,
    rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    _detrended_returns: str = "detrended_log_returns",  # Parameterized detrended_returns column
    _column_name_cum_sum_max: str = "cum_sum_max",
    _column_name_cum_sum_min: str = "cum_sum_min",
    *,
    _rv_adjustment: bool = False,
    _sort: bool = True,
    **kwargs,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: MandelBrot Channel || Sub-Category: Helpers || Command: **price_range**.

    Calculate the price range for a given dataset using the Mandelbrot method.

    This function computes the price range based on the recent price data,
    cumulative sum max and min, and RS method specified. It supports adjustments
    for realized volatility and sorting of the data based on symbols and dates.

    Parameters
    ----------
    data : pl.LazyFrame | pl.DataFrame
        The dataset containing the financial data.
    recent_price_data : pl.DataFrame | pl.LazyFrame | None
        The dataset containing the most recent price data. If None, the most recent prices are extracted from `data`.
    rs_method : Literal["RS", "RS_mean", "RS_max", "RS_min"], default "RS"
        The RS value to use. Must be one of 'RS', 'RS_mean', 'RS_max', 'RS_min'.
        RS is the column that is the Range/STD of the detrended returns.
    _detrended_returns : str, default "detrended_log_returns"
        The column name for detrended returns in `data`
    _column_name_cum_sum_max : str, default "cum_sum_max"
        The column name for cumulative sum max in `data`
    _column_name_cum_sum_min : str, default "cum_sum_min"
        The column name for cumulative sum min in `data`
    _rv_adjustment : bool, default False
        If True, calculates the `std()` for all observations (since they have
        already been filtered by volatility bucket). If False, then calculates
        the `std()` for the most recent `window_index`
        and uses that to adjust the price range.
    _sort : bool, default True
        If True, sorts the data based on symbols and dates.
    **kwargs
        Arbitrary keyword arguments.

    Returns
    -------
    pl.LazyFrame
        The dataset with calculated price range, including columns for top and
        bottom prices.

    Raises
    ------
    HumblDataError
        If the RS method specified is not supported.

    Examples
    --------
    >>> price_range_data = price_range(data, recent_price_data=None, rs_method="RS")
    >>> print(price_range_data.columns)
    ['symbol', 'bottom_price', 'recent_price', 'top_price']

    Notes
    -----
    For `rs_method`, you should know how this affects the mandelbrot channel
    that is produced. Selecting RS uses the most recent RS value to calculate
    the price range, whereas selecting RS_mean, RS_max, or RS_min uses the mean,
    max, or min of the RS values, respectively.
    """
    # Check if RS_method is one of the allowed values
    if rs_method not in RS_METHODS:
        msg = "RS_method must be one of 'RS', 'RS_mean', 'RS_max', 'RS_min'"
        raise HumblDataError(msg)

    if isinstance(data, pl.DataFrame):
        data = data.lazy()

    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort:
        data = data.sort(sort_cols)

    # Define Polars Expressions ================================================
    last_cum_sum_max = (
        pl.col(_column_name_cum_sum_max).last().alias("last_cum_sum_max")
    )
    last_cum_sum_min = (
        pl.col(_column_name_cum_sum_min).last().alias("last_cum_sum_min")
    )
    # Define a conditional expression for std_detrended_returns based on _rv_adjustment
    std_detrended_returns_expr = (
        pl.col(_detrended_returns).std().alias(f"std_{_detrended_returns}")
        if _rv_adjustment
        else pl.col(_detrended_returns)
        .filter(pl.col("window_index") == pl.col("window_index").min())
        .std()
        .alias(f"std_{_detrended_returns}")
    )
    date_expr = pl.col("date").max()
    # ===========================================================================

    if rs_method == "RS":
        rs_expr = pl.col("RS").last().alias("RS")
    elif rs_method == "RS_mean":
        rs_expr = pl.col("RS").mean().alias("RS_mean")
    elif rs_method == "RS_max":
        rs_expr = pl.col("RS").max().alias("RS_max")
    elif rs_method == "RS_min":
        rs_expr = pl.col("RS").min().alias("RS_min")

    if recent_price_data is None:
        # if no recent_prices_data is passed, then pull the most recent prices from the data
        recent_price_expr = pl.col("close").last().alias("recent_price")
        # Perform a single group_by operation to calculate both STD of detrended returns and RS statistics
        price_range_data = (
            data.group_by("symbol")
            .agg(
                [
                    date_expr,
                    # Conditional STD calculation based on _rv_adjustment
                    std_detrended_returns_expr,
                    # Recent Price Data
                    recent_price_expr,
                    # cum_sum_max/min last
                    last_cum_sum_max,
                    last_cum_sum_min,
                    # RS statistics
                    rs_expr,
                ]
            )
            # Price range = RS statistic * STD of detrended returns * recent price
            .with_columns(
                (
                    pl.col(rs_method)
                    * pl.col(f"std_{_detrended_returns}")
                    * pl.col("recent_price")
                ).alias("price_range")
            )
            .sort("symbol")
        )
    else:
        price_range_data = (
            data.group_by("symbol")
            .agg(
                [
                    date_expr,
                    # Conditional STD calculation based on _rv_adjustment
                    std_detrended_returns_expr,
                    # cum_sum_max/min last
                    last_cum_sum_max,
                    last_cum_sum_min,
                    # RS statistics
                    rs_expr,
                ]
            )
            # Join with recent_price_data on symbol
            .join(recent_price_data.lazy(), on="symbol")
            .with_columns(
                (
                    pl.col(rs_method)
                    * pl.col(f"std_{_detrended_returns}")
                    * pl.col("recent_price")
                ).alias("price_range")
            )
            .sort("symbol")
        )
    # Relative Position Modifier
    out = _price_range_engine(price_range_data)

    return out
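
As a rough illustration of the docstring Notes above, the sketch below shows how each `rs_method` option reduces a series of RS values to one number, and how that number enters the `price_range` product (RS * std of detrended returns * recent price). It is plain Python rather than the package's Polars code; `sketch_price_range` and every numeric value are hypothetical and exist only for this example.

```python
from statistics import mean

# Hypothetical RS values for one symbol, oldest first (illustrative only).
rs_values = [1.8, 2.4, 2.1, 2.0]

# How each `rs_method` option collapses the RS column to a single value.
rs_choices = {
    "RS": rs_values[-1],         # most recent RS value
    "RS_mean": mean(rs_values),  # average RS over all windows
    "RS_max": max(rs_values),    # largest RS, hence the widest range
    "RS_min": min(rs_values),    # smallest RS, hence the narrowest range
}


def sketch_price_range(rs: float, std_detrended: float, recent_price: float) -> float:
    """Mirror the `price_range` column: RS * std of detrended returns * price."""
    return rs * std_detrended * recent_price


for name, rs in rs_choices.items():
    print(name, round(sketch_price_range(rs, 0.01, 100.0), 4))
```

With the same standard deviation and price, `RS_max` always yields a range at least as wide as `RS_mean`, which in turn is at least as wide as `RS_min`.
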
humbldata.toolbox.technical.mandelbrot_channel.model ¤

Context: Toolbox || Category: Technical || Command: calc_mandelbrot_channel.

A command to generate a Mandelbrot Channel for any time series.

humbldata.toolbox.technical.mandelbrot_channel.model.calc_mandelbrot_channel ¤
calc_mandelbrot_channel(data: DataFrame | LazyFrame, window: str = '1m', rv_adjustment: bool = True, _rv_method: str = 'std', _rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, _rv_grouped_mean: bool = True, _live_price: bool = True) -> LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || Command: calc_mandelbrot_channel.

Calculates the Mandelbrot Channel for a given time series based on the provided standard and extra parameters.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The time series data for which to calculate the Mandelbrot Channel.

required
window str

The window size for the calculation, specified as a string.

'1m'
rv_adjustment bool

Whether to adjust the calculation for realized volatility.

True
_rv_grouped_mean bool

Whether to use the grouped mean in the realized volatility calculation.

True
_rv_method str

The method to use for calculating realized volatility. You only need to supply a value if rv_adjustment is True.

'std'
_rs_method Literal['RS', 'RS_mean', 'RS_max', 'RS_min']

The method to use for calculating the range over standard deviation: one of RS, RS_mean, RS_min, or RS_max. This changes the width of the calculated Mandelbrot Channel.

'RS'
_live_price bool

Whether to use live price data in the calculation. This may add a significant amount of time to the calculation (1-3s)

True

Returns:

Type Description
LazyFrame

The calculated Mandelbrot Channel data for the given time series.

Notes

Since the function returns a pl.LazyFrame, don't forget to run .collect() on the output to get a DataFrame. Lazy logic saves the calculation for when it is needed.

Source code in src\humbldata\toolbox\technical\mandelbrot_channel\model.py
def calc_mandelbrot_channel(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_adjustment: bool = True,
    _rv_method: str = "std",
    _rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    _rv_grouped_mean: bool = True,
    _live_price: bool = True,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || **Command: calc_mandelbrot_channel**.

    Calculates the Mandelbrot Channel for a given time series based on the
    provided standard and extra parameters.

    Parameters
    ----------
    data: pl.DataFrame | pl.LazyFrame
        The time series data for which to calculate the Mandelbrot Channel.
    window: str, default "1m"
        The window size for the calculation, specified as a string.
    rv_adjustment: bool, default True
        Whether to adjust the calculation for realized volatility.
    _rv_grouped_mean: bool, default True
        Whether to use the grouped mean in the realized volatility calculation.
    _rv_method: str, default "std"
        The method to use for calculating realized volatility. You only need to
        supply a value if `rv_adjustment` is True.
    _rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"], default "RS"
        The method to use for calculating the range over standard deviation:
        one of RS, RS_mean, RS_min, or RS_max. This changes the width of the
        calculated Mandelbrot Channel.
    _live_price: bool, default True
        Whether to use live price data in the calculation. This may add a
        significant amount of time to the calculation (1-3s)

    Returns
    -------
    pl.LazyFrame
        The calculated Mandelbrot Channel data for the given time series.

    Notes
    -----
    Since the function returns a pl.LazyFrame, don't forget to run `.collect()`
    on the output to get a DataFrame. Lazy logic saves the calculation for when
    it is needed.
    """
    # Setup ====================================================================
    window_int = _window_format(window, _return_timedelta=True)
    sort_cols = _set_sort_cols(data, "symbol", "date")

    data = data.lazy()
    # Step 1: Collect Price Data -----------------------------------------------
    # Step X: Add window bins --------------------------------------------------
    # We want date grouping, non-overlapping window bins
    data1 = add_window_index(data, window=window)

    # Step X: Calculate Log Returns + Rvol -------------------------------------
    if "log_returns" not in data1.columns:
        data2 = log_returns(data1, _column_name="close")
    else:
        data2 = data1

    # Step X: Calculate Log Mean Series ----------------------------------------
    if isinstance(data2, pl.DataFrame | pl.LazyFrame):
        data3 = mean(data2)
    else:
        msg = "A series was passed to `mean()` calculation. Please provide a DataFrame or LazyFrame."
        raise HumblDataError(msg)
    # Step X: Calculate Mean De-trended Series ---------------------------------
    data4 = detrend(
        data3, _detrend_value_col="window_mean", _detrend_col="log_returns"
    )
    # Step X: Calculate Cumulative Deviate Series ------------------------------
    data5 = cum_sum(data4, _column_name="detrended_log_returns")
    # Step X: Calculate Mandelbrot Range ---------------------------------------
    data6 = range_(data5, _column_name="cum_sum")
    # Step X: Calculate Standard Deviation -------------------------------------
    data7 = std(data6, _column_name="cum_sum")
    # Step X: Adjust for Realized Volatility (optional) ------------------------
    if rv_adjustment:
        # Step 8.1: Calculate Realized Volatility ------------------------------
        data7 = calc_realized_volatility(
            data=data7,
            window=window,
            method=_rv_method,
            grouped_mean=_rv_grouped_mean,
        )
        # rename col for easy selection
        for col in data7.columns:
            if "volatility_pct" in col:
                data7 = data7.rename({col: "realized_volatility"})
        # Step 8.2: Calculate Volatility Bucket Stats --------------------------
        data7 = vol_buckets(data=data7, lo_quantile=0.3, hi_quantile=0.65)
        data7 = vol_filter(data7)

    # Step X: Calculate RS -----------------------------------------------------
    data8 = data7.sort(sort_cols).with_columns(
        (pl.col("cum_sum_range") / pl.col("cum_sum_std")).alias("RS")
    )

    # Step X: Collect Recent Prices --------------------------------------------
    if _live_price:
        symbols = (
            data.select("symbol").unique().sort("symbol").collect().to_series()
        )
        recent_prices = get_latest_price(symbols)
    else:
        recent_prices = None

    # Step X: Calculate Rescaled Price Range ----------------------------------
    out = price_range(
        data=data8,
        recent_price_data=recent_prices,
        rs_method=_rs_method,
        _rv_adjustment=rv_adjustment,
    )

    return out
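
The pipeline above (log returns, window mean, detrending, cumulative sum, range, standard deviation, RS) can be sketched for a single window in plain Python. This is only a minimal illustration that follows the column names used in the source (`cum_sum_range` divided by `cum_sum_std`); the function name `rescaled_range`, the sample prices, and the use of the sample standard deviation are assumptions, and the package itself performs these steps per window with Polars.

```python
import math
from itertools import accumulate
from statistics import mean, stdev


def rescaled_range(closes: list[float]) -> float:
    """Compute one RS value for a single window of closing prices."""
    # Log returns between consecutive closes.
    log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    # Detrend the returns by subtracting the window mean.
    window_mean = mean(log_returns)
    detrended = [r - window_mean for r in log_returns]
    # Cumulative deviate series.
    cum_sum = list(accumulate(detrended))
    # Range (R) and standard deviation (S) of the cumulative series.
    cum_sum_range = max(cum_sum) - min(cum_sum)
    cum_sum_std = stdev(cum_sum)
    return cum_sum_range / cum_sum_std


closes = [100.0, 102.0, 101.0, 105.0, 103.0, 106.0]
rs = rescaled_range(closes)
print(f"RS = {rs:.3f}")
```

Because the calculation works on log returns, multiplying every price by a constant leaves RS unchanged.
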
humbldata.toolbox.technical.mandelbrot_channel.model.acalc_mandelbrot_channel async ¤
acalc_mandelbrot_channel(data: DataFrame | LazyFrame, window: str = '1m', rv_adjustment: bool = True, _rv_method: str = 'std', _rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, _rv_grouped_mean: bool = True, _live_price: bool = True) -> DataFrame | LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || Command: acalc_mandelbrot_channel.

Asynchronous wrapper for calc_mandelbrot_channel. This function allows calc_mandelbrot_channel to be called in an async context.

Notes

This does not make calc_mandelbrot_channel() non-blocking or asynchronous.
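
To make the note concrete, the sketch below compares a direct call inside an `async def` (the pattern used by this wrapper) with offloading the same call through `asyncio.to_thread`. The function `slow_sync_calc` is a hypothetical stand-in for a blocking computation and is not part of humbldata; the timings are only indicative.

```python
import asyncio
import time


def slow_sync_calc(x: int) -> int:
    """Hypothetical blocking computation, standing in for a synchronous call."""
    time.sleep(0.05)
    return x * 2


async def blocking_wrapper(x: int) -> int:
    # A direct call: awaitable, but it blocks the event loop while it runs.
    return slow_sync_calc(x)


async def threaded_wrapper(x: int) -> int:
    # Runs the synchronous function in a worker thread so the loop stays free.
    return await asyncio.to_thread(slow_sync_calc, x)


async def timed(wrapper) -> float:
    """Run four calls through `wrapper` with gather and return elapsed seconds."""
    start = time.perf_counter()
    results = await asyncio.gather(*(wrapper(i) for i in range(4)))
    assert results == [0, 2, 4, 6]
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(blocking_wrapper))
threaded_elapsed = asyncio.run(timed(threaded_wrapper))
print(f"blocking: {blocking_elapsed:.2f}s, threaded: {threaded_elapsed:.2f}s")
```

With the direct call the four tasks run one after another, so `asyncio.gather` gives no speed-up; with the threaded variant they can overlap.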

Source code in src\humbldata\toolbox\technical\mandelbrot_channel\model.py
async def acalc_mandelbrot_channel(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_adjustment: bool = True,
    _rv_method: str = "std",
    _rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    _rv_grouped_mean: bool = True,
    _live_price: bool = True,
) -> pl.DataFrame | pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || **Command: acalc_mandelbrot_channel**.

    Asynchronous wrapper for calc_mandelbrot_channel.
    This function allows calc_mandelbrot_channel to be called in an async context.

    Notes
    -----
    This does not make `calc_mandelbrot_channel()` non-blocking or asynchronous.
    """
    # Directly call the synchronous calc_mandelbrot_channel function
    return calc_mandelbrot_channel(
        data=data,
        window=window,
        rv_adjustment=rv_adjustment,
        _rv_method=_rv_method,
        _rs_method=_rs_method,
        _rv_grouped_mean=_rv_grouped_mean,
        _live_price=_live_price,
    )
humbldata.toolbox.technical.mandelbrot_channel.model.calc_mandelbrot_channel_historical ¤
calc_mandelbrot_channel_historical(data: DataFrame | LazyFrame, window: str = '1m', rv_adjustment: bool = True, _rv_method: str = 'std', _rs_method: Literal['RS', 'RS_mean', 'RS_max', 'RS_min'] = 'RS', *, _rv_grouped_mean: bool = True, _live_price: bool = True) -> DataFrame | LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || Command: calc_mandelbrot_channel_historical.

Calculates the Mandelbrot Channel for a given time series based on the provided standard and extra parameters, over time. Instead of using the dataset to calculate one statistic at the current point in time, this function starts at the beginning of the dataset and calculates the statistic for each date present in the dataset, up to the current point in time.
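
One way to picture "over time" is an expanding window: for every date, the statistic is recomputed from the data available up to and including that date. The sketch below shows that idea with a simple mean in plain Python. The engine behind this function is not shown here, so the expanding-window reading, the helper name `expanding_statistic`, and the `min_obs` parameter are assumptions made for illustration.

```python
from statistics import mean


def expanding_statistic(dates, values, stat=mean, min_obs=2):
    """Return (date, stat(values up to and including that date)) pairs."""
    out = []
    for i in range(min_obs, len(values) + 1):
        # Only data observed on or before dates[i - 1] is used.
        out.append((dates[i - 1], stat(values[:i])))
    return out


dates = ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
closes = [100.0, 102.0, 101.0, 105.0]
history = expanding_statistic(dates, closes)
for date, value in history:
    print(date, value)
```

Each output row depends only on past and current observations, so the series can be used for backtesting without look-ahead.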

Source code in src\humbldata\toolbox\technical\mandelbrot_channel\model.py
def calc_mandelbrot_channel_historical(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    rv_adjustment: bool = True,
    _rv_method: str = "std",
    _rs_method: Literal["RS", "RS_mean", "RS_max", "RS_min"] = "RS",
    *,
    _rv_grouped_mean: bool = True,
    _live_price: bool = True,
) -> pl.DataFrame | pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Mandelbrot Channel || **Command: calc_mandelbrot_channel_historical**.

    Calculates the Mandelbrot Channel for a given time series based on the
    provided standard and extra parameters, over time. Instead of using the
    dataset to calculate one statistic at the current point in time, this
    function starts at the beginning of the dataset and calculates the
    statistic for each date present in the dataset, up to the current point
    in time.
    """
    nest_asyncio.apply()

    return asyncio.run(
        _acalc_mandelbrot_channel_historical_engine(
            data=data,
            window=window,
            rv_adjustment=rv_adjustment,
            _rv_method=_rv_method,
            _rs_method=_rs_method,
            _rv_grouped_mean=_rv_grouped_mean,
            _live_price=_live_price,
        )
    )

humbldata.toolbox.technical.volatility ¤

humbldata.toolbox.technical.volatility.realized_volatility_helpers ¤

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers.

All of the volatility estimators used in calc_realized_volatility(). These are various methods to calculate the realized volatility of financial data.

humbldata.toolbox.technical.volatility.realized_volatility_helpers.std ¤
std(data: DataFrame | LazyFrame | Series, window: str = '1m', trading_periods=252, _drop_nulls: bool = True, _avg_trading_days: bool = False, _column_name_returns: str = 'log_returns', _sort: bool = True) -> LazyFrame | Series

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: _std.

This function computes the standard deviation of returns, which is a common measure of volatility. It calculates the rolling standard deviation for a given window size, optionally adjusting for the average number of trading days and scaling the result to an annualized volatility percentage.

Parameters:

Name Type Description Default
data Union[DataFrame, LazyFrame, Series]

The input data containing the returns. It can be a DataFrame, LazyFrame, or Series.

required
window str

The rolling window size for calculating the standard deviation. The default is "1m" (one month).

'1m'
trading_periods int

The number of trading periods in a year, used for annualizing the volatility. The default is 252.

252
_drop_nulls bool

If True, null values will be dropped from the result. The default is True.

True
_avg_trading_days bool

If True, the average number of trading days will be used when calculating the window size. The default is False.

False
_column_name_returns str

The name of the column containing the returns. This parameter is used when data is a DataFrame or LazyFrame. The default is "log_returns".

'log_returns'

Returns:

Type Description
LazyFrame | Series

The input data structure with an additional column for the rolling standard deviation of returns, or the modified Series with the rolling standard deviation values.
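
The calculation described above (rolling standard deviation, scaled by the square root of `trading_periods` and multiplied by 100) can be sketched in plain Python. This is a simplified illustration: it uses a count-based window, whereas the source rolls over the `date` column, and the function name `rolling_annualized_vol` and the sample returns are hypothetical.

```python
import math
from statistics import stdev


def rolling_annualized_vol(returns, window, trading_periods=252):
    """Rolling sample std of returns, annualized and expressed in percent."""
    out = []
    for i in range(len(returns)):
        chunk = returns[max(0, i - window + 1) : i + 1]
        if len(chunk) < 2:
            # A sample standard deviation needs at least two observations.
            out.append(None)
        else:
            out.append(stdev(chunk) * math.sqrt(trading_periods) * 100)
    return out


log_returns = [0.01, -0.02, 0.015, 0.005, -0.01]
vol = rolling_annualized_vol(log_returns, window=3)
print(vol)
```

The leading `None` corresponds to the rows that `_drop_nulls=True` would remove from the result.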

Source code in src\humbldata\toolbox\technical\volatility\realized_volatility_helpers.py
def std(
    data: pl.DataFrame | pl.LazyFrame | pl.Series,
    window: str = "1m",
    trading_periods=252,
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _column_name_returns: str = "log_returns",
    _sort: bool = True,
) -> pl.LazyFrame | pl.Series:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: _std**.

    This function computes the standard deviation of returns, which is a common
    measure of volatility. It calculates the rolling standard deviation for a
    given window size, optionally adjusting for the average number of trading
    days and scaling the result to an annualized volatility percentage.

    Parameters
    ----------
    data : Union[pl.DataFrame, pl.LazyFrame, pl.Series]
        The input data containing the returns. It can be a DataFrame, LazyFrame,
        or Series.
    window : str, optional
        The rolling window size for calculating the standard deviation.
        The default is "1m" (one month).
    trading_periods : int, optional
        The number of trading periods in a year, used for annualizing the
        volatility. The default is 252.
    _drop_nulls : bool, optional
        If True, null values will be dropped from the result.
        The default is True.
    _avg_trading_days : bool, optional
        If True, the average number of trading days will be used when
        calculating the window size. The default is False.
    _column_name_returns : str, optional
        The name of the column containing the returns. This parameter is used
        when `data` is a DataFrame or LazyFrame. The default is "log_returns".

    Returns
    -------
    pl.LazyFrame | pl.Series
        The input data structure with an additional column for the rolling
        standard deviation of returns, or the modified Series with the rolling
        standard deviation values.
    """
    window_int: int = _window_format(
        window, _return_timedelta=True, _avg_trading_days=_avg_trading_days
    ).days
    if isinstance(data, pl.Series):
        return data.rolling_std(window_size=window_int, min_periods=1)
    else:
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort:
            data = data.lazy().sort(sort_cols)
        # convert window_timedelta to days to use fixed window
        result = (
            data.lazy()
            .set_sorted(sort_cols)
            .with_columns(
                (
                    pl.col(_column_name_returns).rolling_std(
                        window_size=window_int,
                        min_periods=2,  # using min_periods=2, bc if min_periods=1, the first value will be 0.
                        by="date",
                    )
                    * math.sqrt(trading_periods)
                    * 100
                ).alias(f"std_volatility_pct_{window_int}D")
            )
        )
    if _drop_nulls:
        return result.drop_nulls(subset=f"std_volatility_pct_{window_int}D")
    return result
humbldata.toolbox.technical.volatility.realized_volatility_helpers.parkinson ¤
parkinson(data: DataFrame | LazyFrame, window: str = '1m', _column_name_high: str = 'high', _column_name_low: str = 'low', *, _drop_nulls: bool = True, _avg_trading_days: bool = False, _sort: bool = True) -> LazyFrame

Calculate Parkinson's volatility over a specified window.

Parkinson's volatility is a measure that uses the stock's high and low prices of the day rather than just close to close prices. It is particularly useful for capturing large price movements during the day.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The input data containing the stock prices.

required
window str

The rolling window size for calculating volatility, by default "1m".

'1m'
trading_periods int

The number of trading periods in a year, by default 252.

required
_column_name_high str

The name of the column containing the high prices, by default "high".

'high'
_column_name_low str

The name of the column containing the low prices, by default "low".

'low'
_drop_nulls bool

Whether to drop null values from the result, by default True.

True
_avg_trading_days bool

Whether to use the average number of trading days when calculating the window size, by default False.

False

Returns:

Type Description
DataFrame | LazyFrame

The calculated Parkinson's volatility, with an additional column "parkinson_volatility_pct_{window_int}D" indicating the percentage volatility.

Notes

This function requires the input data to have 'high' and 'low' columns to calculate the logarithm of their ratio, which is squared and scaled by a constant to estimate volatility. The result is then annualized and expressed as a percentage.

Usage

If you pass "1m" as the window argument with _avg_trading_days=False, the resulting window size is 30. If _avg_trading_days=True, the resulting window size is 21.

Examples:

>>> data = pl.DataFrame({'high': [120, 125], 'low': [115, 120]})
>>> parkinson(data)
A DataFrame with the calculated Parkinson's volatility.
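
The per-bar term described in the Notes (the squared log of high over low, scaled by 1 / (4 ln 2)) can be sketched for a single window in plain Python. The helper `_annual_vol` used by the source is not shown on this page, so the annualization step here (square root of `trading_periods` times the mean per-bar term) is an assumption, as are the function name `parkinson_vol_pct` and the default of 252 periods.

```python
import math


def parkinson_vol_pct(highs, lows, trading_periods=252):
    """Parkinson volatility over one window, annualized, in percent."""
    factor = 1.0 / (4.0 * math.log(2.0))
    # Per-bar variance estimate from the high/low range.
    per_bar = [factor * math.log(h / l) ** 2 for h, l in zip(highs, lows)]
    # Assumed annualization: sqrt(trading_periods * mean per-bar variance).
    return math.sqrt(trading_periods * sum(per_bar) / len(per_bar)) * 100


print(round(parkinson_vol_pct([120, 125], [115, 120]), 2))
```

A bar whose high equals its low contributes zero, and wider high/low ranges produce larger estimates.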
Source code in src\humbldata\toolbox\technical\volatility\realized_volatility_helpers.py
def parkinson(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    *,
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Calculate Parkinson's volatility over a specified window.

    Parkinson's volatility is a measure that uses the stock's high and low prices
    of the day rather than just close to close prices. It is particularly useful
    for capturing large price movements during the day.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input data containing the stock prices.
    window : str, optional
        The rolling window size for calculating volatility, by default "1m".
    trading_periods : int, optional
        The number of trading periods in a year, by default 252.
    _column_name_high : str, optional
        The name of the column containing the high prices, by default "high".
    _column_name_low : str, optional
        The name of the column containing the low prices, by default "low".
    _drop_nulls : bool, optional
        Whether to drop null values from the result, by default True.
    _avg_trading_days : bool, optional
        Whether to use the average number of trading days when calculating the
        window size, by default False.

    Returns
    -------
    pl.DataFrame | pl.LazyFrame
        The calculated Parkinson's volatility, with an additional column
        "parkinson_volatility_pct_{window_int}D"
        indicating the percentage volatility.

    Notes
    -----
    This function requires the input data to have 'high' and 'low' columns to
    calculate the logarithm of their ratio, which is squared and scaled by a
    constant to estimate volatility. The result is then annualized and
    expressed as a percentage.

    Usage
    -----
    If you pass `"1m"` as the `window` argument with `_avg_trading_days=False`,
    the resulting window size is `30`. If `_avg_trading_days=True`, the
    resulting window size is `21`.

    Examples
    --------
    >>> data = pl.DataFrame({'high': [120, 125], 'low': [115, 120]})
    >>> parkinson(data)
    A DataFrame with the calculated Parkinson's volatility.
    """
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort:
        data = data.lazy().sort(sort_cols)

    var1 = 1.0 / (4.0 * math.log(2.0))
    var2 = (
        data.lazy()
        .set_sorted(sort_cols)
        .select((pl.col(_column_name_high) / pl.col(_column_name_low)).log())
        .collect()
        .to_series()
    )
    rs = var1 * var2**2

    window_int: int = _window_format(
        window, _return_timedelta=True, _avg_trading_days=_avg_trading_days
    ).days
    result = (
        data.lazy()
        .set_sorted(sort_cols)
        .with_columns(
            (
                rs.rolling_map(
                    _annual_vol, window_size=window_int, min_periods=1
                )
                * 100
            ).alias(f"parkinson_volatility_pct_{window_int}D")
        )
    )
    if _drop_nulls:
        return result.drop_nulls(
            subset=f"parkinson_volatility_pct_{window_int}D"
        )

    return result
humbldata.toolbox.technical.volatility.realized_volatility_helpers.garman_klass ¤
garman_klass(data: DataFrame | LazyFrame, window: str = '1m', _column_name_high: str = 'high', _column_name_low: str = 'low', _column_name_open: str = 'open', _column_name_close: str = 'adj_close', _drop_nulls: bool = True, _avg_trading_days: bool = False, _sort: bool = True) -> LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: _garman_klass.

Calculates the Garman-Klass volatility for a given dataset.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The input data containing the price information.

required
window str

The rolling window size for volatility calculation, by default "1m".

'1m'
_column_name_high str

The name of the column containing the high prices, by default "high".

'high'
_column_name_low str

The name of the column containing the low prices, by default "low".

'low'
_column_name_open str

The name of the column containing the opening prices, by default "open".

'open'
_column_name_close str

The name of the column containing the adjusted closing prices, by default "adj_close".

'adj_close'
_drop_nulls bool

Whether to drop null values from the result, by default True.

True
_avg_trading_days bool

Whether to use the average number of trading days when calculating the window size, by default False.

False

Returns:

Type Description
DataFrame | LazyFrame | Series

The calculated Garman-Klass volatility, with an additional column "gk_volatility_pct_{window_int}D" indicating the percentage volatility.

Notes

Garman-Klass volatility extends Parkinson’s volatility by considering the opening and closing prices in addition to the high and low prices. This approach provides a more accurate estimation of volatility, especially in markets with significant activity at the opening and closing of trading sessions.
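
The per-bar term that the source computes, `0.5 * ln(high/low)**2 - (2 * ln(2) - 1) * ln(close/open)**2`, can be written as a small plain-Python function. This is only a sketch of that one expression; the function name `garman_klass_bar` and the sample prices are hypothetical, and the rolling annualization step is left out.

```python
import math


def garman_klass_bar(open_: float, high: float, low: float, close: float) -> float:
    """Per-bar Garman-Klass variance term from open, high, low, and close."""
    log_hi_lo = math.log(high / low)
    log_close_open = math.log(close / open_)
    return 0.5 * log_hi_lo**2 - (2 * math.log(2) - 1) * log_close_open**2


flat_close = garman_klass_bar(100.0, 110.0, 95.0, 100.0)  # close equals open
drifting = garman_klass_bar(100.0, 110.0, 95.0, 105.0)    # close above open
print(flat_close, drifting)
```

When the close equals the open, only the high/low term remains; any open-to-close move reduces the estimate, which is how the open and close prices refine the range-based measure.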

Source code in src\humbldata\toolbox\technical\volatility\realized_volatility_helpers.py
def garman_klass(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    _column_name_open: str = "open",
    _column_name_close: str = "adj_close",
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: _garman_klass**.

    Calculates the Garman-Klass volatility for a given dataset.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input data containing the price information.
    window : str, optional
        The rolling window size for volatility calculation, by default "1m".
    _column_name_high : str, optional
        The name of the column containing the high prices, by default "high".
    _column_name_low : str, optional
        The name of the column containing the low prices, by default "low".
    _column_name_open : str, optional
        The name of the column containing the opening prices, by default "open".
    _column_name_close : str, optional
        The name of the column containing the adjusted closing prices, by
        default "adj_close".
    _drop_nulls : bool, optional
        Whether to drop null values from the result, by default True.
    _avg_trading_days : bool, optional
        Whether to use the average number of trading days when calculating the
        window size, by default False.

    Returns
    -------
    pl.DataFrame | pl.LazyFrame | pl.Series
        The calculated Garman-Klass volatility, with an additional column
        "gk_volatility_pct_{window_int}D" indicating the percentage volatility.

    Notes
    -----
    Garman-Klass volatility extends Parkinson’s volatility by considering the
    opening and closing prices in addition to the high and low prices. This
    approach provides a more accurate estimation of volatility, especially in
    markets with significant activity at the opening and closing of trading
    sessions.
    """
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort:
        data = data.lazy().sort(sort_cols)
    log_hi_lo = (
        data.lazy()
        .set_sorted(sort_cols)
        .select((pl.col(_column_name_high) / pl.col(_column_name_low)).log())
        .collect()
        .to_series()
    )
    log_close_open = (
        data.lazy()
        .select((pl.col(_column_name_close) / pl.col(_column_name_open)).log())
        .collect()
        .to_series()
    )
    rs: pl.Series = 0.5 * log_hi_lo**2 - (2 * np.log(2) - 1) * log_close_open**2

    window_int: int = _window_format(
        window, _return_timedelta=True, _avg_trading_days=_avg_trading_days
    ).days
    result = data.lazy().with_columns(
        (
            rs.rolling_map(_annual_vol, window_size=window_int, min_periods=1)
            * 100
        ).alias(f"gk_volatility_pct_{window_int}D")
    )
    if _drop_nulls:
        return result.drop_nulls(subset=f"gk_volatility_pct_{window_int}D")
    return result
humbldata.toolbox.technical.volatility.realized_volatility_helpers.hodges_tompkins ¤
hodges_tompkins(data: DataFrame | LazyFrame | Series, window: str = '1m', trading_periods=252, _column_name_returns: str = 'log_returns', *, _drop_nulls: bool = True, _avg_trading_days: bool = False, _sort: bool = True) -> LazyFrame | Series

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: _hodges_tompkins.

Hodges-Tompkins volatility is a bias correction for estimation using an overlapping data sample that produces unbiased estimates and a substantial gain in efficiency.
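
As a rough illustration of the bias correction, the sketch below reproduces only the adjustment-factor arithmetic that appears in the source listing for this function. The helper name `hodges_tompkins_adjustment` is hypothetical, and the guard against a window longer than the sample is an added assumption rather than behavior of the package.

```python
def hodges_tompkins_adjustment(count: int, h: int) -> float:
    """Return the multiplier applied to the rolling standard deviation.

    count: total number of observations in the sample
    h:     window length in days
    """
    n = (count - h) + 1  # number of overlapping windows in the sample
    if n <= 0:
        raise ValueError("window is longer than the sample")
    return 1.0 / (1.0 - (h / n) + ((h**2 - 1) / (3 * n**2)))


# One year of daily data with a 21-day window.
factor = hodges_tompkins_adjustment(count=252, h=21)
```

For these inputs the factor is slightly above 1, so the corrected volatility is scaled up relative to the raw rolling estimate.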

Source code in src\humbldata\toolbox\technical\volatility\realized_volatility_helpers.py
def hodges_tompkins(
    data: pl.DataFrame | pl.LazyFrame | pl.Series,
    window: str = "1m",
    trading_periods=252,
    _column_name_returns: str = "log_returns",
    *,
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _sort: bool = True,
) -> pl.LazyFrame | pl.Series:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: _hodges_tompkins**.

    Hodges-Tompkins volatility is a bias correction for estimation using an
    overlapping data sample that produces unbiased estimates and a
    substantial gain in efficiency.
    """
    # When calculating rv_mean, need a different adjustment factor,
    # so window doesn't influence the Volatility_mean
    # RV_MEAN

    # Define Window Size
    window_timedelta = _window_format(
        window, _return_timedelta=True, _avg_trading_days=_avg_trading_days
    )
    # Calculate STD, assigned to `vol`
    if isinstance(data, pl.Series):
        vol = data.rolling_std(window_size=window_timedelta.days, min_periods=1)
    else:
        sort_cols = _set_sort_cols(data, "symbol", "date")
        if _sort:
            data = data.lazy().sort(sort_cols)
        vol = (
            data.lazy()
            .set_sorted(sort_cols)
            .select(
                pl.col(_column_name_returns).rolling_std(
                    window_size=window_timedelta, min_periods=1, by="date"
                )
                * np.sqrt(trading_periods)
            )
        )

    # Assign window size to h for adjustment
    h: int = window_timedelta.days

    if isinstance(data, pl.Series):
        count = data.len()
    elif isinstance(data, pl.LazyFrame):
        count = data.collect().shape[0]
    else:
        count = data.shape[0]

    n = (count - h) + 1
    adj_factor = 1.0 / (1.0 - (h / n) + ((h**2 - 1) / (3 * n**2)))

    if isinstance(data, pl.Series):
        return (vol * adj_factor) * 100
    else:
        result = data.lazy().with_columns(
            ((vol.collect() * adj_factor) * 100)
            .to_series()
            .alias(f"ht_volatility_pct_{h}D")
        )
    if _drop_nulls:
        result = result.drop_nulls(subset=f"ht_volatility_pct_{h}D")
    return result
humbldata.toolbox.technical.volatility.realized_volatility_helpers.rogers_satchell ¤
rogers_satchell(data: DataFrame | LazyFrame, window: str = '1m', _column_name_high: str = 'high', _column_name_low: str = 'low', _column_name_open: str = 'open', _column_name_close: str = 'adj_close', _drop_nulls: bool = True, _avg_trading_days: bool = False, _sort: bool = True) -> LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: _rogers_satchell.

Rogers-Satchell is an estimator for measuring the volatility of securities with an average return not equal to zero. Unlike Parkinson and Garman-Klass estimators, Rogers-Satchell incorporates a drift term (mean return not equal to zero). This function calculates the Rogers-Satchell volatility estimator over a specified window and optionally drops null values from the result.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The input data for which to calculate the Rogers-Satchell volatility estimator. This can be either a DataFrame or a LazyFrame. There need to be OHLC columns present in the data.

required
window str

The window over which to calculate the volatility estimator. The window is specified as a string, such as "1m" for one month.

"1m"
_column_name_high str

The name of the column representing the high prices in the data.

"high"
_column_name_low str

The name of the column representing the low prices in the data.

"low"
_column_name_open str

The name of the column representing the opening prices in the data.

"open"
_column_name_close str

The name of the column representing the adjusted closing prices in the data.

"adj_close"
_drop_nulls bool

Whether to drop null values from the result. If True, rows with null values in the calculated volatility column will be removed from the output.

True
_avg_trading_days bool

Indicates whether to use the average number of trading days per window. This affects how the window size is interpreted, i.e., instead of "1m" returning timedelta(days=31), it will return timedelta(days=21).

False

Returns:

Type Description
LazyFrame

The input data with an additional column containing the calculated Rogers-Satchell volatility estimator. The result is always lazy, even when a DataFrame is passed in.
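
To make the drift handling concrete, here is a minimal plain-Python sketch of the per-bar Rogers-Satchell term, following the `log_ho`, `log_lo`, and `log_co` expressions in the source listing. The function name `rogers_satchell_term` is hypothetical; the package itself computes this with polars expressions.

```python
import math


def rogers_satchell_term(open_: float, high: float, low: float, close: float) -> float:
    """Per-bar term: ln(H/O)*(ln(H/O) - ln(C/O)) + ln(L/O)*(ln(L/O) - ln(C/O))."""
    log_ho = math.log(high / open_)
    log_lo = math.log(low / open_)
    log_co = math.log(close / open_)
    return log_ho * (log_ho - log_co) + log_lo * (log_lo - log_co)


# A bar that trends straight up (open == low, close == high) contributes zero:
# the move is attributed to drift rather than volatility.
trend_bar = rogers_satchell_term(open_=100.0, high=110.0, low=100.0, close=110.0)
# A bar that ranges on both sides of the open contributes a positive amount.
ranging_bar = rogers_satchell_term(open_=100.0, high=110.0, low=95.0, close=105.0)
```

The zero result for the pure trend bar is what distinguishes this estimator from range-only estimators, which would count the whole move as volatility.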

Source code in src\humbldata\toolbox\technical\volatility\realized_volatility_helpers.py
def rogers_satchell(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    _column_name_open: str = "open",
    _column_name_close: str = "adj_close",
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: _rogers_satchell**.

    Rogers-Satchell is an estimator for measuring the volatility of
    securities with an average return not equal to zero. Unlike Parkinson
    and Garman-Klass estimators, Rogers-Satchell incorporates a drift term
    (mean return not equal to zero). This function calculates the
    Rogers-Satchell volatility estimator over a specified window and optionally
    drops null values from the result.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input data for which to calculate the Rogers-Satchell volatility
        estimator. This can be either a DataFrame or a LazyFrame. There need to
        be OHLC columns present in the data.
    window : str, default "1m"
        The window over which to calculate the volatility estimator. The
        window is specified as a string, such as "1m" for one month.
    _column_name_high : str, default "high"
        The name of the column representing the high prices in the data.
    _column_name_low : str, default "low"
        The name of the column representing the low prices in the data.
    _column_name_open : str, default "open"
        The name of the column representing the opening prices in the data.
    _column_name_close : str, default "adj_close"
        The name of the column representing the adjusted closing prices in the
        data.
    _drop_nulls : bool, default True
        Whether to drop null values from the result. If True, rows with null
        values in the calculated volatility column will be removed from the
        output.
    _avg_trading_days : bool, default False
        Indicates whether to use the average number of trading days per window.
        This affects how the window size is interpreted, i.e., instead of "1m"
        returning `timedelta(days=31)`, it will return `timedelta(days=21)`.

    Returns
    -------
    pl.LazyFrame
        The input data with an additional column containing the calculated
        Rogers-Satchell volatility estimator. The result is always lazy,
        even when a DataFrame is passed in.
    """
    # Check if all required columns are present in the DataFrame
    _check_required_columns(
        data,
        _column_name_high,
        _column_name_low,
        _column_name_open,
        _column_name_close,
    )
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort:
        data = data.lazy().sort(sort_cols)
    # assign window
    window_int: int = _window_format(
        window=window,
        _return_timedelta=True,
        _avg_trading_days=_avg_trading_days,
    ).days

    data = (
        data.lazy()
        .set_sorted(sort_cols)
        .with_columns(
            [
                (pl.col(_column_name_high) / pl.col(_column_name_open))
                .log()
                .alias("log_ho"),
                (pl.col(_column_name_low) / pl.col(_column_name_open))
                .log()
                .alias("log_lo"),
                (pl.col(_column_name_close) / pl.col(_column_name_open))
                .log()
                .alias("log_co"),
            ]
        )
        .with_columns(
            (
                pl.col("log_ho") * (pl.col("log_ho") - pl.col("log_co"))
                + pl.col("log_lo") * (pl.col("log_lo") - pl.col("log_co"))
            ).alias("rs")
        )
    )
    result = data.lazy().with_columns(
        (
            pl.col("rs").rolling_map(
                _annual_vol, window_size=window_int, min_periods=1
            )
            * 100
        ).alias(f"rs_volatility_pct_{window_int}D")
    )
    if _drop_nulls:
        result = result.drop_nulls(subset=f"rs_volatility_pct_{window_int}D")
    return result
humbldata.toolbox.technical.volatility.realized_volatility_helpers.yang_zhang ¤
yang_zhang(data: DataFrame | LazyFrame, window: str = '1m', trading_periods: int = 252, _column_name_high: str = 'high', _column_name_low: str = 'low', _column_name_open: str = 'open', _column_name_close: str = 'adj_close', _avg_trading_days: bool = False, _drop_nulls: bool = True, _sort: bool = True) -> LazyFrame

Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || Command: _yang_zhang.

Yang-Zhang volatility combines the overnight (close-to-open) volatility with a weighted average of the Rogers-Satchell volatility and the day’s open-to-close volatility.
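
The weighting can be sketched in plain Python. The example below follows the `k` expression and the final combination from the source listing; the names `open_var`, `close_var`, and `rs_var` are illustrative stand-ins for the `open_vol`, `close_vol`, and `window_rs` columns, and treating them as rolling variances is an assumption, since `_yang_zhang_engine` is not shown here.

```python
import math


def yang_zhang_weight(window: int) -> float:
    """Weight k given to the close component; requires window >= 2."""
    if window < 2:
        raise ValueError("window must be at least 2 to avoid division by zero")
    return 0.34 / (1.34 + (window + 1) / (window - 1))


def yang_zhang_pct(
    open_var: float,
    close_var: float,
    rs_var: float,
    window: int,
    trading_periods: int = 252,
) -> float:
    """Combine the three components into an annualized percentage volatility."""
    k = yang_zhang_weight(window)
    combined = open_var + k * close_var + (1 - k) * rs_var
    return math.sqrt(combined) * math.sqrt(trading_periods) * 100


k_21 = yang_zhang_weight(21)
vol_pct = yang_zhang_pct(open_var=1e-4, close_var=1e-4, rs_var=1e-4, window=21)
```

Note that a one-day window makes the denominator `window - 1` zero, which is why the sketch rejects it explicitly.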

Source code in src\humbldata\toolbox\technical\volatility\realized_volatility_helpers.py
def yang_zhang(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    trading_periods: int = 252,
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    _column_name_open: str = "open",
    _column_name_close: str = "adj_close",
    _avg_trading_days: bool = False,
    _drop_nulls: bool = True,
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Context: Toolbox || Category: Technical || Sub-Category: Volatility Helpers || **Command: _yang_zhang**.

    Yang-Zhang volatility combines the overnight (close-to-open) volatility
    with a weighted average of the Rogers-Satchell volatility and the day’s
    open-to-close volatility.
    """
    # check required columns
    _check_required_columns(
        data,
        _column_name_high,
        _column_name_low,
        _column_name_open,
        _column_name_close,
    )
    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort:
        data = data.lazy().sort(sort_cols)

    # assign window
    window_int: int = _window_format(
        window=window,
        _return_timedelta=True,
        _avg_trading_days=_avg_trading_days,
    ).days

    data = (
        data.lazy()
        .set_sorted(sort_cols)
        .with_columns(
            [
                (pl.col(_column_name_high) / pl.col(_column_name_open))
                .log()
                .alias("log_ho"),
                (pl.col(_column_name_low) / pl.col(_column_name_open))
                .log()
                .alias("log_lo"),
                (pl.col(_column_name_close) / pl.col(_column_name_open))
                .log()
                .alias("log_co"),
                (pl.col(_column_name_open) / pl.col(_column_name_close).shift())
                .log()
                .alias("log_oc"),
                (
                    pl.col(_column_name_close)
                    / pl.col(_column_name_close).shift()
                )
                .log()
                .alias("log_cc"),
            ]
        )
        .with_columns(
            [
                (pl.col("log_oc") ** 2).alias("log_oc_sq"),
                (pl.col("log_cc") ** 2).alias("log_cc_sq"),
                (
                    pl.col("log_ho") * (pl.col("log_ho") - pl.col("log_co"))
                    + pl.col("log_lo") * (pl.col("log_lo") - pl.col("log_co"))
                ).alias("rs"),
            ]
        )
    )

    k = 0.34 / (1.34 + (window_int + 1) / (window_int - 1))
    data = _yang_zhang_engine(data=data, window=window_int)
    result = (
        data.lazy()
        .with_columns(
            (
                (
                    pl.col("open_vol")
                    + k * pl.col("close_vol")
                    + (1 - k) * pl.col("window_rs")
                ).sqrt()
                * np.sqrt(trading_periods)
                * 100
            ).alias(f"yz_volatility_pct_{window_int}D")
        )
        .select(
            pl.exclude(
                [
                    "log_ho",
                    "log_lo",
                    "log_co",
                    "log_oc",
                    "log_cc",
                    "log_oc_sq",
                    "log_cc_sq",
                    "rs",
                    "close_vol",
                    "open_vol",
                    "window_rs",
                ]
            )
        )
    )
    if _drop_nulls:
        return result.drop_nulls(subset=f"yz_volatility_pct_{window_int}D")
    return result
humbldata.toolbox.technical.volatility.realized_volatility_helpers.squared_returns ¤
squared_returns(data: DataFrame | LazyFrame, window: str = '1m', trading_periods: int = 252, _drop_nulls: bool = True, _avg_trading_days: bool = False, _column_name_returns: str = 'log_returns', _sort: bool = True) -> LazyFrame

Calculate squared returns over a rolling window.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The input data containing the returns column named by _column_name_returns.

required
window str

The rolling window size for calculating squared returns, by default "1m".

'1m'
trading_periods int

The number of trading periods in a year, used for scaling the result. The default is 252.

252
_drop_nulls bool

Whether to drop null values from the result, by default True.

True
_column_name_returns str

The name of the column containing the returns, by default "log_returns".

'log_returns'

Returns:

Type Description
LazyFrame

The input data with an additional column for the rolling squared returns.
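
As a minimal sketch of the calculation, the plain-Python function below squares percentage returns and averages them over a trailing window, allowing partial windows at the start in the same spirit as `min_periods=1` in the source listing. The name `rolling_squared_returns` is hypothetical, and the list-based approach is a stand-in for the package's polars expressions.

```python
def rolling_squared_returns(log_returns: list[float], window: int) -> list[float]:
    """Trailing mean of squared percentage returns, with partial leading windows."""
    squared_pct = [(r * 100) ** 2 for r in log_returns]
    out: list[float] = []
    for i in range(len(squared_pct)):
        start = max(0, i - window + 1)  # shrink the window at the start of the series
        chunk = squared_pct[start : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Returns of +1%, -2%, +3% with a 2-day window.
values = rolling_squared_returns([0.01, -0.02, 0.03], window=2)
```

Because the returns are squared, the sign of each return has no effect on the result.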

Source code in src\humbldata\toolbox\technical\volatility\realized_volatility_helpers.py
def squared_returns(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    trading_periods: int = 252,
    _drop_nulls: bool = True,
    _avg_trading_days: bool = False,
    _column_name_returns: str = "log_returns",
    _sort: bool = True,
) -> pl.LazyFrame:
    """
    Calculate squared returns over a rolling window.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The input data containing the returns column named by
        `_column_name_returns`.
    window : str, optional
        The rolling window size for calculating squared returns, by default "1m".
    trading_periods : int, optional
        The number of trading periods in a year, used for scaling the result.
        The default is 252.
    _drop_nulls : bool, optional
        Whether to drop null values from the result, by default True.
    _column_name_returns : str, optional
        The name of the column containing the returns, by default
        "log_returns".

    Returns
    -------
    pl.LazyFrame
        The input data with an additional column for the rolling squared
        returns.
    """
    _check_required_columns(data, _column_name_returns)

    sort_cols = _set_sort_cols(data, "symbol", "date")
    if _sort:
        data = data.lazy().sort(sort_cols)

    # assign window
    window_int: int = _window_format(
        window=window,
        _return_timedelta=True,
        _avg_trading_days=_avg_trading_days,
    ).days

    data = (
        data.lazy()
        .set_sorted(sort_cols)
        .with_columns(
            ((pl.col(_column_name_returns) * 100) ** 2).alias(
                "sq_log_returns_pct"
            )
        )
    )
    # Calculate rolling squared returns
    result = (
        data.lazy()
        .with_columns(
            pl.col("sq_log_returns_pct")
            .rolling_mean(window_size=window_int, min_periods=1)
            .alias(f"sq_volatility_pct_{window_int}D")
        )
        .drop("sq_log_returns_pct")
    )
    if _drop_nulls:
        result = result.drop_nulls(subset=f"sq_volatility_pct_{window_int}D")
    return result
humbldata.toolbox.technical.volatility.realized_volatility_model ¤

Context: Toolbox || Category: Technical || Command: calc_realized_volatility.

A command to generate Realized Volatility for any time series. It provides a complete set of volatility estimators based on Euan Sinclair's Volatility Trading.

humbldata.toolbox.technical.volatility.realized_volatility_model.calc_realized_volatility ¤
calc_realized_volatility(data: DataFrame | LazyFrame, window: str = '1m', method: Literal['std', 'parkinson', 'garman_klass', 'gk', 'hodges_tompkins', 'ht', 'rogers_satchell', 'rs', 'yang_zhang', 'yz', 'squared_returns', 'sq'] = 'std', grouped_mean: list[int] | None = None, _trading_periods: int = 252, _column_name_returns: str = 'log_returns', _column_name_close: str = 'close', _column_name_high: str = 'high', _column_name_low: str = 'low', _column_name_open: str = 'open', *, _sort: bool = True) -> LazyFrame | DataFrame

Context: Toolbox || Category: Technical || Command: calc_realized_volatility.

Calculates the Realized Volatility for a given time series based on the provided standard and extra parameters. This function adds ONE rolling volatility column to the input DataFrame.

Parameters:

Name Type Description Default
data DataFrame | LazyFrame

The time series data for which to calculate the Realized Volatility.

required
window str

The window size for a rolling volatility calculation, default is "1m" (1 month).

'1m'
method Literal['std', 'parkinson', 'garman_klass', 'hodges_tompkins', 'rogers_satchell', 'yang_zhang', 'squared_returns']

The volatility estimator to use. You can also use abbreviations to access the same methods. The abbreviations are: gk for garman_klass, ht for hodges_tompkins, rs for rogers_satchell, yz for yang_zhang, sq for squared_returns.

'std'
grouped_mean list[int] | None

A list of window sizes to use for calculating volatility. If provided, the volatility method will be calculated across these various windows, and then an averaged value of all the windows will be returned. If None, a single window size specified by window parameter will be used.

None
_sort bool

If True, the data will be sorted before calculation. Default is True.

True
_trading_periods int

The number of trading periods in a year, default is 252 (the typical number of trading days in a year).

252
_column_name_returns str

The name of the column containing the returns. Default is "log_returns".

'log_returns'
_column_name_close str

The name of the column containing the close prices. Default is "close".

'close'
_column_name_high str

The name of the column containing the high prices. Default is "high".

'high'
_column_name_low str

The name of the column containing the low prices. Default is "low".

'low'
_column_name_open str

The name of the column containing the open prices. Default is "open".

'open'

Returns:

Type Description
LazyFrame | DataFrame

The input data with one additional rolling volatility column.

Notes
  • Rolling calculations produce a time series of recent volatility, where each value is computed from a fixed number of data points set by the window size. Seeing how 1-month or 3-month rolling volatility evolved over time gives more granular insight into a stock's volatility.

  • This function does not accept pl.Series because the methods used to calculate volatility require high, low, close, and open columns. It would be too cumbersome to pass each series needed for the calculation as a separate argument. Therefore, the function only accepts pl.DataFrame or pl.LazyFrame as input.
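
The source listing selects an estimator by name and forwards only the arguments that estimator accepts. The sketch below illustrates that dispatch pattern with `inspect.signature`; the estimator functions, the `METHODS` mapping, and `dispatch` are hypothetical placeholders, not part of humbldata.

```python
import inspect


def std_estimator(data, window="1m", trading_periods=252):
    # Placeholder estimator: echoes the arguments it received.
    return ("std", window, trading_periods)


def parkinson_estimator(data, window="1m", _column_name_high="high", _column_name_low="low"):
    # Placeholder estimator with a different set of accepted parameters.
    return ("parkinson", window, _column_name_high, _column_name_low)


METHODS = {"std": std_estimator, "parkinson": parkinson_estimator}


def dispatch(method: str, **candidates):
    """Call the named estimator with only the keyword arguments it accepts."""
    func = METHODS.get(method)
    if func is None:
        raise ValueError(f"Volatility method: '{method}' is not supported.")
    accepted = inspect.signature(func).parameters
    kwargs = {key: value for key, value in candidates.items() if key in accepted}
    return func(**kwargs)


result = dispatch(
    "parkinson", data=[1.0], window="3m", _column_name_high="h", _trading_periods=252
)
```

One consequence of filtering by exact name is that a candidate such as `_trading_periods` is silently dropped when the estimator's parameter is spelled `trading_periods`, so the estimator falls back to its own default.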

Source code in src\humbldata\toolbox\technical\volatility\realized_volatility_model.py
def calc_realized_volatility(
    data: pl.DataFrame | pl.LazyFrame,
    window: str = "1m",
    method: Literal[  # used to be rvol_method
        "std",
        "parkinson",
        "garman_klass",
        "gk",
        "hodges_tompkins",
        "ht",
        "rogers_satchell",
        "rs",
        "yang_zhang",
        "yz",
        "squared_returns",
        "sq",
    ] = "std",
    grouped_mean: list[int] | None = None,  # used to be rv_mean
    _trading_periods: int = 252,
    _column_name_returns: str = "log_returns",
    _column_name_close: str = "close",
    _column_name_high: str = "high",
    _column_name_low: str = "low",
    _column_name_open: str = "open",
    *,
    _sort: bool = True,
) -> pl.LazyFrame | pl.DataFrame:
    """
    Context: Toolbox || Category: Technical || **Command: calc_realized_volatility**.

    Calculates the Realized Volatility for a given time series based on the
    provided standard and extra parameters. This function adds ONE rolling
    volatility column to the input DataFrame.

    Parameters
    ----------
    data : pl.DataFrame | pl.LazyFrame
        The time series data for which to calculate the Realized Volatility.
    window : str
        The window size for a rolling volatility calculation, default is `"1m"`
        (1 month).
    method : Literal["std", "parkinson", "garman_klass", "hodges_tompkins","rogers_satchell", "yang_zhang", "squared_returns"]
        The volatility estimator to use. You can also use abbreviations to
        access the same methods. The abbreviations are: `gk` for `garman_klass`,
        `ht` for `hodges_tompkins`, `rs` for `rogers_satchell`, `yz` for
        `yang_zhang`, `sq` for `squared_returns`.
    grouped_mean : list[int] | None
        A list of window sizes to use for calculating volatility. If provided,
        the volatility method will be calculated across these various windows,
        and then an averaged value of all the windows will be returned. If `None`,
        a single window size specified by `window` parameter will be used.
    _sort : bool
        If True, the data will be sorted before calculation. Default is True.
    _trading_periods : int
        The number of trading periods in a year, default is 252 (the typical
        number of trading days in a year).
    _column_name_returns : str
        The name of the column containing the returns. Default is "log_returns".
    _column_name_close : str
        The name of the column containing the close prices. Default is "close".
    _column_name_high : str
        The name of the column containing the high prices. Default is "high".
    _column_name_low : str
        The name of the column containing the low prices. Default is "low".
    _column_name_open : str
        The name of the column containing the open prices. Default is "open".

    Returns
    -------
    pl.LazyFrame | pl.DataFrame
        The input data with one additional rolling volatility column.

    Notes
    -----
    - Rolling calculations produce a time series of recent volatility, where
    each value is computed from a fixed number of data points set by the
    window size. Seeing how 1-month or 3-month rolling volatility evolved
    over time gives more granular insight into a stock's volatility.

    - This function does not accept `pl.Series` because the methods used to
    calculate volatility require high, low, close, and open columns.
    It would be too cumbersome to pass each series needed for the calculation
    as a separate argument. Therefore, the function only accepts `pl.DataFrame`
    or `pl.LazyFrame` as input.
    """  # noqa: W505
    # Step 1: Get the correct realized volatility function =====================
    func = VOLATILITY_METHODS.get(method)
    if not func:
        msg = f"Volatility method: '{method}' is not supported."
        raise HumblDataError(msg)

    # Step 2: Get the names of the parameters that the function accepts ========
    func_params = inspect.signature(func).parameters

    # Step 3: Filter out the parameters not accepted by the function ===========
    args_to_pass = {
        key: value for key, value in locals().items() if key in func_params
    }

    # Step 4: Calculate Realized Volatility ====================================
    if grouped_mean:
        # Averaging volatility across multiple windows is not implemented yet.
        # Fail explicitly instead of reaching `return out` with `out` unbound.
        msg = "`grouped_mean` is not implemented yet."
        raise HumblDataError(msg)

    return func(**args_to_pass)