"""
This module provides `TIM`, PyTermGUI's Terminal Inline Markup language. It is a simple,
performant and easy to read way to style, colorize & modify text.

Basic rundown
-------------

TIM is included with the purpose of making styling easier to read and manage.

Its syntax is based on square brackets, within which tags are strictly separated by one
space character. Tags can stand for colors (xterm-256, RGB or HEX, both background &
foreground), styles, unsetters and macros.

The 16 simple colors of the terminal exist as named tags that refer to their numerical
value.

Here is a simple example of the syntax, using the `pytermgui.pretty` submodule to
syntax-highlight it inside the REPL:

```python3
>>> from pytermgui import pretty
>>> '[141 @61 bold] Hello [!upper inverse] There '
```

<p align=center>
<img src="https://github.com/bczsalba/pytermgui/blob/master/assets/docs/parser/\
simple_example.png?raw=true" width=70%>
</p>


General syntax
--------------

Background colors are always denoted by a leading `@` character in front of the color
tag. Styles are just the name of the style and macros have an exclamation mark in front
of them. Additionally, unsetters use a leading slash (`/`) for their syntax. Color
tokens have special unsetters: they use `/fg` to cancel foreground colors, and `/bg` to
do so with backgrounds.
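
The prefix rules above can be sketched without pytermgui itself; `classify_tag` below is
a hypothetical helper (not part of the library) that sorts tags purely by their leading
character:

```python3
def classify_tag(tag: str) -> str:
    # Sorts a tag into its category purely by its leading character.
    if tag.startswith("/"):
        return "unsetter"
    if tag.startswith("!"):
        return "macro"
    if tag.startswith("@"):
        return "background color"
    return "style or foreground color"

# Tags inside "[141 @61 bold /fg]" are separated by single spaces:
for tag in "141 @61 bold /fg".split(" "):
    print(tag, "->", classify_tag(tag))
```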

### Macros:

Macros are callables that take at least one string argument: the plain text enclosed by
the tag group within which the given macro resides, passed as the last argument.
Additionally, macros can be given any number of positional arguments from within markup,
using the syntax:

```
[!macro(arg1:arg2:arg3)]Text that the macro applies to.[/!macro]plain text, no macro
```

This syntax gets parsed as follows:

```python3
macro("arg1", "arg2", "arg3", "Text that the macro applies to.")
```

`macro` here is whatever the name `macro` was defined as prior.

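As a rough stand-alone illustration of that call mapping, here is a minimal sketch
(`my_pad` and `apply_macro` are hypothetical, not pytermgui API), assuming the enclosed
text is passed last, as with builtin macros such as `!align`:

```python3
import re

def my_pad(width: str, content: str) -> str:
    # Hypothetical macro: centers the text in a field, text arriving last.
    return content.center(int(width))

MACROS = {"!pad": my_pad}

def apply_macro(tag: str, enclosed: str) -> str:
    # Parses "!name(a:b:c)" and calls the macro with its args, text last.
    match = re.match(r"(![a-z_]+)(?:\(([\w:]+)\))?", tag)
    name, args = match.groups()
    macro_args = [] if args is None else args.split(":")
    return MACROS[name](*macro_args, enclosed)

apply_macro("!pad(11)", "hello")  # calls my_pad("11", "hello")
```
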

### Colors:

Colors can be of three general types: xterm-256, RGB and HEX.

`xterm-256` stands for one of the 256 xterm colors. You can use `ptg -c` to see all of
the available colors. Its syntax is just the 0-based index of the color, like `[141]`.

`RGB` colors are pretty self explanatory. Their syntax follows the format
`RED;GREEN;BLUE`, where each component is in the 0-255 range, such as `[111;222;133]`.

`HEX` colors are RGB colors in hexadecimal notation. Their syntax is `#RRGGBB`, such as
`[#FA72BF]`. This code then gets converted to a tuple of RGB values under the hood, so
from then on RGB and HEX colors are treated the same, and emit the same tokens.
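
For illustration, the conversion can be sketched in a few lines (a simplified stand-in,
not the library's actual color code):

```python3
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    # "#RRGGBB" -> (R, G, B) as 0-255 integers.
    digits = code.lstrip("#")
    return tuple(int(digits[i : i + 2], 16) for i in range(0, 6, 2))

hex_to_rgb("#FA72BF")  # (250, 114, 191)
```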

As mentioned above, all colors can be made to act on the background instead by
prepending the color tag with `@`, such as `@141`, `@111;222;133` or `@#FA72BF`. To
clear these effects, use `/fg` for foreground and `/bg` for background colors.

`MarkupLanguage` and instancing
-------------------------------

All markup behaviour is done by an instance of the `MarkupLanguage` class. This is done
partially for organization reasons, but also to allow a sort of sandboxing of custom
definitions and settings.

PyTermGUI provides the `tim` name as the global markup language instance. For historical
reasons, the same instance is available as `markup`. This global instance should be used
pretty much all of the time; custom instances should only come about when
security-sensitive macro definitions are needed, as the global instance is used by every
widget, including user-input ones such as `InputField`.

For the rest of this page, `MarkupLanguage` will refer to whichever instance you are
using.

TL;DR: Use `tim` always, unless a security concern blocks you from doing so.

Caching
-------

By default, all markup parse results are cached and returned when the same input is
given. To disable this behaviour, set the `should_cache` field of your markup instance
(usually `tim`) to `False`.
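
The caching behaviour amounts to memoizing parse results by their input string; a rough
sketch of the pattern (`TinyParser` is an illustrative stand-in, not pytermgui's
internals):

```python3
class TinyParser:
    # Illustrative stand-in: memoizes results keyed on the raw input.
    def __init__(self) -> None:
        self.should_cache = True
        self._cache: dict[str, str] = {}

    def parse(self, text: str) -> str:
        if self.should_cache and text in self._cache:
            return self._cache[text]

        result = text.upper()  # stand-in for the real parsing work
        if self.should_cache:
            self._cache[text] = result

        return result

parser = TinyParser()
parser.parse("[bold]hi")     # parsed and cached
parser.should_cache = False  # further calls bypass the cache
```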

Customization
-------------

There are a couple of ways to customize how markup is parsed. Custom tags can be created
by calling `MarkupLanguage.alias`. For defining custom macros, you can use
`MarkupLanguage.define`. For more information, see each method's documentation.
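
At its core, aliasing boils down to a name-to-markup substitution table; the sketch
below (a toy `alias`/`expand` pair, not the real implementation, which also generates
unsetters) shows the idea:

```python3
ALIASES: dict[str, str] = {}

def alias(name: str, value: str) -> None:
    # Registers a custom tag; macro names (starting with "!") cannot be aliased.
    if name.startswith("!"):
        raise ValueError('Only macro tags can start with "!".')

    ALIASES[name] = value

def expand(tag: str) -> str:
    # Resolves a tag through the alias table; unknown tags pass through unchanged.
    return ALIASES.get(tag, tag)

alias("my-tag", "@152 72 bold")
expand("my-tag")  # "@152 72 bold"
```
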
"""
# pylint: disable=too-many-lines

from __future__ import annotations

from argparse import ArgumentParser
from contextlib import suppress
from dataclasses import dataclass
from enum import Enum
from enum import auto as _auto
from functools import cached_property
from random import shuffle
from typing import Callable, Iterator, List, Tuple

from .colors import Color, StandardColor, str_to_color
from .exceptions import AnsiSyntaxError, ColorSyntaxError, MarkupSyntaxError
from .regex import RE_ANSI, RE_LINK, RE_MACRO, RE_MARKUP
from .terminal import get_terminal

__all__ = [
    "StyledText",
    "MacroCallable",
    "MacroCall",
    "MarkupLanguage",
    "markup",
    "tim",
]

MacroCallable = Callable[..., str]
MacroCall = Tuple[MacroCallable, List[str]]

STYLE_MAP = {
    "bold": "1",
    "dim": "2",
    "italic": "3",
    "underline": "4",
    "blink": "5",
    "blink2": "6",
    "inverse": "7",
    "invisible": "8",
    "strikethrough": "9",
    "overline": "53",
}

UNSETTER_MAP: dict[str, str | None] = {
    "/": "0",
    "/bold": "22",
    "/dim": "22",
    "/italic": "23",
    "/underline": "24",
    "/blink": "25",
    "/blink2": "26",
    "/inverse": "27",
    "/invisible": "28",
    "/strikethrough": "29",
    "/fg": "39",
    "/bg": "49",
    "/overline": "54",
}


def macro_align(width: str, alignment: str, content: str) -> str:
    """Aligns given text using f-strings.

    Args:
        width: The width to align to.
        alignment: One of "left", "center", "right".
        content: The content to align; implicit argument.
    """

    aligner = "<" if alignment == "left" else (">" if alignment == "right" else "^")
    return f"{content:{aligner}{width}}"


def macro_expand(lang: MarkupLanguage, tag: str) -> str:
    """Expands a tag alias."""

    if tag not in lang.user_tags:
        return tag

    return lang.get_markup(f"\x1b[{lang.user_tags[tag]}m ")[:-1]


def macro_strip_fg(item: str) -> str:
    """Strips foreground color from the given item."""

    return markup.parse(f"[/fg]{item}")


def macro_strip_bg(item: str) -> str:
    """Strips background color from the given item."""

    return markup.parse(f"[/bg]{item}")


def macro_shuffle(item: str) -> str:
    """Shuffles a string using random.shuffle on its list cast."""

    shuffled = list(item)
    shuffle(shuffled)

    return "".join(shuffled)


def macro_link(*args) -> str:
    """Creates a clickable hyperlink.

    Note:
        Since this is a pretty new feature for terminals, its support is limited.
    """

    *uri_parts, label = args
    uri = ":".join(uri_parts)

    return f"\x1b]8;;{uri}\x1b\\{label}\x1b]8;;\x1b\\"


def _apply_colors(colors: list[str] | list[int], item: str) -> str:
    """Applies the given list of colors to the item, spread out evenly."""

    blocksize = max(round(len(item) / len(colors)), 1)

    out = ""
    current_block = 0
    for i, char in enumerate(item):
        if i % blocksize == 0 and current_block < len(colors):
            out += f"[{colors[current_block]}]"
            current_block += 1

        out += char

    return markup.parse(out)


def macro_rainbow(item: str) -> str:
    """Creates rainbow-colored text."""

    colors = ["red", "208", "yellow", "green", "brightblue", "blue", "93"]

    return _apply_colors(colors, item)


def macro_gradient(base_str: str, item: str) -> str:
    """Creates an xterm-256 gradient from a base color.

    This exploits the way the colors are arranged in the xterm color table; every
    36th color is the next item of a single gradient.

    The start of the gradient is calculated by decreasing the given base by 36 on
    every iteration, as long as the result remains a valid gradient start.

    After that, the 6 colors of this gradient are calculated and applied.
    """

    if not base_str.isdigit():
        raise ValueError(f"Gradient base has to be a digit, got {base_str}.")

    base = int(base_str)
    if base < 16 or base > 231:
        raise ValueError("Gradient base must be between 16 and 231.")

    while base > 52:
        base -= 36

    colors = []
    for i in range(6):
        colors.append(base + 36 * i)

    return _apply_colors(colors, item)


class TokenType(Enum):
    """An Enum to store various token types."""

    LINK = _auto()
    """A terminal hyperlink."""

    PLAIN = _auto()
    """Plain text, nothing interesting."""

    COLOR = _auto()
    """A color token. Has a `pytermgui.colors.Color` instance as its data."""

    STYLE = _auto()
    """A builtin terminal style, such as `bold` or `italic`."""

    MACRO = _auto()
    """A PTG markup macro. The macro itself is stored inside `self.data`."""

    ESCAPED = _auto()
    """An escaped token."""

    UNSETTER = _auto()
    """A token that unsets some other attribute."""

    POSITION = _auto()
    """A token representing a positioning string. `self.data` follows the format `x,y`."""


@dataclass
class Token:
    """A class holding information on a singular markup or ANSI style unit."""

    ttype: TokenType
    """The type of this token."""

    data: str | MacroCall | Color | None
    """The data contained within this token. This changes based on the `ttype` attr."""

    name: str = "<unnamed-token>"
    """An optional display name of the token. Defaults to `data` when not given."""

    def __post_init__(self) -> None:
        """Sets `name` to `data` if not provided."""

        if self.name == "<unnamed-token>":
            if isinstance(self.data, str):
                self.name = self.data

            elif isinstance(self.data, Color):
                self.name = self.data.name

            else:
                raise TypeError(
                    f"Cannot derive a name from token data of type {type(self.data)!r}."
                )

        # Create LINK from a plain token
        if self.ttype is TokenType.PLAIN:
            assert isinstance(self.data, str)

            link_match = RE_LINK.match(self.data)

            if link_match is not None:
                self.data, self.name = link_match.groups()
                self.ttype = TokenType.LINK

        if self.ttype is TokenType.ESCAPED:
            assert isinstance(self.data, str)

            self.name = self.data[1:]

    def __eq__(self, other: object) -> bool:
        """Checks equality with `other`."""

        if other is None:
            return False

        if not isinstance(other, type(self)):
            return False

        return other.data == self.data and other.ttype is self.ttype

    @cached_property
    def sequence(self) -> str | None:
        """Returns the ANSI sequence this token represents."""

        if self.data is None:
            return None

        if self.ttype in [TokenType.PLAIN, TokenType.MACRO, TokenType.ESCAPED]:
            return None

        if self.ttype is TokenType.LINK:
            return macro_link(self.data, self.name)

        if self.ttype is TokenType.POSITION:
            assert isinstance(self.data, str)
            position = self.data.split(",")
            return f"\x1b[{position[1]};{position[0]}H"

        # Colors and styles
        data = self.data

        if self.ttype in [TokenType.STYLE, TokenType.UNSETTER]:
            return f"\033[{data}m"

        assert isinstance(data, Color)
        return data.sequence


class StyledText(str):
    """A styled text object.

    The purpose of this class is to implement operations that a regular `str`
    gets wrong when ANSI sequences are involved, such as indexing and length.

    Instances of this class are usually spat out by `MarkupLanguage.parse`,
    but may be manually constructed if the need arises. Everything works even
    if there is no ANSI tomfoolery going on.
    """

    value: str
    """The underlying, ANSI-inclusive string value."""

    _plain: str | None = None
    _tokens: list[Token] | None = None

    def __new__(cls, value: str = ""):
        """Creates a StyledText, gets markup tags."""

        obj = super().__new__(cls, value)
        obj.value = value

        return obj

    def _generate_tokens(self) -> None:
        """Generates self._tokens & self._plain."""

        self._tokens = list(tim.tokenize_ansi(self.value))

        self._plain = ""
        for token in self._tokens:
            if token.ttype is not TokenType.PLAIN:
                continue

            assert isinstance(token.data, str)
            self._plain += token.data

    @property
    def tokens(self) -> list[Token]:
        """Returns all markup tokens of this object.

        Generated on-demand, at the first call to this or the self.plain
        property.
        """

        if self._tokens is not None:
            return self._tokens

        self._generate_tokens()
        assert self._tokens is not None
        return self._tokens

    @property
    def plain(self) -> str:
        """Returns the value of this object, with no ANSI sequences.

        Generated on-demand, at the first call to this or the self.tokens
        property.
        """

        if self._plain is not None:
            return self._plain

        self._generate_tokens()
        assert self._plain is not None
        return self._plain

    def plain_index(self, index: int | None) -> int | None:
        """Finds given index inside plain text."""

        if index is None:
            return None

        styled_chars = 0
        plain_chars = 0
        negative_index = False

        tokens = self.tokens.copy()
        if index < 0:
            tokens.reverse()
            index = abs(index)
            negative_index = True

        for token in tokens:
            if token.data is None:
                continue

            if token.ttype is not TokenType.PLAIN:
                assert token.sequence is not None
                styled_chars += len(token.sequence)
                continue

            assert isinstance(token.data, str)
            for _ in range(len(token.data)):
                if plain_chars == index:
                    if negative_index:
                        return -1 * (plain_chars + styled_chars)

                    return styled_chars + plain_chars

                plain_chars += 1

        return None

    def __len__(self) -> int:
        """Gets "real" length of object."""

        return len(self.plain)

    def __getitem__(self, subscript: int | slice) -> str:
        """Gets an item, adjusted for non-plain text.

        Args:
            subscript: The integer or slice to find.

        Returns:
            The elements described by the subscript.

        Raises:
            IndexError: The given index is out of range.
        """

        if isinstance(subscript, int):
            plain_index = self.plain_index(subscript)
            if plain_index is None:
                raise IndexError("StyledText index out of range")

            return self.value[plain_index]

        return self.value[
            slice(
                self.plain_index(subscript.start),
                self.plain_index(subscript.stop),
                subscript.step,
            )
        ]


class MarkupLanguage:
    """A class representing an instance of a Markup Language.

    This class is used for all markup/ANSI parsing, tokenizing and usage.

    ```python3
    from pytermgui import tim

    tim.alias("my-tag", "@152 72 bold")
    tim.print("This is [my-tag]my-tag[/]!")
    ```

    <p style="text-align: center">
        <img src="https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/\
docs/parser/markup_language.png"
        style="width: 80%">
    </p>
    """

    raise_unknown_markup: bool = False
    """Raise `pytermgui.exceptions.MarkupSyntaxError` when encountering unknown markup tags."""

    def __init__(self, default_macros: bool = True) -> None:
        """Initializes a MarkupLanguage.

        Args:
            default_macros: If False, the builtin macros are not defined.
        """

        self.tags: dict[str, str] = STYLE_MAP.copy()
        self._cache: dict[str, StyledText] = {}
        self.macros: dict[str, MacroCallable] = {}
        self.user_tags: dict[str, str] = {}
        self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()

        self.should_cache: bool = True

        if default_macros:
            self.define("!link", macro_link)
            self.define("!align", macro_align)
            self.define("!markup", self.get_markup)
            self.define("!shuffle", macro_shuffle)
            self.define("!strip_bg", macro_strip_bg)
            self.define("!strip_fg", macro_strip_fg)
            self.define("!rainbow", macro_rainbow)
            self.define("!gradient", macro_gradient)
            self.define("!upper", lambda item: str(item.upper()))
            self.define("!lower", lambda item: str(item.lower()))
            self.define("!title", lambda item: str(item.title()))
            self.define("!capitalize", lambda item: str(item.capitalize()))
            self.define("!expand", lambda tag: macro_expand(self, tag))
            self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))

        self.alias("code", "dim @black")
        self.alias("code.str", "142")
        self.alias("code.multiline_str", "code.str")
        self.alias("code.none", "167")
        self.alias("code.global", "214")
        self.alias("code.number", "175")
        self.alias("code.keyword", "203")
        self.alias("code.identifier", "109")
        self.alias("code.name", "code.global")
        self.alias("code.comment", "240 italic")
        self.alias("code.builtin", "code.global")
        self.alias("code.file", "code.identifier")
        self.alias("code.symbol", "code.identifier")

    def _get_color_token(self, tag: str) -> Token | None:
        """Tries to get a color token from the given tag.

        Args:
            tag: The tag to parse.

        Returns:
            A color token if the given tag could be parsed into one, else None.
        """

        try:
            color = str_to_color(tag, use_cache=self.should_cache)

        except ColorSyntaxError:
            return None

        return Token(name=color.value, ttype=TokenType.COLOR, data=color)

    def _get_style_token(self, tag: str) -> Token | None:
        """Tries to get a style (including unsetter) token from tags, user tags and unsetters.

        Args:
            tag: The tag to parse.

        Returns:
            A `Token` if one could be created, None otherwise.
        """

        if tag in self.unsetters:
            return Token(name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag])

        if tag in self.user_tags:
            return Token(name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag])

        if tag in self.tags:
            return Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag])

        return None

    def print(self, *args, **kwargs) -> None:
        """Parses all arguments and passes them through to print, along with kwargs."""

        parsed = []
        for arg in args:
            parsed.append(self.parse(str(arg)))

        get_terminal().print(*parsed, **kwargs)

    def tokenize_markup(self, markup_text: str) -> Iterator[Token]:
        """Converts the given markup string into an iterator of `Token`.

        Args:
            markup_text: The text to look at.

        Returns:
            An iterator of tokens. The reason this is an iterator is to possibly save
            on memory.
        """

        end = 0
        start = 0
        cursor = 0
        for match in RE_MARKUP.finditer(markup_text):
            full, escapes, tag_text = match.groups()
            start, end = match.span()

            # Add plain text between last and current match
            if start > cursor:
                yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start])

            if escapes != "" and len(escapes) % 2 == 1:
                cursor = end
                yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :])
                continue

            for tag in tag_text.split():
                token = self._get_style_token(tag)
                if token is not None:
                    yield token
                    continue

                # Try to find a color token
                token = self._get_color_token(tag)
                if token is not None:
                    yield token
                    continue

                macro_match = RE_MACRO.match(tag)
                if macro_match is not None:
                    name, args = macro_match.groups()
                    macro_args = [] if args is None else args.split(":")

                    if name not in self.macros:
                        raise MarkupSyntaxError(
                            tag=tag,
                            cause="is not a defined macro",
                            context=markup_text,
                        )

                    yield Token(
                        name=tag,
                        ttype=TokenType.MACRO,
                        data=(self.macros[name], macro_args),
                    )
                    continue

                if self.raise_unknown_markup:
                    raise MarkupSyntaxError(
                        tag=tag, cause="not defined", context=markup_text
                    )

            cursor = end

        # Add remaining text as plain
        if len(markup_text) > cursor:
            yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])

    def tokenize_ansi(self, ansi: str) -> Iterator[Token]:
        """Converts the given ANSI string into an iterator of `Token`.

        Args:
            ansi: The text to look at.

        Returns:
            An iterator of tokens. The reason this is an iterator is to possibly save
            on memory.
        """

        def _is_in_tags(code: str, tags: dict[str, str]) -> str | None:
            """Determines whether a code is in the given dict of tags."""

            for name, current in tags.items():
                if current == code:
                    return name

            return None

        def _generate_color(
            parts: list[str], code: str
        ) -> tuple[str, TokenType, Color]:
            """Generates a color token."""

            data: Color
            if len(parts) == 1:
                data = StandardColor.from_ansi(code)
            else:
                data = str_to_color(code)

            return data.name, TokenType.COLOR, data

        end = 0
        start = 0
        cursor = 0

        # StyledText messes with indexing, so we need to cast it
        # back to str.
        if isinstance(ansi, StyledText):
            ansi = str(ansi)

        for match in RE_ANSI.finditer(ansi):
            code = match.groups()[0]
            start, end = match.span()

            if code is None:
                continue

            parts = code.split(";")

            if start > cursor:
                plain = ansi[cursor:start]

                yield Token(name=plain, ttype=TokenType.PLAIN, data=plain)

            name: str | None = code
            ttype = None
            data: str | Color = parts[0]

            # Styles & Unsetters
            if len(parts) == 1:
                # Covariancy is not an issue here, even though mypy seems to think so.
                name = _is_in_tags(parts[0], self.unsetters)  # type: ignore
                if name is not None:
                    ttype = TokenType.UNSETTER

                else:
                    name = _is_in_tags(parts[0], self.tags)
                    if name is not None:
                        ttype = TokenType.STYLE

            # Colors
            if ttype is None:
                with suppress(ColorSyntaxError):
                    name, ttype, data = _generate_color(parts, code)

            if name is None or ttype is None or data is None:
                if len(parts) != 2:
                    raise AnsiSyntaxError(
                        tag=parts[0], cause="not recognized", context=ansi
                    )

                name = "position"
                ttype = TokenType.POSITION
                data = ",".join(reversed(parts))

            yield Token(name=name, ttype=ttype, data=data)
            cursor = end

        if cursor < len(ansi):
            plain = ansi[cursor:]

            yield Token(ttype=TokenType.PLAIN, data=plain)

 813    def define(self, name: str, method: MacroCallable) -> None:
 814        """Defines a Macro tag that executes the given method.
 815
 816        Args:
 817            name: The name the given method will be reachable by within markup.
 818                The given value gets "!" prepended if it isn't present already.
 819            method: The method this macro will execute.
 820        """
 821
 822        if not name.startswith("!"):
 823            name = f"!{name}"
 824
 825        self.macros[name] = method
 826        self.unsetters[f"/{name}"] = None
 827
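The normalization that `define` performs can be sketched in isolation; the dicts below are illustrative stand-ins for the `MarkupLanguage` attributes, not the real class:

```python
# Hedged sketch of `define`: plain dicts stand in for self.macros and
# self.unsetters; this is not the real MarkupLanguage implementation.
macros = {}
unsetters = {}

def define(name, method):
    # "!" gets prepended to the name if it isn't present already.
    if not name.startswith("!"):
        name = f"!{name}"
    macros[name] = method
    unsetters[f"/{name}"] = None  # macro unsetters carry no data

define("upper", str.upper)   # reachable in markup as [!upper]
define("!lower", str.lower)  # already prefixed, stored unchanged
```

Either spelling registers the same "!"-prefixed key, so callers don't need to remember whether to include the prefix.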
 828    def alias(self, name: str, value: str) -> None:
 829        """Aliases the given name to a value, and generates an unsetter for it.
 830
 831        Note that it is not possible to alias macros.
 832
 833        Args:
 834            name: The name of the new tag.
 835            value: The value the new tag will stand for.
 836        """
 837
 838        def _get_unsetter(token: Token) -> str | None:
 839            """Get unsetter for a token"""
 840
 841            if token.ttype is TokenType.PLAIN:
 842                return None
 843
 844            if token.ttype is TokenType.UNSETTER:
 845                return self.unsetters[token.name]
 846
 847            if token.ttype is TokenType.COLOR:
 848                assert isinstance(token.data, Color)
 849
 850                if token.data.background:
 851                    return self.unsetters["/bg"]
 852
 853                return self.unsetters["/fg"]
 854
 855            name = f"/{token.name}"
 856            if name not in self.unsetters:
 857                raise KeyError(f"Could not find unsetter for token {token}.")
 858
 859            return self.unsetters[name]
 860
 861        if name.startswith("!"):
 862            raise ValueError('Only macro tags can start with "!".')
 863
 864        setter = ""
 865        unsetter = ""
 866
 867        # Try to link to existing tag
 868        if value in self.user_tags:
 869            self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"]
 870            self.user_tags[name] = self.user_tags[value]
 871            return
 872
 873        for token in self.tokenize_markup(f"[{value}]"):
 874            if token.ttype is TokenType.PLAIN:
 875                continue
 876
 877            assert token.sequence is not None
 878            setter += token.sequence
 879
 880            t_unsetter = _get_unsetter(token)
 881            unsetter += f"\x1b[{t_unsetter}m"
 882
 883        self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m")
 884        self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m")
 885
 886        marked: list[str] = []
 887        for item in self._cache:
 888            if name in item:
 889                marked.append(item)
 890
 891        for item in marked:
 892            del self._cache[item]
 893
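The existing-tag shortcut near the top of `alias` can be modeled with plain dicts; the tag names and sequence values below are made up for illustration:

```python
# Toy model of the alias fast path: aliasing to an already-known user
# tag just copies its stored sequence and unsetter, skipping the
# tokenize-and-rebuild work. Values here are invented, not real codes.
user_tags = {"warning": "38;5;208"}
unsetters = {"/warning": "39"}

def alias(name, value):
    if value in user_tags:  # link to the existing tag and return early
        unsetters[f"/{name}"] = unsetters[f"/{value}"]
        user_tags[name] = user_tags[value]

alias("alert", "warning")
```

This is why chained aliases like `code.name -> code.global` in `__init__` are cheap: the second alias is a dictionary copy, not a re-parse.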
 894    # TODO: I cannot cut down the one-too-many branch that this has at the moment.
 895    #       We could look into it in the future, however.
 896    def parse(  # pylint: disable=too-many-branches
 897        self, markup_text: str
 898    ) -> StyledText:
 899        """Parses the given markup.
 900
 901        Args:
 902            markup_text: The markup to parse.
 903
 904        Returns:
 905            A `StyledText` instance containing the result of parsing the
 906            input. This custom `str` subclass allows accessing the plain value
 907            of the output, as well as cleanly indexing within it. It behaves
 908            like the builtin `str`, only adding some extras on top.
 909        """
 910
 911        applied_macros: list[tuple[str, MacroCall]] = []
 912        previous_token: Token | None = None
 913        previous_sequence = ""
 914        sequence = ""
 915        out = ""
 916
 917        def _apply_macros(text: str) -> str:
 918            """Apply current macros to text"""
 919
 920            for _, (method, args) in applied_macros:
 921                text = method(*args, text)
 922
 923            return text
 924
 925        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
 926            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
 927                return False
 928
 929            return (
 930                type(previous) is type(new)
 931                and previous.data.background == new.data.background
 932            )
 933
 934        if (
 935            self.should_cache
 936            and markup_text in self._cache
 937            and len(RE_MACRO.findall(markup_text)) == 0
 938        ):
 939            return self._cache[markup_text]
 940
 941        token: Token
 942        for token in self.tokenize_markup(markup_text):
 943            if sequence != "" and previous_token == token:
 944                continue
 945
 946            # Optimize out previously added color tokens, as only the most
 947            # recent one would be visible anyway.
 948            if (
 949                token.sequence is not None
 950                and previous_token is not None
 951                and _is_same_colorgroup(previous_token, token)
 952            ):
 953                sequence = token.sequence
 954                continue
 955
 956            if token.ttype == TokenType.UNSETTER and token.data == "0":
 957                out += "\033[0m"
 958                sequence = ""
 959                applied_macros = []
 960                continue
 961
 962            previous_token = token
 963
 964            # Macro unsetters are stored with None as their data
 965            if token.data is None and token.ttype is TokenType.UNSETTER:
 966                for item, data in applied_macros.copy():
 967                    macro_match = RE_MACRO.match(item)
 968                    assert macro_match is not None
 969
 970                    macro_name = macro_match.groups()[0]
 971
 972                    if f"/{macro_name}" == token.name:
 973                        applied_macros.remove((item, data))
 974
 975                continue
 976
 977            if token.ttype is TokenType.MACRO:
 978                assert isinstance(token.data, tuple)
 979
 980                applied_macros.append((token.name, token.data))
 981                continue
 982
 983            if token.sequence is None:
 984                applied = sequence
 985
 986                if not out.endswith("\x1b[0m"):
 987                    for item in previous_sequence.split("\x1b"):
 988                        if item == "" or item[1:-1] in self.unsetters.values():
 989                            continue
 990
 991                        item = f"\x1b{item}"
 992                        applied = applied.replace(item, "")
 993
 994                out += applied + _apply_macros(token.name)
 995                previous_sequence = sequence
 996                sequence = ""
 997                continue
 998
 999            sequence += token.sequence
1000
1001        if sequence + previous_sequence != "":
1002            out += "\x1b[0m"
1003
1004        out = StyledText(out)
1005        self._cache[markup_text] = out
1006        return out
1007
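One detail worth isolating from `parse` is its cache gate: a result is cached and reused only when the markup contains no macro tags, since macros (such as `!shuffle`) may return different output on every call. A sketch, with an assumed stand-in pattern for the module's `RE_MACRO`:

```python
import re

# Stand-in for the module's RE_MACRO constant (defined elsewhere in
# the source); the exact pattern here is an assumption.
RE_MACRO = re.compile(r"!([a-z0-9_]+)(?:\(([\w:]+)\))?")

def is_cacheable(markup_text):
    # Mirrors the condition in `parse`: no macro tags -> safe to cache.
    return len(RE_MACRO.findall(markup_text)) == 0
```

Static markup like `[141 bold]hello` is cacheable, while anything containing `!macro`-style tags is re-parsed every time.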
1008    def get_markup(self, ansi: str) -> str:
1009        """Generates markup from ANSI text.
1010
1011        Args:
1012            ansi: The text to get markup from.
1013
1014        Returns:
1015            A markup string that can be parsed to get (visually) the same
1016            result. Note that this conversion is lossy in a way: there are some
1017            details (like macros) that cannot be preserved in an ANSI->Markup->ANSI
1018            conversion.
1019        """
1020
1021        current_tags: list[str] = []
1022        out = ""
1023        for token in self.tokenize_ansi(ansi):
1024            if token.ttype is TokenType.PLAIN:
1025                if len(current_tags) != 0:
1026                    out += "[" + " ".join(current_tags) + "]"
1027
1028                assert isinstance(token.data, str)
1029                out += token.data
1030                current_tags = []
1031                continue
1032
1033            if token.ttype is TokenType.ESCAPED:
1034                assert isinstance(token.data, str)
1035
1036                current_tags.append(token.data)
1037                continue
1038
1039            current_tags.append(token.name)
1040
1041        return out
1042
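The accumulation logic in `get_markup` — buffering tag names until a plain token flushes them as a single bracket group — can be sketched independently of the tokenizer. Tokens are simplified to `(kind, value)` pairs instead of real `Token` objects:

```python
# Toy version of get_markup's grouping loop; not the real tokenizer.
def group_tags(tokens):
    out = ""
    current_tags = []
    for kind, value in tokens:
        if kind == "plain":
            if current_tags:
                out += "[" + " ".join(current_tags) + "]"
            out += value
            current_tags = []
        else:  # a tag name, buffered until the next plain token
            current_tags.append(value)
    return out
```

Consecutive tags thus collapse into one space-separated group, matching TIM's `[tag1 tag2]` syntax.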
1043    def prettify_ansi(self, text: str) -> str:
1044        """Returns a prettified (syntax-highlighted) ANSI str.
1045
1046        This is useful to quickly "inspect" a given ANSI string. However,
 1047        for most real uses `MarkupLanguage.prettify_markup`, called with
 1048        `MarkupLanguage.get_markup(text)` as its argument, is preferable,
 1049        as its output is much more informative.
1050
1051        Args:
1052            text: The ANSI-text to prettify.
1053
1054        Returns:
1055            The prettified ANSI text. This text's styles remain valid,
1056            so copy-pasting the argument into a command (like printf)
1057            that can show styled text will work the same way.
1058        """
1059
1060        out = ""
1061        sequences = ""
1062        for token in self.tokenize_ansi(text):
1063            if token.ttype is TokenType.PLAIN:
1064                assert isinstance(token.data, str)
1065                out += token.data
1066                continue
1067
1068            assert token.sequence is not None
1069            out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
1070            sequences += token.sequence
1071            out += sequences
1072
1073        return out
1074
1075    def prettify_markup(self, text: str) -> str:
1076        """Returns a prettified (syntax-highlighted) markup str.
1077
1078        Args:
1079            text: The markup-text to prettify.
1080
1081        Returns:
1082            Prettified markup. This markup, excluding its styles,
1083            remains valid markup.
1084        """
1085
1086        def _apply_macros(text: str) -> str:
1087            """Apply current macros to text"""
1088
1089            for _, (method, args) in applied_macros:
1090                text = method(*args, text)
1091
1092            return text
1093
1094        def _pop_macro(name: str) -> None:
1095            """Pops a macro from applied_macros."""
1096
1097            for i, (macro_name, _) in enumerate(applied_macros):
1098                if macro_name == name:
1099                    applied_macros.pop(i)
1100                    break
1101
1102        def _finish(out: str, in_sequence: bool) -> str:
1103            """Adds ending cap to the given string."""
1104
1105            if in_sequence:
1106                if not out.endswith("\x1b[0m"):
1107                    out += "\x1b[0m"
1108
1109                return out + "]"
1110
1111            return out + "[/]"
1112
1113        styles: dict[TokenType, str] = {
1114            TokenType.MACRO: "210",
1115            TokenType.ESCAPED: "210 bold",
1116            TokenType.UNSETTER: "strikethrough",
1117        }
1118
1119        applied_macros: list[tuple[str, MacroCall]] = []
1120
1121        out = ""
1122        in_sequence = False
1123        current_styles: list[Token] = []
1124
1125        for token in self.tokenize_markup(text):
1126            if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
1127                if in_sequence:
1128                    out += "]"
1129
1130                in_sequence = False
1131
1132                sequence = ""
1133                for style in current_styles:
1134                    if style.sequence is None:
1135                        continue
1136
1137                    sequence += style.sequence
1138
1139                out += f"{sequence}{_apply_macros(token.name)}\033[0m"
1140                continue
1141
1142            out += " " if in_sequence else "["
1143            in_sequence = True
1144
1145            if token.ttype is TokenType.UNSETTER:
1146                if token.name == "/":
1147                    applied_macros = []
1148
1149                name = token.name[1:]
1150
1151                if name in self.macros:
1152                    _pop_macro(name)
1153
1154                current_styles.append(token)
1155
1156                out += self.parse(
1157                    ("" if (name in self.tags) or (name in self.user_tags) else "")
1158                    + f"[{styles[TokenType.UNSETTER]}]/{name}"
1159                )
1160                continue
1161
1162            if token.ttype is TokenType.MACRO:
1163                assert isinstance(token.data, tuple)
1164
1165                name = token.name
1166                if "(" in name:
1167                    name = name[: token.name.index("(")]
1168
1169                applied_macros.append((name, token.data))
1170
1171                try:
1172                    out += token.data[0](*token.data[1], token.name)
1173                    continue
1174
1175                except TypeError:  # Not enough arguments
1176                    pass
1177
1178            if token.sequence is not None:
1179                current_styles.append(token)
1180
1181            style_markup = styles.get(token.ttype) or token.name
1182            out += self.parse(f"[{style_markup}]{token.name}")
1183
1184        return _finish(out, in_sequence)
1185
1186    def get_styled_plains(self, text: str) -> Iterator[StyledText]:
1187        """Gets all plain tokens within text, with their respective styles applied.
1188
1189        Args:
 1190            text: The string, possibly containing ANSI sequences, to find plains in.
1191
1192        Returns:
1193            An iterator of `StyledText` objects, each yielded when a new plain token is found,
1194            containing the styles that are relevant and active on the given plain.
1195        """
1196
1197        def _apply_styles(styles: list[Token], text: str) -> str:
1198            """Applies given styles to text."""
1199
1200            for token in styles:
1201                if token.ttype is TokenType.MACRO:
1202                    assert isinstance(token.data, tuple)
1203                    text = token.data[0](*token.data[1], text)
1204                    continue
1205
1206                if token.sequence is None:
1207                    continue
1208
1209                text = token.sequence + text
1210
1211            return text
1212
1213        def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
1214            """Removes an unsetter from the list, returns the new list."""
1215
1216            if token.name == "/":
1217                return list(filter(lambda tkn: tkn.ttype is TokenType.POSITION, styles))
1218
1219            target_name = token.name[1:]
1220            for style in styles:
1221                # bold & dim unsetters represent the same character, so we have
1222                # to treat them the same way.
1223                style_name = style.name
1224
1225                if style.name == "dim":
1226                    style_name = "bold"
1227
1228                if style_name == target_name:
1229                    styles.remove(style)
1230
1231                elif (
1232                    style_name.startswith(target_name)
1233                    and style.ttype is TokenType.MACRO
1234                ):
1235                    styles.remove(style)
1236
1237                elif style.ttype is TokenType.COLOR:
1238                    assert isinstance(style.data, Color)
1239                    if target_name == "fg" and not style.data.background:
1240                        styles.remove(style)
1241
1242                    elif target_name == "bg" and style.data.background:
1243                        styles.remove(style)
1244
1245            return styles
1246
1247        def _pop_position(styles: list[Token]) -> list[Token]:
1248            for token in styles.copy():
1249                if token.ttype is TokenType.POSITION:
1250                    styles.remove(token)
1251
1252            return styles
1253
1254        styles: list[Token] = []
1255        for token in self.tokenize_ansi(text):
1256            if token.ttype is TokenType.COLOR:
1257                for i, style in enumerate(reversed(styles)):
1258                    if style.ttype is TokenType.COLOR:
1259                        assert isinstance(style.data, Color)
1260                        assert isinstance(token.data, Color)
1261
1262                        if style.data.background != token.data.background:
1263                            continue
1264
1265                        styles[len(styles) - i - 1] = token
1266                        break
1267                else:
1268                    styles.append(token)
1269
1270                continue
1271
1272            if token.ttype is TokenType.LINK:
1273                styles.append(token)
1274                yield StyledText(_apply_styles(styles, token.name))
1275
1276            if token.ttype is TokenType.PLAIN:
1277                assert isinstance(token.data, str)
1278                yield StyledText(_apply_styles(styles, token.data))
1279                styles = _pop_position(styles)
1280                continue
1281
1282            if token.ttype is TokenType.UNSETTER:
1283                styles = _pop_unsetter(token, styles)
1284                continue
1285
1286            styles.append(token)
1287
1288
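The bold/dim special case handled inside `_pop_unsetter` exists because both styles share one unsetter code, so `/bold` must clear either style. A minimal standalone model of that matching rule, using style names in place of `Token` objects:

```python
# Minimal model of the bold/dim unsetter quirk: "dim" is treated as
# "bold" when matching unsetters, so "/bold" removes either style.
def pop_unsetter(target, styles):
    kept = []
    for name in styles:
        effective = "bold" if name == "dim" else name
        if effective != target:
            kept.append(name)
    return kept
```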
1289def main() -> None:
1290    """Main method"""
1291
1292    parser = ArgumentParser()
1293
1294    markup_group = parser.add_argument_group("Markup->ANSI")
1295    markup_group.add_argument(
1296        "-p", "--parse", metavar=("TXT"), help="parse a markup text"
1297    )
1298    markup_group.add_argument(
1299        "-e", "--escape", help="escape parsed markup", action="store_true"
1300    )
1301    # markup_group.add_argument(
1302    # "-o",
1303    # "--optimize",
1304    # help="set optimization level for markup parsing",
1305    # action="count",
1306    # default=0,
1307    # )
1308
1309    markup_group.add_argument("--alias", action="append", help="alias src=dst")
1310
1311    ansi_group = parser.add_argument_group("ANSI->Markup")
1312    ansi_group.add_argument(
1313        "-m", "--markup", metavar=("TXT"), help="get markup from ANSI text"
1314    )
1315    ansi_group.add_argument(
1316        "-s",
1317        "--show-inverse",
1318        action="store_true",
 1319        help="show the result of parsing the resulting markup",
1320    )
1321
1322    args = parser.parse_args()
1323
1324    lang = MarkupLanguage()
1325
1326    if args.markup:
1327        markup_text = lang.get_markup(args.markup)
1328        print(markup_text, end="")
1329
1330        if args.show_inverse:
1331            print("->", lang.parse(markup_text))
1332        else:
1333            print()
1334
1335    if args.parse:
1336        if args.alias:
1337            for alias in args.alias:
1338                src, dest = alias.split("=")
1339                lang.alias(src, dest)
1340
1341        parsed = lang.parse(args.parse)
1342
1343        if args.escape:
1344            print(ascii(parsed))
1345        else:
1346            print(parsed)
1347
1348        return
1349
1350
1351tim = markup = MarkupLanguage()
1352"""The default TIM instances."""
1353
1354if __name__ == "__main__":
1355    main()
class StyledText(builtins.str):
391class StyledText(str):
392    """A styled text object.
393
394    The purpose of this class is to implement some things regular `str`
395    breaks at when encountering ANSI sequences.
396
397    Instances of this class are usually spat out by `MarkupLanguage.parse`,
398    but may be manually constructed if the need arises. Everything works even
399    if there is no ANSI tomfoolery going on.
400    """
401
402    value: str
403    """The underlying, ANSI-inclusive string value."""
404
405    _plain: str | None = None
406    _tokens: list[Token] | None = None
407
408    def __new__(cls, value: str = ""):
409        """Creates a StyledText, gets markup tags."""
410
411        obj = super().__new__(cls, value)
412        obj.value = value
413
414        return obj
415
416    def _generate_tokens(self) -> None:
417        """Generates self._tokens & self._plain."""
418
419        self._tokens = list(tim.tokenize_ansi(self.value))
420
421        self._plain = ""
422        for token in self._tokens:
423            if token.ttype is not TokenType.PLAIN:
424                continue
425
426            assert isinstance(token.data, str)
427            self._plain += token.data
428
429    @property
430    def tokens(self) -> list[Token]:
431        """Returns all markup tokens of this object.
432
433        Generated on-demand, at the first call to this or the self.plain
434        property.
435        """
436
437        if self._tokens is not None:
438            return self._tokens
439
440        self._generate_tokens()
441        assert self._tokens is not None
442        return self._tokens
443
444    @property
445    def plain(self) -> str:
446        """Returns the value of this object, with no ANSI sequences.
447
448        Generated on-demand, at the first call to this or the self.tokens
449        property.
450        """
451
452        if self._plain is not None:
453            return self._plain
454
455        self._generate_tokens()
456        assert self._plain is not None
457        return self._plain
458
459    def plain_index(self, index: int | None) -> int | None:
460        """Finds given index inside plain text."""
461
462        if index is None:
463            return None
464
465        styled_chars = 0
466        plain_chars = 0
467        negative_index = False
468
469        tokens = self.tokens.copy()
470        if index < 0:
471            tokens.reverse()
472            index = abs(index)
473            negative_index = True
474
475        for token in tokens:
476            if token.data is None:
477                continue
478
479            if token.ttype is not TokenType.PLAIN:
480                assert token.sequence is not None
481                styled_chars += len(token.sequence)
482                continue
483
484            assert isinstance(token.data, str)
485            for _ in range(len(token.data)):
486                if plain_chars == index:
487                    if negative_index:
488                        return -1 * (plain_chars + styled_chars)
489
490                    return styled_chars + plain_chars
491
492                plain_chars += 1
493
494        return None
495
496    def __len__(self) -> int:
497        """Gets "real" length of object."""
498
499        return len(self.plain)
500
501    def __getitem__(self, subscript: int | slice) -> str:
502        """Gets an item, adjusted for non-plain text.
503
504        Args:
505            subscript: The integer or slice to find.
506
507        Returns:
508            The elements described by the subscript.
509
510        Raises:
511            IndexError: The given index is out of range.
512        """
513
514        if isinstance(subscript, int):
515            plain_index = self.plain_index(subscript)
516            if plain_index is None:
517                raise IndexError("StyledText index out of range")
518
519            return self.value[plain_index]
520
521        return self.value[
522            slice(
523                self.plain_index(subscript.start),
524                self.plain_index(subscript.stop),
525                subscript.step,
526            )
527        ]
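The index adjustment that `__len__` and `__getitem__` rely on can be demonstrated with a stripped-down helper; the regex below is an assumption standing in for the module's real tokenizer, matching only simple SGR sequences:

```python
import re

# Simplistic SGR matcher; an assumption, not the module's tokenizer.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

def plain(value):
    # StyledText measures length and resolves indices against this
    # ANSI-stripped form, not the raw string.
    return ANSI_RE.sub("", value)

styled = "\x1b[1mhello\x1b[0m"
```

A plain `str` would report the raw length (13 here, sequences included), while `StyledText` reports the visible length (5).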

MacroCallable = typing.Callable[..., str]
MacroCall = typing.Tuple[typing.Callable[..., str], typing.List[str]]
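Per `MacroCall`, a macro is stored alongside its markup-supplied arguments and invoked with those arguments first and the enclosed text last, matching the `method(*args, text)` call sites in `parse` and `get_styled_plains`. A hypothetical macro fitting that shape:

```python
from typing import Callable, List, Tuple

MacroCallable = Callable[..., str]
MacroCall = Tuple[Callable[..., str], List[str]]

# Hypothetical macro, named for illustration only: markup arguments
# arrive first, the enclosed text arrives last.
def bracket(*args):
    *macro_args, text = args
    return f"<{':'.join(macro_args)}>{text}"

call: MacroCall = (bracket, ["a", "b"])
method, margs = call
result = method(*margs, "Hello")
```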
class MarkupLanguage:
 530class MarkupLanguage:
 531    """A class representing an instance of a Markup Language.
 532
 533    This class is used for all markup/ANSI parsing, tokenizing and usage.
 534
 535    ```python3
 536    from pytermgui import tim
 537
 538    tim.alias("my-tag", "@152 72 bold")
 539    tim.print("This is [my-tag]my-tag[/]!")
 540    ```
 541
 542    <p style="text-align: center">
 543        <img src="https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/\
 544docs/parser/markup_language.png"
 545        style="width: 80%">
 546    </p>
 547    """
 548
 549    raise_unknown_markup: bool = False
 550    """Raise `pytermgui.exceptions.MarkupSyntaxError` when encountering unknown markup tags."""
 551
 552    def __init__(self, default_macros: bool = True) -> None:
 553        """Initializes a MarkupLanguage.
 554
 555        Args:
 556            default_macros: If False, the builtin macros are not defined.
 557        """
 558
 559        self.tags: dict[str, str] = STYLE_MAP.copy()
 560        self._cache: dict[str, StyledText] = {}
 561        self.macros: dict[str, MacroCallable] = {}
 562        self.user_tags: dict[str, str] = {}
 563        self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()
 564
 565        self.should_cache: bool = True
 566
 567        if default_macros:
 568            self.define("!link", macro_link)
 569            self.define("!align", macro_align)
 570            self.define("!markup", self.get_markup)
 571            self.define("!shuffle", macro_shuffle)
 572            self.define("!strip_bg", macro_strip_bg)
 573            self.define("!strip_fg", macro_strip_fg)
 574            self.define("!rainbow", macro_rainbow)
 575            self.define("!gradient", macro_gradient)
 576            self.define("!upper", lambda item: str(item.upper()))
 577            self.define("!lower", lambda item: str(item.lower()))
 578            self.define("!title", lambda item: str(item.title()))
 579            self.define("!capitalize", lambda item: str(item.capitalize()))
 580            self.define("!expand", lambda tag: macro_expand(self, tag))
 581            self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))
 582
 583        self.alias("code", "dim @black")
 584        self.alias("code.str", "142")
 585        self.alias("code.multiline_str", "code.str")
 586        self.alias("code.none", "167")
 587        self.alias("code.global", "214")
 588        self.alias("code.number", "175")
 589        self.alias("code.keyword", "203")
 590        self.alias("code.identifier", "109")
 591        self.alias("code.name", "code.global")
 592        self.alias("code.comment", "240 italic")
 593        self.alias("code.builtin", "code.global")
 594        self.alias("code.file", "code.identifier")
 595        self.alias("code.symbol", "code.identifier")
 596
 597    def _get_color_token(self, tag: str) -> Token | None:
 598        """Tries to get a color token from the given tag.
 599
 600        Args:
 601            tag: The tag to parse.
 602
 603        Returns:
 604            A color token if the given tag could be parsed into one, else None.
 605        """
 606
 607        try:
 608            color = str_to_color(tag, use_cache=self.should_cache)
 609
 610        except ColorSyntaxError:
 611            return None
 612
 613        return Token(name=color.value, ttype=TokenType.COLOR, data=color)
 614
 615    def _get_style_token(self, tag: str) -> Token | None:
 616        """Tries to get a style (including unsetter) token from tags, user tags and unsetters.
 617
 618        Args:
 619            tag: The tag to parse.
 620
 621        Returns:
 622            A `Token` if one could be created, None otherwise.
 623        """
 624
 625        if tag in self.unsetters:
 626            return Token(name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag])
 627
 628        if tag in self.user_tags:
 629            return Token(name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag])
 630
 631        if tag in self.tags:
 632            return Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag])
 633
 634        return None
 635
 636    def print(self, *args, **kwargs) -> None:
 637        """Parse all arguments and pass them through to print, along with kwargs."""
 638
 639        parsed = []
 640        for arg in args:
 641            parsed.append(self.parse(str(arg)))
 642
 643        get_terminal().print(*parsed, **kwargs)
 644
 645    def tokenize_markup(self, markup_text: str) -> Iterator[Token]:
 646        """Converts the given markup string into an iterator of `Token`.
 647
 648        Args:
 649            markup_text: The text to look at.
 650
 651        Returns:
 652            An iterator of tokens. An iterator is returned, rather than a
 653            list, to save memory.
 654        """
 655
 656        end = 0
 657        start = 0
 658        cursor = 0
 659        for match in RE_MARKUP.finditer(markup_text):
 660            full, escapes, tag_text = match.groups()
 661            start, end = match.span()
 662
 663            # Add plain text between last and current match
 664            if start > cursor:
 665                yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start])
 666
 667            if escapes != "" and len(escapes) % 2 == 1:
 668                cursor = end
 669                yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :])
 670                continue
 671
 672            for tag in tag_text.split():
 673                token = self._get_style_token(tag)
 674                if token is not None:
 675                    yield token
 676                    continue
 677
 678                # Try to find a color token
 679                token = self._get_color_token(tag)
 680                if token is not None:
 681                    yield token
 682                    continue
 683
 684                macro_match = RE_MACRO.match(tag)
 685                if macro_match is not None:
 686                    name, args = macro_match.groups()
 687                    macro_args = () if args is None else args.split(":")
 688
 689                    if name not in self.macros:
 690                        raise MarkupSyntaxError(
 691                            tag=tag,
 692                            cause="is not a defined macro",
 693                            context=markup_text,
 694                        )
 695
 696                    yield Token(
 697                        name=tag,
 698                        ttype=TokenType.MACRO,
 699                        data=(self.macros[name], macro_args),
 700                    )
 701                    continue
 702
 703                if self.raise_unknown_markup:
 704                    raise MarkupSyntaxError(
 705                        tag=tag, cause="not defined", context=markup_text
 706                    )
 707
 708            cursor = end
 709
 710        # Add remaining text as plain
 711        if len(markup_text) > cursor:
 712            yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])
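The scanning loop above can be illustrated with a stripped-down, self-contained sketch. `TAG_PATTERN` below is a hypothetical stand-in for `RE_MARKUP`, and only the two essentials are handled: plain text between bracket groups, and odd-length backslash runs escaping a group.

```python
import re
from typing import Iterator, Tuple

# Hypothetical, simplified stand-in for RE_MARKUP: a run of backslashes,
# then a bracket group containing space-separated tags.
TAG_PATTERN = re.compile(r"(\\*)\[([^\]]+)\]")


def tokenize(markup: str) -> Iterator[Tuple[str, str]]:
    """Yields ("PLAIN" | "ESCAPED" | "TAG", value) pairs."""

    cursor = 0
    for match in TAG_PATTERN.finditer(markup):
        escapes, tag_text = match.groups()
        start, end = match.span()

        # Plain text between the previous match and this one.
        if start > cursor:
            yield ("PLAIN", markup[cursor:start])

        cursor = end

        # An odd number of backslashes escapes the bracket group.
        if len(escapes) % 2 == 1:
            yield ("ESCAPED", match.group(0)[len(escapes) - 1 :])
            continue

        # Tags are strictly separated by single spaces.
        for tag in tag_text.split():
            yield ("TAG", tag)

    # Remaining text after the last match is plain.
    if cursor < len(markup):
        yield ("PLAIN", markup[cursor:])
```

The real method additionally resolves each tag into a style, color or macro token; this sketch stops at the lexical level.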
 713
 714    def tokenize_ansi(self, ansi: str) -> Iterator[Token]:
 715        """Converts the given ANSI string into an iterator of `Token`.
 716
 717        Args:
 718            ansi: The text to look at.
 719
 720        Returns:
 721            An iterator of tokens. The reason this is an iterator is to possibly save
 722            on memory.
 723        """
 724
 725        def _is_in_tags(code: str, tags: dict[str, str]) -> str | None:
 726            """Determines whether a code is in the given dict of tags."""
 727
 728            for name, current in tags.items():
 729                if current == code:
 730                    return name
 731
 732            return None
 733
 734        def _generate_color(
 735            parts: list[str], code: str
 736        ) -> tuple[str, TokenType, Color]:
 737            """Generates a color token."""
 738
 739            data: Color
 740            if len(parts) == 1:
 741                data = StandardColor.from_ansi(code)
 742                name = data.name
 743                ttype = TokenType.COLOR
 744
 745            else:
 746                data = str_to_color(code)
 747                name = data.name
 748                ttype = TokenType.COLOR
 749
 750            return name, ttype, data
 751
 752        end = 0
 753        start = 0
 754        cursor = 0
 755
 756        # StyledText messes with indexing, so we need to cast it
 757        # back to str.
 758        if isinstance(ansi, StyledText):
 759            ansi = str(ansi)
 760
 761        for match in RE_ANSI.finditer(ansi):
 762            code = match.groups()[0]
 763            start, end = match.span()
 764
 765            if code is None:
 766                continue
 767
 768            parts = code.split(";")
 769
 770            if start > cursor:
 771                plain = ansi[cursor:start]
 772
 773                yield Token(name=plain, ttype=TokenType.PLAIN, data=plain)
 774
 775            name: str | None = code
 776            ttype = None
 777            data: str | Color = parts[0]
 778
 779            # Styles & Unsetters
 780            if len(parts) == 1:
 781                # Covariance is not an issue here, even though mypy seems to think so.
 782                name = _is_in_tags(parts[0], self.unsetters)  # type: ignore
 783                if name is not None:
 784                    ttype = TokenType.UNSETTER
 785
 786                else:
 787                    name = _is_in_tags(parts[0], self.tags)
 788                    if name is not None:
 789                        ttype = TokenType.STYLE
 790
 791            # Colors
 792            if ttype is None:
 793                with suppress(ColorSyntaxError):
 794                    name, ttype, data = _generate_color(parts, code)
 795
 796            if name is None or ttype is None or data is None:
 797                if len(parts) != 2:
 798                    raise AnsiSyntaxError(
 799                        tag=parts[0], cause="not recognized", context=ansi
 800                    )
 801
 802                name = "position"
 803                ttype = TokenType.POSITION
 804                data = ",".join(reversed(parts))
 805
 806            yield Token(name=name, ttype=ttype, data=data)
 807            cursor = end
 808
 809        if cursor < len(ansi):
 810            plain = ansi[cursor:]
 811
 812            yield Token(ttype=TokenType.PLAIN, data=plain)
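The SGR-code classification above can be sketched in miniature. `STYLES` and `UNSETTERS` below are tiny hypothetical stand-ins for the real `STYLE_MAP`/`UNSETTER_MAP` tables, and only the 8-bit color form (`38;5;n` foreground, `48;5;n` background) is handled:

```python
import re
from typing import List, Tuple

# Simplified SGR-only matcher; RE_ANSI also covers other escape types.
ANSI_PATTERN = re.compile(r"\x1b\[([\d;]+)m")

# Hypothetical miniature tag tables.
STYLES = {"bold": "1", "dim": "2", "italic": "3"}
UNSETTERS = {"/": "0", "/bold": "22", "/italic": "23"}


def classify(code: str) -> Tuple[str, str]:
    """Classifies one SGR payload into a (token_type, name) pair."""

    parts = code.split(";")

    # Single-part codes are either unsetters or styles.
    if len(parts) == 1:
        for name, value in UNSETTERS.items():
            if value == code:
                return ("UNSETTER", name)

        for name, value in STYLES.items():
            if value == code:
                return ("STYLE", name)

    # 8-bit colors: 38;5;n (foreground) or 48;5;n (background, "@"-prefixed).
    if len(parts) == 3 and parts[1] == "5":
        ground = "@" if parts[0] == "48" else ""
        return ("COLOR", f"{ground}{parts[2]}")

    return ("UNKNOWN", code)


def classify_ansi(ansi: str) -> List[Tuple[str, str]]:
    """Classifies every SGR sequence found in the given string."""

    return [classify(match.group(1)) for match in ANSI_PATTERN.finditer(ansi)]
```

The real method also handles RGB colors, positions and plain-text runs; this sketch only shows the lookup order: unsetters, then styles, then colors.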
 813
 814    def define(self, name: str, method: MacroCallable) -> None:
 815        """Defines a Macro tag that executes the given method.
 816
 817        Args:
 818            name: The name the given method will be reachable by within markup.
 819                The given value gets "!" prepended if it isn't present already.
 820            method: The method this macro will execute.
 821        """
 822
 823        if not name.startswith("!"):
 824            name = f"!{name}"
 825
 826        self.macros[name] = method
 827        self.unsetters[f"/{name}"] = None
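`define` boils down to name normalization plus two dictionary writes. A minimal self-contained sketch (`MacroRegistry` is a hypothetical illustration, not part of pytermgui):

```python
from typing import Callable, Dict, Optional

MacroCallable = Callable[..., str]


class MacroRegistry:
    """Minimal sketch of macro registration with "!"-prefix normalization."""

    def __init__(self) -> None:
        self.macros: Dict[str, MacroCallable] = {}
        self.unsetters: Dict[str, Optional[str]] = {}

    def define(self, name: str, method: MacroCallable) -> None:
        # Macro names always carry a leading "!".
        if not name.startswith("!"):
            name = f"!{name}"

        self.macros[name] = method

        # Macro unsetters have no SGR sequence, so they are stored as None.
        self.unsetters[f"/{name}"] = None
```

Storing `None` for the unsetter is what later lets the parser recognize `[/!macro]` tags: an unsetter token with `data is None` pops the macro instead of emitting a sequence.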
 828
 829    def alias(self, name: str, value: str) -> None:
 830        """Aliases the given name to a value, and generates an unsetter for it.
 831
 832        Note that it is not possible to alias macros.
 833
 834        Args:
 835            name: The name of the new tag.
 836            value: The value the new tag will stand for.
 837        """
 838
 839        def _get_unsetter(token: Token) -> str | None:
 840            """Get unsetter for a token"""
 841
 842            if token.ttype is TokenType.PLAIN:
 843                return None
 844
 845            if token.ttype is TokenType.UNSETTER:
 846                return self.unsetters[token.name]
 847
 848            if token.ttype is TokenType.COLOR:
 849                assert isinstance(token.data, Color)
 850
 851                if token.data.background:
 852                    return self.unsetters["/bg"]
 853
 854                return self.unsetters["/fg"]
 855
 856            name = f"/{token.name}"
 857            if name not in self.unsetters:
 858                raise KeyError(f"Could not find unsetter for token {token}.")
 859
 860            return self.unsetters[name]
 861
 862        if name.startswith("!"):
 863            raise ValueError('Only macro tags can start with "!".')
 864
 865        setter = ""
 866        unsetter = ""
 867
 868        # Try to link to existing tag
 869        if value in self.user_tags:
 870            self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"]
 871            self.user_tags[name] = self.user_tags[value]
 872            return
 873
 874        for token in self.tokenize_markup(f"[{value}]"):
 875            if token.ttype is TokenType.PLAIN:
 876                continue
 877
 878            assert token.sequence is not None
 879            setter += token.sequence
 880
 881            t_unsetter = _get_unsetter(token)
 882            unsetter += f"\x1b[{t_unsetter}m"
 883
 884        self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m")
 885        self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m")
 886
 887        marked: list[str] = []
 888        for item in self._cache:
 889            if name in item:
 890                marked.append(item)
 891
 892        for item in marked:
 893            del self._cache[item]
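`alias` stores raw SGR payloads by trimming the escape prefix and the trailing `m`. Note that `str.lstrip` strips a *set* of characters, not a literal prefix; that happens to be safe here because SGR payloads never begin with `\x1b` or `[`. The trimming in isolation:

```python
def strip_sgr(sequence: str) -> str:
    """Reduces a full SGR sequence (e.g. "\\x1b[1m") to its payload ("1").

    str.lstrip treats its argument as a character set, so this removes
    any leading "\\x1b" and "[" characters, then the trailing "m".
    """

    return sequence.lstrip("\x1b[").rstrip("m")
```

This is why aliased values round-trip: the stored payload can be re-wrapped as `f"\x1b[{payload}m"` when the tag is applied.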
 894
 895    # TODO: I cannot cut down the one-too-many branch that this has at the moment.
 896    #       We could look into it in the future, however.
 897    def parse(  # pylint: disable=too-many-branches
 898        self, markup_text: str
 899    ) -> StyledText:
 900        """Parses the given markup.
 901
 902        Args:
 903            markup_text: The markup to parse.
 904
 905        Returns:
 906            A `StyledText` instance of the result of parsing the input. This
 907            custom `str` class is used to allow accessing the plain value of
 908            the output, as well as to cleanly index within it. It is analogous
 909            to the builtin `str`, with a few extras added on top.
 910        """
 911
 912        applied_macros: list[tuple[str, MacroCall]] = []
 913        previous_token: Token | None = None
 914        previous_sequence = ""
 915        sequence = ""
 916        out = ""
 917
 918        def _apply_macros(text: str) -> str:
 919            """Apply current macros to text"""
 920
 921            for _, (method, args) in applied_macros:
 922                text = method(*args, text)
 923
 924            return text
 925
 926        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
 927            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
 928                return False
 929
 930            return (
 931                type(previous) is type(new)
 932                and previous.data.background == new.data.background
 933            )
 934
 935        if (
 936            self.should_cache
 937            and markup_text in self._cache
 938            and len(RE_MACRO.findall(markup_text)) == 0
 939        ):
 940            return self._cache[markup_text]
 941
 942        token: Token
 943        for token in self.tokenize_markup(markup_text):
 944            if sequence != "" and previous_token == token:
 945                continue
 946
 947            # Optimize out previously added color tokens, as only the most
 948            # recent one would be visible anyway.
 949            if (
 950                token.sequence is not None
 951                and previous_token is not None
 952                and _is_same_colorgroup(previous_token, token)
 953            ):
 954                sequence = token.sequence
 955                continue
 956
 957            if token.ttype == TokenType.UNSETTER and token.data == "0":
 958                out += "\033[0m"
 959                sequence = ""
 960                applied_macros = []
 961                continue
 962
 963            previous_token = token
 964
 965            # Macro unsetters are stored with None as their data
 966            if token.data is None and token.ttype is TokenType.UNSETTER:
 967                for item, data in applied_macros.copy():
 968                    macro_match = RE_MACRO.match(item)
 969                    assert macro_match is not None
 970
 971                    macro_name = macro_match.groups()[0]
 972
 973                    if f"/{macro_name}" == token.name:
 974                        applied_macros.remove((item, data))
 975
 976                continue
 977
 978            if token.ttype is TokenType.MACRO:
 979                assert isinstance(token.data, tuple)
 980
 981                applied_macros.append((token.name, token.data))
 982                continue
 983
 984            if token.sequence is None:
 985                applied = sequence
 986
 987                if not out.endswith("\x1b[0m"):
 988                    for item in previous_sequence.split("\x1b"):
 989                        if item == "" or item[1:-1] in self.unsetters.values():
 990                            continue
 991
 992                        item = f"\x1b{item}"
 993                        applied = applied.replace(item, "")
 994
 995                out += applied + _apply_macros(token.name)
 996                previous_sequence = sequence
 997                sequence = ""
 998                continue
 999
1000            sequence += token.sequence
1001
1002        if sequence + previous_sequence != "":
1003            out += "\x1b[0m"
1004
1005        out = StyledText(out)
1006        self._cache[markup_text] = out
1007        return out
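The macro calling convention used by `parse` is worth showing on its own: macros fire in the order they were opened in the markup, and each receives its markup arguments first with the accumulated text last, i.e. `method(*args, text)`. A small sketch, where `macro_pad` is a hypothetical macro taking one positional markup argument (as in `[!pad(6)]hi[/!pad]`):

```python
from typing import Callable, List, Sequence, Tuple

MacroCall = Tuple[Callable[..., str], Sequence[str]]


def apply_macros(text: str, applied: List[MacroCall]) -> str:
    """Applies macros in the order they were opened in markup.

    Each macro receives its markup arguments first and the enclosed
    plain text last: method(*args, text).
    """

    for method, args in applied:
        text = method(*args, text)

    return text


def macro_pad(width: str, text: str) -> str:
    """Hypothetical example macro: centers text to the given width."""

    return text.center(int(width))
```

Because macros run on the final plain text, `parse` skips its cache whenever the input contains a macro tag; cached output would freeze a macro's result.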
1008
1009    def get_markup(self, ansi: str) -> str:
1010        """Generates markup from ANSI text.
1011
1012        Args:
1013            ansi: The text to get markup from.
1014
1015        Returns:
1016            A markup string that can be parsed to get (visually) the same
1017            result. Note that this conversion is lossy in a way: there are some
1018            details (like macros) that cannot be preserved in an ANSI->Markup->ANSI
1019            conversion.
1020        """
1021
1022        current_tags: list[str] = []
1023        out = ""
1024        for token in self.tokenize_ansi(ansi):
1025            if token.ttype is TokenType.PLAIN:
1026                if len(current_tags) != 0:
1027                    out += "[" + " ".join(current_tags) + "]"
1028
1029                assert isinstance(token.data, str)
1030                out += token.data
1031                current_tags = []
1032                continue
1033
1034            if token.ttype is TokenType.ESCAPED:
1035                assert isinstance(token.data, str)
1036
1037                current_tags.append(token.data)
1038                continue
1039
1040            current_tags.append(token.name)
1041
1042        return out
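The reconstruction loop in `get_markup` is essentially "buffer tag names, flush one bracket group before each plain run". A minimal sketch over `(type, name)` pairs:

```python
from typing import List, Tuple


def tokens_to_markup(tokens: List[Tuple[str, str]]) -> str:
    """Rebuilds markup from a stream of (token_type, name) pairs."""

    out = ""
    current: List[str] = []

    for ttype, name in tokens:
        if ttype == "PLAIN":
            # Flush all buffered tags as a single bracket group.
            if current:
                out += "[" + " ".join(current) + "]"

            out += name
            current = []
            continue

        current.append(name)

    return out
```

Like the original loop, trailing tags with no plain text after them are dropped, which is harmless since they would style nothing.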
1043
1044    def prettify_ansi(self, text: str) -> str:
1045        """Returns a prettified (syntax-highlighted) ANSI str.
1046
1047        This is useful to quickly "inspect" a given ANSI string. However,
1048        for most real uses `MarkupLanguage.prettify_markup` would be
1049        preferable, given an argument of `MarkupLanguage.get_markup(text)`,
1050        as it is much more verbose.
1051
1052        Args:
1053            text: The ANSI-text to prettify.
1054
1055        Returns:
1056            The prettified ANSI text. This text's styles remain valid,
1057            so copy-pasting the argument into a command (like printf)
1058            that can show styled text will work the same way.
1059        """
1060
1061        out = ""
1062        sequences = ""
1063        for token in self.tokenize_ansi(text):
1064            if token.ttype is TokenType.PLAIN:
1065                assert isinstance(token.data, str)
1066                out += token.data
1067                continue
1068
1069            assert token.sequence is not None
1070            out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
1071            sequences += token.sequence
1072            out += sequences
1073
1074        return out
1075
1076    def prettify_markup(self, text: str) -> str:
1077        """Returns a prettified (syntax-highlighted) markup str.
1078
1079        Args:
1080            text: The markup-text to prettify.
1081
1082        Returns:
1083            Prettified markup. This markup, excluding its styles,
1084            remains valid markup.
1085        """
1086
1087        def _apply_macros(text: str) -> str:
1088            """Apply current macros to text"""
1089
1090            for _, (method, args) in applied_macros:
1091                text = method(*args, text)
1092
1093            return text
1094
1095        def _pop_macro(name: str) -> None:
1096            """Pops a macro from applied_macros."""
1097
1098            for i, (macro_name, _) in enumerate(applied_macros):
1099                if macro_name == name:
1100                    applied_macros.pop(i)
1101                    break
1102
1103        def _finish(out: str, in_sequence: bool) -> str:
1104            """Adds ending cap to the given string."""
1105
1106            if in_sequence:
1107                if not out.endswith("\x1b[0m"):
1108                    out += "\x1b[0m"
1109
1110                return out + "]"
1111
1112            return out + "[/]"
1113
1114        styles: dict[TokenType, str] = {
1115            TokenType.MACRO: "210",
1116            TokenType.ESCAPED: "210 bold",
1117            TokenType.UNSETTER: "strikethrough",
1118        }
1119
1120        applied_macros: list[tuple[str, MacroCall]] = []
1121
1122        out = ""
1123        in_sequence = False
1124        current_styles: list[Token] = []
1125
1126        for token in self.tokenize_markup(text):
1127            if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
1128                if in_sequence:
1129                    out += "]"
1130
1131                in_sequence = False
1132
1133                sequence = ""
1134                for style in current_styles:
1135                    if style.sequence is None:
1136                        continue
1137
1138                    sequence += style.sequence
1139
1140                out += f"{sequence}{_apply_macros(token.name)}\033[0m"
1141                continue
1142
1143            out += " " if in_sequence else "["
1144            in_sequence = True
1145
1146            if token.ttype is TokenType.UNSETTER:
1147                if token.name == "/":
1148                    applied_macros = []
1149
1150                name = token.name[1:]
1151
1152                if name in self.macros:
1153                    _pop_macro(name)
1154
1155                current_styles.append(token)
1156
1157                out += self.parse(
1158                    ("" if (name in self.tags) or (name in self.user_tags) else "")
1159                    + f"[{styles[TokenType.UNSETTER]}]/{name}"
1160                )
1161                continue
1162
1163            if token.ttype is TokenType.MACRO:
1164                assert isinstance(token.data, tuple)
1165
1166                name = token.name
1167                if "(" in name:
1168                    name = name[: token.name.index("(")]
1169
1170                applied_macros.append((name, token.data))
1171
1172                try:
1173                    out += token.data[0](*token.data[1], token.name)
1174                    continue
1175
1176                except TypeError:  # Not enough arguments
1177                    pass
1178
1179            if token.sequence is not None:
1180                current_styles.append(token)
1181
1182            style_markup = styles.get(token.ttype) or token.name
1183            out += self.parse(f"[{style_markup}]{token.name}")
1184
1185        return _finish(out, in_sequence)
1186
1187    def get_styled_plains(self, text: str) -> Iterator[StyledText]:
1188        """Gets all plain tokens within text, with their respective styles applied.
1189
1190        Args:
1191            text: The ANSI-sequence containing string to find plains from.
1192
1193        Returns:
1194            An iterator of `StyledText` objects, each yielded when a new plain token is found,
1195            containing the styles that are relevant and active on the given plain.
1196        """
1197
1198        def _apply_styles(styles: list[Token], text: str) -> str:
1199            """Applies given styles to text."""
1200
1201            for token in styles:
1202                if token.ttype is TokenType.MACRO:
1203                    assert isinstance(token.data, tuple)
1204                    text = token.data[0](*token.data[1], text)
1205                    continue
1206
1207                if token.sequence is None:
1208                    continue
1209
1210                text = token.sequence + text
1211
1212            return text
1213
1214        def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
1215            """Removes an unsetter from the list, returns the new list."""
1216
1217            if token.name == "/":
1218                return list(filter(lambda tkn: tkn.ttype is TokenType.POSITION, styles))
1219
1220            target_name = token.name[1:]
1221            for style in styles:
1222                # bold & dim unsetters represent the same character, so we have
1223                # to treat them the same way.
1224                style_name = style.name
1225
1226                if style.name == "dim":
1227                    style_name = "bold"
1228
1229                if style_name == target_name:
1230                    styles.remove(style)
1231
1232                elif (
1233                    style_name.startswith(target_name)
1234                    and style.ttype is TokenType.MACRO
1235                ):
1236                    styles.remove(style)
1237
1238                elif style.ttype is TokenType.COLOR:
1239                    assert isinstance(style.data, Color)
1240                    if target_name == "fg" and not style.data.background:
1241                        styles.remove(style)
1242
1243                    elif target_name == "bg" and style.data.background:
1244                        styles.remove(style)
1245
1246            return styles
1247
1248        def _pop_position(styles: list[Token]) -> list[Token]:
1249            for token in styles.copy():
1250                if token.ttype is TokenType.POSITION:
1251                    styles.remove(token)
1252
1253            return styles
1254
1255        styles: list[Token] = []
1256        for token in self.tokenize_ansi(text):
1257            if token.ttype is TokenType.COLOR:
1258                for i, style in enumerate(reversed(styles)):
1259                    if style.ttype is TokenType.COLOR:
1260                        assert isinstance(style.data, Color)
1261                        assert isinstance(token.data, Color)
1262
1263                        if style.data.background != token.data.background:
1264                            continue
1265
1266                        styles[len(styles) - i - 1] = token
1267                        break
1268                else:
1269                    styles.append(token)
1270
1271                continue
1272
1273            if token.ttype is TokenType.LINK:
1274                styles.append(token)
1275                yield StyledText(_apply_styles(styles, token.name))
1276
1277            if token.ttype is TokenType.PLAIN:
1278                assert isinstance(token.data, str)
1279                yield StyledText(_apply_styles(styles, token.data))
1280                styles = _pop_position(styles)
1281                continue
1282
1283            if token.ttype is TokenType.UNSETTER:
1284                styles = _pop_unsetter(token, styles)
1285                continue
1286
1287            styles.append(token)
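The unsetter handling in `_pop_unsetter` has one subtlety worth isolating: `bold` and `dim` share a reset code (SGR 22), so `/bold` must cancel both. A simplified sketch over plain style names:

```python
from typing import List


def pop_unsetter(name: str, styles: List[str]) -> List[str]:
    """Removes the styles cancelled by an unsetter tag.

    "/" clears everything; "/bold" also cancels "dim", since both
    are reset by the same SGR code (22).
    """

    if name == "/":
        return []

    target = name[1:]
    kept = []

    for style in styles:
        # dim shares its unsetter with bold.
        style_name = "bold" if style == "dim" else style

        if style_name != target:
            kept.append(style)

    return kept
```

The real method additionally keeps position tokens alive on a full reset and resolves `/fg`/`/bg` against each color token's `background` flag.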

A class representing an instance of a Markup Language.

This class is used for all markup/ANSI parsing, tokenizing and printing.

from pytermgui import tim

tim.alias("my-tag", "@152 72 bold")
tim.print("This is [my-tag]my-tag[/]!")

MarkupLanguage(default_macros: bool = True)
552    def __init__(self, default_macros: bool = True) -> None:
553        """Initializes a MarkupLanguage.
554
555        Args:
 556            default_macros: If False, the builtin macros are not defined.
557        """
558
559        self.tags: dict[str, str] = STYLE_MAP.copy()
560        self._cache: dict[str, StyledText] = {}
561        self.macros: dict[str, MacroCallable] = {}
562        self.user_tags: dict[str, str] = {}
563        self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()
564
565        self.should_cache: bool = True
566
567        if default_macros:
568            self.define("!link", macro_link)
569            self.define("!align", macro_align)
570            self.define("!markup", self.get_markup)
571            self.define("!shuffle", macro_shuffle)
572            self.define("!strip_bg", macro_strip_bg)
573            self.define("!strip_fg", macro_strip_fg)
574            self.define("!rainbow", macro_rainbow)
575            self.define("!gradient", macro_gradient)
576            self.define("!upper", lambda item: str(item.upper()))
577            self.define("!lower", lambda item: str(item.lower()))
578            self.define("!title", lambda item: str(item.title()))
579            self.define("!capitalize", lambda item: str(item.capitalize()))
580            self.define("!expand", lambda tag: macro_expand(self, tag))
581            self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))
582
583        self.alias("code", "dim @black")
584        self.alias("code.str", "142")
585        self.alias("code.multiline_str", "code.str")
586        self.alias("code.none", "167")
587        self.alias("code.global", "214")
588        self.alias("code.number", "175")
589        self.alias("code.keyword", "203")
590        self.alias("code.identifier", "109")
591        self.alias("code.name", "code.global")
592        self.alias("code.comment", "240 italic")
593        self.alias("code.builtin", "code.global")
594        self.alias("code.file", "code.identifier")
595        self.alias("code.symbol", "code.identifier")

raise_unknown_markup: bool = False

Raise pytermgui.exceptions.MarkupSyntaxError when encountering unknown markup tags.
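A tag that matches none of the style, color or macro patterns triggers this error. The macro pattern itself, `!name` with optional colon-separated arguments, can be sketched as follows (`MACRO_PATTERN` is a hypothetical stand-in for `RE_MACRO`):

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for RE_MACRO: "!name", optionally followed by
# colon-separated arguments in parentheses.
MACRO_PATTERN = re.compile(r"(![a-z0-9_]+)(?:\(([\w:]+)\))?")


def parse_macro_tag(tag: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Splits a macro tag into its name and argument tuple."""

    match = MACRO_PATTERN.fullmatch(tag)
    if match is None:
        return None

    name, args = match.groups()
    return name, () if args is None else tuple(args.split(":"))
```

An unmatched tag returns `None` here; in the real tokenizer that is the point where `MarkupSyntaxError` is raised when `raise_unknown_markup` is set.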

def print(self, *args, **kwargs) -> None:
636    def print(self, *args, **kwargs) -> None:
637        """Parse all arguments and pass them through to print, along with kwargs."""
638
639        parsed = []
640        for arg in args:
641            parsed.append(self.parse(str(arg)))
642
643        get_terminal().print(*parsed, **kwargs)

Parse all arguments and pass them through to print, along with kwargs.

def tokenize_markup(self, markup_text: str) -> Iterator[pytermgui.parser.Token]:
645    def tokenize_markup(self, markup_text: str) -> Iterator[Token]:
646        """Converts the given markup string into an iterator of `Token`.
647
648        Args:
649            markup_text: The text to look at.
650
651        Returns:
652            An iterator of tokens. The reason this is an iterator is to possibly save
653            on memory.
654        """
655
656        end = 0
657        start = 0
658        cursor = 0
659        for match in RE_MARKUP.finditer(markup_text):
660            full, escapes, tag_text = match.groups()
661            start, end = match.span()
662
663            # Add plain text between last and current match
664            if start > cursor:
665                yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start])
666
667            if not escapes == "" and len(escapes) % 2 == 1:
668                cursor = end
669                yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :])
670                continue
671
672            for tag in tag_text.split():
673                token = self._get_style_token(tag)
674                if token is not None:
675                    yield token
676                    continue
677
678                # Try to find a color token
679                token = self._get_color_token(tag)
680                if token is not None:
681                    yield token
682                    continue
683
684                macro_match = RE_MACRO.match(tag)
685                if macro_match is not None:
686                    name, args = macro_match.groups()
687                    macro_args = () if args is None else args.split(":")
688
689                    if not name in self.macros:
690                        raise MarkupSyntaxError(
691                            tag=tag,
692                            cause="is not a defined macro",
693                            context=markup_text,
694                        )
695
696                    yield Token(
697                        name=tag,
698                        ttype=TokenType.MACRO,
699                        data=(self.macros[name], macro_args),
700                    )
701                    continue
702
703                if self.raise_unknown_markup:
704                    raise MarkupSyntaxError(
705                        tag=tag, cause="not defined", context=markup_text
706                    )
707
708            cursor = end
709
710        # Add remaining text as plain
711        if len(markup_text) > cursor:
712            yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])

Converts the given markup string into an iterator of Token.

Args
  • markup_text: The text to look at.
Returns

An iterator of tokens. The reason this is an iterator is to possibly save on memory.

def tokenize_ansi(self, ansi: str) -> Iterator[pytermgui.parser.Token]:
714    def tokenize_ansi(self, ansi: str) -> Iterator[Token]:
715        """Converts the given ANSI string into an iterator of `Token`.
716
717        Args:
718            ansi: The text to look at.
719
720        Returns:
721            An iterator of tokens. The reason this is an iterator is to possibly save
722            on memory.
723        """
724
725        def _is_in_tags(code: str, tags: dict[str, str]) -> str | None:
726            """Determines whether a code is in the given dict of tags."""
727
728            for name, current in tags.items():
729                if current == code:
730                    return name
731
732            return None
733
734        def _generate_color(
735            parts: list[str], code: str
736        ) -> tuple[str, TokenType, Color]:
737            """Generates a color token."""
738
739            data: Color
740            if len(parts) == 1:
741                data = StandardColor.from_ansi(code)
742                name = data.name
743                ttype = TokenType.COLOR
744
745            else:
746                data = str_to_color(code)
747                name = data.name
748                ttype = TokenType.COLOR
749
750            return name, ttype, data
751
752        end = 0
753        start = 0
754        cursor = 0
755
756        # StyledText messes with indexing, so we need to cast it
757        # back to str.
758        if isinstance(ansi, StyledText):
759            ansi = str(ansi)
760
761        for match in RE_ANSI.finditer(ansi):
762            code = match.groups()[0]
763            start, end = match.span()
764
765            if code is None:
766                continue
767
768            parts = code.split(";")
769
770            if start > cursor:
771                plain = ansi[cursor:start]
772
773                yield Token(name=plain, ttype=TokenType.PLAIN, data=plain)
774
775            name: str | None = code
776            ttype = None
777            data: str | Color = parts[0]
778
779            # Styles & Unsetters
780            if len(parts) == 1:
781                # Covariancy is not an issue here, even though mypy seems to think so.
782                name = _is_in_tags(parts[0], self.unsetters)  # type: ignore
783                if name is not None:
784                    ttype = TokenType.UNSETTER
785
786                else:
787                    name = _is_in_tags(parts[0], self.tags)
788                    if name is not None:
789                        ttype = TokenType.STYLE
790
791            # Colors
792            if ttype is None:
793                with suppress(ColorSyntaxError):
794                    name, ttype, data = _generate_color(parts, code)
795
796            if name is None or ttype is None or data is None:
797                if len(parts) != 2:
798                    raise AnsiSyntaxError(
799                        tag=parts[0], cause="not recognized", context=ansi
800                    )
801
802                name = "position"
803                ttype = TokenType.POSITION
804                data = ",".join(reversed(parts))
805
806            yield Token(name=name, ttype=ttype, data=data)
807            cursor = end
808
809        if cursor < len(ansi):
810            plain = ansi[cursor:]
811
812            yield Token(ttype=TokenType.PLAIN, data=plain)

Converts the given ANSI string into an iterator of Token.

Args
  • ansi: The text to look at.
Returns

An iterator of tokens. The reason this is an iterator is to possibly save on memory.

def define(self, name: str, method: Callable[..., str]) -> None:
814    def define(self, name: str, method: MacroCallable) -> None:
815        """Defines a Macro tag that executes the given method.
816
817        Args:
818            name: The name the given method will be reachable by within markup.
819                The given value gets "!" prepended if it isn't present already.
820            method: The method this macro will execute.
821        """
822
823        if not name.startswith("!"):
824            name = f"!{name}"
825
826        self.macros[name] = method
827        self.unsetters[f"/{name}"] = None

Defines a Macro tag that executes the given method.

Args
  • name: The name the given method will be reachable by within markup. The given value gets "!" prepended if it isn't present already.
  • method: The method this macro will execute.
def alias(self, name: str, value: str) -> None:
829    def alias(self, name: str, value: str) -> None:
830        """Aliases the given name to a value, and generates an unsetter for it.
831
832        Note that it is not possible to alias macros.
833
834        Args:
835            name: The name of the new tag.
836            value: The value the new tag will stand for.
837        """
838
839        def _get_unsetter(token: Token) -> str | None:
840            """Get unsetter for a token"""
841
842            if token.ttype is TokenType.PLAIN:
843                return None
844
845            if token.ttype is TokenType.UNSETTER:
846                return self.unsetters[token.name]
847
848            if token.ttype is TokenType.COLOR:
849                assert isinstance(token.data, Color)
850
851                if token.data.background:
852                    return self.unsetters["/bg"]
853
854                return self.unsetters["/fg"]
855
856            name = f"/{token.name}"
857            if name not in self.unsetters:
858                raise KeyError(f"Could not find unsetter for token {token}.")
859
860            return self.unsetters[name]
861
862        if name.startswith("!"):
863            raise ValueError('Only macro tags can start with "!".')
864
865        setter = ""
866        unsetter = ""
867
868        # Try to link to existing tag
869        if value in self.user_tags:
870            self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"]
871            self.user_tags[name] = self.user_tags[value]
872            return
873
874        for token in self.tokenize_markup(f"[{value}]"):
875            if token.ttype is TokenType.PLAIN:
876                continue
877
878            assert token.sequence is not None
879            setter += token.sequence
880
881            t_unsetter = _get_unsetter(token)
882            unsetter += f"\x1b[{t_unsetter}m"
883
884        self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m")
885        self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m")
886
887        marked: list[str] = []
888        for item in self._cache:
889            if name in item:
890                marked.append(item)
891
892        for item in marked:
893            del self._cache[item]

Aliases the given name to a value, and generates an unsetter for it.

Note that it is not possible to alias macros.

Args
  • name: The name of the new tag.
  • value: The value the new tag will stand for.
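The existing-tag shortcut in alias can be sketched with plain dicts. `user_tags` and `unsetters` below are illustrative stand-ins for the parser's attributes, and the sequence values are made up:

```python
# Illustrative dicts standing in for user_tags / unsetters; the
# sequence values here are invented for the example.
user_tags: dict[str, str] = {"error": "1;38;5;210"}
unsetters: dict[str, str] = {"/error": "22;39"}

def alias(name: str, value: str) -> None:
    if name.startswith("!"):
        raise ValueError('Only macro tags can start with "!".')
    # Link to an existing user tag when possible, as the method above does.
    if value in user_tags:
        unsetters[f"/{name}"] = unsetters[f"/{value}"]
        user_tags[name] = user_tags[value]

alias("warning", "error")
assert user_tags["warning"] == user_tags["error"]
assert unsetters["/warning"] == unsetters["/error"]
```

When the value is not an existing tag, the real method instead tokenizes `[value]`, concatenates the setter sequences, and builds a combined unsetter token by token.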
def parse(self, markup_text: str) -> pytermgui.parser.StyledText:
 897    def parse(  # pylint: disable=too-many-branches
 898        self, markup_text: str
 899    ) -> StyledText:
 900        """Parses the given markup.
 901
 902        Args:
 903            markup_text: The markup to parse.
 904
 905        Returns:
 906            A `StyledText` instance of the result of parsing the input. This
 907            custom `str` subclass allows accessing the plain value of the
 908            output, as well as cleanly indexing within it. It is analogous
 909            to the builtin `str`, only adding extra features on top.
 910        """
 911
 912        applied_macros: list[tuple[str, MacroCall]] = []
 913        previous_token: Token | None = None
 914        previous_sequence = ""
 915        sequence = ""
 916        out = ""
 917
 918        def _apply_macros(text: str) -> str:
 919            """Apply current macros to text"""
 920
 921            for _, (method, args) in applied_macros:
 922                text = method(*args, text)
 923
 924            return text
 925
 926        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
 927            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
 928                return False
 929
 930            return (
 931                type(previous) is type(new)
 932                and previous.data.background == new.data.background
 933            )
 934
 935        if (
 936            self.should_cache
 937            and markup_text in self._cache
 938            and len(RE_MACRO.findall(markup_text)) == 0
 939        ):
 940            return self._cache[markup_text]
 941
 942        token: Token
 943        for token in self.tokenize_markup(markup_text):
 944            if sequence != "" and previous_token == token:
 945                continue
 946
 947            # Optimize out previously added color tokens, as only the most
 948            # recent would be visible anyways.
 949            if (
 950                token.sequence is not None
 951                and previous_token is not None
 952                and _is_same_colorgroup(previous_token, token)
 953            ):
 954                sequence = token.sequence
 955                continue
 956
 957            if token.ttype == TokenType.UNSETTER and token.data == "0":
 958                out += "\033[0m"
 959                sequence = ""
 960                applied_macros = []
 961                continue
 962
 963            previous_token = token
 964
 965            # Macro unsetters are stored with None as their data
 966            if token.data is None and token.ttype is TokenType.UNSETTER:
 967                for item, data in applied_macros.copy():
 968                    macro_match = RE_MACRO.match(item)
 969                    assert macro_match is not None
 970
 971                    macro_name = macro_match.groups()[0]
 972
 973                    if f"/{macro_name}" == token.name:
 974                        applied_macros.remove((item, data))
 975
 976                continue
 977
 978            if token.ttype is TokenType.MACRO:
 979                assert isinstance(token.data, tuple)
 980
 981                applied_macros.append((token.name, token.data))
 982                continue
 983
 984            if token.sequence is None:
 985                applied = sequence
 986
 987                if not out.endswith("\x1b[0m"):
 988                    for item in previous_sequence.split("\x1b"):
 989                        if item == "" or item[1:-1] in self.unsetters.values():
 990                            continue
 991
 992                        item = f"\x1b{item}"
 993                        applied = applied.replace(item, "")
 994
 995                out += applied + _apply_macros(token.name)
 996                previous_sequence = sequence
 997                sequence = ""
 998                continue
 999
1000            sequence += token.sequence
1001
1002        if sequence + previous_sequence != "":
1003            out += "\x1b[0m"
1004
1005        out = StyledText(out)
1006        self._cache[markup_text] = out
1007        return out

Parses the given markup.

Args
  • markup_text: The markup to parse.
Returns

A StyledText instance of the result of parsing the input. This custom str subclass allows accessing the plain value of the output, as well as cleanly indexing within it. It is analogous to the builtin str, only adding extra features on top.
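The shape of parse's output follows standard SGR escape codes: 38;5;N sets an xterm-256 foreground, 48;5;N a background, 1 is bold, and 0 is the reset that caps the string. A minimal, hypothetical renderer for a flat tag group might look like:

```python
# Standard SGR codes for a few styles; tag names are illustrative.
CODES = {"bold": "1", "dim": "2", "italic": "3", "underline": "4"}

def render(tags: list[str], text: str) -> str:
    """Render one flat tag group roughly the way parse() would."""
    sequence = ""
    for tag in tags:
        if tag.isdigit():  # xterm-256 foreground, e.g. "141"
            sequence += f"\x1b[38;5;{tag}m"
        elif tag.startswith("@") and tag[1:].isdigit():  # background
            sequence += f"\x1b[48;5;{tag[1:]}m"
        else:
            sequence += f"\x1b[{CODES[tag]}m"
    return sequence + text + "\x1b[0m"  # reset cap, as parse() appends

assert render(["141", "bold"], "Hi") == "\x1b[38;5;141m\x1b[1mHi\x1b[0m"
```

The real method additionally deduplicates redundant color tokens, tracks macros and caches macro-free results; none of that changes the basic sequence-then-text-then-reset shape shown here.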

def get_markup(self, ansi: str) -> str:
1009    def get_markup(self, ansi: str) -> str:
1010        """Generates markup from ANSI text.
1011
1012        Args:
1013            ansi: The text to get markup from.
1014
1015        Returns:
1016            A markup string that can be parsed to get (visually) the same
1017            result. Note that this conversion is lossy in a way: there are some
1018            details (like macros) that cannot be preserved in an ANSI->Markup->ANSI
1019            conversion.
1020        """
1021
1022        current_tags: list[str] = []
1023        out = ""
1024        for token in self.tokenize_ansi(ansi):
1025            if token.ttype is TokenType.PLAIN:
1026                if len(current_tags) != 0:
1027                    out += "[" + " ".join(current_tags) + "]"
1028
1029                assert isinstance(token.data, str)
1030                out += token.data
1031                current_tags = []
1032                continue
1033
1034            if token.ttype is TokenType.ESCAPED:
1035                assert isinstance(token.data, str)
1036
1037                current_tags.append(token.data)
1038                continue
1039
1040            current_tags.append(token.name)
1041
1042        return out

Generates markup from ANSI text.

Args
  • ansi: The text to get markup from.
Returns

A markup string that can be parsed to get (visually) the same result. Note that this conversion is lossy in a way: there are some details (like macros) that cannot be preserved in an ANSI->Markup->ANSI conversion.
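A rough, lossy inverse can be sketched for a handful of SGR codes. `NAMES` and `get_markup_sketch` are hypothetical, and (as in the method above) trailing tags with no plain text after them are dropped:

```python
import re

# Hypothetical inverse mapping for a few common SGR codes.
NAMES = {"1": "bold", "3": "italic", "4": "underline", "0": "/"}
SGR_RE = re.compile(r"\x1b\[(\d+)m")

def get_markup_sketch(ansi: str) -> str:
    """Collect tag names, flushing them as one group when plain text arrives."""
    out, tags, cursor = "", [], 0
    for match in SGR_RE.finditer(ansi):
        if match.start() > cursor:  # plain text: emit pending tags first
            if tags:
                out += "[" + " ".join(tags) + "]"
                tags = []
            out += ansi[cursor:match.start()]
        tags.append(NAMES[match.group(1)])
        cursor = match.end()
    # Trailing tags with no plain text after them are dropped, as above.
    return out + ansi[cursor:]

assert get_markup_sketch("\x1b[1m\x1b[3mHi\x1b[0m") == "[bold italic]Hi"
```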

def prettify_ansi(self, text: str) -> str:
1044    def prettify_ansi(self, text: str) -> str:
1045        """Returns a prettified (syntax-highlighted) ANSI str.
1046
1047        This is useful to quickly "inspect" a given ANSI string. However,
1048        for most real uses `MarkupLanguage.prettify_markup` is preferable,
1049        called with `MarkupLanguage.get_markup(text)` as its argument,
1050        since its output is much more verbose.
1051
1052        Args:
1053            text: The ANSI-text to prettify.
1054
1055        Returns:
1056            The prettified ANSI text. This text's styles remain valid,
1057            so copy-pasting the argument into a command (like printf)
1058            that can show styled text will work the same way.
1059        """
1060
1061        out = ""
1062        sequences = ""
1063        for token in self.tokenize_ansi(text):
1064            if token.ttype is TokenType.PLAIN:
1065                assert isinstance(token.data, str)
1066                out += token.data
1067                continue
1068
1069            assert token.sequence is not None
1070            out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
1071            sequences += token.sequence
1072            out += sequences
1073
1074        return out

Returns a prettified (syntax-highlighted) ANSI str.

This is useful to quickly "inspect" a given ANSI string. However, for most real uses MarkupLanguage.prettify_markup is preferable, called with MarkupLanguage.get_markup(text) as its argument, since its output is much more verbose.

Args
  • text: The ANSI-text to prettify.
Returns

The prettified ANSI text. This text's styles remain valid, so copy-pasting the argument into a command (like printf) that can show styled text will work the same way.
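The core trick in the method above, keeping the original styling intact while also making the sequences readable, relies on spelling out the escape byte. A minimal, hypothetical helper:

```python
def show_escapes(text: str) -> str:
    """Replace the escape byte with its printable spelling (illustrative helper)."""
    return text.replace("\x1b", "\\x1b")

assert show_escapes("\x1b[1mHi\x1b[0m") == "\\x1b[1mHi\\x1b[0m"
```

prettify_ansi emits both the raw sequence (so the styling still renders) and this spelled-out form (so the reader can see it).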

def prettify_markup(self, text: str) -> str:
1076    def prettify_markup(self, text: str) -> str:
1077        """Returns a prettified (syntax-highlighted) markup str.
1078
1079        Args:
1080            text: The markup-text to prettify.
1081
1082        Returns:
1083            Prettified markup. This markup, excluding its styles,
1084            remains valid markup.
1085        """
1086
1087        def _apply_macros(text: str) -> str:
1088            """Apply current macros to text"""
1089
1090            for _, (method, args) in applied_macros:
1091                text = method(*args, text)
1092
1093            return text
1094
1095        def _pop_macro(name: str) -> None:
1096            """Pops a macro from applied_macros."""
1097
1098            for i, (macro_name, _) in enumerate(applied_macros):
1099                if macro_name == name:
1100                    applied_macros.pop(i)
1101                    break
1102
1103        def _finish(out: str, in_sequence: bool) -> str:
1104            """Adds ending cap to the given string."""
1105
1106            if in_sequence:
1107                if not out.endswith("\x1b[0m"):
1108                    out += "\x1b[0m"
1109
1110                return out + "]"
1111
1112            return out + "[/]"
1113
1114        styles: dict[TokenType, str] = {
1115            TokenType.MACRO: "210",
1116            TokenType.ESCAPED: "210 bold",
1117            TokenType.UNSETTER: "strikethrough",
1118        }
1119
1120        applied_macros: list[tuple[str, MacroCall]] = []
1121
1122        out = ""
1123        in_sequence = False
1124        current_styles: list[Token] = []
1125
1126        for token in self.tokenize_markup(text):
1127            if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
1128                if in_sequence:
1129                    out += "]"
1130
1131                in_sequence = False
1132
1133                sequence = ""
1134                for style in current_styles:
1135                    if style.sequence is None:
1136                        continue
1137
1138                    sequence += style.sequence
1139
1140                out += f"{sequence}{_apply_macros(token.name)}\033[0m"
1141                continue
1142
1143            out += " " if in_sequence else "["
1144            in_sequence = True
1145
1146            if token.ttype is TokenType.UNSETTER:
1147                if token.name == "/":
1148                    applied_macros = []
1149
1150                name = token.name[1:]
1151
1152                if name in self.macros:
1153                    _pop_macro(name)
1154
1155                current_styles.append(token)
1156
1157                out += self.parse(
1158                    ("" if (name in self.tags) or (name in self.user_tags) else "")
1159                    + f"[{styles[TokenType.UNSETTER]}]/{name}"
1160                )
1161                continue
1162
1163            if token.ttype is TokenType.MACRO:
1164                assert isinstance(token.data, tuple)
1165
1166                name = token.name
1167                if "(" in name:
1168                    name = name[: token.name.index("(")]
1169
1170                applied_macros.append((name, token.data))
1171
1172                try:
1173                    out += token.data[0](*token.data[1], token.name)
1174                    continue
1175
1176                except TypeError:  # Not enough arguments
1177                    pass
1178
1179            if token.sequence is not None:
1180                current_styles.append(token)
1181
1182            style_markup = styles.get(token.ttype) or token.name
1183            out += self.parse(f"[{style_markup}]{token.name}")
1184
1185        return _finish(out, in_sequence)

Returns a prettified (syntax-highlighted) markup str.

Args
  • text: The markup-text to prettify.
Returns

Prettified markup. This markup, excluding its styles, remains valid markup.

def get_styled_plains(self, text: str) -> Iterator[pytermgui.parser.StyledText]:
1187    def get_styled_plains(self, text: str) -> Iterator[StyledText]:
1188        """Gets all plain tokens within text, with their respective styles applied.
1189
1190        Args:
1191            text: The string containing ANSI sequences to find plains in.
1192
1193        Returns:
1194            An iterator of `StyledText` objects, each yielded when a new plain token is found,
1195            containing the styles that are relevant and active on the given plain.
1196        """
1197
1198        def _apply_styles(styles: list[Token], text: str) -> str:
1199            """Applies given styles to text."""
1200
1201            for token in styles:
1202                if token.ttype is TokenType.MACRO:
1203                    assert isinstance(token.data, tuple)
1204                    text = token.data[0](*token.data[1], text)
1205                    continue
1206
1207                if token.sequence is None:
1208                    continue
1209
1210                text = token.sequence + text
1211
1212            return text
1213
1214        def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
1215            """Removes an unsetter from the list, returns the new list."""
1216
1217            if token.name == "/":
1218                return list(filter(lambda tkn: tkn.ttype is TokenType.POSITION, styles))
1219
1220            target_name = token.name[1:]
1221            for style in styles:
1222                # bold & dim unsetters represent the same character, so we have
1223                # to treat them the same way.
1224                style_name = style.name
1225
1226                if style.name == "dim":
1227                    style_name = "bold"
1228
1229                if style_name == target_name:
1230                    styles.remove(style)
1231
1232                elif (
1233                    style_name.startswith(target_name)
1234                    and style.ttype is TokenType.MACRO
1235                ):
1236                    styles.remove(style)
1237
1238                elif style.ttype is TokenType.COLOR:
1239                    assert isinstance(style.data, Color)
1240                    if target_name == "fg" and not style.data.background:
1241                        styles.remove(style)
1242
1243                    elif target_name == "bg" and style.data.background:
1244                        styles.remove(style)
1245
1246            return styles
1247
1248        def _pop_position(styles: list[Token]) -> list[Token]:
1249            for token in styles.copy():
1250                if token.ttype is TokenType.POSITION:
1251                    styles.remove(token)
1252
1253            return styles
1254
1255        styles: list[Token] = []
1256        for token in self.tokenize_ansi(text):
1257            if token.ttype is TokenType.COLOR:
1258                for i, style in enumerate(reversed(styles)):
1259                    if style.ttype is TokenType.COLOR:
1260                        assert isinstance(style.data, Color)
1261                        assert isinstance(token.data, Color)
1262
1263                        if style.data.background != token.data.background:
1264                            continue
1265
1266                        styles[len(styles) - i - 1] = token
1267                        break
1268                else:
1269                    styles.append(token)
1270
1271                continue
1272
1273            if token.ttype is TokenType.LINK:
1274                styles.append(token)
1275                yield StyledText(_apply_styles(styles, token.name))
1276
1277            if token.ttype is TokenType.PLAIN:
1278                assert isinstance(token.data, str)
1279                yield StyledText(_apply_styles(styles, token.data))
1280                styles = _pop_position(styles)
1281                continue
1282
1283            if token.ttype is TokenType.UNSETTER:
1284                styles = _pop_unsetter(token, styles)
1285                continue
1286
1287            styles.append(token)

Gets all plain tokens within text, with their respective styles applied.

Args
  • text: The string containing ANSI sequences to find plains in.
Returns

An iterator of StyledText objects, each yielded when a new plain token is found, containing the styles that are relevant and active on the given plain.
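Ignoring the unsetter, color and macro bookkeeping, the segmentation idea can be sketched for bare SGR sequences: a full reset (`0`) clears the active style list, everything else accumulates, and each plain segment is yielded with the styles active at that point. `styled_plains_sketch` is a hypothetical name:

```python
import re

SGR_RE = re.compile(r"\x1b\[([\d;]+)m")

def styled_plains_sketch(ansi: str):
    """Yield each plain segment with the styles active at that point."""
    active: list[str] = []
    cursor = 0
    for match in SGR_RE.finditer(ansi):
        if match.start() > cursor:
            yield "".join(active) + ansi[cursor:match.start()]
        if match.group(1) == "0":  # a full reset clears every active style
            active = []
        else:
            active.append(match.group(0))
        cursor = match.end()
    if cursor < len(ansi):
        yield "".join(active) + ansi[cursor:]

parts = list(styled_plains_sketch("\x1b[1mbold \x1b[3mboth\x1b[0m plain"))
assert parts == ["\x1b[1mbold ", "\x1b[1m\x1b[3mboth", " plain"]
```

The real method is finer-grained: it replaces superseded colors of the same group instead of stacking them, honors individual unsetters, and applies macros to each plain before yielding.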