pytermgui.parser
================
This module provides `TIM`, PyTermGUI's Terminal Inline Markup language. It is a simple,
performant and easy-to-read way to style, colorize & modify text.
Basic rundown
-------------
TIM is included with the purpose of making styling easier to read and manage.
Its syntax is based on square brackets, within which tags are strictly separated by one space character. Tags can stand for colors (xterm-256, RGB or HEX, both background & foreground), styles, unsetters and macros.
The 16 simple colors of the terminal exist as named tags that refer to their numerical value.
Here is a simple example of the syntax, using the `pytermgui.pretty` submodule to
syntax-highlight it inside the REPL:
```python3
>>> from pytermgui import pretty
>>> '[141 @61 bold] Hello [!upper inverse] There '
```
General syntax
--------------
Background colors are always denoted by a leading `@` character in front of the color
tag. Styles are just the name of the style, and macros have an exclamation mark in front
of them. Additionally, unsetters use a leading slash (`/`) for their syntax. Color
tokens have special unsetters: they use `/fg` to cancel foreground colors, and `/bg` to
do so with backgrounds.
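As an illustration of these rules, a single tag could be mapped to its raw ANSI SGR escape sequence roughly as below. This is a hedged sketch, not PyTermGUI's implementation: `tag_to_sgr` and its tiny tag tables are hypothetical, and real parsing also handles RGB/HEX colors, macros and user-defined tags.

```python
def tag_to_sgr(tag: str) -> str:
    """Maps one TIM-style tag to a raw SGR escape sequence (sketch only)."""

    styles = {"bold": "1", "italic": "3", "inverse": "7"}
    unsetters = {"/fg": "39", "/bg": "49"}

    if tag in styles:  # style tags are just the style's name
        return f"\x1b[{styles[tag]}m"

    if tag in unsetters:  # unsetters start with a slash
        return f"\x1b[{unsetters[tag]}m"

    if tag.startswith("@"):  # leading @ -> xterm-256 background color
        return f"\x1b[48;5;{tag[1:]}m"

    return f"\x1b[38;5;{tag}m"  # bare number -> xterm-256 foreground color
```

For example, `tag_to_sgr("@61")` yields the 256-color background sequence `"\x1b[48;5;61m"`, while `tag_to_sgr("bold")` yields `"\x1b[1m"`.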
### Macros:
Macros are any type of callable that takes at least `*args`; the plain text enclosed by the tag group within which the given macro resides is passed as the final positional argument. Additionally, macros can be given any number of positional arguments from within markup, using the syntax:
```
[!macro(arg1:arg2:arg3)]Text that the macro applies to.[/!macro]plain text, no macro
```
This syntax gets parsed as follows:

```python3
macro("arg1", "arg2", "arg3", "Text that the macro applies to.")
```
Here, `macro` is whatever callable the name `macro` was previously defined as.
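The tag-to-call translation can be sketched in plain Python. Everything here is a hypothetical stand-in rather than PyTermGUI internals: the regular expression, the `run_macro` helper and its sample macro table are assumptions made for illustration.

```python
import re

# Splits a tag like "!name(arg1:arg2)" into a name and colon-separated args.
MACRO_RE = re.compile(r"!(\w+)(?:\(([^)]*)\))?$")


def run_macro(tag: str, enclosed: str) -> str:
    """Resolves a macro tag and calls it on the enclosed text (sketch only)."""

    name, raw_args = MACRO_RE.match(tag).groups()
    args = raw_args.split(":") if raw_args else []

    # Hypothetical macro table standing in for user-defined macros.
    macros = {
        "upper": lambda text: text.upper(),
        "repeat": lambda count, text: text * int(count),
    }

    # Markup-supplied arguments come first; the enclosed plain text is
    # passed as the final positional argument.
    return macros[name](*args, enclosed)
```

Under these assumptions, `run_macro("!repeat(3)", "ab")` returns `"ababab"`, and `run_macro("!upper", "hello")` returns `"HELLO"`.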
### Colors:
Colors can be of three general types: xterm-256, RGB and HEX.
`xterm-256` stands for one of the 256 xterm colors. You can use `ptg -c` to see all
of the available colors. Its syntax is just the 0-based index of the color, such as `[141]`.
`RGB` colors are self-explanatory. Their syntax follows the format
`RED;GREEN;BLUE`, such as `[111;222;233]`.
`HEX` colors are basically just RGB with extra steps. Their syntax is `#RRGGBB`, such as
`[#FA72BF]`. Such a code gets converted to a tuple of RGB values under the hood, so
from then on RGB and HEX colors are treated the same, and emit the same tokens.
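That conversion step can be reproduced in a couple of lines. The `hex_to_rgb` helper below is a generic sketch of the idea, not the library's converter:

```python
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Converts "#RRGGBB" to an (R, G, B) tuple of 0-255 integers."""

    code = code.lstrip("#")
    # Take the hex pairs at offsets 0, 2 and 4 and parse each as base 16.
    return tuple(int(code[i : i + 2], 16) for i in (0, 2, 4))
```

For instance, `hex_to_rgb("#FA72BF")` gives `(250, 114, 191)`; after this point a HEX color is indistinguishable from an RGB one.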
As mentioned above, all colors can be made to act on the background instead by
prepending the color tag with `@`, such as `@141`, `@111;222;233` or `@#FA72BF`. To
clear these effects, use `/fg` for foreground and `/bg` for background colors.
`MarkupLanguage` and instancing
-------------------------------
All markup behaviour is handled by an instance of the `MarkupLanguage` class. This is done
partly for organizational reasons, but also to allow a sort of sandboxing of custom
definitions and settings.
PyTermGUI provides the `tim` name as the global markup language instance. For historical
reasons, the same instance is also available as `markup`. This instance should be used almost
all of the time; custom instances should only be created when security-sensitive macro
definitions are needed, as `markup` is used by every widget, including user-input ones
such as `InputField`.
For the rest of this page, `MarkupLanguage` will refer to whichever instance you are using.
TL;DR: Always use `tim`, unless a security concern blocks you from doing so.
Caching
-------
By default, all markup parse results are cached and returned when the same input is
given. To disable this behaviour, set the `should_cache` field of your markup instance
(usually `markup`) to `False`.
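The caching behaviour amounts to memoization keyed on the input string. A minimal sketch of the idea follows; the `Parser` class is hypothetical, and `str.upper` stands in for real markup parsing:

```python
class Parser:
    def __init__(self) -> None:
        self.should_cache = True
        self._cache: dict[str, str] = {}
        self.parse_count = 0  # counts actual (non-cached) parses

    def parse(self, text: str) -> str:
        # Return the stored result when caching is on and the input is known.
        if self.should_cache and text in self._cache:
            return self._cache[text]

        self.parse_count += 1
        result = text.upper()  # stand-in for real markup parsing
        self._cache[text] = result
        return result
```

Parsing the same string twice performs the work only once; flipping `should_cache` to `False` forces a full re-parse on every call.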
Customization
-------------
There are a couple of ways to customize how markup is parsed. Custom tags can be created
by calling `MarkupLanguage.alias`. For defining custom macros, you can use
`MarkupLanguage.define`. For more information, see each method's documentation.
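Aliasing can be thought of as resolving a new name, at definition time, to the codes its definition expands to. The following is an assumed, simplified model of that idea; `STYLE_CODES`, the `alias` helper and the tag dictionary are illustrative, not the real API:

```python
# A few builtin style tags and their SGR codes, mirroring the tags above.
STYLE_CODES = {"bold": "1", "italic": "3", "underline": "4"}


def alias(name: str, value: str, user_tags: dict[str, str]) -> None:
    """Resolves each tag in the definition and stores the joined codes."""

    # Prefer previously aliased tags, so aliases can build on each other.
    codes = [user_tags.get(tag) or STYLE_CODES[tag] for tag in value.split()]
    user_tags[name] = ";".join(codes)


tags: dict[str, str] = {}
alias("emphasis", "bold italic", tags)
alias("loud", "emphasis underline", tags)
```

In this model `"emphasis"` resolves to `"1;3"`, and `"loud"` builds on it to give `"1;3;4"`, so an aliased tag costs nothing extra at parse time.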
1""" 2This module provides `TIM`, PyTermGUI's Terminal Inline Markup language. It is a simple, 3performant and easy to read way to style, colorize & modify text. 4 5Basic rundown 6------------- 7 8TIM is included with the purpose of making styling easier to read and manage. 9 10Its syntax is based on square brackets, within which tags are strictly separated by one 11space character. Tags can stand for colors (xterm-256, RGB or HEX, both background & 12foreground), styles, unsetters and macros. 13 14The 16 simple colors of the terminal exist as named tags that refer to their numerical 15value. 16 17Here is a simple example of the syntax, using the `pytermgui.pretty` submodule to 18syntax-highlight it inside the REPL: 19 20```python3 21>>> from pytermgui import pretty 22>>> '[141 @61 bold] Hello [!upper inverse] There ' 23``` 24 25<p align=center> 26<img src="https://github.com/bczsalba/pytermgui/blob/master/assets/docs/parser/\ 27simple_example.png?raw=true" width=70%> 28</p> 29 30 31General syntax 32-------------- 33 34Background colors are always denoted by a leading `@` character in front of the color 35tag. Styles are just the name of the style and macros have an exclamation mark in front 36of them. Additionally, unsetters use a leading slash (`/`) for their syntax. Color 37tokens have special unsetters: they use `/fg` to cancel foreground colors, and `/bg` to 38do so with backgrounds. 39 40### Macros: 41 42Macros are any type of callable that take at least *args; this is the value of the plain 43text enclosed by the tag group within which the given macro resides. 
Additionally, 44macros can be given any number of positional arguments from within markup, using the 45syntax: 46 47``` 48[!macro(arg1:arg2:arg3)]Text that the macro applies to.[/!macro]plain text, no macro 49``` 50 51This syntax gets parsed as follows: 52 53```python3 54macro("Text that the macro applies to.", "arg1", "arg2", "arg3") 55``` 56 57`macro` here is whatever the name `macro` was defined as prior. 58 59### Colors: 60 61Colors can be of three general types: xterm-256, RGB and HEX. 62 63`xterm-256` stands for one of the 256 xterm colors. You can use `ptg -c` to see the all 64of the available colors. Its syntax is just the 0-base index of the color, like `[141]` 65 66`RGB` colors are pretty self explanatory. Their syntax is follows the format 67`RED;GREEN;BLUE`, such as `[111;222;333]`. 68 69`HEX` colors are basically just RGB with extra steps. Their syntax is `#RRGGBB`, such as 70`[#FA72BF]`. This code then gets converted to a tuple of RGB colors under the hood, so 71from then on RGB and HEX colors are treated the same, and emit the same tokens. 72 73As mentioned above, all colors can be made to act on the background instead by 74prepending the color tag with `@`, such as `@141`, `@111;222;333` or `@#FA72BF`. To 75clear these effects, use `/fg` for foreground and `/bg` for background colors. 76 77`MarkupLanguage` and instancing 78------------------------------- 79 80All markup behaviour is done by an instance of the `MarkupLanguage` class. This is done 81partially for organization reasons, but also to allow a sort of sandboxing of custom 82definitions and settings. 83 84PyTermGUI provides the `tim` name as the global markup language instance. For historical 85reasons, the same instance is available as `markup`. This should be used pretty much all 86of the time, and custom instances should only ever come about when some 87security-sensitive macro definitions are needed, as `markup` is used by every widget, 88including user-input ones such as `InputField`. 
89 90For the rest of this page, `MarkupLanguage` will refer to whichever instance you are 91using. 92 93TL;DR : Use `tim` always, unless a security concern blocks you from doing so. 94 95Caching 96------- 97 98By default, all markup parse results are cached and returned when the same input is 99given. To disable this behaviour, set your markup instance (usually `markup`)'s 100`should_cache` field to False. 101 102Customization 103------------- 104 105There are a couple of ways to customize how markup is parsed. Custom tags can be created 106by calling `MarkupLanguage.alias`. For defining custom macros, you can use 107`MarkupLanguage.define`. For more information, see each method's documentation. 108""" 109# pylint: disable=too-many-lines 110 111from __future__ import annotations 112 113from argparse import ArgumentParser 114from contextlib import suppress 115from dataclasses import dataclass 116from enum import Enum 117from enum import auto as _auto 118from functools import cached_property 119from random import shuffle 120from typing import Callable, Iterator, List, Tuple 121 122from .colors import Color, StandardColor, str_to_color 123from .exceptions import AnsiSyntaxError, ColorSyntaxError, MarkupSyntaxError 124from .regex import RE_ANSI, RE_LINK, RE_MACRO, RE_MARKUP 125from .terminal import get_terminal 126 127__all__ = [ 128 "StyledText", 129 "MacroCallable", 130 "MacroCall", 131 "MarkupLanguage", 132 "markup", 133 "tim", 134] 135 136MacroCallable = Callable[..., str] 137MacroCall = Tuple[MacroCallable, List[str]] 138 139STYLE_MAP = { 140 "bold": "1", 141 "dim": "2", 142 "italic": "3", 143 "underline": "4", 144 "blink": "5", 145 "blink2": "6", 146 "inverse": "7", 147 "invisible": "8", 148 "strikethrough": "9", 149 "overline": "53", 150} 151 152UNSETTER_MAP: dict[str, str | None] = { 153 "/": "0", 154 "/bold": "22", 155 "/dim": "22", 156 "/italic": "23", 157 "/underline": "24", 158 "/blink": "25", 159 "/blink2": "26", 160 "/inverse": "27", 161 "/invisible": 
"28", 162 "/strikethrough": "29", 163 "/fg": "39", 164 "/bg": "49", 165 "/overline": "54", 166} 167 168 169def macro_align(width: str, alignment: str, content: str) -> str: 170 """Aligns given text using fstrings. 171 172 Args: 173 width: The width to align to. 174 alignment: One of "left", "center", "right". 175 content: The content to align; implicit argument. 176 """ 177 178 aligner = "<" if alignment == "left" else (">" if alignment == "right" else "^") 179 return f"{content:{aligner}{width}}" 180 181 182def macro_expand(lang: MarkupLanguage, tag: str) -> str: 183 """Expands a tag alias.""" 184 185 if not tag in lang.user_tags: 186 return tag 187 188 return lang.get_markup(f"\x1b[{lang.user_tags[tag]}m ")[:-1] 189 190 191def macro_strip_fg(item: str) -> str: 192 """Strips foreground color from item""" 193 194 return markup.parse(f"[/fg]{item}") 195 196 197def macro_strip_bg(item: str) -> str: 198 """Strips foreground color from item""" 199 200 return markup.parse(f"[/bg]{item}") 201 202 203def macro_shuffle(item: str) -> str: 204 """Shuffles a string using shuffle.shuffle on its list cast.""" 205 206 shuffled = list(item) 207 shuffle(shuffled) 208 209 return "".join(shuffled) 210 211 212def macro_link(*args) -> str: 213 """Creates a clickable hyperlink. 214 215 Note: 216 Since this is a pretty new feature for terminals, its support is limited. 
217 """ 218 219 *uri_parts, label = args 220 uri = ":".join(uri_parts) 221 222 return f"\x1b]8;;{uri}\x1b\\{label}\x1b]8;;\x1b\\" 223 224 225def _apply_colors(colors: list[str] | list[int], item: str) -> str: 226 """Applies the given list of colors to the item, spread out evenly.""" 227 228 blocksize = max(round(len(item) / len(colors)), 1) 229 230 out = "" 231 current_block = 0 232 for i, char in enumerate(item): 233 if i % blocksize == 0 and current_block < len(colors): 234 out += f"[{colors[current_block]}]" 235 current_block += 1 236 237 out += char 238 239 return markup.parse(out) 240 241 242def macro_rainbow(item: str) -> str: 243 """Creates rainbow-colored text.""" 244 245 colors = ["red", "208", "yellow", "green", "brightblue", "blue", "93"] 246 247 return _apply_colors(colors, item) 248 249 250def macro_gradient(base_str: str, item: str) -> str: 251 """Creates an xterm-256 gradient from a base color. 252 253 This exploits the way the colors are arranged in the xterm color table; every 254 36th color is the next item of a single gradient. 255 256 The start of this given gradient is calculated by decreasing the given base by 36 on 257 every iteration as long as the point is a valid gradient start. 258 259 After that, the 6 colors of this gradient are calculated and applied. 260 """ 261 262 if not base_str.isdigit(): 263 raise ValueError(f"Gradient base has to be a digit, got {base_str}.") 264 265 base = int(base_str) 266 if base < 16 or base > 231: 267 raise ValueError("Gradient base must be between 16 and 232") 268 269 while base > 52: 270 base -= 36 271 272 colors = [] 273 for i in range(6): 274 colors.append(base + 36 * i) 275 276 return _apply_colors(colors, item) 277 278 279class TokenType(Enum): 280 """An Enum to store various token types.""" 281 282 LINK = _auto() 283 """A terminal hyperlink.""" 284 285 PLAIN = _auto() 286 """Plain text, nothing interesting.""" 287 288 COLOR = _auto() 289 """A color token. 
Has a `pytermgui.colors.Color` instance as its data.""" 290 291 STYLE = _auto() 292 """A builtin terminal style, such as `bold` or `italic`.""" 293 294 MACRO = _auto() 295 """A PTG markup macro. The macro itself is stored inside `self.data`.""" 296 297 ESCAPED = _auto() 298 """An escaped token.""" 299 300 UNSETTER = _auto() 301 """A token that unsets some other attribute.""" 302 303 POSITION = _auto() 304 """A token representing a positioning string. `self.data` follows the format `x,y`.""" 305 306 307@dataclass 308class Token: 309 """A class holding information on a singular markup or ANSI style unit. 310 311 Attributes: 312 """ 313 314 ttype: TokenType 315 """The type of this token.""" 316 317 data: str | MacroCall | Color | None 318 """The data contained within this token. This changes based on the `ttype` attr.""" 319 320 name: str = "<unnamed-token>" 321 """An optional display name of the token. Defaults to `data` when not given.""" 322 323 def __post_init__(self) -> None: 324 """Sets `name` to `data` if not provided.""" 325 326 if self.name == "<unnamed-token>": 327 if isinstance(self.data, str): 328 self.name = self.data 329 330 elif isinstance(self.data, Color): 331 self.name = self.data.name 332 333 else: 334 raise TypeError 335 336 # Create LINK from a plain token 337 if self.ttype is TokenType.PLAIN: 338 assert isinstance(self.data, str) 339 340 link_match = RE_LINK.match(self.data) 341 342 if link_match is not None: 343 self.data, self.name = link_match.groups() 344 self.ttype = TokenType.LINK 345 346 if self.ttype is TokenType.ESCAPED: 347 assert isinstance(self.data, str) 348 349 self.name = self.data[1:] 350 351 def __eq__(self, other: object) -> bool: 352 """Checks equality with `other`.""" 353 354 if other is None: 355 return False 356 357 if not isinstance(other, type(self)): 358 return False 359 360 return other.data == self.data and other.ttype is self.ttype 361 362 @cached_property 363 def sequence(self) -> str | None: 364 """Returns the ANSI 
sequence this token represents.""" 365 366 if self.data is None: 367 return None 368 369 if self.ttype in [TokenType.PLAIN, TokenType.MACRO, TokenType.ESCAPED]: 370 return None 371 372 if self.ttype is TokenType.LINK: 373 return macro_link(self.data, self.name) 374 375 if self.ttype is TokenType.POSITION: 376 assert isinstance(self.data, str) 377 position = self.data.split(",") 378 return f"\x1b[{position[1]};{position[0]}H" 379 380 # Colors and styles 381 data = self.data 382 383 if self.ttype in [TokenType.STYLE, TokenType.UNSETTER]: 384 return f"\033[{data}m" 385 386 assert isinstance(data, Color) 387 return data.sequence 388 389 390class StyledText(str): 391 """A styled text object. 392 393 The purpose of this class is to implement some things regular `str` 394 breaks at when encountering ANSI sequences. 395 396 Instances of this class are usually spat out by `MarkupLanguage.parse`, 397 but may be manually constructed if the need arises. Everything works even 398 if there is no ANSI tomfoolery going on. 399 """ 400 401 value: str 402 """The underlying, ANSI-inclusive string value.""" 403 404 _plain: str | None = None 405 _tokens: list[Token] | None = None 406 407 def __new__(cls, value: str = ""): 408 """Creates a StyledText, gets markup tags.""" 409 410 obj = super().__new__(cls, value) 411 obj.value = value 412 413 return obj 414 415 def _generate_tokens(self) -> None: 416 """Generates self._tokens & self._plain.""" 417 418 self._tokens = list(tim.tokenize_ansi(self.value)) 419 420 self._plain = "" 421 for token in self._tokens: 422 if token.ttype is not TokenType.PLAIN: 423 continue 424 425 assert isinstance(token.data, str) 426 self._plain += token.data 427 428 @property 429 def tokens(self) -> list[Token]: 430 """Returns all markup tokens of this object. 431 432 Generated on-demand, at the first call to this or the self.plain 433 property. 
434 """ 435 436 if self._tokens is not None: 437 return self._tokens 438 439 self._generate_tokens() 440 assert self._tokens is not None 441 return self._tokens 442 443 @property 444 def plain(self) -> str: 445 """Returns the value of this object, with no ANSI sequences. 446 447 Generated on-demand, at the first call to this or the self.tokens 448 property. 449 """ 450 451 if self._plain is not None: 452 return self._plain 453 454 self._generate_tokens() 455 assert self._plain is not None 456 return self._plain 457 458 def plain_index(self, index: int | None) -> int | None: 459 """Finds given index inside plain text.""" 460 461 if index is None: 462 return None 463 464 styled_chars = 0 465 plain_chars = 0 466 negative_index = False 467 468 tokens = self.tokens.copy() 469 if index < 0: 470 tokens.reverse() 471 index = abs(index) 472 negative_index = True 473 474 for token in tokens: 475 if token.data is None: 476 continue 477 478 if token.ttype is not TokenType.PLAIN: 479 assert token.sequence is not None 480 styled_chars += len(token.sequence) 481 continue 482 483 assert isinstance(token.data, str) 484 for _ in range(len(token.data)): 485 if plain_chars == index: 486 if negative_index: 487 return -1 * (plain_chars + styled_chars) 488 489 return styled_chars + plain_chars 490 491 plain_chars += 1 492 493 return None 494 495 def __len__(self) -> int: 496 """Gets "real" length of object.""" 497 498 return len(self.plain) 499 500 def __getitem__(self, subscript: int | slice) -> str: 501 """Gets an item, adjusted for non-plain text. 502 503 Args: 504 subscript: The integer or slice to find. 505 506 Returns: 507 The elements described by the subscript. 508 509 Raises: 510 IndexError: The given index is out of range. 
511 """ 512 513 if isinstance(subscript, int): 514 plain_index = self.plain_index(subscript) 515 if plain_index is None: 516 raise IndexError("StyledText index out of range") 517 518 return self.value[plain_index] 519 520 return self.value[ 521 slice( 522 self.plain_index(subscript.start), 523 self.plain_index(subscript.stop), 524 subscript.step, 525 ) 526 ] 527 528 529class MarkupLanguage: 530 """A class representing an instance of a Markup Language. 531 532 This class is used for all markup/ANSI parsing, tokenizing and usage. 533 534 ```python3 535 from pytermgui import tim 536 537 tim.alias("my-tag", "@152 72 bold") 538 tim.print("This is [my-tag]my-tag[/]!") 539 ``` 540 541 <p style="text-align: center"> 542 <img src="https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/\ 543docs/parser/markup_language.png" 544 style="width: 80%"> 545 </p> 546 """ 547 548 raise_unknown_markup: bool = False 549 """Raise `pytermgui.exceptions.MarkupSyntaxError` when encountering unknown markup tags.""" 550 551 def __init__(self, default_macros: bool = True) -> None: 552 """Initializes a MarkupLanguage. 553 554 Args: 555 default_macros: If not set, the builtin macros are not defined. 
556 """ 557 558 self.tags: dict[str, str] = STYLE_MAP.copy() 559 self._cache: dict[str, StyledText] = {} 560 self.macros: dict[str, MacroCallable] = {} 561 self.user_tags: dict[str, str] = {} 562 self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy() 563 564 self.should_cache: bool = True 565 566 if default_macros: 567 self.define("!link", macro_link) 568 self.define("!align", macro_align) 569 self.define("!markup", self.get_markup) 570 self.define("!shuffle", macro_shuffle) 571 self.define("!strip_bg", macro_strip_bg) 572 self.define("!strip_fg", macro_strip_fg) 573 self.define("!rainbow", macro_rainbow) 574 self.define("!gradient", macro_gradient) 575 self.define("!upper", lambda item: str(item.upper())) 576 self.define("!lower", lambda item: str(item.lower())) 577 self.define("!title", lambda item: str(item.title())) 578 self.define("!capitalize", lambda item: str(item.capitalize())) 579 self.define("!expand", lambda tag: macro_expand(self, tag)) 580 self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args)) 581 582 self.alias("code", "dim @black") 583 self.alias("code.str", "142") 584 self.alias("code.multiline_str", "code.str") 585 self.alias("code.none", "167") 586 self.alias("code.global", "214") 587 self.alias("code.number", "175") 588 self.alias("code.keyword", "203") 589 self.alias("code.identifier", "109") 590 self.alias("code.name", "code.global") 591 self.alias("code.comment", "240 italic") 592 self.alias("code.builtin", "code.global") 593 self.alias("code.file", "code.identifier") 594 self.alias("code.symbol", "code.identifier") 595 596 def _get_color_token(self, tag: str) -> Token | None: 597 """Tries to get a color token from the given tag. 598 599 Args: 600 tag: The tag to parse. 601 602 Returns: 603 A color token if the given tag could be parsed into one, else None. 
604 """ 605 606 try: 607 color = str_to_color(tag, use_cache=self.should_cache) 608 609 except ColorSyntaxError: 610 return None 611 612 return Token(name=color.value, ttype=TokenType.COLOR, data=color) 613 614 def _get_style_token(self, tag: str) -> Token | None: 615 """Tries to get a style (including unsetter) token from tags, user tags and unsetters. 616 617 Args: 618 tag: The tag to parse. 619 620 Returns: 621 A `Token` if one could be created, None otherwise. 622 """ 623 624 if tag in self.unsetters: 625 return Token(name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag]) 626 627 if tag in self.user_tags: 628 return Token(name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag]) 629 630 if tag in self.tags: 631 return Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag]) 632 633 return None 634 635 def print(self, *args, **kwargs) -> None: 636 """Parse all arguments and pass them through to print, along with kwargs.""" 637 638 parsed = [] 639 for arg in args: 640 parsed.append(self.parse(str(arg))) 641 642 get_terminal().print(*parsed, **kwargs) 643 644 def tokenize_markup(self, markup_text: str) -> Iterator[Token]: 645 """Converts the given markup string into an iterator of `Token`. 646 647 Args: 648 markup_text: The text to look at. 649 650 Returns: 651 An iterator of tokens. The reason this is an iterator is to possibly save 652 on memory. 
653 """ 654 655 end = 0 656 start = 0 657 cursor = 0 658 for match in RE_MARKUP.finditer(markup_text): 659 full, escapes, tag_text = match.groups() 660 start, end = match.span() 661 662 # Add plain text between last and current match 663 if start > cursor: 664 yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start]) 665 666 if not escapes == "" and len(escapes) % 2 == 1: 667 cursor = end 668 yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :]) 669 continue 670 671 for tag in tag_text.split(): 672 token = self._get_style_token(tag) 673 if token is not None: 674 yield token 675 continue 676 677 # Try to find a color token 678 token = self._get_color_token(tag) 679 if token is not None: 680 yield token 681 continue 682 683 macro_match = RE_MACRO.match(tag) 684 if macro_match is not None: 685 name, args = macro_match.groups() 686 macro_args = () if args is None else args.split(":") 687 688 if not name in self.macros: 689 raise MarkupSyntaxError( 690 tag=tag, 691 cause="is not a defined macro", 692 context=markup_text, 693 ) 694 695 yield Token( 696 name=tag, 697 ttype=TokenType.MACRO, 698 data=(self.macros[name], macro_args), 699 ) 700 continue 701 702 if self.raise_unknown_markup: 703 raise MarkupSyntaxError( 704 tag=tag, cause="not defined", context=markup_text 705 ) 706 707 cursor = end 708 709 # Add remaining text as plain 710 if len(markup_text) > cursor: 711 yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:]) 712 713 def tokenize_ansi(self, ansi: str) -> Iterator[Token]: 714 """Converts the given ANSI string into an iterator of `Token`. 715 716 Args: 717 ansi: The text to look at. 718 719 Returns: 720 An iterator of tokens. The reason this is an iterator is to possibly save 721 on memory. 
722 """ 723 724 def _is_in_tags(code: str, tags: dict[str, str]) -> str | None: 725 """Determines whether a code is in the given dict of tags.""" 726 727 for name, current in tags.items(): 728 if current == code: 729 return name 730 731 return None 732 733 def _generate_color( 734 parts: list[str], code: str 735 ) -> tuple[str, TokenType, Color]: 736 """Generates a color token.""" 737 738 data: Color 739 if len(parts) == 1: 740 data = StandardColor.from_ansi(code) 741 name = data.name 742 ttype = TokenType.COLOR 743 744 else: 745 data = str_to_color(code) 746 name = data.name 747 ttype = TokenType.COLOR 748 749 return name, ttype, data 750 751 end = 0 752 start = 0 753 cursor = 0 754 755 # StyledText messes with indexing, so we need to cast it 756 # back to str. 757 if isinstance(ansi, StyledText): 758 ansi = str(ansi) 759 760 for match in RE_ANSI.finditer(ansi): 761 code = match.groups()[0] 762 start, end = match.span() 763 764 if code is None: 765 continue 766 767 parts = code.split(";") 768 769 if start > cursor: 770 plain = ansi[cursor:start] 771 772 yield Token(name=plain, ttype=TokenType.PLAIN, data=plain) 773 774 name: str | None = code 775 ttype = None 776 data: str | Color = parts[0] 777 778 # Styles & Unsetters 779 if len(parts) == 1: 780 # Covariancy is not an issue here, even though mypy seems to think so. 
781 name = _is_in_tags(parts[0], self.unsetters) # type: ignore 782 if name is not None: 783 ttype = TokenType.UNSETTER 784 785 else: 786 name = _is_in_tags(parts[0], self.tags) 787 if name is not None: 788 ttype = TokenType.STYLE 789 790 # Colors 791 if ttype is None: 792 with suppress(ColorSyntaxError): 793 name, ttype, data = _generate_color(parts, code) 794 795 if name is None or ttype is None or data is None: 796 if len(parts) != 2: 797 raise AnsiSyntaxError( 798 tag=parts[0], cause="not recognized", context=ansi 799 ) 800 801 name = "position" 802 ttype = TokenType.POSITION 803 data = ",".join(reversed(parts)) 804 805 yield Token(name=name, ttype=ttype, data=data) 806 cursor = end 807 808 if cursor < len(ansi): 809 plain = ansi[cursor:] 810 811 yield Token(ttype=TokenType.PLAIN, data=plain) 812 813 def define(self, name: str, method: MacroCallable) -> None: 814 """Defines a Macro tag that executes the given method. 815 816 Args: 817 name: The name the given method will be reachable by within markup. 818 The given value gets "!" prepended if it isn't present already. 819 method: The method this macro will execute. 820 """ 821 822 if not name.startswith("!"): 823 name = f"!{name}" 824 825 self.macros[name] = method 826 self.unsetters[f"/{name}"] = None 827 828 def alias(self, name: str, value: str) -> None: 829 """Aliases the given name to a value, and generates an unsetter for it. 830 831 Note that it is not possible to alias macros. 832 833 Args: 834 name: The name of the new tag. 835 value: The value the new tag will stand for. 
836 """ 837 838 def _get_unsetter(token: Token) -> str | None: 839 """Get unsetter for a token""" 840 841 if token.ttype is TokenType.PLAIN: 842 return None 843 844 if token.ttype is TokenType.UNSETTER: 845 return self.unsetters[token.name] 846 847 if token.ttype is TokenType.COLOR: 848 assert isinstance(token.data, Color) 849 850 if token.data.background: 851 return self.unsetters["/bg"] 852 853 return self.unsetters["/fg"] 854 855 name = f"/{token.name}" 856 if not name in self.unsetters: 857 raise KeyError(f"Could not find unsetter for token {token}.") 858 859 return self.unsetters[name] 860 861 if name.startswith("!"): 862 raise ValueError('Only macro tags can always start with "!".') 863 864 setter = "" 865 unsetter = "" 866 867 # Try to link to existing tag 868 if value in self.user_tags: 869 self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"] 870 self.user_tags[name] = self.user_tags[value] 871 return 872 873 for token in self.tokenize_markup(f"[{value}]"): 874 if token.ttype is TokenType.PLAIN: 875 continue 876 877 assert token.sequence is not None 878 setter += token.sequence 879 880 t_unsetter = _get_unsetter(token) 881 unsetter += f"\x1b[{t_unsetter}m" 882 883 self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m") 884 self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m") 885 886 marked: list[str] = [] 887 for item in self._cache: 888 if name in item: 889 marked.append(item) 890 891 for item in marked: 892 del self._cache[item] 893 894 # TODO: I cannot cut down the one-too-many branch that this has at the moment. 895 # We could look into it in the future, however. 896 def parse( # pylint: disable=too-many-branches 897 self, markup_text: str 898 ) -> StyledText: 899 """Parses the given markup. 900 901 Args: 902 markup_text: The markup to parse. 903 904 Returns: 905 A `StyledText` instance of the result of parsing the input. 
```python
            This custom `str` class is used to allow accessing the plain value of
            the output, as well as to cleanly index within it. It is analogous
            to builtin `str`, only adds extra things on top.
        """

        applied_macros: list[tuple[str, MacroCall]] = []
        previous_token: Token | None = None
        previous_sequence = ""
        sequence = ""
        out = ""

        def _apply_macros(text: str) -> str:
            """Apply current macros to text"""

            for _, (method, args) in applied_macros:
                text = method(*args, text)

            return text

        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
                return False

            return (
                type(previous) is type(new)
                and previous.data.background == new.data.background
            )

        if (
            self.should_cache
            and markup_text in self._cache
            and len(RE_MACRO.findall(markup_text)) == 0
        ):
            return self._cache[markup_text]

        token: Token
        for token in self.tokenize_markup(markup_text):
            if sequence != "" and previous_token == token:
                continue

            # Optimize out previously added color tokens, as only the most
            # recent would be visible anyways.
            if (
                token.sequence is not None
                and previous_token is not None
                and _is_same_colorgroup(previous_token, token)
            ):
                sequence = token.sequence
                continue

            if token.ttype == TokenType.UNSETTER and token.data == "0":
                out += "\033[0m"
                sequence = ""
                applied_macros = []
                continue

            previous_token = token

            # Macro unsetters are stored with None as their data
            if token.data is None and token.ttype is TokenType.UNSETTER:
                for item, data in applied_macros.copy():
                    macro_match = RE_MACRO.match(item)
                    assert macro_match is not None

                    macro_name = macro_match.groups()[0]

                    if f"/{macro_name}" == token.name:
                        applied_macros.remove((item, data))

                continue

            if token.ttype is TokenType.MACRO:
                assert isinstance(token.data, tuple)

                applied_macros.append((token.name, token.data))
                continue

            if token.sequence is None:
                applied = sequence

                if not out.endswith("\x1b[0m"):
                    for item in previous_sequence.split("\x1b"):
                        if item == "" or item[1:-1] in self.unsetters.values():
                            continue

                        item = f"\x1b{item}"
                        applied = applied.replace(item, "")

                out += applied + _apply_macros(token.name)
                previous_sequence = sequence
                sequence = ""
                continue

            sequence += token.sequence

        if sequence + previous_sequence != "":
            out += "\x1b[0m"

        out = StyledText(out)
        self._cache[markup_text] = out
        return out

    def get_markup(self, ansi: str) -> str:
        """Generates markup from ANSI text.

        Args:
            ansi: The text to get markup from.

        Returns:
            A markup string that can be parsed to get (visually) the same
            result. Note that this conversion is lossy in a way: there are some
            details (like macros) that cannot be preserved in an ANSI->Markup->ANSI
            conversion.
        """

        current_tags: list[str] = []
        out = ""
        for token in self.tokenize_ansi(ansi):
            if token.ttype is TokenType.PLAIN:
                if len(current_tags) != 0:
                    out += "[" + " ".join(current_tags) + "]"

                assert isinstance(token.data, str)
                out += token.data
                current_tags = []
                continue

            if token.ttype is TokenType.ESCAPED:
                assert isinstance(token.data, str)

                current_tags.append(token.data)
                continue

            current_tags.append(token.name)

        return out

    def prettify_ansi(self, text: str) -> str:
        """Returns a prettified (syntax-highlighted) ANSI str.

        This is useful to quickly "inspect" a given ANSI string. However,
        for most real uses `MarkupLanguage.prettify_markup` would be
        preferable, given an argument of `MarkupLanguage.get_markup(text)`,
        as it is much more verbose.

        Args:
            text: The ANSI-text to prettify.

        Returns:
            The prettified ANSI text. This text's styles remain valid,
            so copy-pasting the argument into a command (like printf)
            that can show styled text will work the same way.
        """

        out = ""
        sequences = ""
        for token in self.tokenize_ansi(text):
            if token.ttype is TokenType.PLAIN:
                assert isinstance(token.data, str)
                out += token.data
                continue

            assert token.sequence is not None
            out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
            sequences += token.sequence
            out += sequences

        return out

    def prettify_markup(self, text: str) -> str:
        """Returns a prettified (syntax-highlighted) markup str.

        Args:
            text: The markup-text to prettify.

        Returns:
            Prettified markup. This markup, excluding its styles,
            remains valid markup.
        """

        def _apply_macros(text: str) -> str:
            """Apply current macros to text"""

            for _, (method, args) in applied_macros:
                text = method(*args, text)

            return text

        def _pop_macro(name: str) -> None:
            """Pops a macro from applied_macros."""

            for i, (macro_name, _) in enumerate(applied_macros):
                if macro_name == name:
                    applied_macros.pop(i)
                    break

        def _finish(out: str, in_sequence: bool) -> str:
            """Adds ending cap to the given string."""

            if in_sequence:
                if not out.endswith("\x1b[0m"):
                    out += "\x1b[0m"

                return out + "]"

            return out + "[/]"

        styles: dict[TokenType, str] = {
            TokenType.MACRO: "210",
            TokenType.ESCAPED: "210 bold",
            TokenType.UNSETTER: "strikethrough",
        }

        applied_macros: list[tuple[str, MacroCall]] = []

        out = ""
        in_sequence = False
        current_styles: list[Token] = []

        for token in self.tokenize_markup(text):
            if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
                if in_sequence:
                    out += "]"

                in_sequence = False

                sequence = ""
                for style in current_styles:
                    if style.sequence is None:
                        continue

                    sequence += style.sequence

                out += f"{sequence}{_apply_macros(token.name)}\033[0m"
                continue

            out += " " if in_sequence else "["
            in_sequence = True

            if token.ttype is TokenType.UNSETTER:
                if token.name == "/":
                    applied_macros = []

                name = token.name[1:]

                if name in self.macros:
                    _pop_macro(name)

                current_styles.append(token)

                out += self.parse(
                    ("" if (name in self.tags) or (name in self.user_tags) else "")
                    + f"[{styles[TokenType.UNSETTER]}]/{name}"
                )
                continue

            if token.ttype is TokenType.MACRO:
                assert isinstance(token.data, tuple)

                name = token.name
                if "(" in name:
                    name = name[: token.name.index("(")]

                applied_macros.append((name, token.data))

                try:
                    out += token.data[0](*token.data[1], token.name)
                    continue

                except TypeError:  # Not enough arguments
                    pass

            if token.sequence is not None:
                current_styles.append(token)

            style_markup = styles.get(token.ttype) or token.name
            out += self.parse(f"[{style_markup}]{token.name}")

        return _finish(out, in_sequence)

    def get_styled_plains(self, text: str) -> Iterator[StyledText]:
        """Gets all plain tokens within text, with their respective styles applied.

        Args:
            text: The ANSI-sequence containing string to find plains from.

        Returns:
            An iterator of `StyledText` objects, each yielded when a new plain token is found,
            containing the styles that are relevant and active on the given plain.
        """

        def _apply_styles(styles: list[Token], text: str) -> str:
            """Applies given styles to text."""

            for token in styles:
                if token.ttype is TokenType.MACRO:
                    assert isinstance(token.data, tuple)
                    text = token.data[0](*token.data[1], text)
                    continue

                if token.sequence is None:
                    continue

                text = token.sequence + text

            return text

        def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
            """Removes an unsetter from the list, returns the new list."""

            if token.name == "/":
                return list(filter(lambda tkn: tkn.ttype is TokenType.POSITION, styles))

            target_name = token.name[1:]
            for style in styles:
                # bold & dim unsetters represent the same character, so we have
                # to treat them the same way.
                style_name = style.name

                if style.name == "dim":
                    style_name = "bold"

                if style_name == target_name:
                    styles.remove(style)

                elif (
                    style_name.startswith(target_name)
                    and style.ttype is TokenType.MACRO
                ):
                    styles.remove(style)

                elif style.ttype is TokenType.COLOR:
                    assert isinstance(style.data, Color)
                    if target_name == "fg" and not style.data.background:
                        styles.remove(style)

                    elif target_name == "bg" and style.data.background:
                        styles.remove(style)

            return styles

        def _pop_position(styles: list[Token]) -> list[Token]:
            for token in styles.copy():
                if token.ttype is TokenType.POSITION:
                    styles.remove(token)

            return styles

        styles: list[Token] = []
        for token in self.tokenize_ansi(text):
            if token.ttype is TokenType.COLOR:
                for i, style in enumerate(reversed(styles)):
                    if style.ttype is TokenType.COLOR:
                        assert isinstance(style.data, Color)
                        assert isinstance(token.data, Color)

                        if style.data.background != token.data.background:
                            continue

                        styles[len(styles) - i - 1] = token
                        break
                else:
                    styles.append(token)

                continue

            if token.ttype is TokenType.LINK:
                styles.append(token)
                yield StyledText(_apply_styles(styles, token.name))

            if token.ttype is TokenType.PLAIN:
                assert isinstance(token.data, str)
                yield StyledText(_apply_styles(styles, token.data))
                styles = _pop_position(styles)
                continue

            if token.ttype is TokenType.UNSETTER:
                styles = _pop_unsetter(token, styles)
                continue

            styles.append(token)


def main() -> None:
    """Main method"""

    parser = ArgumentParser()

    markup_group = parser.add_argument_group("Markup->ANSI")
    markup_group.add_argument(
        "-p", "--parse", metavar=("TXT"), help="parse a markup text"
    )
    markup_group.add_argument(
        "-e", "--escape", help="escape parsed markup", action="store_true"
    )
    # markup_group.add_argument(
    #     "-o",
    #     "--optimize",
    #     help="set optimization level for markup parsing",
    #     action="count",
    #     default=0,
    # )

    markup_group.add_argument("--alias", action="append", help="alias src=dst")

    ansi_group = parser.add_argument_group("ANSI->Markup")
    ansi_group.add_argument(
        "-m", "--markup", metavar=("TXT"), help="get markup from ANSI text"
    )
    ansi_group.add_argument(
        "-s",
        "--show-inverse",
        action="store_true",
        help="show result of parsing result markup",
    )

    args = parser.parse_args()

    lang = MarkupLanguage()

    if args.markup:
        markup_text = lang.get_markup(args.markup)
        print(markup_text, end="")

        if args.show_inverse:
            print("->", lang.parse(markup_text))
        else:
            print()

    if args.parse:
        if args.alias:
            for alias in args.alias:
                src, dest = alias.split("=")
                lang.alias(src, dest)

        parsed = lang.parse(args.parse)

        if args.escape:
            print(ascii(parsed))
        else:
            print(parsed)

        return


tim = markup = MarkupLanguage()
"""The default TIM instances."""

if __name__ == "__main__":
    main()
```
```python
class StyledText(str):
    """A styled text object.

    The purpose of this class is to implement some things regular `str`
    breaks at when encountering ANSI sequences.

    Instances of this class are usually spat out by `MarkupLanguage.parse`,
    but may be manually constructed if the need arises. Everything works even
    if there is no ANSI tomfoolery going on.
    """

    value: str
    """The underlying, ANSI-inclusive string value."""

    _plain: str | None = None
    _tokens: list[Token] | None = None

    def __new__(cls, value: str = ""):
        """Creates a StyledText, gets markup tags."""

        obj = super().__new__(cls, value)
        obj.value = value

        return obj

    def _generate_tokens(self) -> None:
        """Generates self._tokens & self._plain."""

        self._tokens = list(tim.tokenize_ansi(self.value))

        self._plain = ""
        for token in self._tokens:
            if token.ttype is not TokenType.PLAIN:
                continue

            assert isinstance(token.data, str)
            self._plain += token.data

    @property
    def tokens(self) -> list[Token]:
        """Returns all markup tokens of this object.

        Generated on-demand, at the first call to this or the self.plain
        property.
        """

        if self._tokens is not None:
            return self._tokens

        self._generate_tokens()
        assert self._tokens is not None
        return self._tokens

    @property
    def plain(self) -> str:
        """Returns the value of this object, with no ANSI sequences.

        Generated on-demand, at the first call to this or the self.tokens
        property.
        """

        if self._plain is not None:
            return self._plain

        self._generate_tokens()
        assert self._plain is not None
        return self._plain

    def plain_index(self, index: int | None) -> int | None:
        """Finds given index inside plain text."""

        if index is None:
            return None

        styled_chars = 0
        plain_chars = 0
        negative_index = False

        tokens = self.tokens.copy()
        if index < 0:
            tokens.reverse()
            index = abs(index)
            negative_index = True

        for token in tokens:
            if token.data is None:
                continue

            if token.ttype is not TokenType.PLAIN:
                assert token.sequence is not None
                styled_chars += len(token.sequence)
                continue

            assert isinstance(token.data, str)
            for _ in range(len(token.data)):
                if plain_chars == index:
                    if negative_index:
                        return -1 * (plain_chars + styled_chars)

                    return styled_chars + plain_chars

                plain_chars += 1

        return None

    def __len__(self) -> int:
        """Gets "real" length of object."""

        return len(self.plain)

    def __getitem__(self, subscript: int | slice) -> str:
        """Gets an item, adjusted for non-plain text.

        Args:
            subscript: The integer or slice to find.

        Returns:
            The elements described by the subscript.

        Raises:
            IndexError: The given index is out of range.
        """

        if isinstance(subscript, int):
            plain_index = self.plain_index(subscript)
            if plain_index is None:
                raise IndexError("StyledText index out of range")

            return self.value[plain_index]

        return self.value[
            slice(
                self.plain_index(subscript.start),
                self.plain_index(subscript.stop),
                subscript.step,
            )
        ]
```
A styled text object.
The purpose of this class is to implement some things regular `str` breaks at when encountering ANSI sequences.

Instances of this class are usually spat out by `MarkupLanguage.parse`, but may be manually constructed if the need arises. Everything works even if there is no ANSI tomfoolery going on.
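To illustrate what regular `str` gets wrong, here is a minimal sketch of the idea behind `StyledText.plain`: strip the styling sequences so that length and indexing reflect only visible characters. The `SGR` pattern and `plain` helper below are simplified stand-ins for illustration, not the library's actual `RE_ANSI`:

```python
import re

# A deliberately simplified pattern matching SGR ("select graphic
# rendition") escape sequences such as "\x1b[1m" or "\x1b[38;5;141m".
SGR = re.compile(r"\x1b\[[0-9;]*m")


def plain(value: str) -> str:
    """Returns value with all SGR sequences removed."""

    return SGR.sub("", value)


styled = "\x1b[1;38;5;141mHello\x1b[0m"

print(len(styled))         # raw str also counts the escape bytes: 22
print(len(plain(styled)))  # only the 5 visible characters
```

This is exactly the mismatch `StyledText.__len__` and `StyledText.__getitem__` paper over: the raw value is 22 characters long, but only "Hello" is ever visible.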
`StyledText.__new__` creates a StyledText from the given value; its tokens are generated lazily.
`StyledText.tokens` returns all markup tokens of this object.
Generated on-demand, at the first call to this or the self.plain property.
`StyledText.plain` returns the value of this object, with no ANSI sequences.
Generated on-demand, at the first call to this or the self.tokens property.
`StyledText.plain_index` finds a given index inside the plain text, returning the corresponding index in the raw, ANSI-inclusive value.
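The index translation can be sketched as follows. `plain_index` below is a hypothetical re-implementation of the same walk over a raw ANSI string (the real method walks tokens instead): skip over escape sequences, count only visible characters, and return the raw position once the requested plain index is reached.

```python
import re

# Simplified SGR pattern; pytermgui's RE_ANSI covers more escape types.
SGR = re.compile(r"\x1b\[[0-9;]*m")


def plain_index(value: str, index: int):
    """Returns the raw index in value matching index in the plain text,
    or None when the index is out of the plain range."""

    cursor = 0
    plain_chars = 0

    # Visible characters live in the gaps between escape sequences.
    for match in SGR.finditer(value):
        for raw in range(cursor, match.start()):
            if plain_chars == index:
                return raw
            plain_chars += 1
        cursor = match.end()

    # Trailing visible characters after the last sequence.
    for raw in range(cursor, len(value)):
        if plain_chars == index:
            return raw
        plain_chars += 1

    return None


styled = "\x1b[1mab\x1b[0mcd"      # plain text: "abcd"
assert styled[plain_index(styled, 2)] == "c"
```

Out-of-range indices fall through to `None`, which is what lets `StyledText.__getitem__` raise a proper `IndexError`.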
````python
class MarkupLanguage:
    """A class representing an instance of a Markup Language.

    This class is used for all markup/ANSI parsing, tokenizing and usage.

    ```python3
    from pytermgui import tim

    tim.alias("my-tag", "@152 72 bold")
    tim.print("This is [my-tag]my-tag[/]!")
    ```

    <p style="text-align: center">
    <img src="https://raw.githubusercontent.com/bczsalba/pytermgui/master/assets/\
docs/parser/markup_language.png"
        style="width: 80%">
    </p>
    """

    raise_unknown_markup: bool = False
    """Raise `pytermgui.exceptions.MarkupSyntaxError` when encountering unknown markup tags."""

    def __init__(self, default_macros: bool = True) -> None:
        """Initializes a MarkupLanguage.

        Args:
            default_macros: If not set, the builtin macros are not defined.
        """

        self.tags: dict[str, str] = STYLE_MAP.copy()
        self._cache: dict[str, StyledText] = {}
        self.macros: dict[str, MacroCallable] = {}
        self.user_tags: dict[str, str] = {}
        self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()

        self.should_cache: bool = True

        if default_macros:
            self.define("!link", macro_link)
            self.define("!align", macro_align)
            self.define("!markup", self.get_markup)
            self.define("!shuffle", macro_shuffle)
            self.define("!strip_bg", macro_strip_bg)
            self.define("!strip_fg", macro_strip_fg)
            self.define("!rainbow", macro_rainbow)
            self.define("!gradient", macro_gradient)
            self.define("!upper", lambda item: str(item.upper()))
            self.define("!lower", lambda item: str(item.lower()))
            self.define("!title", lambda item: str(item.title()))
            self.define("!capitalize", lambda item: str(item.capitalize()))
            self.define("!expand", lambda tag: macro_expand(self, tag))
            self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))

            self.alias("code", "dim @black")
            self.alias("code.str", "142")
            self.alias("code.multiline_str", "code.str")
            self.alias("code.none", "167")
            self.alias("code.global", "214")
            self.alias("code.number", "175")
            self.alias("code.keyword", "203")
            self.alias("code.identifier", "109")
            self.alias("code.name", "code.global")
            self.alias("code.comment", "240 italic")
            self.alias("code.builtin", "code.global")
            self.alias("code.file", "code.identifier")
            self.alias("code.symbol", "code.identifier")

    def _get_color_token(self, tag: str) -> Token | None:
        """Tries to get a color token from the given tag.

        Args:
            tag: The tag to parse.

        Returns:
            A color token if the given tag could be parsed into one, else None.
        """

        try:
            color = str_to_color(tag, use_cache=self.should_cache)

        except ColorSyntaxError:
            return None

        return Token(name=color.value, ttype=TokenType.COLOR, data=color)

    def _get_style_token(self, tag: str) -> Token | None:
        """Tries to get a style (including unsetter) token from tags, user tags and unsetters.

        Args:
            tag: The tag to parse.

        Returns:
            A `Token` if one could be created, None otherwise.
        """

        if tag in self.unsetters:
            return Token(name=tag, ttype=TokenType.UNSETTER, data=self.unsetters[tag])

        if tag in self.user_tags:
            return Token(name=tag, ttype=TokenType.STYLE, data=self.user_tags[tag])

        if tag in self.tags:
            return Token(name=tag, ttype=TokenType.STYLE, data=self.tags[tag])

        return None

    def print(self, *args, **kwargs) -> None:
        """Parse all arguments and pass them through to print, along with kwargs."""

        parsed = []
        for arg in args:
            parsed.append(self.parse(str(arg)))

        get_terminal().print(*parsed, **kwargs)

    def tokenize_markup(self, markup_text: str) -> Iterator[Token]:
        """Converts the given markup string into an iterator of `Token`.

        Args:
            markup_text: The text to look at.

        Returns:
            An iterator of tokens. The reason this is an iterator is to possibly save
            on memory.
        """

        end = 0
        start = 0
        cursor = 0
        for match in RE_MARKUP.finditer(markup_text):
            full, escapes, tag_text = match.groups()
            start, end = match.span()

            # Add plain text between last and current match
            if start > cursor:
                yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start])

            if not escapes == "" and len(escapes) % 2 == 1:
                cursor = end
                yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :])
                continue

            for tag in tag_text.split():
                token = self._get_style_token(tag)
                if token is not None:
                    yield token
                    continue

                # Try to find a color token
                token = self._get_color_token(tag)
                if token is not None:
                    yield token
                    continue

                macro_match = RE_MACRO.match(tag)
                if macro_match is not None:
                    name, args = macro_match.groups()
                    macro_args = () if args is None else args.split(":")

                    if not name in self.macros:
                        raise MarkupSyntaxError(
                            tag=tag,
                            cause="is not a defined macro",
                            context=markup_text,
                        )

                    yield Token(
                        name=tag,
                        ttype=TokenType.MACRO,
                        data=(self.macros[name], macro_args),
                    )
                    continue

                if self.raise_unknown_markup:
                    raise MarkupSyntaxError(
                        tag=tag, cause="not defined", context=markup_text
                    )

            cursor = end

        # Add remaining text as plain
        if len(markup_text) > cursor:
            yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])

    def tokenize_ansi(self, ansi: str) -> Iterator[Token]:
        """Converts the given ANSI string into an iterator of `Token`.

        Args:
            ansi: The text to look at.

        Returns:
            An iterator of tokens. The reason this is an iterator is to possibly save
            on memory.
        """

        def _is_in_tags(code: str, tags: dict[str, str]) -> str | None:
            """Determines whether a code is in the given dict of tags."""

            for name, current in tags.items():
                if current == code:
                    return name

            return None

        def _generate_color(
            parts: list[str], code: str
        ) -> tuple[str, TokenType, Color]:
            """Generates a color token."""

            data: Color
            if len(parts) == 1:
                data = StandardColor.from_ansi(code)
                name = data.name
                ttype = TokenType.COLOR

            else:
                data = str_to_color(code)
                name = data.name
                ttype = TokenType.COLOR

            return name, ttype, data

        end = 0
        start = 0
        cursor = 0

        # StyledText messes with indexing, so we need to cast it
        # back to str.
        if isinstance(ansi, StyledText):
            ansi = str(ansi)

        for match in RE_ANSI.finditer(ansi):
            code = match.groups()[0]
            start, end = match.span()

            if code is None:
                continue

            parts = code.split(";")

            if start > cursor:
                plain = ansi[cursor:start]

                yield Token(name=plain, ttype=TokenType.PLAIN, data=plain)

            name: str | None = code
            ttype = None
            data: str | Color = parts[0]

            # Styles & Unsetters
            if len(parts) == 1:
                # Covariancy is not an issue here, even though mypy seems to think so.
                name = _is_in_tags(parts[0], self.unsetters)  # type: ignore
                if name is not None:
                    ttype = TokenType.UNSETTER

                else:
                    name = _is_in_tags(parts[0], self.tags)
                    if name is not None:
                        ttype = TokenType.STYLE

            # Colors
            if ttype is None:
                with suppress(ColorSyntaxError):
                    name, ttype, data = _generate_color(parts, code)

            if name is None or ttype is None or data is None:
                if len(parts) != 2:
                    raise AnsiSyntaxError(
                        tag=parts[0], cause="not recognized", context=ansi
                    )

                name = "position"
                ttype = TokenType.POSITION
                data = ",".join(reversed(parts))

            yield Token(name=name, ttype=ttype, data=data)
            cursor = end

        if cursor < len(ansi):
            plain = ansi[cursor:]

            yield Token(ttype=TokenType.PLAIN, data=plain)

    def define(self, name: str, method: MacroCallable) -> None:
        """Defines a Macro tag that executes the given method.

        Args:
            name: The name the given method will be reachable by within markup.
                The given value gets "!" prepended if it isn't present already.
            method: The method this macro will execute.
        """

        if not name.startswith("!"):
            name = f"!{name}"

        self.macros[name] = method
        self.unsetters[f"/{name}"] = None

    def alias(self, name: str, value: str) -> None:
        """Aliases the given name to a value, and generates an unsetter for it.

        Note that it is not possible to alias macros.

        Args:
            name: The name of the new tag.
            value: The value the new tag will stand for.
        """

        def _get_unsetter(token: Token) -> str | None:
            """Get unsetter for a token"""

            if token.ttype is TokenType.PLAIN:
                return None

            if token.ttype is TokenType.UNSETTER:
                return self.unsetters[token.name]

            if token.ttype is TokenType.COLOR:
                assert isinstance(token.data, Color)

                if token.data.background:
                    return self.unsetters["/bg"]

                return self.unsetters["/fg"]

            name = f"/{token.name}"
            if not name in self.unsetters:
                raise KeyError(f"Could not find unsetter for token {token}.")

            return self.unsetters[name]

        if name.startswith("!"):
            raise ValueError('Only macro tags can always start with "!".')

        setter = ""
        unsetter = ""

        # Try to link to existing tag
        if value in self.user_tags:
            self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"]
            self.user_tags[name] = self.user_tags[value]
            return

        for token in self.tokenize_markup(f"[{value}]"):
            if token.ttype is TokenType.PLAIN:
                continue

            assert token.sequence is not None
            setter += token.sequence

            t_unsetter = _get_unsetter(token)
            unsetter += f"\x1b[{t_unsetter}m"

        self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m")
        self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m")

        marked: list[str] = []
        for item in self._cache:
            if name in item:
                marked.append(item)

        for item in marked:
            del self._cache[item]

    # TODO: I cannot cut down the one-too-many branch that this has at the moment.
    # We could look into it in the future, however.
    def parse(  # pylint: disable=too-many-branches
        self, markup_text: str
    ) -> StyledText:
        """Parses the given markup.

        Args:
            markup_text: The markup to parse.

        Returns:
            A `StyledText` instance of the result of parsing the input. This
            custom `str` class is used to allow accessing the plain value of
            the output, as well as to cleanly index within it. It is analogous
            to builtin `str`, only adds extra things on top.
        """

        applied_macros: list[tuple[str, MacroCall]] = []
        previous_token: Token | None = None
        previous_sequence = ""
        sequence = ""
        out = ""

        def _apply_macros(text: str) -> str:
            """Apply current macros to text"""

            for _, (method, args) in applied_macros:
                text = method(*args, text)

            return text

        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
                return False

            return (
                type(previous) is type(new)
                and previous.data.background == new.data.background
            )

        if (
            self.should_cache
            and markup_text in self._cache
            and len(RE_MACRO.findall(markup_text)) == 0
        ):
            return self._cache[markup_text]

        token: Token
        for token in self.tokenize_markup(markup_text):
            if sequence != "" and previous_token == token:
                continue

            # Optimize out previously added color tokens, as only the most
            # recent would be visible anyways.
            if (
                token.sequence is not None
                and previous_token is not None
                and _is_same_colorgroup(previous_token, token)
            ):
                sequence = token.sequence
                continue

            if token.ttype == TokenType.UNSETTER and token.data == "0":
                out += "\033[0m"
                sequence = ""
                applied_macros = []
                continue

            previous_token = token

            # Macro unsetters are stored with None as their data
            if token.data is None and token.ttype is TokenType.UNSETTER:
                for item, data in applied_macros.copy():
                    macro_match = RE_MACRO.match(item)
                    assert macro_match is not None

                    macro_name = macro_match.groups()[0]

                    if f"/{macro_name}" == token.name:
                        applied_macros.remove((item, data))

                continue

            if token.ttype is TokenType.MACRO:
                assert isinstance(token.data, tuple)

                applied_macros.append((token.name, token.data))
                continue

            if token.sequence is None:
                applied = sequence

                if not out.endswith("\x1b[0m"):
                    for item in previous_sequence.split("\x1b"):
                        if item == "" or item[1:-1] in self.unsetters.values():
                            continue

                        item = f"\x1b{item}"
                        applied = applied.replace(item, "")

                out += applied + _apply_macros(token.name)
                previous_sequence = sequence
                sequence = ""
                continue

            sequence += token.sequence

        if sequence + previous_sequence != "":
            out += "\x1b[0m"

        out = StyledText(out)
        self._cache[markup_text] = out
        return out
````
A class representing an instance of a Markup Language.
This class is used for all markup/ANSI parsing, tokenizing and usage.
from pytermgui import tim
tim.alias("my-tag", "@152 72 bold")
tim.print("This is [my-tag]my-tag[/]!")
    def __init__(self, default_macros: bool = True) -> None:
        """Initializes a MarkupLanguage.

        Args:
            default_macros: If not set, the builtin macros are not defined.
        """

        self.tags: dict[str, str] = STYLE_MAP.copy()
        self._cache: dict[str, StyledText] = {}
        self.macros: dict[str, MacroCallable] = {}
        self.user_tags: dict[str, str] = {}
        self.unsetters: dict[str, str | None] = UNSETTER_MAP.copy()

        self.should_cache: bool = True

        if default_macros:
            self.define("!link", macro_link)
            self.define("!align", macro_align)
            self.define("!markup", self.get_markup)
            self.define("!shuffle", macro_shuffle)
            self.define("!strip_bg", macro_strip_bg)
            self.define("!strip_fg", macro_strip_fg)
            self.define("!rainbow", macro_rainbow)
            self.define("!gradient", macro_gradient)
            self.define("!upper", lambda item: str(item.upper()))
            self.define("!lower", lambda item: str(item.lower()))
            self.define("!title", lambda item: str(item.title()))
            self.define("!capitalize", lambda item: str(item.capitalize()))
            self.define("!expand", lambda tag: macro_expand(self, tag))
            self.define("!debug", lambda *args: ",".join(ascii(arg) for arg in args))

        self.alias("code", "dim @black")
        self.alias("code.str", "142")
        self.alias("code.multiline_str", "code.str")
        self.alias("code.none", "167")
        self.alias("code.global", "214")
        self.alias("code.number", "175")
        self.alias("code.keyword", "203")
        self.alias("code.identifier", "109")
        self.alias("code.name", "code.global")
        self.alias("code.comment", "240 italic")
        self.alias("code.builtin", "code.global")
        self.alias("code.file", "code.identifier")
        self.alias("code.symbol", "code.identifier")
Initializes a MarkupLanguage.
Args
- default_macros: If not set, the builtin macros are not defined.
Raise pytermgui.exceptions.MarkupSyntaxError when encountering unknown markup tags.
    def print(self, *args, **kwargs) -> None:
        """Parse all arguments and pass them through to print, along with kwargs."""

        parsed = []
        for arg in args:
            parsed.append(self.parse(str(arg)))

        get_terminal().print(*parsed, **kwargs)
Parse all arguments and pass them through to print, along with kwargs.
    def tokenize_markup(self, markup_text: str) -> Iterator[Token]:
        """Converts the given markup string into an iterator of `Token`.

        Args:
            markup_text: The text to look at.

        Returns:
            An iterator of tokens. The reason this is an iterator is to possibly save
            on memory.
        """

        end = 0
        start = 0
        cursor = 0
        for match in RE_MARKUP.finditer(markup_text):
            full, escapes, tag_text = match.groups()
            start, end = match.span()

            # Add plain text between last and current match
            if start > cursor:
                yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:start])

            if not escapes == "" and len(escapes) % 2 == 1:
                cursor = end
                yield Token(ttype=TokenType.ESCAPED, data=full[len(escapes) - 1 :])
                continue

            for tag in tag_text.split():
                token = self._get_style_token(tag)
                if token is not None:
                    yield token
                    continue

                # Try to find a color token
                token = self._get_color_token(tag)
                if token is not None:
                    yield token
                    continue

                macro_match = RE_MACRO.match(tag)
                if macro_match is not None:
                    name, args = macro_match.groups()
                    macro_args = () if args is None else args.split(":")

                    if not name in self.macros:
                        raise MarkupSyntaxError(
                            tag=tag,
                            cause="is not a defined macro",
                            context=markup_text,
                        )

                    yield Token(
                        name=tag,
                        ttype=TokenType.MACRO,
                        data=(self.macros[name], macro_args),
                    )
                    continue

                if self.raise_unknown_markup:
                    raise MarkupSyntaxError(
                        tag=tag, cause="not defined", context=markup_text
                    )

            cursor = end

        # Add remaining text as plain
        if len(markup_text) > cursor:
            yield Token(ttype=TokenType.PLAIN, data=markup_text[cursor:])
Converts the given markup string into an iterator of Token.
Args
- markup_text: The text to look at.
Returns
An iterator of tokens. The reason this is an iterator is to possibly save on memory.
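The bracket-group scanning above can be sketched without the full `Token` machinery. The regex and the tuple-based token shape below are simplifications for illustration only, not PyTermGUI's real `RE_MARKUP` or `Token`:

```python
import re
from typing import Iterator

# Simplified stand-in for RE_MARKUP: any unescaped [...] group.
TAG_GROUP = re.compile(r"\[([^\[\]]+)\]")

def tokenize(markup: str) -> Iterator[tuple[str, str]]:
    """Yields ("PLAIN", text) and ("TAG", tag) pairs."""

    cursor = 0
    for match in TAG_GROUP.finditer(markup):
        start, end = match.span()

        # Plain text between the last and current match.
        if start > cursor:
            yield ("PLAIN", markup[cursor:start])

        # Tags within a group are strictly separated by one space.
        for tag in match.group(1).split():
            yield ("TAG", tag)

        cursor = end

    # Remaining text is plain.
    if cursor < len(markup):
        yield ("PLAIN", markup[cursor:])
```

The cursor-and-span bookkeeping mirrors the real method; escape handling and tag classification are omitted.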
    def tokenize_ansi(self, ansi: str) -> Iterator[Token]:
        """Converts the given ANSI string into an iterator of `Token`.

        Args:
            ansi: The text to look at.

        Returns:
            An iterator of tokens. The reason this is an iterator is to possibly save
            on memory.
        """

        def _is_in_tags(code: str, tags: dict[str, str]) -> str | None:
            """Determines whether a code is in the given dict of tags."""

            for name, current in tags.items():
                if current == code:
                    return name

            return None

        def _generate_color(
            parts: list[str], code: str
        ) -> tuple[str, TokenType, Color]:
            """Generates a color token."""

            data: Color
            if len(parts) == 1:
                data = StandardColor.from_ansi(code)
                name = data.name
                ttype = TokenType.COLOR

            else:
                data = str_to_color(code)
                name = data.name
                ttype = TokenType.COLOR

            return name, ttype, data

        end = 0
        start = 0
        cursor = 0

        # StyledText messes with indexing, so we need to cast it
        # back to str.
        if isinstance(ansi, StyledText):
            ansi = str(ansi)

        for match in RE_ANSI.finditer(ansi):
            code = match.groups()[0]
            start, end = match.span()

            if code is None:
                continue

            parts = code.split(";")

            if start > cursor:
                plain = ansi[cursor:start]

                yield Token(name=plain, ttype=TokenType.PLAIN, data=plain)

            name: str | None = code
            ttype = None
            data: str | Color = parts[0]

            # Styles & Unsetters
            if len(parts) == 1:
                # Covariancy is not an issue here, even though mypy seems to think so.
                name = _is_in_tags(parts[0], self.unsetters)  # type: ignore
                if name is not None:
                    ttype = TokenType.UNSETTER

                else:
                    name = _is_in_tags(parts[0], self.tags)
                    if name is not None:
                        ttype = TokenType.STYLE

            # Colors
            if ttype is None:
                with suppress(ColorSyntaxError):
                    name, ttype, data = _generate_color(parts, code)

            if name is None or ttype is None or data is None:
                if len(parts) != 2:
                    raise AnsiSyntaxError(
                        tag=parts[0], cause="not recognized", context=ansi
                    )

                name = "position"
                ttype = TokenType.POSITION
                data = ",".join(reversed(parts))

            yield Token(name=name, ttype=ttype, data=data)
            cursor = end

        if cursor < len(ansi):
            plain = ansi[cursor:]

            yield Token(ttype=TokenType.PLAIN, data=plain)
Converts the given ANSI string into an iterator of Token.
Args
- ansi: The text to look at.
Returns
An iterator of tokens. The reason this is an iterator is to possibly save on memory.
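The ANSI side works the same way, except the matched groups are SGR escape sequences whose parameter list is split on ";". A standalone sketch of that scanning loop (the regex here is a simplification, not the real `RE_ANSI`, and real tokens carry names, types and data):

```python
import re

# Simplified stand-in for RE_ANSI: a CSI ... m (SGR) sequence.
SGR = re.compile(r"\x1b\[([\d;]+)m")

def split_ansi(text: str) -> list[tuple[str, str]]:
    """Returns ("CODE", params) / ("PLAIN", text) pairs."""

    out: list[tuple[str, str]] = []
    cursor = 0
    for match in SGR.finditer(text):
        start, end = match.span()

        # Plain text between the last and current sequence.
        if start > cursor:
            out.append(("PLAIN", text[cursor:start]))

        # The parameter list; the real method splits this on ";" to
        # decide between style, unsetter, color and position tokens.
        out.append(("CODE", match.group(1)))
        cursor = end

    if cursor < len(text):
        out.append(("PLAIN", text[cursor:]))

    return out
```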
    def define(self, name: str, method: MacroCallable) -> None:
        """Defines a Macro tag that executes the given method.

        Args:
            name: The name the given method will be reachable by within markup.
                The given value gets "!" prepended if it isn't present already.
            method: The method this macro will execute.
        """

        if not name.startswith("!"):
            name = f"!{name}"

        self.macros[name] = method
        self.unsetters[f"/{name}"] = None
Defines a Macro tag that executes the given method.
Args
- name: The name the given method will be reachable by within markup. The given value gets "!" prepended if it isn't present already.
- method: The method this macro will execute.
    def alias(self, name: str, value: str) -> None:
        """Aliases the given name to a value, and generates an unsetter for it.

        Note that it is not possible to alias macros.

        Args:
            name: The name of the new tag.
            value: The value the new tag will stand for.
        """

        def _get_unsetter(token: Token) -> str | None:
            """Get unsetter for a token"""

            if token.ttype is TokenType.PLAIN:
                return None

            if token.ttype is TokenType.UNSETTER:
                return self.unsetters[token.name]

            if token.ttype is TokenType.COLOR:
                assert isinstance(token.data, Color)

                if token.data.background:
                    return self.unsetters["/bg"]

                return self.unsetters["/fg"]

            name = f"/{token.name}"
            if not name in self.unsetters:
                raise KeyError(f"Could not find unsetter for token {token}.")

            return self.unsetters[name]

        if name.startswith("!"):
            raise ValueError('Only macro tags can always start with "!".')

        setter = ""
        unsetter = ""

        # Try to link to existing tag
        if value in self.user_tags:
            self.unsetters[f"/{name}"] = self.unsetters[f"/{value}"]
            self.user_tags[name] = self.user_tags[value]
            return

        for token in self.tokenize_markup(f"[{value}]"):
            if token.ttype is TokenType.PLAIN:
                continue

            assert token.sequence is not None
            setter += token.sequence

            t_unsetter = _get_unsetter(token)
            unsetter += f"\x1b[{t_unsetter}m"

        self.unsetters[f"/{name}"] = unsetter.lstrip("\x1b[").rstrip("m")
        self.user_tags[name] = setter.lstrip("\x1b[").rstrip("m")

        marked: list[str] = []
        for item in self._cache:
            if name in item:
                marked.append(item)

        for item in marked:
            del self._cache[item]
Aliases the given name to a value, and generates an unsetter for it.
Note that it is not possible to alias macros.
Args
- name: The name of the new tag.
- value: The value the new tag will stand for.
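The "link to an existing tag" fast path can be sketched standalone. The stored sequences below are made-up placeholder values, and the real method additionally tokenizes unknown values into setter/unsetter sequence pairs and invalidates affected cache entries:

```python
# Hypothetical stored SGR parameter strings; placeholders only.
user_tags: dict[str, str] = {"code": "2;48;5;0"}
unsetters: dict[str, str] = {"/code": "22;49"}

def alias(name: str, value: str) -> None:
    # Macro names (leading "!") cannot be aliased.
    if name.startswith("!"):
        raise ValueError("Macros cannot be aliased.")

    # When the value is itself a known tag, link to it and
    # share its unsetter instead of re-tokenizing.
    if value in user_tags:
        user_tags[name] = user_tags[value]
        unsetters[f"/{name}"] = unsetters[f"/{value}"]

# Usage: "my-code" now resolves to the same sequences as "code".
alias("my-code", "code")
```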
    def parse(  # pylint: disable=too-many-branches
        self, markup_text: str
    ) -> StyledText:
        """Parses the given markup.

        Args:
            markup_text: The markup to parse.

        Returns:
            A `StyledText` instance of the result of parsing the input. This
            custom `str` class is used to allow accessing the plain value of
            the output, as well as to cleanly index within it. It is analogous
            to builtin `str`, only adds extra things on top.
        """

        applied_macros: list[tuple[str, MacroCall]] = []
        previous_token: Token | None = None
        previous_sequence = ""
        sequence = ""
        out = ""

        def _apply_macros(text: str) -> str:
            """Apply current macros to text"""

            for _, (method, args) in applied_macros:
                text = method(*args, text)

            return text

        def _is_same_colorgroup(previous: Token, new: Token) -> bool:
            if not isinstance(new.data, Color) or not isinstance(previous.data, Color):
                return False

            return (
                type(previous) is type(new)
                and previous.data.background == new.data.background
            )

        if (
            self.should_cache
            and markup_text in self._cache
            and len(RE_MACRO.findall(markup_text)) == 0
        ):
            return self._cache[markup_text]

        token: Token
        for token in self.tokenize_markup(markup_text):
            if sequence != "" and previous_token == token:
                continue

            # Optimize out previously added color tokens, as only the most
            # recent would be visible anyways.
            if (
                token.sequence is not None
                and previous_token is not None
                and _is_same_colorgroup(previous_token, token)
            ):
                sequence = token.sequence
                continue

            if token.ttype == TokenType.UNSETTER and token.data == "0":
                out += "\033[0m"
                sequence = ""
                applied_macros = []
                continue

            previous_token = token

            # Macro unsetters are stored with None as their data
            if token.data is None and token.ttype is TokenType.UNSETTER:
                for item, data in applied_macros.copy():
                    macro_match = RE_MACRO.match(item)
                    assert macro_match is not None

                    macro_name = macro_match.groups()[0]

                    if f"/{macro_name}" == token.name:
                        applied_macros.remove((item, data))

                continue

            if token.ttype is TokenType.MACRO:
                assert isinstance(token.data, tuple)

                applied_macros.append((token.name, token.data))
                continue

            if token.sequence is None:
                applied = sequence

                if not out.endswith("\x1b[0m"):
                    for item in previous_sequence.split("\x1b"):
                        if item == "" or item[1:-1] in self.unsetters.values():
                            continue

                        item = f"\x1b{item}"
                        applied = applied.replace(item, "")

                out += applied + _apply_macros(token.name)
                previous_sequence = sequence
                sequence = ""
                continue

            sequence += token.sequence

        if sequence + previous_sequence != "":
            out += "\x1b[0m"

        out = StyledText(out)
        self._cache[markup_text] = out
        return out
Parses the given markup.
Args
- markup_text: The markup to parse.
Returns
A StyledText instance of the result of parsing the input. This custom str class is used to allow accessing the plain value of the output, as well as to cleanly index within it. It is analogous to the builtin str, only adding extra features on top.
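The idea behind the custom `str` class can be illustrated with a minimal stand-in; `Styled` and its `plain` property are hypothetical names, and the real `StyledText` additionally supports clean indexing within the plain value:

```python
import re

# Matches SGR-style escape sequences.
ANSI = re.compile(r"\x1b\[[\d;]+m")

class Styled(str):
    """A str subclass that also exposes its sequence-free value.

    A minimal sketch of the StyledText concept: the object behaves
    like the full styled string, while .plain strips the sequences.
    """

    @property
    def plain(self) -> str:
        return ANSI.sub("", self)

# Usage: the styled value and its plain value differ in length.
styled = Styled("\x1b[1mHello\x1b[0m")
```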
    def get_markup(self, ansi: str) -> str:
        """Generates markup from ANSI text.

        Args:
            ansi: The text to get markup from.

        Returns:
            A markup string that can be parsed to get (visually) the same
            result. Note that this conversion is lossy in a way: there are some
            details (like macros) that cannot be preserved in an ANSI->Markup->ANSI
            conversion.
        """

        current_tags: list[str] = []
        out = ""
        for token in self.tokenize_ansi(ansi):
            if token.ttype is TokenType.PLAIN:
                if len(current_tags) != 0:
                    out += "[" + " ".join(current_tags) + "]"

                assert isinstance(token.data, str)
                out += token.data
                current_tags = []
                continue

            if token.ttype is TokenType.ESCAPED:
                assert isinstance(token.data, str)

                current_tags.append(token.data)
                continue

            current_tags.append(token.name)

        return out
Generates markup from ANSI text.
Args
- ansi: The text to get markup from.
Returns
A markup string that can be parsed to get (visually) the same result. Note that this conversion is lossy in a way: there are some details (like macros) that cannot be preserved in an ANSI->Markup->ANSI conversion.
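The tag-collecting loop is easy to mirror over simplified tokens: tag names accumulate until a plain token flushes them as one bracket group. The `NAMES` table below is a tiny made-up subset of what the real language resolves:

```python
# Hypothetical single-code tag names; the real language knows many more.
NAMES = {"1": "bold", "3": "italic", "0": "/"}

def get_markup(tokens: list[tuple[str, str]]) -> str:
    """Rebuilds markup from ("CODE", code) / ("PLAIN", text) pairs."""

    current_tags: list[str] = []
    out = ""
    for kind, value in tokens:
        if kind == "PLAIN":
            # Flush any pending tags as a single bracket group.
            if current_tags:
                out += "[" + " ".join(current_tags) + "]"

            out += value
            current_tags = []
            continue

        current_tags.append(NAMES.get(value, value))

    return out
```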
    def prettify_ansi(self, text: str) -> str:
        """Returns a prettified (syntax-highlighted) ANSI str.

        This is useful to quickly "inspect" a given ANSI string. However,
        for most real uses `MarkupLanguage.prettify_markup` would be
        preferable, given an argument of `MarkupLanguage.get_markup(text)`,
        as it is much more verbose.

        Args:
            text: The ANSI-text to prettify.

        Returns:
            The prettified ANSI text. This text's styles remain valid,
            so copy-pasting the argument into a command (like printf)
            that can show styled text will work the same way.
        """

        out = ""
        sequences = ""
        for token in self.tokenize_ansi(text):
            if token.ttype is TokenType.PLAIN:
                assert isinstance(token.data, str)
                out += token.data
                continue

            assert token.sequence is not None
            out += "\x1b[0m" + token.sequence + token.sequence.replace("\x1b", "\\x1b")
            sequences += token.sequence
            out += sequences

        return out
Returns a prettified (syntax-highlighted) ANSI str.
This is useful to quickly "inspect" a given ANSI string. However, for most real uses MarkupLanguage.prettify_markup would be preferable, given an argument of MarkupLanguage.get_markup(text), as it is much more verbose.
Args
- text: The ANSI-text to prettify.
Returns
The prettified ANSI text. This text's styles remain valid, so copy-pasting the argument into a command (like printf) that can show styled text will work the same way.
    def prettify_markup(self, text: str) -> str:
        """Returns a prettified (syntax-highlighted) markup str.

        Args:
            text: The markup-text to prettify.

        Returns:
            Prettified markup. This markup, excluding its styles,
            remains valid markup.
        """

        def _apply_macros(text: str) -> str:
            """Apply current macros to text"""

            for _, (method, args) in applied_macros:
                text = method(*args, text)

            return text

        def _pop_macro(name: str) -> None:
            """Pops a macro from applied_macros."""

            for i, (macro_name, _) in enumerate(applied_macros):
                if macro_name == name:
                    applied_macros.pop(i)
                    break

        def _finish(out: str, in_sequence: bool) -> str:
            """Adds ending cap to the given string."""

            if in_sequence:
                if not out.endswith("\x1b[0m"):
                    out += "\x1b[0m"

                return out + "]"

            return out + "[/]"

        styles: dict[TokenType, str] = {
            TokenType.MACRO: "210",
            TokenType.ESCAPED: "210 bold",
            TokenType.UNSETTER: "strikethrough",
        }

        applied_macros: list[tuple[str, MacroCall]] = []

        out = ""
        in_sequence = False
        current_styles: list[Token] = []

        for token in self.tokenize_markup(text):
            if token.ttype in [TokenType.PLAIN, TokenType.ESCAPED]:
                if in_sequence:
                    out += "]"

                in_sequence = False

                sequence = ""
                for style in current_styles:
                    if style.sequence is None:
                        continue

                    sequence += style.sequence

                out += f"{sequence}{_apply_macros(token.name)}\033[0m"
                continue

            out += " " if in_sequence else "["
            in_sequence = True

            if token.ttype is TokenType.UNSETTER:
                if token.name == "/":
                    applied_macros = []

                name = token.name[1:]

                if name in self.macros:
                    _pop_macro(name)

                current_styles.append(token)

                out += self.parse(
                    ("" if (name in self.tags) or (name in self.user_tags) else "")
                    + f"[{styles[TokenType.UNSETTER]}]/{name}"
                )
                continue

            if token.ttype is TokenType.MACRO:
                assert isinstance(token.data, tuple)

                name = token.name
                if "(" in name:
                    name = name[: token.name.index("(")]

                applied_macros.append((name, token.data))

                try:
                    out += token.data[0](*token.data[1], token.name)
                    continue

                except TypeError:  # Not enough arguments
                    pass

            if token.sequence is not None:
                current_styles.append(token)

            style_markup = styles.get(token.ttype) or token.name
            out += self.parse(f"[{style_markup}]{token.name}")

        return _finish(out, in_sequence)
Returns a prettified (syntax-highlighted) markup str.
Args
- text: The markup-text to prettify.
Returns
Prettified markup. This markup, excluding its styles, remains valid markup.
    def get_styled_plains(self, text: str) -> Iterator[StyledText]:
        """Gets all plain tokens within text, with their respective styles applied.

        Args:
            text: The ANSI-sequence containing string to find plains from.

        Returns:
            An iterator of `StyledText` objects, each yielded when a new plain token is found,
            containing the styles that are relevant and active on the given plain.
        """

        def _apply_styles(styles: list[Token], text: str) -> str:
            """Applies given styles to text."""

            for token in styles:
                if token.ttype is TokenType.MACRO:
                    assert isinstance(token.data, tuple)
                    text = token.data[0](*token.data[1], text)
                    continue

                if token.sequence is None:
                    continue

                text = token.sequence + text

            return text

        def _pop_unsetter(token: Token, styles: list[Token]) -> list[Token]:
            """Removes an unsetter from the list, returns the new list."""

            if token.name == "/":
                return list(filter(lambda tkn: tkn.ttype is TokenType.POSITION, styles))

            target_name = token.name[1:]
            for style in styles:
                # bold & dim unsetters represent the same character, so we have
                # to treat them the same way.
                style_name = style.name

                if style.name == "dim":
                    style_name = "bold"

                if style_name == target_name:
                    styles.remove(style)

                elif (
                    style_name.startswith(target_name)
                    and style.ttype is TokenType.MACRO
                ):
                    styles.remove(style)

                elif style.ttype is TokenType.COLOR:
                    assert isinstance(style.data, Color)
                    if target_name == "fg" and not style.data.background:
                        styles.remove(style)

                    elif target_name == "bg" and style.data.background:
                        styles.remove(style)

            return styles

        def _pop_position(styles: list[Token]) -> list[Token]:
            for token in styles.copy():
                if token.ttype is TokenType.POSITION:
                    styles.remove(token)

            return styles

        styles: list[Token] = []
        for token in self.tokenize_ansi(text):
            if token.ttype is TokenType.COLOR:
                for i, style in enumerate(reversed(styles)):
                    if style.ttype is TokenType.COLOR:
                        assert isinstance(style.data, Color)
                        assert isinstance(token.data, Color)

                        if style.data.background != token.data.background:
                            continue

                        styles[len(styles) - i - 1] = token
                        break
                else:
                    styles.append(token)

                continue

            if token.ttype is TokenType.LINK:
                styles.append(token)
                yield StyledText(_apply_styles(styles, token.name))

            if token.ttype is TokenType.PLAIN:
                assert isinstance(token.data, str)
                yield StyledText(_apply_styles(styles, token.data))
                styles = _pop_position(styles)
                continue

            if token.ttype is TokenType.UNSETTER:
                styles = _pop_unsetter(token, styles)
                continue

            styles.append(token)
Gets all plain tokens within text, with their respective styles applied.
Args
- text: The ANSI-sequence containing string to find plains from.
Returns
An iterator of StyledText objects, each yielded when a new plain token is found, containing the styles that are relevant and active on the given plain.
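The unsetter-popping rule, including the bold/dim caveat noted in the code comments, can be sketched over plain style names. This is a simplification: the real implementation works on Token objects and also handles macro, color and position unsetters:

```python
def pop_unsetter(target: str, styles: list[str]) -> list[str]:
    """Removes the styles cancelled by an unsetter tag like "/bold".

    The bare "/" unsetter clears everything. "dim" and "bold" share
    the same unsetter code, so they are treated as one name.
    """

    if target == "/":
        return []

    name = target[1:]
    return [
        style
        for style in styles
        if ("bold" if style == "dim" else style) != name
    ]
```

Usage: `pop_unsetter("/bold", ["bold", "dim", "italic"])` drops both "bold" and "dim", leaving only "italic" active.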