midgard.parsers._parser_chain

Basic functionality for parsing datafiles line by line

Description:

This module contains functions and classes for parsing datafiles.

Example:

from midgard import parsers
my_new_parser = parsers.parse_file('my_new_parser', 'file_name.txt', ...)
my_data = my_new_parser.as_dict()

ChainParser

ChainParser(file_path:Union[str, pathlib.Path], encoding:Union[str, NoneType]=None, logger:Union[Callable[[str], NoneType], NoneType]=<built-in function print>) -> None

An abstract base class that has basic methods for parsing a datafile

This class provides functionality for parsing a file made up of chained groups of information. Subclasses should inherit from it and, at a minimum, specify the necessary parameters in setup_parser.

ChainParser.parse_line()

parse_line(self, line:str, cache:Dict[str, Any], parser:midgard.parsers._parser_chain.ParserDef) -> None

Parse line

A line is parsed by splitting it into fields. How the line is split is defined in the parser_def entry of the ParserDef.
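
The splitting step can be sketched as a standalone function. This is a minimal illustration, not the ChainParser implementation: the helper name split_fields is hypothetical, and the sketch assumes that a dictionary of fields maps each field name to a (start, end) pair of column indices, which the description above does not state.

```python
from typing import Any, Dict


def split_fields(line: str, entry: Dict[str, Any]) -> Dict[str, str]:
    """Split one line into named fields according to a parser_def entry.

    Hypothetical helper for illustration only.
    """
    strip_chars = entry.get("strip")      # None means: strip whitespace
    delimiter = entry.get("delimiter")    # None means: split on runs of whitespace
    fields = entry["fields"]

    if isinstance(fields, dict):
        # Assumption: each value is a (start, end) pair of column indices.
        # Each slice is stripped separately so the column offsets stay valid.
        return {
            name: line[start:end].strip(strip_chars)
            for name, (start, end) in fields.items()
        }

    # List form: strip the line, split it, and pair the pieces with the names.
    values = line.strip(strip_chars).split(delimiter)
    return dict(zip(fields, values))


by_list = split_fields("  ABCD,12.5\n", {"fields": ["station", "height"], "delimiter": ","})
by_columns = split_fields("ABCD 12.5", {"fields": {"station": (0, 4), "height": (5, 9)}})
```

Both calls yield the same mapping of field names to strings; converting the strings to numbers is left to the parser function.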

Args:

ChainParser.read_data()

read_data(self) -> None

Read data from a data file and parse the contents

ChainParser.setup_parser()

setup_parser(self) -> Any

Set up information needed for the parser

Return an iterable of ParserDef objects that describe the structure of the file to be parsed.

ParserDef

ParserDef(end_marker:Callable[[str, int, str], bool], label:Callable[[str, int], str], parser_def:Dict[str, Dict[str, Any]], skip_line:Union[Callable[[str], bool], NoneType]=None, end_callback:Union[Callable[[Dict[str, Any]], NoneType], NoneType]=None)

A convenience class for defining the necessary fields of a parser

A single parser can read and parse one group of datalines, defined through the ParserDef by specifying how to parse each line (parser_def), how to identify each line (label), how to recognize the end of the group of lines (end_marker) and finally what (if anything) should be done after all lines in a group are read (end_callback).

The end_marker, label, skip_line and end_callback parameters should all be functions with the following signatures:

end_marker   = func(line, line_num, next_line)
label        = func(line, line_num)
skip_line    = func(line)
end_callback = func(cache)
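
As an illustration of these signatures, the four callbacks might look like the following for a file in which each group starts with a '#' header line. The file layout and the behaviour of each function are assumptions made for this sketch; only the argument lists come from the description above.

```python
from typing import Any, Dict


def end_marker(line: str, line_num: int, next_line: str) -> bool:
    """Report the end of a group when the following line starts a new header."""
    return next_line.startswith("#")


def label(line: str, line_num: int) -> str:
    """Identify a line by its first character."""
    return "header" if line.startswith("#") else "data"


def skip_line(line: str) -> bool:
    """Skip blank lines."""
    return not line.strip()


def end_callback(cache: Dict[str, Any]) -> None:
    """Clear temporary values once the whole group has been read."""
    cache.clear()
```

The strings returned by label are the keys that are looked up in parser_def, so every label that should be parsed needs a matching entry there.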

The parser definition parser_def includes the parser, fields, strip and delimiter entries. The parser entry points to the parser function, and the fields entry defines how to split the line into fields. The fields are given either as a dictionary or as a list. In the latter case the line is split on whitespace by default; the delimiter entry overrides this default. Leading and trailing whitespace characters are removed by default before a line is parsed. This default can be overridden by listing the characters to be removed in the strip entry. The parser dictionary is defined like:

parser_def = { <label>: {'fields':    <dict or list of fields>,
                         'parser':    <parser function>,
                         'delimiter': <optional delimiter for splitting line>,
                         'strip':     <optional characters to be removed from beginning and end of line>
             }}
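
A filled-in entry could look like the sketch below, which uses the list form of fields together with the optional delimiter and strip entries. The label 'obs', the field names and the function parse_obs are hypothetical, and the (values, cache) argument list of the parser function is an assumption that the description above does not confirm. The last lines apply the entry by hand to show what each setting does; they are not how ChainParser itself drives the parsing.

```python
from typing import Any, Dict, List

data: Dict[str, List[float]] = {}


def parse_obs(values: Dict[str, str], cache: Dict[str, Any]) -> None:
    """Hypothetical parser function: store every field as a float."""
    for name, value in values.items():
        data.setdefault(name, []).append(float(value))


parser_def = {
    "obs": {
        "fields": ["east", "north"],   # list form: names in column order
        "parser": parse_obs,
        "delimiter": ";",              # split on ';' instead of whitespace
        "strip": "[] \n",              # also remove surrounding brackets
    }
}

# Apply the entry by hand to a single line
entry = parser_def["obs"]
line = "[1.5;2.5]\n"
parts = line.strip(entry["strip"]).split(entry["delimiter"])
entry["parser"](dict(zip(entry["fields"], parts)), {})
```

After the call, data holds one value for each of the two fields.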

Args: