midgard.parsers._parser_chain
Basic functionality for parsing datafiles line by line
Description:
This module contains functions and classes for parsing datafiles.
Example:
from midgard import parsers
my_new_parser = parsers.parse_file('my_new_parser', 'file_name.txt', ...)
my_data = my_new_parser.as_dict()
ChainParser
ChainParser(file_path:Union[str, pathlib.Path], encoding:Union[str, NoneType]=None, logger:Union[Callable[[str], NoneType], NoneType]=<built-in function print>) -> None
An abstract base class that has basic methods for parsing a datafile
This class provides functionality for parsing a file with chained groups of information. Subclasses should
inherit from this class and, at a minimum, specify the necessary parameters in setup_parser.
ChainParser.parse_line()
parse_line(self, line:str, cache:Dict[str, Any], parser:midgard.parsers._parser_chain.ParserDef) -> None
Parse line
A line is parsed by separating it into fields. How the separation is done is defined in the parser_def
entry of the ParserDef.
Args:
line: Line to be parsed.
cache: Store temporary data.
parser: Dictionary with defined parsers with the keys 'parser_def', 'label' and 'end_marker'.
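The field separation can be sketched as a standalone function. This is a minimal illustration, not midgard's implementation: the helper name split_fields is hypothetical, and it assumes that a dict of fields maps each field name to a (start, end) column range, while a list of fields names the pieces obtained by splitting the line.

```python
from typing import Dict, List, Optional, Tuple, Union

# Assumption: dict fields map a name to a (start, end) column range.
Fields = Union[Dict[str, Tuple[int, int]], List[str]]


def split_fields(
    line: str,
    fields: Fields,
    delimiter: Optional[str] = None,
    strip: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: separate one line into named fields."""
    if isinstance(fields, dict):
        # Fixed-width case: cut each field out by its column range.
        return {
            name: line[start:end].strip(strip)
            for name, (start, end) in fields.items()
        }
    # List case: split on the delimiter (any whitespace when it is None).
    pieces = line.strip(strip).split(delimiter)
    return {name: piece.strip(strip) for name, piece in zip(fields, pieces)}


fixed = split_fields(
    "OSLO  59.91  10.75", {"name": (0, 4), "lat": (6, 11), "lon": (13, 18)}
)
split = split_fields("OSLO;59.91;10.75", ["name", "lat", "lon"], delimiter=";")
```

Both calls produce the same mapping of field names to strings. The parser function registered for the line's label would then receive that mapping.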
ChainParser.read_data()
read_data(self) -> None
Read data from a data file and parse the contents
ChainParser.setup_parser()
setup_parser(self) -> Any
Set up information needed for the parser
Returns an iterable of ParserDefs that describe the structure of the file to be parsed.
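A setup_parser method for a file with two chained groups might look like the sketch below. To keep the example runnable without midgard installed, ParserDef here is a local stand-in that only mirrors the documented field names. The class StationFileParser, its file layout and the parser-function signature (separated fields plus cache) are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, NamedTuple, Optional, Tuple


class ParserDef(NamedTuple):
    """Local stand-in with the documented ParserDef fields (not midgard's class)."""

    end_marker: Callable[[str, int, str], bool]
    label: Callable[[str, int], str]
    parser_def: Dict[str, Dict[str, Any]]
    skip_line: Optional[Callable[[str], bool]] = None
    end_callback: Optional[Callable[[Dict[str, Any]], None]] = None


class StationFileParser:  # with midgard, this would inherit from ChainParser
    """Hypothetical parser for a header group followed by a data group."""

    def __init__(self) -> None:
        self.meta: Dict[str, str] = {}
        self.data: Dict[str, float] = {}

    def setup_parser(self) -> Tuple[ParserDef, ...]:
        header = ParserDef(
            end_marker=lambda line, line_num, next_line: line.startswith("END HEADER"),
            label=lambda line, line_num: "header",
            parser_def={"header": {"parser": self.parse_header, "fields": ["key", "value"]}},
        )
        data = ParserDef(
            end_marker=lambda line, line_num, next_line: line.startswith("END DATA"),
            label=lambda line, line_num: "obs",
            parser_def={"obs": {"parser": self.parse_obs, "fields": ["station", "value"]}},
        )
        return header, data

    # Assumed parser-function signature: the separated fields plus the cache.
    def parse_header(self, fields: Dict[str, str], cache: Dict[str, Any]) -> None:
        self.meta[fields["key"]] = fields["value"]

    def parse_obs(self, fields: Dict[str, str], cache: Dict[str, Any]) -> None:
        self.data[fields["station"]] = float(fields["value"])


parser = StationFileParser()
definitions = parser.setup_parser()
```

Each ParserDef describes one group, so the returned tuple lists the groups in the order they are expected in the file.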
ParserDef
ParserDef(end_marker:Callable[[str, int, str], bool], label:Callable[[str, int], str], parser_def:Dict[str, Dict[str, Any]], skip_line:Union[Callable[[str], bool], NoneType]=None, end_callback:Union[Callable[[Dict[str, Any]], NoneType], NoneType]=None)
A convenience class for defining the necessary fields of a parser
A single parser can read and parse one group of data lines, defined through the ParserDef by specifying how to parse each line (parser_def), how to identify each line (label), how to recognize the end of the group of lines (end_marker) and finally what (if anything) should be done after all lines in a group are read (end_callback).
The end_marker, label, skip_line and end_callback parameters should all be functions with the following signatures:
end_marker = func(line, line_num, next_line)
label = func(line, line_num)
skip_line = func(line)
end_callback = func(cache)
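As an illustration of these four signatures, the functions below implement one possible set of rules. The rules themselves (groups ending before a '#' line, the first word acting as the label, '*' marking comments) are hypothetical and chosen only to make the example concrete.

```python
from typing import Any, Dict


def end_marker(line: str, line_num: int, next_line: str) -> bool:
    """Return True when line is the last one in the current group."""
    # Hypothetical rule: a group ends just before the next '#' header line.
    return next_line.startswith("#")


def label(line: str, line_num: int) -> str:
    """Return the key used to look up this line in parser_def."""
    # Hypothetical rule: the first whitespace-separated word names the line type.
    return line.split(maxsplit=1)[0] if line.strip() else ""


def skip_line(line: str) -> bool:
    """Return True for lines that should be ignored."""
    # Hypothetical rule: skip blank lines and '*' comment lines.
    return not line.strip() or line.startswith("*")


def end_callback(cache: Dict[str, Any]) -> None:
    """Run once after a group is read; here it counts finished groups."""
    cache["groups_done"] = cache.get("groups_done", 0) + 1
```

Because all four are plain callables, short lambdas work equally well when a rule fits on one line.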
The parser definition parser_def includes the 'parser', 'fields', 'strip' and 'delimiter' entries. The 'parser'
entry points to the parser function and the 'fields' entry defines how to separate the line into fields. The
fields are given either as a dictionary or as a list. In the latter case the line is split on whitespace by
default; the 'delimiter' entry overrides this default. Leading and trailing whitespace characters are removed
by default before a line is parsed. This default can be overridden by listing the characters that should be
removed in the 'strip' entry. The parser_def dictionary is defined as follows:
parser_def = { <label>: {'fields':    <dict or list of fields>,
                         'parser':    <parser function>,
                         'delimiter': <optional delimiter for splitting line>,
                         'strip':     <optional characters to be removed from beginning and end of line>
                        }}
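A concrete parser_def following this template could look like the sketch below. The labels, the sample lines and the small dispatch helper are hypothetical. The helper only imitates the lookup by label and is not midgard's code, and the parser functions assume a signature that takes the separated fields and the cache.

```python
from typing import Any, Dict


def parse_station(fields: Dict[str, str], cache: Dict[str, Any]) -> None:
    """Assumed parser signature: the separated fields plus the shared cache."""
    cache["station"] = fields["name"]


def parse_obs(fields: Dict[str, str], cache: Dict[str, Any]) -> None:
    cache.setdefault("obs", []).append(float(fields["value"]))


parser_def = {
    # Split on whitespace, strip whitespace (both defaults).
    "STA": {"parser": parse_station, "fields": ["tag", "name"]},
    # Split on commas and also strip a trailing ';'.
    "OBS": {"parser": parse_obs, "fields": ["tag", "value"], "delimiter": ",", "strip": "; \n"},
}


def dispatch(line: str, cache: Dict[str, Any]) -> None:
    """Hypothetical driver: pick the entry by label and call its parser."""
    entry = parser_def.get(line[:3])  # label = first three characters
    if entry is None:
        return  # lines without a matching label are ignored
    pieces = line.strip(entry.get("strip")).split(entry.get("delimiter"))
    entry["parser"](dict(zip(entry["fields"], pieces)), cache)


cache: Dict[str, Any] = {}
for text in ["STA OSLO\n", "OBS,1.5;\n", "XYZ ignored\n", "OBS,2.5;\n"]:
    dispatch(text, cache)
```

After the loop, cache holds the station name and the two observation values. The unlabeled line is skipped because parser_def has no entry for it.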
Args:
end_marker: A function returning True for the last line in a group.
label: A function returning a label used in the parser_def.
parser_def: A dict with 'parser' and 'fields' defining the parser.
skip_line: A function returning True if the line should be skipped.
end_callback: A function called after reading all lines in a group.