Python Toolkit for WMO BUFR Messages
PyBufrKit is a pure Python package for working with WMO BUFR (FM-94) messages. It can be used as either a command line tool or a library to decode and encode BUFR messages. Here is a brief list of some of its features:
- Handles both compressed and uncompressed messages
- Handles all practical operator descriptors, including data quality info, stats, bitmaps, etc.
- Tested with the same set of BUFR files used by ecCodes and BUFRDC.
Read more documentation at http://pybufrkit.readthedocs.io/
Installation
PyBufrKit is compatible with both Python 2.6+ and 3.5+. To install from PyPI:
pip install pybufrkit
Or from source:
python setup.py install
Command Line Usage
The command line usage of the toolkit takes the following form:
pybufrkit [OPTIONS] sub-command ...
where sub-command is one of the following actions that can be performed by the tool:
- decode: Decode a BUFR file to outputs of various formats, e.g. JSON
- encode: Encode a BUFR file from a JSON input
- info: Decode a BUFR file up to the data associated with the BUFR Template (exclusive)
- lookup: Look up information about the given BUFR descriptor
- compile: Compile the given comma-separated BUFR descriptors
Here are a few examples of using the tool from the command line. For more details, please refer to the help option, e.g. pybufrkit decode -h. Also check out the documentation.
- Decode a BUFR file
pybufrkit decode BUFR_FILE
- Decode a BUFR file and display it as a hierarchical structure corresponding to the BUFR Descriptors. In addition, attribute descriptors are associated with their corresponding (bitmap) descriptors.
pybufrkit decode -a BUFR_FILE
- Decode a BUFR file and convert to JSON format (the JSON can be encoded back to the BUFR format)
pybufrkit decode -j BUFR_FILE
- Encode a JSON file to BUFR
pybufrkit encode JSON_FILE BUFR_FILE
- Decode a BUFR file to JSON and pipe it to the encoder to encode it back to BUFR
pybufrkit decode -j BUFR_FILE | pybufrkit encode -
- Decode only the metadata sections of a BUFR file
pybufrkit info BUFR_FILE
- Look up information for an Element Descriptor (along with its code table)
pybufrkit lookup -l 020003
- Compile a BUFR Template composed as a comma-separated list of descriptors
pybufrkit compile 309052,205060
Library Usage
The following are some examples of basic library usage:
# Decode a BUFR file
# (SOME_BUFR_FILE and BUFR_OUTPUT_FILE are placeholders for file paths)
from pybufrkit.decoder import Decoder
decoder = Decoder()
with open(SOME_BUFR_FILE, 'rb') as ins:
    bufr_message = decoder.process(ins.read())
# Convert the BUFR message to JSON
from pybufrkit.renderer import FlatJsonRenderer
json_string = FlatJsonRenderer().render(bufr_message)
# Encode the JSON back to a BUFR file
from pybufrkit.encoder import Encoder
encoder = Encoder()
bufr_message_new = encoder.process(json_string)
with open(BUFR_OUTPUT_FILE, 'wb') as outs:
    outs.write(bufr_message_new.serialized_bytes)
How It Works
BUFR Message Configuration files
The configurations describe how a BUFR message is composed of sections and how each section is organised. They provide the overall structure definition of a BUFR message. The purpose of using configurations is to allow greater flexibility, so that changes to the BUFR specification do not (to a certain extent) require program changes.
The built-in configuration JSON files are located in the definitions directory inside the package. The package can also be configured to load them from a user-provided directory. The naming convention of the files is as follows:
sectionX[-Y].json
where X is the section index and Y is the optional edition number.
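To make the naming convention concrete, here is a minimal sketch of how a loader might pick the file for a given section index and edition. The function names, the made-up file set and the fallback rule (an exact edition match first, then a configuration whose default field is true) are assumptions for illustration, not PyBufrKit's actual loader.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical helpers (not part of PyBufrKit) illustrating sectionX[-Y].json.
NAME_PATTERN = re.compile(r"^section(\d+)(?:-(\d+))?\.json$")


def parse_config_name(filename: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (section_index, edition), with edition None when -Y is absent."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    edition = match.group(2)
    return int(match.group(1)), (int(edition) if edition is not None else None)


def pick_config(configs: Dict[str, dict], index: int, edition: int) -> Optional[str]:
    """Prefer an exact edition match; otherwise use a config flagged "default"."""
    fallback = None
    for name, config in configs.items():
        parsed = parse_config_name(name)
        if parsed is None or parsed[0] != index:
            continue
        if parsed[1] == edition:
            return name
        if config.get("default", False):
            fallback = name
    return fallback


# Made-up file set; only the "default" flag of each config matters here.
CONFIGS = {
    "section0.json": {"default": True},
    "section1-1.json": {"default": False},
    "section1-4.json": {"default": True},
}

print(pick_config(CONFIGS, 1, 1))  # section1-1.json
print(pick_config(CONFIGS, 1, 3))  # section1-4.json (catch-all)
print(pick_config(CONFIGS, 0, 4))  # section0.json
```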
Each section is configured with some metadata and a list of parameters. It takes the following general format:
{
"index": 0, # zero-based section index
"description": "Indicator section",
"default": true, # use this config if an edition-specific one is not available
"optional": false, # whether this section is optional
"end_of_message": false, # whether this is the last section
"parameters": [ # a list of parameter configs
{
"name": "start_signature", # parameter name
"nbits": 32, # number of bits
"type": "bytes", # parameter type determines how the value can be processed from the input bits
"expected": "BUFR", # expected value for this parameter (will be validated if not None)
"as_property": false # whether this parameter can be accessed from the parent message object
},
...
]
}
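As a rough illustration of how such parameter configs can drive processing, the sketch below decodes the 8 octets of Section 0 of an edition 4 message (the 4-octet "BUFR" signature, the 3-octet total message length and the 1-octet edition number) from a plain Python list of parameter dicts. The parameter name "length", the BitReader class and decode_parameters are hypothetical; only the uint and bytes types are handled, and this is not how PyBufrKit's Decoder is actually written.

```python
from typing import Any, Dict, List

# Hypothetical parameter list mirroring the format above (plain JSON has no comments).
SECTION0_PARAMETERS = [
    {"name": "start_signature", "nbits": 32, "type": "bytes", "expected": "BUFR"},
    {"name": "length", "nbits": 24, "type": "uint", "expected": None},
    {"name": "edition", "nbits": 8, "type": "uint", "expected": None},
]


class BitReader:
    """Reads big-endian bit fields of arbitrary width from a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.value = int.from_bytes(data, "big")
        self.nbits_total = len(data) * 8
        self.pos = 0

    def read_uint(self, nbits: int) -> int:
        if self.pos + nbits > self.nbits_total:
            raise ValueError("not enough bits left")
        shift = self.nbits_total - self.pos - nbits
        self.pos += nbits
        return (self.value >> shift) & ((1 << nbits) - 1)


def decode_parameters(data: bytes, parameters: List[Dict[str, Any]]) -> Dict[str, Any]:
    reader = BitReader(data)
    result = {}
    for param in parameters:
        raw = reader.read_uint(param["nbits"])
        if param["type"] == "uint":
            value = raw
        elif param["type"] == "bytes":
            value = raw.to_bytes(param["nbits"] // 8, "big").decode("ascii")
        else:
            raise NotImplementedError(param["type"])
        # Validate against "expected" only when it is set.
        expected = param.get("expected")
        if expected is not None and value != expected:
            raise ValueError("%s: expected %r, got %r" % (param["name"], expected, value))
        result[param["name"]] = value
    return result


# Section 0 of an edition 4 message whose total length is 100 octets.
section0 = b"BUFR" + (100).to_bytes(3, "big") + bytes([4])
print(decode_parameters(section0, SECTION0_PARAMETERS))
# {'start_signature': 'BUFR', 'length': 100, 'edition': 4}
```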
A few more notes about the configuration:
- Some sections, e.g. Section 1, have edition-specific configurations, e.g. section1-1.json. However, most sections have a single configuration for all editions. The default field indicates that the configuration is a catch-all for any edition that does not have its own specific configuration.
- The number of bits can be set to 0, which means that the value of the corresponding parameter takes all the bits left in the section.
- There are generally two categories of parameter types, simple and complex.
  - Simple types are uint (unsigned integer), int (signed integer), bool, bin (binary bits) and bytes (string).
  - Complex types include unexpanded_descriptors and template_data. How they are processed is taken care of by a processor, e.g. the Decoder. The configuration file is not concerned with how they are interpreted.
- The expected value is validated against the actual value processed from the input. Currently, it is only used to check the start and stop signatures of a BUFR message.
- To allow loose coupling between sections, the parent message object can be configured to proxy some fields from a section. This is what the as_property field is for. For example, the edition field from section 0 is needed by other sections to determine their structures. Therefore the edition field is proxied by the parent message object, so that other sections can access it without needing to know exactly which section provides this information.
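The as_property mechanism can be sketched with Python's __getattr__ hook: the parent object remembers which section supplies each flagged field and forwards attribute lookups to it. The Section and Message classes below are hypothetical stand-ins, not PyBufrKit's own classes.

```python
from typing import Any, Dict, List


class Section:
    """A decoded section: parameter values plus the names flagged as_property."""

    def __init__(self, index: int, values: Dict[str, Any], as_property: List[str]) -> None:
        self.index = index
        self.values = values
        self.as_property = as_property


class Message:
    """Hypothetical parent object that proxies flagged fields from its sections."""

    def __init__(self) -> None:
        self.sections: List[Section] = []
        self._proxied: Dict[str, Section] = {}

    def add_section(self, section: Section) -> None:
        self.sections.append(section)
        for name in section.as_property:
            self._proxied[name] = section

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        proxied = self.__dict__.get("_proxied", {})
        if name in proxied:
            return proxied[name].values[name]
        raise AttributeError(name)


message = Message()
message.add_section(Section(0, {"start_signature": "BUFR", "edition": 4}, ["edition"]))
# Later sections ask the message, not section 0 directly, for the edition.
print(message.edition)  # 4
```

Fields that are not flagged, such as start_signature here, stay private to their section.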
Decoder and Encoder
These components process the input as prescribed by the configurations. Sections are processed in order of their section index. The components also provide specialised methods to process parameters of complex types. The processing of template_data is where most of the program logic resides.
The Decoder and Encoder are subclasses of the same abstract Coder class. They are implemented using the Template Method Pattern. The Coder base class provides bootstrapping and common functions needed by all of the subclasses, and leaves hooks for the subclasses to fill in the actual processing of the parameters.
For example, the base class knows how to process an Element Descriptor. It prepares all necessary information about the descriptor, including its type, number of bits, units, scale, reference, etc. Depending on the type, the base class then invokes a method provided by the subclass to handle the actual processing, which can be either decoding or encoding.
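A minimal sketch of the Template Method idea, assuming a drastically simplified template made only of numeric element descriptors: the base class owns the traversal and each subclass fills in a single hook. BaseCoder, ValueDecoder, ValueEncoder and ElementInfo are hypothetical names. The arithmetic follows the standard BUFR rule value = (raw + reference) / 10^scale, with an all-ones raw value meaning "missing"; the widths, scales and references in the example are illustrative.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class ElementInfo:
    """Minimal element descriptor metadata (real Table B entries carry more)."""

    def __init__(self, name: str, nbits: int, scale: int, reference: int) -> None:
        self.name = name
        self.nbits = nbits
        self.scale = scale
        self.reference = reference

    @property
    def missing(self) -> int:
        # A raw value with all bits set to 1 means "missing" in BUFR.
        return (1 << self.nbits) - 1


class BaseCoder(ABC):
    """Template method: the base class walks the elements, subclasses code them."""

    def process_template(self, elements: List[ElementInfo]) -> None:
        for element in elements:
            self.process_numeric(element)

    @abstractmethod
    def process_numeric(self, element: ElementInfo) -> None:
        """Hook filled in by each subclass."""


class ValueDecoder(BaseCoder):
    def __init__(self, raw_values: List[int]) -> None:
        self.raw_values = list(raw_values)
        self.decoded: List[Optional[float]] = []

    def process_numeric(self, element: ElementInfo) -> None:
        raw = self.raw_values.pop(0)
        if raw == element.missing:
            self.decoded.append(None)
        else:
            # value = (raw + reference) / 10^scale
            self.decoded.append((raw + element.reference) / 10 ** element.scale)


class ValueEncoder(BaseCoder):
    def __init__(self, values: List[Optional[float]]) -> None:
        self.values = list(values)
        self.encoded: List[int] = []

    def process_numeric(self, element: ElementInfo) -> None:
        value = self.values.pop(0)
        if value is None:
            self.encoded.append(element.missing)
        else:
            # raw = round(value * 10^scale) - reference
            self.encoded.append(round(value * 10 ** element.scale) - element.reference)


# Illustrative width/scale/reference values.
temperature = ElementInfo("air temperature", nbits=16, scale=2, reference=0)
latitude = ElementInfo("latitude", nbits=25, scale=5, reference=-9000000)
template = [temperature, latitude, temperature]

decoder = ValueDecoder([29315, 14150000, temperature.missing])
decoder.process_template(template)
print(decoder.decoded)  # [293.15, 51.5, None]

encoder = ValueEncoder(decoder.decoded)
encoder.process_template(template)
print(encoder.encoded)  # [29315, 14150000, 65535]
```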
BUFR Template Compilation
The main purpose of template compilation is performance. However, since bit operations are the most time-consuming part of the overall processing, the performance gain is somewhat limited. Depending on the total number of descriptors to be processed for a message, template compilation provides a 10 - 30% performance boost. A larger number of descriptors, i.e. a longer message, generally leads to a smaller performance gain.
The implementation of TemplateCompiler is similar to that of the Decoder and Encoder. It is also a subclass of the abstract Coder class. It uses introspection to record the method calls dispatched by the base class. The recorded calls can then be executed more efficiently because they bypass most of the BUFR template processing, such as checking descriptor types, expanding sequence descriptors, etc.
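The record-then-replay idea can be sketched in a few lines. Here a CallRecorder object stands in for a subclass and captures each hook call made while the template is walked; replay then re-issues those calls against a real handler without re-expanding the nested lists that play the role of sequence descriptors. All names are hypothetical, and the sketch ignores data-dependent constructs such as delayed replication, which a real compiler has to handle.

```python
from typing import Any, Callable, List, Tuple

Call = Tuple[str, Tuple[Any, ...]]


class CallRecorder:
    """Pretends to be a subclass and records every method call dispatched to it."""

    def __init__(self) -> None:
        self.calls: List[Call] = []

    def __getattr__(self, name: str) -> Callable[..., None]:
        # Only reached for attributes that do not exist, i.e. the dispatched hooks.
        def record(*args: Any) -> None:
            self.calls.append((name, args))
        return record


def walk(template: List[Any], handler: Any) -> None:
    """Slow path: expand nested lists (stand-ins for sequence descriptors)."""
    for item in template:
        if isinstance(item, list):
            walk(item, handler)
        else:
            handler.process_element(item)


def replay(calls: List[Call], handler: Any) -> None:
    """Fast path: re-issue the recorded calls without walking the template."""
    for name, args in calls:
        getattr(handler, name)(*args)


class Collector:
    def __init__(self) -> None:
        self.seen: List[str] = []

    def process_element(self, descriptor: str) -> None:
        self.seen.append(descriptor)


template = ["001001", ["004001", "004002", ["005001"]], "012101"]
recorder = CallRecorder()
walk(template, recorder)  # "compile" once

collector = Collector()
replay(recorder.calls, collector)  # can be repeated without re-walking
print(collector.seen)  # ['001001', '004001', '004002', '005001', '012101']
```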
Template Data Wiring
A BUFR Template is hierarchical by nature. For example, a sequence descriptor has all of its child descriptors nested under it. When the data associated with the template are decoded, they can also be organised in a hierarchical format. This is especially necessary when some operator descriptors, such as 204YYY (associated field) and the bitmap-related operators (222, 223, 224, 225, 232, 235, 236, 237), make some values attributes of other values. The wiring process associates attributes with their owners so that their meanings are explicit.
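The wiring step can be pictured as turning a flat list of decoded values into nodes that carry their attributes. The sketch below handles two simplified cases: an associated field value (204YYY) that precedes the element it qualifies, and a bitmap in which a 0 bit marks an element that receives one value of a quality descriptor. The Node class, the (descriptor, value, is_associated) tuple layout and the descriptor codes in the example are assumptions for illustration; real bitmaps refer back to a list of previously decoded elements, which is glossed over here.

```python
from typing import Any, Dict, List, Tuple


class Node:
    """A decoded value plus the attributes wired to it."""

    def __init__(self, descriptor: str, value: Any) -> None:
        self.descriptor = descriptor
        self.value = value
        self.attributes: Dict[str, Any] = {}


_NOTHING = object()  # sentinel, so that a missing (None) value can still be pending


def wire_associated_fields(flat: List[Tuple[str, Any, bool]]) -> List[Node]:
    """Attach each associated field value (204YYY) to the element it precedes."""
    nodes: List[Node] = []
    pending: Any = _NOTHING
    for descriptor, value, is_associated in flat:
        if is_associated:
            pending = value
            continue
        node = Node(descriptor, value)
        if pending is not _NOTHING:
            node.attributes["associated_field"] = pending
            pending = _NOTHING
        nodes.append(node)
    return nodes


def wire_bitmap(nodes: List[Node], bitmap: List[int], descriptor: str, values: List[Any]) -> None:
    """Attach values, in order, to the nodes whose bitmap bit is 0 (0 = present)."""
    remaining = iter(values)
    for node, bit in zip(nodes, bitmap):
        if bit == 0:
            node.attributes[descriptor] = next(remaining)


flat = [
    ("012101", 293.15, False),
    ("012103", 1, True),  # associated field value for the next element
    ("012103", 288.15, False),
]
nodes = wire_associated_fields(flat)
wire_bitmap(nodes, [0, 1], "033007", [70])
for node in nodes:
    print(node.descriptor, node.value, node.attributes)
# 012101 293.15 {'033007': 70}
# 012103 288.15 {'associated_field': 1}
```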
Renderer
This component is responsible for rendering the processed BUFR message object in different formats, e.g. plain text, JSON.
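One way to picture this component is a small interface with one render method per output format. The renderer classes and the dict-based message below are hypothetical and much simpler than PyBufrKit's own renderers, such as the FlatJsonRenderer used in the library example above.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict


class SketchRenderer(ABC):
    """Hypothetical interface: one message object, many output formats."""

    @abstractmethod
    def render(self, message: Dict[str, Any]) -> str:
        """Return the message rendered as a string."""


class TextRenderer(SketchRenderer):
    def render(self, message: Dict[str, Any]) -> str:
        lines = []
        for section in message["sections"]:
            lines.append("section %d:" % section["index"])
            for name, value in section["values"].items():
                lines.append("  %s = %r" % (name, value))
        return "\n".join(lines)


class JsonRenderer(SketchRenderer):
    def render(self, message: Dict[str, Any]) -> str:
        return json.dumps(message, sort_keys=True)


message = {"sections": [{"index": 0, "values": {"start_signature": "BUFR", "edition": 4}}]}
print(TextRenderer().render(message))
# section 0:
#   start_signature = 'BUFR'
#   edition = 4
print(JsonRenderer().render(message))
# {"sections": [{"index": 0, "values": {"edition": 4, "start_signature": "BUFR"}}]}
```

Adding a new output format then means adding one subclass, without touching the decoder or encoder.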
Show me the code
Check the code to see details.