pyaxis

PC-Axis parser module: parses px files into dataframes.

This module builds a pandas DataFrame of tabular data from a PC-Axis file or URL. It reads the data and metadata of a PC-Axis [1] file into a dataframe and a dictionary, and returns a dictionary containing both structures.

Example

from pyaxis import pyaxis

px = pyaxis.parse(self.base_path + 'px/2184.px', encoding='ISO-8859-2')

[1] https://www.scb.se/en/services/statistical-programs-for-px-files/

Todo:

meta_split: the "NOTE" attribute can occur multiple times, but only the
last occurrence is added to the dictionary
pyaxis.build_dataframe(dimension_names, dimension_members, data_values, null_values, sd_values)

Build a dataframe from dimensions and data.

Adds the cartesian product of dimension members plus the series of data.
Parameters:
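  • dimension_names (list of string) – names of the dimensions.
  • dimension_members (list of string) – members of each dimension.
  • data_values (pd.Series) – pandas series with the data values column.
  • null_values (str) – regex with the pattern for the null values in the px file.
  • sd_values (str) – regex with the pattern for the statistical disclosure values in the px file.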
Returns:

data (pandas dataframe)
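
The cartesian product step can be illustrated with a minimal, dependency-free sketch. It is not the pyaxis implementation: `build_rows` is a hypothetical helper that returns a list of dicts instead of a pandas dataframe, the "DATA" column name is an assumption, and it assumes `dimension_members` holds one list of members per dimension, with the last dimension varying fastest in the data.

```python
from itertools import product


def build_rows(dimension_names, dimension_members, data_values):
    """Pair each combination of dimension members with one data value.

    Hypothetical helper, not part of pyaxis.
    """
    # product() varies the last dimension fastest (assumed data order).
    combos = list(product(*dimension_members))
    if len(combos) != len(data_values):
        raise ValueError("data length does not match the dimension product")
    rows = []
    for combo, value in zip(combos, data_values):
        row = dict(zip(dimension_names, combo))
        row["DATA"] = value  # assumed column name for the data values
        rows.append(row)
    return rows


rows = build_rows(
    ["sex", "year"],
    [["male", "female"], ["2019", "2020"]],
    ["10", "11", "20", "21"],
)
```

Two dimensions with two members each yield four rows, one per data value.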

pyaxis.get_dimensions(metadata)

Read STUB and HEADING values from metadata dictionary.

Parameters: metadata – dictionary of metadata
Returns: dimension_names (list), dimension_members (list)
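
As a rough sketch of how STUB and HEADING could be read from such a dictionary: `dimensions_from_metadata` below is a hypothetical helper, and the 'VALUES(<name>)' key layout is an assumption about how the members of each dimension are stored.

```python
def dimensions_from_metadata(metadata):
    """Collect dimension names and members from STUB and HEADING.

    Hypothetical helper. Assumes the members of each dimension are
    stored under a 'VALUES(<name>)' key.
    """
    names = []
    members = []
    for keyword in ("STUB", "HEADING"):
        for name in metadata.get(keyword, []):
            names.append(name)
            members.append(metadata.get("VALUES(" + name + ")", []))
    return names, members


metadata = {
    "STUB": ["sex"],
    "HEADING": ["year"],
    "VALUES(sex)": ["male", "female"],
    "VALUES(year)": ["2019", "2020"],
}
names, members = dimensions_from_metadata(metadata)
```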
pyaxis.metadata_extract(pc_axis)

Extract metadata and data from pc-axis file contents.

Parameters: pc_axis (str) – pc-axis file contents.
Returns:
metadata_attributes (list of string): each item conforms to an
ATTRIBUTE=VALUES pattern.

data (string): data values.
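
The split into metadata items and a data string might look like the sketch below. `split_px` is a hypothetical helper, not the library's code, and it assumes that the first 'DATA=' marks the data section and that no metadata value contains a semicolon.

```python
def split_px(contents):
    """Split px text into ATTRIBUTE=VALUES items and the raw data string.

    Hypothetical helper; assumes the first 'DATA=' starts the data
    section and that no metadata value contains a semicolon.
    """
    head, _, data = contents.partition("DATA=")
    items = [item.strip() for item in head.split(";") if item.strip()]
    # Drop surrounding whitespace and the terminating semicolon.
    data = data.strip().rstrip(";").strip()
    return items, data


sample = 'CHARSET="ANSI";\nSTUB="sex";\nHEADING="year";\nDATA=\n10 11\n20 21;\n'
items, data = split_px(sample)
```

A real px file can carry semicolons inside quoted values, so a production parser would need a quote-aware split.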
pyaxis.metadata_split_to_dict(metadata_elements)

Split the list of metadata elements into a multi-valued keys dict.

Parameters: metadata_elements (list of string) – pairs ATTRIBUTE=VALUES
Returns: {'attribute1': ['value1', 'value2', … ], …}
Return type: metadata (dictionary)
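
One way to picture the multi-valued dictionary is the following sketch. `split_to_dict` is a hypothetical helper; stripping quotes from attribute names and falling back to the raw text for unquoted values are assumptions made for illustration.

```python
import re


def split_to_dict(metadata_elements):
    """Turn 'ATTRIBUTE=VALUES' strings into {attribute: [values]}.

    Hypothetical helper; a repeated attribute keeps only its last value.
    """
    metadata = {}
    for element in metadata_elements:
        name, _, values = element.partition("=")
        name = name.strip().replace('"', "")  # assumed key normalization
        quoted = re.findall(r'"([^"]*)"', values)
        metadata[name] = quoted if quoted else [values.strip()]
    return metadata


result = split_to_dict(
    ['STUB="sex"', 'VALUES("sex")="male","female"', "DECIMALS=0"]
)
```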
pyaxis.parse(uri, encoding, timeout=10, null_values='^"\\."$', sd_values='"\\.\\."')

Extract metadata and data sections from a pc-axis file or URL.

Parameters:
  • uri (str) – file name or URL
  • encoding (str) – charset encoding
  • timeout (int) – request timeout in seconds; optional
  • null_values (str) – regex with the pattern for the null values in the px file. Defaults to '^"\\."$', which matches a quoted single dot (".").
  • sd_values (str) – regex with the pattern for the statistical disclosure values in the px file. Defaults to '"\\.\\."', which matches a quoted double dot ("..").
Returns:

dictionary of metadata and pandas df.

METADATA: dictionary of metadata
DATA: pandas dataframe

Return type:

pc_axis_dict (dictionary)
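
To make the two default patterns from the signature concrete, the sketch below checks which sample cells they match. The sample cells are invented, and the use of `re.search` is an assumption; how the library applies the patterns internally is not shown here.

```python
import re

NULL_VALUES = '^"\\."$'  # default null_values from the parse signature
SD_VALUES = '"\\.\\."'   # default sd_values from the parse signature

# Invented sample cells: a quoted dot, a quoted double dot, a number,
# and a quoted triple dot.
cells = ['"."', '".."', "12.5", '"..."']

null_hits = [cell for cell in cells if re.search(NULL_VALUES, cell)]
sd_hits = [cell for cell in cells if re.search(SD_VALUES, cell)]
```

Only the quoted single dot matches `null_values`, and only the quoted double dot matches `sd_values`; a file that marks missing data differently would need its own patterns passed to `parse`.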

pyaxis.read(uri, encoding, timeout=10)

Read a text file from file system or URL.

Parameters:
  • uri (str) – file name or URL
  • encoding (str) – charset encoding
  • timeout (int) – request timeout in seconds; optional
Returns:

file contents.

Return type:

raw_pcaxis (str)
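
A reader with the same shape of signature could be sketched with the standard library alone. `read_text` is a hypothetical stand-in, not the pyaxis function, and the set of prefixes treated as URLs is an assumption.

```python
import os
import tempfile
from urllib.request import urlopen


def read_text(uri, encoding, timeout=10):
    """Return the decoded contents of a local file or a URL.

    Hypothetical stand-in; the URL prefixes are an assumption.
    """
    if uri.lower().startswith(("http://", "https://", "ftp://")):
        with urlopen(uri, timeout=timeout) as response:
            return response.read().decode(encoding)
    with open(uri, encoding=encoding) as handle:
        return handle.read()


# Round-trip a small ISO-8859-2 file through the local-file branch.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.px")
    with open(path, "w", encoding="ISO-8859-2") as handle:
        handle.write('TITLE="Łódź";')
    raw_pcaxis = read_text(path, encoding="ISO-8859-2")
```

Passing the correct encoding matters: decoding the same bytes with a different charset would garble the non-ASCII characters.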

pyaxis.uri_type(uri)

Determine the type of URI.

Parameters: uri (str) – pc-axis file name or URL
Returns: 'URL' | 'FILE'
Return type: uri_type_result (str)
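
The classification could be approximated as in the sketch below. `classify_uri` is a hypothetical helper, and treating only http, https and ftp schemes as URLs is an assumption rather than documented behaviour.

```python
from urllib.parse import urlparse


def classify_uri(uri):
    """Return 'URL' for http/https/ftp URIs and 'FILE' otherwise.

    Hypothetical helper; the accepted schemes are an assumption.
    """
    scheme = urlparse(uri).scheme.lower()
    return "URL" if scheme in ("http", "https", "ftp") else "FILE"


remote = classify_uri("https://example.org/px/2184.px")
local = classify_uri("px/2184.px")
```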