pyaxis
The PC-Axis parser module parses px files into dataframes.
It reads data and metadata from a PC-Axis [1] file or URL and returns a dictionary that holds both: the metadata as a dictionary and the tabular data as a pandas DataFrame.
Example
from pyaxis import pyaxis
px = pyaxis.parse(self.base_path + 'px/2184.px', encoding='ISO-8859-2')
[1] https://www.scb.se/en/services/statistical-programs-for-px-files/
Todo:
meta_split: the "NOTE" attribute can occur multiple times, but only the last occurrence is added to the dictionary.
pyaxis.build_dataframe(dimension_names, dimension_members, data_values, null_values, sd_values)
Build a dataframe from dimensions and data.
Adds the Cartesian product of dimension members plus the series of data.
Parameters:
- dimension_names (list of string)
- dimension_members (list of string)
- data_values (pd.Series) – pandas series with the data values column.
- null_values (str) – regex with the pattern for the null values in the px file.
- sd_values (str) – regex with the pattern for the statistical disclosure values in the px file.
Returns: data (pandas dataframe)
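The Cartesian product step can be illustrated with a minimal sketch. It is not the library's implementation: the helper name `build_dataframe_sketch`, the `DATA` column name, and the nested-list shape of `dimension_members` (one list of members per dimension) are assumptions, and the null and disclosure handling is left out.

```python
import itertools

import pandas as pd


def build_dataframe_sketch(dimension_names, dimension_members, data_values):
    """Pair every combination of dimension members with one data value."""
    # itertools.product varies the last dimension fastest, so the data
    # values are assumed to be stored in that same order.
    rows = list(itertools.product(*dimension_members))
    df = pd.DataFrame(rows, columns=dimension_names)
    # Assumes exactly one data value per combination.
    df["DATA"] = pd.Series(data_values).reset_index(drop=True)
    return df


names = ["sex", "year"]
members = [["male", "female"], ["2019", "2020"]]
values = pd.Series(["10", "11", "20", "21"])
df = build_dataframe_sketch(names, members, values)
```

With two dimensions of two members each, the result has four rows, one per combination.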
pyaxis.get_dimensions(metadata)
Read STUB and HEADING values from metadata dictionary.
Parameters: metadata – dictionary of metadata
Returns:
- dimension_names (list)
- dimension_members (list)
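As a rough illustration of reading STUB and HEADING, the sketch below assumes the metadata dictionary stores dimension names under `STUB` and `HEADING` and each dimension's members under a `VALUES(name)` key. Both the key layout and the helper name `get_dimensions_sketch` are assumptions, not taken from the library.

```python
def get_dimensions_sketch(metadata):
    """Collect dimension names and their members from a metadata dict."""
    dimension_names = []
    dimension_members = []
    # STUB dimensions are collected first, then HEADING dimensions.
    for key in ("STUB", "HEADING"):
        for name in metadata.get(key, []):
            dimension_names.append(name)
            # Hypothetical key layout: members live under "VALUES(<name>)".
            dimension_members.append(metadata.get("VALUES(" + name + ")", []))
    return dimension_names, dimension_members


metadata = {
    "STUB": ["region"],
    "HEADING": ["year"],
    "VALUES(region)": ["North", "South"],
    "VALUES(year)": ["2019", "2020"],
}
names, members = get_dimensions_sketch(metadata)
```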
pyaxis.metadata_extract(pc_axis)
Extract metadata and data from pc-axis file contents.
Parameters: pc_axis (str) – pc_axis file contents.
Returns:
- metadata_attributes (list of string): each item conforms to an ATTRIBUTE=VALUES pattern.
- data (string): data values.
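One way to picture the split between metadata and data is the sketch below. It assumes the data section starts at the `DATA=` keyword and that metadata statements end with `;`. The helper name `metadata_extract_sketch` is hypothetical, and the naive `;` split would break on values that contain a semicolon inside quotes.

```python
def metadata_extract_sketch(pc_axis):
    """Split px file contents into ATTRIBUTE=VALUES items and the data text."""
    # Everything before the first "DATA=" is treated as metadata.
    metadata, data = pc_axis.split("DATA=", 1)
    # Each statement ends with ';'. Whitespace and line breaks are collapsed.
    attributes = [
        " ".join(item.split()) for item in metadata.split(";") if item.strip()
    ]
    # Drop the trailing ';' that closes the data section.
    data = data.strip().rstrip(";").strip()
    return attributes, data


sample = 'STUB="region";\nHEADING="year";\nDATA=\n1 2\n3 4;\n'
attrs, data = metadata_extract_sketch(sample)
```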
pyaxis.metadata_split_to_dict(metadata_elements)
Split the list of metadata elements into a multi-valued keys dict.
Parameters: metadata_elements (list of string) – pairs ATTRIBUTE=VALUES
Returns: metadata (dictionary) – {'attribute1': ['value1', 'value2', … ], …}
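The mapping from ATTRIBUTE=VALUES pairs to a multi-valued dictionary might look like the following sketch. The helper name `metadata_split_to_dict_sketch`, the quote stripping in keys, and the regex used to pull out quoted values are assumptions made for illustration.

```python
import re


def metadata_split_to_dict_sketch(metadata_elements):
    """Turn ATTRIBUTE=VALUES strings into {attribute: [values]}."""
    metadata = {}
    for element in metadata_elements:
        # Split on the first '=' only (a simplification: it would
        # mis-split an attribute name that itself contains '=').
        name, values = element.split("=", 1)
        # Quoted values become list items; an unquoted value is kept whole.
        found = re.findall(r'"([^"]*)"', values)
        key = name.replace('"', "").strip()
        # A repeated key overwrites the earlier entry, which mirrors the
        # "NOTE" limitation listed in the todo above.
        metadata[key] = found if found else [values.strip()]
    return metadata


elements = ['STUB="region"', 'VALUES("region")="North","South"', "DECIMALS=0"]
result = metadata_split_to_dict_sketch(elements)
```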
pyaxis.parse(uri, encoding, timeout=10, null_values='^"\\."$', sd_values='"\\.\\."')
Extract metadata and data sections from pc-axis.
Parameters:
- uri (str) – file name or URL
- encoding (str) – charset encoding
- timeout (int) – request timeout in seconds; optional
- null_values (str) – regex with the pattern for the null values in the px file. The default matches a quoted ".".
- sd_values (str) – regex with the pattern for the statistical disclosure values in the px file. The default matches a quoted "..".
Returns: pc_axis_dict (dictionary) – dictionary of metadata and pandas df, with these keys:
- METADATA: dictionary of metadata
- DATA: pandas dataframe
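To make the two default patterns concrete, the sketch below checks which raw tokens they match using Python's `re` module. It only demonstrates the regular expressions from the signature; how the library applies them to the data column is not shown here, and the sample tokens are made up.

```python
import re

# Default patterns copied from the parse() signature.
NULL_PATTERN = '^"\\."$'
SD_PATTERN = '"\\.\\."'

# Hypothetical raw tokens as they might appear in a px data section.
tokens = ['"."', '".."', "12.5", "."]

null_hits = [t for t in tokens if re.search(NULL_PATTERN, t)]
sd_hits = [t for t in tokens if re.search(SD_PATTERN, t)]
```

Only the quoted forms match: a bare `.` or an ordinary number such as `12.5` is left alone. Per the Returns description above, the parsed result is indexed with the `METADATA` and `DATA` keys.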
pyaxis.read(uri, encoding, timeout=10)
Read a text file from file system or URL.
Parameters:
- uri (str) – file name or URL
- encoding (str) – charset encoding
- timeout (int) – request timeout in seconds; optional
Returns: raw_pcaxis (str) – file contents.
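A minimal sketch of reading text from either a local file or a URL with an explicit charset follows. It uses the standard library's `urllib.request` as a stand-in and makes no claim about which HTTP client the library uses. The helper name `read_sketch` and the sample file contents are hypothetical.

```python
import os
import tempfile
import urllib.request


def read_sketch(uri, encoding, timeout=10):
    """Return the decoded text behind a file name or an http(s) URL."""
    if uri.startswith(("http://", "https://")):
        # Remote source: download the bytes, then decode with the charset.
        with urllib.request.urlopen(uri, timeout=timeout) as response:
            return response.read().decode(encoding)
    # Local source: let open() decode with the given charset.
    with open(uri, encoding=encoding) as handle:
        return handle.read()


# Round trip through a temporary ISO-8859-2 encoded file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".px", encoding="ISO-8859-2", delete=False
) as tmp:
    tmp.write('TITLE="Ludność";')
    path = tmp.name

text = read_sketch(path, "ISO-8859-2")
os.remove(path)
```

Passing the wrong charset would typically garble non-ASCII characters, which is why `encoding` is a required argument.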
pyaxis.uri_type(uri)
Determine the type of URI.
Parameters: uri (str) – pc-axis file name or URL
Returns: uri_type_result (str) – 'URL' | 'FILE'
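The URL versus file decision can be sketched with `urllib.parse.urlparse` from the standard library. The helper name `uri_type_sketch` and the set of schemes treated as URLs are assumptions, and the library may classify URIs differently.

```python
from urllib.parse import urlparse


def uri_type_sketch(uri):
    """Classify a string as 'URL' or 'FILE' based on its scheme."""
    scheme = urlparse(uri).scheme.lower()
    # Anything without a recognised network scheme is treated as a file,
    # including Windows paths such as "C:\\data\\file.px" (scheme "c").
    return "URL" if scheme in ("http", "https", "ftp") else "FILE"


kinds = [
    uri_type_sketch("https://example.org/px/2184.px"),
    uri_type_sketch("px/2184.px"),
    uri_type_sketch("C:\\data\\2184.px"),
]
```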