pyjstat

pyjstat is a Python module for manipulating JSON-stat formatted data.

This module reads and writes the JSON-stat format [1] with Python, using the data frame structures provided by the widely used pandas library [2]. JSON-stat is a simple, lightweight JSON format for data dissemination. pyjstat is inspired by rjstat [3], a library by ajschumacher for reading and writing JSON-stat with R.

pyjstat is written and maintained by Miguel Expósito Martín and is distributed under the Apache 2.0 License (see LICENSE file).

[1] http://json-stat.org/ for JSON-stat information

[2] http://pandas.pydata.org for Python Data Analysis Library information

[3] https://github.com/ajschumacher/rjstat for rjstat library information

Example

Importing a JSON-stat file into a pandas data frame can be done as follows:

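import json
import urllib.request

# The functions documented below live in the pyjstat module of the pyjstat package.
from pyjstat import pyjstat

# Download the sample file and deserialize it before decoding.
url = 'http://json-stat.org/samples/oecd-canada.json'
results = pyjstat.from_json_stat(json.load(urllib.request.urlopen(url)))
print(results)

class pyjstat.Collection(*args, **kwargs)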

A class representing a JSONstat collection.

get(element)

Get element from collection.

Get the ith element of a collection as an object of the corresponding class.

Parameters

element (int) – index of the element to retrieve from the collection.

Returns

An object of the class that corresponds to the requested element.

classmethod read(data)

Read data from URL or OrderedDict.

Parameters

data – can be a URL pointing to a JSONstat file, a JSON string or an OrderedDict.

Returns

An object of class Collection populated with data.

write(output='jsonstat')

Write to JSON-stat or a list of data frames.

Writes data from a Collection object to JSONstat or list of Pandas Dataframes.

Parameters

output (string) – can accept ‘jsonstat’ or ‘dataframe_list’

Returns

Serialized JSONstat or a list of Pandas Dataframes, depending on the ‘output’ parameter.

class pyjstat.Dataset(*args, **kwargs)

A class representing a JSONstat dataset.

get_dimension_index(name, value)

Get a dimension index from its name.

Convert a dimension ID string and a category ID string into the numeric index of that category in that dimension.

Parameters
  • name (string) – ID string of the dimension.

  • value (string) – ID string of the category.

Returns

index of the category in the dimension.

Return type

int
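
As a rough illustration of this lookup, the sketch below resolves a category ID against a JSON-stat dimension held in a plain dict. It is not pyjstat's implementation: the helper name `category_position` and the sample data are made up for the example. It assumes the JSON-stat rule that a category `index` may be either a list of IDs or an object mapping each ID to its position.

```python
def category_position(dataset, name, value):
    """Return the position of category `value` in dimension `name`."""
    index = dataset["dimension"][name]["category"]["index"]
    if isinstance(index, list):
        # List form: the position is the place of the ID in the list.
        return index.index(value)
    # Object form: the ID maps directly to its position.
    return index[value]


# Hypothetical two-dimension dataset, one dimension per index form.
sample = {
    "id": ["sex", "year"],
    "size": [2, 3],
    "dimension": {
        "sex": {"category": {"index": ["M", "F"]}},
        "year": {"category": {"index": {"2010": 0, "2011": 1, "2012": 2}}},
    },
}

print(category_position(sample, "sex", "F"))      # 1
print(category_position(sample, "year", "2012"))  # 2
```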

get_dimension_indices(query)

Get dimension indices.

Converts a dimension/category list of dicts into a list of dimension indices.

Parameters

query (list) – dimension/category list of dicts.

Returns

list of dimensions’ indices.

Return type

list

get_value(query)

Get data value.

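Convert a dimension/category list of dicts into a data value in three steps: resolve the dimension indices (get_dimension_indices), convert them into a numeric value index (get_value_index), and read the value at that index (get_value_by_index).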

Parameters

query (list) – list of dicts with the desired query.

Returns

numeric data value.

Return type

float
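
The three steps can be sketched on a plain dict as follows. This is an illustration under stated assumptions, not the library's code: `lookup_value` and the sample data are hypothetical, the query is assumed to be a list of single-key `{dimension_id: category_id}` dicts, category indexes are assumed to be lists, and values are assumed to be stored in row-major order (last dimension varying fastest), as the JSON-stat format specifies.

```python
def lookup_value(dataset, query):
    """Resolve a list of {dimension_id: category_id} dicts to a data value."""
    # Step 1: turn each category ID into its position within its dimension.
    wanted = {dim: cat for pair in query for dim, cat in pair.items()}
    indices = [
        dataset["dimension"][dim]["category"]["index"].index(wanted[dim])
        for dim in dataset["id"]
    ]
    # Step 2: fold the positions into one row-major offset.
    offset = 0
    for position, size in zip(indices, dataset["size"]):
        offset = offset * size + position
    # Step 3: read the value stored at that offset.
    return dataset["value"][offset]


sample = {
    "id": ["sex", "year"],
    "size": [2, 3],
    "dimension": {
        "sex": {"category": {"index": ["M", "F"]}},
        "year": {"category": {"index": ["2010", "2011", "2012"]}},
    },
    "value": [10.0, 11.0, 12.0, 20.0, 21.0, 22.0],
}

print(lookup_value(sample, [{"sex": "F"}, {"year": "2011"}]))  # 21.0
```

Because the dimensions are walked in the order given by the dataset's `id` list, the order of the dicts inside the query does not matter in this sketch.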

get_value_by_index(index)

Convert a numeric value index into its data value.

Parameters

index (int) – numeric value index.

Returns

Numeric data value.

Return type

float
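
JSON-stat allows `value` to be either an array or a sparse object keyed by the stringified position. The sketch below shows one way to read both forms; it is an assumption about how such a lookup could be written, not a description of what pyjstat does, and `value_at` is a hypothetical name.

```python
def value_at(dataset, index):
    """Return the data value stored at flat position `index`."""
    values = dataset["value"]
    if isinstance(values, dict):
        # Sparse form: keys are stringified positions; absent means no data.
        return values.get(str(index))
    # Dense form: a plain list indexed by position.
    return values[index]


dense = {"value": [1.5, None, 3.0]}
sparse = {"value": {"0": 1.5, "2": 3.0}}

print(value_at(dense, 2))   # 3.0
print(value_at(sparse, 1))  # None
```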

get_value_index(indices)

Convert a list of dimension indices into a numeric value index.

Parameters

indices (list) – list of dimension indices.

Returns

numeric value index.

Return type

int

classmethod read(data, verify=True, **kwargs)

Read data from URL, Dataframe, JSON string/file or OrderedDict.

Parameters
  • data – can be a Pandas Dataframe, a JSON file, a JSON string, an OrderedDict or a URL pointing to a JSONstat file.

  • verify – whether to verify the host’s SSL certificate.

  • kwargs – optional arguments for to_json_stat().

Returns

An object of class Dataset populated with data.

write(output='jsonstat', naming='label', value='value')

Write data from a Dataset object to JSONstat or Pandas Dataframe.

Parameters
  • output (string) – can accept ‘jsonstat’ or ‘dataframe’. Defaults to ‘jsonstat’.

  • naming (string) – dimension naming. Possible values: ‘label’ or ‘id’. Defaults to ‘label’. Optional; ignored if output = ‘jsonstat’.

  • value (string) – name of the value column. Defaults to ‘value’. Optional; ignored if output = ‘jsonstat’.

Returns

Serialized JSONstat or a Pandas Dataframe, depending on the ‘output’ parameter.

class pyjstat.Dimension(*args, **kwargs)

A class representing a JSONstat dimension.

classmethod read(data)

Read data from URL, Dataframe, JSON string/file or OrderedDict.

Parameters

data – can be a Pandas Dataframe, a JSON string, a JSON file, an OrderedDict or a URL pointing to a JSONstat file.

Returns

An object of class Dimension populated with data.

write(output='jsonstat')

Write data from a Dimension object to JSONstat or Pandas Dataframe.

Parameters

output (string) – can accept ‘jsonstat’ or ‘dataframe’

Returns

Serialized JSONstat or a Pandas Dataframe, depending on the ‘output’ parameter.

class pyjstat.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Custom JSON encoder class for Numpy data types.

default(obj)

Encode by default.

pyjstat.check_input(naming)

Check and validate input params.

Parameters

naming (string) – a string containing the naming type (label or id).

Returns

Nothing

Raises

ValueError – if the parameter is not in the allowed list.
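
A check of this kind can be sketched in a few lines. The names `ALLOWED_NAMING` and `validate_naming` are invented for the example, and the error message is an assumption; only the allowed values and the ValueError come from the description above.

```python
ALLOWED_NAMING = ("label", "id")


def validate_naming(naming):
    """Raise ValueError unless `naming` is one of the allowed values."""
    if naming not in ALLOWED_NAMING:
        raise ValueError(
            f"naming must be one of {ALLOWED_NAMING}, got {naming!r}"
        )


validate_naming("id")  # passes silently and returns None
```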

pyjstat.check_version_2(dataset)

Check for json-stat version.

Check whether the json-stat version attribute exists and is equal to or greater than 2.0 for a given dataset.

Parameters

dataset (OrderedDict) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads(), for example.

Returns

True if version exists and is equal to or greater than 2.0, False otherwise. For datasets without the version attribute, always return False.

Return type

bool
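
The rule described above can be sketched as follows. This is one possible reading, not the library's code: `is_version_2` is a hypothetical name, and comparing the version as a float is an assumption made for the example.

```python
def is_version_2(dataset):
    """Return True only when a version attribute exists and is >= 2.0."""
    version = dataset.get("version")
    if version is None:
        # No version attribute: treated as an older format.
        return False
    try:
        return float(version) >= 2.0
    except (TypeError, ValueError):
        # Unparseable version strings are treated as not 2.0.
        return False


print(is_version_2({"version": "2.0", "class": "dataset"}))  # True
print(is_version_2({"class": "dataset"}))                    # False
```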

pyjstat.from_json_stat(datasets, naming='label', value='value')

Decode JSON-stat formatted data into pandas.DataFrame object.

Parameters
  • datasets (OrderedDict, list) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads(), for example. Both List and OrderedDict are accepted as inputs.

  • naming (string, optional) – dimension naming. Possible values: ‘label’ or ‘id’. Defaults to ‘label’.

  • value (string, optional) – name of the value column. Defaults to ‘value’.

Returns

list of pandas.DataFrame with imported data.

Return type

list

pyjstat.generate_df(js_dict, naming, value='value')

Decode JSON-stat dict into pandas.DataFrame object.

Helper method that should be called inside from_json_stat().

Parameters
  • js_dict (OrderedDict) – OrderedDict with data in JSON-stat format, previously deserialized into a python object by json.load() or json.loads(), for example.

  • naming (string) – dimension naming. Possible values: ‘label’ or ‘id’.

  • value (string, optional) – name of the value column. Defaults to ‘value’.

Returns

pandas.DataFrame with converted data.

Return type

pandas.DataFrame

pyjstat.get_df_row(dimensions, naming='label', i=0, record=None)

Generate row dimension values for a pandas dataframe.

Parameters
  • dimensions (list) – list of pandas dataframes with dimension labels generated by get_dim_label or get_dim_index methods.

  • naming (string, optional) – dimension naming. Possible values: ‘label’ or ‘id’.

  • i (int) – dimension list iteration index. Defaults to 0; used in the recursive calls to the method.

  • record (list) – list of values representing a pandas dataframe row, except for the value column. Defaults to None (an empty record); used in the recursive calls to the method.

Yields

list – list with pandas dataframe column values except for value column
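
The recursion over `i` and `record` can be illustrated with a small generator. This is a simplified sketch rather than pyjstat's implementation: plain lists of category names stand in for the pandas data frames, and `iter_rows` is a hypothetical name.

```python
def iter_rows(dimensions, i=0, record=None):
    """Yield every combination of categories, last dimension varying fastest."""
    record = [] if record is None else record
    if i == len(dimensions):
        # All dimensions consumed: the record is a complete row.
        yield list(record)
        return
    for category in dimensions[i]:
        # Extend the partial row and recurse into the next dimension.
        yield from iter_rows(dimensions, i + 1, record + [category])


dims = [["M", "F"], ["2010", "2011"]]
rows = list(iter_rows(dims))
print(rows)
# [['M', '2010'], ['M', '2011'], ['F', '2010'], ['F', '2011']]
```

Since the rows come out in the same row-major order as a JSON-stat value array, pairing them with the values position by position gives the full table.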

pyjstat.get_dim_index(js_dict, dim)

Get index from a given dimension.

Parameters
  • js_dict (dict) – dictionary containing dataset data and metadata.

  • dim (string) – dimension name obtained from JSON file.

Returns

DataFrame with index-based dimension data.

Return type

pandas.DataFrame

pyjstat.get_dim_label(js_dict, dim, dim_input='dataset')

Get label from a given dimension.

Parameters
  • js_dict (dict) – dictionary containing dataset data and metadata.

  • dim (string) – dimension name obtained from JSON file.

Returns

DataFrame with label-based dimension data.

Return type

pandas.DataFrame
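
The difference between index-based and label-based dimension data can be sketched on a plain dict. The helper `category_names` and the sample dimension are hypothetical, and the result is a list rather than a DataFrame. The sketch assumes the JSON-stat layout in which `category.index` is a list or an ID-to-position object and `category.label` maps IDs to labels.

```python
def category_names(dimension, naming="label"):
    """Return the categories of a dimension as IDs or labels, in index order."""
    category = dimension["category"]
    index = category["index"]
    # Normalise the object form {id: position} to an ordered list of IDs.
    ids = sorted(index, key=index.get) if isinstance(index, dict) else list(index)
    if naming == "id":
        return ids
    labels = category.get("label", {})
    # Fall back to the ID when a category has no label.
    return [labels.get(cat_id, cat_id) for cat_id in ids]


area = {
    "category": {
        "index": {"US": 1, "CA": 0},
        "label": {"CA": "Canada", "US": "United States"},
    }
}

print(category_names(area))        # ['Canada', 'United States']
print(category_names(area, "id"))  # ['CA', 'US']
```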

pyjstat.get_dimensions(js_dict, naming)

Get dimensions from input data.

Parameters
  • js_dict (dict) – dictionary containing dataset data and metadata.

  • naming (string, optional) – dimension naming. Possible values: ‘label’ or ‘id’.

Returns

  • dimensions (list) – list of pandas data frames with dimension category data.

  • dim_names (list) – list of strings with dimension names.

pyjstat.get_values(js_dict, value='value')

Get values from input data.

Parameters
  • js_dict (dict) – dictionary containing dataset data and metadata.

  • value (string, optional) – name of the value column. Defaults to ‘value’.

Returns

list of dataset values.

Return type

list

pyjstat.request(path, verify=True)

Send a request to a given URL accepting JSON format.

Parameters

path (str) – The URI to be requested.

Returns

Deserialized JSON Python object.

Return type

response

Raises
  • HTTPError – the HTTP error returned by the requested server.

  • InvalidURL – an invalid URL has been requested.

  • Exception – generic exception.

pyjstat.to_int(variable)

Convert variable to integer or string depending on the case.

Parameters

variable (string) – a string containing a real string or an integer.

Returns

an integer or a string, depending on the content of variable.

Return type

int or string
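
One reading of this behaviour is a conversion attempt with a fallback, sketched below. The name `int_or_str` is invented, and the actual implementation may differ in detail.

```python
def int_or_str(variable):
    """Return int(variable) when it parses as an integer, else the string."""
    try:
        return int(variable)
    except ValueError:
        # Not an integer literal: keep the original string.
        return variable


print(int_or_str("2010"))    # 2010
print(int_or_str("Canada"))  # Canada
```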

pyjstat.to_json_stat(input_df, value='value', output='list', version='1.3')

Encode pandas.DataFrame object into JSON-stat format.

The DataFrames must have exactly one value column.

Parameters
  • input_df (pandas.DataFrame) – pandas data frame (or list of data frames) to encode.

  • value (string, optional) – name of the value column. Defaults to ‘value’.

  • output (string) – accepts two values: ‘list’ or ‘dict’. Produces a list of dicts or a dict of dicts as output.

  • version (string) – desired JSON-stat version: ‘2.0’ (preferred) or the older ‘1.3’. Defaults to ‘1.3’ to preserve backwards compatibility.

Returns

String with JSON-stat object.

Return type

string
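
To show what such an encoding involves, the sketch below turns a table with exactly one value column into a JSON-stat 2.0 dataset using only the standard library. It is an illustration under stated assumptions, not pyjstat's output layout: a list of row dicts stands in for the data frame, `rows_to_json_stat` is a hypothetical name, categories are ordered by first appearance, and missing combinations are written as null.

```python
import json
from itertools import product


def rows_to_json_stat(rows, value="value"):
    """Encode a list of row dicts as a JSON-stat 2.0 dataset string."""
    dims = [column for column in rows[0] if column != value]
    # Categories of each dimension, in order of first appearance.
    categories = {
        dim: list(dict.fromkeys(row[dim] for row in rows)) for dim in dims
    }
    observed = {tuple(row[dim] for dim in dims): row[value] for row in rows}
    # Row-major order: the last dimension varies fastest; gaps become null.
    values = [
        observed.get(combo)
        for combo in product(*(categories[dim] for dim in dims))
    ]
    dataset = {
        "version": "2.0",
        "class": "dataset",
        "id": dims,
        "size": [len(categories[dim]) for dim in dims],
        "dimension": {
            dim: {"category": {"index": [str(c) for c in categories[dim]]}}
            for dim in dims
        },
        "value": values,
    }
    return json.dumps(dataset)


rows = [
    {"sex": "M", "year": 2010, "value": 10.0},
    {"sex": "M", "year": 2011, "value": 11.0},
    {"sex": "F", "year": 2011, "value": 21.0},
]
encoded = json.loads(rows_to_json_stat(rows))
print(encoded["size"])   # [2, 2]
print(encoded["value"])  # [10.0, 11.0, None, 21.0]
```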

pyjstat.to_str(variable)

Convert variable to integer or string depending on the case.

Parameters

variable (string) – a string containing a real string or an integer.

Returns

an integer or a string, depending on the content of variable.

Return type

int or string

pyjstat.uniquify(seq)

Return unique values in a list in the original order.

See: http://www.peterbe.com/plog/uniqifiers-benchmark

Parameters

seq (list) – original list.

Returns

list without duplicates preserving original order.

Return type

list
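
A common way to remove duplicates while keeping the original order is to track the items already seen in a set. The sketch below assumes the items are hashable; `unique_in_order` is a hypothetical name, and this is not necessarily the variant pyjstat uses.

```python
def unique_in_order(seq):
    """Return the items of `seq` without duplicates, keeping first occurrences."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```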

pyjstat.unnest_collection(collection, df_list)

Unnest a collection, extracting its datasets and converting them to data frames.

Parameters
  • collection (OrderedDict) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads(), for example.

  • df_list (list) – list variable which will contain the converted datasets.

Returns

Nothing.
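
The pattern of filling a caller-supplied list while walking nested collections can be sketched as follows. This simplified example assumes the items are embedded under `link.item` rather than referenced by URL, and it collects the raw dataset dicts instead of converting them to data frames; `collect_datasets` and the sample data are hypothetical.

```python
def collect_datasets(collection, found):
    """Append every dataset nested in `collection` to the list `found`."""
    for item in collection.get("link", {}).get("item", []):
        if item.get("class") == "dataset":
            found.append(item)
        elif item.get("class") == "collection":
            # Nested collection: descend and keep filling the same list.
            collect_datasets(item, found)


nested = {
    "class": "collection",
    "link": {
        "item": [
            {"class": "dataset", "label": "A"},
            {
                "class": "collection",
                "link": {"item": [{"class": "dataset", "label": "B"}]},
            },
        ]
    },
}

found = []
collect_datasets(nested, found)
print([dataset["label"] for dataset in found])  # ['A', 'B']
```

As with the function documented above, nothing is returned: the results accumulate in the list passed as the second argument.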