pyjstat
pyjstat is a Python module for manipulating JSON-stat formatted data.
This module allows reading and writing the JSON-stat format [1] with Python, using the data frame structures provided by the widely used pandas library [2]. JSON-stat is a simple, lightweight JSON format for data dissemination. pyjstat is inspired by rjstat [3], an R library for reading and writing JSON-stat, by ajschumacher.
pyjstat is written and maintained by Miguel Expósito Martín and is distributed under the Apache 2.0 License (see LICENSE file).
[1] http://json-stat.org/ for JSON-stat information
[2] http://pandas.pydata.org for Python Data Analysis Library information
[3] https://github.com/ajschumacher/rjstat for rjstat library information
Example
Importing a JSON-stat file into a pandas data frame can be done as follows:
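import json
from collections import OrderedDict
from urllib.request import urlopen

from pyjstat import pyjstat

# Fetch a sample file and deserialize it, preserving key order.
with urlopen('http://json-stat.org/samples/oecd-canada.json') as response:
    data = json.load(response, object_pairs_hook=OrderedDict)

results = pyjstat.from_json_stat(data)
print(results)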
-
class
pyjstat.
Collection
(*args, **kwargs) A class representing a JSONstat collection.
-
get
(element) Get element from collection.
Get the ith element from a collection as an object of the corresponding class.
- Parameters
element (int) – index of the element to get.
- Returns
An object of the class corresponding to the element.
-
classmethod
read
(data) Read data from a URL, a JSON string or an OrderedDict.
- Parameters
data – can be a URL pointing to a JSONstat file, a JSON string or an OrderedDict.
- Returns
An object of class Collection populated with data.
-
write
(output='jsonstat') Write to JSON-stat or list of df.
Writes data from a Collection object to JSONstat or a list of Pandas Dataframes.
- Parameters
output (string) – can accept ‘jsonstat’ or ‘dataframe_list’
- Returns
Serialized JSONstat or a list of Pandas Dataframes, depending on the ‘output’ parameter.
-
-
class
pyjstat.
Dataset
(*args, **kwargs) A class representing a JSONstat dataset.
-
get_dimension_index
(name, value) Get a dimension index from its name.
Convert a dimension ID string and a category ID string into the numeric index of that category in that dimension.
- Parameters
name (string) – ID string of the dimension.
value (string) – ID string of the category.
- Returns
index of the category in the dimension.
- Return type
int
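As a rough illustration of the lookup described above (not pyjstat's own code), the sketch below resolves a category ID to its position. It assumes the JSON-stat convention that a dimension's category index may be either an object mapping IDs to positions or an array of IDs. The helper name category_position and the sample data are hypothetical.

```python
def category_position(dataset, name, value):
    """Return the position of category `value` within dimension `name`."""
    index = dataset["dimension"][name]["category"]["index"]
    if isinstance(index, dict):
        # Object form: {"category id": position}
        return index[value]
    # Array form: the position is the list offset of the category ID.
    return index.index(value)


# Hypothetical two-dimension dataset fragment (values omitted).
sample = {
    "dimension": {
        "area": {"category": {"index": {"CA": 0, "US": 1}}},
        "year": {"category": {"index": ["2010", "2011", "2012"]}},
    }
}

print(category_position(sample, "area", "US"))    # 1
print(category_position(sample, "year", "2012"))  # 2
```

An unknown category raises KeyError or ValueError in this sketch, depending on which form the index takes.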
-
get_dimension_indices
(query) Get dimension indices.
Converts a dimension/category list of dicts into a list of dimension indices.
- Parameters
query (list) – dimension/category list of dicts.
- Returns
list of dimensions’ indices.
- Return type
list
-
get_value
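(query) Get data value.
Convert a dimension/category list of dicts into a data value in three steps: resolve the query to dimension indices (get_dimension_indices), convert those indices to a numeric value index (get_value_index), and look up the data value at that index (get_value_by_index).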
- Parameters
query (list) – list of dicts with the desired query.
- Returns
numeric data value.
- Return type
float
-
get_value_by_index
(index) Convert a numeric value index into its data value.
- Parameters
index (int) – numeric value index.
- Returns
Numeric data value.
- Return type
float
-
get_value_index
(indices) Convert a list of dimension indices into a numeric value index.
- Parameters
indices (list) – list of dimension indices.
- Returns
numeric value index.
- Return type
int
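JSON-stat stores all values of a dataset in one flat array in row-major order, so a list of per-dimension indices maps to a single offset. The following is a minimal sketch of that arithmetic, not the library's implementation. The function name value_index is hypothetical, and sizes is assumed to hold the number of categories in each dimension, in dimension order.

```python
def value_index(indices, sizes):
    """Map per-dimension category positions to a flat, row-major offset."""
    if len(indices) != len(sizes):
        raise ValueError("indices and sizes must have the same length")
    num = 0
    for position, size in zip(indices, sizes):
        if not 0 <= position < size:
            raise IndexError("category position out of range")
        # Horner-style accumulation: the last dimension varies fastest.
        num = num * size + position
    return num


# Three dimensions with 2, 3 and 4 categories (24 values in total).
sizes = [2, 3, 4]
print(value_index([0, 0, 0], sizes))  # 0
print(value_index([0, 1, 2], sizes))  # 6
print(value_index([1, 2, 3], sizes))  # 23
```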
-
classmethod
read
(data, verify=True, **kwargs) Read data from URL, Dataframe, JSON string/file or OrderedDict.
- Parameters
data – can be a Pandas Dataframe, a JSON file, a JSON string, an OrderedDict or a URL pointing to a JSONstat file.
verify – whether to verify the host’s SSL certificate.
kwargs – optional arguments for to_json_stat().
- Returns
An object of class Dataset populated with data.
-
write
(output='jsonstat', naming='label', value='value') Write data from a Dataset object to JSONstat or Pandas Dataframe.
- Parameters
output (string) – can accept ‘jsonstat’ or ‘dataframe’. Defaults to ‘jsonstat’.
naming (string) – optional, ignored if output = ‘jsonstat’. Dimension naming. Possible values: ‘label’ or ‘id’. Defaults to ‘label’.
value (string) – optional, ignored if output = ‘jsonstat’. Name of the value column. Defaults to ‘value’.
- Returns
Serialized JSONstat or a Pandas Dataframe, depending on the ‘output’ parameter.
-
-
class
pyjstat.
Dimension
(*args, **kwargs) A class representing a JSONstat dimension.
-
classmethod
read
(data) Read data from URL, Dataframe, JSON string/file or OrderedDict.
- Parameters
data – can be a Pandas Dataframe, a JSON string, a JSON file, an OrderedDict or a URL pointing to a JSONstat file.
- Returns
An object of class Dimension populated with data.
-
write
(output='jsonstat') Write data from a Dimension object to JSONstat or Pandas Dataframe.
- Parameters
output (string) – can accept ‘jsonstat’ or ‘dataframe’
- Returns
Serialized JSONstat or a Pandas Dataframe, depending on the ‘output’ parameter.
-
-
class
pyjstat.
NumpyEncoder
(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None) Custom JSON encoder class for Numpy data types.
-
default
(obj) Encode by default.
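Custom encoders of this kind usually subclass json.JSONEncoder and override default() to convert types the standard encoder rejects. The sketch below shows that general pattern only. To stay free of third-party dependencies it uses decimal.Decimal as a stand-in for NumPy types, and the class name DecimalEncoder is hypothetical rather than part of pyjstat.

```python
import json
from decimal import Decimal


class DecimalEncoder(json.JSONEncoder):
    """Hypothetical encoder: Decimal stands in for NumPy scalar types."""

    def default(self, obj):
        if isinstance(obj, Decimal):
            # Convert the unsupported type to a native, JSON-friendly one.
            return float(obj)
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


encoded = json.dumps({"value": [Decimal("1.5"), 2]}, cls=DecimalEncoder)
print(encoded)  # {"value": [1.5, 2]}
```

The encoder is passed through the cls argument of json.dumps(), so default() is only consulted for objects the standard encoder cannot serialize.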
-
-
pyjstat.
check_input
(naming) Check and validate input params.
- Parameters
naming (string) – a string containing the naming type (label or id).
- Returns
Nothing
- Raises
ValueError – if the parameter is not in the allowed list.
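A validation of this shape can be sketched in a few lines. This is an illustration under assumptions, not the library's code: the function name check_naming, the allowed tuple and the error message wording are all hypothetical.

```python
def check_naming(naming, allowed=("label", "id")):
    """Raise ValueError unless `naming` is one of the allowed values."""
    if naming not in allowed:
        raise ValueError(
            "naming must be one of {}; got {!r}".format(allowed, naming)
        )


check_naming("label")  # passes silently and returns None

try:
    check_naming("name")
except ValueError as error:
    print(error)
```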
-
pyjstat.
check_version_2
(dataset) Check for json-stat version.
Check whether the json-stat version attribute exists and is equal to or greater than 2.0 for a given dataset.
- Parameters
dataset (OrderedDict) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads().
- Returns
True if the version attribute exists and is equal to or greater than 2.0, False otherwise. For datasets without the version attribute, always returns False.
- Return type
bool
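The check described above can be approximated as follows. This is a sketch rather than pyjstat's implementation: the name is_version_2 is hypothetical, and treating an unparsable version string as "not 2.0 or later" is an assumption of this example.

```python
def is_version_2(dataset):
    """Return True when `dataset` declares a version of 2.0 or later."""
    version = dataset.get("version")
    if not version:
        # No version attribute: treated as an older format.
        return False
    try:
        return float(version) >= 2.0
    except (TypeError, ValueError):
        # Assumption of this sketch: unparsable versions count as False.
        return False


print(is_version_2({"version": "2.0", "class": "dataset"}))  # True
print(is_version_2({"version": "1.3"}))                      # False
print(is_version_2({"class": "dataset"}))                    # False
```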
-
pyjstat.
from_json_stat
(datasets, naming='label', value='value') Decode JSON-stat formatted data into pandas.DataFrame object.
- Parameters
datasets (OrderedDict, list) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads(), for example. Both List and OrderedDict are accepted as inputs.
naming (string, optional) – dimension naming. Possible values: ‘label’ or ‘id’. Defaults to ‘label’.
value (string, optional) – name of the value column. Defaults to ‘value’.
- Returns
list of pandas.DataFrame with imported data.
- Return type
list
-
pyjstat.
generate_df
(js_dict, naming, value='value') Decode JSON-stat dict into pandas.DataFrame object.
Helper method that should be called inside from_json_stat().
- Parameters
js_dict (OrderedDict) – OrderedDict with data in JSON-stat format, previously deserialized into a python object by json.load() or json.loads(), for example.
naming (string) – dimension naming. Possible values: ‘label’ or ‘id’.
value (string, optional) – name of the value column. Defaults to ‘value’.
- Returns
pandas.DataFrame with converted data.
- Return type
pandas.DataFrame
-
pyjstat.
get_df_row
(dimensions, naming='label', i=0, record=None) Generate row dimension values for a pandas dataframe.
- Parameters
dimensions (list) – list of pandas dataframes with dimension labels generated by get_dim_label or get_dim_index methods.
naming (string, optional) – dimension naming. Possible values: ‘label’ or ‘id’.
i (int) – dimension list iteration index. Defaults to 0; it is used in the recursive calls to the method.
record (list) – list of values representing a pandas dataframe row, except for the value column. Defaults to empty; it is used in the recursive calls to the method.
- Yields
list – list with pandas dataframe column values except for value column
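The recursion hinted at by the i and record parameters can be pictured with a small generator. The sketch below is an assumption about the general technique, not the library's code: plain lists of category labels stand in for the dimension data frames, and the name iter_rows is hypothetical.

```python
def iter_rows(dimensions, i=0, record=None):
    """Yield every combination of categories, last dimension fastest."""
    if record is None:
        record = []
    if i == len(dimensions):
        # All dimensions consumed: emit a copy of the completed row.
        yield list(record)
        return
    for category in dimensions[i]:
        # Extend the partial row and recurse into the next dimension.
        yield from iter_rows(dimensions, i + 1, record + [category])


# Plain lists of category labels stand in for the dimension data frames.
dims = [["Canada", "Spain"], ["2010", "2011", "2012"]]
for row in iter_rows(dims):
    print(row)
```

Because the last dimension varies fastest, the rows come out in the same row-major order as the JSON-stat value array, so each row can be paired with the value at the same position.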
-
pyjstat.
get_dim_index
(js_dict, dim) Get index from a given dimension.
- Parameters
js_dict (dict) – dictionary containing dataset data and metadata.
dim (string) – dimension name obtained from JSON file.
- Returns
DataFrame with index-based dimension data.
- Return type
pandas.DataFrame
-
pyjstat.
get_dim_label
(js_dict, dim, dim_input='dataset') Get label from a given dimension.
- Parameters
js_dict (dict) – dictionary containing dataset data and metadata.
dim (string) – dimension name obtained from JSON file.
- Returns
DataFrame with label-based dimension data.
- Return type
pandas.DataFrame
-
pyjstat.
get_dimensions
(js_dict, naming) Get dimensions from input data.
- Parameters
js_dict (dict) – dictionary containing dataset data and metadata.
naming (string, optional) – dimension naming. Possible values: ‘label’ or ‘id’.
- Returns
dimensions (list) – list of pandas data frames with dimension category data.
dim_names (list) – list of strings with dimension names.
-
pyjstat.
get_values
(js_dict, value='value') Get values from input data.
- Parameters
js_dict (dict) – dictionary containing dataset data and metadata.
value (string, optional) – name of the value column. Defaults to ‘value’.
- Returns
list of dataset values.
- Return type
list
-
pyjstat.
request
(path, verify=True) Send a request to a given URL accepting JSON format.
- Parameters
path (str) – The URI to be requested.
verify (bool) – whether to verify the host’s SSL certificate.
- Returns
Deserialized JSON Python object.
- Return type
response
- Raises
HTTPError – the HTTP error returned by the requested server.
InvalidURL – an invalid URL has been requested.
Exception – generic exception.
-
pyjstat.
to_int
(variable) Convert variable to integer or string depending on the case.
- Parameters
variable (string) – a string containing a real string or an integer.
- Returns
an integer or a string, depending on the content of variable.
- Return type
int or string
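A conversion with this behaviour is commonly written as a try/except around int(). The sketch below illustrates the idea under that assumption. The name int_or_str is hypothetical and the code is not taken from the library.

```python
def int_or_str(variable):
    """Return int(variable) when it parses as an integer, else the string."""
    try:
        return int(variable)
    except ValueError:
        # Not an integer literal: hand the original string back unchanged.
        return variable


print(int_or_str("2014"))    # 2014
print(int_or_str("2014Q1"))  # 2014Q1
```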
-
pyjstat.
to_json_stat
(input_df, value='value', output='list', version='1.3') Encode pandas.DataFrame object into JSON-stat format.
The DataFrames must have exactly one value column.
- Parameters
input_df (pandas.DataFrame) – pandas data frame (or list of data frames) to encode.
value (string, optional) – name of the value column. Defaults to ‘value’.
output (string) – accepts two values: ‘list’ or ‘dict’. Produces a list of dicts or a dict of dicts as output.
version (string) – desired json-stat version. ‘2.0’ is preferred; the only other accepted value is the older ‘1.3’ format, which remains the default to preserve backwards compatibility.
- Returns
String with JSON-stat object.
- Return type
string
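To give a feel for what encoding a table with exactly one value column involves, here is a dependency-free sketch that turns long-format rows into a minimal JSON-stat 2.0 dataset. It is not pyjstat's implementation and does not reproduce its exact output. The function name rows_to_json_stat is hypothetical, lists of dicts stand in for a pandas data frame, and only the 2.0 layout is covered.

```python
import json


def rows_to_json_stat(rows, value="value"):
    """Build a minimal JSON-stat 2.0 dataset dict from long-format rows."""
    # Every key except the value column is treated as a dimension.
    dims = [key for key in rows[0] if key != value]
    # Unique categories per dimension, in order of first appearance.
    cats = {d: list(dict.fromkeys(row[d] for row in rows)) for d in dims}
    sizes = [len(cats[d]) for d in dims]

    total = 1
    for size in sizes:
        total *= size
    values = [None] * total  # missing combinations stay null

    for row in rows:
        pos = 0
        for d, size in zip(dims, sizes):
            # Row-major offset: the last dimension varies fastest.
            pos = pos * size + cats[d].index(row[d])
        values[pos] = row[value]

    return {
        "version": "2.0",
        "class": "dataset",
        "id": dims,
        "size": sizes,
        "dimension": {
            d: {"category": {"index": {c: i for i, c in enumerate(cats[d])}}}
            for d in dims
        },
        "value": values,
    }


rows = [
    {"area": "CA", "year": "2010", "value": 1.0},
    {"area": "CA", "year": "2011", "value": 2.0},
    {"area": "US", "year": "2010", "value": 3.0},
    {"area": "US", "year": "2011", "value": 4.0},
]
print(json.dumps(rows_to_json_stat(rows)))
```

Each dimension column contributes one entry to id and size, while the single value column is flattened into the value array.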
-
pyjstat.
to_str
(variable) Convert variable to integer or string depending on the case.
- Parameters
variable (string) – a string containing a real string or an integer.
- Returns
an integer or a string, depending on the content of variable.
- Return type
int or string
-
pyjstat.
uniquify
(seq) Return unique values in a list in the original order.
See: http://www.peterbe.com/plog/uniqifiers-benchmark
- Parameters
seq (list) – original list.
- Returns
list without duplicates preserving original order.
- Return type
list
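Order-preserving de-duplication is typically done by tracking already-seen items in a set. The sketch below shows one common way to do it. It assumes the items are hashable, and the name unique_in_order is hypothetical rather than the library's.

```python
def unique_in_order(seq):
    """Return the items of `seq` without duplicates, in first-seen order."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            # First occurrence: remember it and keep it in the output.
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order(["b", "a", "b", "c", "a"]))  # ['b', 'a', 'c']
```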
-
pyjstat.
unnest_collection
(collection, df_list) Unnest collection extracting its datasets and converting them to df.
- Parameters
collection (OrderedDict) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads().
df_list (list) – list variable which will contain the converted datasets.
- Returns
Nothing.