Parsing a custom database

This tutorial shows how to build a custom database in Excel and how to parse it using MARIO.

Parsing from Excel

Start by opening Excel or any equivalent software. Any custom MARIO-readable table (IOT or SUT) must follow these rules:
- It must be in .xlsx format
- It must have two sheets. The first must contain the table; the second must be named “units” and contain the information on units of measure
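As a rough pre-flight check, the two rules above can be tested before handing the file to MARIO. The sketch below is not part of MARIO: check_workbook_layout is a hypothetical helper that takes the file name and the list of sheet names (which could be obtained, for example, from pd.ExcelFile(path).sheet_names) and returns a list of problems. It reads "two sheets" strictly as "exactly two", which is an assumption, and the first sheet name "table" is only a placeholder.

```python
from pathlib import Path


def check_workbook_layout(path, sheet_names):
    """Return a list of layout problems (empty if none were found).

    Hypothetical helper, not part of MARIO: it only checks the file
    extension and the sheet rules listed above.
    """
    problems = []
    if Path(path).suffix.lower() != ".xlsx":
        problems.append("file must be in .xlsx format")
    if len(sheet_names) != 2:
        # Assumption: "two sheets" is read as "exactly two"
        problems.append("workbook must have exactly two sheets")
    elif sheet_names[1] != "units":
        problems.append('second sheet must be named "units"')
    return problems


print(check_workbook_layout("custom_SUT.xlsx", ["table", "units"]))
print(check_workbook_layout("custom_SUT.xls", ["table", "unit"]))
```

An empty list means that none of the checked rules is violated; it says nothing about the content of the sheets themselves.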

For instance, the following example is for a SUT of 2 regions, 2 commodities and 2 activities.

Table sheet

[Image: example table sheet]

The structure is the same for both IOTs and SUTs, with the difference that SUTs must differentiate between activities and commodities, while IOTs only need sectors. You will notice:
- There must be 3 levels of indices on both rows and columns
- The first level is always the name of the region, except for table sets that are not defined on regions, such as “Factor of production” and “Satellite account”. For these two sets, just provide “-”.
- The second level is always the name of the set (i.e. “Activity”, “Commodity”, “Consumption category”, “Factor of production”, “Satellite account”). In the case of an IOT, provide “Sector” instead of “Activity” and “Commodity”
- The third level is a label, referring to the name of the item
- There must not be blank cells within the matrices

There are no particular rules for the order of the labels and sets: MARIO always sorts all the indices in alphabetical order before doing any calculation.
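To make the index rules concrete, the sketch below builds a possible three-level row index for a SUT with 2 regions, 2 activities and 2 commodities using pandas. The region and item names are illustrative; only the set names and the “-” placeholder come from the rules above.

```python
import pandas as pd

regions = ["R1", "R2"]  # illustrative region names
activities = ["Production of Goods", "Production of Services"]
commodities = ["Goods", "Services"]

# Region-specific sets: (region, set name, item label)
regional = pd.MultiIndex.from_product([regions, ["Activity"], activities]).append(
    pd.MultiIndex.from_product([regions, ["Commodity"], commodities])
)

# Sets not defined on regions use "-" as the first level
non_regional = pd.MultiIndex.from_tuples(
    [("-", "Factor of production", "Labor"), ("-", "Satellite account", "CO2")]
)

rows = regional.append(non_regional).set_names(["Region", "Level", "Item"])
print(rows.nlevels)  # number of index levels
print(rows.to_frame(index=False))
```

The order in which the entries are listed is arbitrary here, in line with the note that MARIO sorts the indices itself.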

Units sheet

Regarding units of measure, this sheet must be named “units”, and the header of the units column (column C of the sheet) must be labelled “unit”, as in the following example:

[Image: example units sheet]

Again, the rules concern the indices, which must be provided for all the labels. Do not repeat the same label for multiple regions: regions are not required in this sheet. MARIO can handle hybrid-unit databases.
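The units sheet can be pictured as a small table. In the sketch below, the first two columns are assumed to hold the set name and the item label (the text above only states that column C holds the units under the header “unit”), and the labels and units are illustrative. The sketch also checks that no label is repeated within a set.

```python
import pandas as pd

# Assumed layout: column A = set name, column B = item label, column C = unit
units_sheet = pd.DataFrame(
    [
        ("Activity", "Production of Goods", "EUR"),
        ("Activity", "Production of Services", "EUR"),
        ("Commodity", "Goods", "Ton"),     # hybrid units: physical ...
        ("Commodity", "Services", "EUR"),  # ... and monetary
        ("Factor of production", "Labor", "EUR"),
        ("Satellite account", "CO2", "Ton"),
    ],
    columns=["set", "label", "unit"],  # "set" and "label" are placeholder names
).set_index(["set", "label"])

# Each label appears once per set: regions are not part of this sheet
has_duplicates = units_sheet.index.duplicated().any()
print(units_sheet)
print("duplicated labels:", has_duplicates)
```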

Parsing a customized database

Once the customized database is prepared in Excel, provide the path, the type of table (SUT or IOT) and the mode (flows or coefficients), and MARIO will parse it using the “parse_from_excel” function:

import mario  # Import MARIO

path = 'custom_SUT.xlsx'  # Path to the custom Excel file to be parsed

database = mario.parse_from_excel(
    path = path,
    table = 'SUT',
    mode = 'flows',
)
database.X
Item                                       production
Region Level     Item
R1     Activity  Production of Goods              1.0
                 Production of Services           0.9
R2     Activity  Production of Goods              1.0
                 Production of Services           1.2
R3     Activity  Production of Goods              1.0
                 Production of Services           0.9
R1     Commodity Goods                           45.0
                 Services                        31.4
R2     Commodity Goods                           66.0
                 Services                        44.0
R3     Commodity Goods                           61.0
                 Services                        44.0

The same structure can be replicated for an IOT database. If you want to see what the table should look like, you can load the test models and save them to Excel to take a closer look at the structure:

mario.load_test("IOT").to_excel("test_iot.xlsx")

Parsing from pd.DataFrames

You can also build a mario.Database using pd.DataFrames:

from mario import Database
import pandas as pd
import numpy as np
# Creating indices according to the MARIO format
regions  = ['reg.1']
Z_levels = ['Sector']
sectors  = ['sec.1','sec.2']

factors   = ['Labor']
satellite = ['CO2']


Y_level = ['Consumption category']
demands = ['Households']

Z_index   = pd.MultiIndex.from_product([regions,Z_levels,sectors])
Y_columns = pd.MultiIndex.from_product([regions,Y_level,demands])
# creating matrices
Z = pd.DataFrame(
    data =  np.array([
            [10,70],
            [50,10]]),
    index = Z_index,
    columns= Z_index
)
Y = pd.DataFrame(
    data =  np.array([
            [200],
            [80]]),
    index = Z_index,
    columns= Y_columns,
)
E = pd.DataFrame(
    data =  np.array([
            [30,20]]),
    index = satellite,
    columns= Z_index,
)
V = pd.DataFrame(
    data =  np.array([
            [220,60]]),
    index = factors,
    columns= Z_index,
)
EY = pd.DataFrame(
    data =  np.array([
            [8]]),
    index = satellite,
    columns= Y_columns,
)
Z
                    reg.1
                   Sector
                    sec.1 sec.2
reg.1 Sector sec.1     10    70
             sec.2     50    10
Y
                                  reg.1
                   Consumption category
                             Households
reg.1 Sector sec.1                  200
             sec.2                   80
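
Before creating the database, it can be useful to confirm that the example is balanced, i.e. that each sector's total output (row sums of Z and Y) equals its total input (column sums of Z and V). This is the standard input-output accounting identity rather than a MARIO-specific check; the sketch repeats the numbers above as plain NumPy arrays.

```python
import numpy as np

Z = np.array([[10, 70], [50, 10]])  # intersectoral flows
Y = np.array([[200], [80]])         # final demand
V = np.array([[220, 60]])           # factor of production (value added)

total_output = Z.sum(axis=1) + Y.sum(axis=1)  # row totals per sector
total_input = Z.sum(axis=0) + V.sum(axis=0)   # column totals per sector

print(total_output)
print(total_input)
print(np.array_equal(total_output, total_input))
```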

You also need to define the units in a separate Python dict, as follows:

# units as a dict of pd.DataFrames
units= {
    'Sector':pd.DataFrame('EUR',index=sectors,columns=['unit']),
    'Satellite account':pd.DataFrame('Ton',index=satellite,columns=['unit']),
    'Factor of production': pd.DataFrame('EUR',index=factors,columns=['unit'])
    }
units
{'Sector':       unit
 sec.1  EUR
 sec.2  EUR,
 'Satellite account':     unit
 CO2  Ton,
 'Factor of production':       unit
 Labor  EUR}

Now you can create a mario.Database object:

# Creating a mario database
data = Database(
    Z=Z,
    Y=Y,
    E=E,
    V=V,
    EY=EY,
    table='IOT',
    units=units,
    name='iot test'
)
data.z
Region                  reg.1
Level                  Sector
Item                    sec.1     sec.2
Region Level  Item
reg.1  Sector sec.1  0.035714  0.500000
              sec.2  0.178571  0.071429
data.p
Database: to calculate p following matrices are need.
['w'].Trying to calculate dependencies.
                     price index
Region Level  Item
reg.1  Sector sec.1          1.0
              sec.2          1.0
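
The coefficients and price index printed above can be reproduced by hand. The sketch below assumes the standard Leontief relations (z = Z diag(X)^-1, v = V diag(X)^-1, w = (I - z)^-1 and p = v w). These are consistent with the output shown, but they are written here with plain NumPy and are not taken from MARIO's source code.

```python
import numpy as np

Z = np.array([[10.0, 70.0], [50.0, 10.0]])
Y = np.array([[200.0], [80.0]])
V = np.array([[220.0, 60.0]])

X = Z.sum(axis=1) + Y.sum(axis=1)  # total production per sector
z = Z / X                          # divides each column j by X[j]
v = V / X                          # factor-use coefficients
w = np.linalg.inv(np.eye(2) - z)   # Leontief inverse
p = v @ w                          # price index

print(np.round(z, 6))
print(np.round(p, 6))
```

Because each column of z plus the corresponding entry of v sums to one in a balanced table, the price index comes out as one for every sector, matching data.p.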

Link to the jupyter notebook file.