Configuration Setup JSON

When the sensor or reference setup method is run, a record of the setup configuration is saved locally to a setup.json file. This file tells the ingestion module how to convert the recorded data into the sensortoolkit Data Formatting Scheme (SDFS).

The file is passed to the subroutine sensortoolkit.sensor_ingest.standard_ingest(), which imports the recorded dataset and converts headers and date/time-like columns to SDFS formatting.
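
As a rough illustration of what the ingestion step has to work with, the sketch below loads a setup.json file with Python's standard json module and checks that a few of the top-level keys shown in the examples on this page are present. The function name load_setup and the list of expected keys are assumptions made for illustration; this is not the logic of standard_ingest() itself.

```python
import json
import tempfile
from pathlib import Path

# Keys that appear in both example setup.json files on this page.
# Treating them as required is an assumption made for this sketch.
EXPECTED_KEYS = ("data_type", "file_extension", "header_iloc", "col_headers")


def load_setup(path):
    """Read a setup.json file and return its contents as a dict.

    Raises KeyError if any of EXPECTED_KEYS is missing.
    """
    with open(path, encoding="utf-8") as f:
        setup = json.load(f)
    missing = [key for key in EXPECTED_KEYS if key not in setup]
    if missing:
        raise KeyError(f"setup file is missing keys: {missing}")
    return setup


# Demonstration with a tiny stand-in file (not a real sensor configuration).
demo = {"data_type": "sensor", "file_extension": ".csv",
        "header_iloc": 5, "col_headers": {}}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_setup.json"
    demo_path.write_text(json.dumps(demo), encoding="utf-8")
    loaded = load_setup(demo_path)

print(loaded["data_type"], loaded["file_extension"])
```

Checking for the expected keys up front makes a malformed or hand-edited file fail early, before any data files are read.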

Sensor setup.json

Setup.json files for air sensors are generated by running the sensortoolkit.AirSensor.sensor_setup() method. They contain information about the recorded sensor datasets that is used by the standard ingestion module.

Because sensors often record data with different formatting and header naming schemes, these files assist in converting data from its original format into the SDFS conventions for parameter names and date/time formatting.

The sensor setup.json file is named [sensor_name]_setup.json, where [sensor_name] is the name assigned to the sensor via sensor.name. This file is located within the user’s project directory at the following relative path: \Data and Figures\sensor_data\[sensor_name]\[sensor_name]_setup.json

{
    "path": "C:/Users/.../Documents/toucan_evaluation",
    "data_rel_path": "/data/sensor_data/Toco_Toucan/raw_data",
    "data_type": "sensor",
    "file_extension": ".csv",
    "header_iloc": 5,
    "data_row_idx": null,
    "sdfs_header_names": [
        "NO2",
        "O3",
        "PM25",
        "Temp",
        "RH",
        "DP"
    ],
    "col_headers": {
        "col_idx_0": {
            "Time": {
                "sdfs_param": "DateTime",
                "in_file_list_idx": [
                    0,
                    1,
                    2
                ],
                "header_class": "datetime",
                "drop": false,
                "dt_format": "%Y/%m/%d %H:%M:%S",
                "dt_timezone": "EST"
            }
        },
        "col_idx_1": {
            "NO2 (ppb)": {
                "sdfs_param": "NO2",
                "in_file_list_idx": [
                    0,
                    1,
                    2
                ],
                "unit_transform": null,
                "header_class": "parameter",
                "drop": false
            }
        },
        "col_idx_2": {
            "O3 (ppb)": {
                "sdfs_param": "O3",
                "in_file_list_idx": [
                    0,
                    1,
                    2
                ],
                "unit_transform": null,
                "header_class": "parameter",
                "drop": false
            }
        },
        "col_idx_3": {
            "PM2.5 (\u00b5g/m\u00b3)": {
                "sdfs_param": "PM25",
                "in_file_list_idx": [
                    0,
                    1,
                    2
                ],
                "unit_transform": null,
                "header_class": "parameter",
                "drop": false
            }
        },
        "col_idx_4": {
            "TEMP (\u00b0C)": {
                "sdfs_param": "Temp",
                "in_file_list_idx": [
                    0,
                    1,
                    2
                ],
                "unit_transform": null,
                "header_class": "parameter",
                "drop": false
            }
        },
        "col_idx_5": {
            "RH (%)": {
                "sdfs_param": "RH",
                "in_file_list_idx": [
                    0,
                    1,
                    2
                ],
                "unit_transform": null,
                "header_class": "parameter",
                "drop": false
            }
        },
        "col_idx_6": {
            "DP (\u00b0C)": {
                "sdfs_param": "DP",
                "in_file_list_idx": [
                    0,
                    1,
                    2
                ],
                "unit_transform": null,
                "header_class": "parameter",
                "drop": false
            }
        },
        "col_idx_7": {
            "Inlet": {
                "sdfs_param": "",
                "in_file_list_idx": [
                    0,
                    1,
                    2
                ],
                "header_class": "parameter",
                "drop": true
            }
        }
    },
    "name": "Toco_Toucan",
    "dataset_kwargs": {
        "name": "Toco_Toucan"
    },
    "_dataset_selection": "files",
    "file_list": [
        "C:/Users/.../Documents/toucan_evaluation\\data\\sensor_data\\Toco_Toucan\\raw_data\\toco_toucan_RT01_raw.csv",
        "C:/Users/.../Documents/toucan_evaluation\\data\\sensor_data\\Toco_Toucan\\raw_data\\toco_toucan_RT02_raw.csv",
        "C:/Users/.../Documents/toucan_evaluation\\data\\sensor_data\\Toco_Toucan\\raw_data\\toco_toucan_RT03_raw.csv"
    ],
    "encoding_predictions": {},
    "serials": {
        "1": "RT01",
        "2": "RT02",
        "3": "RT03"
    },
    "number_of_sensors": 3
}
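
To illustrate how the col_headers entries above might be used, here is a minimal sketch that renames recorded headers to their SDFS parameter names, discards columns flagged with "drop": true, and parses the timestamp with the stored dt_format. It uses only the standard library and a trimmed copy of the configuration. The helper convert_row is hypothetical and not part of sensortoolkit, the values in raw_row are made up for illustration, and the timestamp is left timezone-naive rather than interpreting dt_timezone.

```python
from datetime import datetime

# Trimmed copy of the "col_headers" section from the example above.
col_headers = {
    "col_idx_0": {"Time": {"sdfs_param": "DateTime", "header_class": "datetime",
                           "drop": False, "dt_format": "%Y/%m/%d %H:%M:%S",
                           "dt_timezone": "EST"}},
    "col_idx_1": {"NO2 (ppb)": {"sdfs_param": "NO2", "header_class": "parameter",
                                "drop": False}},
    "col_idx_7": {"Inlet": {"sdfs_param": "", "header_class": "parameter",
                            "drop": True}},
}


def convert_row(row, col_headers):
    """Map one recorded row (header -> string value) to SDFS column names."""
    converted = {}
    for column in col_headers.values():
        for header, attrs in column.items():
            if attrs["drop"] or header not in row:
                continue  # skip dropped columns and headers absent from the row
            value = row[header]
            if attrs["header_class"] == "datetime":
                # Parse with the format recorded during setup (naive datetime).
                value = datetime.strptime(value, attrs["dt_format"])
            converted[attrs["sdfs_param"]] = value
    return converted


# Illustrative values only; not taken from a real dataset.
raw_row = {"Time": "2021/03/01 12:00:00", "NO2 (ppb)": "14.2", "Inlet": "A"}
sdfs_row = convert_row(raw_row, col_headers)
print(sdfs_row)
```

The "Inlet" column has an empty sdfs_param and "drop": true, so it does not appear in the converted row.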

Reference setup.json

The reference setup.json file is named reference_setup.json and is located within the user’s project directory at the following relative path: \Data and Figures\reference_data\[data_type]\[site_name]_[site_id]\reference_setup.json, where [data_type] is the name of the reference data source (e.g., ‘airnowtech’ or ‘local’), [site_name] is the name of the monitoring site with spaces replaced by ‘_’, and [site_id] is the AQS site ID (if applicable).
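
The [site_name]_[site_id] folder name can be sketched as follows. Replacing spaces with underscores follows the description above. Removing hyphens from the AQS ID is inferred from the example file on this page, where "37-063-0099" is stored as "370630099", and should be treated as an assumption. The helper name reference_subfolder is hypothetical.

```python
def reference_subfolder(site_name, site_aqs):
    """Build the [site_name]_[site_id] folder name for a reference site."""
    fmt_site_name = site_name.strip().replace(" ", "_")
    # Assumption: hyphens in the AQS ID are dropped, as in the example file.
    fmt_site_aqs = site_aqs.replace("-", "")
    return f"{fmt_site_name}_{fmt_site_aqs}"


print(reference_subfolder("Burdens Creek", "37-063-0099"))
# prints: Burdens_Creek_370630099
```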

Below is an example reference_setup.json for a reference monitor dataset from EPA’s RTP campus ambient monitoring site for air sensor testing. The sensor and reference setup.json files share many attributes. The entries specific to the reference monitor or monitoring site (in this example, entries such as agency, site_name, site_aqs, and PM25_Method) are important for creating a processed (SDFS-formatted) version of the reference dataset.

{
    "path": "C:\\Users\\...\\Documents\\sensortoolkit_testing",
    "data_rel_path": "/data/reference_data/local/raw/Burdens_Creek_370630099/",
    "data_type": "reference",
    "file_extension": ".csv",
    "header_iloc": 2,
    "data_row_idx": null,
    "sdfs_header_names": [
        "PM25",
        "PM10"
    ],
    "col_headers": {
        "col_idx_0": {
            "Date & Time": {
                "sdfs_param": "DateTime",
                "in_file_list_idx": [0, 1],
                "header_class": "datetime",
                "drop": false,
                "dt_format": "%-m/%-d/%Y %-I:%M %p",
                "dt_timezone": "EST"
            }
        },
        "col_idx_1": {
            "Grimm PM2.5": {
                "sdfs_param": "",
                "in_file_list_idx": [0, 1],
                "header_class": "parameter",
                "drop": true
            }
        },
        "col_idx_2": {
            "Grimm PM10": {
                "sdfs_param": "",
                "in_file_list_idx": [0, 1],
                "header_class": "parameter",
                "drop": true
            }
        },
        "col_idx_3": {
            "T640_2_PM25": {
                "sdfs_param": "PM25",
                "in_file_list_idx": [0, 1],
                "unit_transform": null,
                "header_class": "parameter",
                "drop": false
            }
        },
        "col_idx_4": {
            "T640_2_PM10": {
                "sdfs_param": "PM10",
                "in_file_list_idx": [0, 1],
                "unit_transform": null,
                "header_class": "parameter",
                "drop": false
            }
        }
    },
    "dataset_kwargs": {
        "ref_data_source": "local",
        "site_name": "Burdens_Creek",
        "site_aqs": "370630099"
    },
    "agency": "OAQPS",
    "site_name": "Burdens Creek",
    "site_aqs": "37-063-0099",
    "site_lat": "35.88",
    "site_lon": "-78.87",
    "fmt_site_name": "Burdens_Creek",
    "fmt_site_aqs": "370630099",
    "ref_data_subfolder": "Burdens_Creek_370630099",
    "_dataset_selection": "files",
    "file_list": [
        "C:\\Users\\...\\Documents\\sensortoolkit_testing\\data\\reference_data\\local\\raw\\Burdens_Creek_370630099\\min_201908_PM.csv",
        "C:\\Users\\...\\Documents\\sensortoolkit_testing\\data\\reference_data\\local\\raw\\Burdens_Creek_370630099\\min_201909_PM.csv"
    ],
    "PM25_Unit": "Micrograms/cubic meter (LC)",
    "PM25_Param_Code": "Micrograms/cubic meter (LC)",
    "PM25_Method_Code": 238,
    "PM25_Method": "Teledyne T640X at 16.67 LPM",
    "PM25_Method_POC": "1",
    "PM10_Unit": "Micrograms/cubic meter (LC)",
    "PM10_Param_Code": "Micrograms/cubic meter (LC)",
    "PM10_Method_Code": 239,
    "PM10_Method": "Teledyne API T640X at 16.67 LPM",
    "PM10_Method_POC": "1"
}
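
The reference-specific entries at the end of the file follow a [parameter]_[attribute] naming pattern (for example, PM25_Unit and PM25_Method). Below is a minimal sketch of grouping them by parameter, assuming that pattern holds for every name in sdfs_header_names. The helper group_param_metadata is hypothetical and not part of sensortoolkit.

```python
def group_param_metadata(setup):
    """Collect '<param>_<attribute>' entries into {param: {attribute: value}}."""
    grouped = {}
    for param in setup.get("sdfs_header_names", []):
        prefix = param + "_"
        grouped[param] = {
            key[len(prefix):]: value
            for key, value in setup.items()
            if key.startswith(prefix)
        }
    return grouped


# Trimmed copy of the example reference configuration above.
setup = {
    "sdfs_header_names": ["PM25", "PM10"],
    "PM25_Unit": "Micrograms/cubic meter (LC)",
    "PM25_Method_Code": 238,
    "PM25_Method_POC": "1",
    "PM10_Unit": "Micrograms/cubic meter (LC)",
    "PM10_Method_Code": 239,
    "PM10_Method_POC": "1",
}
metadata = group_param_metadata(setup)
print(metadata["PM25"]["Method_Code"])  # prints: 238
```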