sensortoolkit.evaluation_objs._sensor_eval.SensorEvaluation
- class SensorEvaluation(sensor, param, reference, write_to_file=False, **kwargs)[source]
Bases:
object
Evaluate air sensor performance for use in NSIM applications.
A class for conducting analysis for air sensors deployed at ambient, outdoor, fixed monitoring sites using U.S. EPA’s performance metrics and targets for sensors measuring PM2.5 or O3. U.S. EPA’s testing protocols and performance metrics are intended for use with devices deployed for non-regulatory supplemental and informational monitoring (NSIM) applications.
- Parameters
sensor (sensortoolkit.AirSensor object) – The air sensor object containing datasets with parameter measurements that will be evaluated.
param (sensortoolkit.Parameter object) – The parameter (measured environmental quantity) object containing parameter-specific attributes as well as metrics and targets for evaluating sensor performance.
reference (sensortoolkit.ReferenceMethod object) – The FRM/FEM reference instrument object containing datasets with parameter measurements against which air sensor data will be evaluated.
write_to_file (bool) – If true, evaluation statistics will be written to the /data/eval_stats sensor subdirectory. Figures will also be written to the appropriate figures subdirectory.
**kwargs – Keyword arguments (currently unused).
- path
The project path in which data, figures, and reports relevant to the sensor evaluation are stored.
- Type
str
- serials
A dictionary of sensor serial identifiers for each unit in the base testing deployment.
- Type
dict
- figure_path
The full directory path to figures for a given sensor make and model.
- Type
str
- stats_path
The full directory path to evaluation statistics for a given sensor make and model.
- full_df_list
List of sensor data frames of length N (where N is the number of sensor units in a testing group). DataFrames indexed by DateTime at recorded sampling frequency.
- Type
list of pandas DataFrames
- hourly_df_list
List of sensor data frames of length N (where N is the number of sensor units in a testing group). DataFrames indexed by DateTime at 1-hour averaged sampling frequency.
- Type
list of pandas DataFrames
- daily_df_list
List of sensor data frames of length N (where N is the number of sensor units in a testing group). DataFrames indexed by DateTime at 24-hour averaged sampling frequency.
- Type
list of pandas DataFrames
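As a rough illustration of how the 1-hour averaged data frames relate to the data at recorded sampling frequency, the sketch below groups timestamped values by hour and averages each group. It uses plain Python rather than pandas, the function name `hourly_means` is hypothetical, and any data completeness rules the library may apply are not modeled here.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(records):
    """Average (timestamp, value) records into 1-hour bins.

    Hypothetical helper for illustration only; not part of sensortoolkit.
    """
    buckets = defaultdict(list)
    for ts, value in records:
        # Truncate each timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


records = [
    (datetime(2021, 1, 1, 0, 0), 10.0),
    (datetime(2021, 1, 1, 0, 30), 20.0),
    (datetime(2021, 1, 1, 1, 15), 30.0),
]
result = hourly_means(records)
```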
- deploy_period_df
A data frame containing the start time (‘Begin’), end time (‘End’), and total duration of evaluation period for each sensor in a deployment group.
- Type
pandas DataFrame
- deploy_dict
A dictionary containing descriptive statistics and textual information about the deployment (testing agency, site, time period, etc.), sensors tested, and site conditions during the evaluation.
- Type
dict
- deploy_bdate
Overall start date of deployment. Determined by selecting the earliest recorded timestamp in sensor data frames.
- Type
pandas timestamp object
- deploy_edate
Overall end date of deployment. Determined by selecting the latest recorded timestamp in sensor data frames.
- Type
pandas timestamp object
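The selection of the overall deployment start and end dates can be sketched as follows. Lists of `datetime` objects stand in for the DateTime index of each sensor data frame, and `deployment_bounds` is a hypothetical name used only for this example.

```python
from datetime import datetime


def deployment_bounds(sensor_timestamps):
    """Return (earliest, latest) timestamp across all sensor units.

    sensor_timestamps: one list of datetime objects per sensor unit.
    Hypothetical helper for illustration only.
    """
    non_empty = [ts for ts in sensor_timestamps if ts]
    bdate = min(min(ts) for ts in non_empty)
    edate = max(max(ts) for ts in non_empty)
    return bdate, edate


unit_a = [datetime(2019, 8, 1, 12), datetime(2019, 9, 15, 0)]
unit_b = [datetime(2019, 8, 2, 8), datetime(2019, 9, 16, 6)]
bdate, edate = deployment_bounds([unit_a, unit_b])
```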
- ref_dict
A dictionary container for reference data objects at varying averaging intervals and parameter classifications.
- Type
dict
- hourly_ref_df
Dataset containing reference data at 1-hour averaging intervals for methods measuring parameters matching the parameter classification of the parameter object passed to the SensorEvaluation class during instantiation.
- Type
pandas DataFrame
- daily_ref_df
Dataset containing reference data at 24-hour averaging intervals for methods measuring parameters matching the parameter classification of the parameter object passed to the SensorEvaluation class during instantiation.
- Type
pandas DataFrame
- pm_hourly_ref_df
Dataset containing reference data at 1-hour averaging intervals for methods measuring particulate matter parameters.
- Type
pandas DataFrame
- pm_daily_ref_df
Dataset containing reference data at 24-hour averaging intervals for methods measuring particulate matter parameters.
- Type
pandas DataFrame
- gas_hourly_ref_df
Dataset containing reference data at 1-hour averaging intervals for methods measuring gaseous parameters.
- Type
pandas DataFrame
- gas_daily_ref_df
Dataset containing reference data at 24-hour averaging intervals for methods measuring gaseous parameters.
- Type
pandas DataFrame
- met_hourly_ref_df
Dataset containing reference data at 1-hour averaging intervals for methods measuring meteorological parameters.
- Type
pandas DataFrame
- met_daily_ref_df
Dataset containing reference data at 24-hour averaging intervals for methods measuring meteorological parameters.
- Type
pandas DataFrame
- ref_name
The make and model of the FRM/FEM instrument used as reference for the selected evaluation parameter. Both AirNowTech and AQS return the AQS method code, and the AQS Sampling Methods Reference table is used to determine the instrument name associated with this code. AirNow does not return method codes or instrument names. When the name and type of the FRM/FEM instrument are unknown, ref_name takes the value ‘unknown_reference’.
- Type
str
- avg_hrly_df
Data frame containing the inter-sensor average for concurrent sensor measurements at 1-hour averaging intervals.
- Type
pandas DataFrame
- avg_daily_df
Data frame containing the inter-sensor average for concurrent sensor measurements at 24-hour averaging intervals.
- Type
pandas DataFrame
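A minimal sketch of an inter-sensor average is shown below. It assumes that a timestamp counts as concurrent only when every unit reports a value, which may differ from how the library treats missing data; `intersensor_average` is a hypothetical name, and dictionaries stand in for the sensor data frames.

```python
def intersensor_average(sensor_series):
    """Average values across units at timestamps where all units report.

    sensor_series: list of dicts mapping timestamp -> value (None if missing).
    Hypothetical helper for illustration only.
    """
    shared = set(sensor_series[0])
    for series in sensor_series[1:]:
        shared &= set(series)

    averages = {}
    for ts in sorted(shared):
        values = [series[ts] for series in sensor_series]
        if any(v is None for v in values):
            continue  # skip intervals where any unit is missing a value
        averages[ts] = sum(values) / len(values)
    return averages


unit_1 = {"2019-08-01 00:00": 10.0, "2019-08-01 01:00": 12.0, "2019-08-01 02:00": None}
unit_2 = {"2019-08-01 00:00": 14.0, "2019-08-01 01:00": 8.0, "2019-08-01 02:00": 9.0}
avg = intersensor_average([unit_1, unit_2])
```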
- stats_df
Data frame with OLS regression (sensor vs FRM/FEM) statistics, including R2, slope, intercept, RMSE, N (Number of sensor-FRM/FEM data point pairs), as well as the minimum, maximum, and the mean sensor concentration.
- Type
pandas DataFrame
- avg_stats_df
Data frame with OLS regression (sensor vs intersensor average) statistics, including R2, slope, intercept, RMSE, N (Number of concurrent sensor measurements during which all sensors in the testing group reported values), as well as the minimum, maximum, and the mean sensor concentration.
- Type
pandas DataFrame
Methods
add_deploy_dict_stats() – Populate deployment dictionary with statistical metrics.
calculate_metrics() – Compute hourly, daily, and inter-sensor statistics dataframes.
plot_met_dist() – Plot the distribution of temperature and RH recorded by meteorological instruments at the collocation site.
plot_met_influence() – Plot the influence of meteorological parameters (temperature or relative humidity) on sensor measurements.
plot_metrics() – Regression dot/boxplots for U.S. EPA performance metrics and targets developed for PM2.5 and O3 sensor evaluations.
plot_sensor_met_scatter() – Plot internal sensor temp or RH measurements against collocated reference monitor measurements.
plot_sensor_scatter() – Plot sensor vs FRM/FEM reference measurement pairs as scatter.
plot_timeseries() – Plot sensor and FRM/FEM reference measurements over time.
print_eval_conditions() – Display conditions for the evaluation parameter and meteorological conditions during the testing period.
print_eval_metrics() – Display a summary of performance evaluation results using EPA’s recommended performance metrics (‘PM25’ and ‘O3’).
- add_deploy_dict_stats()[source]
Populate deployment dictionary with statistical metrics.
Adds precision and error performance metrics, and includes details about the reference monitor (for the selected evaluation parameter) and statistics for meteorological parameters (Temp, RH).
Calculates:
CV for 1-hour averaged sensor datasets
CV for 24-hour averaged sensor datasets
RMSE for 1-hour averaged sensor datasets
RMSE for 24-hour averaged sensor datasets
Reference monitor concentration range, mean concentration during testing period for 1-hour averaged measurements
Reference monitor concentration range, mean concentration during testing period for 24-hour averaged measurements
Meteorological monitor measurement range, mean value for temperature and/or relative humidity measurements at 1-hour intervals
Meteorological monitor measurement range, mean value for temperature and/or relative humidity measurements at 24-hour intervals
Populates:
SensorEvaluation.deploy_dict
Writes Files:
Deployment dictionary
- Returns
None.
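To illustrate the precision statistics listed above, the sketch below computes a pooled standard deviation and CV for concurrently recorded measurements. The formulation is an assumption: deviations are taken from the inter-sensor mean of each averaging interval, the sum is divided by (intervals × units − 1), and CV is the SD as a percentage of the overall mean. This page does not confirm that the library computes CV this way, and `precision_metrics` is a hypothetical name.

```python
import math


def precision_metrics(concurrent):
    """Return (SD, CV in percent) for concurrently recorded measurements.

    concurrent: list of rows, one per averaging interval; each row holds one
    value per sensor unit, with all units reporting.
    Hypothetical helper; the formulation is an assumption (see lead-in).
    """
    n_intervals = len(concurrent)
    n_units = len(concurrent[0])
    squared_dev = 0.0
    total = 0.0
    for row in concurrent:
        interval_mean = sum(row) / n_units
        squared_dev += sum((x - interval_mean) ** 2 for x in row)
        total += sum(row)
    sd = math.sqrt(squared_dev / (n_intervals * n_units - 1))
    overall_mean = total / (n_intervals * n_units)
    return sd, 100.0 * sd / overall_mean


rows = [[10.0, 12.0, 11.0], [20.0, 19.0, 21.0]]
sd, cv = precision_metrics(rows)
```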
- calculate_metrics()[source]
Compute hourly, daily, and inter-sensor statistics dataframes.
Note
calculate_metrics() will check whether SensorEvaluation.deploy_dict has been populated with statistics via the add_deploy_dict_stats() method and will call this method if the dictionary has not been populated yet.
Calculates:
1-hour averaged sensor vs. reference regression statistics for each sensor
24-hour averaged sensor vs. reference regression statistics for each sensor
1-hour averaged sensor vs. intersensor average regression statistics for each sensor
24-hour averaged sensor vs. intersensor average regression statistics for each sensor
Populates:
SensorEvaluation.stats_df
SensorEvaluation.avg_stats_df
Writes Files:
Statistics DataFrame - Sensor vs. FRM/FEM
Statistics DataFrame - Sensor vs. Intersensor Average
- Returns
None.
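The regression statistics described above (slope, intercept, R^2, and N) can be sketched with an ordinary least-squares fit written in plain Python. This is an illustration of the statistics only, not the library's implementation; `ols_stats` and the dictionary keys are hypothetical, and pairs with a missing value are simply dropped.

```python
def ols_stats(reference, sensor):
    """OLS regression of sensor (y) on reference (x) for paired values.

    Hypothetical helper for illustration only. Pairs containing None are
    excluded before fitting.
    """
    pairs = [(x, y) for x, y in zip(reference, sensor)
             if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return {"N": n, "slope": slope, "intercept": intercept, "R2": r_squared}


ref = [5.0, 10.0, None, 15.0, 20.0]
sen = [11.0, 21.0, 30.0, 31.0, 41.0]
stats = ols_stats(ref, sen)
```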
- plot_met_dist()[source]
Plot the distribution of temperature and RH recorded by meteorological instruments at the collocation site.
Displays the relative frequency of meteorological measurements recorded during the testing period. Temperature (left) and relative humidity (right) measurements are displayed on separate subplots. Measurements are grouped into 15 bins, and the frequency of measurements within each bin, normalized by the total number of measurements (i.e., the relative frequency), is displayed as a histogram. Additionally, a smooth curve estimating the kernel density of measurements is shown for each subplot and indicates the general distribution of measurements over the range of recorded values.
This method will prioritize plotting meteorological measurements made by reference instruments, as sensor measurements are commonly biased warmer and drier than ambient conditions if measurements are made by an onboard sensing component within the housing of the air sensor. If no meteorological reference measurements are available, the method will use sensor measurements; however, a disclaimer will be displayed above the subplots indicating that sensor measurements are shown in the figure.
- Returns
None.
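The binning described above can be sketched as follows: values are grouped into 15 equal-width bins and each bin count is divided by the total number of measurements. The function name `relative_frequency` is hypothetical, and the kernel density curve is not reproduced here.

```python
def relative_frequency(values, n_bins=15):
    """Return (bin_edges, relative_frequencies) for equal-width bins.

    Hypothetical helper for illustration only.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must span a non-zero range")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, [c / len(values) for c in counts]


temps = [float(t) for t in range(0, 31)]  # 31 values spanning 0 to 30
edges, freqs = relative_frequency(temps)
```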
- plot_met_influence(met_param='Temp', report_fmt=True, **kwargs)[source]
Plot the influence of meteorological parameters (temperature or relative humidity) on sensor measurements.
Sensor measurements are normalized by the reference measurement value for the corresponding timestamp and plotted along the y-axis. Meteorological measurements as measured by temperature or relative humidity monitors (rather than onboard sensor measurements) are plotted along the x-axis. Scatter for each sensor is displayed in a separate color to indicate the unique response of each sensor unit.
A gray 1:1 line indicates ideal agreement between sensor and reference measurements over the range of meteorological conditions (i.e., a ratio of 1 would indicate that the sensor and reference measure the same concentration value for a given timestamp). Scatter below the 1:1 line indicates underestimation bias, and scatter above the 1:1 line indicates overestimation bias.
- Parameters
met_param (str, optional) – Either 'Temp' for displaying the influence of temperature or 'RH' for displaying the influence of relative humidity. Defaults to 'Temp'.
report_fmt (bool, optional) – If true, format figure for inclusion in a performance report. Defaults to True.
**kwargs (dict) – Plotting keyword arguments.
- Returns
None.
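The normalization plotted on the y-axis can be sketched as a simple ratio of sensor to reference values at matching timestamps. The handling of missing or zero reference values below is an assumption made for this example, and `normalized_response` is a hypothetical name.

```python
def normalized_response(sensor, reference):
    """Return sensor/reference ratios for values at matching positions.

    Hypothetical helper for illustration only. Entries with a missing value
    or a zero reference value yield None.
    """
    ratios = []
    for s, r in zip(sensor, reference):
        if s is None or r is None or r == 0:
            ratios.append(None)
        else:
            ratios.append(s / r)
    return ratios


sensor_vals = [12.0, 8.0, 10.0, None]
reference_vals = [10.0, 10.0, 10.0, 10.0]
ratios = normalized_response(sensor_vals, reference_vals)
# Ratios above 1 indicate overestimation; ratios below 1, underestimation.
```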
- plot_metrics(**kwargs)[source]
Regression dot/boxplots for U.S. EPA performance metrics and targets developed for PM2.5 and O3 sensor evaluations.
Results for the following metrics are shown:
Linearity:
\(R^2\): The coefficient of determination, which is a measure of linearity between sensor and reference measurement pairs.
Bias:
Slope: The slope of the ordinary least-squares regression between sensor (y-axis) and reference (x-axis) measurements.
Intercept: The intercept term of the ordinary least-squares regression between sensor (y-axis) and reference (x-axis) measurements.
Error:
\(RMSE\): The root mean square error between sensor and reference measurements.
\(NRMSE\): The normalized root mean square error between sensor and reference measurements, where RMSE has been normalized by the mean reference concentration during the testing period.
Precision:
\(CV\): The coefficient of variation of concurrently recorded sensor measurements.
\(SD\): The standard deviation of concurrently recorded sensor measurements.
Results are shown as either colored dots (if the number of sensors is less than four) or as boxplots (if the number of sensors exceeds three). Target ranges are indicated by gray shaded regions, and target goals are indicated by dark gray lines. Results are grouped by data averaging interval, including 1-hour and 24-hour intervals (note that some pollutants such as O3 are analyzed only at 1-hour intervals due to significant diurnal variability, so the formatting of the figure will depend on which averaging interval(s) are indicated for the parameter via the sensortoolkit.Parameter.averaging attribute).
- Parameters
**kwargs (dict) – Plotting keyword arguments.
- Returns
None.
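The error metrics listed above can be illustrated with a short calculation. RMSE is the root of the mean squared sensor-minus-reference difference, and NRMSE divides RMSE by the mean reference concentration. Expressing NRMSE as a percentage is an assumption made here, and `rmse_nrmse` is a hypothetical name.

```python
import math


def rmse_nrmse(sensor, reference):
    """Return (RMSE, NRMSE in percent) for paired measurements.

    Hypothetical helper for illustration only. Pairs containing None are
    excluded.
    """
    pairs = [(s, r) for s, r in zip(sensor, reference)
             if s is not None and r is not None]
    n = len(pairs)
    rmse = math.sqrt(sum((s - r) ** 2 for s, r in pairs) / n)
    mean_ref = sum(r for _, r in pairs) / n
    return rmse, 100.0 * rmse / mean_ref


sensor_conc = [12.0, 8.0, 13.0, 7.0]
reference_conc = [10.0, 10.0, 10.0, 10.0]
rmse, nrmse = rmse_nrmse(sensor_conc, reference_conc)
```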
- plot_sensor_met_scatter(averaging_interval='1-hour', met_param='Temp', **kwargs)[source]
Plot internal sensor temp or RH measurements against collocated reference monitor measurements.
Plots generated by this method:
Internal sensor RH vs Reference monitor RH
Internal sensor Temp vs Reference monitor Temp
Sensor measurements are plotted along the y-axis with reference measurements along the x-axis. Statistical quantities are displayed for each scatter plot including the ordinary least-squares (OLS) regression equation, R^2, RMSE, and N (the number of measurement pairs). The one-to-one line (indicating ideal agreement between sensor and reference measurements) is shown as a dashed gray line.
- Parameters
averaging_interval (str, optional) – The measurement averaging intervals commonly utilized for analyzing data corresponding to the selected parameter. Defaults to ‘1-hour’.
met_param (str, optional) – The meteorological parameter to display. Defaults to ‘Temp’.
**kwargs (dict) – Plotting keyword arguments.
- Returns
None.
- plot_sensor_scatter(averaging_interval='24-hour', plot_subset=None, **kwargs)[source]
Plot sensor vs FRM/FEM reference measurement pairs as scatter.
FRM/FEM reference concentrations are plotted along the x-axis, and sensor concentrations are plotted along the y-axis. Measurement pairs (i.e., concentration values for sensor and reference datasets recorded at matching timestamp entries) are colored by the relative humidity recorded by an independent meteorological instrument at the monitoring site if RH data are located within the reference_object.data['Met'] DataFrame.
- Parameters
averaging_interval (str, optional) – The measurement averaging intervals commonly utilized for analyzing data corresponding to the selected parameter. Defaults to ‘24-hour’.
plot_subset (list, optional) – A list of either sensor serial IDs or the keys associated with the serial IDs in the serial dictionary. Defaults to None.
Keyword Arguments
- Parameters
report_fmt (bool) – For displaying scatter plots on the first page of the performance report included alongside U.S. EPA’s documents outlining recommended testing protocols, performance metrics, and target values. Defaults to False.
**kwargs –
Additional keyword arguments passed to the underlying
sensortoolkit.plotting.scatter_plotter()
method.
- Returns
None.
- plot_timeseries(report_fmt=True, **kwargs)[source]
Plot sensor and FRM/FEM reference measurements over time.
Sensor measurements are indicated by distinct colors in a discrete color palette. FRM/FEM measurements are shown as black lines. The x-axis indicates the date in 5-day increments (default, although customizable). Measurement values are plotted along the y-axis.
- Parameters
report_fmt (bool, optional) – If true, format figure for inclusion in a performance report. Defaults to True.
**kwargs (dict) – Plotting keyword arguments.
- Returns
None.
- print_eval_conditions(averaging_interval='24-hour')[source]
Display conditions for the evaluation parameter and meteorological conditions during the testing period.
Values for the evaluation parameter recorded by the sensor and the FRM/FEM instrument, along with temperature and relative humidity values, are summarized by the mean of 1-hour or 24-hour averages during the testing period. The range (min to max) of each parameter is listed below the mean in parentheses.
- Parameters
averaging_interval (str, optional) – The measurement averaging intervals commonly utilized for analyzing data corresponding to the selected parameter. Defaults to ‘24-hour’.
- Returns
None.
- print_eval_metrics(averaging_interval='24-hour')[source]
Display a summary of performance evaluation results using EPA’s recommended performance metrics (‘PM25’ and ‘O3’).
The coefficient of variation (CV), the sensor vs FRM/FEM OLS regression slope, intercept, and R^2, and the RMSE are displayed. Regression statistics are computed for each sensor, and the mean metric value is presented alongside the range (min to max).
- Parameters
averaging_interval (str, optional) – The measurement averaging intervals commonly utilized for analyzing data corresponding to the selected parameter. Defaults to ‘24-hour’.
- Returns
None.
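The "mean alongside the range" presentation described for this method can be sketched as below. The exact text layout printed by the library is not shown on this page, so the format string and the name `summarize` are assumptions made for illustration.

```python
def summarize(values, precision=2):
    """Format per-sensor metric values as 'mean (min to max)'.

    Hypothetical helper; the output layout is an assumption.
    """
    mean = sum(values) / len(values)
    return (f"{mean:.{precision}f} "
            f"({min(values):.{precision}f} to {max(values):.{precision}f})")


slopes = [0.95, 1.05, 1.10]  # e.g., one regression slope per sensor unit
summary = summarize(slopes)
```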