Backend API
Core functions
- xagg.core.aggregate(ds, wm)

  Aggregate raster variable(s) to polygon(s)

  Aggregates (N-D) raster variables in ds to the polygons in wm - in other words, gives the weighted average of the values in ds based on each pixel's relative area overlap with the polygons.

  The values will be additionally weighted if a weight was inputted into xagg.core.create_raster_polygons().

  The code checks whether the input lat/lon grid in ds is equivalent to the linearly indexed grid in wm, or if it can be cropped to that grid.

  - Parameters
    - ds : xarray.Dataset
      an xarray.Dataset containing one or more variables with dimensions lat, lon (and possibly more). The dataset's geographic grid has to include the lat/lon coordinates used in determining the pixel overlaps in xagg.core.get_pixel_overlaps() (and saved in wm['source_grid'])
    - wm : xagg.classes.weightmap
      the output of xagg.core.get_pixel_overlaps(); a xagg.classes.weightmap object containing:
      - ['agg'] : a dataframe, with one row per polygon, and the columns pix_idxs and rel_area, giving the linear indices and the relative area of each pixel over the polygon, respectively
      - ['source_grid'] : the lat/lon grid on which the aggregating parameters were calculated (and on which the linear indices are based)
  - Returns
    - agg_out : xagg.classes.aggregated
      an xagg.classes.aggregated object with the aggregated variables
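The core of the aggregation step is an area-weighted mean. As a rough illustration of the per-polygon arithmetic - a hypothetical helper, not the actual xagg internals - the pix_idxs/rel_area machinery boils down to:

```python
def weighted_average(pix_values, rel_areas, weights=None):
    """Area-weighted mean of a polygon's overlapping pixel values.

    rel_areas are the relative overlap areas (summing to 1 over the
    polygon); weights are the optional extra pixel weights passed to
    create_raster_polygons().
    """
    if weights is None:
        weights = [1.0] * len(pix_values)
    # combine overlap area with any extra weight, then renormalize
    combined = [a * w for a, w in zip(rel_areas, weights)]
    total = sum(combined)
    return sum(v * c for v, c in zip(pix_values, combined)) / total

# a polygon exactly covering two pixels (rel_area 0.5 each):
print(weighted_average([10.0, 20.0], [0.5, 0.5]))  # 15.0
```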
- xagg.core.create_raster_polygons(ds, mask=None, subset_bbox=None, weights=None, weights_target='ds')

  Create polygons for each pixel in a raster

  Note: 'lat_bnds' and 'lon_bnds' can be created through the xagg.aux.get_bnds() function if they are not already included in the input raster file.

  Note: Currently this code only supports regular rectangular grids (i.e. where every pixel side is a straight line in lat/lon space). Future versions may include support for irregular grids.

  - Parameters
    - ds : xarray.Dataset
      an xarray dataset with the variables 'lat_bnds' and 'lon_bnds', which are both lat/lon x 2 arrays giving the min and max values of lat and lon for each pixel given by lat/lon
    - subset_bbox : geopandas.GeoDataFrame, optional, default = None
      if a geopandas.GeoDataFrame is entered, the bounding box around the geometries in the gdf is used to mask the grid, to reduce the number of pixel polygons created
  - Returns
    - pix_agg : dict
      a dictionary containing:
      - 'gdf_pixels' : a geopandas.GeoDataFrame containing a 'geometry' giving the pixel boundaries for each 'lat' / 'lon' pair
      - 'source_grid' : a dictionary containing the original lat and lon inputs under the keys "lat" and "lon" (just the xarray.DataArray of those variables in the input ds)
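Since only regular rectangular grids are supported, each pixel polygon is just the rectangle spanned by its lat/lon bounds. A minimal sketch of that construction (the helper name is illustrative, not the library's code):

```python
def pixel_corners(lat_bnds, lon_bnds):
    """Corner coordinates (lon, lat) of one rectangular pixel, built
    from its (min, max) lat and lon bounds, listed counter-clockwise
    (the order e.g. shapely.geometry.Polygon would accept)."""
    lat0, lat1 = lat_bnds
    lon0, lon1 = lon_bnds
    return [(lon0, lat0), (lon1, lat0), (lon1, lat1), (lon0, lat1)]

# a 1-degree pixel centered on lat 45.5, lon 10.5:
print(pixel_corners((45.0, 46.0), (10.0, 11.0)))
```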
- xagg.core.get_pixel_overlaps(gdf_in, pix_agg)

  Get, for each polygon, the pixels that overlap and their area of overlap

  Finds, for each polygon in gdf_in, which pixels intersect it, and by how much.

  Note: Uses EASE-Grid 2.0 on the WGS84 datum to calculate relative areas (see https://nsidc.org/data/ease)

  - Parameters
    - gdf_in : geopandas.GeoDataFrame
      a geopandas.GeoDataFrame giving the polygons over which the variables should be aggregated. Can be just a read shapefile (with the added column "poly_idx", which is just the index as a column).
    - pix_agg : dict
      the output of xagg.core.create_raster_polygons(); a dict containing:
      - 'gdf_pixels' : a geopandas.GeoDataFrame giving for each row the columns "lat" and "lon" (with coordinates) and a polygon giving the boundary of the pixel given by lat/lon
      - 'source_grid' : [da.lat, da.lon] of the grid used to create the pixel polygons
  - Returns
    - wm_out : dict
      a dictionary containing:
      - 'agg' : a dataframe containing all the fields of gdf_in (except geometry) and the additional columns:
        - coords : the lat/lon coordinates of all pixels that overlap the polygon of that row
        - pix_idxs : the linear indices of those pixels within the gdf_pixels grid
        - rel_area : the relative area of each of the overlaps between the pixels and the polygon (summing to 1 - e.g. if the polygon is exactly the size and location of two pixels, their rel_areas would be 0.5 each)
      - 'source_grid' : a dictionary with keys 'lat' and 'lon' giving the original lat/lon grid whose overlaps with the polygons were calculated
      - 'geometry' : just the polygons from gdf_in
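For axis-aligned rectangles, the overlap calculation reduces to interval intersections. The following toy version illustrates the rel_area idea only - xagg itself intersects true polygon geometries in an equal-area (EASE-Grid 2.0) projection, which this sketch does not attempt:

```python
def rect_overlap(a, b):
    """Overlap area of two axis-aligned rectangles, each given as
    (x_min, y_min, x_max, y_max)."""
    dx = min(a[2], b[2]) - max(a[0], b[0])
    dy = min(a[3], b[3]) - max(a[1], b[1])
    return max(dx, 0.0) * max(dy, 0.0)

def rel_areas(polygon_rect, pixel_rects):
    """Relative overlap area of each pixel with the polygon (sums to 1)."""
    overlaps = [rect_overlap(polygon_rect, p) for p in pixel_rects]
    total = sum(overlaps)
    return [o / total for o in overlaps]

# a polygon exactly covering two unit pixels -> rel_area 0.5 each
pixels = [(0, 0, 1, 1), (1, 0, 2, 1)]
print(rel_areas((0, 0, 2, 1), pixels))  # [0.5, 0.5]
```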
- xagg.core.process_weights(ds, weights=None, target='ds')

  Process weights - including regridding

  If target == 'ds', regrid weights to ds. If target == 'weights', regrid ds to weights.

  - Parameters
    - ds : xarray.Dataset, xarray.DataArray
      an xarray.Dataset / xarray.DataArray to regrid
    - weights : xarray.DataArray, optional, default = None
      an xarray.DataArray containing a weight (numeric) at each location
    - target : str, optional, default = 'ds'
      whether weights should be regridded to the ds grid (by default) or vice versa (not yet supported; raises NotImplementedError)
  - Returns
    - ds : xarray.Dataset, xarray.DataArray
      the input xarray.Dataset / xarray.DataArray, with a new variable weights specifying weights for each pixel
    - weights_info : dict
      a dictionary storing information about the weights regridding process, with the fields:
      - target : showing which of the two grids was retained
      - ds_grid : a dictionary with the grid {"lat":ds.lat, "lon":ds.lon}
      - weights_grid : a dictionary with the grid {"lat":weights.lat, "lon":weights.lon}
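The regridding step maps the weights grid onto the ds grid when the two differ. As a toy stand-in for the conservative regridding a real implementation would use (names and approach here are purely illustrative), a nearest-neighbor lookup conveys the idea:

```python
def nearest_regrid(src_lats, src_lons, src_vals, tgt_lats, tgt_lons):
    """Map a 2-D weights grid (src_vals, indexed [lat][lon]) onto a
    target lat/lon grid by nearest-neighbor lookup."""
    def nearest(coords, x):
        # index of the source coordinate closest to x
        return min(range(len(coords)), key=lambda i: abs(coords[i] - x))
    return [[src_vals[nearest(src_lats, la)][nearest(src_lons, lo)]
             for lo in tgt_lons]
            for la in tgt_lats]

# a 2x2 weights grid sampled onto a coarser 1x2 target grid
w = [[1.0, 2.0],
     [3.0, 4.0]]
print(nearest_regrid([0.0, 1.0], [0.0, 1.0], w, [0.0], [0.0, 1.0]))  # [[1.0, 2.0]]
```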
Export functions
- xagg.export.output_data(agg_obj, output_format, output_fn, loc_dim='poly_idx')

  Wrapper for prep_for_* functions

  - Parameters
    - agg_obj : xagg.classes.aggregated
      object to be exported
    - output_format : str
      'netcdf', 'csv', or 'shp'
    - output_fn : str
      the output filename
    - loc_dim : str, optional, default = 'poly_idx'
      the name of the dimension with location indices; used only by xagg.export.prep_for_nc()
  - Returns
    - the variable that gets saved, depending on the output_format:
      - "netcdf": the xarray.Dataset on which .to_netcdf was called
      - "csv": the pandas.DataFrame on which .to_csv was called
      - "shp": the geopandas.GeoDataFrame on which .to_file was called
- xagg.export.prep_for_csv(agg_obj)

  Preps aggregated data for output as a csv

  Concretely, aggregated data is placed in a new pandas dataframe and expanded wide - each aggregated variable is placed in new columns; one column per coordinate in each dimension that isn't the location (polygon). So, for example, a lat x lon x time variable "tas", aggregated to location x time, would be reshaped wide to columns "tas0", "tas1", "tas2", ... for timestep 0, 1, etc.

  Note: Currently no support for variables with more than one extra dimension beyond their location dimensions. Potential options: a multi-index column name, so [var]0-0, [var]0-1, etc.

  - Parameters
    - agg_obj : xagg.classes.aggregated
      the output from aggregate()
  - Returns
    - df : pandas.DataFrame
      a pandas dataframe containing all the fields from the original location polygons + columns containing the values of the aggregated variables at each location. This can then easily be exported as a csv directly (using df.to_csv) or to shapefiles by first turning it into a geodataframe.
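The wide reshape described above can be sketched with a hypothetical helper (illustrative names only, not the library's internals):

```python
def widen(var_name, values_by_location):
    """Turn a (location x time) nested list into one dict of columns
    per location: var_name + "0", var_name + "1", ... per timestep."""
    return [{f"{var_name}{t}": v for t, v in enumerate(loc_vals)}
            for loc_vals in values_by_location]

# two locations, three timesteps each -> columns tas0, tas1, tas2
print(widen("tas", [[280.1, 281.0, 279.5],
                    [290.2, 291.1, 289.8]]))
```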
- xagg.export.prep_for_nc(agg_obj, loc_dim='poly_idx')

  Preps aggregated data for output as a netcdf

  Concretely, aggregated data is placed in a new xarray dataset with dimensions of location (the different polygons in gdf_out) and any other dimension(s) in the original input raster data. All fields from the input polygons are kept as variables with dimension of location.

  - Parameters
    - agg_obj : xagg.classes.aggregated
    - loc_dim : str, optional, default = 'poly_idx'
      the name of the location dimension. Values of that dimension are currently only an integer index (with further information given by the field variables). Future versions may allow, if loc_dim is set to the name of a field in the input polygons, replacing the dimension with the values of that field.