canopy.core
canopy.core.constants
canopy.core.field
- class canopy.core.field.Field(data: DataFrame, grid: Grid, modified: bool = False, source: str | None = None)[source]
Bases: object
Container for data derived from model output or observations
This object contains model output or observation data and associated grid information. It allows for basic data manipulation, such as time- and spatial reductions and slicing. The Field object is canopy’s elemental interface between data and user.
- add_md(key: str, value: str) None[source]
Add an entry to the metadata dictionary.
- Parameters:
key (str) – The key under which the value is stored.
value (str | float | int) – The value of the metadata entry.
Example
# Load field
anpp = Field.from_file("/path/to/file/anpp.out")
# Add the GCM used to force the simulation to the metadata
anpp.add_md('gcm', 'MPIESM2.1')
Notes
If the key already exists in the metadata dictionary, a KeyError will be raised. In order to overwrite an entry, use the set_md method.
- apply(op: str | Callable, operand: SupportsFloat | list[str] | None = None, layers: str | list[str] | None = None, how: Literal['left', 'right'] = 'left', inplace: bool = False) None | Field[source]
Apply an operation/function to selected layers
- Parameters:
op (str | Callable) – Operation to apply to layers. If a callable is passed, it must be numpy-vectorizable.
operand (SupportsFloat | list[str] | None) – Operand to combine with layers through ‘op’. The operand can be a constant number or the name of a layer. In the latter case, the operation is performed element-wise. If ‘op’ is a Callable, this parameter is ignored.
layers (str | list[str] | None) – A list of the names of the layers to apply the operation to. If None, the operation is applied to all layers.
how (str = 'left') – Position of the layers in the operation. This argument is relevant for non-commutative operations. For example, if how = 'left', a '-' operation will be layers - operand. If how = 'right', the operation will be operand - layers.
inplace (bool) – Whether to perform the operation in place.
- Return type:
A field with the modified layers, or None if the operation is performed in place.
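The effect of how on a non-commutative operation can be illustrated with a small stand-alone sketch. This is not canopy's implementation: apply_op and the OPS table are hypothetical helpers, a plain list of numbers stands in for a layer, and only the '-' string is confirmed by the text above (the other operator strings are assumptions).

```python
import operator

# Hypothetical operator lookup; only '-' is named in the documentation above.
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}

def apply_op(values, op, operand, how='left'):
    """Combine each value with `operand`; `how` gives the position of the values."""
    func = OPS[op]
    if how == 'left':
        return [func(v, operand) for v in values]   # values (op) operand
    if how == 'right':
        return [func(operand, v) for v in values]   # operand (op) values
    raise ValueError(f"how must be 'left' or 'right', got {how!r}")

layer = [1.0, 2.0, 4.0]
print(apply_op(layer, '-', 10.0, how='left'))   # [-9.0, -8.0, -6.0]
print(apply_op(layer, '-', 10.0, how='right'))  # [9.0, 8.0, 6.0]
```

For commutative operations such as '+' and '*', both settings of how give the same result.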
- convert_units(factor: SupportsFloat, units: str, inplace: bool = False) None | Field[source]
Convert the field’s units by a multiplicative factor
- Parameters:
factor (SupportsFloat) – Scalar by which all values in the field’s data are multiplied
units (str) – The new units to be set in the metadata ‘units’ entry.
inplace (bool = False) – If True, the conversion is performed on the current Field.
Notes
This function does not perform any checks on whether the passed factor or units string make sense! It just trusts that the user knows what they are doing.
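Because no consistency check is made, the factor and the units string are two independent inputs. The sketch below mimics the documented behaviour on plain Python objects; the convert helper, the value list and the unit strings are hypothetical stand-ins, not canopy code.

```python
def convert(values, metadata, factor, units):
    """Return scaled values and a metadata copy with the new 'units' entry."""
    scaled = [v * float(factor) for v in values]
    new_md = dict(metadata)
    new_md['units'] = units  # the string is stored as given, without validation
    return scaled, new_md

values = [0.5, 1.25]                 # e.g. in kgC m-2
md = {'units': 'kgC m-2'}
scaled, new_md = convert(values, md, 1000, 'gC m-2')
print(scaled, new_md['units'])       # [500.0, 1250.0] gC m-2
```

Passing a factor of 1000 together with an unrelated units string would be accepted just as readily, so the caller is responsible for keeping the two in agreement.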
- property coordinates: list
Produce a list of spatial coordinates
- copy_history(field: Field) None[source]
Replace history with a copy of the history of the passed Field.
- Parameters:
field (Field) – The field whose history is to be copied.
- copy_md(field: Field) None[source]
Replace the metadata with a copy of the passed Field’s.
- Parameters:
field (Field) – The field whose metadata is to be copied.
- property data: DataFrame
The Field’s data
- Type:
pd.DataFrame
- property description: str
- drop_layers(layers: str | Sequence[str] | Index, inplace: bool = False) None | Field[source]
Drop one or more layers from the Field.
- Parameters:
layers (str | list) – Layer name or list of layer names.
inplace (bool = False) – If True, the layers are dropped from the current object.
- Returns:
If inplace is True, the layers are dropped from the current Field and the method returns None.
Otherwise the method returns a Field object with the selected data.
- filter(query: str, fill_nan: bool = False, inplace: bool = False) None | Field[source]
Filter rows based on boolean query
- Parameters:
query (str) – The string describing the boolean query in terms of the index or layers
fill_nan (bool) – If False, rows where the query is False are removed (default behaviour). If True, the resulting field’s data has NaNs where the query is false.
inplace (bool) – Whether to perform the operation in place
- Returns:
If inplace is True, the filtering is performed on the current Field and the method returns None.
Otherwise the method returns a Field object with the reduced data.
Example
# Load field
aaet = Field.from_file("/path/to/file/aaet.out")
print(aaet.layers)  # […, 'Total']
# Filter rows for which layer 'Total' is greater than 100
aaet1 = aaet.filter('Total > 100')
# Filter rows for which layer 'Total' is greater than 100 and layer 'C3G' is lower than 10 (inplace)
aaet.filter('Total > 100 and C3G < 10', inplace=True)
Notes
See pandas documentation for DataFrame.query() for more details on the query string.
- classmethod from_file(path: str, file_format: str = 'lpjg_annual', grid_type: str = 'lonlat', source: str | None = None, reader_params: dict[str, Any] | None = None, grid_params: dict[str, Any] | None = None) Self[source]
Construct a Field object from an LPJ-GUESS output file.
- Parameters:
path (str) – Path to file(s)
file_format (str) – One of the registered file formats. See file readers documentation for details. For LPJ-GUESS standard output:
- 'lpjg_annual' (annual output)
- 'lpjg_monthly' (monthly output)
The default format is 'lpjg_annual'.
grid_type (str) – The type of grid associated to the data.
source (str) – The source from which to retrieve the file’s metadata. The format for this argument is 'source:field'. For example, to read an LPJ-GUESS file and add metadata corresponding to Annual GPP:
agpp = Field.from_file('/path/to/file/file_name.out', file_format='lpjg_annual', source='lpjguess:agpp')
reader_params (dict[str,Any]) – A dictionary of parameters passed to the file reader as keyword arguments. See the documentation of the different file readers for details.
grid_params (dict[str,Any]) – A dictionary of parameters passed to the Grid.from_frame class method as keyword arguments. See the Grid object documentation.
- Return type:
A Field object.
- property history: list[str]
Keeps the history of modifications of the Field
- Type:
list
- property layers: list[str]
Returns a list with the Field’s layer names.
- Type:
list[str]
- log(entries: str | list[str]) None[source]
File one or more entries in the Field’s history log.
The function files the passed entry in the history log and adds a timestamp.
- Parameters:
entries (str | list[str]) – The entry or list of entries to log. If a list is provided, the same timestamp will be attached to all entries.
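The shared-timestamp behaviour for lists can be sketched as follows. This is an illustration only: log_entries is a hypothetical function, and the timestamp format and the "timestamp: entry" layout are assumptions, since the documentation does not specify them.

```python
from datetime import datetime

def log_entries(history, entries):
    """Append entries to `history`, all stamped with one shared timestamp."""
    if isinstance(entries, str):
        entries = [entries]  # a single entry is treated as a one-item list
    stamp = datetime.now().isoformat(timespec='seconds')  # assumed format
    for entry in entries:
        history.append(f"{stamp}: {entry}")

history = []
log_entries(history, ["Selected layer 'Total'", "Averaged over time"])
log_entries(history, "Converted units")
print(len(history))  # 3
```

Taking the timestamp once, before the loop, is what guarantees that all entries of one call carry the same value.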
- property metadata: MappingProxyType
Field’s metadata (units, etc…)
- Type:
dict
- property modified: bool
True if the Field has been modified after loading.
- Type:
bool
- property name: str
- reduce(redspec: RedSpec, inplace: bool = False) None | Field[source]
Perform the selection/slicing/reduction operations specified in the passed RedSpec object.
- Parameters:
redspec (RedSpec) – A RedSpec object specifying how to slice and/or reduce the data.
inplace (bool = False) – If True, the reduction is performed on the current Field.
- Returns:
If inplace is True, the reduction is performed on the current Field and the method returns None.
Otherwise the method returns a Field object with the reduced data.
- reduce_grid(gridop: str, axis: str = 'both', inplace: bool = False) None | Field[source]
Perform a reduction operation on the spatial axes.
- Parameters:
gridop (str) – The spatial reduction operation: one of ‘av’, ‘sum’.
axis (str = 'both') – A grid axis or ‘both’ (the default value).
inplace (bool = False) – If True, the reduction is performed on the current Field.
- Returns:
If inplace is True, the reduction is performed on the current Field and the method returns None.
Otherwise the method returns a Field object with the reduced data.
- reduce_layers(redop: str, layers: Sequence[str] | Index | None = None, name: str | None = None, drop: bool = False, inplace: bool = False) None | Field[source]
Perform a reduction operation on the Field’s layers.
- Parameters:
redop (str) – The reduction operation. One of ‘sum’, ‘av’, ‘maxLay’, ‘/’.
layers (None or a list of strings.) – List of names of layers to be reduced. If None, all layers are reduced.
name (None or str.) – Name of the new layer to store the reduction. If None, the redop argument is used.
drop (bool) – If True, the reduced layers are dropped from the data.
inplace (bool) – If True, the layers are reduced in the current object.
- Returns:
If inplace is True, the layers are reduced in the current Field and the method returns None.
Otherwise the method returns a Field object with the reduced data.
- reduce_time(timeop: str, freq: str | None = None, inplace: bool = False) None | Field[source]
Perform a reduction operation on the time axis.
- Parameters:
timeop (str) – The time reduction operation: one of ‘av’, ‘sum’.
freq (str | None = None) – A string specifying the frequency of the reduction. This is formed by an integer number and one of 'M', 'Y'. For example, to perform an average every five years, specify timeop='av' and freq='5Y'. If freq=None the whole time series is reduced.
inplace (bool = False) – If True, the reduction is performed on the current Field.
- Returns:
If inplace is True, the reduction is performed on the current Field and the method returns None.
Otherwise the method returns a Field object with the reduced data.
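The structure of the freq string, and what a block-wise average means, can be sketched in plain Python. Both helpers are hypothetical and do not reproduce canopy's code; in particular, how canopy aligns the blocks with the time axis is not documented here, so starting the first block at the first year is an assumption.

```python
import re

def parse_freq(freq):
    """Split a frequency string such as '5Y' into (count, unit)."""
    match = re.fullmatch(r'(\d+)([MY])', freq)
    if match is None:
        raise ValueError(f"Invalid frequency string: {freq!r}")
    return int(match.group(1)), match.group(2)

def block_average(years, values, n_years):
    """Average values in consecutive blocks of n_years, starting at the first year."""
    blocks = {}
    for year, value in zip(years, values):
        blocks.setdefault((year - years[0]) // n_years, []).append(value)
    return [sum(block) / len(block) for _, block in sorted(blocks.items())]

count, unit = parse_freq('5Y')
print(count, unit)                                    # 5 Y
years = list(range(2000, 2010))
values = [float(i) for i in range(10)]
print(block_average(years, values, count))            # [2.0, 7.0]
```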
- rename_layers(new_names: dict[str, str]) None[source]
Rename the field’s layers
- Parameters:
new_names (dict) – A dictionary mapping existing layer names to their new names.
- sample_gridcells(size: int, inplace: bool = False)[source]
Select random sample of gridcells
- Parameters:
size (int) – Size of the sample
inplace (bool = False) – If True, the selection is performed in place
- Returns:
A new field with a subset of ‘size’ randomly chosen gridcells or None
if the operation is inplace.
- select_layers(layers: str | Sequence[str] | Index, inplace: bool = False) None | Field[source]
Select one or more layers from the Field.
- Parameters:
layers (str | Sequence[str]) – Layer name or list of layer names.
inplace (bool = False) – If True, the selection is performed in place.
- Returns:
If inplace is True, the selection is performed in place and the method returns None.
Otherwise the method returns a Field object containing the selected data.
- select_region(region: str | Sequence[str], region_type: str = 'country', inplace: bool = False) None | Field[source]
Select grid cells whose lon/lat fall inside a named geographical region.
- Parameters:
region (str | Sequence[str]) – One or more region identifiers understood by the chosen region set. For example: names for countries, or region names for Giorgi/SREX/AR6. Pass a list (or other sequence) to keep grid cells that fall inside any of the listed regions (logical OR).
region_type (str = "country") – Which predefined region set to use:
- "country": natural_earth_v5_0_0.countries_10 (https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-countries/)
- "giorgi": Giorgi regions (https://link.springer.com/article/10.1007/PL00013733)
- "SREX"/"srex": IPCC SREX regions (https://www.ipcc-data.org/guidelines/pages/ar5_regions.html)
- "AR6"/"ar6": IPCC AR6 regions (https://github.com/IPCC-WG1/Atlas/tree/devel/reference-regions)
inplace (bool = False) – If True, the selection is performed in place.
- Returns:
If inplace is True, the selection is performed in place and the method returns None.
Otherwise the method returns a Field object containing only rows whose coordinates
are inside the region.
- select_slice(slices: dict[str, tuple], inplace: bool = False) None | Field[source]
Select a time or spatial slice from the Field.
- Parameters:
slices (dict[str, tuple]) – A dictionary of axis slices where the keys are the axis name and the values tuples representing the slice.
inplace (bool = False) – If True, the slicing is performed in place.
- Returns:
If inplace is True, the slicing is performed in place and the method returns None.
Otherwise the method returns a Field object containing the sliced data.
- set_md(key: str, value: str) None[source]
Add or replace an entry in the metadata dictionary.
- Parameters:
key (str) – The key under which the value is stored.
value (str | float | int) – The value of the metadata entry.
Example
# Load field
anpp = Field.from_file("/path/to/file/anpp.out")
# Fails because 'name' is set in the constructor
anpp.add_md('name', 'NPP')  # KeyError
anpp.set_md('name', 'NPP')  # Okay
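The difference between add_md and set_md can be reproduced with a small stand-alone class. MetadataStore is a hypothetical stand-in written for this illustration, not canopy's Field; it only mirrors the behaviour documented above (KeyError on an existing key, read-only metadata view).

```python
from types import MappingProxyType

class MetadataStore:
    """Hypothetical stand-in mimicking the documented add_md/set_md behaviour."""

    def __init__(self, **initial):
        self._md = dict(initial)

    @property
    def metadata(self):
        return MappingProxyType(self._md)  # read-only view of the dictionary

    def add_md(self, key, value):
        if key in self._md:
            raise KeyError(f"'{key}' already exists; use set_md to overwrite")
        self._md[key] = value

    def set_md(self, key, value):
        self._md[key] = value  # adds or replaces

store = MetadataStore(name='anpp')
store.add_md('gcm', 'MPIESM2.1')   # new key: accepted
try:
    store.add_md('name', 'NPP')    # existing key: rejected
except KeyError as err:
    print('KeyError:', err)
store.set_md('name', 'NPP')        # overwriting is allowed
print(dict(store.metadata))        # {'name': 'NPP', 'gcm': 'MPIESM2.1'}
```

Exposing the metadata through a MappingProxyType means entries can only be changed through the two methods, never by assigning into the returned mapping.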
- property sites: MappingProxyType
Get a dictionary of sites for unstructured grids
If the data has a ‘label’ level, the keys will be the labels. If not, the keys will be the coordinate pair in string format: ‘(x_axis_name, y_axis_name)’. In both cases the values of the dictionary are the sites’ coordinates as a tuple: (x, y)
- property time_freq: str
Returns the sampling frequency of the time axis.
- Type:
str
- property timeop: str | None
Returns the reduction operation applied to the time dimension (if any).
- Type:
str | None
- property units: str
canopy.core.frameops
- canopy.core.frameops.apply_div(df: DataFrame, operand: SupportsFloat | list[str], layers: list[str], how: Literal['left', 'right']) None[source]
- canopy.core.frameops.apply_mul(df: DataFrame, operand: SupportsFloat | list[str], layers: list[str]) None[source]
- canopy.core.frameops.apply_reduction(df: DataFrame, grid: Grid, redspec: RedSpec) tuple[DataFrame, Grid, list[str]][source]
- canopy.core.frameops.apply_sub(df: DataFrame, operand: SupportsFloat | list[str], layers: list[str], how: Literal['left', 'right']) None[source]
- canopy.core.frameops.apply_sum(df: DataFrame, operand: SupportsFloat | list[str], layers: list[str]) None[source]
- canopy.core.frameops.reduce_grid(df, grid: Grid, gridop: str, axis: str) tuple[DataFrame, Grid, list[str]][source]
canopy.core.redspec
- class canopy.core.redspec.RedSpec(layers: list[str] | str | NoneType = None, slices: dict[str, tuple] | None = None, gridop: str | None = None, axis: str = 'both', timeop: str | None = None, freq: str | None = None)[source]
Bases: object
- axis: str = 'both'
- freq: str | None = None
- gridop: str | None = None
- layers: list[str] | str | None = None
- slices: dict[str, tuple] | None = None
- timeop: str | None = None
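A RedSpec bundles layer selection, slicing, and spatial and time reductions into one object that can be passed to Field.reduce. The sketch below uses a stand-in dataclass with the fields and defaults listed above, on the assumption that RedSpec behaves like a plain dataclass; the axis name 'time' and the slice values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

@dataclass
class RedSpecSketch:
    """Stand-in with the fields and defaults listed above (not canopy's class)."""
    layers: Optional[Union[List[str], str]] = None
    slices: Optional[Dict[str, tuple]] = None
    gridop: Optional[str] = None
    axis: str = 'both'
    timeop: Optional[str] = None
    freq: Optional[str] = None

def requested_operations(spec):
    """List which kinds of operation a spec asks for (order is not implied)."""
    requested = []
    if spec.layers is not None:
        requested.append('layers')
    if spec.slices is not None:
        requested.append('slices')
    if spec.gridop is not None:
        requested.append('gridop')
    if spec.timeop is not None:
        requested.append('timeop')
    return requested

spec = RedSpecSketch(layers=['Total'], slices={'time': (1990, 2010)},
                     timeop='av', freq='5Y')
print(requested_operations(spec))  # ['layers', 'slices', 'timeop']
print(spec.axis)                   # both
```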
canopy.core.grid.grid_abc
canopy.core.grid.grid_empty
canopy.core.grid.grid_lonlat
Grid with longitude and latitude coordinates.
This grid type has longitude and latitude coordinates. The longitude and latitude intervals are constant, although they can be different (i.e. dlon != dlat). Reduction operations on this grid weight each gridcell according to its position on the grid.
- class canopy.core.grid.grid_lonlat.GridLonLat(lon_min: SupportsFloat = nan, lon_max: SupportsFloat = nan, dlon: SupportsFloat = nan, lat_min: SupportsFloat = nan, lat_max: SupportsFloat = nan, dlat: SupportsFloat = nan, lon_gridop: str | None = None, lat_gridop: str | None = None)[source]
Bases: Grid
A Grid type representing a longitude-latitude grid with constant increments.
- property dlat
- property dlon
- classmethod from_frame(df: DataFrame, dlon: float = nan, dlat: float = nan, lon_gridop: None | str = None, lat_gridop: None | str = None) GridLonLat[source]
Create a GridLonLat instance from a DataFrame.
- Parameters:
df (pd.DataFrame) – A pandas DataFrame with a valid format (see Field documentation).
dlon (float) – If supplied, use this increment for the longitude axis, instead of inferring it from the DataFrame’s index
dlat (float) – If supplied, use this increment for the latitude axis, instead of inferring it from the DataFrame’s index
lon_gridop (str | None) – If the DataFrame’s index does not have a ‘lon’ axis, a gridop for this axis must be supplied.
lat_gridop (str | None) – If the DataFrame’s index does not have a ‘lat’ axis, a gridop for this axis must be supplied.
- Return type:
An instance of the grid subclass.
- property lat
- property lat_max
- property lat_min
- property lats
- property lon
- property lon_max
- property lon_min
- property lons
- reduce(gridop: str, axis: str) Grid[source]
Create a new grid, reduced according to the parameters
- Parameters:
gridop (str) – The reduction operation
axis (str) – The axis to be reduced
- Return type:
An instance of GridLonLat
- canopy.core.grid.grid_lonlat.av_both(df: DataFrame, grid: GridLonLat) DataFrame[source]
Average data across the whole domain.
- Each gridcell value is weighted by its corresponding area element:
da = EARTH_RADIUS**2*dlon*dlat*cos(lat),
where the angles are in radians.
- Parameters:
df (pd.DataFrame) – The pandas DataFrame whose data is to be averaged.
grid (GridLonLat) – A GridLonLat object.
- Return type:
A reduced pandas DataFrame
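Since dlon, dlat and EARTH_RADIUS are constant over the grid, they cancel out of a weighted average, and only the cos(lat) factor remains as the effective weight. The following sketch shows this on plain lists; area_weighted_mean is a hypothetical helper, not the function documented here, and it ignores missing values, whose treatment in canopy is not described.

```python
import math

def area_weighted_mean(lats_deg, values):
    """Average values weighted by cos(latitude); latitudes are in degrees."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

lats = [0.0, 60.0]      # one cell at the equator, one at 60 degrees north
values = [1.0, 4.0]
print(round(area_weighted_mean(lats, values), 6))  # 2.0
print(sum(values) / len(values))                   # 2.5 (unweighted, for comparison)
```

The cell at 60 degrees covers half the area of the equatorial cell, so it contributes half the weight and the mean moves from 2.5 towards the equatorial value.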
- canopy.core.grid.grid_lonlat.av_lat(df: DataFrame, grid: GridLonLat) DataFrame[source]
Average data along the latitude axis.
- Parameters:
df (pd.DataFrame) – The pandas DataFrame whose data is to be averaged.
grid (GridLonLat) – A GridLonLat object.
- Return type:
A reduced pandas DataFrame
- canopy.core.grid.grid_lonlat.av_lon(df: DataFrame, grid: GridLonLat) DataFrame[source]
Average data along the longitude axis.
- Parameters:
df (pd.DataFrame) – The pandas DataFrame whose data is to be averaged.
grid (GridLonLat) – A GridLonLat object.
- Return type:
A reduced pandas DataFrame
- canopy.core.grid.grid_lonlat.sum_both(df: DataFrame, grid: GridLonLat) DataFrame[source]
Aggregate data across the whole domain.
- Each gridcell value is weighted by its corresponding area element:
da = EARTH_RADIUS**2*dlon*dlat*cos(lat),
where the angles are in radians.
- Parameters:
df (pd.DataFrame) – The pandas DataFrame whose data is to be aggregated.
grid (GridLonLat) – A GridLonLat object.
- Return type:
A reduced pandas DataFrame
- canopy.core.grid.grid_lonlat.sum_lat(df: DataFrame, grid: GridLonLat) DataFrame[source]
Aggregate data along the latitude axis.
- Parameters:
df (pd.DataFrame) – The pandas DataFrame whose data is to be aggregated.
grid (GridLonLat) – A GridLonLat object.
- Return type:
A reduced pandas DataFrame
- canopy.core.grid.grid_lonlat.sum_lon(df: DataFrame, grid: GridLonLat) DataFrame[source]
Aggregate data along the longitude axis.
- Parameters:
df (pd.DataFrame) – The pandas DataFrame whose data is to be aggregated.
grid (GridLonLat) – A GridLonLat object.
- Return type:
A reduced pandas DataFrame
canopy.core.grid.grid_sites
Grid associated to site-based data.
This Grid type actually represents the absence of a grid. It is meant to describe a collection of unrelated sites or locations. Spatial reduction operations are only defined for both axes (axis = ‘both’). Coordinates are longitude and latitude.
- canopy.core.grid.grid_sites.av_both(df: DataFrame, grid: Grid) DataFrame[source]
Spatially average the data.
On this ‘grid’, all sites count the same for the average.
- Parameters:
df (pd.DataFrame) – The pandas DataFrame whose data is to be averaged.
grid (GridSites) – A GridSites object.
- Return type:
A reduced pandas DataFrame
- canopy.core.grid.grid_sites.sum_both(df: DataFrame, grid: Grid) DataFrame[source]
Spatially aggregate the data.
On this ‘grid’, all sites count the same for the sum.
- Parameters:
df (pd.DataFrame) – The pandas DataFrame whose data is to be aggregated.
grid (GridSites) – A GridSites object.
- Return type:
A reduced pandas DataFrame
canopy.core.grid.registry
Registry functionality for grids and grid operations.
This module provides:
A registry of different Grid types: A dictionary where the keys are string identifiers for each type of grid (e.g. ‘sites’, ‘lonlat’…) and the values are the corresponding subclass of Grid (e.g. GridSites, GridLonLat…)
A registry of spatial reduction operations on the different grids: A dictionary where the keys are tuples of the form (grid_type, operation, axis), and the values are the functions that perform the operation on the dataframe. In the key, grid_type is the string identifier of the grid (‘sites’, ‘lonlat’…), operation is a string identifying the reduction operation, and axis is the axis name as it appears on the DataFrame index.
Examples
Suppose we want to register a cartesian grid and its associated operations. The first step is to create a new file in the grid/ folder. This file will contain the grid description and the supported grid operations. Let’s assume the file is grid/grid_xy.py. The code described below goes in this file.
We first define a new class, GridXY, inheriting from the Grid abstract base class. This class will be registered with the string identifier passed to the register_grid class decorator:
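from canopy.grid.grid_abc import Grid
from canopy.grid.registry import register_grid  # assumed location, mirroring the import above
grid_type = 'xy'  # String used to register this grid type.
@register_grid(grid_type)
class GridXY(Grid):
    def __init__(self):
        super().__init__(axis0='x', axis1='y')
    ...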
The Grid object’s documentation details the mandatory abstract methods to implement.
Next, we define the operations that are allowed on this grid, and register them with the register_gridop decorator. This decorator uses the name of the function to form a key that identifies the operation. The name of the function must have the format operation_axis. For example, to register an averaging operation along the x axis, we would do:
import pandas as pd
@register_gridop(grid_type)
def av_x(df: pd.DataFrame, grid: Grid) -> pd.DataFrame:
    ...
Note that all grid operations have the same signature.
Lastly, we need to import the new file, grid/grid_xy.py, in grid/__init__.py:
import canopy.grid.grid_xy
If all went well, the new grid type and grid operations will be available. For example, the following should work:
from canopy.field import Field
field = Field.from_file(path, grid_type='xy')
field.reduce_grid('av', 'x', inplace=True)
print(field)
- canopy.core.grid.registry.check_gridop(grid: str | Grid, gridop: str, axis: str) None[source]
Check if a grid operation is registered.
- Parameters:
grid (str | Grid) – The grid type’s string identifier, or an instance of a Grid subclass describing the grid type.
gridop (str) – The string identifying the grid operation (e.g., ‘av’ for average).
axis (str) – The name of the axis along which the operation is performed (e.g. ‘lon’)
- canopy.core.grid.registry.create_grid(grid_type: str, **kwargs) Grid[source]
Create an instance of a Grid object of the specified type.
- Parameters:
grid_type (str) – The string identifying the grid type to create.
**kwargs – The keyword arguments are forwarded to the selected grid’s constructor.
- Return type:
An instance of a subclass of Grid, specified by the grid_type parameter.
Examples
# Create a 'lonlat' type grid
from canopy.grid import create_grid
grid = create_grid('lonlat', lon_min = -12.25, lon_max = 10.75, dlon = 0.5, lat_min = 20.25, lat_max = 40.25, dlat = 0.5)
- canopy.core.grid.registry.get_grid(grid_type: str) Type[Grid][source]
Get a reference to the uninstantiated Grid subclass of the specified type.
- Parameters:
grid_type (str) – The grid type identifier (e.g., ‘lonlat’, ‘sites’)
- Return type:
The Grid subclass type registered under the specified grid_type string.
- canopy.core.grid.registry.get_grid_type(grid: Grid) str[source]
Get the grid type string identifier of the passed Grid instance.
- Parameters:
grid (Grid) – An instance of a Grid subclass.
- Return type:
The grid type’s string identifier.
- canopy.core.grid.registry.get_gridop(grid: str | Grid, gridop: str, axis: str) Callable[[DataFrame, Grid], DataFrame][source]
Retrieve a grid operation function from the registry.
- Parameters:
grid (str | Grid) – The grid type’s string identifier, or an instance of a Grid subclass describing the grid type.
gridop (str) – The string identifying the grid operation (e.g., ‘av’ for average).
axis (str) – The name of the axis along which the operation is performed (e.g. ‘lon’)
- Return type:
The function that performs the selected grid operation.
- canopy.core.grid.registry.register_grid(cls) Grid[source]
Add decorated Grid subclass to the grid registry.
- Parameters:
grid_type (str) – A string that identifies the type of Grid being registered.
- Return type:
The Grid type that is registered
- canopy.core.grid.registry.register_gridop(grid_type: str, gridop: str, axis: str)[source]
Add decorated function to the grid operations registry.
- Parameters:
grid_type (str) – The grid type string identifier.
- Return type:
A decorator to register a grid operation.
Notes
The name of the grid operation (gridop) can be any string, but it should make sense and be consistent with all the other gridops (for example, if you want to register averaging operations for axes ‘x’ and ‘y’, don’t use ‘mean’ for the ‘x’ axis and ‘av’ for the ‘y’ axis).
The names of the axes, however, must be consistent with the names of the indices on the DataFrame that the operation is meant to act on.
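The registry mechanism described in this module can be sketched with a plain dictionary and a decorator factory. The code below is a toy re-implementation for illustration, not canopy's source: it follows the decorator form used in the Examples section (the key is derived from a function name of the form operation_axis), uses nested lists instead of DataFrames, and the ValueError raised for a missing operation is an assumption.

```python
GRIDOPS = {}  # (grid_type, operation, axis) -> function

def register_gridop(grid_type):
    """Return a decorator that files a function under a key derived from its name."""
    def decorator(func):
        operation, axis = func.__name__.split('_', 1)  # 'av_x' -> ('av', 'x')
        GRIDOPS[(grid_type, operation, axis)] = func
        return func
    return decorator

def get_gridop(grid_type, gridop, axis):
    """Look up a registered operation, failing with a readable message."""
    try:
        return GRIDOPS[(grid_type, gridop, axis)]
    except KeyError:
        raise ValueError(
            f"No '{gridop}' operation on axis '{axis}' for grid '{grid_type}'"
        ) from None

@register_gridop('xy')
def av_x(rows):
    """Average each row of a nested list along x (hypothetical data layout)."""
    return [sum(row) / len(row) for row in rows]

print(sorted(GRIDOPS))                                        # [('xy', 'av', 'x')]
print(get_gridop('xy', 'av', 'x')([[1.0, 3.0], [2.0, 4.0]]))  # [2.0, 3.0]
```

Deriving the key from the function name is what makes consistent naming matter: registering 'mean_x' alongside 'av_y' would produce two different operation identifiers for the same kind of reduction.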