Reference for SWMManywhere/metric_utilities.py
Metric utilities module for SWMManywhere.
A module for metrics, the metrics registry object and utilities for calculating metrics (such as NSE or timeseries data alignment) used in SWMManywhere.
MetricRegistry
Bases: dict
Registry object.
Source code in swmmanywhere/metric_utilities.py
__getattr__(name)
register(func)
Register a metric.
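As an illustration of the pattern described above, the following is a minimal sketch of a dict-based registry with a `register` decorator and attribute-style lookup. The names `SketchRegistry`, `sketch_metrics` and `constant_metric` are hypothetical, and the duplicate-name check is an assumption; the actual MetricRegistry may validate registered functions differently.

```python
from typing import Callable


class SketchRegistry(dict):
    """Hypothetical dict-based registry; not the SWMManywhere implementation."""

    def register(self, func: Callable) -> Callable:
        """Store func under its own name and return it, so it works as a decorator."""
        if func.__name__ in self:
            # Assumption: duplicate names are rejected.
            raise ValueError(f"{func.__name__} is already registered.")
        self[func.__name__] = func
        return func

    def __getattr__(self, name: str) -> Callable:
        """Allow registry.name as shorthand for registry['name']."""
        try:
            return self[name]
        except KeyError:
            raise AttributeError(f"{name} is not a registered metric.") from None


sketch_metrics = SketchRegistry()


@sketch_metrics.register
def constant_metric(**kwargs) -> float:
    """Toy metric that ignores its inputs."""
    return 1.0
```

Because the registry is a dict, registered metrics can be listed with `list(sketch_metrics)` and called either as `sketch_metrics["constant_metric"]()` or `sketch_metrics.constant_metric()`.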
align_by_id(synthetic_results, real_results, variable, syn_ids, real_ids)
Align and interpolate data by id.
Aggregate synthetic and real results by date for specific ids (i.e., sum over all ids, so that only one aggregated timeseries is compared). Align the synthetic and real dates. Where the synthetic data does not overlap the real data, the value is interpolated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synthetic_results | DataFrame | The synthetic results. | required |
real_results | DataFrame | The real results. | required |
variable | str | The variable to align and calculate coef_func for. | required |
syn_ids | list | The ids of the synthetic data to subselect for. | required |
real_ids | list | The ids of the real data to subselect for. | required |
coef_func | Callable | The coefficient to calculate. Defaults to nse. | required |
Returns:
Type | Description |
---|---|
DataFrame | The aligned and interpolated data. |
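The aggregate-then-align idea can be sketched in plain Python. This is only an illustration under stated assumptions: records are `(time, id, value)` tuples rather than DataFrames, the synthetic series is interpolated linearly onto the real timestamps, and the helper names (`sum_by_time`, `interpolate_at`, `align_sketch`) are hypothetical. The interpolation method used by align_by_id itself is not stated here.

```python
from collections import defaultdict


def sum_by_time(records, ids):
    """Sum values over the selected ids at each timestamp.

    `records` is an iterable of (time, id, value) tuples.
    """
    totals = defaultdict(float)
    for time, record_id, value in records:
        if record_id in ids:
            totals[time] += value
    return dict(sorted(totals.items()))


def interpolate_at(series, time):
    """Linearly interpolate a {time: value} series at `time`.

    Returns None outside the time range covered by the series.
    """
    if time in series:
        return series[time]
    earlier = [t for t in series if t < time]
    later = [t for t in series if t > time]
    if not earlier or not later:
        return None
    t0, t1 = max(earlier), min(later)
    weight = (time - t0) / (t1 - t0)
    return series[t0] + weight * (series[t1] - series[t0])


def align_sketch(synthetic, real, syn_ids, real_ids):
    """Return (time, value_syn, value_real) rows on the real timestamps."""
    syn_series = sum_by_time(synthetic, syn_ids)
    real_series = sum_by_time(real, real_ids)
    return [
        (time, interpolate_at(syn_series, time), value)
        for time, value in real_series.items()
    ]


# Two synthetic ids reported at t=0 and t=2; one real id reported at t=0, 1, 2.
synthetic = [(0, "a", 1.0), (0, "b", 1.0), (2, "a", 3.0), (2, "b", 3.0)]
real = [(0, "x", 2.0), (1, "x", 5.0), (2, "x", 6.0)]
aligned = align_sketch(synthetic, real, {"a", "b"}, {"x"})
```

In this toy case the synthetic total at t=1 is filled in as the midpoint of its neighbours, so each real timestamp has a synthetic value to compare against.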
align_by_shape(var, synthetic_results, real_results, shapes, synthetic_G, real_G, key='sub_id')
Align by subcatchment.
Align synthetic and real results by shape and return the results. If multiple ids exist in the same shape, these are aggregated by sum.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
var | str | The variable to align. | required |
synthetic_results | DataFrame | The synthetic results. | required |
real_results | DataFrame | The real results. | required |
shapes | GeoDataFrame | The shapes to align by (e.g., grid or real_subs). | required |
synthetic_G | Graph | The synthetic graph. | required |
real_G | Graph | The real graph. | required |
key | str | The column to align by. | 'sub_id' |
best_outfall_match(synthetic_G, real_subs)
Best outfall match.
Identify the outfall with the most nodes within the real_subs and return the subgraph of the synthetic graph of nodes that drain to that outfall.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synthetic_G | Graph | The synthetic graph. | required |
real_subs | GeoDataFrame | The real subcatchments. | required |
Returns:
Type | Description |
---|---|
Graph | The subgraph of the synthetic graph for the outfall with the most nodes within the real_subs. Empty if no match is made. |
int \| None | The id of the outfall. None if no outfall is found. |
bias_flood_depth(synthetic_results, real_results, synthetic_subs, real_subs, **kwargs)
Run the evaluated metric.
create_grid(bbox, scale)
Create a grid of polygons.
Create a grid of polygons based on the bounding box and scale.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
bbox | tuple | The bounding box coordinates in the format (minx, miny, maxx, maxy). | required |
scale | float \| tuple | The scale of the grid. If a tuple, the scale is (dx, dy). Otherwise, the scale is dx = dy = scale. | required |
Returns:
Type | Description |
---|---|
GeoDataFrame | A geodataframe of the grid. |
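To make the `bbox` and `scale` conventions concrete, here is a minimal sketch that returns grid cells as plain `(minx, miny, maxx, maxy)` tuples instead of a GeoDataFrame. The function name `grid_cells` is hypothetical, and letting the last row or column overhang the bounding box is an assumption; create_grid may handle the edges differently.

```python
import math


def grid_cells(bbox, scale):
    """Tile a bounding box with rectangular cells of size (dx, dy)."""
    minx, miny, maxx, maxy = bbox
    # A scalar scale means square cells: dx = dy = scale.
    dx, dy = scale if isinstance(scale, tuple) else (scale, scale)
    n_cols = math.ceil((maxx - minx) / dx)
    n_rows = math.ceil((maxy - miny) / dy)
    # Cells are built from integer indices to avoid accumulating float error.
    return [
        (minx + i * dx, miny + j * dy, minx + (i + 1) * dx, miny + (j + 1) * dy)
        for j in range(n_rows)
        for i in range(n_cols)
    ]


cells = grid_cells((0, 0, 2, 1), 1)
```

Each tuple could then be turned into a polygon (for example with `shapely.geometry.box`) if a geometry column is needed.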
create_subgraph(G, nodes)
Create a subgraph.
Create a subgraph of G based on the nodes list. Taken from networkx documentation: https://networkx.org/documentation/stable/reference/classes/generated/networkx.Graph.subgraph.html
Parameters:
Name | Type | Description | Default |
---|---|---|---|
G | Graph | The original graph. | required |
nodes | list | The list of nodes to include in the subgraph. | required |
Returns:
Type | Description |
---|---|
Graph | The subgraph. |
dominant_outfall(G, results)
Dominant outfall.
Identify the outfall with highest flow along it and return the subgraph of the graph of nodes that drain to that outfall.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
G | DiGraph | The graph. | required |
results | DataFrame | The results, which include a 'flow' and 'id' column. | required |
Returns:
Type | Description |
---|---|
DiGraph | The subgraph of nodes/arcs that drain to the outfall with the highest flow. |
int | The id of the outfall. |
edge_betweenness_centrality(G, normalized=True, weight='weight', njobs=-1)
Parallel betweenness centrality function.
extract_var(df, var)
Extract var from a dataframe.
grid(synthetic_results, synthetic_subs, synthetic_G, real_results, real_subs, real_G, metric_evaluation, var, coef_func)
Grid scale metric.
Classify synthetic nodes to a grid and calculate the coef_func of a variable over time for each grid cell. The metric produced is the median coef_func across all grid cells.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synthetic_results | DataFrame | The synthetic results. | required |
synthetic_subs | GeoDataFrame | The synthetic subcatchments. | required |
synthetic_G | Graph | The synthetic graph. | required |
real_results | DataFrame | The real results. | required |
real_subs | GeoDataFrame | The real subcatchments. | required |
real_G | Graph | The real graph. | required |
metric_evaluation | MetricEvaluation | The metric evaluation parameters. | required |
var | str | The variable to calculate the coefficient for. | required |
coef_func | Callable | The coefficient to calculate. | required |
Returns:
Type | Description |
---|---|
float | The median coef_func value. |
iterate_metrics(synthetic_results=None, synthetic_subs=None, synthetic_G=None, real_results=None, real_subs=None, real_G=None, metric_list=None, metric_evaluation=None)
Iterate a list of metrics over a graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synthetic_results | DataFrame \| None | The synthetic results. | None |
synthetic_subs | GeoDataFrame \| None | The synthetic subcatchments. | None |
synthetic_G | Graph \| None | The synthetic graph. | None |
real_results | DataFrame \| None | The real results. | None |
real_subs | GeoDataFrame \| None | The real subcatchments. | None |
real_G | Graph \| None | The real graph. | None |
metric_list | list[str] \| None | A list of metrics to iterate. | None |
metric_evaluation | MetricEvaluation \| None | The metric evaluation parameters. | None |
Returns:
Type | Description |
---|---|
dict[str, float] | The results of the metrics. |
kge(y, yhat)
Calculate the Kling-Gupta Efficiency (KGE) between simulated and observed data.
Calculate KGE with the 2009 formulation: $$ KGE = 1 - \sqrt{ (r - 1)^2 + (\frac{\sigma_{sim}}{\sigma_{obs}} - 1)^2 + (\frac{\mu_{sim}}{\mu_{obs}} - 1)^2 } $$
where:
- \(r\) is the correlation coefficient between the observed and simulated values,
- \(\sigma_{sim}\) and \(\sigma_{obs}\) are the standard deviations of the simulated and observed values, respectively,
- \(\mu_{sim}\) and \(\mu_{obs}\) are the means of the simulated and observed values, respectively.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | array | Observed data array. | required |
yhat | array | Simulated data array. | required |
Returns:
Type | Description |
---|---|
float | The KGE value. |
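The 2009 formulation above can be written out directly. The following is a minimal standard-library sketch, assuming plain Python sequences of equal length and population standard deviations; `kge_sketch` is a hypothetical name and not the module's own `kge`.

```python
import math
from statistics import mean, pstdev


def kge_sketch(y, yhat):
    """Kling-Gupta Efficiency (2009) for observed y and simulated yhat."""
    mu_obs, mu_sim = mean(y), mean(yhat)
    sigma_obs, sigma_sim = pstdev(y), pstdev(yhat)
    # Pearson correlation from the population covariance.
    covariance = mean([(o - mu_obs) * (s - mu_sim) for o, s in zip(y, yhat)])
    r = covariance / (sigma_obs * sigma_sim)
    # Undefined (division by zero) if either series is constant or mu_obs is 0.
    return 1 - math.sqrt(
        (r - 1) ** 2
        + (sigma_sim / sigma_obs - 1) ** 2
        + (mu_sim / mu_obs - 1) ** 2
    )


perfect = kge_sketch([1, 2, 3], [1, 2, 3])
doubled = kge_sketch([1, 2, 3], [2, 4, 6])
```

A perfect simulation gives a KGE of 1. Doubling every simulated value keeps the correlation at 1 but sets both the variability and bias ratios to 2, so the result is \(1 - \sqrt{2}\).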
kstest_betweenness(synthetic_G, real_G, **kwargs)
Run the evaluated metric.
kstest_edge_betweenness(synthetic_G, real_G, **kwargs)
Run the evaluated metric.
median_coef_by_group(results, gb_key, coef_func=nse)
Median coef_func value by group.
Calculate the coef_func value of a variable over time for each group in the results dataframe, and return the median of these values. Assumes that the results dataframe has 'value_real' and 'value_syn' columns and that these are properly aligned.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
results | DataFrame | The results dataframe. | required |
gb_key | str | The column to group by. | required |
coef_func | Callable | The coefficient to calculate. Default is nse. | nse |
Returns:
Type | Description |
---|---|
float | The median coef_func value. |
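The group-then-median logic can be sketched without pandas. In this illustration the results are a list of dicts with 'value_real' and 'value_syn' keys rather than a DataFrame, and `nse_sketch` and `median_coef_sketch` are hypothetical names. Any filtering of undefined coefficient values that the real function may perform is not shown.

```python
from collections import defaultdict
from statistics import median


def nse_sketch(y, yhat):
    """Nash-Sutcliffe efficiency for observed y and simulated yhat."""
    mean_obs = sum(y) / len(y)
    numerator = sum((o - s) ** 2 for o, s in zip(y, yhat))
    denominator = sum((o - mean_obs) ** 2 for o in y)
    return 1 - numerator / denominator


def median_coef_sketch(rows, gb_key, coef_func=nse_sketch):
    """Apply coef_func within each group and return the median across groups."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        real_values, syn_values = groups[row[gb_key]]
        real_values.append(row["value_real"])
        syn_values.append(row["value_syn"])
    return median(coef_func(real, syn) for real, syn in groups.values())


# Three groups sharing the same real series but with different synthetic series.
rows = [
    {"sub_id": group, "value_real": real, "value_syn": syn}
    for group, syn_series in {"a": [1, 2, 3], "b": [2, 2, 2], "c": [3, 2, 1]}.items()
    for real, syn in zip([1, 2, 3], syn_series)
]
result = median_coef_sketch(rows, "sub_id")
```

Here the per-group NSE values are 1, 0 and -3, so the median is 0. Using the median rather than the mean keeps one badly performing group from dominating the score.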
metric_factory(name)
Create a metric function.
A factory function to create a metric function based on the name. The first part of the name is the scale, the second part is the metric, and the third part is the variable. For example, 'grid_nse_flooding' is a metric function that calculates the NSE of flooding at the grid scale.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the metric. | required |
Returns:
Type | Description |
---|---|
Callable | The metric function. |
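The scale/metric/variable naming convention can be illustrated with a small factory. This is a sketch under assumptions: `toy_scales` and `toy_coefs` are hypothetical stand-ins for the module's registries, the name is split on the first two underscores only, and restrictions are not checked. The real metric_factory may parse and validate names differently.

```python
from typing import Callable

# Hypothetical stand-ins for the scale and coefficient registries.
toy_scales: dict[str, Callable] = {}
toy_coefs: dict[str, Callable] = {}


def toy_metric_factory(name: str) -> Callable:
    """Build a metric function from a '<scale>_<metric>_<variable>' name."""
    parts = name.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"Expected '<scale>_<metric>_<variable>', got '{name}'.")
    scale, metric, variable = parts
    scale_func = toy_scales[scale]
    coef_func = toy_coefs[metric]

    def new_metric(**kwargs) -> float:
        # The scale function receives the variable and coefficient as extras.
        return scale_func(var=variable, coef_func=coef_func, **kwargs)

    new_metric.__name__ = name
    return new_metric


# Toy registrations: a 'grid' scale that compares two series directly, and a
# 'maxerr' coefficient returning the largest absolute difference.
toy_scales["grid"] = lambda var, coef_func, **kw: coef_func(kw["real"], kw["synthetic"])
toy_coefs["maxerr"] = lambda y, yhat: max(abs(o - s) for o, s in zip(y, yhat))

metric = toy_metric_factory("grid_maxerr_flooding")
value = metric(real=[1, 2], synthetic=[1, 4])
```

The closure captures the parsed variable and coefficient, so the generated function can be called with the same keyword arguments as any other metric.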
nc_adjacency_dist(synthetic_G, real_G, **kwargs)
Run the evaluated metric.
nc_compare(G1, G2, funcname, **kw)
nc_deltacon0(synthetic_G, real_G, **kwargs)
nc_laplacian_dist(synthetic_G, real_G, **kwargs)
Run the evaluated metric.
nc_laplacian_norm_dist(synthetic_G, real_G, **kwargs)
Run the evaluated metric.
nc_resistance_distance(synthetic_G, real_G, **kwargs)
Run the evaluated metric.
nc_vertex_edge_distance(synthetic_G, real_G, **kwargs)
Run the evaluated metric.
The result is subtracted from 1 because this metric measures similarity rather than distance.
nodes_to_subs(G, subs)
Nodes to subcatchments.
Classify the nodes of the graph to the subcatchments of the subs dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
G | Graph | The graph. | required |
subs | GeoDataFrame | The subcatchments. | required |
Returns:
Type | Description |
---|---|
GeoDataFrame | A dataframe from the nodes and data, and the subcatchment information, distinguished by the column 'sub_id'. |
nse(y, yhat)
Calculate Nash-Sutcliffe efficiency (NSE).
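Calculate the Nash-Sutcliffe efficiency (NSE): $$ NSE = 1 - \frac{\sum_{i=1}^{n} (Q_{obs,i} - Q_{sim,i})^2}{\sum_{i=1}^{n} (Q_{obs,i} - \overline{Q}_{obs})^2} $$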
where:
- \(Q_{obs,i}\) is the observed value at time \(i\),
- \(Q_{sim,i}\) is the simulated value at time \(i\),
- \(\overline{Q}_{obs}\) is the mean observed value over the simulation period,
- \(n\) is the number of time steps in the simulation period.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | array | Observed data array. | required |
yhat | array | Simulated data array. | required |
Returns:
Type | Description |
---|---|
float | The NSE value. |
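As a worked illustration of the standard NSE definition (one minus the ratio of squared simulation error to the squared deviation of observations from their mean), here is a minimal standard-library sketch. It assumes plain Python sequences of equal length; `nse_sketch` is a hypothetical name and not the module's own `nse`.

```python
def nse_sketch(y, yhat):
    """Nash-Sutcliffe efficiency for observed y and simulated yhat."""
    mean_obs = sum(y) / len(y)
    # Squared error of the simulation against the observations.
    numerator = sum((obs - sim) ** 2 for obs, sim in zip(y, yhat))
    # Squared deviation of the observations from their own mean.
    denominator = sum((obs - mean_obs) ** 2 for obs in y)
    # Undefined (division by zero) if the observations are constant.
    return 1 - numerator / denominator


perfect = nse_sketch([1, 2, 3], [1, 2, 3])
mean_only = nse_sketch([1, 2, 3], [2, 2, 2])
```

A perfect simulation scores 1, while a simulation that always predicts the observed mean scores 0. Negative values mean the simulation is worse than that mean-only baseline.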
outfall(synthetic_results, synthetic_subs, synthetic_G, real_results, real_subs, real_G, metric_evaluation, var, coef_func)
Outfall scale metric.
Calculate the coefficient of a variable for the subgraph that drains to the dominant outfall node. The dominant outfall node of the 'real' network is calculated by dominant_outfall, while the dominant outfall node of the 'synthetic' network is calculated by best_outfall_match.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synthetic_results | DataFrame | The synthetic results. | required |
synthetic_subs | GeoDataFrame | The synthetic subcatchments. | required |
synthetic_G | Graph | The synthetic graph. | required |
real_results | DataFrame | The real results. | required |
real_subs | GeoDataFrame | The real subcatchments. | required |
real_G | Graph | The real graph. | required |
metric_evaluation | MetricEvaluation | The metric evaluation parameters. | required |
var | str | The variable to calculate the coefficient for. | required |
coef_func | Callable | The coefficient to calculate. | required |
Returns:
Type | Description |
---|---|
float | The median coef_func value. |
outfall_kstest_diameters(real_G, synthetic_G, real_results, real_subs, **kwargs)
Outfall KStest diameters.
Calculate the Kolmogorov-Smirnov statistic of the diameters in the subgraph that drains to the dominant outfall node. The dominant outfall node of the 'real' network is calculated by dominant_outfall, while the dominant outfall node of the 'synthetic' network is calculated by best_outfall_match.
register_coef(coef_func)
Register a coefficient function.
Register a coefficient function to the coef_registry. The function should take two arguments, 'y' and 'yhat', and return a float. The function should be registered with the '@register_coef' decorator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
coef_func | Callable | The coefficient function to register. | required |
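The decorator pattern described above might look like the following sketch. The names `coef_registry_sketch`, `register_coef_sketch` and `mae` are hypothetical, and the signature check is an assumption about how the `y`/`yhat` requirement could be enforced, not a description of what register_coef does.

```python
import inspect

# Hypothetical stand-in for the module's coef_registry.
coef_registry_sketch = {}


def register_coef_sketch(coef_func):
    """Register a coefficient function that takes 'y' and 'yhat'."""
    params = list(inspect.signature(coef_func).parameters)
    if params != ["y", "yhat"]:
        # Assumption: a mismatched signature is rejected at registration time.
        raise ValueError(f"{coef_func.__name__} must take arguments (y, yhat).")
    coef_registry_sketch[coef_func.__name__] = coef_func
    return coef_func


@register_coef_sketch
def mae(y, yhat):
    """Toy coefficient: mean absolute error."""
    return sum(abs(obs - sim) for obs, sim in zip(y, yhat)) / len(y)


value = coef_registry_sketch["mae"]([1, 2], [2, 4])
```

Returning the function unchanged from the decorator means it stays usable under its own name as well as through the registry.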
register_restriction(restriction_func)
Register a restriction function.
Register a restriction function to the restriction_registry. A restriction disallows certain combinations of scale, metric and variable within the metric_factory. The function should take three arguments, 'scale', 'metric', and 'variable', and should raise a ValueError if the combination is not allowed. The function should be registered with the '@register_restriction' decorator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
restriction_func | Callable | The restriction function to register. | required |
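A restriction is simply a function that raises a ValueError for a disallowed combination. The sketch below shows one way such functions could be registered and applied. The registry, the decorator, the `check_restrictions` helper and the 'toyvar' rule are all hypothetical and only illustrate the three-argument contract.

```python
# Hypothetical stand-in for the module's restriction_registry.
restriction_registry_sketch = {}


def register_restriction_sketch(restriction_func):
    """Register a restriction taking (scale, metric, variable)."""
    restriction_registry_sketch[restriction_func.__name__] = restriction_func
    return restriction_func


@register_restriction_sketch
def toyvar_not_on_grid(scale, metric, variable):
    """Toy rule: the made-up variable 'toyvar' may not be used at grid scale."""
    if variable == "toyvar" and scale == "grid":
        raise ValueError("'toyvar' is not allowed at the 'grid' scale.")


def check_restrictions(scale, metric, variable):
    """Run every registered restriction; any of them may raise ValueError."""
    for restriction in restriction_registry_sketch.values():
        restriction(scale, metric, variable)


# An allowed combination passes silently.
check_restrictions("outfall", "nse", "toyvar")
```

Keeping each rule in its own function lets new restrictions be added without editing the factory that applies them.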
register_scale(scale_func)
Register a scale function.
Register a scale function to the scale_registry. A scale function is called as a metric, but with some additional arguments provided (i.e., the variable name and the coefficient function to use), and should return a float. The function should be registered with the '@register_scale' decorator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scale_func | Callable | The scale function to register. | required |
relerror(y, yhat)
Relative error, relerror.
Calculate the relative error: $$ RE = \frac{synthetic - real}{real} $$
where:
- \(synthetic\) is the synthetic data,
- \(real\) is the real data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | ndarray | The real data. | required |
yhat | ndarray | The synthetic data. | required |
Returns:
Type | Description |
---|---|
float | The relerror value. |
restriction_on_metric(scale, metric, variable)
Restriction on metric.
Restrict the design variables to use 'relerror' only.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scale | str | The scale of the metric. | required |
metric | str | The metric. | required |
variable | str | The variable. | required |
restriction_on_scale(scale, metric, variable)
Restriction on scale.
Restrict the design variables to the outfall scale if the metric is 'relerror'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scale | str | The scale of the metric. | required |
metric | str | The metric. | required |
variable | str | The variable. | required |
subcatchment(synthetic_results, synthetic_subs, synthetic_G, real_results, real_subs, real_G, metric_evaluation, var, coef_func)
Subcatchment scale metric.
Calculate the coefficient (coef_func) of a variable over time, aggregated to the real subcatchment scale. The metric produced is the median coef_func across all subcatchments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synthetic_results | DataFrame | The synthetic results. | required |
synthetic_subs | GeoDataFrame | The synthetic subcatchments. | required |
synthetic_G | Graph | The synthetic graph. | required |
real_results | DataFrame | The real results. | required |
real_subs | GeoDataFrame | The real subcatchments. | required |
real_G | Graph | The real graph. | required |
metric_evaluation | MetricEvaluation | The metric evaluation parameters. | required |
var | str | The variable to calculate the coefficient for. | required |
coef_func | Callable | The coefficient to calculate. | required |
Returns:
Type | Description |
---|---|
float | The median coef_func value. |
validate_metric_list(metric_list)
Validate a list of metrics.
Validate that all metrics in the metric list are registered.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metric_list | list[str] | A list of metrics to validate. | required |
Raises:
Type | Description |
---|---|
ValueError | If a metric is not registered. |
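The validation step amounts to a membership check against the registry. Here is a minimal sketch, assuming the registry behaves like a dict keyed by metric name. The function `validate_sketch`, its explicit `registry` argument and the toy registry contents are hypothetical; the real function takes only `metric_list`.

```python
def validate_sketch(metric_list, registry):
    """Raise ValueError if any requested metric name is not in the registry."""
    missing = [name for name in metric_list if name not in registry]
    if missing:
        raise ValueError(f"Metrics are not registered: {missing}")


# Toy registry with two made-up metric names.
toy_registry = {"metric_a": lambda **kw: 0.0, "metric_b": lambda **kw: 1.0}

# All names are present, so this passes silently.
validate_sketch(["metric_a", "metric_b"], toy_registry)
```

Collecting every missing name before raising gives a single error message that lists all the problems at once.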