area_under_curve

rojak.turbulence.metrics.area_under_curve(x_values: da.Array | NumpyOrDataArray, y_values: da.Array | NumpyOrDataArray) → float

Area under the curve

Integrates y_values over x_values with the trapezoidal rule, using scipy.integrate.trapezoid(). For dask.array.Array inputs, the integral is evaluated on each chunk and the per-chunk results are combined into a single value.

Parameters:
  • x_values (da.Array | NumpyOrDataArray) – 1D array of points corresponding to the y values

  • y_values (da.Array | NumpyOrDataArray) – 1D array to integrate

Returns:

Area under the curve

Return type:

float

Adapting the examples from the scipy.integrate.trapezoid() documentation:

>>> area_under_curve(np.asarray([4, 5, 6]), np.asarray([1, 2, 3]))
4.0
>>> area_under_curve(da.asarray([4, 5, 6]), da.asarray([1, 2, 3]))
4.0
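The NumPy and Dask calls above agree because a trapezoidal integral can be split into pieces and summed. The sketch below illustrates that idea with plain NumPy and SciPy only. It is not the library's implementation: `chunked_trapezoid` and `chunk_size` are hypothetical names, and the one-point overlap between neighbouring chunks is an assumption about how the boundary intervals could be handled.

```python
import numpy as np
from scipy.integrate import trapezoid


def chunked_trapezoid(x, y, chunk_size):
    """Hypothetical helper: integrate y over x one chunk at a time."""
    total = 0.0
    for start in range(0, len(x) - 1, chunk_size):
        # Include one extra point so the interval that straddles two
        # neighbouring chunks is not dropped from the sum.
        stop = min(start + chunk_size + 1, len(x))
        total += trapezoid(y[start:stop], x=x[start:stop])
    return float(total)


x = np.asarray([4.0, 5.0, 6.0])
y = np.asarray([1.0, 2.0, 3.0])

whole = float(trapezoid(y, x=x))                  # single pass over all points
chunked = chunked_trapezoid(x, y, chunk_size=1)   # one interval per chunk
```

Both `whole` and `chunked` evaluate to 4.0 here, matching the outputs shown above.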

Because this function is intended for computing the AUC of a ROC curve, the area should always be positive, even when the x values are decreasing. For the inputs below it therefore returns 8.0, not the -8.0 that scipy.integrate.trapezoid() returns.

>>> area_under_curve(np.asarray([8, 6, 4]), np.asarray([1, 2, 3]))
8.0
>>> area_under_curve(da.asarray([8, 6, 4]), da.asarray([1, 2, 3]))
8.0
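For comparison, the short sketch below calls scipy.integrate.trapezoid() directly on the same decreasing x values. Taking the absolute value is only one way to obtain the positive result and is an assumption made for illustration; the variable names are hypothetical.

```python
import numpy as np
from scipy.integrate import trapezoid

x_decreasing = np.asarray([8, 6, 4])
y = np.asarray([1, 2, 3])

# Each interval has a negative width, so SciPy reports a negative area.
signed_area = float(trapezoid(y, x=x_decreasing))

# Assumed fix for illustration: drop the sign introduced by the x ordering.
positive_area = abs(signed_area)
```

Here `signed_area` is -8.0 and `positive_area` is 8.0.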

Returning a positive area for decreasing x values is consistent with sklearn.metrics.auc(). Adapting the example from the scikit-learn documentation:

>>> y = np.asarray([1, 1, 2, 2])
>>> scores = np.asarray([0.1, 0.4, 0.35, 0.8])
>>> decrease_idx = np.argsort(scores)[::-1]
>>> scores = da.asarray(scores[decrease_idx], chunks=2)
>>> y = da.asarray(y[decrease_idx], chunks=2)
>>> roc = received_operating_characteristic(y, scores, positive_classification_label=2)
>>> area_under_curve(roc.false_positives, roc.true_positives)
0.75
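The value 0.75 can be checked by hand with NumPy and SciPy alone. The sketch below builds the ROC points for the same labels and scores from cumulative counts. It is an independent check under stated assumptions (no tied scores, label 2 is the positive class) and does not reproduce the internals of the functions used above; all variable names are illustrative.

```python
import numpy as np
from scipy.integrate import trapezoid

labels = np.asarray([1, 1, 2, 2])
scores = np.asarray([0.1, 0.4, 0.35, 0.8])

# Sort by decreasing score so each step lowers the decision threshold.
order = np.argsort(scores)[::-1]
is_positive = labels[order] == 2

# Cumulative counts give the rates at each threshold; prepend the (0, 0) point.
true_positive_rate = np.concatenate(
    ([0.0], np.cumsum(is_positive) / is_positive.sum())
)
false_positive_rate = np.concatenate(
    ([0.0], np.cumsum(~is_positive) / (~is_positive).sum())
)

# The false positive rate is non-decreasing here, so no sign fix is needed.
auc = float(trapezoid(true_positive_rate, x=false_positive_rate))
```

The curve passes through (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), so `auc` is 0.25 + 0.5 = 0.75.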