binary_classification_curve

rojak.turbulence.metrics.binary_classification_curve(sorted_truth: Array, sorted_values: Array, num_intervals: int = 100, positive_classification_label: int | float | bool | str | None = None) → BinaryClassificationResult

Binary classification curve

Parameters:
Returns:

A named tuple with the false positive counts, the true positive counts, and the thresholds

Return type:

BinaryClassificationResult

Modifying the example in the scikit-learn documentation for roc_curve:

>>> import dask.array as da
>>> import numpy as np
>>> from rojak.turbulence.metrics import binary_classification_curve
>>> y = np.asarray([1, 1, 2, 2])
>>> scores = np.asarray([0.1, 0.4, 0.35, 0.8])
>>> binary_classification_curve(da.asarray(y), da.asarray(scores), positive_classification_label=2)
Traceback (most recent call last):
ValueError: values must be strictly decreasing

Scikit-learn does not require the arrays to be sorted. This implementation, however, operates on dask arrays, and there is no built-in way to sort a dask array. The arrays passed into this function must therefore already be sorted so that the values are strictly decreasing.

>>> decrease_idx = np.argsort(scores)[::-1]
>>> scores = da.asarray(scores[decrease_idx])
>>> y = da.asarray(y[decrease_idx])
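
The ordering requirement can be checked in plain NumPy before the arrays are wrapped in dask. The sketch below is only an illustration: is_strictly_decreasing is a hypothetical helper, not part of rojak, and it assumes that "strictly decreasing" means every successive difference is negative, so tied scores would also be rejected.

```python
import numpy as np


def is_strictly_decreasing(values: np.ndarray) -> bool:
    """Hypothetical helper: True if every element is smaller than its predecessor."""
    # np.diff gives values[i + 1] - values[i]; all must be negative.
    return bool(np.all(np.diff(values) < 0))


scores = np.asarray([0.1, 0.4, 0.35, 0.8])

# The unsorted scores from the example fail the check.
unsorted_ok = is_strictly_decreasing(scores)

# After reordering with the same argsort trick, the check passes.
decrease_idx = np.argsort(scores)[::-1]
sorted_ok = is_strictly_decreasing(scores[decrease_idx])

# Tied values are not strictly decreasing under this assumption.
ties_ok = is_strictly_decreasing(np.asarray([0.5, 0.5, 0.2]))
```

Running such a check on the NumPy side keeps the failure eager, rather than surfacing it later from a lazy computation.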

Once the values are sorted, they can be passed into the method.

>>> classification = binary_classification_curve(y, scores, positive_classification_label=2)
>>> classification
BinaryClassificationResult(false_positives=dask.array<sub, shape=(nan,), dtype=int64, chunksize=(nan,),
chunktype=numpy.ndarray>, true_positives=dask.array<slice_with_int_dask_array_aggregate, shape=(nan,), dtype=int64,
chunksize=(nan,), chunktype=numpy.ndarray>, thresholds=dask.array<slice_with_int_dask_array_aggregate, shape=(nan,),
dtype=float64, chunksize=(nan,), chunktype=numpy.ndarray>)

The method returns a named tuple, BinaryClassificationResult, whose fields are dask.array.Array objects. To get the values, compute() must be invoked on each field to evaluate the lazy collection.

>>> classification.false_positives.compute()
array([0, 1, 1, 2])
>>> classification.true_positives.compute()
array([1, 1, 2, 2])
>>> classification.thresholds.compute()
array([0.8 , 0.4 , 0.35, 0.1 ])
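
The counts above can be cross-checked with a small NumPy sketch. It follows the cumulative-sum approach that scikit-learn uses for its curves and is not necessarily how rojak computes them; it ignores num_intervals and assumes the inputs are already sorted by decreasing score. The final two lines show one way the counts might be normalised into rates for an ROC curve, assuming at least one positive and one negative sample.

```python
import numpy as np

# Labels and scores from the example, already sorted by decreasing score.
y_sorted = np.asarray([2, 1, 2, 1])
scores_sorted = np.asarray([0.8, 0.4, 0.35, 0.1])
positive_label = 2

# Running count of positives at or above each threshold.
is_positive = y_sorted == positive_label
true_positives = np.cumsum(is_positive)

# Everything at or above the threshold that is not a true positive.
false_positives = 1 + np.arange(y_sorted.size) - true_positives

# Normalise by the totals (the last entries) to get rates.
true_positive_rate = true_positives / true_positives[-1]
false_positive_rate = false_positives / false_positives[-1]
```

With these inputs the counts match the computed false_positives and true_positives shown above, and the rates agree with the fpr and tpr in the scikit-learn roc_curve example apart from the leading (0, 0) point that scikit-learn prepends.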