matthews_corr_coeff_multidim

rojak.turbulence.metrics.matthews_corr_coeff_multidim(first_var: DataArray, second_var: DataArray, sum_over: str) → DataArray

Matthews Correlation Coefficient for multidimensional arrays

This assumes that the inputs are binary variables such that the contingency table is as follows (see Wikipedia on MCC):

\[\begin{split}\begin{array}{c|c|c|c} & y = 1 & y = 0 & \text{Total} \\ \hline x = 1 & n_{11} & n_{10} & n_{1\bullet} \\ x = 0 & n_{01} & n_{00} & n_{0\bullet} \\ \hline \text{Total} & n_{\bullet1} & n_{\bullet0} & n \end{array}\end{split}\]

The Matthews Correlation Coefficient (\(\varphi\)) is then given by

\[\varphi = \frac{n n_{11} - n_{1\bullet} n_{\bullet 1}} {\sqrt{n_{1\bullet} n_{\bullet 1} (n - n_{1\bullet})(n - n_{\bullet 1})}}.\]
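
As a worked check of the formula above, here is a minimal sketch that evaluates \(\varphi\) directly from the four cells of the contingency table. The helper name `phi_from_counts` is hypothetical and is not part of rojak. Returning NaN when the denominator is zero is an assumption made for this sketch; the documentation above does not say how the library treats that case.

```python
import math


def phi_from_counts(n11: int, n10: int, n01: int, n00: int) -> float:
    """Evaluate the phi formula from contingency-table cells (illustrative helper)."""
    n = n11 + n10 + n01 + n00   # total number of observations
    n1_dot = n11 + n10          # row total for x = 1
    n_dot1 = n11 + n01          # column total for y = 1
    denom = math.sqrt(n1_dot * n_dot1 * (n - n1_dot) * (n - n_dot1))
    if denom == 0:
        # Assumption: undefined when either variable is constant
        return math.nan
    return (n * n11 - n1_dot * n_dot1) / denom


# n = 8, n1_dot = 4, n_dot1 = 4: numerator 8*3 - 4*4 = 8, denominator 16
print(phi_from_counts(3, 1, 1, 3))  # 0.5
```

Perfect agreement (`n10 = n01 = 0` with both classes present) gives 1, and perfect disagreement gives -1.
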
Parameters:
  • first_var (DataArray) – First binary variable

  • second_var (DataArray) – Second binary variable

  • sum_over (str) – Name of the dimension to sum over when counting observations

Returns:

Array containing the Matthews Correlation Coefficient, reduced over the sum_over dimension

Return type:

DataArray
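
To illustrate what reducing over one dimension means, the sketch below applies the same formula column by column to plain nested lists, summing over the first (observation) axis so that one coefficient remains per column. This is not rojak's implementation: `mcc_along_first_axis` is a hypothetical name, the inputs are lists rather than `DataArray` objects, and the NaN result for a zero denominator is an assumption. With xarray, the three sums would presumably be taken with `DataArray.sum(dim=sum_over)`.

```python
import math


def mcc_along_first_axis(x, y):
    """Illustrative helper: x and y are lists of rows, shape (n_obs, n_points), entries 0 or 1."""
    n = len(x)               # number of observations along the reduced axis
    n_points = len(x[0])     # remaining axis, kept in the result
    result = []
    for j in range(n_points):
        n1_dot = sum(x[i][j] for i in range(n))           # count of x = 1
        n_dot1 = sum(y[i][j] for i in range(n))           # count of y = 1
        n11 = sum(x[i][j] * y[i][j] for i in range(n))    # count of x = 1 and y = 1
        denom = math.sqrt(n1_dot * n_dot1 * (n - n1_dot) * (n - n_dot1))
        # Assumption: NaN when either variable is constant in this column
        result.append((n * n11 - n1_dot * n_dot1) / denom if denom else math.nan)
    return result


x = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [[1, 0], [1, 1], [0, 0], [0, 1]]
print(mcc_along_first_axis(x, y))  # [1.0, -1.0]
```

The first column of `x` and `y` agrees on every observation and the second disagrees on every observation, so the reduced result holds one coefficient per column.
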