map_values_to_nearest_coordinate_index
- rojak.core.indexing.map_values_to_nearest_coordinate_index(series: dd.Series | pd.Series, coordinate: NDArray[T], valid_window: timedelta64 | number | inexact | None = None) -> dd.Series | pd.Series[int]
Assuming the coordinate is a regular (evenly spaced) grid, compute, for each value in the series, the index of the closest coordinate value.
- Parameters:
series (dd.Series | pd.Series) – Data which corresponds to a given value in the coordinate array and needs to be mapped to the closest index in the coordinate array
coordinate (NDArray[T]) – Array which is to be indexed into
valid_window (timedelta64 | number | inexact | None) – Symmetric window (e.g. ±3 hours => np.timedelta64(3, "h"))
- Returns:
Series which contains the closest index the data maps to
- Return type:
dd.Series | pd.Series[int]
>>> import numpy as np
>>> import pandas as pd
>>> time_coordinate = np.arange(
...     np.datetime64("2018-08-01"), np.datetime64("2018-08-03"), np.timedelta64(6, "h")
... )
>>> data_series = pd.Series([
...     np.datetime64("2018-08-02T16:06"),
...     np.datetime64("2018-08-01T07:37"),
...     np.datetime64("2018-08-02T09:12"),
...     np.datetime64("2018-08-02T07:27"),
...     np.datetime64("2018-08-02T19:09"),
... ])
>>> map_values_to_nearest_coordinate_index(data_series, time_coordinate, valid_window=np.timedelta64(3, "h"))
0    7
1    1
2    6
3    5
4    7
dtype: int64
>>> map_values_to_nearest_coordinate_index(data_series, time_coordinate, valid_window=np.timedelta64(2, "h"))
Traceback (most recent call last):
NotImplementedError: Function currently only supports regular grids with a symmetric window specified. And the window must correspond to half of the grid spacing
If the valid window is not specified, the minimum and maximum of the data in the series must lie strictly within the range of the coordinate:
>>> map_values_to_nearest_coordinate_index(data_series, time_coordinate)
Traceback (most recent call last):
ValueError: Values in series must be within the range of the coordinate
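The range requirement can be illustrated with a small stand-alone check. This is only a sketch of the idea, not rojak's implementation: the helper name `is_within_coordinate_range` is hypothetical, and treating the bounds as inclusive is an assumption.

```python
import numpy as np
import pandas as pd


def is_within_coordinate_range(series, coordinate, valid_window=None):
    """Hypothetical helper (not part of rojak): range check with an optional symmetric window."""
    lower, upper = coordinate.min(), coordinate.max()
    if valid_window is not None:
        # A symmetric window widens the accepted range on both sides.
        lower = lower - valid_window
        upper = upper + valid_window
    values = series.to_numpy()
    # Assumption: bounds are treated as inclusive in this sketch.
    return bool(values.min() >= lower and values.max() <= upper)


time_coordinate = np.arange(
    np.datetime64("2018-08-01"), np.datetime64("2018-08-03"), np.timedelta64(6, "h")
)
data_series = pd.Series(
    [np.datetime64("2018-08-01T07:37"), np.datetime64("2018-08-02T19:09")]
)

# The last grid point is 2018-08-02T18:00, so 19:09 lies outside the bare range.
without_window = is_within_coordinate_range(data_series, time_coordinate)
# A 3 hour window extends the upper bound to 21:00, which covers 19:09.
with_window = is_within_coordinate_range(
    data_series, time_coordinate, valid_window=np.timedelta64(3, "h")
)
```

Here `without_window` evaluates to `False` and `with_window` to `True`, which mirrors why the call without `valid_window` raises while the call with a 3 hour window succeeds.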
Extending the time coordinate to include 2018-08-03T00 places 2018-08-02T19:09 within the range of the coordinate:
>>> time_coordinate = np.arange(
...     np.datetime64("2018-08-01"), np.datetime64("2018-08-03T06"), np.timedelta64(6, "h")
... )
>>> map_values_to_nearest_coordinate_index(data_series, time_coordinate)
0    7
1    1
2    6
3    5
4    7
dtype: int64
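For intuition, the nearest index on a regular grid can be obtained by dividing each value's offset from the first grid point by the grid spacing and rounding. The sketch below is one way to do this with NumPy and pandas; the helper `nearest_index_on_regular_grid` is hypothetical, assumes an ascending, evenly spaced coordinate, and is not necessarily how rojak computes the result.

```python
import numpy as np
import pandas as pd


def nearest_index_on_regular_grid(series, coordinate):
    """Hypothetical helper (not part of rojak): nearest index on an evenly spaced grid."""
    # Assumption: the coordinate is ascending with constant spacing.
    spacing = coordinate[1] - coordinate[0]
    # Fractional position of each value along the grid, as a float array.
    positions = (series.to_numpy() - coordinate[0]) / spacing
    # Rounding the fractional position gives the index of the closest grid point.
    indices = np.rint(positions).astype(np.int64)
    return pd.Series(indices, index=series.index)


time_coordinate = np.arange(
    np.datetime64("2018-08-01"), np.datetime64("2018-08-03T06"), np.timedelta64(6, "h")
)
data_series = pd.Series([
    np.datetime64("2018-08-02T16:06"),
    np.datetime64("2018-08-01T07:37"),
    np.datetime64("2018-08-02T09:12"),
    np.datetime64("2018-08-02T07:27"),
    np.datetime64("2018-08-02T19:09"),
])

nearest = nearest_index_on_regular_grid(data_series, time_coordinate)
```

For example, 2018-08-02T09:12 lies 33.2 hours after the first grid point, which is 5.53 grid steps and rounds to index 6, matching the documented output. Note that this sketch performs no range or window validation.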