Contributing to the Analysis Module

Classes in pyprobe.analysis perform further analysis of the data.

This document describes the standard format to be used for all PyProBE analysis functions. Constructing your method in this way ensures compatibility with the rest of the PyProBE package, while keeping your code clean and easy to read.

Functions

All calculations should be conducted inside methods. These are called by the user with any additional information required to perform the analysis, and always return Result objects. We will use the gradient() method as an example.

It is recommended to use pydantic’s validate_call function decorator to ensure that objects of the correct type are being passed to your method. This provides the user with an error message if they have not called the method correctly, simplifying debugging.
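
As a minimal sketch of what the decorator does, the example below applies validate_call to a hypothetical scale() function (not part of PyProBE) and shows that a wrongly typed argument is rejected before the function body runs. It assumes pydantic version 2, where validate_call and ValidationError can be imported from the top-level package.

```python
from pydantic import ValidationError, validate_call


@validate_call
def scale(values: list[float], factor: float) -> list[float]:
    """Multiply every value by a factor (hypothetical example function)."""
    return [value * factor for value in values]


# A correctly typed call runs as normal.
doubled = scale([1.0, 2.0], factor=2.0)

# A wrongly typed call raises a ValidationError before the body executes.
message = ""
try:
    scale([1.0, 2.0], factor="not a number")
except ValidationError as error:
    message = str(error)  # describes which argument failed validation
```

The error is raised at the call site, so the user sees the problem with their arguments rather than a failure deeper inside the calculation.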

The steps to write a method are as follows:

  1. Define the method and its input parameters. One of these is likely to be a PyProBE object; step 2 confirms that it has the columns your method needs.

  2. Check that inputs to the method are valid with the AnalysisValidator class. Provide the class with the input data to the method, the columns that are required for the computation to be performed, and the required data type for input_data.

  3. If needed, you can retrieve the columns specified in the required_columns field as numpy arrays by accessing the variables attribute of the instance of AnalysisValidator.

  4. Perform the required computation. In this example, this is done with np.gradient(), a NumPy function. Keep as much of the underlying computation as possible out of the analysis method itself. Instead, write simple functions in the pyprobe.analysis.base module that process only numpy arrays. This keeps the mathematical underpinnings of PyProBE analysis methods readable, portable and testable.

  5. Create a result object to return. This is easily done with the clean_copy() method, which provides a copy of the input data including the info attribute but replacing the data stored with a dataframe created from the provided dictionary.

  6. Add column definitions to the created result object.

  7. Return the result object.

import numpy as np
import polars as pl
from pydantic import validate_call

# PyProBEDataType, Result and AnalysisValidator are provided by the PyProBE
# package (import paths omitted here).


@validate_call
def gradient(  # 1. Define the method
    input_data: PyProBEDataType,
    x: str,
    y: str,
) -> Result:
    """Differentiate smooth data with a finite difference method.

    A wrapper of the numpy.gradient function. This method calculates the gradient
    of the data in the y column with respect to the data in the x column.

    Args:
        input_data:
            The input data PyProBE object for the differentiation.
        x: The name of the x variable.
        y: The name of the y variable.

    Returns:
        A result object containing the columns `x`, `y` and the
        calculated gradient.
    """
    # 2. Validate the inputs to the method
    validator = AnalysisValidator(
        input_data=input_data,
        required_columns=[x, y],
        # required_type not necessary here, as the type specified when declaring
        # the input_data parameter is strict enough
    )
    # 3. Retrieve the validated columns as numpy arrays
    x_data, y_data = validator.variables

    # 4. Perform the computation
    gradient_title = f"d({y})/d({x})"
    gradient_data = np.gradient(y_data, x_data)

    # 5. Create a Result object to store the results
    gradient_result = input_data.clean_copy(
        pl.DataFrame({x: x_data, y: y_data, gradient_title: gradient_data})
    )
    # 6. Define the column definitions for the Result object
    gradient_result.column_definitions = {
        x: input_data.column_definitions[x],
        y: input_data.column_definitions[y],
        gradient_title: "The calculated gradient.",
    }
    # 7. Return the Result object
    return gradient_result

Base

The pyprobe.analysis.base module is a repository of functions used by the rest of the analysis module. Often with data analysis code, it is tempting to include data manipulation (forming arrays, dataframes etc. from your standard data format) alongside calculations. Keeping the data manipulation inside the methods of classes in the pyprobe.analysis module, and the calculations in the base submodule, makes these functions more readable, testable and portable.
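
The separation described above can be sketched as follows. This is an illustration only: finite_difference_gradient() and gradient_from_table() are hypothetical names, not PyProBE functions, and a plain dictionary of lists stands in for a PyProBE object. The first function contains only the calculation and works purely on numpy arrays; the second contains only the data manipulation.

```python
import numpy as np
from numpy.typing import NDArray


def finite_difference_gradient(
    x: NDArray[np.float64], y: NDArray[np.float64]
) -> NDArray[np.float64]:
    """Calculation only: return dy/dx for two arrays (hypothetical base-style function)."""
    return np.gradient(y, x)


def gradient_from_table(
    table: dict[str, list[float]], x: str, y: str
) -> dict[str, NDArray[np.float64]]:
    """Data manipulation only: extract arrays, delegate the maths, package the result."""
    x_data = np.asarray(table[x], dtype=float)
    y_data = np.asarray(table[y], dtype=float)
    gradient_title = f"d({y})/d({x})"
    return {
        x: x_data,
        y: y_data,
        gradient_title: finite_difference_gradient(x_data, y_data),
    }


# The calculation can be tested with bare arrays, without building any data container.
slope = finite_difference_gradient(np.array([0.0, 1.0, 2.0]), np.array([0.0, 3.0, 6.0]))

# The wrapper only reshapes data on the way in and out.
result = gradient_from_table({"t": [0.0, 1.0, 2.0, 3.0], "v": [0.0, 2.0, 4.0, 6.0]}, "t", "v")
```

Because the calculation never touches the container type, it can be unit tested or reused elsewhere with nothing more than numpy.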

base module functions should be defined as simply as possible, accepting and returning only arrays and floating-point numbers, with clearly defined variables. A good example is the calc_electrode_capacities() function in the degradation_mode_analysis_functions module:

from typing import Tuple


def calc_electrode_capacities(
    x_pe_lo: float,
    x_pe_hi: float,
    x_ne_lo: float,
    x_ne_hi: float,
    cell_capacity: float,
) -> Tuple[float, float, float]:
    """Calculate the electrode capacities.

    Args:
        x_pe_lo (float): The cathode stoichiometry at lowest cell SOC.
        x_pe_hi (float): The cathode stoichiometry at highest cell SOC.
        x_ne_lo (float): The anode stoichiometry at lowest cell SOC.
        x_ne_hi (float): The anode stoichiometry at highest cell SOC.
        cell_capacity (float): The cell capacity.

    Returns:
        Tuple[float, float, float]:
            - float: The cathode capacity.
            - float: The anode capacity.
            - float: The lithium inventory.
    """
    pe_capacity = cell_capacity / (x_pe_lo - x_pe_hi)
    ne_capacity = cell_capacity / (x_ne_hi - x_ne_lo)
    li_inventory = (pe_capacity * x_pe_lo) + (ne_capacity * x_ne_lo)
    return pe_capacity, ne_capacity, li_inventory
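
Because the function above takes and returns only floats, it can be checked by hand. The sketch below repeats the function body so that it runs on its own, then evaluates it for made-up stoichiometry limits and a made-up cell capacity. These inputs are illustrative assumptions chosen to give round numbers, not data from a real cell.

```python
from typing import Tuple


def calc_electrode_capacities(
    x_pe_lo: float,
    x_pe_hi: float,
    x_ne_lo: float,
    x_ne_hi: float,
    cell_capacity: float,
) -> Tuple[float, float, float]:
    """Copy of the function shown above, repeated so this example is self-contained."""
    pe_capacity = cell_capacity / (x_pe_lo - x_pe_hi)
    ne_capacity = cell_capacity / (x_ne_hi - x_ne_lo)
    li_inventory = (pe_capacity * x_pe_lo) + (ne_capacity * x_ne_lo)
    return pe_capacity, ne_capacity, li_inventory


# Illustrative inputs (assumed values, not measured data).
pe_capacity, ne_capacity, li_inventory = calc_electrode_capacities(
    x_pe_lo=0.75,
    x_pe_hi=0.25,
    x_ne_lo=0.125,
    x_ne_hi=0.625,
    cell_capacity=2.0,
)

# Hand calculation:
#   pe_capacity  = 2.0 / (0.75 - 0.25)        = 4.0
#   ne_capacity  = 2.0 / (0.625 - 0.125)      = 4.0
#   li_inventory = 4.0 * 0.75 + 4.0 * 0.125   = 3.5
print(pe_capacity, ne_capacity, li_inventory)
```

The results carry the same units as cell_capacity, since the stoichiometries are dimensionless.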