Providing Valid Inputs

PyProBE uses Pydantic for input validation. This ensures that the data you provide is in the correct format, which prevents unexpected errors later on. Most of the time this happens behind the scenes, so you will only notice it when there is a problem. This example demonstrates how these errors can arise.

RawData Validation

The RawData class is a variant of the Result class that only stores data in the standard PyProBE format. Validation is therefore performed whenever a RawData object is created, to verify that the data conforms to this format.

If you follow the standard method for importing data into PyProBE, you should never encounter these errors. It is still helpful to know that they exist.

We will start with a normal dataset, printing the type that the procedure data is stored in:

[1]:
import pyprobe
import polars as pl

info_dictionary = {
    "Name": "Sample cell",
    "Chemistry": "NMC622",
    "Nominal Capacity [Ah]": 0.04,
    "Cycler number": 1,
    "Channel number": 1,
}
data_directory = "../../../tests/sample_data/neware"

# Create a cell object
cell = pyprobe.Cell(info=info_dictionary)
cell.add_procedure(
    procedure_name="Sample",
    folder_path=data_directory,
    filename="sample_data_neware.parquet",
)
print(type(cell.procedure["Sample"]))
<class 'pyprobe.filters.Procedure'>

The Procedure class inherits from RawData, which has a defined set of required columns (the PyProBE standard format):

[2]:
print(pyprobe.rawdata.required_columns)
['Time [s]', 'Step', 'Event', 'Current [A]', 'Voltage [V]', 'Capacity [Ah]']

Whenever a RawData object (or an object of any filters module class that inherits from it) is created, its dataframe is checked against these required columns. We will create an example dataframe that is missing some of them; the resulting error identifies the missing columns.

[3]:
incorrect_dataframe = pl.DataFrame(
    {
        "Time [s]": [1, 2, 3],
        "Voltage [V]": [3.5, 3.6, 3.7],
        "Current [A]": [0.1, 0.2, 0.3],
    }
)
pyprobe.rawdata.RawData(base_dataframe=incorrect_dataframe, info={})
2025-01-22 17:43:53,543 - pyprobe.rawdata - ERROR - Missing required columns: ['Step', 'Event', 'Capacity [Ah]']
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[3], line 8
      1 incorrect_dataframe = pl.DataFrame(
      2     {
      3         "Time [s]": [1, 2, 3],
   (...)
      6     }
      7 )
----> 8 pyprobe.rawdata.RawData(base_dataframe=incorrect_dataframe, info={})

File ~/work/PyProBE/PyProBE/.venv/lib/python3.12/site-packages/pydantic/main.py:214, in BaseModel.__init__(self, **data)
    212 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    213 __tracebackhide__ = True
--> 214 validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
    215 if self is not validated_self:
    216     warnings.warn(
    217         'A custom validator is returning a value other than `self`.\n'
    218         "Returning anything other than `self` from a top level model validator isn't supported when validating via `__init__`.\n"
    219         'See the `model_validator` docs (https://docs.pydantic.dev/latest/concepts/validators/#model-validators) for more details.',
    220         stacklevel=2,
    221     )

ValidationError: 1 validation error for RawData
base_dataframe
  Value error, Missing required columns: ['Step', 'Event', 'Capacity [Ah]'] [type=value_error, input_value=shape: (3, 3)
┌──...───────┘, input_type=DataFrame]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error
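
If you build dataframes by hand, it can be useful to check the column names yourself before constructing a RawData object. The following is a minimal sketch using only the standard library. The helper `find_missing_columns` is hypothetical and not part of PyProBE; the list of required columns is copied from the output printed above.

```python
# Required columns, copied from the printed output of
# pyprobe.rawdata.required_columns shown earlier in this example.
REQUIRED_COLUMNS = [
    "Time [s]",
    "Step",
    "Event",
    "Current [A]",
    "Voltage [V]",
    "Capacity [Ah]",
]


def find_missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `columns`, in order.

    Hypothetical helper for illustration; not a PyProBE function.
    """
    present = set(columns)
    return [name for name in required if name not in present]


# Column names of the incomplete dataframe created above.
missing = find_missing_columns(["Time [s]", "Voltage [V]", "Current [A]"])
print(missing)  # ['Step', 'Event', 'Capacity [Ah]']
```

For a Polars DataFrame, the list of names to pass in is available as the `columns` attribute.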

You will also see a validation error if you try to create one of these classes with a data object that is not a Polars DataFrame or LazyFrame:

[4]:
incorrect_data_dict = {
    "Time [s]": [1, 2, 3],
    "Voltage [V]": [3.5, 3.6, 3.7],
    "Current [A]": [0.1, 0.2, 0.3],
}
pyprobe.rawdata.RawData(base_dataframe=incorrect_data_dict, info={})
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[4], line 6
      1 incorrect_data_dict = {
      2     "Time [s]": [1, 2, 3],
      3     "Voltage [V]": [3.5, 3.6, 3.7],
      4     "Current [A]": [0.1, 0.2, 0.3],
      5 }
----> 6 pyprobe.rawdata.RawData(base_dataframe=incorrect_data_dict, info={})

File ~/work/PyProBE/PyProBE/.venv/lib/python3.12/site-packages/pydantic/main.py:214, in BaseModel.__init__(self, **data)
    212 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    213 __tracebackhide__ = True
--> 214 validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
    215 if self is not validated_self:
    216     warnings.warn(
    217         'A custom validator is returning a value other than `self`.\n'
    218         "Returning anything other than `self` from a top level model validator isn't supported when validating via `__init__`.\n"
    219         'See the `model_validator` docs (https://docs.pydantic.dev/latest/concepts/validators/#model-validators) for more details.',
    220         stacklevel=2,
    221     )

ValidationError: 2 validation errors for RawData
base_dataframe.is-instance[LazyFrame]
  Input should be an instance of LazyFrame [type=is_instance_of, input_value={'Time [s]': [1, 2, 3], '...t [A]': [0.1, 0.2, 0.3]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/is_instance_of
base_dataframe.is-instance[DataFrame]
  Input should be an instance of DataFrame [type=is_instance_of, input_value={'Time [s]': [1, 2, 3], '...t [A]': [0.1, 0.2, 0.3]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/is_instance_of

Analysis Module Validation

You are much more likely to experience validation errors when dealing with the functions and classes in the analysis module. These may require a particular PyProBE object to work.

As an example, the Cycling class requires an Experiment object as input. This is because it provides calculations based on the cycle() method of the Experiment class:

[5]:
experiment_object = cell.procedure["Sample"].experiment("Break-in Cycles")
print(type(experiment_object))
<class 'pyprobe.filters.Experiment'>

Passing the experiment object raises no errors:

[6]:
from pyprobe.analysis.cycling import Cycling

cycling = Cycling(input_data=experiment_object)

However, filtering the object further, here down to a single cycle, produces an error because the result is a Cycle rather than an Experiment:

[7]:
cycling = Cycling(input_data=experiment_object.cycle(1))
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[7], line 1
----> 1 cycling = Cycling(input_data=experiment_object.cycle(1))

File ~/work/PyProBE/PyProBE/.venv/lib/python3.12/site-packages/pydantic/main.py:214, in BaseModel.__init__(self, **data)
    212 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    213 __tracebackhide__ = True
--> 214 validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
    215 if self is not validated_self:
    216     warnings.warn(
    217         'A custom validator is returning a value other than `self`.\n'
    218         "Returning anything other than `self` from a top level model validator isn't supported when validating via `__init__`.\n"
    219         'See the `model_validator` docs (https://docs.pydantic.dev/latest/concepts/validators/#model-validators) for more details.',
    220         stacklevel=2,
    221     )

ValidationError: 1 validation error for Cycling
input_data
  Input should be a valid dictionary or instance of Experiment [type=model_type, input_value=Cycle(base_dataframe=<Laz...hours']}, cycle_info=[]), input_type=Cycle]
    For further information visit https://errors.pydantic.dev/2.10/v/model_type
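
The check that fails here is essentially an instance check. The sketch below reproduces the idea with stand-in classes; `RawData`, `Experiment` and `Cycle` are simplified placeholders defined only for this illustration, and `check_input_type` is a hypothetical helper, not how Pydantic or PyProBE implement validation.

```python
class RawData:
    """Stand-in for a PyProBE data container (placeholder, not the real class)."""


class Experiment(RawData):
    """Stand-in for the object returned by .experiment()."""


class Cycle(RawData):
    """Stand-in for the object returned by .cycle(); not an Experiment."""


def check_input_type(obj, required_type):
    """Return obj unchanged if it has the required type, else raise TypeError."""
    if not isinstance(obj, required_type):
        raise TypeError(
            f"Input should be an instance of {required_type.__name__}, "
            f"got {type(obj).__name__}"
        )
    return obj


# An Experiment passes the check.
check_input_type(Experiment(), Experiment)

# A Cycle does not, even though both share the RawData base class.
try:
    check_input_type(Cycle(), Experiment)
except TypeError as error:
    print(error)
```

In practice, the fix is to pass the unfiltered experiment object to Cycling and apply any further filtering to its results.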

Functions in the analysis module also perform type validation, on two levels. First, the inputs to the function are checked. For example, the gradient function of the differentiation module requires input_data to be a PyProBE object, and x and y to be column names given as strings:

[8]:
from pyprobe.analysis import differentiation

gradient = differentiation.gradient(
    input_data=cell.procedure["Sample"].experiment("Break-in Cycles").discharge(-1),
    x="Capacity [Ah]",
    y="Voltage [V]",
)

If a NumPy array is passed as input_data instead, the error reports that input_data should be one of several PyProBE objects:

[9]:
import numpy as np

gradient = differentiation.gradient(
    input_data=np.zeros((10, 2)), x="Capacity [Ah]", y="Voltage [V]"
)
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[9], line 3
      1 import numpy as np
----> 3 gradient = differentiation.gradient(
      4     input_data=np.zeros((10, 2)), x="Capacity [Ah]", y="Voltage [V]"
      5 )

File ~/work/PyProBE/PyProBE/.venv/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py:38, in update_wrapper_attributes.<locals>.wrapper_function(*args, **kwargs)
     36 @functools.wraps(wrapped)
     37 def wrapper_function(*args, **kwargs):
---> 38     return wrapper(*args, **kwargs)

File ~/work/PyProBE/PyProBE/.venv/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py:111, in ValidateCallWrapper.__call__(self, *args, **kwargs)
    110 def __call__(self, *args: Any, **kwargs: Any) -> Any:
--> 111     res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
    112     if self.__return_pydantic_validator__:
    113         return self.__return_pydantic_validator__(res)

ValidationError: 6 validation errors for gradient
input_data.RawData
  Input should be a valid dictionary or instance of RawData [type=model_type, input_value=array([[0., 0.],
       [..., 0.],
       [0., 0.]]), input_type=ndarray]
    For further information visit https://errors.pydantic.dev/2.10/v/model_type
input_data.Procedure
  Input should be a valid dictionary or instance of Procedure [type=model_type, input_value=array([[0., 0.],
       [..., 0.],
       [0., 0.]]), input_type=ndarray]
    For further information visit https://errors.pydantic.dev/2.10/v/model_type
input_data.Experiment
  Input should be a valid dictionary or instance of Experiment [type=model_type, input_value=array([[0., 0.],
       [..., 0.],
       [0., 0.]]), input_type=ndarray]
    For further information visit https://errors.pydantic.dev/2.10/v/model_type
input_data.Cycle
  Input should be a valid dictionary or instance of Cycle [type=model_type, input_value=array([[0., 0.],
       [..., 0.],
       [0., 0.]]), input_type=ndarray]
    For further information visit https://errors.pydantic.dev/2.10/v/model_type
input_data.Step
  Input should be a valid dictionary or instance of Step [type=model_type, input_value=array([[0., 0.],
       [..., 0.],
       [0., 0.]]), input_type=ndarray]
    For further information visit https://errors.pydantic.dev/2.10/v/model_type
input_data.Result
  Input should be a valid dictionary or instance of Result [type=model_type, input_value=array([[0., 0.],
       [..., 0.],
       [0., 0.]]), input_type=ndarray]
    For further information visit https://errors.pydantic.dev/2.10/v/model_type
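
A single wrong argument produces six errors because input_data accepts any one of six types, and one error is reported for each type the value failed to match. The sketch below mimics that reporting with stand-in classes. The class definitions and `collect_type_errors` are assumptions made for illustration and do not reproduce Pydantic's implementation.

```python
# Stand-in classes (placeholders); the real ones are defined in PyProBE.
class Result: pass
class RawData(Result): pass
class Procedure(RawData): pass
class Experiment(RawData): pass
class Cycle(RawData): pass
class Step(RawData): pass


# The six types named in the error output above, in the same order.
ACCEPTED_TYPES = (RawData, Procedure, Experiment, Cycle, Step, Result)


def collect_type_errors(value, accepted_types=ACCEPTED_TYPES):
    """Return one message per accepted type, or [] if the value matches any."""
    if isinstance(value, accepted_types):
        return []
    return [
        f"Input should be an instance of {accepted.__name__}"
        for accepted in accepted_types
    ]


# A plain list of lists stands in for the NumPy array used above.
errors = collect_type_errors([[0.0, 0.0], [0.0, 0.0]])
print(f"{len(errors)} validation errors")  # 6 validation errors
```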

Second, analysis functions check that the columns required for the computation are present in the PyProBE object provided. As an example, we will call the gradient() function and ask it to differentiate with respect to a column that does not exist in the underlying data:

[10]:
gradient = differentiation.gradient(
    input_data=cell.procedure["Sample"].experiment("Break-in Cycles").discharge(-1),
    x="Temperature [C]",
    y="Voltage [V]",
)
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[10], line 1
----> 1 gradient = differentiation.gradient(
      2     input_data=cell.procedure["Sample"].experiment("Break-in Cycles").discharge(-1),
      3     x="Temperature [C]",
      4     y="Voltage [V]",
      5 )

File ~/work/PyProBE/PyProBE/.venv/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py:38, in update_wrapper_attributes.<locals>.wrapper_function(*args, **kwargs)
     36 @functools.wraps(wrapped)
     37 def wrapper_function(*args, **kwargs):
---> 38     return wrapper(*args, **kwargs)

File ~/work/PyProBE/PyProBE/.venv/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py:111, in ValidateCallWrapper.__call__(self, *args, **kwargs)
    110 def __call__(self, *args: Any, **kwargs: Any) -> Any:
--> 111     res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
    112     if self.__return_pydantic_validator__:
    113         return self.__return_pydantic_validator__(res)

File ~/work/PyProBE/PyProBE/pyprobe/analysis/differentiation.py:38, in gradient(input_data, x, y)
     22 """Differentiate smooth data with a finite difference method.
     23
     24 A wrapper of the numpy.gradient function. This method calculates the gradient
   (...)
     35     calculated gradient.
     36 """
     37 # 2. Validate the inputs to the method
---> 38 validator = AnalysisValidator(
     39     input_data=input_data,
     40     required_columns=[x, y],
     41     # required_type not neccessary here as type specified when declaring
     42     # input_data attribute is strict enough
     43 )
     44 # 3. Retrieve the validated columns as numpy arrays
     45 x_data, y_data = validator.variables

File ~/work/PyProBE/PyProBE/.venv/lib/python3.12/site-packages/pydantic/main.py:214, in BaseModel.__init__(self, **data)
    212 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    213 __tracebackhide__ = True
--> 214 validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
    215 if self is not validated_self:
    216     warnings.warn(
    217         'A custom validator is returning a value other than `self`.\n'
    218         "Returning anything other than `self` from a top level model validator isn't supported when validating via `__init__`.\n"
    219         'See the `model_validator` docs (https://docs.pydantic.dev/latest/concepts/validators/#model-validators) for more details.',
    220         stacklevel=2,
    221     )

ValidationError: 1 validation error for AnalysisValidator
  Value error, Quantities {'Temperature'} not in data. [type=value_error, input_value={'input_data': Step(base_...re [C]', 'Voltage [V]']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error
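
Note that the message reports the quantity 'Temperature' rather than the full column name 'Temperature [C]', which suggests that the quantity is separated from its unit before the check. The sketch below shows one way such a split could work. The regular expression and both helper functions are assumptions for illustration, not PyProBE's implementation.

```python
import re

# Hypothetical pattern for names of the form "<quantity> [<unit>]",
# e.g. "Voltage [V]". Names without a unit, such as "Step", do not match.
COLUMN_PATTERN = re.compile(r"^(?P<quantity>.+?)\s*\[(?P<unit>[^\]]*)\]$")


def split_column_name(name):
    """Split a column name into (quantity, unit); unit is None if absent."""
    match = COLUMN_PATTERN.match(name)
    if match is None:
        return name, None
    return match.group("quantity"), match.group("unit")


def missing_quantities(requested, available):
    """Return the requested quantities that no available column provides."""
    available_quantities = {split_column_name(name)[0] for name in available}
    requested_quantities = {split_column_name(name)[0] for name in requested}
    return requested_quantities - available_quantities


available_columns = [
    "Time [s]", "Step", "Event", "Current [A]", "Voltage [V]", "Capacity [Ah]",
]
print(split_column_name("Temperature [C]"))  # ('Temperature', 'C')
print(missing_quantities(["Temperature [C]", "Voltage [V]"], available_columns))
# {'Temperature'}
```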