pyprobe.result module
A module for the Result class.
- class PolarsColumnCache(base_dataframe)
Bases: object
A class to cache columns from a Polars DataFrame.
- Parameters:
base_dataframe (pl.LazyFrame | pl.DataFrame) – The base dataframe to cache columns from.
- property base_dataframe: LazyFrame | DataFrame
The base dataframe.
- Returns:
The base dataframe.
- Return type:
pl.LazyFrame | pl.DataFrame
- property columns: List[str]
The columns in the data.
- Returns:
The columns in the data.
- Return type:
List[str]
- property quantities: Set[str]
The quantities of the data, with unit information removed.
- Returns:
The quantities of the data.
- Return type:
Set[str]
- collect_columns(*columns)
Collect columns from the base dataframe and add to the cache.
This method checks whether each requested column is already in the cache. If it is not, it checks whether the column is in the base dataframe. If it is not there either, it attempts to convert the column to the requested units and adds the result to the LazyFrame.
- Parameters:
*columns (str) – The columns to collect.
- Raises:
ValueError – If the requested columns are not in the base dataframe and cannot be converted.
- Return type:
None
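The lookup order described above (cache first, then the base dataframe, then a unit conversion) can be sketched in plain Python. This is a simplified, hypothetical stand-in rather than PyProBE's implementation: columns are plain lists instead of Polars columns, names are assumed to follow a "Quantity [unit]" pattern, and the CONVERSIONS table and ColumnCacheSketch class are illustrative inventions.

```python
import re
from typing import Dict, List

# Hypothetical conversion table: (base unit, requested unit) -> multiplier.
# Illustrative only; PyProBE's actual unit handling is not shown here.
CONVERSIONS = {("A", "mA"): 1000.0, ("s", "hr"): 1 / 3600}

# Assumed naming pattern "Quantity [unit]".
_PATTERN = re.compile(r"^(.*?)\s*\[(.*)\]$")


class ColumnCacheSketch:
    """Simplified stand-in for PolarsColumnCache using plain lists."""

    def __init__(self, base: Dict[str, List[float]]) -> None:
        self.base = base
        self.cache: Dict[str, List[float]] = {}

    def collect_columns(self, *columns: str) -> None:
        for column in columns:
            if column in self.cache:  # 1. already cached: nothing to do
                continue
            if column in self.base:  # 2. present in the base data
                self.cache[column] = list(self.base[column])
                continue
            # 3. otherwise try to derive it by unit conversion
            self.cache[column] = self._convert(column)

    def _convert(self, column: str) -> List[float]:
        match = _PATTERN.match(column)
        if match is None:
            raise ValueError(f"Column '{column}' not found and has no unit.")
        quantity, target_unit = match.groups()
        for (base_unit, unit), factor in CONVERSIONS.items():
            source = f"{quantity} [{base_unit}]"
            if unit == target_unit and source in self.base:
                return [value * factor for value in self.base[source]]
        raise ValueError(f"Column '{column}' cannot be converted.")
```

For example, with only "Current [A]" in the base data, requesting "Current [mA]" fills the cache with the scaled values, while a request for an unknown quantity raises ValueError and leaves the cache unchanged.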
- clear_cache()
Clear the cache.
- Return type:
None
- property cached_dataframe: DataFrame
Return the cached dataframe as a Polars DataFrame.
- static get_quantities(column_list)
Get the quantities from a list of column names, with unit information removed.
- Parameters:
column_list (List[str]) – The columns to get the quantities of.
- Returns:
The quantities of the data.
- Return type:
Set[str]
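A minimal sketch of how unit information might be stripped from column names, assuming each name ends in a bracketed unit such as "Voltage [V]". The regular expression is an assumption for illustration and is not taken from PyProBE.

```python
import re
from typing import List, Set

# Assumes a trailing "[unit]" suffix, e.g. "Voltage [V]"; names without
# one (e.g. "Step") are returned unchanged.
_UNIT_SUFFIX = re.compile(r"\s*\[[^\]]*\]$")


def get_quantities(column_list: List[str]) -> Set[str]:
    """Return the set of quantity names with any unit suffix stripped."""
    return {_UNIT_SUFFIX.sub("", column) for column in column_list}
```

Because the result is a set, "Current [A]" and "Current [mA]" collapse to a single "Current" quantity.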
- pydantic model Result(*, base_dataframe, info, column_definitions=<factory>)
Bases: BaseModel
A class for holding any data in PyProBE.
A Result object is the base type for every data object in PyProBE. This class includes all of the main methods for returning and describing any data in PyProBE.
- Key attributes for returning data:
- Key attributes for describing the data:
info: A dictionary containing information about the cell.
column_definitions: A dictionary of column definitions.
print_definitions(): Print the column definitions.
column_list: A list of column names.
- Validators:
_load_base_dataframe » all fields
- Parameters:
base_dataframe (LazyFrame | DataFrame)
info (dict[str, Any | None])
column_definitions (Dict[str, str])
- Return type:
None
- field base_dataframe: LazyFrame | DataFrame [Required]
The data as a polars DataFrame or LazyFrame.
- Validated by:
_load_base_dataframe
- field info: dict[str, Any | None] [Required]
Dictionary containing information about the cell.
- Validated by:
_load_base_dataframe
- field column_definitions: Dict[str, str] [Optional]
A dictionary containing the definitions of the columns in the data.
- Validated by:
_load_base_dataframe
- property live_dataframe: DataFrame
Return the data as a polars DataFrame.
- property data: DataFrame
Return the data as a polars DataFrame.
- Returns:
The data as a polars DataFrame.
- Return type:
pl.DataFrame
- Raises:
ValueError – If no data exists for this filter.
- plot()
This is a wrapper around the pandas plot method. It will perform exactly as you would expect the pandas plot method to perform when called on a DataFrame.
Make plots of Series or DataFrame.
Uses the backend specified by the option plotting.backend. By default, matplotlib is used.
- Parameters:
data (Series or DataFrame) – The object for which the method is called.
x (label or position, default None) – Only used if data is a DataFrame.
y (label, position or list of label, positions, default None) – Allows plotting of one column versus another. Only used if data is a DataFrame.
kind (str) –
The kind of plot to produce:
‘line’ : line plot (default)
‘bar’ : vertical bar plot
‘barh’ : horizontal bar plot
‘hist’ : histogram
‘box’ : boxplot
‘kde’ : Kernel Density Estimation plot
‘density’ : same as ‘kde’
‘area’ : area plot
‘pie’ : pie plot
‘scatter’ : scatter plot (DataFrame only)
‘hexbin’ : hexbin plot (DataFrame only)
ax (matplotlib axes object, default None) – An axes of the current figure.
subplots (bool or sequence of iterables, default False) –
Whether to group columns into subplots:
False: No subplots will be used.
True: Make separate subplots for each column.
sequence of iterables of column labels: Create a subplot for each group of columns. For example [(‘a’, ‘c’), (‘b’, ‘d’)] will create 2 subplots: one with columns ‘a’ and ‘c’, and one with columns ‘b’ and ‘d’. Remaining columns that aren’t specified will be plotted in additional subplots (one per column).
Added in version 1.5.0.
sharex (bool, default True if ax is None else False) – In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in. Be aware that passing in both an ax and sharex=True will alter all x axis labels for all axes in a figure.
sharey (bool, default False) – In case subplots=True, share y axis and set some y axis labels to invisible.
layout (tuple, optional) – (rows, columns) for the layout of subplots.
figsize (a tuple (width, height) in inches) – Size of a figure object.
use_index (bool, default True) – Use index as ticks for x axis.
title (str or list) – Title to use for the plot. If a string is passed, print the string at the top of the figure. If a list is passed and subplots is True, print each item in the list above the corresponding subplot.
grid (bool, default None (matlab style default)) – Axis grid lines.
legend (bool or {'reverse'}) – Place legend on axis subplots.
style (list or dict) – The matplotlib line style per column.
logx (bool or 'sym', default False) – Use log scaling or symlog scaling on x axis.
logy (bool or 'sym' default False) – Use log scaling or symlog scaling on y axis.
loglog (bool or 'sym', default False) – Use log scaling or symlog scaling on both x and y axes.
xticks (sequence) – Values to use for the xticks.
yticks (sequence) – Values to use for the yticks.
xlim (2-tuple/list) – Set the x limits of the current axes.
ylim (2-tuple/list) – Set the y limits of the current axes.
xlabel (label, optional) –
Name to use for the xlabel on x-axis. Default uses index name as xlabel, or the x-column name for planar plots.
Changed in version 2.0.0: Now applicable to histograms.
ylabel (label, optional) –
Name to use for the ylabel on y-axis. Default will show no ylabel, or the y-column name for planar plots.
Changed in version 2.0.0: Now applicable to histograms.
rot (float, default None) – Rotation for ticks (xticks for vertical, yticks for horizontal plots).
fontsize (float, default None) – Font size for xticks and yticks.
colormap (str or matplotlib colormap object, default None) – Colormap to select colors from. If string, load colormap with that name from matplotlib.
colorbar (bool, optional) – If True, plot colorbar (only relevant for ‘scatter’ and ‘hexbin’ plots).
position (float) – Specify relative alignments for bar plot layout. From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5 (center).
table (bool, Series or DataFrame, default False) – If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib’s default layout. If a Series or DataFrame is passed, use passed data to draw a table.
yerr (DataFrame, Series, array-like, dict and str) – See Plotting with Error Bars for detail.
xerr (DataFrame, Series, array-like, dict and str) – Equivalent to yerr.
stacked (bool, default False in line and bar plots, and True in area plot) – If True, create stacked plot.
secondary_y (bool or sequence, default False) – Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on the secondary y-axis.
mark_right (bool, default True) – When using a secondary_y axis, automatically mark the column labels with “(right)” in the legend.
include_bool (bool, default is False) – If True, boolean values can be plotted.
backend (str, default None) – Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.
**kwargs – Options to pass to matplotlib plotting method.
- Returns:
If the backend is not the default matplotlib one, the return value will be the object returned by the backend.
- Return type:
matplotlib.axes.Axes or numpy.ndarray of them
Notes
See matplotlib documentation online for more on this subject
If kind = ‘bar’ or ‘barh’, you can specify relative alignments for bar plot layout by position keyword. From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5 (center).
- hvplot(custom_plots={}, **metadata)
HvPlot is a library for creating fast and interactive plots.
This method requires the hvplot library to be installed as an optional dependency. You can install it with PyProBE by running pip install 'PyProBE-Data[hvplot]', or install it separately with pip install hvplot.
The default backend is bokeh, which can be changed by setting the backend with hvplot.extension('matplotlib') or hvplot.extension('plotly').
The plotting method df.hvplot(…) creates a plot similarly to the familiar Pandas df.plot method.
For more detailed options, use a specific plotting method, e.g. df.hvplot.line.
Reference: https://hvplot.holoviz.org/reference/index.html
- Parameters:
x (string, optional) – Field name(s) to draw x-positions from. If not specified, the index is used.
y (string or list, optional) – Field name(s) to draw y-positions from. If not specified, all numerical fields are used.
kind (string, optional) – The kind of plot to generate, e.g. ‘area’, ‘bar’, ‘line’, ‘scatter’ etc. To see the available plots, run print(df.hvplot.__all__).
**kwds (optional) – Additional keyword arguments are documented in hvplot.help(‘scatter’) or similar, depending on the kind of plot.
- Returns:
A HoloViews object. You can print the object to study its composition and run hv.help on the object to learn more about its parameters and options.
import hvplot.pandas  # registers the .hvplot accessor on pandas objects
import pandas as pd

df = pd.DataFrame(
    {
        "actual": [100, 150, 125, 140, 145, 135, 123],
        "forecast": [90, 160, 125, 150, 141, 141, 120],
        "numerical": [1.1, 1.9, 3.2, 3.8, 4.3, 5.0, 5.5],
        "date": pd.date_range("2022-01-03", "2022-01-09"),
        "string": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
    },
)
line = df.hvplot.line(
    x="numerical",
    y=["actual", "forecast"],
    ylabel="value",
    legend="bottom",
    height=500,
    color=["steelblue", "teal"],
    alpha=0.7,
    line_width=5,
)
line

You can add markers to a line plot by overlaying it with a scatter plot.

markers = df.hvplot.scatter(
    x="numerical", y=["actual", "forecast"], color=["#f16a6f", "#1e85f7"], size=50
)
line * markers

Please note that you can pass widgets or reactive functions as arguments instead of literal values; cf. https://hvplot.holoviz.org/user_guide/Widgets.html.
- get(*column_names)
Return one or more columns of the data as separate 1D numpy arrays.
- Parameters:
column_names (str) – The column name(s) to return.
- Returns:
The column(s) as numpy array(s).
- Return type:
Union[NDArray[np.float64], Tuple[NDArray[np.float64], …]]
- Raises:
ValueError – If no column names are provided.
ValueError – If a column name is not in the data.
- property contains_lazyframe: bool
Return whether the data is a LazyFrame.
- Returns:
True if the data is a LazyFrame, False otherwise.
- Return type:
bool
- get_only(column_name)
Return a single column of the data as a numpy array.
- Parameters:
column_name (str) – The column name to return.
- Returns:
The column as a numpy array.
- Return type:
NDArray[np.float64]
- Raises:
ValueError – If the column name is not in the data.
ValueError – If no column name is provided.
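The return conventions of get and get_only can be illustrated with a hypothetical ResultSketch class that stores columns as plain lists (the documented methods return NumPy arrays instead). The sketch returns a bare column for one name and a tuple for several, which is what makes tuple unpacking such as time, voltage = result.get(...) convenient; the class and its error messages are illustrative, not PyProBE code.

```python
from typing import Dict, List, Tuple, Union

Column = List[float]


class ResultSketch:
    """Hypothetical stand-in showing the return conventions of get/get_only."""

    def __init__(self, data: Dict[str, Column]) -> None:
        self._data = data

    def get(self, *column_names: str) -> Union[Column, Tuple[Column, ...]]:
        if not column_names:
            raise ValueError("No column names provided.")
        missing = [name for name in column_names if name not in self._data]
        if missing:
            raise ValueError(f"Columns not in data: {missing}")
        arrays = tuple(self._data[name] for name in column_names)
        # One name gives a bare column; several names give a tuple.
        return arrays[0] if len(arrays) == 1 else arrays

    def get_only(self, column_name: str) -> Column:
        if not column_name:
            raise ValueError("No column name provided.")
        if column_name not in self._data:
            raise ValueError(f"Column '{column_name}' not in data.")
        return self._data[column_name]
```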
- property quantities: Set[str]
The quantities of the data, with unit information removed.
- Returns:
The quantities of the data.
- Return type:
Set[str]
- property column_list: List[str]
The columns in the data.
- Returns:
The columns in the data.
- Return type:
List[str]
- define_column(column_name, definition)
Define a new column when it is added to the dataframe.
- Parameters:
column_name (str) – The name of the column.
definition (str) – The definition of the quantity stored in the column.
- Return type:
None
- print_definitions()
Print the definitions of the columns stored in this result object.
- Return type:
None
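The pairing of define_column and print_definitions amounts to maintaining a name-to-definition dictionary. The following is a minimal, hypothetical sketch of that idea; keying by the full column name, overwriting on redefinition, and the "name: definition" print format are assumptions of the sketch, not documented PyProBE behaviour.

```python
from typing import Dict


class DefinitionsSketch:
    """Hypothetical minimal model of column definitions on a Result."""

    def __init__(self) -> None:
        self.column_definitions: Dict[str, str] = {}

    def define_column(self, column_name: str, definition: str) -> None:
        # In this sketch, redefining a column overwrites the old entry.
        self.column_definitions[column_name] = definition

    def print_definitions(self) -> None:
        # One "name: definition" line per column, in insertion order.
        for name, definition in self.column_definitions.items():
            print(f"{name}: {definition}")


defs = DefinitionsSketch()
defs.define_column("Power [W]", "Product of voltage and current.")
defs.print_definitions()
```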
- clean_copy(dataframe=None, column_definitions=None)
Create a copy of the result object that keeps the info dictionary but not the data.
- Parameters:
dataframe (Optional[Union[pl.DataFrame, pl.LazyFrame]]) – The data to include in the new Result object.
column_definitions (Optional[Dict[str, str]]) – The definitions of the columns in the new result object.
- Returns:
A new result object with the specified data.
- Return type:
Result
- add_new_data_columns(new_data, date_column_name)
Add new data columns to the result object.
The data must be time series data with a date column. The new data is joined to the base dataframe on the date column, and the new data columns are interpolated to fill in missing values.
- Parameters:
new_data (pl.DataFrame | pl.LazyFrame) – The new data to add to the result object.
date_column_name (str) – The name of the column in the new data containing the date.
- Raises:
ValueError – If the base dataframe has no date column.
- Return type:
None
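The interpolation step can be sketched in plain Python, assuming linear interpolation and representing dates as plain numbers (for example seconds since the start of the test) rather than a Polars date column. The helper interpolate_onto is hypothetical: it estimates the new data's values at each timestamp of the base data and leaves timestamps outside the new data's range as None instead of extrapolating.

```python
from bisect import bisect_left
from typing import List, Optional


def interpolate_onto(
    base_times: List[float], new_times: List[float], new_values: List[float]
) -> List[Optional[float]]:
    """Linearly interpolate (new_times, new_values) onto base_times.

    new_times must be sorted in ascending order. Timestamps outside the
    range of new_times get None rather than being extrapolated.
    """
    result: List[Optional[float]] = []
    for t in base_times:
        i = bisect_left(new_times, t)
        if i < len(new_times) and new_times[i] == t:
            result.append(new_values[i])  # exact timestamp match
        elif 0 < i < len(new_times):
            # t lies strictly between new_times[i - 1] and new_times[i].
            t0, t1 = new_times[i - 1], new_times[i]
            v0, v1 = new_values[i - 1], new_values[i]
            result.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        else:
            result.append(None)  # outside the new data's time range
    return result
```

For instance, a temperature logged every 10 s can be placed alongside cycler data sampled every 5 s.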
- join(other, on, how='inner', coalesce=True)
Join two Result objects on a column. A wrapper around the polars join method.
This will extend the data in the Result object horizontally. The column definitions of the two Result objects are combined; if there are any conflicts, the column definitions of the calling Result object take precedence.
- Parameters:
other (Result) – The other Result object to join with.
on (Union[str, List[str]]) – The column(s) to join on.
how (str) – The type of join to perform. Default is ‘inner’.
coalesce (bool) – Whether to coalesce the columns. Default is True.
- Return type:
None
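The two effects of join, rows matched on a key and column definitions merged with the caller winning, can be sketched with lists of dicts. Both helpers are hypothetical: inner_join keeps only rows whose key appears on both sides, with the key column appearing once per joined row, and it assumes non-key column names do not clash (Polars would instead add a suffix such as "_right"); merge_definitions relies on later dict entries overriding earlier ones.

```python
from typing import Dict, List

Row = Dict[str, object]


def inner_join(left: List[Row], right: List[Row], on: str) -> List[Row]:
    """Inner join two row lists on a single key column."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[on], []).append(row)
    joined: List[Row] = []
    for row in left:
        for match in index.get(row[on], []):
            # Assumes non-key columns do not clash between the two sides.
            joined.append({**match, **row})
    return joined


def merge_definitions(calling: Dict[str, str], other: Dict[str, str]) -> Dict[str, str]:
    """Combine definitions; the calling object's entries take precedence."""
    return {**other, **calling}
```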
- extend(other, concat_method='diagonal')
Extend the data in this Result object with the data in another Result object.
This method concatenates the data in the two Result objects, with the data of the calling Result object placed above that of the other Result object. The column definitions of the two Result objects are combined; if there are any conflicts, the column definitions of the calling Result object take precedence.
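The default concat_method='diagonal' presumably corresponds to Polars' pl.concat(..., how="diagonal"), which stacks frames vertically over the union of their columns and fills gaps with null. A plain-Python sketch of that behaviour, using a hypothetical concat_diagonal helper over lists of dicts and None for null:

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def concat_diagonal(top: List[Row], bottom: List[Row]) -> List[Row]:
    """Stack rows vertically over the union of both column sets.

    Columns missing from one side are filled with None, and column order
    follows first appearance (top rows before bottom rows).
    """
    columns: List[str] = []
    for row in top + bottom:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in top + bottom]
```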
- classmethod build(data_list, info)
Build a Result object from a list of dataframes.
- Parameters:
data_list (List[List[pl.LazyFrame | pl.DataFrame | Dict]]) – The data to include in the new result object. The first index indicates the cycle and the second index indicates the step.
info (Dict[str, Optional[str | int | float]]) – A dict containing test info.
- Returns:
A new result object with the specified data.
- Return type:
Result
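The nesting convention of data_list (first index: cycle, second index: step) can be illustrated by flattening it into a single table. In this hypothetical sketch each step is either a list of row dicts or a column-oriented dict; the "Cycle" and "Step" column names and the zero-based numbering are choices made for the sketch only, since how build actually records these indices is not stated here.

```python
from typing import Any, Dict, List, Union

Row = Dict[str, Any]
Frame = Union[List[Row], Dict[str, List[Any]]]


def _to_rows(frame: Frame) -> List[Row]:
    if isinstance(frame, dict):  # column-oriented: {"col": [values, ...]}
        names = list(frame)
        return [dict(zip(names, values)) for values in zip(*frame.values())]
    return frame


def flatten_cycles(data_list: List[List[Frame]]) -> List[Row]:
    """Flatten data_list[cycle][step] into one row list tagged with indices."""
    rows: List[Row] = []
    for cycle, steps in enumerate(data_list):
        for step, frame in enumerate(steps):
            for row in _to_rows(frame):
                rows.append({**row, "Cycle": cycle, "Step": step})
    return rows
```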
- combine_results(results, concat_method='diagonal')
Combine multiple Result objects into a single Result object.
This method should be used to combine multiple Result objects that have different entries in their info dictionaries. The info dictionaries of the Result objects will be integrated into the dataframe of the new Result object.
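Integrating info dictionaries into the combined dataframe can be pictured as copying each result's info entries onto every row of its data as constant columns, then stacking the results diagonally. The sketch below models each result as a hypothetical (info, rows) pair; it illustrates the idea rather than reproducing PyProBE's implementation.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def combine_results_sketch(results: List[Tuple[Dict[str, Any], List[Row]]]) -> List[Row]:
    """Turn each result's info entries into constant columns, then stack.

    Stacking is diagonal: columns absent from a result are filled with None.
    """
    # Copy each info entry onto every row of that result's data.
    tagged = [{**row, **info} for info, rows in results for row in rows]
    columns: List[str] = []
    for row in tagged:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in tagged]
```

Rows from different cells can then be told apart by filtering on an info-derived column such as "Name".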