main.timeseries

Timeseries for generating ProCAT plots.

Functions

get_capacity_timeseries(start_date, end_date)

Get the timeseries data for aggregated user capacities.

A user may have multiple capacity entries. In this case, the 'end date' of each capacity entry is the start date of the user's next entry. If there is no subsequent entry, the 'end date' is the end of the plotting period.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
start_date | datetime | datetime object representing the start of the plotting period | required
end_date | datetime | datetime object representing the end of the plotting period | required

Returns:

Type | Description
--- | ---
Series[float] | Pandas series of aggregated capacities with date range as index.

Source code in main/timeseries.py
def get_capacity_timeseries(
    start_date: datetime, end_date: datetime
) -> pd.Series[float]:
    """Get the timeseries data for aggregated user capacities.

    A user may have multiple capacity entries. In this case, the 'end date' of each
    capacity entry is the start date of the user's next entry. If there is no
    subsequent entry, the 'end date' is the end of the plotting period.

    Args:
        start_date: datetime object representing the start of the plotting period
        end_date: datetime object representing the end of the plotting period

    Returns:
        Pandas series of aggregated capacities with date range as index.
    """
    dates = pd.bdate_range(
        pd.Timestamp(start_date), pd.Timestamp(end_date), inclusive="left"
    )
    # if multiple capacities for a user, end_date is start_date of next capacity object
    # if no subsequent capacity, then end_date is plotting period end_date
    capacities = list(
        models.Capacity.objects.filter(start_date__lte=end_date.date())
        .annotate(
            end_date=Window(
                expression=Lead("start_date"),  # get start date of next capacity
                order_by=F("start_date").asc(),  # orders by ascending start date
                partition_by="user__username",
            )
        )
        .annotate(end_date=Coalesce("end_date", end_date.date()))
    )

    # initialize timeseries
    timeseries = pd.Series(0.0, index=dates)
    for capacity in capacities:
        timeseries = update_timeseries(timeseries, capacity, "value")

    return timeseries
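
The end-date rule described above (next entry's start date, falling back to the end of the plotting period) can be illustrated outside the database. The following is a minimal pandas sketch, not the project's implementation: the `records` frame, the usernames, and `period_end` are hypothetical stand-ins for the `Capacity` rows and the `end_date` argument.

```python
import pandas as pd

# Hypothetical capacity entries; in the real code these come from models.Capacity.
records = pd.DataFrame(
    {
        "username": ["alice", "alice", "bob"],
        "start_date": pd.to_datetime(["2024-01-01", "2024-03-01", "2024-02-01"]),
        "value": [1.0, 0.5, 0.8],
    }
)
# Hypothetical end of the plotting period.
period_end = pd.Timestamp("2024-06-01")

# Order each user's entries by start date, as the window function does.
records = records.sort_values(["username", "start_date"])

# Per user, take the start date of the next entry (the role played by Lead),
# then fall back to the period end when there is none (the role of Coalesce).
records["end_date"] = (
    records.groupby("username")["start_date"].shift(-1).fillna(period_end)
)

print(records)
```

Here the first entry for `alice` ends when her second entry starts, while her second entry and the single entry for `bob` both run to the end of the plotting period.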

get_cost_recovery_timeseries(dates)

Get the cost recovery timeseries for the previous year.

For each month in the past year, this function totals the monthly charges for each funding source and divides each total by that source's daily rate and by the number of working days in the month. The results are summed across all funding sources and added to the timeseries for that month's working days.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
dates | list[tuple[date, date]] | list of tuples (from oldest to most recent) containing dates for all months of the previous year; each tuple contains two dates for the first and last date of the month | required

Returns:

Type | Description
--- | ---
tuple[Series[float], list[float]] | Tuple of Pandas series containing cost recovery timeseries data and a list of monthly totals.

Source code in main/timeseries.py
def get_cost_recovery_timeseries(
    dates: list[tuple[date, date]],
) -> tuple[pd.Series[float], list[float]]:
    """Get the cost recovery timeseries for the previous year.

    For each month in the past year, this function totals the monthly charges for
    each funding source and divides each total by that source's daily rate and by
    the number of working days in the month. The results are summed across all
    funding sources and added to the timeseries for that month's working days.

    Args:
        dates: list of tuples (from oldest to most recent) containing dates for all
            months of the previous year; each tuple contains two dates for the first
            and last date of the month

    Returns:
        Tuple of Pandas series containing cost recovery timeseries data and a list of
        monthly totals.
    """
    date_range = pd.bdate_range(
        start=dates[0][0],
        end=dates[-1][1],
        inclusive="both",
    )
    # initialize timeseries
    timeseries = pd.Series(0.0, index=date_range)

    # store monthly totals for bar plot
    monthly_totals = []

    for month in dates:
        month_dates = pd.bdate_range(start=month[0], end=month[1], inclusive="both")
        n_working_days = len(month_dates)
        monthly_charges = models.MonthlyCharge.objects.filter(date=month[0])
        monthly_total = monthly_charges.aggregate(Sum("amount"))["amount__sum"]

        # group by funding
        charges = (
            monthly_charges.values("funding")
            .annotate(total=Sum("amount"))  # get total Amount across monthly charges
            .annotate(  # divide total by daily rate and working days
                recovered=ExpressionWrapper(
                    F("total") * Value(1.0) / F("funding__daily_rate") / n_working_days,
                    output_field=FloatField(),
                )
            )
        )

        # aggregate across all funding sources (defaults to 0 if None)
        funding_total = charges.aggregate(Sum("recovered"))["recovered__sum"] or 0
        timeseries[month_dates] += funding_total  # Update timeseries

        # record total for the month
        monthly_totals.append(float(monthly_total) if monthly_total else 0.0)

    return timeseries, monthly_totals
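
The per-month arithmetic can be checked by hand. The sketch below reproduces it in plain Python and pandas for a single month, without the Django ORM. The funding source names, charge totals, and daily rates are invented for illustration only.

```python
import pandas as pd

# Hypothetical month (January 2024); business days are Monday to Friday,
# with no holiday calendar applied, as with pd.bdate_range in the source.
month_start, month_end = pd.Timestamp("2024-01-01"), pd.Timestamp("2024-01-31")
month_dates = pd.bdate_range(start=month_start, end=month_end, inclusive="both")
n_working_days = len(month_dates)

# Assumed total charged per funding source and each source's daily rate.
totals = {"grant_a": 4600.0, "grant_b": 2300.0}
daily_rates = {"grant_a": 400.0, "grant_b": 200.0}

# Divide each source's total by its daily rate and the working days,
# then sum across sources.
recovered = sum(totals[name] / daily_rates[name] / n_working_days for name in totals)

# Add the same per-day value to every working day of the month.
timeseries = pd.Series(0.0, index=month_dates)
timeseries.loc[month_dates] += recovered

# The unscaled sum of charges, as recorded in the monthly totals list.
monthly_total = sum(totals.values())
```

With these assumed figures, January 2024 has 23 working days, each funding source contributes 0.5, and every working day in the month receives a value of 1.0. The monthly total is the plain sum of charges, 6900.0.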

get_effort_timeseries(start_date, end_date)

Get the timeseries data for aggregated project effort.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
start_date | datetime | datetime object representing the start of the plotting period | required
end_date | datetime | datetime object representing the end of the plotting period | required

Returns:

Type | Description
--- | ---
Series[float] | Pandas series of aggregated effort with date range as index.

Source code in main/timeseries.py
def get_effort_timeseries(start_date: datetime, end_date: datetime) -> pd.Series[float]:
    """Get the timeseries data for aggregated project effort.

    Args:
        start_date: datetime object representing the start of the plotting period
        end_date: datetime object representing the end of the plotting period

    Returns:
        Pandas series of aggregated effort with date range as index.
    """
    dates = pd.bdate_range(
        pd.Timestamp(start_date), pd.Timestamp(end_date), inclusive="left"
    )
    # filter Projects to ensure dates exist and overlap with timeseries dates
    projects = list(
        models.Project.objects.filter(
            start_date__lt=end_date.date(),
            end_date__gte=start_date.date(),
            start_date__isnull=False,
            end_date__isnull=False,
        )
    )
    projects = [project for project in projects if project.funding_source.exists()]

    # initialize timeseries
    timeseries = pd.Series(0.0, index=dates)
    for project in projects:
        timeseries = update_timeseries(timeseries, project, "effort_per_day")

    return timeseries

update_timeseries(timeseries, object, attr_name)

Update the initialized timeseries with value from a Model object.

The dates for the Model are used to index the timeseries. The value added is specified by the attr_name.

TODO: For advanced capacity planning, keep separate Project and User timeseries so these can be plotted individually.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
timeseries | Series[float] | the Pandas series containing the Project or Capacity data with the dates of the plotting period as the index | required
object | Project \| Capacity | the Project or Capacity object used to update the timeseries | required
attr_name | str | the name of the attribute representing the value to add to the timeseries (i.e. 'value' or 'effort_per_day') | required

Returns:

Type | Description
--- | ---
Series[float] | Pandas series containing updated timeseries data.

Source code in main/timeseries.py
def update_timeseries(
    timeseries: pd.Series[float],
    object: models.Project | models.Capacity,
    attr_name: str,
) -> pd.Series[float]:
    """Update the initialized timeseries with value from a Model object.

    The dates for the Model are used to index the timeseries. The value added is
    specified by the attr_name.

    TODO: For advanced capacity planning, keep separate Project and User timeseries
    so these can be plotted individually.

    Args:
        timeseries: the Pandas series containing the Project or Capacity data with
            the dates of the plotting period as the index
        object: the Project or Capacity object used to update the timeseries
        attr_name: the name of the attribute representing the value to add to the
            timeseries (i.e. 'value' or 'effort_per_day')

    Returns:
        Pandas series containing updated timeseries data.
    """
    object_dates = pd.bdate_range(
        start=object.start_date,
        end=object.end_date,  # type: ignore[union-attr]
        inclusive="left",
    )
    # get intersection between the Model dates and the plotting dates
    index = timeseries.index.intersection(object_dates)
    timeseries[index] += float(getattr(object, attr_name))
    return timeseries
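
To see the indexing in action without a database, the sketch below applies the same steps to a stand-in object. `FakeProject` and `update_timeseries_sketch` are hypothetical names introduced here; the real function receives Django `Project` or `Capacity` instances.

```python
from dataclasses import dataclass
from datetime import date

import pandas as pd


@dataclass
class FakeProject:
    """Hypothetical stand-in for a Project model instance."""

    start_date: date
    end_date: date
    effort_per_day: float


def update_timeseries_sketch(timeseries: pd.Series, obj, attr_name: str) -> pd.Series:
    # Business days covered by the object; the end date itself is excluded.
    obj_dates = pd.bdate_range(
        start=obj.start_date, end=obj.end_date, inclusive="left"
    )
    # Only days that fall inside the plotting period are updated.
    index = timeseries.index.intersection(obj_dates)
    timeseries.loc[index] += float(getattr(obj, attr_name))
    return timeseries


# Plotting period: business days from 1 Jan 2024 up to, but excluding, 15 Jan 2024.
plot_dates = pd.bdate_range("2024-01-01", "2024-01-15", inclusive="left")
ts = pd.Series(0.0, index=plot_dates)

project = FakeProject(date(2024, 1, 3), date(2024, 1, 10), 0.5)
ts = update_timeseries_sketch(ts, project, "effort_per_day")
print(ts)
```

In this run the project contributes 0.5 on the five business days from 3 January to 9 January. The end date, 10 January, is left at 0.0 because the range is left-inclusive.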