measurement.reporting

Classes

Measurement

Bases: MeasurementBase

Class to store the measurements and their validation status.

This class holds the value of a given variable and station at a specific time, as well as auxiliary information such as maximum and minimum values, depth, and direction (for vector quantities). Each of these has a raw version that keeps a backup of the original data, should it change at any point.

It also includes flags that record the validation status, whether the data is active (and can therefore be used for reporting) and whether it has actually been used for reporting.

Attributes:

Name Type Description
depth int

Depth of the measurement.

direction Decimal

Direction of the measurement, useful for vector quantities.

raw_value Decimal

Original value of the measurement.

raw_maximum Decimal

Original maximum value of the measurement.

raw_minimum Decimal

Original minimum value of the measurement.

raw_direction Decimal

Original direction of the measurement.

raw_depth int

Original depth of the measurement.

is_validated bool

Flag to indicate if the measurement has been validated.

is_active bool

Flag to indicate if the measurement is active. An inactive measurement is not used for reporting.

Attributes
overwritten: bool property

Indicates if any of the values associated with the entry have been overwritten.

Returns:

Name Type Description
bool bool

True if any raw field differs from the corresponding standard field.

raws: tuple[str, ...] property

Return the raw fields of the measurement.

Returns:

Type Description
tuple[str, ...]

Tuple with the names of the raw fields of the measurement.

Functions
clean()

Check consistency of validation and reporting, and back up values.

Source code in measurement/models.py
def clean(self) -> None:
    """Check consistency of validation and reporting, and back up values."""
    # Check consistency of validation
    if not self.is_validated and not self.is_active:
        raise ValidationError("Only validated entries can be declared as inactive.")

    # Backup values to raws, if needed
    for r in self.raws:
        value = getattr(self, r.removeprefix("raw_"))
        if value and not getattr(self, r):
            setattr(self, r, value)
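
The interplay between `raws`, `overwritten` and `clean()` can be sketched without a database. The snippet below is a minimal stand-in, not the project's model: `MeasurementSketch` is a hypothetical dataclass with only two value fields, and it raises `ValueError` where the real model raises Django's `ValidationError`.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class MeasurementSketch:
    """Hypothetical, database-free stand-in for the Measurement model."""

    value: Optional[Decimal] = None
    maximum: Optional[Decimal] = None
    raw_value: Optional[Decimal] = None
    raw_maximum: Optional[Decimal] = None
    is_validated: bool = False
    is_active: bool = True

    @property
    def raws(self) -> tuple[str, ...]:
        # Names of the fields that keep a backup of the original data.
        return tuple(name for name in vars(self) if name.startswith("raw_"))

    @property
    def overwritten(self) -> bool:
        # True if any raw field differs from its standard counterpart.
        return any(
            getattr(self, r) != getattr(self, r.removeprefix("raw_"))
            for r in self.raws
        )

    def clean(self) -> None:
        if not self.is_validated and not self.is_active:
            raise ValueError("Only validated entries can be declared as inactive.")
        # Copy each standard value into its raw field if the raw field is empty.
        for r in self.raws:
            current = getattr(self, r.removeprefix("raw_"))
            if current and not getattr(self, r):
                setattr(self, r, current)


m = MeasurementSketch(value=Decimal("12.5"))
m.clean()                  # raw_value now holds a backup of the original value
m.value = Decimal("13.0")  # a later correction makes the entry "overwritten"
```

Because the backup test relies on truthiness, a value of exactly zero is treated like a missing value and is not copied to its raw field; this holds for the sketch and for the source listing above.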

Report

Bases: MeasurementBase

Holds the different reporting data.

It also keeps track of which data has already been used when creating the reports.

Attributes:

Name Type Description
report_type str

Type of report. It can be hourly, daily or monthly.

completeness Decimal

Completeness of the report, as a percentage. E.g., a daily report with 24 hourly measurements would have a completeness of 100%.

Functions
clean()

Validate that the report type and use of the data are consistent.

Source code in measurement/models.py
def clean(self) -> None:
    """Validate that the report type and use of the data are consistent."""
    if self.report_type == ReportType.HOURLY:
        self.time = self.time.replace(minute=0, second=0, microsecond=0)
    elif self.report_type == ReportType.DAILY:
        self.time = self.time.replace(hour=0, minute=0, second=0, microsecond=0)
    elif self.report_type == ReportType.MONTLY:
        self.time = self.time.replace(
            day=1, hour=0, minute=0, second=0, microsecond=0
        )
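
In practice, `clean()` snaps the report timestamp to the start of its hour, day or month. A minimal sketch of that truncation with plain `datetime` objects follows; `truncate_time` is a hypothetical helper, and the string values `"hourly"`, `"daily"` and `"monthly"` are assumed to match the `ReportType` choices.

```python
from datetime import datetime


def truncate_time(time: datetime, report_type: str) -> datetime:
    """Snap a timestamp to the start of its hour, day or month."""
    if report_type == "hourly":
        return time.replace(minute=0, second=0, microsecond=0)
    if report_type == "daily":
        return time.replace(hour=0, minute=0, second=0, microsecond=0)
    if report_type == "monthly":
        return time.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # Unknown report types are left untouched, as in the listing above.
    return time


stamp = datetime(2023, 3, 17, 14, 42, 9)
hourly = truncate_time(stamp, "hourly")    # 2023-03-17 14:00:00
daily = truncate_time(stamp, "daily")      # 2023-03-17 00:00:00
monthly = truncate_time(stamp, "monthly")  # 2023-03-01 00:00:00
```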

Station

Bases: PermissionsBase

Main representation of a station, including its metadata.

Attributes:

Name Type Description
visibility str

Visibility level of the object, including an "internal" option.

station_id int

Primary key.

station_code str

Unique code for the station.

station_name str

Brief description of the station.

station_type StationType

Type of the station.

country Country

Country where the station is located.

region Region

Region within the Country where the station is located.

ecosystem Ecosystem

Ecosystem associated with the station.

institution Institution

Institutional partner responsible for the station.

place_basin PlaceBasin

Place-Basin association.

station_state bool

Is the station operational?

timezone str

Timezone of the station.

delta_t DeltaT

Interval of data acquisition (in minutes).

station_latitude Decimal

Latitude of the station, in degrees [-90 to 90].

station_longitude Decimal

Longitude of the station, in degrees [-180 to 180].

station_altitude int

Altitude of the station.

influence_km Decimal

Area of influence in km².

station_file ImageField

Photograph of the station.

station_external bool

Is the station external?

Functions
__str__()

Return the station code.

Source code in station/models.py
def __str__(self) -> str:
    """Return the station code."""
    return str(self.station_code)

clean()

Set the default delta_t value if not provided.

Source code in station/models.py
def clean(self) -> None:
    """Set the default delta_t value if not provided."""
    super().clean()
    if not self.delta_t:
        self.delta_t = DeltaT.get_default()

get_absolute_url()

Return the absolute url of the station.

Source code in station/models.py
def get_absolute_url(self) -> str:
    """Return the absolute url of the station."""
    return reverse("station:station_detail", kwargs={"pk": self.pk})

set_object_permissions()

Set object-level permissions.

This method is called by the save method of the model to set the object-level permissions based on the visibility level of the object. In addition to the standard permissions for the station, the view_measurements permission is also set; it controls who can view the measurements associated with the station.

Source code in station/models.py
def set_object_permissions(self) -> None:
    """Set object-level permissions.

    This method is called by the save method of the model to set the object-level
    permissions based on the visibility level of the object. In addition to the
    standard permissions for the station, the view_measurements permission is also
    set; it controls who can view the measurements associated with the station.
    """
    super().set_object_permissions()

    standard_group = Group.objects.get(name="Standard")
    anonymous_user = get_anonymous_user()

    # Assign view_measurements permission based on permissions level
    if self.visibility == "public":
        assign_perm("view_measurements", standard_group, self)
        assign_perm("view_measurements", anonymous_user, self)
        if self.owner:
            remove_perm("view_measurements", self.owner, self)
    elif self.visibility == "internal":
        assign_perm("view_measurements", standard_group, self)
        remove_perm("view_measurements", anonymous_user, self)
        if self.owner:
            remove_perm("view_measurements", self.owner, self)
    elif self.visibility == "private":
        remove_perm("view_measurements", standard_group, self)
        remove_perm("view_measurements", anonymous_user, self)
        if self.owner:
            assign_perm("view_measurements", self.owner, self)
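
The three branches above amount to a small lookup table. The sketch below restates them as plain data; the labels `standard_group`, `anonymous_user` and `owner`, and the helper `has_direct_permission`, are names chosen for this illustration rather than identifiers from the project.

```python
# Direct holders of `view_measurements` after set_object_permissions runs,
# read from its three branches. The owner column only applies when the
# station actually has an owner.
VIEW_MEASUREMENTS = {
    "public": {"standard_group": True, "anonymous_user": True, "owner": False},
    "internal": {"standard_group": True, "anonymous_user": False, "owner": False},
    "private": {"standard_group": False, "anonymous_user": False, "owner": True},
}


def has_direct_permission(visibility: str, who: str) -> bool:
    """Return whether `who` is directly assigned view_measurements."""
    return VIEW_MEASUREMENTS[visibility][who]


anonymous_can_see_internal = has_direct_permission("internal", "anonymous_user")
owner_can_see_private = has_direct_permission("private", "owner")
```

For public and internal stations the owner's direct permission is removed. Whether the owner can still view measurements through membership of the Standard group depends on group assignments that are not shown on this page.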

Variable

Bases: PermissionsBase

A variable with a physical meaning.

Examples include precipitation, wind speed, wind direction and soil moisture. Each variable carries its associated unit, as well as metadata that helps identify what is a reasonable value for the data, flag outliers and support the validation process.

The nature of the variable can be one of the following:

  • sum: Cumulative value over a period of time.
  • average: Average value over a period of time.
  • value: One-off value.

Attributes:

Name Type Description
variable_id AutoField

Primary key.

variable_code CharField

Code of the variable, e.g. airtemperature.

name CharField

Human-readable name of the variable, e.g. Air temperature.

unit ForeignKey

Unit of the variable.

maximum DecimalField

Maximum value allowed for the variable.

minimum DecimalField

Minimum value allowed for the variable.

diff_error DecimalField

If two sequential values in the time-series data of this variable differ by more than this value, the validation process can mark this with an error flag.

outlier_limit DecimalField

Threshold for defining outliers, expressed as a multiple of the standard deviation (sigma).

null_limit DecimalField

The maximum percentage of null values (missing data caused by, e.g., equipment malfunction) allowed for hourly, daily and monthly data. Cumulative values are not deemed trustworthy if the percentage of missing values in a given period is greater than the null_limit.

nature CharField

Nature of the variable, e.g. whether it represents a one-off value, the average over a period of time or the cumulative value over a period.
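
To make the `diff_error` and `outlier_limit` descriptions concrete, here is one possible reading of them in plain Python. This is an illustration only, not the project's validation code, which is not shown on this page: the function names, the use of the population standard deviation and the sample values are all assumptions.

```python
from statistics import mean, pstdev


def flag_jumps(values: list[float], diff_error: float) -> list[bool]:
    """Flag values that differ from the previous one by more than diff_error."""
    jumps = [abs(b - a) > diff_error for a, b in zip(values, values[1:])]
    return [False] + jumps  # the first value has no predecessor


def flag_outliers(values: list[float], outlier_limit: float) -> list[bool]:
    """Flag values further than outlier_limit standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) > outlier_limit * sigma for v in values]


series = [10.0, 10.5, 10.2, 25.0, 10.4]
jumps = flag_jumps(series, diff_error=5.0)
outliers = flag_outliers(series, outlier_limit=1.5)
```

With these numbers the spike at 25.0 is flagged by both checks, and the value right after it is flagged as a jump as well because the series drops back by more than `diff_error`.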

Attributes
is_cumulative: bool property

Return True if the nature of the variable is sum.

Functions
__str__()

Return the string representation of the object.

Source code in variable/models.py
def __str__(self) -> str:
    """Return the string representation of the object."""
    return str(self.name)

clean()

Validate the model fields.

Source code in variable/models.py
def clean(self) -> None:
    """Validate the model fields."""
    if self.maximum < self.minimum:
        raise ValidationError(
            {
                "maximum": "The maximum value must be greater than the minimum "
                "value."
            }
        )
    if not self.variable_code.isidentifier():
        raise ValidationError(
            {
                "variable_code": "The variable code must be a valid Python "
                "identifier. Only letters, numbers and underscores are allowed, and"
                " it cannot start with a number."
            }
        )
    return super().clean()
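
The two checks can be tried outside Django. The sketch below uses a hypothetical `validate_variable` function that collects the errors in a dictionary, whereas the method above raises on the first failure.

```python
def validate_variable(
    variable_code: str, minimum: float, maximum: float
) -> dict[str, str]:
    """Return the validation errors, keyed by field name (empty if valid)."""
    errors: dict[str, str] = {}
    if maximum < minimum:
        errors["maximum"] = "The maximum value must be greater than the minimum value."
    if not variable_code.isidentifier():
        errors["variable_code"] = "The variable code must be a valid Python identifier."
    return errors


ok = validate_variable("airtemperature", minimum=-50, maximum=60)
bad_code = validate_variable("2m_temperature", minimum=-50, maximum=60)
bad_range = validate_variable("airtemperature", minimum=10, maximum=5)
```

Two details follow from the built-ins used: `str.isidentifier()` also returns True for reserved words such as `class`, and the comparison `maximum < minimum` lets equal bounds pass even though the message says "greater than".
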

get_absolute_url()

Get the absolute URL of the object.

Source code in variable/models.py
def get_absolute_url(self) -> str:
    """Get the absolute URL of the object."""
    return reverse("variable:variable_detail", kwargs={"pk": self.pk})

Functions

calculate_reports(data, station, variable, operation, period)

Calculates the report for the chosen days.

Parameters:

Name Type Description Default
data DataFrame

The dataframe with the data.

required
station str

The name of the station.

required
variable str

The name of the variable.

required
operation str

Aggregation operation to perform on the data when calculating the report.

required
period Decimal

The period of the data in minutes.

required

Returns:

Type Description
DataFrame

A dataframe with the hourly, daily and monthly reports.

Source code in measurement/reporting.py
def calculate_reports(
    data: pd.DataFrame, station: str, variable: str, operation: str, period: Decimal
) -> pd.DataFrame:
    """Calculates the report for the chosen days.

    Args:
        data: The dataframe with the data.
        station: The name of the station.
        variable: The name of the variable.
        operation: Aggregation operation to perform on the data when calculating the
            report.
        period: The period of the data in minutes.

    Returns:
        A dataframe with the hourly, daily and monthly reports.
    """
    cols = ["time", "value"]
    if "maximum" in data.columns:
        cols.append("maximum")
    if "minimum" in data.columns:
        cols.append("minimum")

    # Calculate the reports
    hourly = data[cols].resample("H", on="time").agg(operation)
    daily = hourly.resample("D").agg(operation)
    monthly = daily.resample("MS").agg(operation)

    # Find the completeness of the data
    per_hour = 60 / period
    per_day = 24
    per_month = monthly.index.to_series().apply(
        lambda t: pd.Period(t, freq="S").days_in_month
    )
    hourly["completeness"] = (
        data[["time", "value"]].resample("H", on="time").count() / per_hour * 100
    )
    daily["completeness"] = hourly["value"].resample("D").count() / per_day * 100
    monthly["completeness"] = daily["value"].resample("MS").count() / per_month * 100

    # Put everything together
    hourly["report_type"] = "hourly"
    daily["report_type"] = "daily"
    monthly["report_type"] = "monthly"

    report = pd.concat([hourly, daily, monthly])
    report["station"] = station
    report["variable"] = variable

    return report
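
The completeness figures are simple ratios of counted to expected entries. The arithmetic can be worked through with the standard library alone; the 5-minute period, the counts and the `completeness` helper below are assumptions made for this example.

```python
import calendar
from decimal import Decimal


def completeness(count: int, expected) -> float:
    """Percentage of the expected entries that are actually present."""
    return float(count / expected * 100)


period = Decimal("5")   # assumed: one measurement every 5 minutes
per_hour = 60 / period  # 12 measurements expected per hour
per_day = 24            # hourly values expected per day
per_month = calendar.monthrange(2023, 2)[1]  # 28 days in February 2023

hourly_pct = completeness(9, per_hour)     # 9 of 12 measurements present
daily_pct = completeness(18, per_day)      # 18 of 24 hourly values present
monthly_pct = completeness(21, per_month)  # 21 of 28 daily values present
```

All three cases come out at 75%. Note that, in the listing above, the daily figure counts hours that have a value and the monthly figure counts days that have a value, regardless of how complete each of those hours or days was.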

get_data_to_report(station, variable, start_time, end_time)

Retrieves data to be reported about.

Only active measurements are retrieved, and the station timezone is applied to the time range.

Parameters:

Name Type Description Default
station str

Station of interest.

required
variable str

Variable of interest.

required
start_time datetime

Start time.

required
end_time datetime

End time.

required

Returns:

Type Description
DataFrame

A dataframe with the data to report about.

Source code in measurement/reporting.py
def get_data_to_report(
    station: str,
    variable: str,
    start_time: datetime,
    end_time: datetime,
) -> pd.DataFrame:
    """Retrieves data to be reported about.

    Only active measurements are retrieved, and the station timezone is applied to
    the time range.

    Args:
        station: Station of interest.
        variable: Variable of interest.
        start_time: Start time.
        end_time: End time.

    Returns:
        A dataframe with the data to report about.
    """
    tz = zoneinfo.ZoneInfo(Station.objects.get(station_code=station).timezone)

    return pd.DataFrame.from_records(
        Measurement.objects.filter(
            station__station_code=station,
            variable__variable_code=variable,
            time__gte=start_time.replace(tzinfo=tz),
            time__lte=end_time.replace(tzinfo=tz),
            is_active=True,
        ).values()
    )
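
The query relies on `datetime.replace(tzinfo=...)`, which attaches a timezone without shifting the clock time. The sketch below contrasts it with `astimezone`, using a fixed UTC-5 offset as a stand-in for a station timezone (the real code builds a `zoneinfo.ZoneInfo` from the station record).

```python
from datetime import datetime, timedelta, timezone

# Assumed stand-in for the station timezone: a fixed UTC-5 offset.
station_tz = timezone(timedelta(hours=-5))

# A naive datetime is simply labelled with the station timezone.
naive = datetime(2023, 3, 17, 12, 0)
localized = naive.replace(tzinfo=station_tz)  # 12:00 at UTC-5

# An aware datetime keeps its clock time, so the instant changes...
aware_utc = datetime(2023, 3, 17, 12, 0, tzinfo=timezone.utc)
relabelled = aware_utc.replace(tzinfo=station_tz)  # 12:00 at UTC-5

# ...whereas astimezone keeps the instant and changes the clock time.
converted = aware_utc.astimezone(station_tz)  # 07:00 at UTC-5
```

So the function treats whatever clock time it receives as station-local time. Inputs that already carry the station timezone pass through unchanged.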

get_report_data_from_db(station, variable, start_time, end_time, report_type)

Retrieves the report data from the database.

Time is set to the station timezone and the time range is inclusive of both start and end times.

Parameters:

Name Type Description Default
station str

Station of interest.

required
variable str

Variable of interest.

required
start_time str

Start time.

required
end_time str

End time.

required
report_type str

Type of report to retrieve.

required

Returns:

Type Description
DataFrame

A dataframe with the report data.

Source code in measurement/reporting.py
def get_report_data_from_db(
    station: str,
    variable: str,
    start_time: str,
    end_time: str,
    report_type: str,
) -> pd.DataFrame:
    """Retrieves the report data from the database.

    Time is set to the station timezone and the time range is inclusive of both
    start and end times.

    Args:
        station: Station of interest.
        variable: Variable of interest.
        start_time: Start time.
        end_time: End time.
        report_type: Type of report to retrieve.

    Returns:
        A dataframe with the report data.
    """
    start_time_, end_time_ = reformat_dates(station, start_time, end_time)

    if report_type == "measurement":
        data = pd.DataFrame.from_records(
            Measurement.objects.filter(
                station__station_code=station,
                variable__variable_code=variable,
                time__gte=start_time_,
                time__lte=end_time_,
            ).values()
        )
        raw_cols = [col for col in data.columns if col.startswith("raw_")]
        normal = [col.removeprefix("raw_") for col in raw_cols]
        data = data.drop(columns=normal).rename(columns=dict(zip(raw_cols, normal)))

    elif report_type == "validated":
        data = pd.DataFrame.from_records(
            Measurement.objects.filter(
                station__station_code=station,
                variable__variable_code=variable,
                time__gte=start_time_,
                time__lte=end_time_,
                is_validated=True,
                is_active=True,
            ).values()
        )
        raw_cols = [col for col in data.columns if col.startswith("raw_")]
        data = data.drop(columns=raw_cols)

    else:
        data = pd.DataFrame.from_records(
            Report.objects.filter(
                station__station_code=station,
                variable__variable_code=variable,
                time__gte=start_time_,
                time__lte=end_time_,
                report_type=report_type,
            ).values()
        )

    data = data.rename(columns={"station_id": "station", "variable_id": "variable"})
    if not data.empty:
        data = data.sort_values("time")

    return data
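
For the `"measurement"` report type, the standard columns are dropped and the raw columns take their names, so the caller sees the original data. A dictionary-based sketch of that swap follows; `raw_view` and the sample row are made up for illustration.

```python
def raw_view(row: dict) -> dict:
    """Replace each standard column by its raw_ counterpart."""
    raw_cols = [col for col in row if col.startswith("raw_")]
    normal = {col.removeprefix("raw_") for col in raw_cols}
    kept = {key: val for key, val in row.items() if key not in normal}
    return {key.removeprefix("raw_"): val for key, val in kept.items()}


row = {"time": "2023-03-17 12:00", "value": 13.0, "raw_value": 12.5}
original = raw_view(row)

# str.strip treats its argument as a set of characters, not as a prefix, so
# it can remove more than intended. "raw_war" is a hypothetical column name.
stripped = "raw_war".strip("raw_")
prefix_removed = "raw_war".removeprefix("raw_")
```

The last two lines show why prefix removal is the safer way to derive the standard name: `strip` happens to work for the current raw fields but would mangle a name made up only of the characters r, a, w and underscore.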

launch_reports_calculation(station, variable, start_time, end_time)

Launches the calculation of the reports.

Time is set to the station timezone and the time range is inclusive of both start and end times.

Parameters:

Name Type Description Default
station str

Station of interest.

required
variable str

Variable of interest.

required
start_time str

Start time.

required
end_time str

End time.

required
Source code in measurement/reporting.py
def launch_reports_calculation(
    station: str,
    variable: str,
    start_time: str,
    end_time: str,
) -> None:
    """Launches the calculation of the reports.

    Time is set to the station timezone and the time range is inclusive of both
    start and end times.

    Args:
        station: Station of interest.
        variable: Variable of interest.
        start_time: Start time.
        end_time: End time.
    """
    operation = (
        "sum" if Variable.objects.get(variable_code=variable).is_cumulative else "mean"
    )
    period = Station.objects.get(station_code=station).delta_t.delta_t
    start_time_, end_time_ = reformat_dates(station, start_time, end_time)
    data = get_data_to_report(station, variable, start_time_, end_time_)
    report = calculate_reports(data, station, variable, operation, period)
    remove_report_data_in_range(station, variable, start_time_, end_time_)
    save_report_data(report)

reformat_dates(station, start_time, end_time)

Reformat dates so they have the right timezone and cover full days.

The start date is always the first day of the first month and the end date is the last day of the last month. Times are set to 00:00:00 and 23:59:59, respectively, and the timezone is set to the station timezone.

Parameters:

Name Type Description Default
station str

Station of interest.

required
start_time str

Start time.

required
end_time str

End time.

required

Returns:

Type Description
tuple[datetime, datetime]

A tuple with the start and end datetimes in the station timezone.

Source code in measurement/reporting.py
def reformat_dates(
    station: str,
    start_time: str,
    end_time: str,
) -> tuple[datetime, datetime]:
    """Reformat dates so they have the right timezone and cover full days.

    The start date is always the first day of the first month and the end date is the
    last day of the last month. Times are set to 00:00:00 and 23:59:59, respectively,
    and the timezone is set to the station timezone.

    Args:
        station: Station of interest.
        start_time: Start time, formatted as YYYY-MM-DD.
        end_time: End time, formatted as YYYY-MM-DD.

    Returns:
        A tuple with the start and end datetimes in the station timezone.
    """
    tz = zoneinfo.ZoneInfo(Station.objects.get(station_code=station).timezone)
    start_time_ = datetime.strptime(start_time, "%Y-%m-%d").replace(day=1, tzinfo=tz)
    end_time_ = (
        datetime.strptime(end_time, "%Y-%m-%d").replace(day=1, tzinfo=tz)
        + pd.DateOffset(months=1)
        - pd.DateOffset(seconds=1)
    )

    return start_time_, end_time_
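
The widening of the range to whole months can be reproduced with the standard library. In the sketch below, `month_bounds` is a hypothetical equivalent that takes the timezone as an argument instead of looking it up from the station, and a fixed UTC-5 offset stands in for the station timezone.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def month_bounds(start: str, end: str, tz: tzinfo) -> tuple[datetime, datetime]:
    """Expand two YYYY-MM-DD dates to the full months that contain them."""
    first = datetime.strptime(start, "%Y-%m-%d").replace(day=1, tzinfo=tz)
    last_month = datetime.strptime(end, "%Y-%m-%d").replace(day=1, tzinfo=tz)
    # Move to the first day of the following month, then step back one second.
    if last_month.month == 12:
        next_month = last_month.replace(year=last_month.year + 1, month=1)
    else:
        next_month = last_month.replace(month=last_month.month + 1)
    return first, next_month - timedelta(seconds=1)


tz = timezone(timedelta(hours=-5))  # assumed stand-in for the station timezone
start, end = month_bounds("2023-01-15", "2023-02-10", tz)
# start is 2023-01-01 00:00:00-05:00 and end is 2023-02-28 23:59:59-05:00
```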

remove_report_data_in_range(station, variable, start_time, end_time)

Removes data in the range from the database.

The station timezone is applied to the time range.

Parameters:

Name Type Description Default
station str

Station of interest.

required
variable str

Variable of interest.

required
start_time datetime

Start time.

required
end_time datetime

End time.

required
Source code in measurement/reporting.py
def remove_report_data_in_range(
    station: str,
    variable: str,
    start_time: datetime,
    end_time: datetime,
) -> None:
    """Removes data in the range from the database.

    The station timezone is applied to the time range.

    Args:
        station: Station of interest.
        variable: Variable of interest.
        start_time: Start time.
        end_time: End time.
    """
    tz = zoneinfo.ZoneInfo(Station.objects.get(station_code=station).timezone)

    Report.objects.filter(
        station__station_code=station,
        variable__variable_code=variable,
        time__gte=start_time.replace(tzinfo=tz),
        time__lte=end_time.replace(tzinfo=tz),
    ).delete()

save_report_data(data)

Saves the report data into the database.

Before saving, the function removes maximum and minimum columns if they have all NaN and removes rows with NaN in the value column.

Parameters:

Name Type Description Default
data DataFrame

The dataframe with the report data.

required
Source code in measurement/reporting.py
def save_report_data(data: pd.DataFrame) -> None:
    """Saves the report data into the database.

    Before saving, the function removes maximum and minimum columns if they have all NaN
    and removes rows with NaN in the value column.

    Args:
        data: The dataframe with the report data.
    """
    data_ = data.dropna(axis=1, how="all").dropna(axis=0, subset=["value"])
    Report.objects.bulk_create(
        [
            Report(
                station=Station.objects.get(station_code=row["station"]),
                variable=Variable.objects.get(variable_code=row["variable"]),
                time=time,
                value=row["value"],
                maximum=row.get("maximum", None),
                minimum=row.get("minimum", None),
                completeness=row["completeness"],
                report_type=row["report_type"],
            )
            for time, row in data_.iterrows()
        ]
    )
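
The clean-up step before saving can be pictured with plain dictionaries instead of a dataframe. The sketch below is a rough, pandas-free analogue of the two `dropna` calls; `is_missing`, `clean_rows` and the sample rows are assumptions for illustration.

```python
import math


def is_missing(x) -> bool:
    """Treat None and float NaN as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def clean_rows(rows: list[dict]) -> list[dict]:
    """Drop columns that are missing everywhere and rows without a value."""
    columns = {key for row in rows for key in row}
    empty = {c for c in columns if all(is_missing(r.get(c)) for r in rows)}
    return [
        {key: val for key, val in row.items() if key not in empty}
        for row in rows
        if not is_missing(row.get("value"))
    ]


rows = [
    {"value": 1.2, "maximum": math.nan, "completeness": 100.0},
    {"value": math.nan, "maximum": math.nan, "completeness": 0.0},
]
cleaned = clean_rows(rows)
```

Here the all-NaN `maximum` column disappears and the second row is dropped, which is why the listing above reads `maximum` and `minimum` with `row.get(..., None)` rather than by direct indexing.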