measurement.reporting

Classes

DataImport

Bases: PermissionsBase

Model to store the data imports.

This model stores the data imports, which are files with data that are uploaded to the system. The data is then processed asynchronously and stored in the database.

Attributes:

  • station (ForeignKey): Station to which the data belongs.
  • format (ForeignKey): Format of the data.
  • rawfile (FileField): File with the data to be imported.
  • date (DateTimeField): Date of submission of the data.
  • start_date (DateTimeField): Start date of the data.
  • end_date (DateTimeField): End date of the data.
  • records (IntegerField): Number of records in the data.
  • observations (TextField): Notes or observations about the data.
  • status (TextField): Status of the import.
  • log (TextField): Log of the data ingestion, indicating any errors.
  • reprocess (BooleanField): If checked, the data will be reprocessed.

Functions
clean()

Validate the information and upload the measurement data.

Source code in importing/models.py
def clean(self) -> None:
    """Validate the information and upload the measurement data."""
    tz = self.station.timezone
    if not tz:
        raise ValidationError("Station must have a timezone set.")

    # If the file has changed, we reprocess the data
    if self.pk and self.rawfile != self.__class__.objects.get(pk=self.pk).rawfile:
        self.reprocess = True

    if self.reprocess:
        self.status = "N"
        self.reprocess = False

Measurement

Bases: MeasurementBase

Class to store the measurements and their validation status.

This class holds the value of a given variable and station at a specific time, as well as auxiliary information such as maximum and minimum values, depth and, for vector quantities, direction. Each of these has a raw version that keeps a backup of the original data, should the value change at any point.

It also includes flags that record its validation status, whether the data is active (and can therefore be used for reporting) and whether it has actually been used for reporting.

Attributes:

  • depth (int): Depth of the measurement.
  • direction (Decimal): Direction of the measurement, useful for vector quantities.
  • raw_value (Decimal): Original value of the measurement.
  • raw_maximum (Decimal): Original maximum value of the measurement.
  • raw_minimum (Decimal): Original minimum value of the measurement.
  • raw_direction (Decimal): Original direction of the measurement.
  • raw_depth (int): Original depth of the measurement.
  • is_validated (bool): Flag to indicate if the measurement has been validated.
  • is_active (bool): Flag to indicate if the measurement is active. An inactive measurement is not used for reporting.

Attributes
overwritten: bool property

Indicates if any of the values associated with the entry have been overwritten.

Returns:

bool: True if any raw field is different from the corresponding standard field.

raws: tuple[str, ...] property

Return the raw fields of the measurement.

Returns:

tuple[str, ...]: Tuple with the names of the raw fields of the measurement.

Functions
clean()

Check consistency of validation and reporting, and back up values.

Source code in measurement/models.py
def clean(self) -> None:
    """Check consistency of validation and reporting, and back up values."""
    # An entry can only be inactive if it has been validated first
    if not self.is_validated and not self.is_active:
        raise ValidationError("Only validated entries can be declared as inactive.")

    # Back up values to their raw fields the first time they are set. Compare
    # against None explicitly so that a legitimate value of 0 is also backed up.
    for r in self.raws:
        value = getattr(self, r.removeprefix("raw_"))
        if value is not None and getattr(self, r) is None:
            setattr(self, r, value)
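
The backup rule can be tried out without Django. The following is a minimal sketch, assuming a hypothetical Sample dataclass with a single value/raw_value pair standing in for the model; it is an illustration, not the project's implementation. It compares against None explicitly, so a reading of exactly 0 is backed up as well, which a plain truthiness test would skip.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Sample:
    """Hypothetical stand-in for a measurement with one value/raw pair."""

    value: Optional[Decimal] = None
    raw_value: Optional[Decimal] = None

    # Names of the raw fields, mirroring the `raws` property (class attribute).
    raws = ("raw_value",)

    def backup(self) -> None:
        """Copy each value into its raw field the first time it is set."""
        for r in self.raws:
            value = getattr(self, r.removeprefix("raw_"))
            if value is not None and getattr(self, r) is None:
                setattr(self, r, value)

    @property
    def overwritten(self) -> bool:
        """True if any raw field differs from its standard field."""
        return any(
            getattr(self, r) != getattr(self, r.removeprefix("raw_"))
            for r in self.raws
        )


sample = Sample(value=Decimal("0"))
sample.backup()                    # raw_value now holds the original 0
sample.value = Decimal("1.5")      # a later correction of the value
sample.backup()                    # raw_value is left untouched
```

After these calls the raw field still holds the original reading, and overwritten reports that the value has been changed.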

Report

Bases: MeasurementBase

Holds the different reporting data.

It also keeps track of which data has already been used when creating the reports.

Attributes:

  • report_type (str): Type of report. It can be hourly, daily or monthly.
  • completeness (Decimal): Completeness of the report. E.g. a daily report with 24 hourly measurements would have a completeness of 100%.

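Functions
clean()

Validate that the report type and use of the data are consistent. In practice, the method truncates the report time to the start of the hour, day or month, according to the report type.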

Source code in measurement/models.py
def clean(self) -> None:
    """Validate that the report type and use of the data are consistent."""
    if self.report_type == ReportType.HOURLY:
        self.time = self.time.replace(minute=0, second=0, microsecond=0)
    elif self.report_type == ReportType.DAILY:
        self.time = self.time.replace(hour=0, minute=0, second=0, microsecond=0)
    elif self.report_type == ReportType.MONTLY:
        self.time = self.time.replace(
            day=1, hour=0, minute=0, second=0, microsecond=0
        )
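
The truncation performed for each report type can be checked in isolation. This is a small sketch using plain strings in place of the ReportType enumeration; the truncate helper is hypothetical and only mirrors the three branches shown above.

```python
from datetime import datetime


def truncate(time: datetime, report_type: str) -> datetime:
    """Snap a timestamp to the start of its hour, day or month."""
    if report_type == "hourly":
        return time.replace(minute=0, second=0, microsecond=0)
    if report_type == "daily":
        return time.replace(hour=0, minute=0, second=0, microsecond=0)
    if report_type == "monthly":
        return time.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"Unknown report type: {report_type}")


t = datetime(2023, 3, 17, 14, 42, 7)
hourly = truncate(t, "hourly")    # 2023-03-17 14:00:00
daily = truncate(t, "daily")      # 2023-03-17 00:00:00
monthly = truncate(t, "monthly")  # 2023-03-01 00:00:00
```

Unlike the model method, the sketch raises an error for an unknown report type instead of leaving the time unchanged; that is a choice made for the example only.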

Station

Bases: PermissionsBase

Main representation of a station, including its metadata.

Attributes:

  • visibility (str): Visibility level of the object, including an "internal" option.
  • station_id (int): Primary key.
  • station_code (str): Unique code for the station.
  • station_name (str): Brief description of the station.
  • station_type (StationType): Type of the station.
  • country (Country): Country where the station is located.
  • region (Region): Region within the Country where the station is located.
  • ecosystem (Ecosystem): Ecosystem associated with the station.
  • institution (Institution): Institutional partner responsible for the station.
  • place_basin (PlaceBasin): Place-Basin association.
  • station_state (bool): Is the station operational?
  • timezone (str): Timezone of the station.
  • station_latitude (Decimal): Latitude of the station, in degrees [-90 to 90].
  • station_longitude (Decimal): Longitude of the station, in degrees [-180 to 180].
  • station_altitude (int): Altitude of the station.
  • influence_km (Decimal): Area of influence in km².
  • station_file (ImageField): Photograph of the station.
  • station_external (bool): Is the station external?
  • variables (str): Comma-separated list of variables measured by the station.

Attributes
variables_list: list[str] property

Return the list of variables measured by the station.

Only variables with data in the database are returned.

Returns:

list[str]: List of variables measured by the station.

Functions
__str__()

Return the station code.

Source code in station/models.py
def __str__(self) -> str:
    """Return the station code."""
    return str(self.station_code)

get_absolute_url()

Return the absolute URL of the station.

Source code in station/models.py
def get_absolute_url(self) -> str:
    """Return the absolute url of the station."""
    return reverse("station:station_detail", kwargs={"pk": self.pk})

set_object_permissions()

Set object-level permissions.

This method is called by the save method of the model to set the object-level permissions based on the visibility level of the object. In addition to the standard permissions for the station, the view_measurements permission is set which controls who can view the measurements associated to the station.

Source code in station/models.py
def set_object_permissions(self) -> None:
    """Set object-level permissions.

    This method is called by the save method of the model to set the object-level
    permissions based on the visibility level of the object. In addition to the
    standard permissions for the station, the view_measurements permission is set
    which controls who can view the measurements associated to the station.
    """
    super().set_object_permissions()

    standard_group = Group.objects.get(name="Standard")
    anonymous_user = get_anonymous_user()

    # Assign view_measurements permission based on permissions level
    if self.visibility == "public":
        assign_perm("view_measurements", standard_group, self)
        assign_perm("view_measurements", anonymous_user, self)
        if self.owner:
            remove_perm("view_measurements", self.owner, self)
    elif self.visibility == "internal":
        assign_perm("view_measurements", standard_group, self)
        remove_perm("view_measurements", anonymous_user, self)
        if self.owner:
            remove_perm("view_measurements", self.owner, self)
    elif self.visibility == "private":
        remove_perm("view_measurements", standard_group, self)
        remove_perm("view_measurements", anonymous_user, self)
        if self.owner:
            assign_perm("view_measurements", self.owner, self)
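
The three branches above amount to a small lookup table. The sketch below restates them as data, assuming the hypothetical labels standard_group, anonymous_user and owner for the three principals; it only summarises which explicit view_measurements object permissions are assigned or removed, and says nothing about access the owner may obtain by other means, which is not shown here.

```python
# Explicit view_measurements object permission per visibility level, as read
# from the branches of set_object_permissions.
# True = assigned, False = removed.
VIEW_MEASUREMENTS = {
    "public": {"standard_group": True, "anonymous_user": True, "owner": False},
    "internal": {"standard_group": True, "anonymous_user": False, "owner": False},
    "private": {"standard_group": False, "anonymous_user": False, "owner": True},
}


def holders(visibility: str) -> list[str]:
    """Return who is explicitly granted view_measurements, sorted by name."""
    granted = VIEW_MEASUREMENTS[visibility]
    return sorted(who for who, allowed in granted.items() if allowed)


public_holders = holders("public")
internal_holders = holders("internal")
private_holders = holders("private")
```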

Variable

Bases: PermissionsBase

A variable with a physical meaning.

Examples include precipitation, wind speed, wind direction and soil moisture, each with its associated unit. The model also includes metadata that helps identify reasonable values for the data, flag outliers and support the validation process.

The nature of the variable can be one of the following:

  • sum: Cumulative value over a period of time.
  • average: Average value over a period of time.
  • value: One-off value.

Attributes:

  • variable_id (AutoField): Primary key.
  • variable_code (CharField): Code of the variable, e.g. airtemperature.
  • name (CharField): Human-readable name of the variable, e.g. Air temperature.
  • unit (ForeignKey): Unit of the variable.
  • maximum (DecimalField): Maximum value allowed for the variable.
  • minimum (DecimalField): Minimum value allowed for the variable.
  • diff_error (DecimalField): If two sequential values in the time-series data of this variable differ by more than this value, the validation process can mark this with an error flag.
  • outlier_limit (DecimalField): The statistical deviation for defining outliers, as a multiple of the standard deviation (sigma).
  • null_limit (DecimalField): The maximum percentage of null values (missing, caused by e.g. equipment malfunction) allowed for hourly, daily and monthly data. Cumulative values are not deemed trustworthy if the number of missing values in a given period is greater than the null_limit.
  • nature (CharField): Nature of the variable, e.g. whether it represents a one-off value, the average over a period of time or the cumulative value over a period.

Attributes
is_cumulative: bool property

Return True if the nature of the variable is sum.

Functions
__str__()

Return the string representation of the object.

Source code in variable/models.py
def __str__(self) -> str:
    """Return the string representation of the object."""
    return str(self.name)

clean()

Validate the model fields.

Source code in variable/models.py
def clean(self) -> None:
    """Validate the model fields."""
    if self.maximum < self.minimum:
        raise ValidationError(
            {
                "maximum": "The maximum value must be greater than the minimum "
                "value."
            }
        )
    if not self.variable_code.isidentifier():
        raise ValidationError(
            {
                "variable_code": "The variable code must be a valid Python "
                "identifier. Only letters, numbers and underscores are allowed, and"
                " it cannot start with a number."
            }
        )
    return super().clean()
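
The variable_code check relies on str.isidentifier(), whose behaviour is slightly broader than the error message suggests. The short sketch below, using made-up candidate codes, shows what it accepts and rejects: non-ASCII letters pass, and so do reserved words such as class, which would need keyword.iskeyword() to be detected.

```python
import keyword

# Made-up candidate codes, used only to illustrate str.isidentifier().
candidates = [
    "airtemperature",
    "wind_speed",
    "2m_temperature",   # starts with a digit
    "air-temperature",  # contains a hyphen
    "température",      # non-ASCII letters are accepted
    "class",            # reserved word, still a valid identifier
]

valid = {code: code.isidentifier() for code in candidates}
reserved = [code for code in candidates if keyword.iskeyword(code)]
```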
get_absolute_url()

Get the absolute URL of the object.

Source code in variable/models.py
def get_absolute_url(self) -> str:
    """Get the absolute URL of the object."""
    return reverse("variable:variable_detail", kwargs={"pk": self.pk})

Functions

calculate_reports(data, station, variable, operation)

Calculates the report for the chosen days.

Parameters:

  • data (DataFrame, required): The dataframe with the data.
  • station (str, required): The name of the station.
  • variable (str, required): The name of the variable.
  • operation (str, required): Aggregation operation to perform on the data when calculating the report.

Returns:

DataFrame: A dataframe with the hourly, daily and monthly reports.

Source code in measurement/reporting.py
def calculate_reports(
    data: pd.DataFrame, station: str, variable: str, operation: str
) -> pd.DataFrame:
    """Calculates the report for the chosen days.

    Args:
        data: The dataframe with the data.
        station: The name of the station.
        variable: The name of the variable.
        operation: Aggregation operation to perform on the data when calculating the
            report.

    Returns:
        A dataframe with the hourly, daily and monthly reports.
    """
    cols = ["time", "value"]
    if "maximum" in data.columns:
        cols.append("maximum")
    if "minimum" in data.columns:
        cols.append("minimum")

    # Calculate the reports
    hourly = data[cols].resample("h", on="time").agg(operation)
    daily = hourly.resample("D").agg(operation)
    monthly = daily.resample("MS").agg(operation)

    # Get the right data_import for each period. We use the mode to get the most common
    # data_import value in the period.
    def mode(x: pd.Series) -> str | None:
        modes = x.mode()
        return modes[0] if not modes.empty else None

    cols2 = ["time", "data_import_id"]
    hourly["data_import_id"] = data[cols2].resample("h", on="time").agg(mode)
    daily["data_import_id"] = data[cols2].resample("D", on="time").agg(mode)
    monthly["data_import_id"] = data[cols2].resample("MS", on="time").agg(mode)

    # Put everything together
    hourly["report_type"] = "hourly"
    daily["report_type"] = "daily"
    monthly["report_type"] = "monthly"

    report = pd.concat([hourly, daily, monthly])
    report["station"] = station
    report["variable"] = variable

    return report
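
Each level of the report is built from the previous one: the daily values aggregate the hourly report, not the raw measurements. For a sum this makes no difference, but for a mean the result is a mean of hourly means, which differs from the mean of all samples whenever the hours hold different numbers of samples. The following is a pandas-free sketch with made-up numbers; the aggregate, to_hour and to_day helpers are hypothetical, and, unlike pandas resampling, the sketch does not emit rows for periods without data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Made-up samples: three readings in hour 00 and one reading in hour 01.
samples = [
    (datetime(2023, 1, 1, 0, 0), 1.0),
    (datetime(2023, 1, 1, 0, 20), 2.0),
    (datetime(2023, 1, 1, 0, 40), 3.0),
    (datetime(2023, 1, 1, 1, 0), 10.0),
]


def to_hour(t: datetime) -> datetime:
    return t.replace(minute=0, second=0, microsecond=0)


def to_day(t: datetime) -> datetime:
    return t.replace(hour=0, minute=0, second=0, microsecond=0)


def aggregate(rows, key, operation):
    """Group (time, value) rows by key(time) and reduce each group."""
    buckets = defaultdict(list)
    for time, value in rows:
        buckets[key(time)].append(value)
    return sorted((k, operation(v)) for k, v in buckets.items())


hourly = aggregate(samples, to_hour, mean)
daily_chained = aggregate(hourly, to_day, mean)   # mean of the hourly means
daily_direct = aggregate(samples, to_day, mean)   # mean of the raw samples

hourly_sum = aggregate(samples, to_hour, sum)
daily_sum_chained = aggregate(hourly_sum, to_day, sum)
daily_sum_direct = aggregate(samples, to_day, sum)
```

With these numbers the chained daily mean is 6.0 while the direct mean is 4.0; both daily sums are 16.0.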

get_data_to_report(station, variable, start_time, end_time)

Retrieves the data to report on.

Only active measurements are retrieved, and the station timezone is used.

Parameters:

  • station (str, required): Station of interest.
  • variable (str, required): Variable of interest.
  • start_time (datetime, required): Start time.
  • end_time (datetime, required): End time.

Returns:

DataFrame: A dataframe with the data to report on.

Source code in measurement/reporting.py
def get_data_to_report(
    station: str,
    variable: str,
    start_time: datetime,
    end_time: datetime,
) -> pd.DataFrame:
    """Retrieves the data to report on.

    Only active measurements are retrieved, and the station timezone is used.

    Args:
        station: Station of interest.
        variable: Variable of interest.
        start_time: Start time.
        end_time: End time.

    Returns:
        A dataframe with the data to report about.
    """

    return pd.DataFrame.from_records(
        Measurement.objects.filter(
            station__station_code=station,
            variable__variable_code=variable,
            time__date__range=(start_time.date(), end_time.date()),
            is_active=True,
        ).values()
    )

get_report_data_from_db(station, variable, start_time, end_time, report_type, whole_months=True) cached

Retrieves the report data from the database.

Time is set to the station timezone and the time range is inclusive of both start and end times.

Parameters:

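  • station (str, required): Station of interest.
  • variable (str, required): Variable of interest.
  • start_time (str, required): Start time, as a YYYY-MM-DD string.
  • end_time (str, required): End time, as a YYYY-MM-DD string.
  • report_type (str, required): Type of report to retrieve: "measurement" for the raw measurements, "validated" for validated and active measurements, or a report type such as hourly, daily or monthly.
  • whole_months (bool, default True): Whether to cover whole months or not.

Returns:

DataFrame: A dataframe with the report data.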

Source code in measurement/reporting.py
@lru_cache(1)
def get_report_data_from_db(
    station: str,
    variable: str,
    start_time: str,
    end_time: str,
    report_type: str,
    whole_months: bool = True,
) -> pd.DataFrame:
    """Retrieves the report data from the database.

    Time is set to the station timezone and the time range is inclusive of both
    start and end times.

    Args:
        station: Station of interest.
        variable: Variable of interest.
        start_time: Start time.
        end_time: End time.
        report_type: Type of report to retrieve.
        whole_months: Whether to cover whole months or not.

    Returns:
        A dataframe with the report data.
    """
    start_time_, end_time_ = reformat_dates(start_time, end_time, whole_months)

    if report_type == "measurement":
        data = pd.DataFrame.from_records(
            Measurement.objects.filter(
                station__station_code=station,
                variable__variable_code=variable,
                time__date__range=(start_time_.date(), end_time_.date()),
            ).values()
        )
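        raw_cols = [col for col in data.columns if col.startswith("raw_")]
        # Use removeprefix: str.strip would drop any leading or trailing
        # "r", "a", "w" or "_" characters, not just the "raw_" prefix.
        normal = [col.removeprefix("raw_") for col in raw_cols]
        data = data.drop(columns=normal).rename(columns=dict(zip(raw_cols, normal)))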

    elif report_type == "validated":
        data = pd.DataFrame.from_records(
            Measurement.objects.filter(
                station__station_code=station,
                variable__variable_code=variable,
                time__date__range=(start_time_.date(), end_time_.date()),
                is_validated=True,
                is_active=True,
            ).values()
        )
        raw_cols = [col for col in data.columns if col.startswith("raw_")]
        data = data.drop(columns=raw_cols)

    else:
        data = pd.DataFrame.from_records(
            Report.objects.filter(
                station__station_code=station,
                variable__variable_code=variable,
                time__date__range=(start_time_.date(), end_time_.date()),
                report_type=report_type,
            ).values()
        )

    data = data.rename(columns={"station_id": "station", "variable_id": "variable"})

    if data.empty:
        return data

    tz = timezone.get_current_timezone()
    data["time"] = data["time"].dt.tz_convert(tz)
    return data.sort_values("time")
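
The function is wrapped in lru_cache(1), so only the result for the most recent combination of arguments is kept. The sketch below shows the consequences with a hypothetical load function and made-up station codes: a repeated call returns the very same object without running the body, and a call with different arguments evicts the previous entry.

```python
from functools import lru_cache

calls = []  # records every time the function body actually runs


@lru_cache(1)
def load(station: str, variable: str) -> list:
    """Hypothetical loader standing in for a database query."""
    calls.append((station, variable))
    return [station, variable]


first = load("ST01", "airtemperature")
second = load("ST01", "airtemperature")  # served from the cache
load("ST02", "airtemperature")           # evicts the ST01 entry
load("ST01", "airtemperature")           # body runs again
```

Because a cache hit returns the same object, modifying the returned value in place also changes what later hits see, and changes in the underlying data are not picked up until the arguments change or load.cache_clear() is called.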

launch_reports_calculation(station, variable, start_time, end_time)

Launches the calculation of the reports.

Time is set to the station timezone and the time range is inclusive of both start and end times.

Parameters:

  • station (str, required): Station of interest.
  • variable (str, required): Variable of interest.
  • start_time (str, required): Start time, as a YYYY-MM-DD string.
  • end_time (str, required): End time, as a YYYY-MM-DD string.

Source code in measurement/reporting.py
def launch_reports_calculation(
    station: str,
    variable: str,
    start_time: str,
    end_time: str,
) -> None:
    """Launches the calculation of the reports.

    Time is set to the station timezone and the time range is inclusive of both
    start and end times.

    Args:
        station: Station of interest.
        variable: Variable of interest.
        start_time: Start time.
        end_time: End time.
    """
    operation = (
        "sum" if Variable.objects.get(variable_code=variable).is_cumulative else "mean"
    )

    start_time_, end_time_ = reformat_dates(start_time, end_time)
    data = get_data_to_report(station, variable, start_time_, end_time_)
    report = calculate_reports(data, station, variable, operation)
    remove_report_data_in_range(station, variable, start_time_, end_time_)
    save_report_data(report)

reformat_dates(start_time, end_time, whole_months=True)

Reformat dates so they have the right timezone and cover full days.

If whole_months is True, the start date is moved to the first day of its month and the end date to the last day of its month; otherwise the given days are used as they are. Times are set to 00:00:00 and 23:59:59, respectively, and the timezone is set to the station timezone.

Parameters:

  • start_time (str, required): Start time, as a YYYY-MM-DD string.
  • end_time (str, required): End time, as a YYYY-MM-DD string.
  • whole_months (bool, default True): Whether to cover whole months or not.

Returns:

tuple[datetime, datetime]: A tuple with the reformatted start and end datetimes.

Source code in measurement/reporting.py
def reformat_dates(
    start_time: str,
    end_time: str,
    whole_months: bool = True,
) -> tuple[datetime, datetime]:
    """Reformat dates so they have the right timezone and cover full days.

    If whole_months is True, the start date is moved to the first day of its month
    and the end date to the last day of its month; otherwise the given days are used
    as they are. Times are set to 00:00:00 and 23:59:59, respectively, and the
    timezone is set to the station timezone.

    Args:
        start_time: Start time, as a YYYY-MM-DD string.
        end_time: End time, as a YYYY-MM-DD string.
        whole_months: Whether to cover whole months or not.

    Returns:
        A tuple with the reformatted start and end datetimes.
    """
    tz = timezone.get_current_timezone()

    if whole_months:
        start_time_ = datetime.strptime(start_time, "%Y-%m-%d").replace(
            day=1, tzinfo=tz
        )
        end = (
            datetime.strptime(end_time, "%Y-%m-%d").replace(day=1)
            + pd.DateOffset(months=1)
            - pd.DateOffset(seconds=1)
        )
    else:
        start_time_ = datetime.strptime(start_time, "%Y-%m-%d").replace(tzinfo=tz)
        end = (
            datetime.strptime(end_time, "%Y-%m-%d")
            + pd.DateOffset(days=1)
            - pd.DateOffset(seconds=1)
        )
    # Attach the timezone to the wall-clock end time directly, as done for the
    # start time, so that it stays at 23:59:59 in that timezone.
    end_time_ = pd.Timestamp(end).to_pydatetime().replace(tzinfo=tz)
    return start_time_, end_time_
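
The whole-month bounds can also be computed with the standard library alone. This is a minimal sketch under stated assumptions: month_bounds is a hypothetical helper, and a fixed UTC-5 offset is used purely for illustration in place of the timezone the real function obtains from Django.

```python
import calendar
from datetime import datetime, timedelta, timezone


def month_bounds(start: str, end: str, tz: timezone) -> tuple[datetime, datetime]:
    """Expand YYYY-MM-DD strings to cover whole months in the given timezone."""
    first = datetime.strptime(start, "%Y-%m-%d").replace(day=1, tzinfo=tz)
    last = datetime.strptime(end, "%Y-%m-%d")
    # monthrange returns (weekday of the first day, number of days in the month)
    last_day = calendar.monthrange(last.year, last.month)[1]
    last = last.replace(day=last_day, hour=23, minute=59, second=59, tzinfo=tz)
    return first, last


tz = timezone(timedelta(hours=-5))  # fixed offset, for illustration only
start, end = month_bounds("2023-01-15", "2024-02-10", tz)
```

Here start is midnight on 1 January 2023 and end is 23:59:59 on 29 February 2024, both at UTC-5.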

remove_report_data_in_range(station, variable, start_time, end_time)

Removes the report data in the range from the database.

The station timezone is used.

Parameters:

  • station (str, required): Station of interest.
  • variable (str, required): Variable of interest.
  • start_time (datetime, required): Start time.
  • end_time (datetime, required): End time.

Source code in measurement/reporting.py
def remove_report_data_in_range(
    station: str,
    variable: str,
    start_time: datetime,
    end_time: datetime,
) -> None:
    """Removes the report data in the range from the database.

    The station timezone is used.

    Args:
        station: Station of interest.
        variable: Variable of interest.
        start_time: Start time.
        end_time: End time.
    """

    Report.objects.filter(
        station__station_code=station,
        variable__variable_code=variable,
        time__date__range=(start_time.date(), end_time.date()),
    ).delete()

save_report_data(data)

Saves the report data into the database.

Before saving, the function removes the maximum and minimum columns if they contain only NaN values, and removes rows with NaN in the value column.

Parameters:

  • data (DataFrame, required): The dataframe with the report data.

Source code in measurement/reporting.py
def save_report_data(data: pd.DataFrame) -> None:
    """Saves the report data into the database.

    Before saving, the function removes maximum and minimum columns if they have all NaN
    and removes rows with NaN in the value column.

    Args:
        data: The dataframe with the report data.
    """
    data_ = data.dropna(axis=1, how="all").dropna(axis=0, subset=["value"])
    data_import_avail = "data_import_id" in data_.columns
    Report.objects.bulk_create(
        [
            Report(
                data_import=DataImport.objects.get(pk=row["data_import_id"])
                if data_import_avail and not pd.isna(row["data_import_id"])
                else None,
                station=Station.objects.get(station_code=row["station"]),
                variable=Variable.objects.get(variable_code=row["variable"]),
                time=time,
                value=row["value"],
                maximum=row.get("maximum", None),
                minimum=row.get("minimum", None),
                report_type=row["report_type"],
            )
            for time, row in data_.iterrows()
        ]
    )
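
The cleaning step that precedes the save can be pictured on plain dictionaries. The sketch below is a pandas-free approximation with made-up rows; clean_rows and is_nan are hypothetical helpers, and only float NaN is treated as missing, whereas pandas also treats None as missing.

```python
import math


def is_nan(x) -> bool:
    """True only for float NaN; strings and other values are never missing."""
    return isinstance(x, float) and math.isnan(x)


def clean_rows(rows: list[dict]) -> list[dict]:
    """Drop columns that are NaN in every row, then rows whose value is NaN."""
    if not rows:
        return []
    empty_cols = {c for c in rows[0] if all(is_nan(r[c]) for r in rows)}
    return [
        {c: v for c, v in r.items() if c not in empty_cols}
        for r in rows
        if not is_nan(r["value"])
    ]


nan = float("nan")
rows = [
    {"value": 1.0, "maximum": nan, "minimum": nan, "report_type": "hourly"},
    {"value": nan, "maximum": nan, "minimum": nan, "report_type": "hourly"},
]
cleaned = clean_rows(rows)
```

Only the first row survives, and it no longer carries the maximum and minimum columns.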