Skip to content

create_synthetic_data_scenario1

utilities.benchmarking.create_synthetic_data_scenario1 ¤

Scenario for creating synthetic data for benchmarking purposes.

This scenario creates synthetic data for a single station and a set of variables for a range of years. It results in a database structure where the number of records is spread evenly across the years and variables. As the default chunk time interval for the TimescaleDB is 1 day, this scenario results in many chunks (>8000) with just a few records each (~3000).

Attributes¤

end = datetime(year + 1, 1, 1, tzinfo=tz) module-attribute ¤

execution = [] module-attribute ¤

maximum: int = random.randint(20, 30) module-attribute ¤

minimum: int = random.randint(-5, 5) module-attribute ¤

nrecords = 0 module-attribute ¤

progress = tqdm(itertools.product(years, variables), total=len(years) * len(variables), desc='Creating synthetic data') module-attribute ¤

records = [Measurement(station=station, variable=variable, time=t, value=Decimal(random.randint(minimum, maximum)), minimum=Decimal(minimum), maximum=Decimal(maximum)) for t in pd.date_range(start, end, freq='5min', inclusive='left')] module-attribute ¤

start = datetime(year, 1, 1, tzinfo=tz) module-attribute ¤

station = Station.objects.first() module-attribute ¤

tend = time.time() module-attribute ¤

tstart = time.time() module-attribute ¤

tz = zoneinfo.ZoneInfo(station.timezone) module-attribute ¤

variables = list(Variable.objects.all())[:10] module-attribute ¤

years = list(range(2000, 2023)) module-attribute ¤

Classes¤

Measurement ¤

Bases: MeasurementBase

Class to store the measurements and their validation status.

This class holds the value of a given variable and station at a specific time, as well as auxiliary information such as maximum and minimum values, depth and direction, for vector quantities. All of these have a raw version where a backup of the original data is kept, should this change at any point.

Flags to monitor its validation status, if the data is active (and therefore can be used for reporting) and if it has actually been used for that is also included.

Attributes:

Name Type Description
depth int

Depth of the measurement.

direction Decimal

Direction of the measurement, useful for vector quantities.

raw_value Decimal

Original value of the measurement.

raw_maximum Decimal

Original maximum value of the measurement.

raw_minimum Decimal

Original minimum value of the measurement.

raw_direction Decimal

Original direction of the measurement.

raw_depth int

Original depth of the measurement.

is_validated bool

Flag to indicate if the measurement has been validated.

is_active bool

Flag to indicate if the measurement is active. An inactive measurement is not used for reporting

Attributes¤
overwritten: bool property ¤

Indicates if any of the values associated to the entry have been overwritten.

Returns:

Name Type Description
bool bool

True if any raw field is different to the corresponding standard field.

raws: tuple[str, ...] property ¤

Return the raw fields of the measurement.

Returns:

Type Description
tuple[str, ...]

tuple[str]: Tuple with the names of the raw fields of the measurement.

Functions¤
clean() ¤

Check consistency of validation, reporting and backs-up values.

Source code in measurement/models.py
259
260
261
262
263
264
265
266
267
268
269
def clean(self) -> None:
    """Check consistency of validation, reporting and backs-up values."""
    # Check consistency of validation
    if not self.is_validated and not self.is_active:
        raise ValidationError("Only validated entries can be declared as inactive.")

    # Backup values to raws, if needed
    for r in self.raws:
        value = getattr(self, r.removeprefix("raw_"))
        if value and not getattr(self, r):
            setattr(self, r, value)

Station ¤

Bases: PermissionsBase

Main representation of a station, including several metadata.

Attributes:

Name Type Description
visibility str

Visibility level of the object, including an "internal" option.

station_id int

Primary key.

station_code str

Unique code for the station.

station_name str

Brief description of the station.

station_type StationType

Type of the station.

country Country

Country where the station is located.

region Region

Region within the Country where the station is located.

ecosystem Ecosystem

Ecosystem associated with the station.

institution Institution

Institutional partner responsible for the station.

place_basin PlaceBasin

Place-Basin association.

station_state bool

Is the station operational?

timezone str

Timezone of the station.

station_latitude Decimal

Latitude of the station, in degrees [-90 to 90].

station_longitude Decimal

Longitude of the station, in degrees [-180 to 180].

station_altitude int

Altitude of the station.

influence_km Decimal

Area of influence in km2.

station_file ImageField

Photography of the station.

station_external bool

Is the station external?

variables str

Comma-separated list of variables measured by the station.

Attributes¤
variables_list: list[str] property ¤

Return the list of variables measured by the station.

Only variables with data in the database are returned.

Returns:

Type Description
list[str]

list[str]: List of variables measured by the station.

Functions¤
__str__() ¤

Return the station code.

Source code in station/models.py
458
459
460
def __str__(self) -> str:
    """Return the station code."""
    return str(self.station_code)
get_absolute_url() ¤

Return the absolute url of the station.

Source code in station/models.py
462
463
464
def get_absolute_url(self) -> str:
    """Return the absolute url of the station."""
    return reverse("station:station_detail", kwargs={"pk": self.pk})
set_object_permissions() ¤

Set object-level permissions.

This method is called by the save method of the model to set the object-level permissions based on the visibility level of the object. In addition to the standard permissions for the station, the view_measurements permission is set which controls who can view the measurements associated to the station.

Source code in station/models.py
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
def set_object_permissions(self) -> None:
    """Set object-level permissions.

    This method is called by the save method of the model to set the object-level
    permissions based on the visibility level of the object. In addition to the
    standard permissions for the station, the view_measurements permission is set
    which controls who can view the measurements associated to the station.
    """
    super().set_object_permissions()

    standard_group = Group.objects.get(name="Standard")
    anonymous_user = get_anonymous_user()

    # Assign view_measurements permission based on permissions level
    if self.visibility == "public":
        assign_perm("view_measurements", standard_group, self)
        assign_perm("view_measurements", anonymous_user, self)
        if self.owner:
            remove_perm("view_measurements", self.owner, self)
    elif self.visibility == "internal":
        assign_perm("view_measurements", standard_group, self)
        remove_perm("view_measurements", anonymous_user, self)
        if self.owner:
            remove_perm("view_measurements", self.owner, self)
    elif self.visibility == "private":
        remove_perm("view_measurements", standard_group, self)
        remove_perm("view_measurements", anonymous_user, self)
        if self.owner:
            assign_perm("view_measurements", self.owner, self)

Variable ¤

Bases: PermissionsBase

A variable with a physical meaning.

Such as precipitation, wind speed, wind direction, soil moisture, including the associated unit. It also includes metadata to help identify what is a reasonable value for the data, to flag outliers and to help with the validation process.

The nature of the variable can be one of the following:

  • sum: Cumulative value over a period of time.
  • average: Average value over a period of time.
  • value: One-off value.

Attributes:

Name Type Description
variable_id AutoField

Primary key.

variable_code CharField

Code of the variable, eg. airtemperature.

name CharField

Human-readable name of the variable, eg. Air temperature.

unit ForeignKey

Unit of the variable.

maximum DecimalField

Maximum value allowed for the variable.

minimum DecimalField

Minimum value allowed for the variable.

diff_error DecimalField

If two sequential values in the time-series data of this variable differ by more than this value, the validation process can mark this with an error flag.

outlier_limit DecimalField

The statistical deviation for defining outliers, in times the standard deviation (sigma).

null_limit DecimalField

The max % of null values (missing, caused by e.g. equipment malfunction) allowed for hourly, daily, monthly data. Cumulative values are not deemed trustworthy if the number of missing values in a given period is greater than the null_limit.

nature CharField

Nature of the variable, eg. if it represents a one-off value, the average over a period of time or the cumulative value over a period

Attributes¤
is_cumulative: bool property ¤

Return True if the nature of the variable is sum.

Functions¤
__str__() ¤

Return the string representation of the object.

Source code in variable/models.py
165
166
167
def __str__(self) -> str:
    """Return the string representation of the object."""
    return str(self.name)
clean() ¤

Validate the model fields.

Source code in variable/models.py
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
def clean(self) -> None:
    """Validate the model fields."""
    if self.maximum < self.minimum:
        raise ValidationError(
            {
                "maximum": "The maximum value must be greater than the minimum "
                "value."
            }
        )
    if not self.variable_code.isidentifier():
        raise ValidationError(
            {
                "variable_code": "The variable code must be a valid Python "
                "identifier. Only letters, numbers and underscores are allowed, and"
                " it cannot start with a number."
            }
        )
    return super().clean()
get_absolute_url() ¤

Get the absolute URL of the object.

Source code in variable/models.py
169
170
171
def get_absolute_url(self) -> str:
    """Get the absolute URL of the object."""
    return reverse("variable:variable_detail", kwargs={"pk": self.pk})