create_synthetic_data_scenario1
utilities.benchmarking.create_synthetic_data_scenario1
Scenario for creating synthetic data for benchmarking purposes.
This scenario creates synthetic data for a single station and a set of variables over a range of years (5-minute data for up to 10 variables, 2000 to 2022). The resulting database has its records spread evenly across the years and variables. Because the chunk time interval used for the TimescaleDB hypertable is 1 day, this scenario produces many chunks (>8000), each holding relatively few records (~3000).
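The figures above can be checked with a little arithmetic. The following sketch recomputes them in plain Python from the module attributes listed below (years 2000 to 2022, at most 10 variables, 5-minute spacing), assuming exactly 10 variables, a 1-day chunk interval and UTC timestamps for simplicity. The names `YEARS`, `N_VARIABLES`, `STEP`, `CHUNK` and `scenario_size` are illustrative and not part of the module.

```python
from datetime import datetime, timedelta, timezone

# Assumed parameters, mirroring the module attributes of this scenario.
YEARS = range(2000, 2023)      # years = list(range(2000, 2023)) -> 2000..2022
N_VARIABLES = 10               # variables = list(Variable.objects.all())[:10]
STEP = timedelta(minutes=5)    # freq='5min' in pd.date_range
CHUNK = timedelta(days=1)      # assumed TimescaleDB chunk time interval


def scenario_size():
    """Return (number of chunks, total records, records per chunk)."""
    # UTC is used for simplicity; the script itself uses the station's timezone.
    start = datetime(YEARS[0], 1, 1, tzinfo=timezone.utc)
    end = datetime(YEARS[-1] + 1, 1, 1, tzinfo=timezone.utc)
    span = end - start
    n_chunks = span // CHUNK        # timedelta // timedelta -> int
    per_variable = span // STEP     # inclusive='left': `end` itself is excluded
    total = per_variable * N_VARIABLES
    return n_chunks, total, total // n_chunks


if __name__ == "__main__":
    chunks, total, per_chunk = scenario_size()
    print(chunks, total, per_chunk)  # 8401 24194880 2880
```

This gives 8401 daily chunks of 2880 records each, about 24.2 million records in total, in line with the >8000 chunks and ~3000 records per chunk quoted above. The actual chunk count may differ slightly, since the script generates timestamps in the station's local timezone and chunk boundaries need not coincide with local midnight.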
Attributes (module-level)
end = datetime(year + 1, 1, 1, tzinfo=tz)
execution = []
maximum: int = random.randint(20, 30)
minimum: int = random.randint(-5, 5)
nrecords = 0
progress = tqdm(itertools.product(years, variables), total=len(years) * len(variables), desc='Creating synthetic data')
records = [Measurement(station=station, variable=variable, time=t, value=Decimal(random.randint(minimum, maximum)), minimum=Decimal(minimum), maximum=Decimal(maximum)) for t in pd.date_range(start, end, freq='5min', inclusive='left')]
start = datetime(year, 1, 1, tzinfo=tz)
station = Station.objects.first()
tend = time.time()
tstart = time.time()
tz = zoneinfo.ZoneInfo(station.timezone)
variables = list(Variable.objects.all())[:10]
years = list(range(2000, 2023))
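Several of the attributes above appear to be reassigned inside a loop: `start`, `end` and `records` depend on a loop variable `year`, and `progress` iterates over every (year, variable) pair. The sketch below reproduces only the record-building step, without Django, pandas or tqdm. `FakeMeasurement`, `build_records` and `build_scenario` are hypothetical names; whether the script draws `minimum` and `maximum` once or for every pair is not visible here, so drawing them per pair is an assumption. Timestamps are stepped in UTC so that consecutive records are exactly 5 minutes apart in absolute time, which mirrors how a fixed frequency such as `freq='5min'` spaces timestamps in `pd.date_range`. How the records are then saved, and what `execution`, `nrecords`, `tstart` and `tend` hold, is not shown.

```python
import itertools
import random
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo
from decimal import Decimal


@dataclass
class FakeMeasurement:
    """Hypothetical stand-in for the Django `Measurement` model."""
    station: str
    variable: str
    time: datetime
    value: Decimal
    minimum: Decimal
    maximum: Decimal


def build_records(station, variable, year, tz: tzinfo, step=timedelta(minutes=5)):
    """Build one year of synthetic records, mirroring the `records` comprehension."""
    # Assumption: minimum and maximum are drawn per (year, variable) pair.
    minimum = random.randint(-5, 5)
    maximum = random.randint(20, 30)
    # Step in UTC so every record is exactly `step` apart in absolute time.
    t = datetime(year, 1, 1, tzinfo=tz).astimezone(timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=tz).astimezone(timezone.utc)
    records = []
    while t < end:  # like inclusive='left': `end` itself is excluded
        records.append(FakeMeasurement(
            station=station,
            variable=variable,
            time=t.astimezone(tz),  # store in the station's timezone
            value=Decimal(random.randint(minimum, maximum)),
            minimum=Decimal(minimum),
            maximum=Decimal(maximum),
        ))
        t += step
    return records


def build_scenario(station, variables, years, tz):
    """Yield one batch of records per (year, variable) pair, as `progress` iterates."""
    for year, variable in itertools.product(years, variables):
        yield year, variable, build_records(station, variable, year, tz)


if __name__ == "__main__":
    recs = build_records("ST01", "airtemperature", 2001, timezone.utc)
    print(len(recs))  # 105120 = 365 days * 288 five-minute steps
```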
¤
Classes
Measurement
Bases: MeasurementBase
Class to store the measurements and their validation status.
This class holds the value of a given variable and station at a specific time, as well as auxiliary information such as the maximum and minimum values, the depth and, for vector quantities, the direction. Each of these has a raw counterpart that keeps a backup of the original data, in case the value is changed at any point.
Flags are also included to track the validation status, whether the data is active (and can therefore be used for reporting) and whether it has actually been used for reporting.
Attributes:
Name | Type | Description |
---|---|---|
depth | int | Depth of the measurement. |
direction | Decimal | Direction of the measurement, useful for vector quantities. |
raw_value | Decimal | Original value of the measurement. |
raw_maximum | Decimal | Original maximum value of the measurement. |
raw_minimum | Decimal | Original minimum value of the measurement. |
raw_direction | Decimal | Original direction of the measurement. |
raw_depth | int | Original depth of the measurement. |
is_validated | bool | Flag to indicate if the measurement has been validated. |
is_active | bool | Flag to indicate if the measurement is active. An inactive measurement is not used for reporting. |
Attributes
overwritten: bool (property)
Indicates whether any of the values associated with the entry have been overwritten.
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if any raw field differs from the corresponding standard field. |
raws: tuple[str, ...] (property)
Return the raw fields of the measurement.
Returns:
Type | Description |
---|---|
tuple[str, ...] | Names of the raw fields of the measurement. |
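The two properties can be illustrated with a plain dataclass. This is a minimal sketch, assuming that every raw field is named after the standard field it backs up with a `raw_` prefix (as for `raw_value`, `raw_maximum`, `raw_minimum`, `raw_direction` and `raw_depth`) and that `value`, `maximum` and `minimum` are available on the model (presumably from `MeasurementBase`). `MeasurementSketch` is hypothetical, and the real properties may be implemented differently.

```python
from dataclasses import dataclass, fields
from decimal import Decimal
from typing import Optional


@dataclass
class MeasurementSketch:
    """Hypothetical stand-in for `Measurement`: standard fields plus raw backups."""
    value: Optional[Decimal] = None
    maximum: Optional[Decimal] = None
    minimum: Optional[Decimal] = None
    direction: Optional[Decimal] = None
    depth: Optional[int] = None
    raw_value: Optional[Decimal] = None
    raw_maximum: Optional[Decimal] = None
    raw_minimum: Optional[Decimal] = None
    raw_direction: Optional[Decimal] = None
    raw_depth: Optional[int] = None

    @property
    def raws(self) -> tuple[str, ...]:
        """Names of the raw fields, found via the assumed `raw_` prefix."""
        return tuple(f.name for f in fields(self) if f.name.startswith("raw_"))

    @property
    def overwritten(self) -> bool:
        """True if any raw field differs from its standard counterpart."""
        return any(
            getattr(self, raw) != getattr(self, raw.removeprefix("raw_"))
            for raw in self.raws
        )


if __name__ == "__main__":
    m = MeasurementSketch(value=Decimal("1.5"), raw_value=Decimal("1.5"))
    print(m.overwritten)  # False
    m.value = Decimal("2.0")
    print(m.overwritten)  # True
```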
Functions
clean()
Check the consistency of the validation, reporting and backup values.
Source code in measurement\models.py (lines 259-269)
Station
Bases: PermissionsBase
Main representation of a station, including several metadata.
Attributes:
Name | Type | Description |
---|---|---|
visibility | str | Visibility level of the object, including an "internal" option. |
station_id | int | Primary key. |
station_code | str | Unique code for the station. |
station_name | str | Brief description of the station. |
station_type | StationType | Type of the station. |
country | Country | Country where the station is located. |
region | Region | Region within the country where the station is located. |
ecosystem | Ecosystem | Ecosystem associated with the station. |
institution | Institution | Institutional partner responsible for the station. |
place_basin | PlaceBasin | Place-basin association. |
station_state | bool | Whether the station is operational. |
timezone | str | Timezone of the station. |
station_latitude | Decimal | Latitude of the station, in degrees [-90 to 90]. |
station_longitude | Decimal | Longitude of the station, in degrees [-180 to 180]. |
station_altitude | int | Altitude of the station. |
influence_km | Decimal | Area of influence, in km². |
station_file | ImageField | Photograph of the station. |
station_external | bool | Whether the station is external. |
variables | str | Comma-separated list of variables measured by the station. |
Attributes
variables_list: list[str] (property)
Return the list of variables measured by the station.
Only variables with data in the database are returned.
Returns:
Type | Description |
---|---|
list[str] | List of variables measured by the station. |
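Since `variables` is stored as a comma-separated string, `variables_list` can be pictured as splitting that string and filtering the result. In the sketch below, the database lookup that keeps only variables with data is replaced by a hypothetical set `codes_with_data`; the function is an illustration under that assumption, not the property's actual code.

```python
def variables_list(variables: str, codes_with_data: set[str]) -> list[str]:
    """Split a comma-separated `variables` string, keeping only codes with data.

    `codes_with_data` is a hypothetical stand-in for the database query.
    """
    # Strip whitespace around each code and drop empty entries such as "a,,b".
    codes = [code.strip() for code in variables.split(",") if code.strip()]
    # Preserve the original order while filtering.
    return [code for code in codes if code in codes_with_data]


if __name__ == "__main__":
    print(variables_list("airtemperature, precipitation,,rh", {"airtemperature", "rh"}))
```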
Functions
__str__()
Return the station code.
Source code in station\models.py (lines 458-460)
get_absolute_url()
Return the absolute URL of the station.
Source code in station\models.py (lines 462-464)
set_object_permissions()
Set object-level permissions.
This method is called by the model's save method to set the object-level permissions based on the visibility level of the object. In addition to the standard permissions for the station, it sets the view_measurements permission, which controls who can view the measurements associated with the station.
Source code in station\models.py (lines 466-494)
Variable
Bases: PermissionsBase
A variable with a physical meaning, such as precipitation, wind speed, wind direction or soil moisture, including its associated unit.
It also includes metadata that helps identify what a reasonable value for the data is, flag outliers and support the validation process.
The nature of the variable can be one of the following:
- sum: Cumulative value over a period of time.
- average: Average value over a period of time.
- value: One-off value.
Attributes:
Name | Type | Description |
---|---|---|
variable_id | AutoField | Primary key. |
variable_code | CharField | Code of the variable, e.g. airtemperature. |
name | CharField | Human-readable name of the variable, e.g. Air temperature. |
unit | ForeignKey | Unit of the variable. |
maximum | DecimalField | Maximum value allowed for the variable. |
minimum | DecimalField | Minimum value allowed for the variable. |
diff_error | DecimalField | If two sequential values in the time series of this variable differ by more than this value, the validation process can mark them with an error flag. |
outlier_limit | DecimalField | Statistical deviation used to define outliers, in multiples of the standard deviation (sigma). |
null_limit | DecimalField | Maximum percentage of null values (missing, e.g. because of equipment malfunction) allowed for hourly, daily and monthly data. Cumulative values are not deemed trustworthy if the percentage of missing values in a given period exceeds null_limit. |
nature | CharField | Nature of the variable, e.g. whether it represents a one-off value, the average over a period of time or the cumulative value over a period. |
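The limits in the table above map naturally onto simple checks. The following sketch gives one possible reading of `minimum`/`maximum`, `diff_error`, `outlier_limit` and `null_limit`, assuming a series of floats with `None` for missing values. All function names are hypothetical, and details such as strict versus non-strict comparisons, population versus sample standard deviation, and how gaps are handled are assumptions rather than the project's documented validation rules.

```python
import statistics
from typing import Optional, Sequence

Series = Sequence[Optional[float]]  # None marks a missing value


def out_of_range(values: Series, minimum: float, maximum: float) -> list[int]:
    """Indices of present values outside [minimum, maximum]."""
    return [i for i, v in enumerate(values)
            if v is not None and not minimum <= v <= maximum]


def step_errors(values: Series, diff_error: float) -> list[int]:
    """Indices i where |values[i] - values[i-1]| > diff_error (both present)."""
    return [i for i in range(1, len(values))
            if values[i] is not None and values[i - 1] is not None
            and abs(values[i] - values[i - 1]) > diff_error]


def outliers(values: Series, outlier_limit: float) -> list[int]:
    """Indices of values more than outlier_limit sigmas away from the mean."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return []
    mean = statistics.mean(present)
    sigma = statistics.pstdev(present)  # assumption: population std. deviation
    return [i for i, v in enumerate(values)
            if v is not None and abs(v - mean) > outlier_limit * sigma]


def too_many_nulls(values: Series, null_limit: float) -> bool:
    """True if the percentage of missing values exceeds null_limit."""
    if not values:
        return True  # assumption: a period with no slots at all is untrustworthy
    missing = sum(v is None for v in values)
    return 100 * missing / len(values) > null_limit
```

For example, with `values = [10.0, 11.0, None, 30.0, 12.0, 11.5]`, `out_of_range(values, 0, 25)` and `outliers(values, 1.5)` both flag index 3, `step_errors(values, 5)` flags index 4 (the jump into 30 is not checked because it follows a gap), and `too_many_nulls(values, 10)` is `True` because 1 of 6 values (about 17%) is missing.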
Attributes
is_cumulative: bool (property)
Return True if the nature of the variable is sum.
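A minimal sketch of the three natures and of `is_cumulative`, assuming the stored codes are the strings `"sum"`, `"average"` and `"value"` listed above. `Nature`, `VariableSketch` and `aggregate` are hypothetical names. `aggregate` only illustrates why the distinction matters when combining values over a longer period (cumulative values are added, averages are averaged); it is not the project's aggregation code, and one-off values are left out because the document does not say how they are combined.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean


class Nature(str, Enum):
    """Hypothetical enumeration of the `nature` codes."""
    SUM = "sum"          # cumulative value over a period of time
    AVERAGE = "average"  # average value over a period of time
    VALUE = "value"      # one-off value


@dataclass
class VariableSketch:
    """Hypothetical stand-in for `Variable`, keeping only what is needed here."""
    variable_code: str
    nature: Nature

    @property
    def is_cumulative(self) -> bool:
        # True only for variables whose nature is "sum".
        return self.nature == Nature.SUM


def aggregate(variable: VariableSketch, values: list[float]) -> float:
    """Combine the values of one longer period according to the variable's nature."""
    if variable.is_cumulative:
        return sum(values)
    if variable.nature == Nature.AVERAGE:
        return mean(values)
    raise ValueError("aggregation of one-off values is not covered by this sketch")


if __name__ == "__main__":
    rain = VariableSketch("precipitation", Nature.SUM)
    print(rain.is_cumulative, aggregate(rain, [0.5, 1.0, 0.0]))  # True 1.5
```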
Functions
__str__()
Return the string representation of the object.
Source code in variable\models.py (lines 165-167)
clean()
Validate the model fields.
Source code in variable\models.py (lines 173-190)
get_absolute_url()
Get the absolute URL of the object.
Source code in variable\models.py (lines 169-171)