functions
importing.functions
¤
Attributes¤
one_second = np.timedelta64(1, 's')
module-attribute
¤
unix_epoch = np.datetime64(0, 's')
module-attribute
¤
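The two constants above are presumably combined to convert NumPy timestamps into seconds since the Unix epoch. A minimal sketch of that idiom, assuming only the two definitions shown (the `to_unix_seconds` helper is hypothetical and not part of the module):

```python
import numpy as np

one_second = np.timedelta64(1, "s")
unix_epoch = np.datetime64(0, "s")


def to_unix_seconds(timestamp: np.datetime64) -> float:
    """Hypothetical helper: seconds elapsed since 1970-01-01T00:00:00."""
    # Subtracting two datetime64 values gives a timedelta64; dividing it by
    # a one-second timedelta64 gives a plain number of seconds.
    return float((timestamp - unix_epoch) / one_second)


seconds = to_unix_seconds(np.datetime64("1970-01-02T00:00:00"))
```

Here `seconds` is the number of seconds in one day, since the timestamp is exactly one day after the epoch.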
Classes¤
Classification
¤
Bases: PermissionsBase
Contains instructions on how to classify the data into a specific variable.
In particular, it links a format to a variable, and provides the column indices for the value, maximum, and minimum columns, as well as the validator columns. It also contains information on whether the data is accumulated, incremental, and the resolution of the data.
Attributes:
Name | Type | Description |
---|---|---|
cls_id | AutoField | Primary key. |
format | ForeignKey | The format of the data file. |
variable | ForeignKey | The variable to which the data belongs. |
value | PositiveSmallIntegerField | Index of the value column, starting at 0. |
maximum | PositiveSmallIntegerField | Index of the maximum value column, starting at 0. |
minimum | PositiveSmallIntegerField | Index of the minimum value column, starting at 0. |
value_validator_column | PositiveSmallIntegerField | Index of the value validator column, starting at 0. |
value_validator_text | CharField | Value validator text. |
maximum_validator_column | PositiveSmallIntegerField | Index of the maximum value validator column, starting at 0. |
maximum_validator_text | CharField | Maximum value validator text. |
minimum_validator_column | PositiveSmallIntegerField | Index of the minimum value validator column, starting at 0. |
minimum_validator_text | CharField | Minimum value validator text. |
accumulate | PositiveSmallIntegerField | If set to a number of minutes, the data will be accumulated over that period. |
resolution | DecimalField | Resolution of the data. Only used if it is to be accumulated. |
incremental | BooleanField | Whether the data is an incremental counter. If it is, any value below the previous one will be removed. |
decimal_comma | BooleanField | Whether the data uses a comma as a decimal separator. |
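The effect of the incremental flag can be illustrated with a short sketch. It assumes that each reading is compared against the last reading that was kept; the real code may instead compare against the immediately preceding raw reading. The `drop_counter_resets` helper is hypothetical.

```python
def drop_counter_resets(values: list[float]) -> list[float]:
    """Hypothetical helper: drop readings that fall below the last kept one."""
    kept: list[float] = []
    for value in values:
        # Assumption: compare against the last value that was kept, so one
        # low reading does not cause later valid readings to be dropped.
        if not kept or value >= kept[-1]:
            kept.append(value)
    return kept


cleaned = drop_counter_resets([1.0, 2.0, 1.5, 3.0, 2.9, 4.0])
```

With this input, the readings 1.5 and 2.9 are removed because each is lower than the reading kept before it.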
Functions¤
__str__()
¤
Return the string representation of the object.
Source code in formatting/models.py
clean()
¤
Validate the model instance.
It checks that the column indices are different, and that the accumulation period is greater than zero if it is set. It also checks that the resolution is set if the data is accumulated.
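The three checks described above can be sketched in plain Python. This is an illustration under stated assumptions, not the model's code: `ClassificationSketch` and `is_valid` are hypothetical, only three of the column fields are included, and `ValueError` stands in for Django's `ValidationError`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassificationSketch:
    """Hypothetical plain-Python stand-in for the Django model."""

    value: Optional[int] = None
    maximum: Optional[int] = None
    minimum: Optional[int] = None
    accumulate: Optional[int] = None
    resolution: Optional[float] = None

    def clean(self) -> None:
        # Column indices that are set must not point at the same column.
        columns = [
            c for c in (self.value, self.maximum, self.minimum) if c is not None
        ]
        if len(columns) != len(set(columns)):
            raise ValueError("Column indices must be different.")
        if self.accumulate is not None:
            # The accumulation period, in minutes, must be greater than zero.
            if self.accumulate <= 0:
                raise ValueError("The accumulation period must be greater than zero.")
            # The resolution is required whenever the data is accumulated.
            if self.resolution is None:
                raise ValueError("The resolution is required for accumulated data.")


def is_valid(classification: ClassificationSketch) -> bool:
    """Hypothetical helper: True if clean() raises no error."""
    try:
        classification.clean()
    except ValueError:
        return False
    return True
```

For instance, `is_valid(ClassificationSketch(value=2, maximum=2))` is false because two columns share an index.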
Source code in formatting/models.py
get_absolute_url()
¤
Get the absolute URL of the object.
Source code in formatting/models.py
DataImport
¤
Bases: PermissionsBase
Model to store the data imports.
This model stores the data imports, which are files with data that are uploaded to the system. The data is then processed asynchronously and stored in the database.
Attributes:
Name | Type | Description |
---|---|---|
station | ForeignKey | Station to which the data belongs. |
format | ForeignKey | Format of the data. |
rawfile | FileField | File with the data to be imported. |
date | DateTimeField | Date of submission of the data. |
start_date | DateTimeField | Start date of the data. |
end_date | DateTimeField | End date of the data. |
records | IntegerField | Number of records in the data. |
observations | TextField | Notes or observations about the data. |
status | TextField | Status of the import. |
log | TextField | Log of the data ingestion, indicating any errors. |
reprocess | BooleanField | If checked, the data will be reprocessed. |
Functions¤
clean()
¤
Validate the information and upload the measurement data.
Source code in importing/models.py
Format
¤
Bases: PermissionsBase
Details of the data file format, describing how to read the file.
It combines several properties, such as the file extension, the delimiter, the date and time formats, and the column indices for the date and time columns, instructing how to read the data file and parse the dates. It is mostly used to ingest data from text files, like CSV.
Attributes:
Name | Type | Description |
---|---|---|
format_id | AutoField | Primary key. |
name | CharField | Short name of the format entry. |
description | TextField | Description of the format. |
extension | ForeignKey | The extension of the data file. |
delimiter | ForeignKey | The delimiter between columns in the data file. Only required for text files. |
first_row | PositiveSmallIntegerField | Index of the first row with data, starting at 0. |
footer_rows | PositiveSmallIntegerField | Number of footer rows to be ignored at the end. |
date | ForeignKey | Format for the date column. Only required for text files. |
date_column | PositiveSmallIntegerField | Index of the date column, starting at 0. |
time | ForeignKey | Format for the time column. Only required for text files. |
time_column | PositiveSmallIntegerField | Index of the time column, starting at 0. |
Attributes¤
datetime_format: str
property
¤
Obtain the datetime format string.
Functions¤
__str__()
¤
Return the string representation of the object.
Source code in formatting/models.py
datetime_columns(delimiter)
¤
Column indices that correspond to the date and time columns in the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
delimiter | str | The delimiter used to split the date and time codes. | required |
Returns:
Type | Description |
---|---|
list[int] | A list of column indices. |
Source code in formatting/models.py
get_absolute_url()
¤
Get the absolute URL of the object.
Source code in formatting/models.py
Measurement
¤
Bases: MeasurementBase
Class to store the measurements and their validation status.
This class holds the value of a given variable and station at a specific time, as well as auxiliary information such as maximum and minimum values, depth, and direction (for vector quantities). Each of these has a raw version that keeps a backup of the original data, should the value change at any point.
It also includes flags to monitor the validation status, whether the data is active (and can therefore be used for reporting) and whether it has actually been used for reporting.
Attributes:
Name | Type | Description |
---|---|---|
depth | int | Depth of the measurement. |
direction | Decimal | Direction of the measurement, useful for vector quantities. |
raw_value | Decimal | Original value of the measurement. |
raw_maximum | Decimal | Original maximum value of the measurement. |
raw_minimum | Decimal | Original minimum value of the measurement. |
raw_direction | Decimal | Original direction of the measurement. |
raw_depth | int | Original depth of the measurement. |
is_validated | bool | Flag to indicate if the measurement has been validated. |
is_active | bool | Flag to indicate if the measurement is active. An inactive measurement is not used for reporting. |
Attributes¤
overwritten: bool
property
¤
Indicates whether any of the values associated with the entry have been overwritten.
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if any raw field is different from the corresponding standard field. |
raws: tuple[str, ...]
property
¤
Return the raw fields of the measurement.
Returns:
Type | Description |
---|---|
tuple[str, ...] | Tuple with the names of the raw fields of the measurement. |
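The relationship between the two properties above can be sketched with a plain dataclass. The sketch assumes that every raw field is named after its standard field with a `raw_` prefix, as the attribute table suggests. `MeasurementSketch` is a hypothetical stand-in that carries only a subset of the fields.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class MeasurementSketch:
    """Hypothetical stand-in for the model, with a subset of its fields."""

    value: Optional[Decimal] = None
    maximum: Optional[Decimal] = None
    minimum: Optional[Decimal] = None
    raw_value: Optional[Decimal] = None
    raw_maximum: Optional[Decimal] = None
    raw_minimum: Optional[Decimal] = None

    @property
    def raws(self) -> tuple[str, ...]:
        # Assumption: raw fields are identified by their "raw_" prefix.
        return tuple(name for name in vars(self) if name.startswith("raw_"))

    @property
    def overwritten(self) -> bool:
        # True as soon as one raw field differs from its standard field.
        return any(
            getattr(self, raw) != getattr(self, raw[len("raw_"):])
            for raw in self.raws
        )


untouched = MeasurementSketch(value=Decimal("1.0"), raw_value=Decimal("1.0"))
edited = MeasurementSketch(value=Decimal("2.0"), raw_value=Decimal("1.0"))
```

In this sketch `untouched.overwritten` is false and `edited.overwritten` is true, because only the second object has a value that differs from its raw backup.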
Functions¤
clean()
¤
Check the consistency of validation and reporting, and back up the values.
Source code in measurement/models.py
Report
¤
Bases: MeasurementBase
Holds the different reporting data.
It also keeps track of which data has already been used when creating the reports.
Attributes:
Name | Type | Description |
---|---|---|
report_type | str | Type of report. It can be hourly, daily or monthly. |
completeness | Decimal | Completeness of the report. E.g., a daily report with 24 hourly measurements would have a completeness of 100%. |
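The completeness example in the table can be written out as a small calculation. This sketch assumes that completeness is the ratio of received to expected measurements expressed as a percentage. The `completeness` function and the rounding to one decimal place are illustrative choices, not taken from the source.

```python
from decimal import Decimal


def completeness(received: int, expected: int) -> Decimal:
    """Hypothetical helper: percentage of expected measurements received."""
    if expected <= 0:
        raise ValueError("The expected number of measurements must be positive.")
    ratio = Decimal(received) / Decimal(expected)
    # Rounding to one decimal place is an assumption made for this sketch.
    return (ratio * 100).quantize(Decimal("0.1"))


full_day = completeness(24, 24)
partial_day = completeness(18, 24)
```

A daily report built from all 24 hourly measurements gives 100%, while one built from 18 of them gives 75%.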
Functions¤
clean()
¤
Validate that the report type and the use of the data are consistent.
Source code in measurement/models.py
Station
¤
Bases: PermissionsBase
Main representation of a station, including its metadata.
Attributes:
Name | Type | Description |
---|---|---|
visibility | str | Visibility level of the object, including an "internal" option. |
station_id | int | Primary key. |
station_code | str | Unique code for the station. |
station_name | str | Brief description of the station. |
station_type | StationType | Type of the station. |
country | Country | Country where the station is located. |
region | Region | Region within the Country where the station is located. |
ecosystem | Ecosystem | Ecosystem associated with the station. |
institution | Institution | Institutional partner responsible for the station. |
place_basin | PlaceBasin | Place-Basin association. |
station_state | bool | Is the station operational? |
timezone | str | Timezone of the station. |
station_latitude | Decimal | Latitude of the station, in degrees [-90 to 90]. |
station_longitude | Decimal | Longitude of the station, in degrees [-180 to 180]. |
station_altitude | int | Altitude of the station. |
influence_km | Decimal | Area of influence in km². |
station_file | ImageField | Photograph of the station. |
station_external | bool | Is the station external? |
variables | str | Comma-separated list of variables measured by the station. |
Attributes¤
variables_list: list[str]
property
¤
Return the list of variables measured by the station.
Only variables with data in the database are returned.
Returns:
Type | Description |
---|---|
list[str] | List of variables measured by the station. |
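Turning the comma-separated `variables` attribute into a list could look like the sketch below. It assumes the property is derived from that string and leaves out the check for variables with data in the database. The `split_variables` helper and the variable codes are hypothetical.

```python
def split_variables(variables: str) -> list[str]:
    """Hypothetical helper: split a comma-separated string of variable codes."""
    # Strip surrounding whitespace and skip empty entries, such as those
    # produced by a trailing or doubled comma.
    return [code.strip() for code in variables.split(",") if code.strip()]


# The variable codes below are made up for illustration.
codes = split_variables("airtemperature, precipitation,,windspeed ")
```

The empty entry between the two consecutive commas is skipped, so `codes` holds three items.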
Functions¤
__str__()
¤
Return the station code.
Source code in station/models.py
get_absolute_url()
¤
Return the absolute url of the station.
Source code in station/models.py
set_object_permissions()
¤
Set object-level permissions.
This method is called by the save method of the model to set the object-level permissions based on the visibility level of the object. In addition to the standard permissions for the station, it sets the view_measurements permission, which controls who can view the measurements associated with the station.
Source code in station/models.py
Functions¤
construct_matrix(matrix_source, file_format, station, data_import)
¤
Construct the "matrix" or results table. Performs various cleaning steps and simple transformations depending on the date format and the type of data (accumulated, incremental...), and deals with NaNs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
matrix_source | FileField | Raw data file path. | required |
file_format | Format | A formatting.Format object. | required |
Returns:
Type | Description |
---|---|
dict | Dict of DataFrames for results (one for each variable type in the raw data file). |
TODO: Probably refactor into smaller chunks.
Source code in importing/functions.py
get_last_uploaded_date(station_id, var_code)
¤
Get the last date of uploaded data for a given station ID and variable code.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
station_id | int | The station ID. | required |
var_code | str | The variable code. | required |
Returns:
Type | Description |
---|---|
datetime | None | The last date that data was uploaded for the given station ID and variable code, or None if no data was found. |
Source code in importing/functions.py
process_datetime_columns(data, file_format, timezone)
¤
Process the datetime columns in a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | The DataFrame to process. | required |
file_format | Format | The file format. | required |
timezone | str | The timezone to use. | required |
Returns:
Type | Description |
---|---|
DataFrame | The DataFrame with the datetime columns processed. |
Source code in importing/functions.py
read_data_to_import(source_file, file_format, timezone)
¤
Reads the data from file into a pandas DataFrame.
Works out what sort of file is being read and adds standardised columns for datetime.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_file | Any | Stream of data to be parsed. | required |
file_format | Format | Format of the data to be parsed. | required |
timezone | str | Timezone name, e.g. 'America/Chicago'. | required |
Returns:
Type | Description |
---|---|
DataFrame | pandas DataFrame with the raw data read and extra column(s) for the correctly parsed datetime. |
Source code in importing/functions.py
read_file_csv(source_file, file_format)
¤
Reads a CSV file into a pandas DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_file | Any | Stream of data to be parsed. | required |
file_format | Format | The file format. | required |
Returns:
Type | Description |
---|---|
DataFrame | A pandas DataFrame containing the data from the file. |
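How a format's delimiter, first data row and footer rows shape what is read can be shown with the standard library alone. The documented function returns a pandas DataFrame; the sketch below is a stand-in that returns lists of strings. The `read_rows` helper and the sample file content are hypothetical, and the parameter names mirror the Format attributes.

```python
import csv
import io


def read_rows(
    source_file, delimiter: str, first_row: int, footer_rows: int
) -> list[list[str]]:
    """Hypothetical helper: return the data rows of a delimited text stream."""
    rows = list(csv.reader(source_file, delimiter=delimiter))
    # Skip everything before the first data row (index starting at 0) and
    # ignore the footer rows at the end of the file.
    return rows[first_row : len(rows) - footer_rows]


# Made-up file content: two header rows, two data rows and one footer row.
sample = io.StringIO(
    "station;log\n"
    "date;time;value\n"
    "2024-01-01;00:00;1,5\n"
    "2024-01-01;00:05;1,7\n"
    "end of file\n"
)
rows = read_rows(sample, delimiter=";", first_row=2, footer_rows=1)
```

Only the two data rows remain in `rows`. The values still use a decimal comma at this point and are plain strings.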
Source code in importing/functions.py
read_file_excel(file_path, file_format)
¤
Reads an Excel file into a pandas DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | The path to the file to be read. | required |
file_format | Format | The file format. | required |
Returns:
Type | Description |
---|---|
DataFrame | A pandas DataFrame containing the data from the file. |
Source code in importing/functions.py
save_temp_data_to_permanent(data_import)
¤
Function to pass the temporary import to the final table.
Uses the data_import object only to get all required information from its fields.
This function carries out the following steps:
- Bulk delete of existing data between two times on a given measurement table for the station in question.
- Bulk create to add the new data from the uploaded file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_import | | DataImportTemp object. | required |
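The delete-then-create pattern in the steps above can be sketched with the standard library's `sqlite3` module standing in for the Django ORM. The table name, column names and `replace_measurements` helper are hypothetical. The real function works on the project's measurement models instead.

```python
import sqlite3


def replace_measurements(conn, station_id, start, end, new_rows):
    """Hypothetical helper: swap a station's data within a time window."""
    # "with conn" commits both statements together, or rolls both back if
    # either of them fails.
    with conn:
        conn.execute(
            "DELETE FROM measurement "
            "WHERE station_id = ? AND time BETWEEN ? AND ?",
            (station_id, start, end),
        )
        conn.executemany(
            "INSERT INTO measurement (station_id, time, value) VALUES (?, ?, ?)",
            [(station_id, time, value) for time, value in new_rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (station_id INTEGER, time TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [
        (1, "2024-01-01T00:00", 1.0),
        (1, "2024-01-01T01:00", 2.0),
        (1, "2024-01-02T00:00", 3.0),
        (2, "2024-01-01T00:30", 9.0),
    ],
)
conn.commit()

replace_measurements(
    conn, 1, "2024-01-01T00:00", "2024-01-01T23:59", [("2024-01-01T00:00", 1.5)]
)
remaining = conn.execute(
    "SELECT station_id, time, value FROM measurement ORDER BY station_id, time"
).fetchall()
```

Running both statements in one transaction means a failed insert does not leave the window empty. Data outside the window, or belonging to another station, is left untouched.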
Source code in importing/functions.py
standardise_datetime(date_time, datetime_format)
¤
Returns a datetime object in the case that date_time is not already in that form.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
date_time | Any | The date_time to be transformed. | required |
datetime_format | str | The format that date_time is in (to be passed to datetime.strptime()). | required |
Returns:
Type | Description |
---|---|
datetime | A datetime object or None if date_time is not in a recognised format. |
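A minimal sketch of the behaviour described above, handling only `datetime` and `str` inputs. The documented function accepts `Any` and may recognise further types. The `to_datetime` name is hypothetical.

```python
from datetime import datetime
from typing import Any, Optional


def to_datetime(date_time: Any, datetime_format: str) -> Optional[datetime]:
    """Hypothetical helper: return a datetime, or None if parsing fails."""
    if isinstance(date_time, datetime):
        # Already in the desired form, so it is returned unchanged.
        return date_time
    if isinstance(date_time, str):
        try:
            return datetime.strptime(date_time.strip(), datetime_format)
        except ValueError:
            # The string does not match the expected format.
            return None
    # Assumption: any other type is treated as unrecognised in this sketch.
    return None


parsed = to_datetime("2024-01-31 13:45:00", "%Y-%m-%d %H:%M:%S")
unparsed = to_datetime("31/01/2024", "%Y-%m-%d %H:%M:%S")
```

The second call returns None because the string does not follow the given format.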
Source code in importing/functions.py
standardise_float(val_str)
¤
Removes commas from strings for numbers that use a period as a decimal separator.
Parameters:
val_str: string or number-like.
Returns:
val_num: float or None.
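A minimal sketch of this conversion, assuming that commas only ever act as thousands separators and that any value that cannot be parsed yields None. The `to_float` name is hypothetical.

```python
from typing import Any, Optional


def to_float(val_str: Any) -> Optional[float]:
    """Hypothetical helper: parse numbers that use a period as decimal separator."""
    try:
        # Commas are removed as thousands separators before conversion.
        return float(str(val_str).replace(",", ""))
    except ValueError:
        return None


with_thousands = to_float("1,234.56")
from_number = to_float(7)
not_a_number = to_float("n/a")
```

Numbers pass through `str()` unchanged apart from the conversion to float, so `to_float(7)` gives 7.0.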
Source code in importing/functions.py
standardise_float_comma(val_str)
¤
For strings representing numbers that use a comma as a decimal separator:
(i) Removes full stops.
(ii) Replaces commas with full stops.
Parameters:
val_str: string or number-like.
Returns:
val_num: float or None.
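The two replacement steps can be sketched as follows. The sketch assumes that values which are already numbers are returned as floats without any text processing, since removing the full stop from a number such as 2.5 would corrupt it. The `to_float_decimal_comma` name is hypothetical.

```python
from typing import Any, Optional


def to_float_decimal_comma(val_str: Any) -> Optional[float]:
    """Hypothetical helper: parse numbers that use a comma as decimal separator."""
    if isinstance(val_str, (int, float)):
        # Already numeric: skip the text replacements entirely.
        return float(val_str)
    # (i) remove full stops used as thousands separators, then
    # (ii) replace the decimal comma with a full stop.
    text = str(val_str).replace(".", "").replace(",", ".")
    try:
        return float(text)
    except ValueError:
        return None


with_thousands = to_float_decimal_comma("1.234,56")
small = to_float_decimal_comma("0,5")
```

The order of the two replacements matters: swapping them would first turn the decimal comma into a full stop and then delete it.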
Source code in importing/functions.py
validate_dates(data_import)
¤
Verify if there already exists data for the dates of the data being imported.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_import | | DataImportFull or DataImportTemp object. | required |
Returns:
Type | Description |
---|---|
tuple | A tuple of result and overwrite. result (list of dicts): one summary dict per classification for this file format, containing information on the variable, the end date and whether the data exists. overwrite (bool): True if any of the data already exists. |
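The shape of the returned tuple can be illustrated with a plain-Python sketch. It assumes that data "exists" for a variable when its last uploaded date is not earlier than the start of the new import. The `summarise_existing` helper, the dictionary keys and the variable codes are hypothetical.

```python
from datetime import datetime
from typing import Optional


def summarise_existing(
    last_uploaded: dict[str, Optional[datetime]], import_start: datetime
) -> tuple[list[dict], bool]:
    """Hypothetical helper: build one summary per variable and an overall flag."""
    result = []
    for var_code, last_date in last_uploaded.items():
        # Assumption: data exists if something was uploaded at or after the
        # start date of the file being imported.
        exists = last_date is not None and last_date >= import_start
        result.append(
            {"variable": var_code, "last_date": last_date, "exists": exists}
        )
    overwrite = any(item["exists"] for item in result)
    return result, overwrite


# Made-up variable codes and dates for illustration.
result, overwrite = summarise_existing(
    {"precipitation": datetime(2024, 1, 10), "airtemperature": None},
    import_start=datetime(2024, 1, 5),
)
```

Here `overwrite` is true because one of the two variables already has data inside the period covered by the import.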
Source code in importing/functions.py