Formatting

Introduction

The formatting application describes how a data file should be ingested: which columns to read, which variable each column contains, how dates and times are formatted, and so on. The following diagram summarises the models involved:

Figure 1: UML diagram of the Formatting app models.

Basic components

Extension

Extension of the data file.

It is mainly used to choose the tool that ingests the data. Although it can take any value, only xlsx and xlx are currently supported explicitly. Anything else is treated as a text file and loaded with pandas.read_csv.

Attributes:

Name | Type | Description
extension_id | AutoField | Primary key.
value | CharField | The extension value, e.g. xlsx, xlx, txt.
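
The dispatch described above can be sketched as follows. This is only an illustration, not the application's actual code: the function name `pick_reader` and the lower-casing of the extension are assumptions, and the function returns the name of the pandas reader rather than calling it so that the sketch runs without pandas installed.

```python
from pathlib import Path

# Extensions with explicit support, as listed in the text above.
EXCEL_EXTENSIONS = {"xlsx", "xlx"}


def pick_reader(filename: str) -> str:
    """Return the name of the pandas reader that would handle the file.

    Hypothetical helper: lower-casing the extension is an assumption.
    """
    extension = Path(filename).suffix.lstrip(".").lower()
    # Anything that is not an explicitly supported extension is treated as text.
    return "read_excel" if extension in EXCEL_EXTENSIONS else "read_csv"


print(pick_reader("station_01.xlsx"))
print(pick_reader("station_01.txt"))
```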

Delimiter

Delimiter between columns in the data file.

One or more characters that separate columns in a text file. The most common values are ,, ;, and \t (tab).

Attributes:

Name | Type | Description
delimiter_id | AutoField | Primary key.
name | CharField | The name of the delimiter, e.g. comma, semicolon, tab.
character | CharField | The character used as a delimiter, e.g. ,, ;, \t.
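
As a small illustration of how the character field could be applied to one line of a text file, the sketch below splits a row into columns. The helper `split_columns` is hypothetical. It uses plain `str.split`, which also works for delimiters longer than one character, but does not handle quoted fields.

```python
def split_columns(line: str, character: str) -> list[str]:
    """Split one line of a text file into columns (hypothetical helper)."""
    # Strip the trailing newline so it does not end up in the last column.
    return line.rstrip("\n").split(character)


# "\t" in a Python string literal is a single tab character.
print(split_columns("01-02-2023\t10:00:00\t1.5\n", "\t"))
print(split_columns("01-02-2023;10:00:00;1,5", ";"))
```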

Date

Date format.

Format string for the date column. It is used to parse the date column in the data file. The format string must be compatible with the datetime module in Python. See the datetime documentation for more information on valid format codes.

Attributes:

Name | Type | Description
date_id | AutoField | Primary key.
date_format | CharField | The format string for the date column in human-readable form, e.g. DD-MM-YYYY.
code | CharField | The code used to parse the date column, e.g. %d-%m-%Y.
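
To make the relation between date_format and code concrete, here is a minimal sketch that parses a date with the standard library. The sample value is invented for illustration.

```python
from datetime import datetime

# Human-readable form: DD-MM-YYYY. Matching datetime format code:
code = "%d-%m-%Y"

# Sample cell value, invented for this example.
parsed = datetime.strptime("25-12-2023", code).date()
print(parsed.isoformat())
```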

Time

Time format.

Format string for the time column. It is used to parse the time column in the data file. The format string must be compatible with the datetime module in Python. See the datetime documentation for more information on valid format codes.

Attributes:

Name | Type | Description
time_id | AutoField | Primary key.
time_format | CharField | The format string for the time column in human-readable form, e.g. HH:MM:SS 24H.
code | CharField | The code used to parse the time column, e.g. %H:%M:%S.
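
One plausible way to use a date code and a time code together is to join both the cell values and the codes with a space, then parse them in a single call. The helper `parse_timestamp` below is hypothetical and only sketches that idea.

```python
from datetime import datetime


def parse_timestamp(
    date_text: str, time_text: str, date_code: str, time_code: str
) -> datetime:
    """Combine a date cell and a time cell into one datetime (hypothetical helper)."""
    # Join values and codes with the same separator so they stay aligned.
    return datetime.strptime(f"{date_text} {time_text}", f"{date_code} {time_code}")


stamp = parse_timestamp("25-12-2023", "14:30:05", "%d-%m-%Y", "%H:%M:%S")
print(stamp.isoformat())
```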

Core component

Format

Details of the data file format, describing how to read the file.

It combines several properties (the file extension, the delimiter, the date and time formats, and the indices of the date and time columns) that together describe how to read the data file and parse the dates. It is mostly used to ingest data from text files, such as CSV.

Attributes:

Name | Type | Description
format_id | AutoField | Primary key.
name | CharField | Short name of the format entry.
description | TextField | Description of the format.
extension | ForeignKey | The extension of the data file.
delimiter | ForeignKey | The delimiter between columns in the data file. Only required for text files.
first_row | PositiveSmallIntegerField | Index of the first row with data, starting at 0.
footer_rows | PositiveSmallIntegerField | Number of footer rows to be ignored at the end.
date | ForeignKey | Format for the date column. Only required for text files.
date_column | PositiveSmallIntegerField | Index of the date column, starting at 0.
time | ForeignKey | Format for the time column. Only required for text files.
time_column | PositiveSmallIntegerField | Index of the time column, starting at 0.
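
The sketch below shows how these attributes might work together on a small text file. It is not the application's importer: `FormatSketch` and `read_timestamps` are hypothetical names, the foreign keys are flattened into plain strings, and the sample data is invented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FormatSketch:
    """Hypothetical, flattened stand-in for a Format entry."""

    delimiter: str
    first_row: int
    footer_rows: int
    date_code: str
    date_column: int
    time_code: str
    time_column: int


def read_timestamps(text: str, fmt: FormatSketch) -> list[datetime]:
    """Parse the timestamp of every data row in a text file."""
    lines = text.splitlines()
    # Skip everything before first_row and ignore the trailing footer rows.
    end = len(lines) - fmt.footer_rows
    rows = [line.split(fmt.delimiter) for line in lines[fmt.first_row:end]]
    code = f"{fmt.date_code} {fmt.time_code}"
    return [
        datetime.strptime(f"{row[fmt.date_column]} {row[fmt.time_column]}", code)
        for row in rows
    ]


sample = (
    "station;A\n"
    "date;time;value\n"
    "01-02-2023;10:00:00;1.5\n"
    "01-02-2023;10:05:00;1.7\n"
    "end of file"
)
fmt = FormatSketch(";", 2, 1, "%d-%m-%Y", 0, "%H:%M:%S", 1)
timestamps = read_timestamps(sample, fmt)
print(timestamps)
```

Here first_row is 2 because two header lines precede the data, and footer_rows is 1 because the last line is not data.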

Classification

Contains instructions on how to classify the data into a specific variable.

In particular, it links a format to a variable and provides the column indices for the value, maximum, and minimum columns, as well as for the validator columns. It also records whether the data is accumulated or incremental, and the resolution of the data.

Attributes:

Name | Type | Description
cls_id | AutoField | Primary key.
format | ForeignKey | The format of the data file.
variable | ForeignKey | The variable to which the data belongs.
value | PositiveSmallIntegerField | Index of the value column, starting at 0.
maximum | PositiveSmallIntegerField | Index of the maximum value column, starting at 0.
minimum | PositiveSmallIntegerField | Index of the minimum value column, starting at 0.
value_validator_column | PositiveSmallIntegerField | Index of the value validator column, starting at 0.
value_validator_text | CharField | Value validator text.
maximum_validator_column | PositiveSmallIntegerField | Index of the maximum value validator column, starting at 0.
maximum_validator_text | CharField | Maximum value validator text.
minimum_validator_column | PositiveSmallIntegerField | Index of the minimum value validator column, starting at 0.
minimum_validator_text | CharField | Minimum value validator text.
accumulate | PositiveSmallIntegerField | If set to a number of minutes, the data will be accumulated over that period.
resolution | DecimalField | Resolution of the data. Only used if it is to be accumulated.
incremental | BooleanField | Whether the data is an incremental counter. If it is, any value below the previous one will be removed.
decimal_comma | BooleanField | Whether the data uses a comma as a decimal separator.
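
The decimal_comma and incremental flags can be illustrated with two small helpers. Both are hypothetical sketches rather than the application's code. In particular, `drop_decreasing` takes "the previous one" to mean the immediately preceding raw value, which is one possible reading of the rule.

```python
def parse_value(text: str, decimal_comma: bool) -> float:
    """Convert a raw cell to float, honouring the decimal_comma flag."""
    return float(text.replace(",", ".") if decimal_comma else text)


def drop_decreasing(values: list[float]) -> list[float]:
    """Keep the first value and every value not below its predecessor.

    Assumption: each value is compared with the immediately preceding
    raw value, whether or not that value was itself kept.
    """
    kept = values[:1]
    for previous, current in zip(values, values[1:]):
        if current >= previous:
            kept.append(current)
    return kept


raw = ["1,0", "2,0", "1,5", "3,0"]
values = [parse_value(cell, decimal_comma=True) for cell in raw]
print(drop_decreasing(values))
```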