Formatting

Introduction

The formatting application describes how a data file should be ingested: which columns to read, which variable each column contains, the date and time formats, and so on. The following diagram gives an overview of the models involved:

Figure 1: UML diagram of the Formatting app models.

Basic components

`Extension`

Extension of the data file.

It is mostly used to choose the tool that ingests the data. While it can take any value, only `xlsx` and `xlx` are currently explicitly supported. Anything else is interpreted as a text file and loaded using `pandas.read_csv`.
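The selection rule described above can be sketched as follows. This is a minimal illustration, not the app's actual code: `reader_kind` and `EXCEL_EXTENSIONS` are hypothetical names, and the normalisation of case and leading dots is an assumption.

```python
# Hypothetical helper, not part of the app: it only mirrors the rule described above.
EXCEL_EXTENSIONS = {"xlsx", "xlx"}  # the values the text names as explicitly supported


def reader_kind(extension: str) -> str:
    """Return "excel" for spreadsheet extensions and "text" for anything else."""
    # Assumption: case and a leading dot are ignored when comparing.
    normalised = extension.strip().lstrip(".").lower()
    return "excel" if normalised in EXCEL_EXTENSIONS else "text"


print(reader_kind("xlsx"))  # excel
print(reader_kind("csv"))   # text
```

Under this rule, an unknown extension such as `dat` falls through to the text branch rather than raising an error.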

Attributes:

Name	Type	Description
`extension_id`	`AutoField`	Primary key.
`value`	`CharField`	The extension value, e.g. `xlsx`, `xlx`, `txt`.

`Delimiter`

Delimiter between columns in the data file.

One or more characters that separate columns in a text file. The most common values are `,`, `;`, and `\t` (tab).

Attributes:

Name	Type	Description
`delimiter_id`	`AutoField`	Primary key.
`name`	`CharField`	The name of the delimiter, e.g. `comma`, `semicolon`, `tab`.
`character`	`CharField`	The character used as a delimiter, e.g. `,`, `;`, `\t`.
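As a rough illustration of what a delimiter entry controls, the sketch below splits delimited text with the standard `csv` module. The `DELIMITERS` mapping and the `split_rows` helper are hypothetical and only mirror the `name` and `character` attributes. The app itself is described as loading text files with `pandas.read_csv`.

```python
import csv
import io

# Hypothetical name -> character mapping, mirroring the `name` and `character` attributes.
DELIMITERS = {"comma": ",", "semicolon": ";", "tab": "\t"}


def split_rows(text: str, delimiter_name: str) -> list[list[str]]:
    """Split delimited text into rows of string cells."""
    reader = csv.reader(io.StringIO(text), delimiter=DELIMITERS[delimiter_name])
    return [row for row in reader]


rows = split_rows("01-02-2023;10:30:00;4,5\n01-02-2023;10:35:00;4,7\n", "semicolon")
print(rows[0])  # ['01-02-2023', '10:30:00', '4,5']
```

Note that `csv.reader` accepts only a single-character delimiter, so this sketch does not cover the multi-character case mentioned above.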

`Date`

Date format.

Format string used to parse the date column in the data file. It must be compatible with Python's `datetime` module. See the `datetime` documentation for the valid format codes.

Attributes:

Name	Type	Description
`date_id`	`AutoField`	Primary key.
`date_format`	`CharField`	The format string for the date column in human-readable form, e.g. `DD-MM-YYYY`.
`code`	`CharField`	The code used to parse the date column, e.g. `%d-%m-%Y`.
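The relationship between `date_format` and `code` can be illustrated with the example values from the table. The snippet below is only a sketch using `datetime.strptime`, and the sample date string is made up.

```python
from datetime import datetime

# Example pair taken from the attribute table: human-readable form and its parsing code.
date_format = "DD-MM-YYYY"
code = "%d-%m-%Y"

parsed = datetime.strptime("25-12-2023", code).date()
print(parsed.isoformat())  # 2023-12-25
```

A string that does not match the code, such as `2023/12/25`, makes `strptime` raise a `ValueError`.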

`Time`

Time format.

Format string used to parse the time column in the data file. It must be compatible with Python's `datetime` module. See the `datetime` documentation for the valid format codes.

Attributes:

Name	Type	Description
`time_id`	`AutoField`	Primary key.
`time_format`	`CharField`	The format string for the time column in human-readable form, e.g. `HH:MM:SS 24H`.
`code`	`CharField`	The code used to parse the time column, e.g. `%H:%M:%S`.
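A time code works the same way as a date code. The sketch below joins a date cell and a time cell and parses them in one call. How the app actually combines the two columns is not stated here, so `parse_timestamp` is a hypothetical helper under that assumption.

```python
from datetime import datetime

date_code = "%d-%m-%Y"  # example `code` of a Date entry
time_code = "%H:%M:%S"  # example `code` of a Time entry


def parse_timestamp(date_text: str, time_text: str) -> datetime:
    """Parse separate date and time cells into a single datetime."""
    return datetime.strptime(f"{date_text} {time_text}", f"{date_code} {time_code}")


stamp = parse_timestamp("25-12-2023", "14:05:30")
print(stamp.isoformat())  # 2023-12-25T14:05:30
```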

Core component

`Format`

Details of the data file format, describing how to read the file.

It combines the file extension, the delimiter, the date and time formats, and the indices of the date and time columns. Together, these tell the ingestion process how to read the data file and parse its dates. It is mostly used to ingest data from text files such as CSV.

Attributes:

Name	Type	Description
`format_id`	`AutoField`	Primary key.
`name`	`CharField`	Short name of the format entry.
`description`	`TextField`	Description of the format.
`extension`	`ForeignKey`	The extension of the data file.
`delimiter`	`ForeignKey`	The delimiter between columns in the data file. Only required for text files.
`first_row`	`PositiveSmallIntegerField`	Index of the first row with data, starting at 0.
`footer_rows`	`PositiveSmallIntegerField`	Number of footer rows to be ignored at the end.
`date`	`ForeignKey`	Format for the date column. Only required for text files.
`date_column`	`PositiveSmallIntegerField`	Index of the date column, starting at 0.
`time`	`ForeignKey`	Format for the time column. Only required for text files.
`time_column`	`PositiveSmallIntegerField`	Index of the time column, starting at 0.
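To show how these attributes might work together, the sketch below reads the timestamps from a small delimited text. `FormatSpec` and `read_timestamps` are hypothetical plain-Python stand-ins, not the Django model or the app's ingestion code, and the sample file content is invented.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FormatSpec:
    """Hypothetical plain-Python stand-in for a `Format` entry."""

    delimiter: str
    first_row: int
    footer_rows: int
    date_code: str
    date_column: int
    time_code: str
    time_column: int


def read_timestamps(text: str, spec: FormatSpec) -> list[datetime]:
    """Return the timestamp of every data row in a delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=spec.delimiter))
    # Drop the rows before `first_row` and the trailing footer rows.
    end = len(rows) - spec.footer_rows
    data_rows = rows[spec.first_row:end]
    code = f"{spec.date_code} {spec.time_code}"
    return [
        datetime.strptime(f"{row[spec.date_column]} {row[spec.time_column]}", code)
        for row in data_rows
    ]


sample = (
    "Station A\n"
    "date;time;value\n"
    "01-02-2023;10:30:00;4,5\n"
    "01-02-2023;10:35:00;4,7\n"
    "End of file\n"
)
spec = FormatSpec(";", 2, 1, "%d-%m-%Y", 0, "%H:%M:%S", 1)
timestamps = read_timestamps(sample, spec)
print(timestamps[0].isoformat())  # 2023-02-01T10:30:00
```

Here `first_row=2` skips the title and header lines, and `footer_rows=1` drops the closing line, leaving two data rows.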

`Classification`

Contains instructions on how to classify the data into a specific variable.

In particular, it links a format to a variable and provides the column indices for the value, maximum, and minimum columns, as well as for their validator columns. It also records whether the data is accumulated or incremental, and the resolution of the data.

Attributes:

Name	Type	Description
`cls_id`	`AutoField`	Primary key.
`format`	`ForeignKey`	The format of the data file.
`variable`	`ForeignKey`	The variable to which the data belongs.
`value`	`PositiveSmallIntegerField`	Index of the value column, starting at 0.
`maximum`	`PositiveSmallIntegerField`	Index of the maximum value column, starting at 0.
`minimum`	`PositiveSmallIntegerField`	Index of the minimum value column, starting at 0.
`value_validator_column`	`PositiveSmallIntegerField`	Index of the value validator column, starting at 0.
`value_validator_text`	`CharField`	Value validator text.
`maximum_validator_column`	`PositiveSmallIntegerField`	Index of the maximum value validator column, starting at 0.
`maximum_validator_text`	`CharField`	Maximum value validator text.
`minimum_validator_column`	`PositiveSmallIntegerField`	Index of the minimum value validator column, starting at 0.
`minimum_validator_text`	`CharField`	Minimum value validator text.
`accumulate`	`PositiveSmallIntegerField`	If set to a number of minutes, the data will be accumulated over that period.
`resolution`	`DecimalField`	Resolution of the data. Only used if it is to be accumulated.
`incremental`	`BooleanField`	Whether the data is an incremental counter. If it is, any value below the previous one will be removed.
`decimal_comma`	`BooleanField`	Whether the data uses a comma as a decimal separator.
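Two of these attributes, `decimal_comma` and `incremental`, can be illustrated with a short sketch. Both helpers are hypothetical, and the reading of `incremental` used here (a value is compared with the last value that was kept) is an assumption, since the description does not say which previous value is meant.

```python
def to_float(cell: str, decimal_comma: bool) -> float:
    """Convert a text cell to a float, honouring a comma decimal separator."""
    return float(cell.replace(",", ".") if decimal_comma else cell)


def drop_counter_resets(values: list[float]) -> list[float]:
    """Keep a value only if it is not below the last kept one (assumed reading of `incremental`)."""
    kept: list[float] = []
    for value in values:
        if not kept or value >= kept[-1]:
            kept.append(value)
    return kept


rows = [
    ["01-02-2023", "10:30:00", "4,5"],
    ["01-02-2023", "10:35:00", "4,7"],
    ["01-02-2023", "10:40:00", "1,2"],
]
value_column = 2  # the `value` attribute: zero-based index of the value column
values = [to_float(row[value_column], decimal_comma=True) for row in rows]
print(values)  # [4.5, 4.7, 1.2]
print(drop_counter_resets(values))  # [4.5, 4.7]
```

The conversion in `to_float` does not handle thousands separators, which a real file might also contain.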
