# Calculation methodology
This page describes the methodology used to estimate the energy usage and carbon emissions of compute jobs.
## Gathering Compute Resources
Information about the compute job and executing node is gathered from the workload scheduler. For Imperial's CX3 and HX1 systems, this is PBS.
Internally, carbon performs a subprocess call to the PBS command `qstat` and parses the result.
Therefore, only jobs accessible to `qstat` can be analysed by carbon.
Currently, this means that only jobs which completed in the past two weeks (or jobs in progress) may be analysed.
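As an illustration of this step, the sketch below shells out to `qstat` and parses its full-format output into a dictionary. The helper name and the simplified parsing are assumptions for illustration, not carbon's actual code.

```python
import subprocess

def fetch_job_record(job_id: str) -> dict:
    """Fetch the PBS record for a (possibly finished) job via qstat.

    `qstat -x -f` prints the full attribute list for a job, including jobs
    retained in PBS's finished-job history. The flat key/value parsing here
    is a simplification (qstat wraps long attribute values across lines).
    """
    output = subprocess.run(
        ["qstat", "-x", "-f", job_id],
        capture_output=True, text=True, check=True,
    ).stdout

    record = {}
    for line in output.splitlines():
        if " = " in line:
            key, _, value = line.strip().partition(" = ")
            record[key] = value
    return record

# Attributes of interest might include (names as reported by PBS Pro):
#   resources_used.cput, resources_used.walltime, resources_used.mem,
#   Resource_List.ncpus, Resource_List.ngpus, Resource_List.mem
```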
## Estimating Energy Consumption
The energy consumed by a job is estimated following the methodology behind the Green Algorithms project, led by Loïc Lannelongue at the University of Cambridge (1). This involves estimating the energy consumed using the compute resources assigned/used by the job, and information about the power draw of compute components provided by the component manufacturers. An additional factor is included in the calculation which accounts for the power usage effectiveness (PUE) of the data centre.
The following equation is used to estimate energy consumption (adapted from (1)):

$$E = t \times \left( P_c \, u_c + n_g \, P_g + m \, P_m \right) \times \mathrm{PUE}$$

where $t$ is the runtime of the compute job, $P_c$ is the per-core power draw of the CPU(s), $u_c$ is the usage factor of the CPU cores (which can vary between 0 and $n_c$, where $n_c$ is the number of cores utilised by the job), $P_g$ is the per-component power draw of the GPU(s), $n_g$ is the number of GPUs employed, $P_m$ is the power draw of the memory (per GB), $m$ is the amount of memory allocated to the job (in GB), and $\mathrm{PUE}$ is the PUE of the data centre.
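As a concrete sketch, the equation can be transcribed directly into code. The function below is illustrative only, not carbon's actual implementation; it simply evaluates the formula above and converts the result to kWh.

```python
def estimate_energy_kwh(
    runtime_h: float,       # t: job runtime (hours)
    p_core_w: float,        # P_c: per-core CPU power draw (W)
    core_usage: float,      # u_c: aggregate core usage factor, 0 <= u_c <= n_c
    n_gpus: int,            # n_g: number of GPUs employed by the job
    p_gpu_w: float,         # P_g: per-GPU power draw (W)
    mem_gb: float,          # m: memory allocated to the job (GB)
    p_mem_w_per_gb: float,  # P_m: memory power draw (W/GB)
    pue: float,             # PUE of the data centre
) -> float:
    """Direct transcription of the energy equation above, returning kWh."""
    power_w = p_core_w * core_usage + n_gpus * p_gpu_w + mem_gb * p_mem_w_per_gb
    return runtime_h * power_w * pue / 1000.0  # W*h -> kWh
```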
To estimate values for the power draw of the processors ($P_c$ and $P_g$), the thermal design power (TDP) of the component is used.
TDPs are sourced from the website of the relevant manufacturer (see Sources).
The PBS workload manager used for Imperial's CX3 and HX1 systems is configured to allow the sharing of nodes between multiple jobs (there is no node exclusivity).
This also applies to the CPUs: the cores of a single CPU can be distributed among concurrent jobs.
The TDP of the full CPU is therefore divided by its number of cores to yield an approximate per-core power draw, $P_c$.
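For example (purely illustrative numbers, not a specific Imperial node):

```python
# Purely illustrative: a hypothetical 64-core CPU with a 270 W TDP.
tdp_w = 270.0
n_cores_on_chip = 64
p_core_w = tdp_w / n_cores_on_chip  # ~4.2 W per core
```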
For the CPU, the workload manager tracks the utilisation of the cores, $u_c$, over the job runtime.
This is used to scale the energy consumption due to the CPU.
(Note that PBS reports the variable `cput`, which is the CPU core-time of a job, accounting for utilisation. This variable, equivalent to $t \times u_c$, is used in the code, slightly changing the in-code form of the energy equation from that shown above.)
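Sketching that in-code form, under the assumption that `cput` has been parsed from the `qstat` output as an "HH:MM:SS" string (the helper below is hypothetical):

```python
def cpu_energy_kwh(cput: str, p_core_w: float, pue: float) -> float:
    """CPU contribution using PBS's cput ('HH:MM:SS' core-time) in place of t * u_c."""
    hours, minutes, seconds = (int(part) for part in cput.split(":"))
    core_hours = hours + minutes / 60 + seconds / 3600  # equivalent to t * u_c
    return p_core_w * core_hours * pue / 1000.0  # W*h -> kWh
```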
For the GPU, exclusive use of the component by a job is assumed, so the full TDP is used for estimating the power draw, $P_g$. Since the workload manager is not configured to track GPU utilisation, we assume 100% utilisation over the runtime of the job.
To estimate the power draw of memory (RAM), we follow the methodology laid out in (1). In that work, the authors describe how the power draw of memory is mainly dependent on the total quantity mobilised, rather than the amount actively in use or the nature of the workload (1, 2). Therefore, the amount of memory allocated to a job is used to determine the power draw due to memory, using a per-GB power of 0.3725 W/GB (1).
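As a worked example (illustrative allocation, using the per-GB figure above):

```python
# A job allocated 32 GB of memory (illustrative value).
mem_gb = 32.0
p_mem_w_per_gb = 0.3725            # W/GB, from (1)
p_mem_w = mem_gb * p_mem_w_per_gb  # ~11.9 W attributed to memory
```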
The method of estimating energy consumption based on compute resources assigned to the job may be compared to two alternative options (3, 4):
- Hardware-based measurements (e.g., a physical power meter attached to the compute node or rack).
- Software tools (e.g., Perf, PowerStat, CodeCarbon, which typically make use of Intel's RAPL interface under the hood).
Measuring energy consumption directly via hardware tools will generally lead to the most accurate values, with software tools being less accurate but typically more practical (3). Compared to both these methods, estimating energy consumption based on compute resource usage will tend to be even less accurate. However, it has two major practical advantages that motivated the adoption of this approach for carbon:
- It is significantly less 'invasive', requiring no installation of additional hardware or software tools on the compute nodes/racks of the HPC cluster.
- It can much more straightforwardly estimate the energy consumption associated with a particular user/process in a situation where a compute node may be shared between multiple users/processes.
To validate this approach, I am currently collating energy consumption estimates produced by carbon, with the aim of benchmarking them against energy usage statistics provided by the data centre hosting the CX3 and HX1 clusters.
## Estimating Emissions
An estimate of the carbon emissions associated with a compute job ($C$) may be calculated by multiplying the estimated energy consumption of the job ($E$) by the carbon intensity of the electrical energy ($CI$):

$$C = E \times CI$$
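A worked example with illustrative numbers:

```python
# Illustrative numbers only: a job estimated at 2.0 kWh, with a carbon
# intensity of 137 gCO2/kWh, is attributed 2.0 * 137 = 274 gCO2.
energy_kwh = 2.0
carbon_intensity = 137.0                      # gCO2/kWh
emissions_g = energy_kwh * carbon_intensity   # 274 gCO2
```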
$CI$ is dependent on the mixture of generation technologies used to produce the electrical energy supplied to the HPC cluster, and varies by time and location. For clusters based in the UK, we can use the Carbon Intensity API, run by the National Energy System Operator (NESO), to fetch $CI$ for a given region and time. The region ID to use is set in the carbon config file (see a list of IDs here). The start time of the job is used as the timestamp for which $CI$ is fetched.
Note that only CO2 emissions from electricity generation are reported by the API, meaning that other greenhouse gas emissions are neglected, along with other emissions due to indirect effects such as changes in land use.
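A minimal sketch of such a request is shown below. The endpoint path and response fields follow the Carbon Intensity API's public documentation for regional data (which exposes forecast values only); the helper itself is hypothetical, and carbon's actual request and parsing logic may differ.

```python
import requests

def fetch_regional_intensity(region_id: int, start_iso: str, end_iso: str) -> float:
    """Return the carbon intensity (gCO2/kWh) for the first half-hour period
    between start_iso and end_iso (ISO 8601, e.g. '2024-05-01T12:00Z')."""
    url = (
        "https://api.carbonintensity.org.uk/regional/intensity/"
        f"{start_iso}/{end_iso}/regionid/{region_id}"
    )
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    payload = response.json()["data"]
    if isinstance(payload, list):  # some endpoints wrap the region in a list
        payload = payload[0]
    # Regional data reports a forecast intensity for each settlement period.
    return payload["data"][0]["intensity"]["forecast"]
```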
The API call can be skipped in favour of a hardcoded carbon intensity using the flag `--average-intensity`.
This value (137 gCO2/kWh) is based on an average of the UK's carbon intensity over 2023 (149 gCO2/kWh) and 2024 (125 gCO2/kWh) (see Sources).
ToDo: Link to separate doc with details about renewable energy certification (REGOs?).
## References
1. L. Lannelongue, J. Grealey, M. Inouye, Green Algorithms: Quantifying the Carbon Footprint of Computation, Advanced Science, 02 May 2021
2. A. Karyakin, K. Salem, An Analysis of Memory Power Consumption in Database Systems, Proceedings of the 13th International Workshop on Data Management on New Hardware, 15 May 2017
3. L. Lannelongue, M. Inouye, Carbon footprint estimation for computational research, Nature Reviews Methods Primers, 16 February 2023
4. U. Asgher, T. Malik, Evaluating Hardware and Software Power Measurement Tools: Assessing Accuracy in Measuring Application Energy Consumption for Data-Parallel Workloads, Proceedings of the Fourth International Conference on Innovations in Computing Research, 27 June 2025