Since the introduction of ChatGPT, Large Language Models (LLMs) have evolved rapidly: GPT-4, Llama, Claude, Gemini, Mistral, Grok-1. One new model after another is influencing society as a whole. At the same time, the environmental impact of LLMs has been hotly debated, and many studies and articles now discuss the impact of AI/ML/LLMs on the environment. A study at the University of Washington shows that training just one chatbot can consume as much electricity as a neighborhood uses in a year. Even so, the actual carbon emissions generated behind the scenes are still not transparent today.
We delve into the carbon footprint of LLMs across their critical lifecycle stages: training and inference. Using the Impact Framework (IF) tool developed by the Green Software Foundation, we provide a detailed methodology for calculating the carbon emissions associated with LLMs. We've crafted a series of manifest examples that illustrate the carbon emissions profile throughout an LLM's life cycle, offering a reference for your own analysis. We've also organized relevant materials, including public data related to carbon emission calculations and papers on LLM carbon emissions, to facilitate calculations and reference. Our aim is to equip you with the essential knowledge to approach the training and use of LLMs with an environmentally conscious mindset, fostering wise decision-making in the realm of AI.
-
Carbon Dioxide Equivalent (CO2e)
Carbon Dioxide Equivalent, or CO2e, is a measurement term used to describe different greenhouse gases in a common unit. CO2e is also often written as CO2eq, CO2-eq or CO2equivalent.
-
Carbon Intensity
Carbon intensity is measured in grams of carbon dioxide equivalent (CO2e) emitted per kilowatt-hour (kWh) of electricity generated. The standard unit of carbon intensity is gCO2eq/kWh. You can find our collected data sources here.
-
Embodied Carbon
Embodied carbon (also referred to as "embedded carbon") is the amount of carbon pollution emitted during the creation and disposal of a device.
-
Power usage effectiveness (PUE)
Power usage effectiveness (PUE) is a ratio used to measure data center energy efficiency. For example, a PUE of 1.43 means that for every 1.43 units of energy used to run the entire data center, only 1 unit is effectively used for computing, with the remaining 0.43 units used for non-computing purposes such as cooling and lighting. You can find our collected data sources here.
-
Thermal Design Power (TDP)
The thermal design power (TDP) is the maximum amount of heat generated by a computer chip or component (often a CPU, GPU or system on a chip) that the cooling system in a computer is designed to dissipate under any workload. We use it to estimate the energy consumption of GPUs. You can find our collected data sources here.
-
Carbon emitted Per unit Area (CPA)
Carbon emitted Per unit Area (CPA) is used to quantify the embodied carbon of a chip. It depends on various semiconductor manufacturing parameters, including yield, energy consumption per unit area during the manufacturing process, emissions from chemicals used in hardware production, and emissions related to raw material procurement. The specific calculation formula is derived from Faiz et al., 2023. You can find our collected data sources here.
The total carbon footprint CO2eq resulting from LLM processing consists of two main components: the operational carbon footprint and the embodied carbon footprint. Our calculations follow the methods of Narayanan et al. and Ahmad et al.
The operational carbon footprint refers to the carbon emissions generated during the day-to-day operations of the LLM. This includes the energy consumption required for training, inference, and other computational processes. The carbon emissions are produced mainly through the use of electricity to power the hardware infrastructure and the associated cooling systems.
The embodied carbon footprint represents the carbon emissions associated with the manufacturing, transportation, and disposal of the physical infrastructure used to support the LLM. This includes the carbon emissions generated during the production of servers, storage devices, and networking equipment, as well as the embodied energy in the materials used.
The total carbon footprint CO2eq resulting from LLM processing is determined by
CO2eq = CO2eq_oper + CO2eq_emb
where CO2eq_oper indicates the operational carbon footprint of the LLM, and CO2eq_emb denotes the embodied carbon footprint of the LLM.
The calculation processes for the training stage and inference stage are similar. Below, we will use the training stage as an example to describe the calculation process.
The total embodied carbon footprint CO2eq_emb originates from all hardware units involved in LLM processing. The embodied carbon footprint of each hardware unit i is assessed using
CO2eq_emb_i = (t_i * CO2eq_chip_i) / lifetime_i
where CO2eq_chip_i denotes the chip's embodied carbon footprint for hardware unit i, lifetime_i means the lifespan of hardware unit i, and t_i represents the execution duration of hardware unit i.
The chip's embodied carbon footprint CO2eq_chip within a specific hardware unit is calculated by
CO2eq_chip = area * CPA
where area represents the chip's area and CPA means the carbon emitted per unit area.
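The two embodied-carbon formulas can be sketched in a few lines of Python. This is a minimal illustration rather than part of the methodology itself: the function names and all numeric values are hypothetical, and the units simply follow whatever unit CPA is expressed in.

```python
def chip_embodied_carbon(area, cpa):
    """CO2eq_chip = area * CPA."""
    return area * cpa


def unit_embodied_carbon(t, area, cpa, lifetime):
    """CO2eq_emb_i = (t_i * CO2eq_chip_i) / lifetime_i.

    t and lifetime must use the same time unit (hours here).
    """
    return t * chip_embodied_carbon(area, cpa) / lifetime


# Hypothetical values for illustration only (not measured data).
lifetime_hours = 5 * 365 * 24  # a 5-year lifespan expressed in hours
share = unit_embodied_carbon(t=438.0, area=8.0, cpa=1.5, lifetime=lifetime_hours)
print(f"embodied carbon attributed to this run: {share:.4f}")
```

Only the fraction t_i / lifetime_i of the chip's embodied carbon is attributed to the workload, so a short run on long-lived hardware carries a small embodied share.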
The operational carbon footprint CO2eq_oper attributed to LLM processing is calculated by
CO2eq_oper = energy_oper * carb_inten
energy_oper includes the energy used for training, inference, and other computational processes involved in running the LLM. It takes into account the power consumption of the hardware infrastructure, including servers, networking equipment, and cooling systems.
carb_inten refers to the carbon intensity of the specific data center where the LLM processing takes place. Carbon intensity represents the amount of carbon emissions associated with the energy generation and consumption in the data center. It takes into account factors such as the energy sources used (e.g., coal, natural gas, renewable energy), the efficiency of the energy generation, and any carbon offset or reduction measures in place.
By multiplying the operational energy with the carbon intensity, we can estimate the carbon emissions or carbon dioxide equivalent attributed to the operational phase of LLM processing. This calculation helps quantify the environmental impact and carbon footprint associated with the energy consumption during the operation of the LLM.
By multiplying the energy consumption of the computing hardware with the PUE of the specific data center, we can estimate the total energy consumed during LLM processing. This calculation takes into account the energy requirements of the hardware as well as the efficiency of the data center's infrastructure in delivering that energy to the IT equipment.
The operational energy energy_oper associated with LLM processing can be calculated by
energy_oper = energy_hard * PUE
energy_hard represents the energy consumed by the computing hardware within a data center. This includes the energy used by servers, storage devices, networking equipment, and other hardware components involved in LLM processing.
PUE is a metric that quantifies the energy efficiency of a data center. It represents the ratio of the total energy consumed by the data center, including both IT equipment and supporting infrastructure (such as cooling systems and power distribution), to the energy consumed by the IT equipment alone.
The total energy energy_hard is the energy consumed by all hardware units. The energy energy_hard_i consumed by a single hardware unit i is calculated by
energy_hard_i = TDP_i * n_i * t_i
where TDP_i refers to the maximum amount of heat that hardware unit i is designed to dissipate under normal operating conditions, n_i indicates the count of hardware unit i, and t_i means the execution time of hardware unit i.
Hardware units encompass a range of components, including CPUs, LLM computing devices, memories, SSDs, and others.
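The chain from per-unit energy to operational carbon can be sketched as follows. This is a minimal illustration under stated assumptions: TDP is given in kW and time in hours (so energy is in kWh), carbon intensity is per kWh, and the function names and example numbers are hypothetical.

```python
def hardware_energy(units):
    """energy_hard = sum over units of TDP_i * n_i * t_i.

    `units` is a list of (tdp_kw, count, hours) tuples, so the result is in kWh.
    """
    return sum(tdp_kw * count * hours for tdp_kw, count, hours in units)


def operational_carbon(units, pue, carb_inten):
    """CO2eq_oper = energy_hard * PUE * carb_inten."""
    energy_oper = hardware_energy(units) * pue
    return energy_oper * carb_inten


# Hypothetical setup: 8 GPUs at 0.4 kW and 1 CPU at 0.2 kW, running for 100 hours.
units = [(0.4, 8, 100.0), (0.2, 1, 100.0)]
print(hardware_energy(units))                                   # kWh drawn by the hardware
print(operational_carbon(units, pue=1.1, carb_inten=400.0))     # gCO2eq if carb_inten is in gCO2eq/kWh
```

Keeping the PUE multiplication separate from the hardware sum makes it easy to compare the same workload across data centers with different PUE and carbon intensity values.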
Hardware efficiency and training time are related in the context of machine learning and deep learning tasks. The training time can be estimated from the following:
- Total training FLOPs required by the model
- Benchmark of single GPU FLOPs
- Percent of peak device throughput as estimated using the regression equation
T = C / (n * FLOP_peak * eff)
where C represents the computation required to train the transformer model, in total floating point operations, n represents the number of devices, FLOP_peak represents the device peak throughput, and eff represents the efficiency of the device.
Hardware efficiency refers to how effectively the hardware resources are utilized to perform computations during the training process. Efficient hardware design and architecture can lead to faster and more optimized computations, resulting in shorter training times. It is calculated as the actual computing throughput divided by the peak throughput. The actual computing throughput is calculated as total floating point operations divided by execution time.
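The relationship between training time and hardware efficiency can be sketched as two inverse helpers. This is an illustrative sketch: the function names and numbers are hypothetical, and FLOP_peak is assumed to be a per-device figure in FLOP/s.

```python
def training_time_seconds(total_flops, n_devices, flop_peak, eff):
    """T = C / (n * FLOP_peak * eff)."""
    return total_flops / (n_devices * flop_peak * eff)


def hardware_efficiency(total_flops, exec_seconds, n_devices, flop_peak):
    """eff = actual per-device throughput / peak throughput.

    The actual throughput is the total FLOPs divided by the execution time,
    shared across n_devices.
    """
    actual_throughput = total_flops / exec_seconds / n_devices
    return actual_throughput / flop_peak


# Hypothetical run: 1e18 FLOPs on 10 devices with a 1e14 FLOP/s peak at 50% efficiency.
t = training_time_seconds(1e18, n_devices=10, flop_peak=1e14, eff=0.5)
print(t)
print(hardware_efficiency(1e18, t, n_devices=10, flop_peak=1e14))
```

Feeding the estimated time back into `hardware_efficiency` recovers the efficiency that was assumed, which is a quick consistency check on the inputs.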
A linear regression using a 2nd order polynomial is fit on the throughput scaling data presented in the paper Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM.
The optimal parallelism setting is represented as (p, t, d, e), where the variables correspond to the degrees of pipeline, tensor, data, and expert parallelism, respectively.
The efficiency eff_re with re devices can be calculated by
when re < n,
eff_re = (r_0 * re) / (n * eff_n)
when re > n,
eff_re = (r_1 * re) / (n * eff_n) + r_2 * re
where r_0, r_1, r_2 are fitting constants, eff_n means the highest hardware efficiency, and n indicates the number of devices that can achieve eff_n. The number of devices required to achieve optimal hardware efficiency for dense LLM processing is calculated as
n = t ⋅ p ⋅ d
Consider a transformer with l transformer layers, hidden size h, sequence length s, vocabulary size V, and training batch size B. A transformer layer consists of an attention block followed by a 2-layer feed-forward network. Multiplying an m × k matrix A by a k × n matrix X requires 2 × m × k × n FLOPs (the factor of 2 accounts for multiplies and adds).
For the attention block, the main FLOP contributors are the key, query, and value transformation (6Bsh^2 operations), attention matrix computation (2Bs^2h operations), attention over values (2Bs^2h operations), and post-attention linear projection (2Bsh^2 operations). The feed-forward network increases the hidden size to 4h and then reduces it back to h; this requires 16Bsh^2 FLOPs.
Summing these together, each transformer layer results in 24Bsh^2 + 4Bs^2h FLOPs for the forward pass.
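The per-layer sum can be checked term by term with a short sketch. The function name and the example sizes are hypothetical; the individual terms are the ones listed in the text.

```python
def layer_forward_flops(B, s, h):
    """Forward-pass FLOPs of one transformer layer, summed term by term."""
    qkv_transform = 6 * B * s * h**2         # key, query, and value transformation
    attention_scores = 2 * B * s**2 * h      # attention matrix computation
    attention_over_values = 2 * B * s**2 * h
    output_projection = 2 * B * s * h**2     # post-attention linear projection
    feed_forward = 16 * B * s * h**2         # h -> 4h -> h
    return (qkv_transform + attention_scores + attention_over_values
            + output_projection + feed_forward)


# Hypothetical sizes, chosen only to exercise the formula.
B, s, h = 4, 128, 64
closed_form = 24 * B * s * h**2 + 4 * B * s**2 * h
print(layer_forward_flops(B, s, h) == closed_form)
```

Because every term is an integer product, the term-by-term sum and the closed form 24Bsh^2 + 4Bs^2h agree exactly.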
The other main contributor to the FLOP count is the logit layer in the language model head. This operation requires 2BshV FLOPs in the forward pass and 4BshV FLOPs in the backward pass, resulting in 6BshV FLOPs in total.
The backward pass requires double the number of FLOPs because gradients must be calculated with respect to both the input and weight tensors.
Thus, for a transformer model with l transformer layers, the total number of floating-point operations is:
C = C_forward + C_backward ≈ 2PD + 4PD ≈ 6PD
with parameter size P and training dataset size D (tokens).
The number of parameters in a model, P, can be computed as:
P = 12lh^2 * [1 + 13/(12h) + (V + s)/(12lh)]
where l is the number of layers, h the hidden size, V the vocabulary size, and s the sequence length.
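The parameter-count formula can be evaluated directly. In this minimal sketch the function names are hypothetical and the configuration values are illustrative inputs, not figures taken from this article. Expanding the bracket gives the equivalent form 12lh^2 + 13lh + (V + s)h, which the second helper uses as a cross-check.

```python
def parameter_count(l, h, V, s):
    """P = 12*l*h^2 * [1 + 13/(12h) + (V + s)/(12*l*h)]."""
    return 12 * l * h**2 * (1 + 13 / (12 * h) + (V + s) / (12 * l * h))


def parameter_count_expanded(l, h, V, s):
    """Same formula with the bracket multiplied out (integer arithmetic)."""
    return 12 * l * h**2 + 13 * l * h + (V + s) * h


# Illustrative configuration (assumed values for a large dense model).
l, h, V, s = 96, 12288, 51200, 2048
print(f"{parameter_count(l, h, V, s):.3e}")
print(parameter_count_expanded(l, h, V, s))
```

The 12lh^2 term dominates for large models, which is why the embedding-related terms are often dropped in rough estimates.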
The total carbon footprint calculation for inference is similar to that for training. Inference runs the input data through the model's forward pass without any backward pass or gradient updates, so the computation C_inference is approximated as
C_inference ≈ 2P * D_inference
where D_inference means the inference dataset size (tokens).
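The two compute approximations differ only in their constant factor, as this small sketch shows. The function names and the parameter and token counts are hypothetical.

```python
def training_flops(params, tokens):
    """C ≈ 6PD: forward pass (2PD) plus backward pass (4PD)."""
    return 6 * params * tokens


def inference_flops(params, tokens):
    """C_inference ≈ 2P * D_inference: forward pass only."""
    return 2 * params * tokens


# Hypothetical model: 7 billion parameters, 1 trillion training tokens,
# 10 billion tokens served at inference time.
P = 7_000_000_000
print(training_flops(P, 1_000_000_000_000))
print(inference_flops(P, 10_000_000_000))
```

For the same number of tokens, training costs three times the compute of inference under these approximations.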
Impact Framework (IF) is an open-source tool developed within the Green Software Foundation. It is designed to assess the environmental impact of software across various components and settings, with the aim of minimizing the ecological footprint of software. To use IF, you simply need to create a manifest file, after which IF takes care of the remaining processes. This manifest file provides essential context for calculating the environmental impact, outlining the application's architecture, the duration of observation, the sequence of calculations and transformations to be performed, and the specific environmental metrics to be monitored.
Here is the video explaining how IF works; it can help you better understand the capabilities of IF.
With the methodology outlined above for estimating LLM carbon emissions, we can use the Impact Framework to assess the carbon footprint of an LLM. The Impact Framework offers a versatile and extensible framework for evaluating the carbon footprint of diverse computing activities, leveraging a variety of plugins to build upon the manifest.
Based on the total carbon footprint equation CO2eq = CO2eq_oper + CO2eq_emb, we can divide the total carbon footprint into two components: CO2eq_oper, the operational footprint, and CO2eq_emb, the embodied footprint.
The fundamental equation for CO2eq_oper is CO2eq_oper = energy_oper * carb_inten, where energy_oper represents the energy utilized during the operation of the LLM, and carb_inten denotes the carbon intensity of the energy consumed.
To derive energy_oper, the energy formula energy_oper = n * T * TDP * PUE is employed. Hence, energy_oper depends on the number of GPUs n, the total training time T, the power consumption of each GPU (Thermal Design Power, TDP), and the Power Usage Effectiveness (PUE). With T in hours, a TDP in watts gives energy_oper in Wh and a TDP in kilowatts gives kWh; the unit must match the one used by the carbon intensity.
The final equation for the operational footprint is: CO2eq_oper = n * T * TDP * PUE * carb_inten
As described above, the embodied emissions for each hardware unit are calculated using the formula CO2eq_emb_i = (t_i * CO2eq_chip_i) / lifetime_i, where t_i represents the execution duration of the hardware unit, which equates to the total time required for training an LLM, CO2eq_chip_i denotes the CO2 emissions per chip, and lifetime_i indicates the expected lifespan of the hardware unit. The chip's embodied carbon footprint CO2eq_chip_i within a specific hardware unit is calculated by CO2eq_chip_i = area_i * CPA_i.
The total embodied emissions for training an LLM, denoted as CO2eq_emb, are computed as the sum of the CO2eq_emb_i values for each hardware unit involved in the process. This is expressed by the formula CO2eq_emb = sum(CO2eq_emb_i), where CO2eq_emb_i represents the embodied emissions of each respective hardware unit. In essence, the hardware units encompass GPU, CPU, SSD, and DRAM. Thus, the aggregate embodied emissions for training an LLM can be articulated as: CO2eq_emb = sum(CO2eq_emb_GPU, CO2eq_emb_CPU, CO2eq_emb_SSD, CO2eq_emb_DRAM).
The manifest for LLM carbon emissions includes the following components:
CO2eq_oper: Total operational emissions for training an LLM. Since the equation for the operational footprint is CO2eq_oper = n * T * TDP * PUE * carb_inten, we can use the Multiply method of the official IF plugins to calculate it.
name: llm basic operational emissions manifest
description:
  "
  CO2eq_oper = n * T * TDP * PUE * carb_inten
  T: training hour(training_hour)
  n: number of gpus(gpu/num)
  TDP: power consumption of the GPU(gpu/tdp)
  PUE: Power Usage Effectiveness(pue)
  carb_inten: carbon intensity of the energy consumed(carb_inten)
  "
tags:
initialize:
  plugins:
    training-operation-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'training_hour', 'gpu/tdp', 'pue', 'carb_inten']
        output-parameter: 'operation-carbon'
tree:
  children:
    child:
      pipeline:
        - training-operation-carbon-multiply
      inputs:
        - timestamp:
          gpu/num: # the number of GPUs used for training the LLM
          training_hour: # the total hours, including training and inference
          gpu/tdp: # kW (kWh per hour), the power consumption per GPU
          pue:
          carb_inten: # CO2eq/kWh, the carbon intensity of the training region
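Before running the manifest, the expected result can be cross-checked outside IF. The sketch below assumes that the Multiply method simply multiplies the listed input parameters together; the `multiply` helper is a hypothetical stand-in written for this check, not the plugin's code, and the input values are made up.

```python
import math


def multiply(inputs, input_parameters):
    """Hypothetical stand-in for a Multiply step: product of the named inputs."""
    return math.prod(inputs[name] for name in input_parameters)


# Made-up inputs, using the parameter names from the manifest above.
inputs = {
    "gpu/num": 8,
    "training_hour": 100,
    "gpu/tdp": 0.4,       # kW per GPU
    "pue": 1.1,
    "carb_inten": 0.4,    # kgCO2eq/kWh, assumed unit for this example
}
params = ["gpu/num", "training_hour", "gpu/tdp", "pue", "carb_inten"]
print(multiply(inputs, params))  # operation-carbon, in the mass unit of carb_inten
```

A hand calculation like this is a quick way to catch unit mistakes, for example a TDP entered in watts alongside a carbon intensity per kWh.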
CO2eq_emb: Total embodied emissions for training an LLM. For each hardware unit i, the embodied emissions are calculated using the following equation: CO2eq_emb_i = (t_i * area_i * CPA_i) / lifetime_i.
We can use the Multiply method and the Divide method to calculate the embodied emissions for each hardware unit, and the Sum method to calculate the total embodied emissions. The equation used in the manifest is: CO2eq_emb = sum(CO2eq_emb_GPU, CO2eq_emb_CPU, CO2eq_emb_SSD, CO2eq_emb_DRAM).
name: llm basic embodied emissions manifest
description:
  "
  CO2eq_emb = sum(CO2eq_emb_GPU, CO2eq_emb_CPU, CO2eq_emb_SSD, CO2eq_emb_DRAM)
  CO2eq_emb_i = (t_i * area_i * CPA_i) / lifetime_i
  "
tags:
initialize:
  plugins:
    device-expected-lifespan-hours-per-year-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['expected-lifespan', 'days-per-year', 'hours-per-day']
        output-parameter: 'expected-lifespan-duration'
    reserved-device-hour-with-device-expected-lifespan-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'training_hour'
        denominator: 'expected-lifespan-duration'
        output: 'expected-lifespan-rate'
    gpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'expected-lifespan-rate', 'gpu/cap', 'gpu/area']
        output-parameter: 'gpu-carbon-embodied'
    cpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['cpu/num', 'expected-lifespan-rate', 'cpu/cap', 'cpu/area']
        output-parameter: 'cpu-carbon-embodied'
    ssd-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['ssd/num', 'expected-lifespan-rate', 'ssd/cap', 'ssd/area']
        output-parameter: 'ssd-carbon-embodied'
    dram-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['dram/num', 'expected-lifespan-rate', 'dram/cap', 'dram/area']
        output-parameter: 'dram-carbon-embodied'
    embodied-carbon-sum:
      method: Sum
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu-carbon-embodied', 'cpu-carbon-embodied', 'ssd-carbon-embodied', 'dram-carbon-embodied']
        output-parameter: 'carbon-embodied'
tree:
  children:
    child:
      pipeline:
        - device-expected-lifespan-hours-per-year-multiply
        - reserved-device-hour-with-device-expected-lifespan-divide
        - gpu-embodied-carbon-multiply
        - cpu-embodied-carbon-multiply
        - ssd-embodied-carbon-multiply
        - dram-embodied-carbon-multiply
        - embodied-carbon-sum
      defaults:
        thousands-per-unit: 0.001
        days-per-year: 365
        hours-per-day: 24
        seconds-per-hour: 3600
        expected-lifespan: # years
        # To keep the manifest file simple, we use one `expected-lifespan` for all the components.
      inputs:
        - timestamp:
          training_hour:
          gpu/num:
          gpu/cap: # kgCO2/cm2
          gpu/area: # cm2
          cpu/num:
          cpu/cap: # kgCO2/cm2
          cpu/area: # cm2
          ssd/num:
          ssd/cap: # kgCO2/GB
          ssd/area: # GB
          dram/num:
          dram/cap: # kgCO2/GB
          dram/area: # GB
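The arithmetic performed by this pipeline can be mirrored in plain Python as a sanity check. This is a sketch of the calculation only, not of how IF executes it; the function name and all numbers are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # days-per-year * hours-per-day


def embodied_carbon(training_hours, lifespan_years, units):
    """Return per-unit and total embodied carbon.

    `units` maps a unit name to (num, cpa, area), mirroring the
    '<unit>/num', '<unit>/cap' and '<unit>/area' inputs of the manifest.
    """
    # expected-lifespan-rate: share of the hardware lifespan used by this run
    rate = training_hours / (lifespan_years * HOURS_PER_YEAR)
    per_unit = {name: num * rate * cpa * area
                for name, (num, cpa, area) in units.items()}
    return per_unit, sum(per_unit.values())


# Made-up hardware description: (count, carbon per unit area or GB, area or GB).
units = {
    "gpu": (8, 1.0, 8.0),
    "cpu": (1, 1.0, 4.0),
    "ssd": (1, 0.02, 1000.0),
    "dram": (1, 0.3, 100.0),
}
per_unit, total = embodied_carbon(training_hours=876, lifespan_years=5, units=units)
print(per_unit)
print(total)
```

Listing the per-unit results next to the total makes it easy to see which component dominates the embodied footprint for a given configuration.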
In addition to the fundamental calculation approach, the IF official plugin offers the SCI-M method for calculating the embodied emissions. This method can be employed to determine the embodied emissions of the hardware unit.
name: sci-m example
description: calculate the embodied emissions for the hardware unit
tags:
initialize:
  plugins:
    sci-m:
      method: SciM
      path: '@grnsft/if-plugins'
tree:
  children:
    child:
      pipeline:
        - sci-m
      defaults:
        device/emissions-embodied: # gCO2eq, CO2eq_chip_i
        device/expected-lifespan: # the expected lifespan, expressed in seconds
        resources-reserved:
        resources-total:
      inputs:
        - timestamp:
          duration: # seconds
          # the execution duration of the hardware unit
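For orientation, the embodied term in the Software Carbon Intensity (SCI) specification has the form M = TE * (TiR / EL) * (RR / ToR): total embodied emissions scaled by a time share and a resource share. The sketch below evaluates that formula directly. How its arguments map onto the manifest fields above is our reading of the field names, and the function name and values are hypothetical.

```python
def sci_m(total_embodied, time_reserved, expected_lifespan,
          resources_reserved, resources_total):
    """M = TE * (TiR / EL) * (RR / ToR).

    time_reserved and expected_lifespan must share a unit (seconds here).
    """
    time_share = time_reserved / expected_lifespan
    resource_share = resources_reserved / resources_total
    return total_embodied * time_share * resource_share


# Made-up example: 1000 gCO2eq embodied, one hour reserved out of a
# ten-hour lifespan, using 1 of 2 available resources.
print(sci_m(1000.0, 3600, 36000, 1, 2))
```

With resources-reserved equal to resources-total, the resource share is 1 and the formula reduces to the CO2eq_emb_i equation used earlier.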
Sometimes we want to estimate an existing LLM's emissions and find that we don't have exact values for the training hours. In this case, we can use the methodology from the section above to calculate the estimated emissions.
Basically, the operational emissions of an LLM combine the training emissions and the inference emissions.
To get the operational emissions, we need to estimate the training hours and the inference hours based on the equation T = C / (n * FLOP_peak * eff), where C represents the computation required, in total floating point operations, FLOP_peak represents the device peak throughput, and eff represents the efficiency of the device.
For the computation required for training, we can use the formula C_train ≈ 6PD with parameter size P and training dataset size D (tokens). For the computation required for inference, we can use the formula C_inference ≈ 2P * D_inference, where D_inference means the inference dataset size (tokens).
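Put together, the estimate runs from model size to hours to operational emissions. The following sketch chains those steps in Python under stated assumptions: FLOP_peak is per device in FLOP/s, TDP is in kW, and carbon intensity is per kWh. Function names and all example values are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def estimate_hours(flop_count_factor, params, tokens, n_gpus, flop_peak, eff):
    """T = C / (n * FLOP_peak * eff), converted from seconds to hours.

    flop_count_factor is 6 for training (C ≈ 6PD) and 2 for inference.
    """
    total_compute = flop_count_factor * params * tokens
    compute_per_second = n_gpus * flop_peak * eff
    return total_compute / compute_per_second / SECONDS_PER_HOUR


def operational_carbon(n_gpus, hours, tdp_kw, pue, carb_inten):
    """CO2eq_oper = n * T * TDP * PUE * carb_inten."""
    return n_gpus * hours * tdp_kw * pue * carb_inten


# Made-up model and cluster.
hours = estimate_hours(6, params=1e9, tokens=1.2e9, n_gpus=10, flop_peak=1e14, eff=0.5)
print(hours)
print(operational_carbon(10, hours, tdp_kw=0.4, pue=1.1, carb_inten=400.0))
```

Switching the factor from 6 to 2 reuses the same chain for inference, which is how the manifest's `flop-count-factor` default is meant to be used.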
name: llm emissions manifest with estimated training time
description:
tags:
initialize:
  plugins:
    estimate-total-compute-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['flop-count-factor', 'modal/parameters-count', 'modal/tokens-count']
        output-parameter: 'estimate-total-compute'
    estimate-compute-per-second-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'gpu/flop_peak', 'hardware-efficiency']
        output-parameter: 'estimate-compute-per-second'
    estimate-time-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-total-compute'
        denominator: 'estimate-compute-per-second'
        output: 'estimate-time-second'
    estimate-operation-hour-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-time-second'
        denominator: 'seconds-per-hour'
        output: 'estimate-operation-hour'
    operation-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'estimate-operation-hour', 'gpu/tdp', 'pue', 'carb_inten']
        output-parameter: 'operation-carbon'
    device-expected-lifespan-hours-per-year-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['expected-lifespan', 'days-per-year', 'hours-per-day']
        output-parameter: 'expected-lifespan-duration'
    reserved-device-hour-with-device-expected-lifespan-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-operation-hour'
        denominator: 'expected-lifespan-duration'
        output: 'expected-lifespan-rate'
    gpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'expected-lifespan-rate', 'gpu/cap', 'gpu/area']
        output-parameter: 'gpu-carbon-embodied'
    cpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'cpu/cap', 'cpu/area']
        output-parameter: 'cpu-carbon-embodied'
    ssd-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'ssd/cap', 'ssd/area']
        output-parameter: 'ssd-carbon-embodied'
    dram-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'dram/cap', 'dram/area']
        output-parameter: 'dram-carbon-embodied'
    embodied-carbon-sum:
      method: Sum
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu-carbon-embodied', 'cpu-carbon-embodied', 'ssd-carbon-embodied', 'dram-carbon-embodied']
        output-parameter: 'carbon-embodied'
    llm-carbon-sum:
      method: Sum
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['carbon-embodied', 'operation-carbon']
        output-parameter: 'total-carbon'
tree:
  children:
    operational-carbon:
      pipeline:
        - estimate-total-compute-multiply
        - estimate-compute-per-second-multiply
        - estimate-time-divide
        - estimate-operation-hour-divide
        - operation-carbon-multiply
        - device-expected-lifespan-hours-per-year-multiply
        - reserved-device-hour-with-device-expected-lifespan-divide
        - gpu-embodied-carbon-multiply
        - cpu-embodied-carbon-multiply
        - ssd-embodied-carbon-multiply
        - dram-embodied-carbon-multiply
        - embodied-carbon-sum
        - llm-carbon-sum
      defaults:
        flop-count-factor: 6 # use 6 for the training phase (C_train ≈ 6PD) and 2 for the inference phase (C_infer ≈ 2P*D_infer)
        thousands-per-unit: 0.001
        days-per-year: 365
        hours-per-day: 24
        seconds-per-hour: 3600
        expected-lifespan: 5 # years; converted to hours by device-expected-lifespan-hours-per-year-multiply
      inputs:
        - gpu/num: # the number of GPUs used for training the LLM
          gpu/tdp: # kW (kWh per hour), the power consumption per GPU
          gpu/flop_peak:
          hardware-efficiency:
          modal/parameters-count:
          modal/tokens-count:
          pue:
          carb_inten: # CO2eq/kWh, the carbon intensity of the training region
          gpu/cap: # kgCO2/cm2
          gpu/area: # cm2
          cpu/cap: # kgCO2/cm2
          cpu/area: # cm2
          ssd/cap: # kgCO2/GB
          ssd/area: # GB
          dram/cap: # kgCO2/GB
          dram/area: # GB
          hardware-unit-num: # gpu/num / 8, assuming one CPU, SSD, DRAM for every 8 GPU/TPU chips or one server stack
Efficient processing of LLMs relies on achieving high eff, which is calculated as the actual computing throughput X divided by the peak throughput FLOP_peak. The equation for estimating T can therefore be written as: T = C / (n * X). Using too few or too many devices, or improperly configuring parallelism, can lead to reduced hardware efficiency. To estimate the computing throughput X from the parameter size P, we can use the regression coefficients of the polynomial fit X = aP^2 + bP + c. When the expert parallelism e equals one, which means your GPU memory is capable of storing all the parameters, the estimated regression coefficients for X are a = -8.82079068e-2, b = 1.68591116, c = 1.33954735e+02. Otherwise, the estimated regression coefficients for X are a = -5.60233749e-5, b = 8.45435587e-2, c = 1.34546129e+02.
name: llm emissions manifest with estimated training time
description:
tags:
initialize:
  plugins:
    estimate-total-compute-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['flop-count-factor', 'modal/parameters-count', 'modal/tokens-count']
        output-parameter: 'estimate-total-compute'
    estimate-compute-per-second-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'modal/estimated-throughput']
        output-parameter: 'estimate-compute-per-second'
    estimate-time-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-total-compute'
        denominator: 'estimate-compute-per-second'
        output: 'estimate-time-second'
    estimate-operation-hour-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-time-second'
        denominator: 'seconds-per-hour'
        output: 'estimate-operation-hour'
    operation-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'estimate-operation-hour', 'gpu/tdp', 'pue', 'carb_inten']
        output-parameter: 'operation-carbon'
    device-expected-lifespan-hours-per-year-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['expected-lifespan', 'days-per-year', 'hours-per-day']
        output-parameter: 'expected-lifespan-duration'
    reserved-device-hour-with-device-expected-lifespan-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-operation-hour'
        denominator: 'expected-lifespan-duration'
        output: 'expected-lifespan-rate'
    gpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'expected-lifespan-rate', 'gpu/cap', 'gpu/area']
        output-parameter: 'gpu-carbon-embodied'
    cpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'cpu/cap', 'cpu/area']
        output-parameter: 'cpu-carbon-embodied'
    ssd-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'ssd/cap', 'ssd/area']
        output-parameter: 'ssd-carbon-embodied'
    dram-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'dram/cap', 'dram/area']
        output-parameter: 'dram-carbon-embodied'
    embodied-carbon-sum:
      method: Sum
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu-carbon-embodied', 'cpu-carbon-embodied', 'ssd-carbon-embodied', 'dram-carbon-embodied']
        output-parameter: 'carbon-embodied'
    llm-carbon-sum:
      method: Sum
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['carbon-embodied', 'operation-carbon']
        output-parameter: 'total-carbon'
tree:
  children:
    operational-carbon:
      pipeline:
        - estimate-total-compute-multiply
        - estimate-compute-per-second-multiply
        - estimate-time-divide
        - estimate-operation-hour-divide
        - operation-carbon-multiply
        - device-expected-lifespan-hours-per-year-multiply
        - reserved-device-hour-with-device-expected-lifespan-divide
        - gpu-embodied-carbon-multiply
        - cpu-embodied-carbon-multiply
        - ssd-embodied-carbon-multiply
        - dram-embodied-carbon-multiply
        - embodied-carbon-sum
        - llm-carbon-sum
      defaults:
        flop-count-factor: 6 # use 6 for the training phase (C_train ≈ 6PD) and 2 for the inference phase (C_infer ≈ 2P*D_infer)
        thousands-per-unit: 0.001
        days-per-year: 365
        hours-per-day: 24
        seconds-per-hour: 3600
        expected-lifespan: 5 # years; converted to hours by device-expected-lifespan-hours-per-year-multiply
      inputs:
        - gpu/num: # the number of GPUs used for training the LLM
          gpu/tdp: # kW (kWh per hour), the power consumption per GPU
          modal/parameters-count:
          modal/tokens-count:
          modal/estimated-throughput: # X, the estimated per-GPU computing throughput in FLOP/s (same FLOP unit as estimate-total-compute)
          pue:
          carb_inten: # CO2eq/kWh, the carbon intensity of the training region
          gpu/cap: # kgCO2/cm2
          gpu/area: # cm2
          cpu/cap: # kgCO2/cm2
          cpu/area: # cm2
          ssd/cap: # kgCO2/GB
          ssd/area: # GB
          dram/cap: # kgCO2/GB
          dram/area: # GB
          hardware-unit-num: # gpu/num / 8, assuming one CPU, SSD, DRAM for every 8 GPU/TPU chips or one server stack
Based on the estimated T, we can use the same equations to calculate the embodied emissions:
CO2eq_emb = sum(CO2eq_emb_GPU, CO2eq_emb_CPU, CO2eq_emb_SSD, CO2eq_emb_DRAM)
CO2eq_emb_i = (t_i * area_i * CPA_i) / lifetime_i
According to Meta's report, the data centers used for training LLMs achieve an average utilization rate of 60% throughout the 5-year lifespan of hardware units, so we assume an expected-lifespan of 5 years.
For the number of hardware units, we assume one SSD, DRAM, and CPU for every 8 GPU/TPU chips or one server stack.
name: llm embodied emissions manifest with estimated training time
description:
tags:
initialize:
  plugins:
    estimate-total-compute-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['flop-count-factor', 'modal/parameters-count', 'modal/tokens-count']
        output-parameter: 'estimate-total-compute'
    estimate-compute-per-second-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'modal/estimated-throughput']
        output-parameter: 'estimate-compute-per-second'
    estimate-time-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-total-compute'
        denominator: 'estimate-compute-per-second'
        output: 'estimate-time-second'
    estimate-operation-hour-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-time-second'
        denominator: 'seconds-per-hour'
        output: 'estimate-operation-hour'
    device-expected-lifespan-hours-per-year-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['expected-lifespan', 'days-per-year', 'hours-per-day']
        output-parameter: 'expected-lifespan-duration'
    reserved-device-hour-with-device-expected-lifespan-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-operation-hour'
        denominator: 'expected-lifespan-duration'
        output: 'expected-lifespan-rate'
    gpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'expected-lifespan-rate', 'gpu/cap', 'gpu/area']
        output-parameter: 'gpu-carbon-embodied'
    cpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'cpu/cap', 'cpu/area']
        output-parameter: 'cpu-carbon-embodied'
    ssd-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'ssd/cap', 'ssd/area']
        output-parameter: 'ssd-carbon-embodied'
    dram-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'dram/cap', 'dram/area']
        output-parameter: 'dram-carbon-embodied'
    embodied-carbon-sum:
      method: Sum
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu-carbon-embodied', 'cpu-carbon-embodied', 'ssd-carbon-embodied', 'dram-carbon-embodied']
        output-parameter: 'carbon-embodied'
tree:
  children:
    child:
      pipeline:
        - estimate-total-compute-multiply
        - estimate-compute-per-second-multiply
        - estimate-time-divide
        - estimate-operation-hour-divide
        - device-expected-lifespan-hours-per-year-multiply
        - reserved-device-hour-with-device-expected-lifespan-divide
        - gpu-embodied-carbon-multiply
        - cpu-embodied-carbon-multiply
        - ssd-embodied-carbon-multiply
        - dram-embodied-carbon-multiply
        - embodied-carbon-sum
      defaults:
        flop-count-factor: 6 # use 6 for the training phase (C_train ≈ 6PD) and 2 for the inference phase (C_infer ≈ 2P*D_infer)
        thousands-per-unit: 0.001
        days-per-year: 365
        hours-per-day: 24
        seconds-per-hour: 3600
        expected-lifespan: 5 # years; converted to hours by device-expected-lifespan-hours-per-year-multiply
        # Meta's data centers achieve an average utilization rate of 60% throughout the 5-year lifespan of hardware units
        # ref: Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga, Jinshi Huang, Charles Bai, et al. Sustainable AI: Environmental implications, challenges and opportunities. Proceedings of Machine Learning and Systems, 4:795–813, 2022.
      inputs:
        - gpu/num:
          gpu/cap: # kgCO2/cm2
          gpu/area: # cm2
          cpu/cap: # kgCO2/cm2
          cpu/area: # cm2
          ssd/cap: # kgCO2/GB
          ssd/area: # GB
          dram/cap: # kgCO2/GB
          dram/area: # GB
          hardware-unit-num:
          # assuming one CPU, SSD, DRAM for every 8 GPU/TPU chips or one server stack
          # gpu/num / 8
          modal/parameters-count:
          modal/tokens-count:
          modal/estimated-throughput: # X, the estimated per-GPU computing throughput in FLOP/s (same FLOP unit as estimate-total-compute)
All of the above manifests can be found on our GitHub, and you can refer to the README's usage section for guidance on how to run them.
In this article, we explore the swift development of Large Language Models (LLMs) and the ongoing discussions surrounding their environmental impact. We detail the process of estimating carbon emissions during the training and inference stages of LLMs, utilizing the Impact framework tool to present a range of manifest examples that offer varying degrees of detailed emission estimates. These methods, alongside the manifest's input variables, enable a comparative analysis of the carbon footprint associated with different LLM configurations, encouraging efforts to minimize emissions. Additionally, we have assembled relevant materials, including public data related to carbon emission calculations and papers related to LLM carbon emissions, designed to streamline the process for users. By accurately quantifying the carbon emissions of LLMs and enhancing our understanding of their energy consumption, we are optimistic that a sustainable future, where AI advancements and environmental conservation are interwoven, is within reach.
Let's join hands to promote the green development of LLMs and create a brighter future together!