
Sustainable Conversations: Evaluating LLM Carbon Footprints with Impact Framework

Introduction

Since the introduction of ChatGPT, Large Language Models (LLMs) have evolved rapidly - GPT-4, Llama, Claude, Gemini, Mistral, Grok-1 - with one new model after another influencing society as a whole. At the same time, the environmental impact of LLMs has been hotly debated, and many studies and articles now discuss the impact of AI/ML/LLMs on the environment. A study at the University of Washington suggests that training just one chatbot can consume as much electricity as a neighborhood uses in a year. Yet the actual carbon emissions generated behind the scenes remain largely opaque today.

We delve into the carbon footprint of LLMs across their critical lifecycle stages: training and inference. Utilizing the Impact Framework tool developed by the Green Software Foundation, we provide a detailed methodology for calculating the carbon emissions associated with LLMs. We've crafted a series of manifest examples that illustrate the carbon emissions profile throughout the LLM's life cycle, offering a reference for your own analysis. Throughout our research, we've organized relevant materials, including public data related to carbon emission calculations and papers on LLM carbon emissions, to facilitate calculation and reference. Our aim is to equip you with the essential knowledge to approach the training and use of LLMs with an environmentally conscious mindset, fostering wise decision-making in the realm of AI.

Methodology for evaluating LLM carbon emissions

Basics of Energy and CO2e

  • Carbon Dioxide Equivalent (CO2e)

    Carbon Dioxide Equivalent, or CO2e, is a measurement term used to describe different greenhouse gases in a common unit. CO2e is also often written as CO2eq, CO2-eq or CO2equivalent.

  • Carbon Intensity

    Carbon intensity is measured in grams of carbon dioxide equivalent (CO2e) emitted per kilowatt-hour (kWh) of electricity generated. The standard unit of carbon intensity is gCO2eq/kWh. You can find our collected data sources here.

  • Embodied Carbon

    Embodied carbon (also referred to as "embedded carbon") is the amount of carbon pollution emitted during the creation and disposal of a device.

  • Power usage effectiveness (PUE)

    Power usage effectiveness (PUE) is a ratio used to measure data center energy efficiency. For example, a PUE of 1.43 means that for every 1.43 units of energy consumed to run the entire data center, only 1 unit is effectively used for computing, with the remaining 0.43 units going to non-computing purposes such as cooling and lighting. You can find our collected data sources here.

  • Thermal Design Power (TDP)

    The thermal design power (TDP) is the maximum amount of heat generated by a computer chip or component (often a CPU, GPU, or system on a chip) that the cooling system in a computer is designed to dissipate under any workload. We use it to estimate the energy consumption of GPUs. You can find our collected data sources here.

  • Carbon emitted Per unit Area (CPA)

    Carbon emitted Per unit Area (CPA) is used to quantify the embodied carbon of a chip. It depends on various semiconductor manufacturing parameters, including yield, energy consumption per unit area during the manufacturing process, emissions from chemicals used in hardware production, and emissions related to raw material procurement. The specific calculation formula is derived from Faiz et al., 2023; a rough sketch of the decomposition is shown below. You can find our collected data sources here.
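
For reference, the factors listed above can be combined into a rough per-area figure as sketched below. This is only an illustration of the decomposition described in the text, not the exact formula from Faiz et al.; the variable names and sample numbers are our own assumptions.

# A minimal sketch of the CPA decomposition described above.
# Variable names and sample values are illustrative assumptions, not published figures.

def carbon_per_area(fab_carbon_intensity,  # kgCO2eq per kWh of fab energy (assumed)
                    energy_per_area,       # kWh consumed per cm^2 manufactured (assumed)
                    gas_per_area,          # kgCO2eq per cm^2 from process chemicals (assumed)
                    material_per_area,     # kgCO2eq per cm^2 from raw material procurement (assumed)
                    yield_rate):           # fraction of usable dies (0..1]
    # CPA (kgCO2eq/cm^2): manufacturing emissions per unit area, scaled up by yield losses.
    return (fab_carbon_intensity * energy_per_area + gas_per_area + material_per_area) / yield_rate

# Illustrative only: a fab on a 0.5 kgCO2eq/kWh grid with an 87.5% yield.
print(carbon_per_area(0.5, 2.0, 0.2, 0.3, 0.875))  # ~1.71 kgCO2eq/cm^2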

Total Carbon Footprint - Training stage

The total carbon footprint CO2eq resulting from LLM processing consists of two main components: the operational carbon footprint and the embodied carbon footprint. Our calculations follow the methods of Narayanan et al. and Ahmad et al.

The operational carbon footprint refers to the carbon emissions generated during the day-to-day operations of the LLM. This includes the energy consumption required for training, inference, and other computational processes. The carbon emissions are produced mainly through the use of electricity to power the hardware infrastructure and the associated cooling systems.

The embodied carbon footprint represents the carbon emissions associated with the manufacturing, transportation, and disposal of the physical infrastructure used to support the LLM. This includes the carbon emissions generated during the production of servers, storage devices, and networking equipment, as well as the embodied energy in the materials used.

The total carbon footprint CO2eq resulting from LLM processing is determined by

CO2eq = CO2eq_oper + CO2eq_emb

where CO2eq_oper indicates the operational carbon footprint of the LLM, and CO2eq_emb denotes the embodied carbon footprint of the LLM.

The calculation processes for the training stage and inference stage are similar. Below, we will use the training stage as an example to describe the calculation process.

Embodied carbon footprint

The total embodied carbon footprint CO2eq_emb is the sum of the embodied footprints of all hardware units involved in LLM processing, where each unit i is assessed using the following:

CO2eq_emb_i = (t_i * CO2eq_chip_i) / lifetime_i

where CO2eq_chip_i denotes the chip’s embodied carbon footprint for hardware unit i, lifetime_i means the lifespan of hardware unit i, and t_i represents the execution duration of hardware unit i.

The chip’s embodied carbon footprint CO2eq_chip within a specific hardware unit is calculated by

CO2eq_chip = area * CPA

where area represents the chip’s area, CPA means the carbon emitted per unit area.
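
As a quick illustration of the two embodied-carbon formulas above, here is a minimal Python sketch; the die area, CPA, usage hours, and lifetime are assumed values chosen only to show the arithmetic.

def chip_embodied_carbon(area_cm2, cpa_kg_per_cm2):
    # CO2eq_chip = area * CPA  (kgCO2eq)
    return area_cm2 * cpa_kg_per_cm2

def unit_embodied_carbon(exec_hours, chip_co2eq_kg, lifetime_hours):
    # CO2eq_emb_i = (t_i * CO2eq_chip_i) / lifetime_i  (kgCO2eq)
    return exec_hours * chip_co2eq_kg / lifetime_hours

# Example: a hypothetical 8 cm^2 accelerator die with a CPA of 1.0 kgCO2eq/cm^2,
# used for 1,000 hours over a 5-year (43,800-hour) lifetime.
chip_footprint = chip_embodied_carbon(8.0, 1.0)                   # 8 kgCO2eq
print(unit_embodied_carbon(1_000, chip_footprint, 5 * 365 * 24))  # ~0.18 kgCO2eq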

Operational carbon footprint

The operational carbon footprint CO2eq_oper attributed to LLM processing is calculated by

CO2eq_oper = energy_oper * carb_inten

energy_oper includes the energy used for training, inference, and other computational processes involved in running the LLM. It takes into account the power consumption of the hardware infrastructure, including servers, networking equipment, and cooling systems.

carb_inten refers to the carbon intensity of the specific data center where the LLM processing takes place. Carbon intensity represents the amount of carbon emissions associated with the energy generation and consumption in the data center. It takes into account factors such as the energy sources used (e.g., coal, natural gas, renewable energy), the efficiency of the energy generation, and any carbon offset or reduction measures in place.

By multiplying the operational energy with the carbon intensity, we can estimate the carbon emissions or carbon dioxide equivalent attributed to the operational phase of LLM processing. This calculation helps quantify the environmental impact and carbon footprint associated with the energy consumption during the operation of the LLM.

Operational energy

By multiplying the energy consumption of the computing hardware with the PUE of the specific data center, we can estimate the total energy consumed during LLM processing. This calculation takes into account the energy requirements of the hardware as well as the efficiency of the data center's infrastructure in delivering that energy to the IT equipment.

The operational energy energy_oper associated with LLM processing can be calculated by

energy_oper = energy_hard * PUE

energy_hard represents the energy consumed by the computing hardware within a data center. This includes the energy used by servers, storage devices, networking equipment, and other hardware components involved in LLM processing.

PUE is a metric that quantifies the energy efficiency of a data center. It represents the ratio of the total energy consumed by the data center, including both IT equipment and supporting infrastructure (such as cooling systems and power distribution), to the energy consumed by the IT equipment alone.

Hardware energy

The total energy energy_hard is the sum of the energy consumed by all hardware units. The energy energy_hard_i consumed by a single unit i is given by

energy_hard_i = TDP_i * n_i * t_i

where
TDP_i refers to the maximum amount of heat that a hardware unit i is designed to dissipate under normal operating conditions;
n_i indicates the count of hardware unit i;
t_i means the execution time of hardware unit i;
Hardware units encompass a range of components, including CPUs, LLM computing devices, memories, SSDs, and others.
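
The following sketch strings the operational-side formulas together, from hardware energy through PUE to carbon intensity; every input value is an illustrative assumption rather than a measurement of any particular system.

def hardware_energy_kwh(tdp_kw, n_units, hours):
    # energy_hard_i = TDP_i * n_i * t_i, with TDP in kW and t in hours -> kWh
    return tdp_kw * n_units * hours

def operational_energy_kwh(energy_hard_kwh, pue):
    # energy_oper = energy_hard * PUE
    return energy_hard_kwh * pue

def operational_carbon_kg(energy_oper_kwh, carb_inten_kg_per_kwh):
    # CO2eq_oper = energy_oper * carb_inten
    return energy_oper_kwh * carb_inten_kg_per_kwh

# Example: 64 GPUs at an assumed 0.4 kW TDP running for 720 hours,
# in a data center with PUE 1.2 on a grid at 0.4 kgCO2eq/kWh.
e_hard = hardware_energy_kwh(0.4, 64, 720)    # 18,432 kWh
e_oper = operational_energy_kwh(e_hard, 1.2)  # 22,118.4 kWh
print(operational_carbon_kg(e_oper, 0.4))     # ~8,847 kgCO2eq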

Training time

Hardware efficiency and training time are related in the context of machine learning and deep learning tasks. The training time can be estimated from the following:

  • the total training FLOPs required by the model
  • the peak FLOPs of a single GPU (from benchmarks)
  • the percentage of peak device throughput achieved, as estimated using the regression equation below

The training time is then

T = C / (n * FLOP_peak * eff)

where C represents the computation required to train the transformer model, in total floating point operations, FLOP_peak represents the device peak throughput, eff represents the efficiency of the device, and n is the number of devices.
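
A minimal sketch of this estimate; the compute budget, device count, per-device peak, and efficiency below are placeholder assumptions.

def training_time_hours(total_flops, n_devices, flop_peak_per_device, efficiency):
    # T = C / (n * FLOP_peak * eff), converted from seconds to hours
    seconds = total_flops / (n_devices * flop_peak_per_device * efficiency)
    return seconds / 3600

# Example: 1e23 FLOPs of training compute on 512 devices,
# each with an assumed 300 TFLOP/s peak, running at 40% efficiency.
print(training_time_hours(1e23, 512, 300e12, 0.4))  # ~452 hours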

Hardware efficiency

Hardware efficiency refers to how effectively the hardware resources are utilized to perform computations during the training process. Efficient hardware design and architecture can lead to faster and more optimized computations, resulting in shorter training times. It is calculated as the actual computing throughput divided by the peak throughput. The actual computing throughput is calculated as total floating point operations divided by execution time.

Hardware efficiency estimation

A regression using a 2nd-order polynomial is fitted to the throughput scaling data presented in the paper Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM. The optimal parallelism setting is represented as (p, t, d, e), where the variables correspond to the degrees of pipeline, tensor, data, and expert parallelism, respectively. The efficiency eff_re with re devices can be calculated by

when re < n,
eff_re = (r_0 * re) / (n * eff_n)

when re > n,
eff_re = (r_1 * re) / (n * eff_n) + r_2 * re

where r_0, r_1, r_2 are fitting constants, eff_n means the highest hardware efficiency, and n indicates the number of devices that can achieve eff_n. The number of devices required to achieve optimal hardware efficiency for dense LLM processing is calculated as n = t ⋅ p ⋅ d.

Floating point operations

Consider a model with l transformer layers, hidden size h, sequence length s, vocabulary size V, and training batch size B, where each transformer layer consists of an attention block followed by a 2-layer feed-forward network. Multiplying an m×k matrix A by a k×n matrix X requires 2mkn FLOPs (the factor of 2 accounts for multiplies and adds).

For the attention block, the main FLOP contributors are the key, query, and value transformation (6Bsh^2 operations), attention matrix computation (2Bs^2h operations), attention over values (2Bs^2h operations), and post-attention linear projection (2Bsh^2 operations). The feed-forward network increases the hidden size to 4h and then reduces it back to h; this requires 16Bsh^2 FLOPs. Summing these together, each transformer layer results in 24Bsh^2 + 4Bs^2h FLOPs for the forward pass.

The other main contributor to the FLOP count is the logit layer in the language model head, which requires 2BshV FLOPs in the forward pass and 4BshV in the backward pass, for 6BshV FLOPs in total.

The backward pass requires double the number of FLOPs of the forward pass, since the gradients must be calculated with respect to both the input and weight tensors.

Since the model has roughly P ≈ 12lh^2 parameters (see the next section), and the 24Bsh^2 term dominates each layer when the attention (4Bs^2h) and logit (6BshV) contributions are comparatively small, the forward pass costs about 2 FLOPs per parameter per token. Thus, for a transformer model with l transformer layers trained on D tokens, the total number of floating-point operations is:

C = C_forward + C_backward ≈ 2PD + 4PD ≈ 6PD

with parameter size P and the training dataset size D (tokens).
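
A small worked example of the C ≈ 6PD approximation, using round placeholder numbers for the parameter and token counts:

def training_flops(parameters, training_tokens):
    # C_train ≈ 6 * P * D
    return 6 * parameters * training_tokens

# Example: a hypothetical 13B-parameter model trained on 1 trillion tokens.
print(training_flops(13e9, 1e12))  # 7.8e+22 FLOPs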

Parameters size

The number of parameters in a model P can be computed as:

P = 12lh^2 * [1 + 13/(12h) + (V + s)/(12lh)]

where l is the number of layers, h the hidden size, V the vocabulary size, and s the sequence length.
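
The formula can be checked with a short sketch; the layer count, hidden size, vocabulary size, and sequence length below follow a commonly reported GPT-3-scale configuration and are used purely as an illustration.

def transformer_parameters(layers, hidden, vocab, seq_len):
    # P = 12 * l * h^2 * (1 + 13/(12h) + (V + s)/(12 * l * h))
    return 12 * layers * hidden**2 * (
        1 + 13 / (12 * hidden) + (vocab + seq_len) / (12 * layers * hidden)
    )

# Example: l=96, h=12288, V=50257, s=2048 gives roughly 1.75e11 (~175B) parameters.
print(transformer_parameters(96, 12288, 50257, 2048))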

Total Carbon Footprint - Inference stage

The total carbon footprint calculation for inference is similar to that for training. Inference involves running the input data through the model's forward pass without any backward pass or gradient updates, so the computation C_inference is approximated as

C_inference ≈ 2P * D_inference

where D_inference means inference dataset size (tokens).
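
Analogously to the training sketch above, with the parameter count and inference token count as assumed values:

def inference_flops(parameters, inference_tokens):
    # C_inference ≈ 2 * P * D_inference
    return 2 * parameters * inference_tokens

# Example: the same hypothetical 13B-parameter model serving 100 billion tokens.
print(inference_flops(13e9, 100e9))  # 2.6e+21 FLOPs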

Using Impact Framework for estimation

What is Impact Framework

Impact Framework (IF) is an open-source tool developed within the Green Software Foundation and designed to assess the environmental impact of software across various components and settings, with the aim of minimizing the ecological footprint of software. To use IF, you simply need to create a manifest file, after which IF takes care of the remaining processes. This manifest file provides essential context for calculating the environmental impact, outlining the application's architecture, the duration of observation, the sequence of calculations and transformations to be performed, and the specific environmental metrics to be monitored.

Here is a video explaining how IF works; it can help you better understand the capabilities of IF.

Impact Framework explainer video

With the methodology outlined above for estimating LLM carbon emissions, we can use the Impact Framework to assess the carbon footprint of an LLM. The Impact Framework offers a versatile and expandable foundation for evaluating the carbon footprint of diverse computing activities, leveraging a variety of plugins within the manifest.

Basic Manifest for LLM carbon emissions

Based on the total carbon footprint equation CO2eq = CO2eq_oper + CO2eq_emb, we can divide the total carbon footprint into two components: CO2eq_oper, the operational footprint, and CO2eq_emb, the embodied footprint.

Basic operational footprint equation and variables

The fundamental equation for CO2eq_oper is CO2eq_oper = energy_oper * carb_inten, where energy_oper represents the energy utilized during the operation of the LLM, and carb_inten denotes the carbon intensity of the energy consumed.

To derive energy_oper, the formula energy_oper = n * T * TDP * PUE is employed (yielding kWh when T is in hours and TDP is in kW). Hence, energy_oper depends on the number of GPUs n, the total training time T, the power consumption of each GPU (its Thermal Design Power, TDP), and the data center's Power Usage Effectiveness (PUE).

The final equation for operational footprint is: CO2eq_oper = n * T * TDP * PUE * carb_inten

Basic embodied footprint equation and variables

From the information provided, we observe that the embodied emissions for each hardware unit are calculated using the formula: CO2eq_emb_i = (t_i * CO2eq_chip_i) / lifetime_i, where t_i represents the execution duration of the hardware unit, which equates to the total time required for training an LLM. CO2eq_chip_i denotes the CO2 emissions per chip, and lifetime_i indicates the expected lifespan of the hardware unit. The chip’s embodied carbon footprint CO2eq_chip_i within a specific hardware unit is calculated by CO2eq_chip_i = area_i * CPA_i.

The total embodied emissions for training an LLM, denoted as CO2eq_emb, are computed as the sum of CO2eq_emb_i values for each hardware unit involved in the process. This is expressed by the formula: CO2eq_emb = sum(CO2eq_emb_i), where CO2eq_emb_i represents the embodied emissions of each respective hardware unit. In essence, the hardware units encompass GPU, CPU, SSD, and DRAM. Thus, the aggregate embodied emissions for training an LLM can be articulated as: CO2eq_emb = sum(CO2eq_emb_GPU, CO2eq_emb_CPU, CO2eq_emb_SSD, CO2eq_emb_DRAM).

Dive in manifest

The manifest for LLM carbon emissions includes the following components:

  1. CO2eq_oper: Total operational emissions for training an LLM. Since the equation for the operational footprint is CO2eq_oper = n * T * TDP * PUE * carb_inten, we can use the Multiply method from the official IF plugins to calculate it.
name: llm basic operational emissions manifest
description:
  "
  CO2eq_oper =  n * T * TDP * PUE * carb_inten
  
  T: training hour(training_hour)
  n: number of gpus(gpu/num)
  TDP: power consumption of the GPU(gpu/tdp)
  PUE: Power Usage Effectiveness(pue)
  carb_inten: carbon intensity of the energy consumed(carb_inten)
  "
tags:
initialize:
  plugins:
    training-operation-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'training_hour', 'gpu/tdp', 'pue', 'carb_inten']
        output-parameter: 'operation-carbon'
tree:
  children:
    child:
      pipeline:
        - training-operation-carbon-multiply
      inputs:
        - timestamp: 
          gpu/num: # the number of GPUs used for training LLM
          training_hour: # the total hours of operation covered (training and/or inference)
          gpu/tdp:  # kW
          # the power consumption per GPU
          pue: 
          carb_inten: # CO2eq/kWh
          # the carbon intensity of the training region
  2. CO2eq_emb: Total embodied emissions for training an LLM. For each hardware unit i, the embodied emissions are calculated using the following equation: CO2eq_emb_i = (t_i * area_i * CPA_i) / lifetime_i.

We can utilize the Multiply method and the Divide method to calculate the embodied emissions for each hardware unit. We will use the Sum method to calculate the total embodied emissions. The equation used in the manifest is: CO2eq_emb = sum(CO2eq_emb_GPU, CO2eq_emb_CPU, CO2eq_emb_SSD, CO2eq_emb_DRAM).

name: llm basic embodied emissions manifest
description:
  " 
  CO2eq_emb = sum(CO2eq_emb_GPU, CO2eq_emb_CPU, CO2eq_emb_SSD, CO2eq_emb_DRAM)
  CO2eq_emb_i = (t_i * area_i * CPA_i) / lifetime_i
  "
tags:
initialize:
  plugins:
    device-expected-lifespan-hours-per-year-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['expected-lifespan', 'days-per-year', 'hours-per-day']
        output-parameter: 'expected-lifespan-duration'
    reserved-device-hour-with-device-expected-lifespan-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'training_hour'
        denominator: 'expected-lifespan-duration'
        output: 'expected-lifespan-rate'
    gpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'expected-lifespan-rate','gpu/cap', 'gpu/area']
        output-parameter: 'gpu-carbon-embodied'
    cpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['cpu/num', 'expected-lifespan-rate','cpu/cap', 'cpu/area']
        output-parameter: 'cpu-carbon-embodied'   
    ssd-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['ssd/num', 'expected-lifespan-rate', 'ssd/cap', 'ssd/area']
        output-parameter: 'ssd-carbon-embodied'       
    dram-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['dram/num', 'expected-lifespan-rate', 'dram/cap', 'dram/area']
        output-parameter: 'dram-carbon-embodied'      
    embodied-carbon-sum:
      method: Sum
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: [ 'gpu-carbon-embodied', 'cpu-carbon-embodied', 'ssd-carbon-embodied', 'dram-carbon-embodied' ]
        output-parameter: 'carbon-embodied'
tree:
  children:
    child:
      pipeline:
        - device-expected-lifespan-hours-per-year-multiply
        - reserved-device-hour-with-device-expected-lifespan-divide
        - gpu-embodied-carbon-multiply
        - cpu-embodied-carbon-multiply
        - ssd-embodied-carbon-multiply
        - dram-embodied-carbon-multiply
        - embodied-carbon-sum
      defaults:
        thousands-per-unit: 0.001
        days-per-year: 365
        hours-per-day: 24
        seconds-per-hour: 3600
        expected-lifespan:  # year 
        # To keep the manifest file simple, we use one `expected-lifespan` for all the components.
      inputs:
        - timestamp: 
          training_hour: 
          gpu/num: 
          gpu/cap:       # kgCO2/cm2
          gpu/area:      # cm2
          cpu/num:       
          cpu/cap:       # kgCO2/cm2
          cpu/area:      # cm2
          ssd/num:       
          ssd/cap:       # kgCO2/GB
          ssd/area:      # GB
          dram/num:      
          dram/cap:      # kgCO2/GB
          dram/area:     # GB

In addition to the fundamental calculation approach, the official IF plugins also offer the SCI-M method for calculating embodied emissions, which can be employed to determine the embodied emissions of a hardware unit.

name: sci-m example
description: calculate the embodied emissions for the hardware unit
tags:
initialize:
  plugins:
    sci-m:
      method: SciM
      path: '@grnsft/if-plugins'
tree:
  children:
    child:
      pipeline:
        - sci-m 
      defaults:
        device/emissions-embodied:  # gCO2eq CO2eq_chip_i
        device/expected-lifespan:  # expected lifespan in seconds (e.g., 5 years ≈ 157,680,000 s)
        resources-reserved:  
        resources-total: 
      inputs:
        - timestamp: 
          duration: # seconds  
          # the execution duration of the hardware unit

Extended manifest for LLM emissions

Sometimes we want to estimate an existing LLM model's emissions and find that we don't have the exact values for the training hours. In this case, we can use the methodologies from the above section to calculate the estimated emissions.

Estimate the operational emissions

Basically, the operational emissions of an LLM combine the training emissions and the inference emissions.

To get the operational emissions, we need to estimate the training hours and the inference hours based on the equation T = C / (n * FLOP_peak * eff), where C represents the computation required in total floating-point operations, FLOP_peak represents the device peak throughput, and eff represents the efficiency of the device.

For the computation required for training, we can use the formula C_train ≈ 6PD with parameter size P and the training dataset size D (tokens). For the computation required for inference, we can use the formula C_inference ≈ 2P * D_inference, where D_inference means inference dataset size (tokens).
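
Putting these pieces together, the estimate that the manifest below encodes can be sketched end to end as follows; the model size, token count, GPU specifications, efficiency, PUE, and carbon intensity are all assumed illustrative values (the peak throughput and TDP roughly match an A100-class GPU).

def estimated_operational_carbon_kg(params, tokens, flop_factor,
                                    n_gpus, flop_peak, efficiency,
                                    tdp_kw, pue, carb_inten_kg_per_kwh):
    # C = flop_factor * P * D (factor 6 for training, 2 for inference)
    total_compute = flop_factor * params * tokens
    # T = C / (n * FLOP_peak * eff), converted to hours
    hours = total_compute / (n_gpus * flop_peak * efficiency) / 3600
    # CO2eq_oper = n * T * TDP * PUE * carb_inten
    return n_gpus * hours * tdp_kw * pue * carb_inten_kg_per_kwh

# Example: a hypothetical 13B-parameter model trained on 1T tokens with 512 GPUs
# (assumed 312 TFLOP/s peak, 0.4 kW TDP), 40% efficiency, PUE 1.1,
# and a grid at 0.4 kgCO2eq/kWh.
print(estimated_operational_carbon_kg(13e9, 1e12, 6,
                                      512, 312e12, 0.4,
                                      0.4, 1.1, 0.4))  # ~3.1e4 kgCO2eq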

name: llm emissions manifest with estimated training time
description:
tags:
initialize:
  plugins:
    estimate-total-compute-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['flop-count-factor', 'modal/parameters-count', 'modal/tokens-count' ]
        output-parameter: 'estimate-total-compute'
    estimate-compute-per-second-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'gpu/flop_peak', 'hardware-efficiency']
        output-parameter: 'estimate-compute-per-second'
    estimate-time-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-total-compute'
        denominator: 'estimate-compute-per-second'
        output: 'estimate-time-second'
    estimate-operation-hour-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-time-second'
        denominator: 'seconds-per-hour'
        output: 'estimate-operation-hour'
    operation-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'estimate-operation-hour', 'gpu/tdp', 'pue', 'carb_inten']
        output-parameter: 'operation-carbon'
    device-expected-lifespan-hours-per-year-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['expected-lifespan', 'days-per-year', 'hours-per-day']
        output-parameter: 'expected-lifespan-duration'
    reserved-device-hour-with-device-expected-lifespan-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-operation-hour'
        denominator: 'expected-lifespan-duration'
        output: 'expected-lifespan-rate'
    gpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'expected-lifespan-rate','gpu/cap', 'gpu/area']
        output-parameter: 'gpu-carbon-embodied'
    cpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate','cpu/cap', 'cpu/area']
        output-parameter: 'cpu-carbon-embodied'
    ssd-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'ssd/cap', 'ssd/area']
        output-parameter: 'ssd-carbon-embodied'
    dram-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'dram/cap', 'dram/area']
        output-parameter: 'dram-carbon-embodied'
    embodied-carbon-sum:
      method: Sum
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: [ 'gpu-carbon-embodied', 'cpu-carbon-embodied', 'ssd-carbon-embodied', 'dram-carbon-embodied' ]
        output-parameter: 'carbon-embodied'
    llm-carbon-sum:
      method: Sum
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: [ 'carbon-embodied',  'operation-carbon']
        output-parameter: 'total-carbon'

tree:
  children:
    operational-carbon:
      pipeline:
        - estimate-total-compute-multiply
        - estimate-compute-per-second-multiply
        - estimate-time-divide
        - estimate-operation-hour-divide
        - operation-carbon-multiply
        - device-expected-lifespan-hours-per-year-multiply
        - reserved-device-hour-with-device-expected-lifespan-divide
        - gpu-embodied-carbon-multiply
        - cpu-embodied-carbon-multiply
        - ssd-embodied-carbon-multiply
        - dram-embodied-carbon-multiply
        - embodied-carbon-sum
        - llm-carbon-sum
      defaults:
        flop-count-factor: 6 # use 6 for training phase C_train ≈ 6PD, while use 2 for inference phase C_infer ≈ 2P*D_infer
        thousands-per-unit: 0.001
        days-per-year: 365
        hours-per-day: 24
        seconds-per-hour: 3600
        expected-lifespan: 5 # years
      inputs:
        - gpu/num: # the number of GPUs used for training LLM
          gpu/tdp:  # kW, the power consumption per GPU
          gpu/flop_peak: # FLOP/s, peak throughput per GPU
          hardware-efficiency:
          modal/parameters-count:
          modal/tokens-count:
          pue:
          carb_inten: # CO2eq/kWh, the carbon intensity of the training region
          gpu/cap:       # kgCO2/cm2
          gpu/area:      # cm2
          cpu/cap:       # kgCO2/cm2
          cpu/area:      # cm2
          ssd/cap:       # kgCO2/GB
          ssd/area:      # GB
          dram/cap:      # kgCO2/GB
          dram/area:     # GB
          hardware-unit-num: # gpu/num / 8, assuming one CPU, SSD, and DRAM for every 8 GPU/TPU chips or one server stack

Efficient processing of LLMs relies on achieving a high eff, which is calculated as the actual computing throughput X divided by the peak throughput FLOP_peak, so the equation for estimating T can be rewritten as T = C / (n * X). Using too few or too many devices, or improperly configuring parallelism, can lead to reduced hardware efficiency. To get the estimated computing throughput X from the parameter size P, we can use the polynomial fit X = aP^2 + bP + c with estimated regression coefficients. When the expert parallelism degree e equals one, which means your GPU memory is capable of storing all the parameters, the estimated regression coefficients for X are a = -8.82079068e-2, b = 1.68591116, c = 1.33954735e+02. Otherwise, the estimated regression coefficients for X are a = -5.60233749e-5, b = 8.45435587e-2, c = 1.34546129e+02.
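
A small sketch of this polynomial fit, using the coefficients quoted above; following the source data, we assume P is expressed in billions of parameters and X comes out in teraFLOP/s per device, so treat the units as an assumption to verify against your own data.

def estimated_throughput(params_billion, expert_parallelism=1):
    # X = a*P^2 + b*P + c, using the regression coefficients quoted above
    if expert_parallelism == 1:
        a, b, c = -8.82079068e-2, 1.68591116, 1.33954735e+02
    else:
        a, b, c = -5.60233749e-5, 8.45435587e-2, 1.34546129e+02
    return a * params_billion**2 + b * params_billion + c

# Example: an assumed 13B-parameter dense model (e = 1).
print(estimated_throughput(13))  # ~141, under the unit assumption above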

name: llm emissions manifest with estimated training time
description:
tags:
initialize:
  plugins:
    estimate-total-compute-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['flop-count-factor', 'modal/parameters-count', 'modal/tokens-count' ]
        output-parameter: 'estimate-total-compute'
    estimate-compute-per-second-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'modal/estimated-throughput']
        output-parameter: 'estimate-compute-per-second'
    estimate-time-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-total-compute'
        denominator: 'estimate-compute-per-second'
        output: 'estimate-time-second'
    estimate-operation-hour-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-time-second'
        denominator: 'seconds-per-hour'
        output: 'estimate-operation-hour'
    operation-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'estimate-operation-hour', 'gpu/tdp', 'pue', 'carb_inten']
        output-parameter: 'operation-carbon'
    device-expected-lifespan-hours-per-year-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['expected-lifespan', 'days-per-year', 'hours-per-day']
        output-parameter: 'expected-lifespan-duration'
    reserved-device-hour-with-device-expected-lifespan-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-operation-hour'
        denominator: 'expected-lifespan-duration'
        output: 'expected-lifespan-rate'
    gpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'expected-lifespan-rate','gpu/cap', 'gpu/area']
        output-parameter: 'gpu-carbon-embodied'
    cpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate','cpu/cap', 'cpu/area']
        output-parameter: 'cpu-carbon-embodied'
    ssd-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'ssd/cap', 'ssd/area']
        output-parameter: 'ssd-carbon-embodied'
    dram-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'dram/cap', 'dram/area']
        output-parameter: 'dram-carbon-embodied'
    embodied-carbon-sum:
      method: Sum
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: [ 'gpu-carbon-embodied', 'cpu-carbon-embodied', 'ssd-carbon-embodied', 'dram-carbon-embodied' ]
        output-parameter: 'carbon-embodied'
    llm-carbon-sum:
      method: Sum
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: [ 'carbon-embodied',  'operation-carbon']
        output-parameter: 'total-carbon'
tree:
  children:
    operational-carbon:
      pipeline:
        - estimate-total-compute-multiply
        - estimate-compute-per-second-multiply
        - estimate-time-divide
        - estimate-operation-hour-divide
        - operation-carbon-multiply
        - device-expected-lifespan-hours-per-year-multiply
        - reserved-device-hour-with-device-expected-lifespan-divide
        - gpu-embodied-carbon-multiply
        - cpu-embodied-carbon-multiply
        - ssd-embodied-carbon-multiply
        - dram-embodied-carbon-multiply
        - embodied-carbon-sum
        - llm-carbon-sum
      defaults:
        flop-count-factor: 6 # use 6 for training phase C_train ≈ 6PD, while use 2 for inference phase C_infer ≈ 2P*D_infer
        thousands-per-unit: 0.001
        days-per-year: 365
        hours-per-day: 24
        seconds-per-hour: 3600
        expected-lifespan: 5 # years
      inputs:
        - gpu/num: # the number of GPUs used for training LLM
          gpu/tdp:  # kW, the power consumption per GPU
          modal/parameters-count:
          modal/tokens-count:
          modal/estimated-throughput:  # X, the achieved computing throughput per device (see the regression above)
          pue:
          carb_inten: # CO2eq/kWh, the carbon intensity of the training region
          gpu/cap:       # kgCO2/cm2
          gpu/area:      # cm2
          cpu/cap:       # kgCO2/cm2
          cpu/area:      # cm2
          ssd/cap:       # kgCO2/GB
          ssd/area:      # GB
          dram/cap:      # kgCO2/GB
          dram/area:     # GB
          hardware-unit-num: # gpu/num / 8, assuming one CPU, SSD, and DRAM for every 8 GPU/TPU chips or one server stack

Estimate the embodied emissions

Based on the estimated T, we can use the same equation to calculate the embodied emissions:

  CO2eq_emb = sum(CO2eq_emb_GPU, CO2eq_emb_CPU, CO2eq_emb_SSD, CO2eq_emb_DRAM)
  CO2eq_emb_i = (t_i * area_i * CPA_i) / lifetime_i

According to Meta’s report, the data centers used for training LLMs achieve an average utilization rate of 60% throughout the 5-year lifespan of their hardware units, so we can assume an expected-lifespan of 5 years. For the hardware unit number, we assume one SSD, DRAM, and CPU for every 8 GPU/TPU chips or one server stack.

name: llm embodied emissions manifest with estimated training time
description:
tags:
initialize:
  plugins:
    estimate-total-compute-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['flop-count-factor', 'modal/parameters-count', 'modal/tokens-count' ]
        output-parameter: 'estimate-total-compute'
    estimate-compute-per-second-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'modal/estimated-throughput']
        output-parameter: 'estimate-compute-per-second'
    estimate-training-time-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-total-compute'
        denominator: 'estimate-compute-per-second'
        output: 'estimate-time-second'
    estimate-operation-hour-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-time-second'
        denominator: 'seconds-per-hour'
        output: 'estimate-operation-hour'
    device-expected-lifespan-hours-per-year-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['expected-lifespan', 'days-per-year', 'hours-per-day']
        output-parameter: 'expected-lifespan-duration'
    reserved-device-hour-with-device-expected-lifespan-divide:
      method: Divide
      path: '@grnsft/if-plugins'
      global-config:
        numerator: 'estimate-operation-hour'
        denominator: 'expected-lifespan-duration'
        output: 'expected-lifespan-rate'
    gpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['gpu/num', 'expected-lifespan-rate','gpu/cap', 'gpu/area']
        output-parameter: 'gpu-carbon-embodied'
    cpu-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate','cpu/cap', 'cpu/area']
        output-parameter: 'cpu-carbon-embodied'
    ssd-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'ssd/cap', 'ssd/area']
        output-parameter: 'ssd-carbon-embodied'
    dram-embodied-carbon-multiply:
      method: Multiply
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: ['hardware-unit-num', 'expected-lifespan-rate', 'dram/cap', 'dram/area']
        output-parameter: 'dram-carbon-embodied'
    embodied-carbon-sum:
      method: Sum
      path: '@grnsft/if-plugins'
      global-config:
        input-parameters: [ 'gpu-carbon-embodied', 'cpu-carbon-embodied', 'ssd-carbon-embodied', 'dram-carbon-embodied' ]
        output-parameter: 'carbon-embodied'

tree:
  children:
    child:
      pipeline:
        - estimate-total-compute-multiply
        - estimate-compute-per-second-multiply
        - estimate-training-time-divide
        - estimate-operation-hour-divide
        - device-expected-lifespan-hours-per-year-multiply
        - reserved-device-hour-with-device-expected-lifespan-divide
        - gpu-embodied-carbon-multiply
        - cpu-embodied-carbon-multiply
        - ssd-embodied-carbon-multiply
        - dram-embodied-carbon-multiply
        - embodied-carbon-sum
      defaults:
        flop-count-factor: 6 # use 6 for training phase C_train ≈ 6PD, while use 2 for inference phase C_infer ≈ 2P*D_infer
        thousands-per-unit: 0.001
        days-per-year: 365
        hours-per-day: 24
        seconds-per-hour: 3600
        expected-lifespan: 5 # years
        # Meta’s data centers achieve an average utilization rate of 60% throughout the 5-year lifespan of hardware units
        # ref: Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga, Jinshi Huang, Charles Bai, et al. Sustainable ai: Environmental implications, challenges and opportunities. Proceedings of Machine Learning and Systems, 4:795–813, 2022.
      inputs:
        - gpu/num:
          gpu/cap:       # kgCO2/cm2
          gpu/area:      # cm2
          cpu/cap:       # kgCO2/cm2
          cpu/area:      # cm2
          ssd/cap:       # kgCO2/GB
          ssd/area:      # GB
          dram/cap:      # kgCO2/GB
          dram/area:     # GB
          hardware-unit-num:
          # assuming one CPU, SSD, and DRAM for every 8 GPU/TPU chips or one server stack
          # gpu/num / 8
          modal/parameters-count:
          modal/tokens-count:
          modal/estimated-throughput:  # X, the achieved computing throughput per device (see the regression above)

All of the above manifests can be found on our GitHub, and you can refer to the README's usage section for guidance on how to run them.

Conclusion

In this article, we explore the swift development of Large Language Models (LLMs) and the ongoing discussions surrounding their environmental impact. We detail the process of estimating carbon emissions during the training and inference stages of LLMs, using the Impact Framework tool to present a range of manifest examples that offer varying degrees of detail in their emission estimates. These methods, alongside the manifests' input variables, enable a comparative analysis of the carbon footprint associated with different LLM configurations, encouraging efforts to minimize emissions. Additionally, we have assembled relevant materials, including public data related to carbon emission calculations and papers on LLM carbon emissions, designed to streamline the process for users. By accurately quantifying the carbon emissions of LLMs and enhancing our understanding of their energy consumption, we are optimistic that a sustainable future, where AI advancements and environmental conservation are interwoven, is within reach.

Let's join hands to promote the green development of LLMs and create a brighter future together!