Skip to content

Floating point compatibility

Julia Sloan edited this page Nov 18, 2023 · 3 revisions

We aim to use Float64 and Float32 precision in CliMA's ESM. Here is a summary of some challenges / things to be aware of:

Float Types

  • Float32 (single precision): a signed Float32 can represent a maximum value of 2^31 − 1 = 2.14748365 * 10^9
  • Float64 (double precision): a signed Float64 can represent a maximum value of 2^63 - 1 = 9.22337204 * 10^18

Do's and don't's of floating point compatibility

  • DON'T retrieve FT from eltype(t)
    • t is converted to a Float64 in ClimaTimeSteppers (more details below), so we can't reliably get FT from t
  • DO wrap calls to functions of time with FT
    • since we can't infer FT from within these functions, we need to convert the returned value to FT
    • example of function definition and FT-wrapped function call using good practices
  • DO wrap decimals in FT
    • otherwise, defaults to Float64 (which causes conflicts when we try to run simulations with FT = Float32)
    • important when 1. defining a new variable, and 2. passing a value to a function
    • note: the following variable definitions are equivalent: x::FT = val, x = FT(val)
    • exception: this isn't necessary when setting the value of a variable that has already been defined with type FT. In this case, the new value you're setting will get converted to type FT and stored as that.
  • DO use eps(FT) to check that a number is small
    • in tests or simulations, we often verify that a value (perhaps the difference of two quantities) is nearly zero. Instead of using a hardcoded number (e.g. 1e-13) for this comparison, we should use eps(FT) or a multiple of it.
    • example comparison using eps(FT)
    • see more details about eps(FT) below
  • DO define t as either FT or Float64
    • in ClimaTimeSteppers, all time-related values (t_start, t_end, dt, etc) get converted to Float64, so it doesn't matter what type we define these values to be
    • it may be more clear that t is not reliably of type FT if we instead consistently define these as Float64, but I think this is a convention we should discuss
  • DO write tests for Float32 and Float64
    • a test should loop over both float types if it tests any of the following:
      • RHS functions
      • update aux functions
      • parameterizations
      • anything appearing in the RHS
      • types of constructed objects
      • calculations

ClimaTimeSteppers.jl

  • ClimaTimeSteppers.jl always stores time as a Float64 because Float32 does not have enough bits to accurately track time without roundoff error. integrator.t gets converted somewhere to Float64 during step!/solve.
  • Eventually, we may track time using a data type more complicated than Float, to avoid these accuracy issues for long simulations. This may be a DateTime type, which would require us to specify the date that our simulation starts on (which is already required when reading in temporally-varying maps of data). However, this approach hasn't been decided on yet.

eps() in Julia

  • Many real numbers can't be represented exactly with floating-point numbers. The "machine epsilon" is the distance between two representable floating point numbers, which varies for different float types.
  • The differences in decimal representation across float types causes different rounding errors, which can accumulate to cause quite different values between calculations done with different float types. This is why we prefer to use the float type-specific eps(FT) for comparisons of near-0 values, rather than hardcoded tiny numbers.
  • See the Julia docs about epsilon here
julia> eps(Float64)
2.220446049250313e-16

julia> eps(Float32)
1.1920929f-7

julia> nextfloat(Float64(1))
1.0000000000000002

julia> nextfloat(Float32(1))
1.0000001f0

Refs

Clone this wiki locally