-
Notifications
You must be signed in to change notification settings - Fork 10
Floating point compatibility
Julia Sloan edited this page Nov 18, 2023
·
3 revisions
We aim to use Float64 and Float32 precision in CliMA's ESM. Here is a summary of some challenges / things to be aware of:
- Float32 (single precision): a signed Float32 can represent a maximum value of 2^31 − 1 = 2.14748365 * 10^9
- see Wikipedia for more information
- Float64 (double precision): a signed Float64 can represent a maximum value of 2^63 - 1 = 9.22337204 * 10^18
- see Wikipedia for more information
-
DON'T retrieve FT from
eltype(t)
-
t
is converted to a Float64 in ClimaTimeSteppers (more details below), so we can't reliably get FT fromt
-
-
DO wrap calls to functions of time with FT
- since we can't infer FT from within these functions, we need to convert the returned value to FT
- example of function definition and FT-wrapped function call using good practices
-
DO wrap decimals in
FT
- otherwise, defaults to Float64 (which causes conflicts when we try to run simulations with FT = Float32)
- important when 1. defining a new variable, and 2. passing a value to a function
- note: the following variable definitions are equivalent:
x::FT = val
,x = FT(val)
- exception: this isn't necessary when setting the value of a variable that has already been defined with type FT. In this case, the new value you're setting will get converted to type FT and stored as that.
-
DO use
eps(FT)
to check that a number is small- in tests or simulations, we often verify that a value (perhaps the difference of two quantities) is nearly zero. Instead of using a hardcoded number (e.g.
1e-13
) for this comparison, we should useeps(FT)
or a multiple of it. -
example comparison using
eps(FT)
- see more details about
eps(FT)
below
- in tests or simulations, we often verify that a value (perhaps the difference of two quantities) is nearly zero. Instead of using a hardcoded number (e.g.
-
DO define
t
as either FT or Float64- in ClimaTimeSteppers, all time-related values (t_start, t_end, dt, etc) get converted to Float64, so it doesn't matter what type we define these values to be
- it may be more clear that t is not reliably of type FT if we instead consistently define these as Float64, but I think this is a convention we should discuss
-
DO write tests for Float32 and Float64
- a test should loop over both float types if it tests any of the following:
- RHS functions
- update aux functions
- parameterizations
- anything appearing in the RHS
- types of constructed objects
- calculations
- a test should loop over both float types if it tests any of the following:
- ClimaTimeSteppers.jl always stores time as a
Float64
becauseFloat32
does not have enough bits to accurately track time without roundoff error.integrator.t
gets converted somewhere to Float64 duringstep!
/solve
. - Eventually, we may track time using a data type more complicated than Float, to avoid these accuracy issues for long simulations. This may be a
DateTime
type, which would require us to specify the date that our simulation starts on (which is already required when reading in temporally-varying maps of data). However, this approach hasn't been decided on yet.
- Many real numbers can't be represented exactly with floating-point numbers. The "machine epsilon" is the distance between two representable floating point numbers, which varies for different float types.
- The differences in decimal representation across float types causes different rounding errors, which can accumulate to cause quite different values between calculations done with different float types. This is why we prefer to use the float type-specific
eps(FT)
for comparisons of near-0 values, rather than hardcoded tiny numbers. - See the Julia docs about epsilon here
julia> eps(Float64)
2.220446049250313e-16
julia> eps(Float32)
1.1920929f-7
julia> nextfloat(Float64(1))
1.0000000000000002
julia> nextfloat(Float32(1))
1.0000001f0
- ClimaCoupler.jl floating point precision issues wiki
- mixed precision: https://blogs.nvidia.com/blog/2019/11/15/whats-the-difference-between-single-double-multi-and-mixed-precision-computing/