The IEEE 32-bit float explorer that I wrote about previously has now been extended beyond float (e8m23) to support a number of other floating point representations, including additional CPU floating point types:

  • 64-bit IEEE (double: e11m52),
  • Intel “80-bit” (long double: e15m64),
  • 128-bit IEEE (long double on ARM Linux: e15m112). This is also the GCC quadmath representation,

and GPU floating point types:

  • e5m2
  • e4m3
  • fp16 (e5m10)
  • bf16 (e8m7)

The CUDA API is used for conversions of the GPU floating point types when it is available. If CUDA is not available, a manual converter is used instead.
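To give a flavour of what a manual conversion involves, here is a minimal sketch for the bf16 case. This is not the explorer's actual code: the function names are hypothetical, and it assumes round-to-nearest-even, which may or may not match the rounding the tool uses. Since bf16 is the top 16 bits of a float, the conversion comes down to rounding away the low 16 bits.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit IEEE float");

// Hypothetical helper: float (e8m23) -> bf16 (e8m7) bit pattern,
// rounding to nearest with ties to even (an assumption, see above).
inline std::uint16_t float_to_bf16_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // well-defined way to read the bits

    // NaN: dropping the low bits could clear the whole payload and turn
    // the value into an infinity, so force a mantissa bit to stay set.
    if ((u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0) {
        return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
    }

    // Add just under half of the lowest kept bit, plus that bit itself,
    // so that exact ties round towards an even result.
    std::uint32_t lsb = (u >> 16) & 1u;
    u += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(u >> 16);
}

// Hypothetical helper: bf16 bit pattern -> float. This direction is exact.
inline float bf16_bits_to_float(std::uint16_t h) {
    std::uint32_t u = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

For example, float_to_bf16_bits(3.0f) gives 0x4040, the same hex value that appears in the bf16 sample output.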

The Intel long double format is currently only supported when building on x64. This type differs from all the others in that normal values store the leading mantissa bit explicitly, rather than using an implicit leading bit.
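The explicit leading bit is easiest to see by splitting an 80-bit value into its 16-bit sign/exponent word and its 64-bit significand. The following is only an illustrative sketch (the struct and function names are made up here, not taken from the explorer). It works on raw integers instead of the long double type, so it does not itself depend on building for x64.

```cpp
#include <cstdint>

// Hypothetical field layout for the Intel 80-bit format (e15m64).
struct X87Fields {
    bool sign;
    std::uint16_t biased_exponent;  // 15 bits, bias 16383
    bool integer_bit;               // explicit leading bit (bit 63)
    std::uint64_t fraction;         // remaining 63 significand bits
};

// sign_exp is the top 16 bits of the 80-bit value,
// significand is the low 64 bits.
inline X87Fields decode_x87(std::uint16_t sign_exp, std::uint64_t significand) {
    X87Fields f;
    f.sign = (sign_exp >> 15) != 0;
    f.biased_exponent = static_cast<std::uint16_t>(sign_exp & 0x7FFFu);
    f.integer_bit = (significand >> 63) != 0;
    f.fraction = significand & 0x7FFFFFFFFFFFFFFFull;
    return f;
}
```

Decoding the value 3 (hex 4000C000000000000000) as decode_x87(0x4000, 0xC000000000000000) gives a biased exponent of 16384 (16383 + 1) with the integer bit set and one further fraction bit set. That is 1.1 (binary) x 2^1, with the leading 1 actually stored in the value.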

I have not implemented mainframe HEXFLOAT support.

Here is some sample output:

type: bf16
value:    3
hex:      4040
bits:     0100000001000000
sign:     0
exponent:  10000000                        (127 +1)
mantissa:          1000000
number:          1.1000000 x 2^(1)

type: fp16
value:    3
hex:      4200
bits:     0100001000000000
sign:     0
exponent:  10000                        (15 +1)
mantissa:       1000000000
number:       1.1000000000 x 2^(1)

type: e4m3
value:    3
hex:      44
bits:     01000100
sign:     0
exponent:  1000                        (7 +1)
mantissa:      100
number:      1.100 x 2^(1)

type: e5m2
value:    3
hex:      42
bits:     01000010
sign:     0
exponent:  10000                        (15 +1)
mantissa:       10
number:       1.10 x 2^(1)

type: float
value:    3
hex:      40400000
bits:     01000000010000000000000000000000
sign:     0
exponent:  10000000                        (127 +1)
mantissa:          10000000000000000000000
number:          1.10000000000000000000000 x 2^(1)

type: double
value:    3
hex:      4008000000000000
bits:     0100000000001000000000000000000000000000000000000000000000000000
sign:     0
exponent:  10000000000                                                     (1023 +1)
mantissa:             1000000000000000000000000000000000000000000000000000
number:             1.1000000000000000000000000000000000000000000000000000 x 2^(1)

type: long double
value:    3
hex:      4000C000000000000000
bits:     01000000000000001100000000000000000000000000000000000000000000000000000000000000
sign:     0
exponent:  100000000000000                                                     (16383 +1)
mantissa:                 1100000000000000000000000000000000000000000000000000000000000000
number:                 0.1100000000000000000000000000000000000000000000000000000000000000 x 2^(2)

type: float128
value:    3.000000
hex:      40008000000000000000000000000000
bits:     01000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
sign:     0
exponent:  100000000000000                                                     (16383 +1)
mantissa:                 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
number:                 1.1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 x 2^(1)
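
For the types that use an implicit leading bit, the sign/exponent/mantissa split shown above comes down to a few shifts and masks. Here is a rough sketch, with hypothetical names, that handles any such format of 64 bits or fewer (so not the 80-bit or 128-bit types). It assumes the usual bias of 2^(e-1) - 1, which matches the biases printed above, and it does not try to interpret the all-ones exponent encodings (infinities and NaNs), since those differ between formats.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical container for the three fields plus the exponent bias.
struct Fields {
    unsigned sign;
    std::uint64_t exponent;  // biased exponent
    std::uint64_t mantissa;  // stored fraction bits (no implicit bit)
    int bias;
};

// Split a raw bit pattern for a 1 + exp_bits + man_bits format.
inline Fields split_fields(std::uint64_t bits, int exp_bits, int man_bits) {
    Fields f;
    f.mantissa = bits & ((std::uint64_t{1} << man_bits) - 1);
    f.exponent = (bits >> man_bits) & ((std::uint64_t{1} << exp_bits) - 1);
    f.sign = static_cast<unsigned>((bits >> (exp_bits + man_bits)) & 1u);
    f.bias = (1 << (exp_bits - 1)) - 1;
    return f;
}

// Value of a finite number: normal values get the implicit leading 1,
// subnormals (exponent field zero) do not.
inline double finite_value(const Fields& f, int man_bits) {
    double frac = std::ldexp(static_cast<double>(f.mantissa), -man_bits);
    double mag = (f.exponent == 0)
        ? std::ldexp(frac, 1 - f.bias)
        : std::ldexp(1.0 + frac, static_cast<int>(f.exponent) - f.bias);
    return f.sign ? -mag : mag;
}
```

As a check against the e5m2 output, split_fields(0x42, 5, 2) yields sign 0, exponent 16 (15 + 1) and mantissa binary 10, and finite_value on that result gives 3.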