Karl’s 1st year C programming course included IEEE floating point representation. It’s good to understand that representation, but pretty silly to try to do those conversions by hand.
I remember covering this in our computer organization course, but by that time I’d already worked it out for my own purposes. I wasn’t smart enough to look it up, and instead reverse engineered the layout the hard way, with printfs or the debugger or something like that.
I couldn’t help myself and wrote a little program to unpack a (32-bit) float into its constituent parts. The idea is that we have a number of the form:
\begin{equation}\label{eqn:float32:1}
\pm 1.bbbb \ldots b \times 2^n
\end{equation}
The sizes of those components in a 32-bit C float type are:
- 1 sign bit,
- 8 exponent bits,
- 1 implied leading mantissa bit (always 1 for normalized numbers),
- 23 mantissa bits,
where the exponent has a bias (127 = (1<<7)-1) so that the hardware doesn’t have to do any two’s complement manipulation.
I’ve put my little floating point explorer program on github. Here is some sample output:
value: 0  hex: 00000000  bits: 00000000000000000000000000000000
sign: 0  exponent: 00000000 (0+0)  mantissa: 00000000000000000000000
number: 0.00000000000000000000000 x 2^(0)

value: inf  hex: 7F800000  bits: 01111111100000000000000000000000
sign: 0  exponent: 11111111  mantissa: 00000000000000000000000
number: +inf

value: -inf  hex: FF800000  bits: 11111111100000000000000000000000
sign: 1  exponent: 11111111  mantissa: 00000000000000000000000
number: -inf

value: nan  hex: 7FC00000  bits: 01111111110000000000000000000000
sign: 0  exponent: 11111111  mantissa: 10000000000000000000000
number: NaN

value: 1.1754944e-38  hex: 00800000  bits: 00000000100000000000000000000000
sign: 0  exponent: 00000001 (127 -126)  mantissa: 00000000000000000000000
number: 1.00000000000000000000000 x 2^(-126)

value: 3.4028235e+38  hex: 7F7FFFFF  bits: 01111111011111111111111111111111
sign: 0  exponent: 11111110 (127 +127)  mantissa: 11111111111111111111111
number: 1.11111111111111111111111 x 2^(127)

Smallest denormal:
value: 1e-45  hex: 00000001  bits: 00000000000000000000000000000001
sign: 0  exponent: 00000000 (0-126)  mantissa: 00000000000000000000001
number: 0.00000000000000000000001 x 2^(-126)

Largest denormal:
value: 1.1754942e-38  hex: 007FFFFF  bits: 00000000011111111111111111111111
sign: 0  exponent: 00000000 (0-126)  mantissa: 11111111111111111111111
number: 0.11111111111111111111111 x 2^(-126)

value: 1  hex: 3F800000  bits: 00111111100000000000000000000000
sign: 0  exponent: 01111111 (127 +0)  mantissa: 00000000000000000000000
number: 1.00000000000000000000000 x 2^(0)

value: -2  hex: C0000000  bits: 11000000000000000000000000000000
sign: 1  exponent: 10000000 (127 +1)  mantissa: 00000000000000000000000
number: -1.00000000000000000000000 x 2^(1)

value: 6  hex: 40C00000  bits: 01000000110000000000000000000000
sign: 0  exponent: 10000001 (127 +2)  mantissa: 10000000000000000000000
number: 1.10000000000000000000000 x 2^(2)

value: 1.5  hex: 3FC00000  bits: 00111111110000000000000000000000
sign: 0  exponent: 01111111 (127 +0)  mantissa: 10000000000000000000000
number: 1.10000000000000000000000 x 2^(0)

value: 0.125  hex: 3E000000  bits: 00111110000000000000000000000000
sign: 0  exponent: 01111100 (127 -3)  mantissa: 00000000000000000000000
number: 1.00000000000000000000000 x 2^(-3)
Shoutout to Grok for the code review and the code fragments required to show the representation of NaN, \( \pm \infty \), and denormalized numbers. Grok offered to help extend this to double and long double representations, but where is the fun in letting it do that — that’s an exercise for another day.