Binary Representation of Floating-point Numbers

In computing, a number with a decimal point is called a floating point number. For example, the number 1 is an integer, but 1.0 is a floating point number.

Scientific Notation

In considering such numbers, some are very large, while others are tiny:

The need to handle such a remarkable range of numbers naturally motivates the use of scientific notation: move the decimal point after the first digit and record any shift of the decimal point by indicating a power of ten. The following examples show the above numbers in scientific notation.

Number Decimal Point Shift Number in Scientific Notation
92,960,000 shift decimal point left 7 places 9.296 × 107
490,828,800,000 shift left 11 places 4.908288 × 1011
6.02214179 × 1023 already in scientific notation 6.02214179 × 1023
2,600,000,000 shift decimal point left 9 places 2.6 × 109
0.000000007 shift decimal point right 9 places 7. × 10-9
0.001 shift decimal point right 3 places 1. × 10-3
0.0000004 shift decimal point right 7 places 1. × 10-7
0.0000000001 shift decimal point right 10 places 1. × 10-10

One important characteristic of this notation is that we can maintain a substantial level of accuracy of each number, regardless of what power of 10 might be involved. For example, 4.908288 × 1011 provides 7 digits of accuracy, whereas the number 4.908288 × 10-7 still has 7 digits of accuracy.

Jargon: Numbers in scientific notation have the form a × 10n, where 1 ≤ a < 10 and n is an integer. In this context, a is called the mantissa and n is called the exponent for the identified number. For example, given the number 2,600,000,000, we rewrite it in scientific notation to get 2.6 × 109. With this transformation, 2.6 is the mantissa and 9 is the exponent.

The Floating Point Standard(s) of the Institute of Electrical and Electronics Engineers (IEEE)

Within computers, [almost] all floating point numbers follow standards formulated by the Institute of Electrical and Electronics Engineers (IEEE). These standards utilize a common approach for representing floating point numbers, but variations of the standards allow different numbers of bits.

Both float or double storage utilize a binary version of scientific notation. Translation of a decimal number to either float or double follows 4 steps:

  1. Write the number in a binary version of scientific notation, obtaining a binary mantissa and a binary exponent.
  2. Use one bit to represent the sign of the number.
  3. Store the binary mantissa in an efficient way.
  4. Use the binary exponent as a starting point in a computation to obtain a stored exponent.

Differences between float or double relate to how bits are allocated and how certain computations are made.

data type total bits sign bit exponent bits mantissa bits
float 32 bits 1 bit 8 bits 23 bits
double 64 bits 1 bit 11 bits 52 bits

Regarding accuracy, 10 bits can represent numbers up to 1023 (about 3 decimal digits of accuracy), so the 23 bits used for float numbers yields about 7 or 8 decimal digits of accuracy. Similar, the 52 bits available for double numbers allows about 16 decimal digits of accuracy.

With this background, we now examine in detail each of the four steps in translating a decimal number to float or double floating point notation.


Writing a Decimal in Binary Scientific Notation

The reading on the Binary Representation of Integers noted that the digits of an integer, written in binary form correspond to powers of two. For integers, the powers were non-negative, but negative powers also are posible. Here are some examples, following a somewhat expanded format from that used in the previous reading.

binary number0 0 0 1 1 0 1 0 1 0 0 1
powers of 2 27 26 25 24 23 22 21 20 2-1 2-2 2-3 2-4
decimal value
of power
128 64 32 16 8 4 2 1 1/2=0.5 1/4 = 0.25 1/8 = 0.125 1/16 = 0.0625
column values with binary 1 16 8 2 0.5 0.0625

Putting these pieces together, the binary number 00011010.1001 represents the number 16+8+2+0.5+0.0625 = 26.5625. In interpreting the number 11010.1001, the digits to the left of the point (11010) correspond to positive powers of 2, and the digits to the right (1001) correspond to negative powers. Also, since we are writing binary numbers, the period itself should not be called a decimal point; more properly it is called a binary point or a radix point.

As this example illustrates, parts of a binary number to the left of a binary point are integers, and we can use the discussion of [non-negative] integers to convert from decimal to binary. For parts of a binary number to the right of a binary point, some numbers are easy: 0.5, 0.25, 0.125, etc. correspond to negative powers of two. Combinations of these negative powers (e.g., 0.75 = 0.5 + 0.25) also are easy to represent (e.g., binary 0.11). However, other fractional decimals may require more careful work to find reasonably binary representations. In the interest of time and effort, in what follows we assume

Given a binary number, such as 00011010.1001, we can write the number in normalized form (scientific notation) by shifting the radix to come after the initial 1. For 11010.1001, the radix point should be sifted 4 places to the left, obtaining 1.10101001 and an exponent 4 (written 100 in binary). In a mix of binary and decimal, binary 11010.1001 = 1.10101001 × 2100 — a binary expression, except for our use of "2" which must be raised to the 4th power. Expressed in binary, 1.10101001 is the mantissa, and 100 is the exponent for the initial number 00011010.1001.

Some additional examples illustrate this rewriting of a decimal number into a binary normal form (binary scientific notation).

Decimal
number
Binary
equivalent
Normalized
mantissa
Shift binary exponent
19 10011 1.0011 4 left 100
87 1010111 1.010111 6 left 110
2718 101010011110 1.01010011110 11 left 1011
34145 1000010101100001 1.000010101100001 15 left 1111

Overall, the algorithm for writing a number in normalized binary form involves three steps:

  1. Convert the number to a binary number.
  2. Shift the binary point, so that it appears just after the initial 1 (the number will be 1.-----). The shifted number is called the mantissa.
  3. Record the number of bits required for the shift in step 2 (consider a left shift as a positive, a right shift as a negative). This amount of shift is the exponent.

Practice

Translate the number

to normalized binary form, by giving both the binary mantissa (with no leading 0's) and the binary exponent.

Answer:
    Mantissa: 
    Exponent: 
   






Optional: For the Mathematically Inclined and/or Curious

Using decimal notation, some fractions have an infinite decimal representation. For example, 1/3 = 0.33333333... . One way to compute this decimal representation is to use long division:

       0 . 3 3 3 3 3 ...    
      ------------------
   3 | 1 . 0 0 0 0 0 ...
           9
       -----
           1 0
             9
         -----
             1 0
               9
           -----
               1 0
                 9
             -----
                 1 0
                   9
               -----
                   1  
                    etc.

At each stage of the division (after the first step), we divide 3 into 10, obtain a quotient of 3, and remainder of 1.

In binary notation, a similar situation arises with many fractions. For example, consider the decimal number one tenth (1/10 or 0.1 decimal). To determine the relevant binary representation, note that 10 (decimal) corresponds to 1010 (binary). To compute 0.1 (decimal) we again utilize long division — in binary.

           0 . 0 0 0 1 1 0 0 1 1 0 0 1 ...
          --------------------------------
    1010 | 1 . 0 0 0 0 0 0 0 0 0 0 0 0 ...
               1 0 1 0
               -------
                 1 1 0 0
                 1 0 1 0
                 -------    
                     1 0 0 0 0
                       1 0 1 0
                     -------
                       1 1 0 0
                       1 0 1 0
                       -------    
                           1 0 0 0 0
                             1 0 1 0
                           ---------
                               1 1 0
                                   etc.

After starting 0.00011, the 0011 pattern continues forever, so the decimal number 0.1 cannot be represented with a finite number of digits in binary.

Computing the Sign Bit

Using either single precision or double precision, the first bit represents a sign. As with sign-magnitude notation for integers, 0 is used to represent a positive number and 1 is used for a negative. (We'll worry about representing the number zero later in this reading.)

For example, consider the representation of ±87.25. We still have to discuss details of storing the mantissa and the exponent. However, the number will start:

Storing the Binary Mantissa

When writing a number in (decimal-based) scientific notation, the leading digit may be 1, 2, ..., 9. For example, some decimal numbers at the start of this reading included 1. × 10-10, 2.6 × 109, 4.908288 × 1011, 6.02214179 × 1023, 7. × 10-9, and 9.296 × 107.

With binary normalized form (e.g., binary scientific form), the radix point moves after the first non-zero digit — but in binary, the only possible non-zero digit is 1. Thus, a number in binary normalized form must begin 1.0-----.

With this property that all normalized binary numbers begin 1.?????, the IEEE Floating Point Standards observe that there is no need to store the leading bit. The bit will always be 1, so we can save space by storing the mantissa starting with the second bit. For example, for the mantissa 1.10101001, the bits actually stored are 10101001 — the leading 1 is not stored.

These stored bits are place at the left of the mantissa section of the single- or double-precision IEEE floating point number. Since single-precision numbers require 23-bit mantissas and double-precision numbers require 52-bit mantissas. Once the desired mantissa is placed at the left of this field, the rest of the space is filled with 0's. Thus, for the mantissa 1.10101001, the actual mantissa stored in single-precision format is 10101001000000000000000.

In summary, given the binary mantissa of a floating point number:

Storing the Binary Exponent

Our discussion of the storage of floating point numbers has started with a normalized, binary representation:

mantissa × 2binary exponent

For many floating point numbers, storage of the exponent is [almost] straightforward — but with a twist. For floating point numbers with an exponent in a specified range, a bias value is added.

Precision Exponent range Bias
Single Precision -126 to +127 127
Double Precision -1022 to +1023 1023

For example, consider the 11010.1001. We have noted that the mantissa in binary normal form is 1.10101001. With a shift of the binary point 4 digits to the left, the exponent is 4 (100 in binary).

With the constraint on the range allowed for exponents in normalized binary form, and with the addition of the bias, note that all stored exponents will be greater than 1, but less than 111...111 . Thus, the exponents stored for normalized binary numbers will not be all 0's and not all 1's. For most numbers we might use, this storage of floating point numbers (eith single- or double-precision) will work without difficulty!

Normalized Exponent Summary

For normalized binary numbers with exponents in the specified range (e.g., -126 to +127 for float type, storage of the exponent follows two simple steps:

  1. Add the bias (127 for float, 1023 for double)
  2. Store the result

A Closer Look at Exponents

This process ensures that the stored exponent for normalized binary numbers with exponents in range will be neither all 0's nor all 1's.

What About Zero and Tiny Numbers

The storage of normalized binary numbers works fine in many cases, but it fails for the number zero and for numbers with exponents smaller than 2-126 (single-precision) or smaller than 2-1023 (double-precision).

Such numbers are stored un-normalized and considered to be multiplied by 2-126 (single-precision) or 2-1023 (double-precision). That is, the number is written in the form

mantissa × 2-126

and the mantissa is stored directly. Some examples for single-precision follow:

Number Number × 2-126 Stored Mantissa
0 0.0000... × 2-126 000000...
2-127 0.100000... × 2-126 100000...
1.1011 × 2-129 0.0011011 × 2-126 00110110000...

What About the Stored Exponent 111...111 ?

So far, all stored exponents for floating point numbers have been less than all 1's (111...111). Within the IEEE Standards, an exponent of all 1's is reserved for various error conditions — not for actual numbers.


The Full IEEE Floating Point Number

Combining the full discussion of representing floating point numbers within single-precision (32-bit) or double-precision (64-bit), the following rules apply for most circumstances:

  1. If the number is zero, the full representation is all 0's.
  2. If the number is non-zero:
    • Write the number in a normalized, binary format: mantissa × 2binary exponent. (In what follows, Assume the binary exponent is within a specified range.)
    • Assuming the binary exponent is within a specified range, the first bit of the stored floating-point number represents the sign: 0 for positive, 1 for negative.
    • Store the exponent in 8 bits (single precision) or 11 bits (double precision), after adding the required bias (127 or 1023, respectively).
    • Store the mantissa in 23 or 52 bits, respectively, omitting the leading 1, placing the resulting mantissa on the left, and filling with 0's on the right as needed.

Practice with Single-precision Floating Point

Translate the number

to 32-bit IEEE floating-point format.

Answer: Fill in the blanks

sign stored
exponent
stored
mantissa
1 bit 8 bits 23 bits
  0 bits typed 0 bits typed

   






created 4 April 2016 by Henry M. Walker
expanded and edited 7 April 2016 by Henry M. Walker
reformatted for CS 415 24 July 2022 by Henry M. Walker
Valid HTML 4.01! Valid CSS!
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.