In computing, a number with a decimal point is called a floating point number. For example, the number 1 is an integer, but 1.0 is a floating point number.
In considering such numbers, some are very large, while others are tiny:
Large numbers
Tiny numbers
The need to handle such a remarkable range of numbers naturally motivates the use of scientific notation: move the decimal point after the first digit and record any shift of the decimal point by indicating a power of ten. The following examples show the above numbers in scientific notation.
Number | Decimal Point Shift | Number in Scientific Notation |
---|---|---|
92,960,000 | shift decimal point left 7 places | 9.296 × 10⁷ |
490,828,800,000 | shift decimal point left 11 places | 4.908288 × 10¹¹ |
6.02214179 × 10²³ | already in scientific notation | 6.02214179 × 10²³ |
2,600,000,000 | shift decimal point left 9 places | 2.6 × 10⁹ |
0.000000007 | shift decimal point right 9 places | 7. × 10⁻⁹ |
0.001 | shift decimal point right 3 places | 1. × 10⁻³ |
0.0000004 | shift decimal point right 7 places | 4. × 10⁻⁷ |
0.0000000001 | shift decimal point right 10 places | 1. × 10⁻¹⁰ |
One important characteristic of this notation is that we can maintain a substantial level of accuracy of each number, regardless of what power of 10 might be involved. For example, 4.908288 × 10¹¹ provides 7 digits of accuracy, and the number 4.908288 × 10⁻⁷ still has 7 digits of accuracy.
Jargon: Numbers in scientific notation have the form a × 10ⁿ, where 1 ≤ a < 10 and n is an integer. In this context, a is called the mantissa and n is called the exponent for the identified number. For example, given the number 2,600,000,000, we rewrite it in scientific notation to get 2.6 × 10⁹. With this transformation, 2.6 is the mantissa and 9 is the exponent.
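The split into mantissa and exponent can be sketched in a few lines of Python. This is only an illustration: the function name `to_scientific` is hypothetical, and the sketch relies on Python's `decimal` module so that the digits are handled exactly rather than as binary floating point values.

```python
from decimal import Decimal

def to_scientific(text):
    """Split a decimal numeral into (mantissa, exponent) with 1 <= |mantissa| < 10."""
    value = Decimal(text.replace(",", ""))   # allow digit-grouping commas
    if value == 0:
        raise ValueError("zero has no normalized scientific form")
    exponent = value.adjusted()              # power of ten of the leading digit
    mantissa = value.scaleb(-exponent).normalize()
    return mantissa, exponent

print(to_scientific("2,600,000,000"))   # mantissa 2.6, exponent 9
print(to_scientific("0.0000004"))       # mantissa 4, exponent -7
```

The exponent is simply the number of places the decimal point moves: positive for a shift to the left, negative for a shift to the right.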
Within computers, [almost] all floating point numbers follow standards formulated by the Institute of Electrical and Electronics Engineers (IEEE). These standards utilize a common approach for representing floating point numbers, but variations of the standards allow different numbers of bits.
Both float and double storage utilize a binary version of scientific notation. Translation of a decimal number to either float or double follows 4 steps:
Differences between float and double relate to how bits are allocated and how certain computations are made.
data type | total bits | sign bit | exponent bits | mantissa bits |
---|---|---|---|---|
float | 32 bits | 1 bit | 8 bits | 23 bits |
double | 64 bits | 1 bit | 11 bits | 52 bits |
Regarding accuracy, 10 bits can represent numbers up to 1023 (about 3 decimal digits of accuracy), so the 23 bits used for float numbers yield about 7 decimal digits of accuracy. Similarly, the 52 bits available for double numbers allow about 16 decimal digits of accuracy.
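The digit estimates can be checked with a short calculation: each bit is worth log₁₀(2), or roughly 0.301, decimal digits. The sketch below is illustrative only, and the helper name `decimal_digits` is hypothetical. It also prints the figure for one extra bit, since the leading 1 of a normalized mantissa is not stored in the IEEE formats but still contributes to precision.

```python
import math

def decimal_digits(bits):
    """Approximate number of decimal digits that `bits` binary digits can carry."""
    return bits * math.log10(2)

for name, stored_bits in (("float", 23), ("double", 52)):
    stored_only = decimal_digits(stored_bits)
    with_leading_one = decimal_digits(stored_bits + 1)
    print(name, round(stored_only, 2), round(with_leading_one, 2))
```

For float the two figures fall just below and just above 7 digits; for double they fall between 15 and 16 digits.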
With this background, we now examine in detail each of the four steps in translating a decimal number to float or double floating point notation.
The reading on the Binary Representation of Integers noted that the digits of an integer, written in binary form, correspond to powers of two. For integers, the powers were non-negative, but negative powers are also possible. Here are some examples, following a somewhat expanded format from that used in the previous reading.
binary number | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
powers of 2 | 2⁷ | 2⁶ | 2⁵ | 2⁴ | 2³ | 2² | 2¹ | 2⁰ | 2⁻¹ | 2⁻² | 2⁻³ | 2⁻⁴ |
decimal value of power | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | 1/2 = 0.5 | 1/4 = 0.25 | 1/8 = 0.125 | 1/16 = 0.0625 |
column values with binary 1 | | | | 16 | 8 | | 2 | | 0.5 | | | 0.0625 |
Putting these pieces together, the binary number 00011010.1001 represents the number 16+8+2+0.5+0.0625 = 26.5625. In interpreting the number 11010.1001, the digits to the left of the point (11010) correspond to non-negative powers of 2, and the digits to the right (1001) correspond to negative powers. Also, since we are writing binary numbers, the period itself should not be called a decimal point; more properly it is called a binary point or a radix point.
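The column-by-column sum can be written as a small Python sketch. The function name `binary_to_decimal` is hypothetical, and the input is assumed to be a string of 0's and 1's with at most one radix point.

```python
def binary_to_decimal(bits):
    """Sum the powers of two selected by the 1 digits of a binary numeral."""
    whole, _, frac = bits.partition(".")
    value = 0.0
    # Digits left of the radix point: powers 0, 1, 2, ... reading right to left.
    for position, digit in enumerate(reversed(whole)):
        value += int(digit) * 2 ** position
    # Digits right of the radix point: powers -1, -2, -3, ... reading left to right.
    for position, digit in enumerate(frac, start=1):
        value += int(digit) * 2 ** -position
    return value

print(binary_to_decimal("00011010.1001"))   # 26.5625
```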
As this example illustrates, parts of a binary number to the left of a binary point are integers, and we can use the discussion of [non-negative] integers to convert from decimal to binary. For parts of a binary number to the right of a binary point, some numbers are easy: 0.5, 0.25, 0.125, etc. correspond to negative powers of two. Combinations of these negative powers (e.g., 0.75 = 0.5 + 0.25) also are easy to represent (e.g., binary 0.11). However, other fractional decimals may require more careful work to find reasonable binary representations. In the interest of time and effort, in what follows we assume
Given a binary number, such as 00011010.1001, we can write the number in normalized form (scientific notation) by shifting the radix point to come after the initial 1. For 11010.1001, the radix point should be shifted 4 places to the left, obtaining 1.10101001 and an exponent 4 (written 100 in binary). In a mix of binary and decimal, binary 11010.1001 = 1.10101001 × 2¹⁰⁰, which is a binary expression except for the base "2", which must be raised to the 4th power. Expressed in binary, 1.10101001 is the mantissa, and 100 is the exponent for the initial number 00011010.1001.
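As a rough sketch of this shift, the following Python function works directly on the digits of a binary numeral. The name `normalize_binary` is hypothetical, and the sketch assumes a positive number written with 0's, 1's, and at most one radix point.

```python
def normalize_binary(bits):
    """Return (mantissa, exponent) for a positive binary numeral such as '11010.1001'."""
    whole, _, frac = bits.partition(".")
    digits = whole + frac
    first_one = digits.find("1")
    if first_one == -1:
        raise ValueError("zero has no normalized form")
    # Number of places the radix point moves left (negative means it moves right).
    exponent = len(whole) - 1 - first_one
    rest = digits[first_one + 1:].rstrip("0")   # drop trailing zeros
    mantissa = "1." + (rest if rest else "0")
    return mantissa, exponent

mantissa, exponent = normalize_binary("00011010.1001")
print(mantissa, exponent, format(exponent, "b"))   # 1.10101001 4 100
```

A number smaller than 1, such as binary 0.0001, yields a negative exponent because the radix point moves to the right instead.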
Some additional examples illustrate this rewriting of a decimal number into a binary normal form (binary scientific notation).
Decimal number | Binary equivalent | Normalized mantissa | Shift | Binary exponent |
---|---|---|---|---|
19 | 10011 | 1.0011 | 4 left | 100 |
87 | 1010111 | 1.010111 | 6 left | 110 |
2718 | 101010011110 | 1.01010011110 | 11 left | 1011 |
34145 | 1000010101100001 | 1.000010101100001 | 15 left | 1111 |
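The rows of this table can be reproduced with Python's built-in integer tools. This is a sketch for positive integers only, and the function name `normalized_row` is hypothetical.

```python
def normalized_row(n):
    """Return (binary, mantissa, shift, binary exponent) for a positive integer n."""
    if n < 1:
        raise ValueError("this sketch handles positive integers only")
    binary = bin(n)[2:]                  # strip the '0b' prefix
    shift = n.bit_length() - 1           # places the radix point moves left
    mantissa = binary[0] + "." + (binary[1:] or "0")
    return binary, mantissa, shift, format(shift, "b")

for n in (19, 87, 2718, 34145):
    print(n, *normalized_row(n))
```

The shift is one less than the number of binary digits, because the radix point ends up just after the leading 1.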
Overall, the algorithm for writing a number in normalized binary form involves three steps:
Using decimal notation, some fractions have an infinite decimal representation. For example, 1/3 = 0.33333333... . One way to compute this decimal representation is to use long division:
    0 . 3 3 3 3 3 ...
    ------------------
3 | 1 . 0 0 0 0 0 ...
        9
      -----
        1 0
          9
        -----
          1 0
            9
          -----
            1 0
              9
            -----
              1 0
                9
              -----
                1   etc.
At each stage of the division (after the first step), we divide 3 into 10, obtain a quotient of 3, and remainder of 1.
In binary notation, a similar situation arises with many fractions. For example, consider the decimal number one tenth (1/10 or 0.1 decimal). To determine the relevant binary representation, note that 10 (decimal) corresponds to 1010 (binary). To compute 0.1 (decimal) we again utilize long division — in binary.
       0 . 0 0 0 1 1 0 0 1 1 0 0 1 ...
       --------------------------------
1010 | 1 . 0 0 0 0 0 0 0 0 0 0 0 0 ...
           1 0 1 0
           -------
             1 1 0 0
             1 0 1 0
             -------
                 1 0 0 0 0
                   1 0 1 0
                   -------
                     1 1 0 0
                     1 0 1 0
                     -------
                         1 0 0 0 0
                           1 0 1 0
                         ---------
                             1 1 0   etc.
After starting 0.00011, the 0011 pattern continues forever, so the decimal number 0.1 cannot be represented with a finite number of digits in binary.
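The same digits can be generated without long division by repeatedly doubling the fraction and recording whether the result reaches 1. The sketch below is one way to do this; it swaps in the doubling method for the long division shown above, the function name `binary_fraction_digits` is hypothetical, and `Fraction` is used so that the arithmetic itself is exact.

```python
from fractions import Fraction

def binary_fraction_digits(value, count):
    """Return the first `count` binary digits after the radix point of 0 <= value < 1."""
    digits = []
    for _ in range(count):
        value *= 2
        if value >= 1:          # the next binary digit is 1
            digits.append("1")
            value -= 1
        else:                   # the next binary digit is 0
            digits.append("0")
    return "".join(digits)

print("0." + binary_fraction_digits(Fraction(1, 10), 12))   # 0.000110011001
```

Because the remainder cycles through the same values, asking for more digits only extends the 0011 pattern; no finite number of bits ends it.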
Using either single precision or double precision, the first bit represents a sign. As with sign-magnitude notation for integers, 0 is used to represent a positive number and 1 is used for a negative. (We'll worry about representing the number zero later in this reading.)
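One way to look at the sign bit on a real machine is to pack a value into its 32-bit single-precision form and keep only the leftmost bit. The sketch below uses Python's standard `struct` module; the function name `sign_bit` is hypothetical.

```python
import struct

def sign_bit(x):
    """Return the leading (sign) bit of x when stored as a 32-bit IEEE float."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))   # reinterpret the 4 bytes as an integer
    return raw >> 31                                     # keep only the leftmost of 32 bits

print(sign_bit(87.25), sign_bit(-87.25))   # 0 1
```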
For example, consider the representation of ±87.25. We still have to discuss details of storing the mantissa and the exponent. However, the number will start:
When writing a number in (decimal-based) scientific notation, the leading digit may be 1, 2, ..., 9. For example, some decimal numbers at the start of this reading included 1. × 10⁻¹⁰, 2.6 × 10⁹, 4.908288 × 10¹¹, 6.02214179 × 10²³, 7. × 10⁻⁹, and 9.296 × 10⁷.
With binary normalized form (i.e., binary scientific notation), the radix point moves after the first non-zero digit, but in binary the only possible non-zero digit is 1. Thus, a number in binary normalized form must begin 1.?????.
With this property that all normalized binary numbers begin 1.?????, the IEEE Floating Point Standards observe that there is no need to store the leading bit. The bit will always be 1, so we can save space by storing the mantissa starting with the second bit. For example, for the mantissa 1.10101001, the bits actually stored are 10101001 — the leading 1 is not stored.
These stored bits are placed at the left of the mantissa section of the single- or double-precision IEEE floating point number. Single-precision numbers have a 23-bit mantissa field, and double-precision numbers have a 52-bit mantissa field. Once the desired mantissa is placed at the left of this field, the rest of the space is filled with 0's. Thus, for the mantissa 1.10101001, the actual mantissa stored in single-precision format is 10101001000000000000000.
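The drop-the-leading-1 and pad-with-zeros rule is easy to express as a string operation. The following is a minimal sketch, with the hypothetical name `stored_mantissa`; it assumes the mantissa is already normalized and short enough to fit, so no rounding is attempted.

```python
def stored_mantissa(mantissa, field_bits=23):
    """Drop the leading '1.' and pad on the right with 0's to fill the mantissa field."""
    leading, _, rest = mantissa.partition(".")
    if leading != "1":
        raise ValueError("expected a normalized mantissa beginning with 1.")
    if len(rest) > field_bits:
        raise ValueError("mantissa does not fit; rounding is not handled in this sketch")
    return rest.ljust(field_bits, "0")

print(stored_mantissa("1.10101001"))           # 23-bit field for single precision
print(stored_mantissa("1.10101001", 52))       # 52-bit field for double precision
```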
In summary, given the binary mantissa of a floating point number:
This process ensures that the stored exponent for normalized binary numbers with exponents in range will be neither all 0's nor all 1's.
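The claim about stored exponents can be checked with a short sketch. It assumes the usual IEEE 754 convention, which is an assumption added here rather than something stated above: the stored exponent is the actual exponent plus a bias of 127 for single precision (8 exponent bits) or 1023 for double precision (11 exponent bits). The function name `stored_exponent` is hypothetical.

```python
def stored_exponent(exponent, exponent_bits=8):
    """Return the biased exponent field as a bit string (assumes the IEEE 754 bias)."""
    bias = 2 ** (exponent_bits - 1) - 1          # 127 for 8 bits, 1023 for 11 bits
    lowest, highest = 1 - bias, bias             # exponent range of normalized numbers
    if not lowest <= exponent <= highest:
        raise ValueError("exponent out of range for a normalized number")
    return format(exponent + bias, f"0{exponent_bits}b")

print(stored_exponent(4))        # exponent 4 in single precision
print(stored_exponent(-126))     # smallest in-range exponent: not all 0's
print(stored_exponent(127))      # largest in-range exponent: not all 1's
```

Under that assumption, the in-range single-precision exponents -126 through 127 map to stored values 1 through 254, which leaves the all-0's and all-1's patterns unused by normalized numbers.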
The storage of normalized binary numbers works well in many cases, but it fails for the number zero and for numbers whose magnitude is smaller than 2⁻¹²⁶ (single-precision) or 2⁻¹⁰²² (double-precision), the smallest values that the normalized format can represent.
Such numbers are stored un-normalized and considered to be multiplied by 2⁻¹²⁶ (single-precision) or 2⁻¹⁰²² (double-precision). That is, the number is written in the form 0.????? × 2⁻¹²⁶ (or 0.????? × 2⁻¹⁰²² for double-precision), and the mantissa is stored directly. Some examples for single-precision follow:
Number | Number × 2⁻¹²⁶ | Stored Mantissa |
---|---|---|
0 | 0.0000... × 2⁻¹²⁶ | 000000... |
2⁻¹²⁷ | 0.100000... × 2⁻¹²⁶ | 100000... |
1.1011 × 2⁻¹²⁹ | 0.0011011 × 2⁻¹²⁶ | 00110110000... |
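These single-precision examples can be confirmed by splitting the 32 stored bits into their three fields. The sketch below uses Python's standard `struct` module; the function name `float32_fields` is hypothetical, and 1.6875 is the decimal value of binary 1.1011.

```python
import struct

def float32_fields(x):
    """Return (sign, exponent, mantissa) bit strings of x stored as a 32-bit IEEE float."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]

for value in (0.0, 2.0 ** -127, 1.6875 * 2.0 ** -129):
    print(float32_fields(value))
```

In each case the exponent field is all 0's, which marks the number as un-normalized, and the mantissa field holds the digits after the radix point in the table's middle column.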
So far, all stored exponents for floating point numbers have been less than all 1's (111...111). Within the IEEE Standards, an exponent of all 1's is reserved for various error conditions — not for actual numbers.
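A short sketch can show which values land in the reserved pattern. In IEEE 754 the all-1's exponent is used for infinity and for NaN ("not a number"); the function names `exponent_field` and `describe` below are hypothetical.

```python
import struct

def exponent_field(x):
    """Return the 8 exponent bits of x stored as a 32-bit IEEE float."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return format(raw, "032b")[1:9]

def describe(x):
    """Classify x by its stored single-precision exponent field."""
    field = exponent_field(x)
    if field == "1" * 8:
        return "reserved (all 1's)"
    if field == "0" * 8:
        return "zero or un-normalized (all 0's)"
    return "normalized"

for value in (1.0, 0.0, float("inf"), float("nan")):
    print(value, exponent_field(value), describe(value))
```

The largest finite single-precision number has the stored exponent 11111110, one below the reserved pattern.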
Combining the full discussion of representing floating point numbers within single-precision (32-bit) or double-precision (64-bit), the following rules apply for most circumstances:
created 4 April 2016 by Henry M. Walker
expanded and edited 7 April 2016 by Henry M. Walker
reformatted for CS 415 24 July 2022 by Henry M. Walker
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.