Sonoma State University
 
Algorithm Analysis
Instructor: Henry M. Walker

Lecturer, Sonoma State University
Professor Emeritus of Computer Science and Mathematics, Grinnell College


Assignment on Consequences of Data Representation

This assignment explores some practical consequences of the representation of data in program processing.

Integer Overflow

Recall from your prior work in C/C++ that the constants INT_MIN and INT_MAX (declared in <limits.h>) contain the smallest and largest int values available in C/C++ with the current hardware and compiler.

  1. Finding Integer Averages: Given two integers, i and j, an expression is desired to compute their average (as an integer).

    Notes, since this problem involves integers:

    • If the arithmetic average is a real number ending in .5, the average may be rounded either up or down. Thus, either 7 or 8 should be considered a correct [integer] average of 6 and 9.
    • In the case that the integers, i and j, are the same, their average (of course) should be i (or j). Thus, the average of 5 and 5 should be 5, and the average of 6 and 6 should be 6.
    • A program should avoid issues of overflow to the extent possible. (Pragmatically, arithmetic with INT_MIN and INT_MAX may be troublesome, but such difficult cases should be kept to a minimum.)

    Consider Tables 1 and 2 below, in which the first column identifies five expressions that might be used to compute the average of i and j.

    a. Complete Table 1 below, assuming i and j are non-negative integers. That is, for each empty cell in the table, indicate
      • "always", if the expression consistently computes the correct average within the conditions given in the column header.
      • "sometimes", if the expression gives the correct average for some i, j values, but not others, given the conditions indicated in the column header.
      • "never", if the expression never computes a correct answer, given the conditions indicated in the column header.
    b. Based on your work in Step a, what, if any, expression(s) would you recommend for a program that needs to compute the average of two non-negative integers? Explain briefly.
    c. Complete Table 2 below, assuming i and j can be any integers (positive, negative, or zero). In completing Table 2:
      • Use the same options, "always", "sometimes", or "never", as in part a.
      • Note that if an entry in Table 1 is "sometimes" or "never", then the corresponding entry in Table 2 may be similar.
    d. Based on your work in Step c, what, if any, expression(s) would you recommend for a program that needs to compute the average of two arbitrary integers? Explain briefly.

Table 1: i, j can be any non-negative integers

  Columns 1-3 ("overflow possible"):
    (1) OK for all but a few i, j
    (2) OK unless i = INT_MAX
    (3) OK unless j = INT_MAX
  Columns 4-7 ("ignoring overflow, expression gives correct answer"):
    (4) i even, j even   (5) i odd, j even   (6) i even, j odd   (7) i odd, j odd

  expression                    (1)   (2)   (3)   (4)   (5)   (6)   (7)
  avg1 = (i + j) / 2;
  avg2 = i/2 + j/2;
  avg3 = (i+1)/2 + j/2;
  avg4 = (i+1)/2 + (j+1)/2;
  avg5 = i + (j-i)/2;




Table 2: i, j can be any integers (positive, negative, or zero)

  Columns 1-3 ("overflow possible"):
    (1) OK for all but a few i, j
    (2) OK unless i = INT_MAX or i = -INT_MAX
    (3) OK unless j = INT_MAX or j = -INT_MAX
  Columns 4-7 ("ignoring overflow, expression gives correct answer"):
    (4) i even, j even   (5) i odd, j even   (6) i even, j odd   (7) i odd, j odd

  expression                    (1)   (2)   (3)   (4)   (5)   (6)   (7)
  avg1 = (i + j) / 2;
  avg2 = i/2 + j/2;
  avg3 = (i+1)/2 + j/2;
  avg4 = (i+1)/2 + (j+1)/2;
  avg5 = i + (j-i)/2;

  2. Consider the program integer-average.c.

    a. Compile and run the program, and record what int values are possible within C programs.
    b. Review the program to determine how the values of arr1 are computed, and how the value of sum compares to INT_MAX.
    c. Check the program output. Is the computation of the average of values for arr1 correct?
    d. Answer parts b and c for array arr2. What is different in the processing? To the extent that you can, explain why the average computation for this array yields an incorrect result.

Storage of Real Numbers and its Accuracy

The international standard for 64-bit floating-point numbers (often the basis for a double in C/C++) uses a binary version of scientific notation, with a sign, an exponent (a power of 2 in binary), and a mantissa (also a binary number). With this international standard, the 64 bits are allocated as follows: 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa.

See Binary Representation of Floating-point Numbers for details.

In practice, this international binary standard does not store the leading mantissa bit, because in scientific notation for binary numbers the leading bit is always 1 (the number 0 is treated in a special way). Thus, since the 64-bit standard explicitly stores 52 bits for the mantissa, this format actually provides 53 bits of accuracy for stored numbers.

In binary, the decimal number 1023 can be represented with 10 bits. Thus, the decimal number 1,000 can be stored in about 10 binary bits; that is, 3-decimal-digit numbers require about 10 binary bits. Using this perspective, the decimal number 1,000,000 can be stored with about 20 bits, so 6-decimal-digit numbers require about 20 bits. Continuing this insight, 15-decimal-digit numbers require about 50 bits.

Also, the decimal digit 8 requires 3 binary bits (since 2^3 = 8), so 3 additional binary bits allow storage of roughly one more decimal digit.

Putting these observations together, we might expect that the 53 bits utilized in the 64-bit international standard can store about 16 decimal digits of accuracy.


To gain first-hand experience with the storage of double numbers in C/C++, Problem 3 considers the storage of the following numbers.

  "0.1234567890123456789012345678901234567890" ;   // digits for easy counting
  "0.2424242424242424242424242424242424242424" ;   // all digits < 5
  "0.6868686868686868686868686868686868686868" ;   // all digits > 5

Although modern C/C++ compilers often allocate 64 bits for the double data type, the C/C++ standard does not specify this size for all computers and compilers; rather, the number of bits for doubles is machine dependent, and the corresponding number of decimal digits stored may vary from one computing environment to another.


  3. To investigate the storage of double numbers, the program double-storage.c prints the double to several decimal places of accuracy.

    Download and compile this program.

    a. Run the program, based on the number

      0.1234567890123456789012345678901234567890

      where the digit pattern can help count individual decimal places.

      • Does the program store and print the number exactly to 40 decimal places of accuracy? That is, are all 40 digits stored and printed exactly?
      • The output of the program is organized into groups of lines. Describe what is printed on each line of a group. Also, indicate how each output line is obtained. (You may need to consult a C/C++ manual to understand some functions, such as sprintf.)
      • printf tries to round a double to the number of decimal places specified. For the output involving 13 to 16 decimal places, does the output reflect this rounding?
      • What can you say about rounding (or the lack thereof), when 17 or more decimal places are printed?
      • As more digits are printed (beyond 17 decimal places), what can you say about the accuracy of the double number printed? Why do you think this accuracy is (or is not) observed?
    b. Repeat part a, after modifying the program to process the number

      0.2424242424242424242424242424242424242424

      where all digits are < 5, so no rounding would be appropriate.

    c. Repeat part a, after modifying the program to process the number

      0.6868686868686868686868686868686868686868

      where all digits are > 5, so rounding up would always be appropriate.

Associativity of Addition for Real Numbers

Over the years, many approaches have been developed to compute the value of the number π. Many of these approaches are based on an infinite series, one of which is

  π/2 = Σ (n = 0 to ∞) n! / (1·3·5···(2n+1)) = 1 + 1/3 + (1·2)/(1·3·5) + (1·2·3)/(1·3·5·7) + …

Details behind this formula may be found in a Wikipedia article on the Leibniz formula for π and a stackexchange.com article on Series that converge to π quickly.

Although this is an infinite sum, calculus (and algebra) indicates that successively better approximations to π may be obtained by including more and more terms of the series. Also, it is worth noting that computationally each term is smaller than the previous.

Program pi-approx.c

  4. Read, analyze, download, compile, and run the program pi-approx.c.

    a. In reading the program, how are successive terms in the series computed? Explain briefly why this approach gives the desired sum of terms.
    b. In past years, some students have indicated confusion regarding which of the terms, T[0], T[1], ..., T[n-1] and T[n], are small and which are large. Of course, the array indices 0, 1, ..., n-1, n are progressively larger, but what about the array elements themselves?
      • In the program, the computation of the terms involves the statement
           T[i] = 2.0 * i * T[i-1] / (2.0 * (2.0*i+1.0));
                    
        Based on this computation, explain algebraically why each computed term is progressively smaller than the previous.
      • Based on the printout of the first and last terms, confirm (in a written statement) that T[0] > T[1] > ... > T[n-1] > T[n], so adding from T[0] up to T[n] adds numerical values from largest to smallest, and adding from T[n] down to T[0] adds numerical values from smallest to largest.
      (Note: Although this may or may not seem clear from the program or algebra, it is vital for the rest of this problem to understand that as the indices of the array elements get larger, the values being added get smaller; be sure to ask about this if you have any questions!)
    c. Describe the output generated with the number of terms being 10, 25, 40, 50, 60, 100, and 1000.
    d. To what extent does including more terms in the sum help the accuracy when computing from the largest term to the smallest? Explain.
    e. To what extent does including more terms in the sum help the accuracy when computing from the smallest term to the largest? Explain.
    f. If there is a difference when computing from largest term to smallest versus smallest term to largest, explain the difference. What, if any, conclusions are suggested by the outputs observed from this exercise?

Compounding of Numeric Error

Our discussions of the representation of real numbers (doubles and floats) have identified at least three factors that can cause errors in processing—particularly if the errors can accumulate as processing continues.

Be sure to take these potential troubles into account in answering Steps 5 and 6.

  5. Given that start < end, suppose a loop is to begin at start and finish at (or near) end in exactly n+1 iterations. Within this loop, suppose the control variable will increase by a computed value increment = (end-start)/n with each iteration.

    Two loop structures are proposed:

          // approach 1
          increment = (end - start)/n;
          for (i = 0; i <= n; i++) {
             value = start + i * increment;
             /* processing for value */
          }

          // approach 2
          value = start;
          increment = (end - start)/n;
          while (value <= end) {
             /* processing for value */
             value += increment;
          }

    Although the first approach requires somewhat more arithmetic within the loop than the second, it likely will provide better accuracy. Identify two distinct reasons why the first approach should be preferred over the second.

  6. Suppose y = f(x) is a function that decreases significantly from x=a to x=b on the interval [a, b], with a < b.

    Throughout this interval, assume f(x)>0, and assume the Trapezoidal Rule were to be used to approximate the area under y = f(x) on the interval [a, b].

    a. Assuming accuracy is the highest priority for this computation, should the main loop begin at a and go toward b, or begin at b and go toward a, or is either order fine? Explain.

    b. Again, assuming accuracy of the answer is the highest priority, write reasonably efficient code that implements the Trapezoidal Rule for this function on this interval. (To be reasonably efficient, f(x) should be computed only once for each value of x, and division by 2 should be done as little as possible, as discussed in class.)
      Be sure to include your code within a program, and run several tests of the program.

      For this step, submit both the program and the output from several test runs.

      (Of course, your program must conform to the course's C/C++ Style Guide.)

    c. Explain how and why your approach to this problem (with f(x) decreasing significantly from x=a to x=b) should differ from the code for the case when f(x) increases over this interval.



created 31 March 2022
expanded 24 July 2022
expanded 3 January 2023
modest editing Summer 2023
revised 20 November 2024
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.

Copyright © 2011-2025 by Henry M. Walker.
Selected materials copyright by Marge Coahran, Samuel A. Rebelsky, John David Stone, and Henry Walker and used by permission.
This page and other materials developed for this course are under development.
This and all laboratory exercises for this course are licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 International License.