Sonoma State University
 
Algorithm Analysis
Instructor: Henry M. Walker

Lecturer, Sonoma State University
Professor Emeritus of Computer Science and Mathematics, Grinnell College

Although CS 415 has been well developed for several years, last year the CS faculty made a significant, long-term curricular change regarding SSU's Upper Division GE Area B Requirement.

CS 415 Signature Project

Following SSU's Upper Division GE Area B Requirement, this Signature Project has two parts: a computer-science-specific Report and a Self Reflection.

Computer-Science-Specific Report

The description and instructions for the content-specific elements of a Signature Project Report are organized into several sections.

As described below, this Project has a substantial overall scope. However, to help make this work manageable, students will work in groups of four or five, and groups will meet for discussion and feedback. Specific assignments for individual student work are posted on Canvas.

Motivating Questions

At a high level, course work in computer science at SSU studies a wide range of topics regarding the design, implementation, efficiency, and testing of algorithms. Although many questions arise naturally during this work, the Signature Project focuses on several of the following, with details given in the small-group and individual assignments posted on Canvas.

  1. "Practice" Data versus "Real" Data: In many classes, students gain practice with algorithms and problem-solving approaches using simplified data and structures. What changes must be made when problems focus on "Real" Data, and how might those changes impact both program structure and efficiency?

  2. Code Clarity versus Efficiency: Software development principles often highlight the importance of code clarity and readability to aid debugging and program maintenance, and following these principles often means moving work into separate, well-named procedures. Although such principles and approaches can aid program development and maintenance, to what extent do the resulting extra procedure calls impact efficiency?

  3. Iteration versus Recursion: Recursion sometimes yields code that is simpler and clearer than an iterative solution, but recursive algorithms also require one or more sequences of procedure calls. On the other hand, recursive code sometimes can take advantage of optimized hardware and software environments to handle the run-time stack, whereas iterative code may depend on the programmer for optimization. Since programmers may choose iteration or recursion for different types of problems, to what extent, and/or under what circumstances, might iteration or recursion be more efficient (or do both approaches generally have similar run times)?
  4. Stability of Sorting Algorithms: A sorting algorithm is said to be stable if any two data elements with the same key appear in the sorted array in the same relative order as in the original array. In practice, three categories of sorting algorithms can be identified:

    1. some implementations of a sorting algorithm may be stable,
    2. some implementations may not be stable (but a modest refinement would be stable), and
    3. some algorithms follow an approach that inherently is not stable.

    How might tests be developed to determine the stability of a sorting algorithm, how might implementations or algorithms be categorized, and how could implementations in category 2 be fixed?
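To make the definition of stability concrete, the following minimal sketch tags each element with its original position, sorts by key alone, and then checks that equal keys still appear in ascending original position. The names Record, tagWithPositions, stableSortByKey, and isStablySorted are hypothetical and are not part of the course files; std::stable_sort from the C++ standard library stands in for a sort that is known to be stable.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record: a sort key plus the element's index in the original array.
struct Record {
    int key;
    std::size_t originalIndex;
};

// Builds records from keys, remembering where each key started.
std::vector<Record> tagWithPositions(const std::vector<int>& keys) {
    std::vector<Record> records;
    for (std::size_t i = 0; i < keys.size(); ++i) records.push_back({keys[i], i});
    return records;
}

// Sorts by key only, using the standard library's stable sort.
std::vector<Record> stableSortByKey(std::vector<Record> records) {
    std::stable_sort(records.begin(), records.end(),
                     [](const Record& a, const Record& b) { return a.key < b.key; });
    return records;
}

// Returns true when the records are ordered by key and, among equal keys,
// by original position (the property a stable sort must preserve).
bool isStablySorted(const std::vector<Record>& sorted) {
    for (std::size_t i = 1; i < sorted.size(); ++i) {
        if (sorted[i - 1].key > sorted[i].key) return false;  // not sorted at all
        if (sorted[i - 1].key == sorted[i].key &&
            sorted[i - 1].originalIndex > sorted[i].originalIndex)
            return false;                                     // equal keys changed order
    }
    return true;
}
```

A check of this kind can only show instability on data sets that contain duplicate keys, so the test data would need to include several elements that share a key.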

Overview of Project Work and Schedule

In order for students to gather sufficient data related to the motivating questions, students will be organized into teams of four or five, and each student will be assigned several distinct project components, involving writing/modifying, testing, and running code. In addition, each student will write a report describing elements of programs and analyzing test runs. With this work completed (in a reasonably complete draft form), team members will meet to discuss their own findings, to give feedback to others, and to discuss potential answers to the motivating questions.

The following schedule identifies due dates for this overall work. Note there are substantial point penalties for missing any of these due dates, and work submitted after 1:00 pm Pacific Time on a specified date will be considered late. (When email is involved, time stamps will be used for identifying work submitted late.)

If you do not want to share your email with others in the class, contact the instructor (before Thursday, April 24) about an acceptable alternative.

Individual Assignment Details

Individual student work requires completion of three components. Students should consult the course page on Canvas to determine which track(s) must be completed individually within a team.

The following table summarizes the components and tracks within a component, and indicates which motivating question(s) the components help address.

Component | Description | Motivating Question(s) Addressed | Checklist of Required Work
Component 1

When transitioning from an array of integers (or other primitive data type) to more complex data elements, it is natural to declare an object with multiple data fields. For this project, a Person will have fields for an individual's first name and surname, city, state (or territory), and telephone number.

At the level of coding, the non-numeric fields within a Person could be stored as a C++ string or a C-style char *. Further, if Person fields will be initialized with a constructor, but never changed, these fields might (or might not) be specified with const. In software development, natural questions arise as to whether/how such implementation details might impact code readability, maintenance, operability, and efficiency.
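As a small illustration of how this storage choice shows up in code, the sketch below compares the two representations. The function names are hypothetical and do not come from Person.hpp or PersonAlt.hpp; the sketch only assumes the standard std::string comparison operators and std::strcmp.

```cpp
#include <cstring>
#include <string>

// Ordering test for C++ strings: operator< compares the characters lexicographically.
bool comesBeforeCppString(const std::string& a, const std::string& b) {
    return a < b;
}

// Ordering test for C-style strings: applying < to the pointers would compare
// addresses, so std::strcmp is used to compare the characters instead.
bool comesBeforeCString(const char* a, const char* b) {
    return std::strcmp(a, b) < 0;
}

// Equality for C-style strings also needs std::strcmp rather than ==.
bool equalCString(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

Both comparisons are case sensitive, so in ASCII an uppercase letter orders before any lowercase letter.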

When ordering an array of Persons, at least seven case-sensitive orderings seem natural:

  • by first name (comparison of strings)
  • by surname (comparison of strings)
  • by combined surname and first name (use surnames, but if surnames match, then compare first names)
  • by city (comparison of strings)
  • by state/territory (comparison of strings)
  • by state/territory and city (use state/territory, but if state/territory names match then compare cities)
  • by telephone number (stored and ordered as long integers)

For these orderings, equality and ordering are handled by different methods within the Person class, where each method examines only one or two fields and ignores the others.
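The two-field orderings described above can be sketched as follows. This is only an illustration under stated assumptions: NameSketch, sameName, and nameComesBefore are hypothetical stand-ins, and the actual class layout and method signatures are those given in the course's header files.

```cpp
#include <string>

// Hypothetical stand-in for the name fields of a Person; the course's Person.hpp
// defines the real class, and its layout may differ.
struct NameSketch {
    std::string firstName;
    std::string surname;
};

// Equality on the name only: both fields must match; any other fields are ignored.
bool sameName(const NameSketch& a, const NameSketch& b) {
    return a.surname == b.surname && a.firstName == b.firstName;
}

// Combined ordering: surnames decide unless they match; then first names decide.
bool nameComesBefore(const NameSketch& a, const NameSketch& b) {
    if (a.surname != b.surname) return a.surname < b.surname;
    return a.firstName < b.firstName;
}
```

The same pattern (compare the primary field, and fall back to the secondary field only on a tie) would apply to an ordering by state/territory and city.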

Although constructors and public methods for variations in the Person class need not impact the use of the class in applications, variations in private fields lead to two different header files. For the purposes of this project, two versions of a Person class are considered:

  • Implementation with some fields involving C++ string objects: a header file Person.hpp and [partial] implementation file Person.cpp.
  • Implementation with some fields involving C-style strings (with const char * pointers and char arrays): a header file PersonAlt.hpp and [partial] implementation file PersonAlt.cpp.

Work for this project component:

  • Download these header and implementation files, and complete any unimplemented code.
    • Note: The header files must not be changed for this project.
    • Before editing any programs, compile and run Person.cpp and PersonAlt.cpp (and save the output). Then answer these questions (which likely will be the same for both programs). As a Signature Project, answers to each question must involve several complete sentences. (Expect point penalties for sentence fragments.)

      • Method equalName compares first names and surnames separately. Another approach might concatenate the surname followed by the first name, and then compare the concatenated strings. Why might this second approach yield incorrect results in some cases?
      • Explain how the testing is done in the main procedure of these two programs. For example, what is the purpose of each of the 2-dimensional arrays, equalNameResults, etc.?
      • Do the test cases in this main procedure fully cover the possible cases that might arise for the methods in this class? If some cases seem unnecessary, identify them, and explain why they are superfluous. If all cases are essential, justify your conclusion.
      • Describe the output obtained by running Person.cpp. For example, the output is structured into parts. Identify each part, and indicate what, if any, summary statements are made.
      • Does the output from PersonAlt.cpp differ from that obtained from Person.cpp? Describe any differences found, if any.
    • Complete Person.cpp and PersonAlt.cpp, by expanding the method stubs in each program.

      • Expanded code will be needed in each program for methods
        equalFirstName, equalSurname, equalCity, equalStateTerritory, equalLocation, equalTelephone, comesBeforeFirstName, comesBeforeSurname, comesBeforeName, comesBeforeCity, comesBeforeStateTerritory, comesBeforeLocation.

        Notes:

        • In most cases, the completed methods can serve as a starting reference to complete the stubs.
        • For each stub, the 2 lines of the method body can be replaced by no more than 6 nicely formatted lines, and most can involve just 1 or 2 simple lines of code!
        • Thus, the overall time for completing the stubs can be rather modest, and this work may provide a reasonable review of both C++ and C-based strings.
      • Although the main procedure of each program contains thorough testing for most methods, code is not included for testing comesBeforeSurname or comesBeforeName. Expand the current testing with code to test these two additional methods. (Once you have completed this testing code in one program, it likely can be added to the second program with a cut-and-paste.)
    • Run each program and save the output obtained.

      • Note the header of both Person.cpp and PersonAlt.cpp provides instructions regarding how to compile and run the program in a terminal window, and how to compile to an object file which can be used in applications.
      • Explain how the structure of the program supports both this testing and the use of the program. For example, why/how can command-line statements yield two different types of compiled files?
    • Based on your testing output, explain (in at least a few sentences) why you believe your program is correct!

Note: Be sure your completed code for this project component is correct, as it will be used in other components.

Motivating Question(s) Addressed: Initial part, Motivating Question 1
Checklist of Required Work: All students should submit all of the following:
  • answers to questions asked regarding the initial programs (before stubs were completed)
  • a full listing of programs Person.cpp and PersonAlt.cpp, with all stubs completed in each program, and with thorough testing. (Testing should involve all methods, including the two additional tests you wrote.)
  • output from the completed program.
  • complete answers (in English sentences) for the questions about compiling the programs for different purposes and about the correctness of both programs.
Component 2

Program sortAlgsForInts.c provides C code for several sorting algorithms applied to integers, and for testing those algorithms. This component of the Signature Project explores what changes might be needed to translate these algorithms to arrays of objects of the Person class, and the extent to which adding procedures for clarity and readability might impact run-time efficiency.

Program sortAlgsForPersons.cpp contains a framework for timing versions of the Insertion Sort and two other sorting algorithms for random Person arrays. To help make this work manageable, students will have different assignments (A, B, C, or D), as specified on Canvas.

  A. This assignment focuses on the tradSelectionSort from sortAlgsForInts.c:
    • Translate the code to sort arrays of Person objects, placing the resulting code as the implementation of sortAlgA in sortAlgsForPersons.cpp.
    • Modify this code to obtain an implementation of sortAlgB by writing your own swap procedure and calling it in the selection sort to swap a[smallestIndex] and a[i].
  B. This assignment focuses on the SelSortWInsertion from sortAlgsForInts.c:
    • Translate the code to sort arrays of Person objects, placing the resulting code as the implementation of sortAlgA in sortAlgsForPersons.cpp.
    • Modify this code to obtain an implementation of sortAlgB by writing your own procedure that inserts the max element into the last position by sliding other elements down, and calling that procedure at the end of the outer loop of the selection sort.
  C. This assignment focuses on the traditional, recursive merge sort, involving a basic traditionalMergeSort method and a helper traditionalMerge procedure from sortAlgsForInts.c:
    • Translate the code to sort arrays of Person objects, naming the resulting code for the traditionalMergeSort as sortAlgA and including an appropriate helper (perhaps still named traditionalMergeSort).
    • The traditionalMerge procedure ends with two loops which copy larr and rarr back to array. Write your own procedure that copies a segment of one array to a designated position in another array, and then use your procedure in a revised newMerge procedure that replaces the copying of extra elements by procedure calls. Then copy the traditionalMergeSort to a new merge sort method sortAlgB that uses newMerge rather than traditionalMerge.
  D. This assignment focuses on an iterative version of the merge sort, tradIterMergeSort, which uses a traditionalMerge procedure as defined in sortAlgsForInts.c:
    • Translate the code to sort arrays of Person objects, placing the resulting code as the implementations of traditionalMerge and tradIterMergeSort in sortAlgsForPersons.cpp. (Within sortAlgsForPersons.cpp, tradRecMergeSort should take the place of sortAlgA.)
    • The traditionalMerge procedure ends with two loops which copy larr and rarr back to array. Write your own procedure that copies a segment of one array to a designated position in another array, and then use your procedure in a revised newMerge procedure that replaces the copying of extra elements by procedure calls. You will need to copy your tradIterMergeSort to a new sortAlgB method that then calls this newMerge.
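The kind of refactoring these assignments ask for can be illustrated on plain integers. The sketch below shows a selection sort with the exchange written inline and the same sort with the exchange moved into a helper. It is not the code from sortAlgsForInts.c; the names selectionSortInline, swapElements, and selectionSortWithHelper are hypothetical, and a Person version would compare elements with one of the comesBefore methods rather than with <.

```cpp
#include <cstddef>
#include <vector>

// Version 1: the exchange is written inline inside the outer loop.
void selectionSortInline(std::vector<int>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t smallestIndex = i;
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[j] < a[smallestIndex]) smallestIndex = j;
        int temp = a[i];
        a[i] = a[smallestIndex];
        a[smallestIndex] = temp;
    }
}

// Hypothetical helper: exchanges two elements through references.
void swapElements(int& x, int& y) {
    int temp = x;
    x = y;
    y = temp;
}

// Version 2: the same algorithm, with the exchange delegated to the helper.
void selectionSortWithHelper(std::vector<int>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t smallestIndex = i;
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[j] < a[smallestIndex]) smallestIndex = j;
        swapElements(a[smallestIndex], a[i]);
    }
}
```

Both versions perform the same comparisons and the same exchanges, so any timing difference between them comes from the procedure call itself (which an optimizing compiler may or may not inline).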

When you have completed your assigned work on the sorting algorithms (handling just one of cases A, B, C, or D, as specified on Canvas), follow these instructions.

  • Within sortAlgsForPersons.cpp, four constants/expressions, defined after the opening program header, may need to be adjusted to give meaningful results on your local computer.
              // range of simulations
              const int minDataSetSize = 40;
              const int maxDataSetSize = 640;
              //int maxDataSetSize = 40960000;
              //int maxDataSetSize = 81920000;
              //int maxDataSetSize = 327680000;
    
              // expression for incrementing size variable in simulation outer loop
              #define sizeIncrement  size *= 2
    
              // number of iterations for simulation/timing of each sorting size
              const int numSimIterations = 1500;
            
    When you first run the program, follow these guidelines:
    • Adjust numSimIterations, so the elapsed time for the size of the first data sets is between 2.0 and 4.0 seconds. If needed, minDataSetSize also could be adjusted.
    • Adjust minDataSetSize and maxDataSetSize, so that the output includes about 4 sizes of data. (Note, however, that in my experience, it is not uncommon to get a segmentation fault from time to time. If this happens to you, try running the program several times.)
    • Efficiency analysis can be relatively straightforward when the size of the data doubles for each set of experiments. However, if experiments consistently yield segmentation faults, try changing the sizeIncrement definition (e.g., to size += 50).
    • Before proceeding in this assignment, continue adjusting these variables/definitions until you obtain four sets of non-zero timings—at least for some runs.
  • Since ordering by phone involves a simple comparison of long integers, timing for this sorting will largely correspond to our sorting experiments using integers (perhaps with some overhead for the class structure). However, ordering either by name or by location involves comparisons of strings. Run the program to obtain timings for sorting by phone, by name, and by location.
    • When you run the program, to what extent does the nature of the data for comparison seem to impact sorting times for random data? For example, is the sorting time about the same, about 10% longer for strings, about 20% longer, etc.? Justify your answer.
    • Compare timings and data set size for sortAlgsForInts.c and sortAlgsForPersons.cpp. One program sorts int data and the other objects containing string and integer data. Describe any differences you observe, and provide some reasons that might be behind these results.
    • In reviewing output of this program, compare the times required for both versions of the sorting algorithms (with and without the swapping or shifting procedures). To what extent does using the swap/shift procedure change the overall sorting time for each algorithm? Are the times the same with or without the swap procedure? If not, which is faster and how much so (e.g., 5%, 10%, 15%, etc.)? Overall to what extent does coding for readability in this case seem to impact code run times?
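For readers who want to see the timing approach in miniature, the following sketch times a sorting procedure over repeated runs and reports a percentage difference between two timings. It is a simplified stand-in, not the framework in sortAlgsForPersons.cpp: the names makeRandomData, timeSort, and percentDifference are hypothetical, the data are plain integers, and std::chrono::steady_clock is assumed as the timer.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Builds a reproducible pseudo-random data set of the given size.
std::vector<int> makeRandomData(std::size_t size, unsigned seed) {
    std::mt19937 generator(seed);
    std::uniform_int_distribution<int> pick(0, 1000000);
    std::vector<int> data(size);
    for (int& value : data) value = pick(generator);
    return data;
}

// Runs `sorter` on a fresh copy of `data` for each iteration and returns the
// total elapsed time in seconds (the copying time is included in the total).
double timeSort(const std::function<void(std::vector<int>&)>& sorter,
                const std::vector<int>& data, int numIterations) {
    auto start = std::chrono::steady_clock::now();
    for (int k = 0; k < numIterations; ++k) {
        std::vector<int> copy = data;
        sorter(copy);
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Percentage by which `otherTime` exceeds `baseTime` (negative if it is faster).
double percentDifference(double baseTime, double otherTime) {
    return 100.0 * (otherTime - baseTime) / baseTime;
}
```

With timings of 2.0 seconds and 2.2 seconds, for example, percentDifference reports a slowdown of about 10%. When the data size doubles from one experiment to the next, a quadratic algorithm such as selection sort would be expected to take roughly four times as long, which gives a quick sanity check on the measured times.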
Motivating Question(s) Addressed: Motivating Question 1 (continued) and Motivating Question 2 (regarding code design for clarity)
Checklist of Required Work: Based on the assignment posted on Canvas (A, B, C, or D), each student should submit the following:
  • A listing of the extended sortAlgsForPersons.cpp program and the output it produces.
  • Discussion of both the relative readability of the code with and without swap procedures, and analysis of the extent to which utilizing the swap procedure impacts run time.
Component 3

This component examines the relative efficiency of recursive versus iterative implementations of code by comparing two versions of the same sorting algorithm. For this component, all students will start by considering program sortAlgsForInts.c, which provides C code for several sorting algorithms applied to integers, and for testing those algorithms. Further, program sortAlgsForPersons.cpp contains a framework for timing versions of the Insertion Sort and two other sorting algorithms for random Person arrays. To spread the workload over members of small groups, students will have different assignments (A, B, or C) as specified on Canvas, as follows:

  A. This assignment focuses on the tradSelectionSort from sortAlgsForInts.c:
    • Translate the code to sort arrays of Person objects, placing the resulting code as the implementation of sortAlgA in sortAlgsForPersons.cpp. (This is the same work that is part of assignment A for Project Component 2.)
    • Using your code in sortAlgA as a base, rework the algorithm by replacing the outer loop by a recursive procedure to obtain an implementation of sortAlgB. (If needed, feel free to utilize a husk and kernel approach, with sortAlgB as the husk and a helper method as the kernel.)
    • Note: For the recursive version of this code, you might consult "Is it possible to have recursive Selection Sort?" on Quora.
  B. This assignment focuses on the SelSortWInsertion from sortAlgsForInts.c:
    • Translate the code to sort arrays of Person objects, placing the resulting code as the implementation of sortAlgA in sortAlgsForPersons.cpp. (This is the same work that is part of assignment B for Project Component 2.)
    • Using your code in sortAlgA as a base, rework the algorithm by replacing the outer loop by a recursive procedure to obtain an implementation of sortAlgB. (If needed, feel free to utilize a husk and kernel approach, with sortAlgB as the husk and a helper method as the kernel.)
    • Note: For the recursive version of this code, you might consult "Selection Sort Algorithm – Iterative & Recursive | C, Java, Python" by Techie Delight.
  C. This assignment focuses on both the recursive and iterative versions of the merge sort from sortAlgsForInts.c. In particular, translate the code for traditionalMergeSort as sortAlgA, tradIterMergeSort as sortAlgB, and traditionalMerge as a separate private method. (This work is similar to parts of assignments C and D for Component 2.)
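The husk and kernel approach mentioned above can be sketched on integers as follows. This is one possible shape rather than the expected solution: the names recursiveSelectionSort and selectionSortKernel are hypothetical, and only the outer loop is replaced by recursion while the inner search for the smallest element stays iterative.

```cpp
#include <cstddef>
#include <vector>

// Kernel: places the smallest element of a[start..end) at position `start`,
// then recurses on the rest of the array (this replaces the outer loop).
void selectionSortKernel(std::vector<int>& a, std::size_t start) {
    if (start + 1 >= a.size()) return;                  // zero or one element left
    std::size_t smallestIndex = start;
    for (std::size_t j = start + 1; j < a.size(); ++j)  // inner loop stays iterative
        if (a[j] < a[smallestIndex]) smallestIndex = j;
    int temp = a[start];
    a[start] = a[smallestIndex];
    a[smallestIndex] = temp;
    selectionSortKernel(a, start + 1);
}

// Husk: presents the same one-argument interface as an iterative version.
void recursiveSelectionSort(std::vector<int>& a) {
    selectionSortKernel(a, 0);
}
```

Note that the recursion depth here equals the array length, so very large arrays may exhaust the run-time stack unless the compiler happens to optimize the tail call.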

When you have completed your assigned work on the sorting algorithms (handling just one of cases A, B, or C, as specified on Canvas), follow these instructions.

  • As in component 2, adjust variables/definitions for minDataSetSize, maxDataSetSize, sizeIncrement, and numSimIterations (found just after the program header), so that the elapsed time for running the first data set is between 2.0 and 4.0 seconds, and so that the program does not yield segmentation faults (at least for some runs).
  • Program sortAlgsForInts.c (from above) shows both recursive and iterative implementations of a Merge Sort.
    • Review the two implementations carefully. Except for the difference in form (recursion versus iteration), to what extent do these implementations perform exactly the same work? Of course, recursion and iteration may perform tasks in different orders, but is the overall work the same, or does one implementation perform some work that the other does not? Explain.
    • Run sortAlgsForInts.c, and compare the run times of the recursive and iterative implementations. Are the times similar, or is one approach more efficient than the other (and by about how much: 5%, 10%, etc.)? Explain.
  • Now focus on the iterative and recursive methods that you wrote within sortAlgsForPersons.cpp
    • With your addition of the recursive implementation for your assigned sorting algorithm, run the program and compare the timings of the recursive and iterative versions of the sorting procedure.
    • Is the time roughly the same, or is one implementation more efficient? And if there is a noticeable difference, how much more (e.g., 5%, 10%, etc.)?
  • Looking at the results you obtained for this component of this project, and considering results shared by others in your group, can you make a preliminary conclusion regarding the likely impact of utilizing recursion versus iteration in implementing an algorithm? Does one approach have a clear efficiency advantage? Does one approach have a clear advantage regarding clarity? Overall, what guidelines might you suggest regarding the use of recursion versus iteration? Explain.
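To show what "the same algorithm in two forms" can look like, the sketch below gives a recursive (top-down) and an iterative (bottom-up) merge sort on integers that share one merge helper. It is not the code from sortAlgsForInts.c: the names mergeRanges, mergeSortRecursive, and mergeSortIterative are hypothetical, and the merge uses a single temporary buffer rather than the larr and rarr arrays of traditionalMerge.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Merges the sorted ranges a[lo..mid) and a[mid..hi) through a temporary buffer.
void mergeRanges(std::vector<int>& a, std::size_t lo, std::size_t mid, std::size_t hi) {
    std::vector<int> merged;
    merged.reserve(hi - lo);
    std::size_t left = lo, right = mid;
    while (left < mid && right < hi) {
        // Taking from the left on ties keeps equal elements in their original order.
        if (a[right] < a[left]) merged.push_back(a[right++]);
        else merged.push_back(a[left++]);
    }
    while (left < mid) merged.push_back(a[left++]);
    while (right < hi) merged.push_back(a[right++]);
    std::copy(merged.begin(), merged.end(), a.begin() + lo);
}

// Recursive (top-down) form: split, sort each half, then merge.
void mergeSortRecursive(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;
    std::size_t mid = lo + (hi - lo) / 2;
    mergeSortRecursive(a, lo, mid);
    mergeSortRecursive(a, mid, hi);
    mergeRanges(a, lo, mid, hi);
}

// Iterative (bottom-up) form: merge runs of width 1, 2, 4, ... until one run remains.
void mergeSortIterative(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t width = 1; width < n; width *= 2) {
        for (std::size_t lo = 0; lo + width < n; lo += 2 * width) {
            std::size_t mid = lo + width;
            std::size_t hi = std::min(lo + 2 * width, n);
            mergeRanges(a, lo, mid, hi);
        }
    }
}
```

The recursive form would be called as mergeSortRecursive(a, 0, a.size()). When the array length is not a power of two, the two forms split the array at different places, which is one detail to keep in mind when asking whether they perform exactly the same work.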
Motivating Question(s) Addressed: Motivating Question 3 (regarding efficiency of recursive versus iterative implementations)
Checklist of Required Work: Based on the assignment posted on Canvas (A, B, or C), each student should submit all of the following:
  • A listing of the extended sortAlgsForPersons.cpp program and the output it produces.
  • Discussion of both the relative readability of the code with and without swap procedures, and analysis of the extent to which utilizing the swap procedure impacts run time.
Component 4

As introduced in component 2, program sortAlgsForPersons.cpp runs versions of three sorting algorithms (Selection Sort, Insertion Sort, and Merge Sort) for several Person arrays. The code also checks that each algorithm sorts correctly by name, by location, and by phone. This component considers whether the sorting algorithm is stable.

For this Project component, check the Project page on Canvas to determine which sorting algorithm you should utilize and whether you should consider sorting by name or by location.

  • Describe how code might be tested to determine if the implementation of a sorting algorithm is stable.
    • One possible approach would involve two steps: First sort the array by telephone number and then by either name or location.
    • Will this approach provide a solid test to determine stability? If so, explain carefully. If not, suggest a different testing strategy and explain why it works.
  • Add a method testStability to program sortAlgsForPersons.cpp to implement your stability test.
  • Run your program to determine if your designated algorithm for this Project component is stable.
    • If the sorting algorithm is stable, explain why.
    • If not, but the code can be refined (in no more than a couple lines) to yield a stable algorithm, describe those changes, then test your code to demonstrate the adjusted algorithm is stable.
    • If the code cannot be easily changed to yield a stable algorithm, explain why such an adjustment is not possible.
  • If you can obtain a stable sort procedure, run the following experiments.
    • Experiment 1:
      • Modify your program to perform a sort by name and save the results in an array or vector.
      • For a copy of the original array or vector, first sort by firstName and then by surname.
      • To what extent is the result of these two sorts identical?
    • Experiment 2: Repeat the previous experiment, except in the second step sort by surname and then by firstName. Again, to what extent is the result of these two sorts identical?
  • To what extent do these experiments confirm or contradict your conclusions about designing tests for stability, as discussed at the start of this Project component?
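The structure of the two experiments above can be sketched with the standard library's std::stable_sort standing in for a stable sort procedure. Everything named here (NamePair, sameSequence, sortByCombinedName, sortFirstThenSurname, sortSurnameThenFirst) is hypothetical and only illustrates how a one-pass result might be compared with a two-pass result; your own experiments should use your sorting code and the Person class.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the name fields of a Person.
struct NamePair {
    std::string firstName;
    std::string surname;
};

// True when both sequences hold the same names in the same positions.
bool sameSequence(const std::vector<NamePair>& x, const std::vector<NamePair>& y) {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i)
        if (x[i].firstName != y[i].firstName || x[i].surname != y[i].surname) return false;
    return true;
}

bool byFirstName(const NamePair& a, const NamePair& b) { return a.firstName < b.firstName; }
bool bySurname(const NamePair& a, const NamePair& b) { return a.surname < b.surname; }

// One pass: surname decides, with first name breaking ties.
std::vector<NamePair> sortByCombinedName(std::vector<NamePair> people) {
    std::stable_sort(people.begin(), people.end(), [](const NamePair& a, const NamePair& b) {
        if (a.surname != b.surname) return a.surname < b.surname;
        return a.firstName < b.firstName;
    });
    return people;
}

// Two stable passes: first by firstName, then by surname.
std::vector<NamePair> sortFirstThenSurname(std::vector<NamePair> people) {
    std::stable_sort(people.begin(), people.end(), byFirstName);
    std::stable_sort(people.begin(), people.end(), bySurname);
    return people;
}

// Two stable passes in the other order: first by surname, then by firstName.
std::vector<NamePair> sortSurnameThenFirst(std::vector<NamePair> people) {
    std::stable_sort(people.begin(), people.end(), bySurname);
    std::stable_sort(people.begin(), people.end(), byFirstName);
    return people;
}
```

Each two-pass result can then be passed to sameSequence together with the one-pass result. The comparison is only informative when the data contain repeated surnames and repeated first names.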
Motivating Question(s) Addressed: Motivating Question 4
Checklist of Required Work: Based on the assignment posted on Canvas (A, B, or C), each student should submit all of the following:
  • Description of an algorithm to test whether an algorithm is stable, and an argument why this test works.
  • A listing of the sortAlgsForPersons.cpp program and the output it produces.
  • If the original sorting algorithm (as translated from sortAlgsForInts.c) was not stable,
    • submit the code for a modest revision and an explanation why the revision works, or
    • submit an explanation why a modest adjustment is not possible to obtain stability.
  • Discussion of the results and conclusions of the two experiments described toward the end of this Project component.



Required Self Reflection

Summary: Write a self reflection that connects work done in this course with other course work in computer science and with the application and/or use of computing in contemporary life.

Due Date (e-mail Before Class): Thursday, May 15

Target Audience: This paper may be shared with the SSU General Education Committee, the CS faculty, and some SSU administrators. As the SSU GE Committee and SSU administrators include people from a variety of disciplines, one should not assume readers will have substantial background in computing. (Pending instruction to the contrary from the SSU Committee or CS faculty, it is not intended that this paper will be distributed to students.)

Assignment Details: Since this course has been designated to meet SSU's Upper Division GE Area B: Scientific Inquiry and Quantitative Reasoning, including a Signature Assignment with a self reflection, this course requires that you write about what you have learned from this course, and how it carries over into your broader studies and experiences at SSU and beyond. To meet this requirement, please include answers to the following:

Note: You should devote at least one paragraph (each with at least 6 sentences) addressing each question, yielding at least a full page,




created September 10, 2024
revised October-November, 2024
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.

Copyright © 2011-2025 by Henry M. Walker.
Selected materials copyright by Marge Coahran, Samuel A. Rebelsky, John David Stone, and Henry Walker and used by permission.
This page and other materials developed for this course are under development.
This and all laboratory exercises for this course are licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 International License.