CS 115: Lab 11

Goals

Upon successful completion of this lab, you should be able to trace the execution of selection sort and bottom-up merge sort.

Setup and Practice

Setup

If your lab machine was not started in MacOS when you went to use it, reboot it to MacOS.
Follow these procedures to mount your blue home-directory on the local machine and create a PyCharm project (call it Lab11). You will use PyCharm in the next part, but, it is important that you set up your project now so that if you run into any issue, we can resolve them quickly.
Right-click or control-click to download these 2 files and place them into the directory for this lab:
- cities.txt
- cities-small.txt

Practice

In this lab, we explore different ways of taking the elements of a list and sortingascending order. That is, L[0] <= L[1] will be True, as will L[1] <= L[2], etc.

For numbers, this means that the smaller numbers will be at the beginning of the list. For strings, the list will be alphabetically sorted from A-Z as long as we are never comparing uppercase to lowercase letters. Note that the techniques we use can be easily modified to support descending order (where L[0] >= L[1], etc.).

Note: Python provides a sort() function, but we will not use this in Lab 11. Instead, you will code to sort the list yourself using two well-known algorithms: selection sort and merge sort.

Answer Question 1 in your writeup.
Selection sort

Selection sort is an algorithm for sorting lists. It works by making multiple passes over the list. In each pass, it does two things:
- Finds the index of the minimum element in the unsorted portion of list.
- Swaps that element into its correct position.
For example, consider the list
```
['E', 'G', 'A', 'C', 'D']
```
In the first pass through the list, selection sort would identify position 2 as the index of the minimum element (since 'A' is the minimum element). Then, it would swap 'A' with the current first element:
```
['A', 'G', 'E', 'C', 'D']
```
In the next pass, we would ignore 'A', since we know it's in its final position already. The index of the minimum element in the remainder of the list is 3 (the position of 'C'). Then we would swap 'C' into place:
```
['A', 'C', 'E', 'G', 'D']
```
and so on.
Answer Questions 2 and 3 in your writeup.
Merge sort

Merge sort is a more efficient, although more complicated, sorting algorithm. We will be doing a bottom-up merge sort, which works as follows:
- Start by sorting pairs of adjacent elements. After this step, our list will consist of a bunch of 2-element sorted sub-lists.
- Merge pairs of adjacent 2-element sorted sublists. Now we will have a bunch of adjacent 4-element sorted sublists.
- Keep doubling the size of the sublists to be merged until the entire list is in sorted order.
For the list ['B', 'F', 'H', 'A', 'E', 'G', 'D', 'C'], this algorithm will work as follows:
1. Sort pairs of elements. We would look at 'B' and 'F' (which are already in order); 'H' and 'A' (which are not); 'E' and 'G' (which are); and 'D' and 'C' (which are not). After we put each of these 4 pairs in order, we have the list ['B', 'F', A', 'H', 'E', 'G', 'C', 'D']
2. Merge pairs of two-element sub-lists. This is a relatively fast operation since each two-element list is guaranteed to be in order thanks to the previous step. So we merge ['B', 'F'] with ['A', H'] and ['E', 'G'] with ['C', 'D']. This gives us ['A', 'B', 'F', 'H', 'C', 'D', 'E', 'G']
3. Merge the 4-element sublists ['A', 'B', 'F', 'H'] and ['C', 'D', 'E', 'G'], which will again be fast since each sublist is sorted. This gives us the final list ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'].
Answer Question 4 in your writeup.

Part A: Finding the minimum element of a list

We are going to write our own sorting function using an algorithm called selection sort. In selection sort, for a list of length N, we make (N-1) passes over the list:

On the first pass, we find the minimum element and swap it with the element at position 0.
On the second pass, we find the second-smallest element and swap it with the element in position 1
And so on...

For a visual summary of selection sort, see http://www.algolist.net/Algorithms/Sorting/Selection_sort.

Minimum element of L

A key part of selection sort is finding the minimum element of the list. For lists of strings of the same case, the minimum element is the one that comes first alphabetically. In this part of the lab, you will write a function to find the index of the minimum element of your list of cities. Here is the specification for that function:

def find_index_of_min(L):
    '''Returns the index of the minumum element of L. (We assume L contains
    no duplicate elements, for simplicity).
    
    Args:
        L (list): A list of strings.

    Returns:
        int: The index of the minimum element of L.
    '''

For example, the function call

find_index_of_min(['BBB', 'AAA', 'CCC'])

should produce, as the return value, the integer 1, since this is the index of the minimum element, 'AAA'.

Instructions

Create a new file called lab11a.py.
Copy the definitions of the functions readfile and print_list from your Lab 10 code into your new file.
Based on your Lab 10 code, define a main function that does the following:
- Asks the user for the name of the input file
- Calls readfile to read the file contents into a list of lines
- Calls print_list to print the resulting list

Test your code using the following sample input:

Name of input file: cities-small.txt

The original list of cities is:
0: Santa Rosa
1: Petaluma
2: Rohnert Park
3: Windsor
4: Healdsburg

Add the following empty function definition to your code:

def find_index_of_min(L):
    '''Returns the index of the minumum element of L. (We assume L contains
    no duplicate elements, for simplicity).
    
    Args:
        L (list): A list of strings.

    Returns:
        int: The index of the minimum element of L.
    '''
    pass

If you feel ready to write this function on your own, go ahead! Write the function, skip the next step, and go straight to the test cases.
To write this function, follow these steps:
- Begin by immediately returning None if the list L is empty.
- You'll need a variable -- essentially an accumulator variable -- to keep track of the index of the minimum element you've seen so far. Decide on the appropriate datatype, and initialize this variable to an appropriate value.
- Write a loop to look at the elements of L:
  - Each iteration, compare an element of L with the minimum value you've seen so far.
  - If the current element of L is smaller than the minimum element you've seen so far, update your accumulator variable.
  - Remember that this variable should hold a numeric value!
- After you have seen all of the elements of L, return the value of the variable holding the index of the minimum element.

To test your code, add the following lines of code to the end of main():

    print(find_index_of_min([]))
    print(find_index_of_min([3, 2, 1, 0]))
    print(find_index_of_min(['A', 'Z', 'Y', 'B']))
    print(find_index_of_min(['B', 'A', 'Z', 'Y']))

Run your program, and supply any valid filename when prompted. When the last 4 lines of output match your answers to Question 2 of the write-up, remove these 4 lines of code.
Be sure you are not using Python's built-in index() function anywhere in your code! It is less efficient than the code you are writing, and it will actually mess you up in subsequent parts of this lab.
Continue to Part B.

Part B: Restricting the search

Next, you will modify find_index_of_min() to find the index of the minimum element starting at an index you specify. For example, the call

find_index_of_min(['B', 'C', 'Z', 'D', 'E'], 2)

would return 3 because you told it to start at index 2, and so it ignores the first two elements of the list.

Instructions

Make a new copy of your lab11a.py file, and name it lab11b.py.
Modify the def line for find_index_of_min to take an additional parameter called start_index.
Modify the first if-statement in find_index_of_min to return None if the list does not have enough elements to start at start_index.
Modify your code inside the definition so that it does not consider any elements before start_index. In particular, look for places in your code where you assumed you were starting at index 0, and change those to start_index.

Test your code by adding the following 4 lines to main():

    print(find_index_of_min([], 0))
    print(find_index_of_min([3, 2, 1, 0], 3))
    print(find_index_of_min([3, 2, 1, 0], 4))
    print(find_index_of_min(['A', 'Z', 'Y', 'B'], 1))
    print(find_index_of_min(['B', 'Z', 'A', 'Y'], 2))

Answer Question 5 in your writeup, and then remove these 5 lines of code.

Update the docstring for find_index_of_min to reflect your changes (i.e., the new parameter and the new behavior).
Continue to Part C.

Part C: Selection sort

Next, you will extend your program to implement selection sort.

Instructions

Make a copy of your lab11b.py file, and name it lab11c.py.

Add the following function definition for selection_sort:

def selection_sort(L):
    '''Uses the selection sort algorithm to sort a list.
    Sorts the original list that was passed to it (has no return value).
    
    Args:
        L (list): The unsorted list.
    '''
    pass

Inside the definition of selection_sort, add a line of code that calls your find_index_of_min function to find the index of the minimum element in the entire list and saves it to a variable.
Inside the definition of selection_sort, add a line of code that swaps the first element of the list with the minimum element of the list.

At the end of main, add a call to selection_sort followed by a second call to print_list:

Name of input file: cities-small.txt

The original list of cities is:
0: Santa Rosa
1: Petaluma
2: Rohnert Park
3: Windsor
4: Healdsburg

The new list of cities is:
0: Healdsburg
1: Petaluma
2: Rohnert Park
3: Windsor
4: Santa Rosa

Inside the definition of selection_sort, add a line of code that prints the two elements that were swapped:

Name of input file: cities-small.txt

The original list of cities is:
0: Santa Rosa
1: Petaluma
2: Rohnert Park
3: Windsor
4: Healdsburg
Swapped elements 0 and 4 -- Santa Rosa and Healdsburg

The new list of cities is:
0: Healdsburg
1: Petaluma
2: Rohnert Park
3: Windsor
4: Santa Rosa

Inside selection_sort, move your code into a loop that makes N-1 passes through the list instead of just one pass (where N is the number of elements). In each iteration, you should modify your code to do these three things:

Call find_index_of_min to find the smallest element in the unsorted part of the list.
Swap that element with the first element in the unsorted part of the list.
Print the two elements that were swapped.

Sample output:

Name of input file: cities-small.txt

The original list of cities is:
0: Santa Rosa
1: Petaluma
2: Rohnert Park
3: Windsor
4: Healdsburg
Swapped elements 0 and 4 -- Santa Rosa and Healdsburg
Swapped elements 1 and 1 -- Petaluma and Petaluma
Swapped elements 2 and 2 -- Rohnert Park and Rohnert Park
Swapped elements 3 and 4 -- Windsor and Santa Rosa

The new list of cities is:
0: Healdsburg
1: Petaluma
2: Rohnert Park
3: Santa Rosa
4: Windsor

Answer Question 6 in your writeup.
Run your program on cities.txt, and answer Question 7 in your writeup.
Continue to Part D.

Part D: Merge sort

Next, you will extend your program to implement merge sort.

Instructions

Make a copy of your lab11c.py, and name it lab11d.py.
In main, comment out all of your code for now...

Add the following function definition to your code:

def merge(L, start_index, sublist_size):
    '''Merges two sublists, or two chunks of the larger list L.
    The left chunk is 
      L[start_index] to L[start_index + sublist_size - 1]
    The right chunk is 
      L[start_index + sublist_size] to L[start_index + 2 * sublist_size - 1]
    Modifies the original list that was passed to it (has no return value).

    Args:
        L (list): The list.
        start_index (int): The index of the first element to be merged.
        sublist_size (int): The size of the chunks to be merged.
    '''
    index_left = start_index
    left_stop_index = start_index + sublist_size
    index_right = start_index + sublist_size
    right_stop_index = min(start_index + 2 * sublist_size,
                           len(L))

    print('Merging sublists:', L[index_left:left_stop_index], 'and',
          L[index_right:right_stop_index]);

    L_tmp = []

    while (index_left < left_stop_index and
           index_right < right_stop_index):
        if L[index_left] < L[index_right]:
           L_tmp.append(L[index_left])
           index_left += 1
        else:
           L_tmp.append(L[index_right])
           index_right += 1

    if index_left < left_stop_index:
           L_tmp.extend(L[index_left : left_stop_index])
    if index_right < right_stop_index:
           L_tmp.extend(L[index_right : right_stop_index])

    L[start_index : right_stop_index] = L_tmp
    print('Merged sublist:', L_tmp, '\n')

In main, add the following call to merge at the end:
```
merge(['Healdsburg', 'Windsor', 'Rohnert Park', 'Santa Rosa'], 0, 2)
```
This code will do the following to the 4-element list that we passed to it:
- Starting at index 0 (the second parameter), look at a pair of adjacent 2-element chunks (the 2 comes from the third parameter).
- Assuming that each of the individual chunks is in sorted order, merge them into a single, sorted 4-element chunk.
Answer Questions 8 and 9 in your writeup.

Now, we'll use this code to help us sort an entire list. Here's how:

Start by merging pairs of adjacent elements (1-element chunks). Now we will have a bunch of adjacent 2-element sorted lists.
Merge pairs of adjacent 2-element chunks. Now we will have a bunch of adjacent 4-element sorted lists.
Keep doubling the size of the lists to be merged until the entire list is in sorted order.

Here is some code that does the first step:

def merge_sort(L):
    '''Sorts a list L using the merge sort algorithm.
    Sorts the original list that was passed to it (has no return value).

    Args:
        L (list): The list.
    '''
    chunksize = 1  # Start by dividing the list into N sub-lists of 1 element each

    # TODO: This starter code doesn't fully sort the list. You will finish it.

    print("\n*** Sorting sublists of size", chunksize)

    # Divide the list into pairs of chunks
    left_start_index = 0  # Start of left chunk in each pair

    # While we still have chunks to merge
    while left_start_index + chunksize < len(L):
        merge(L, left_start_index, chunksize)

        # Move to next pair of chunks
        left_start_index += 2 * chunksize

    print('List is now', L)

In main, remove your call to merge, and put the rest of your code back in. Comment out the call to selection sort, and call merge_sort instead for now.

To continue sorting the list, modify your code as follows:

Move all of this code (except for the initialization statement) into an outer loop. The loop should continue as long as your chunk size is less than the size of the list.
At the end of each outer loop iteration, double chunksize.

Sample input/output:

Name of input file: cities-small.txt

The original list of cities is:
0: Santa Rosa
1: Petaluma
2: Rohnert Park
3: Windsor
4: Healdsburg

*** Sorting sublists of size 1
Merging sublists: ['Santa Rosa'] and ['Petaluma']
Merged sublist: ['Petaluma', 'Santa Rosa'] 

Merging sublists: ['Rohnert Park'] and ['Windsor']
Merged sublist: ['Rohnert Park', 'Windsor'] 

List is now ['Petaluma', 'Santa Rosa', 'Rohnert Park', 'Windsor', 'Healdsburg']

*** Sorting sublists of size 2
Merging sublists: ['Petaluma', 'Santa Rosa'] and ['Rohnert Park', 'Windsor']
Merged sublist: ['Petaluma', 'Rohnert Park', 'Santa Rosa', 'Windsor'] 

List is now ['Petaluma', 'Rohnert Park', 'Santa Rosa', 'Windsor', 'Healdsburg']

*** Sorting sublists of size 4
Merging sublists: ['Petaluma', 'Rohnert Park', 'Santa Rosa', 'Windsor'] and ['Healdsburg']
Merged sublist: ['Healdsburg', 'Petaluma', 'Rohnert Park', 'Santa Rosa', 'Windsor'] 

List is now ['Healdsburg', 'Petaluma', 'Rohnert Park', 'Santa Rosa', 'Windsor']

The new list of cities is:
0: Healdsburg
1: Petaluma
2: Rohnert Park
3: Santa Rosa
4: Windsor

Test your code on the larger cities.txt file, and answer Questions 10 and 11 in your writeup.

Add code to main that lets the user select whether to sort using selection sort or merge sort. You can assume that the user's input will always be 's', 'S', 'm', or 'M':

Name of input file: cities-small.txt
Type S for selection sort and M for merge sort: m

The original list of cities is:
0: Santa Rosa
1: Petaluma
2: Rohnert Park
3: Windsor
4: Healdsburg
...etc.

Demo. Demo your program.
Continue to the next part to submit your program. Note: we refer back to this code in future labs, so keep a copy in a safe place.

Assignment Submission

Instructions

Answer the last question (#12) in your Moodle writeup. Review your answers, and then click the "Next" button at the bottom of the quiz. Once you do that, you should see a "Summary of Attempt" screen.
Click the "Submit all and finish" button. Warning: You must hit "Submit all and finish" so that your writeup can be graded! It is not submitted until you do this. Once you have submitted your quiz, you should see something similar to this at the top of your Moodle window. The important part is that the State shows up as Finished.

Please leave this tab open in your browser.
Click on the "Lab 11 code" link in Moodle and open in a new tab. Follow the instructions to upload your source code (lab11d.py) for Lab11. You could either browse for your code or, using a finder window, drag and drop your lab11d.py from your cs115/Lab11 folder to Moodle. You should subsequently see a dialog box which indicates 'Submission Status' as 'Submitted for grading'. Leave this tab open in your browser.
With these confirmation pages open in your browser, you may call an instructor over to verify that you have completed every part of the lab. Otherwise, you are done!