Goals
Upon successful completion of this lab, you should be able to trace the execution of selection sort and bottom-up merge sort.
Setup and Practice
Setup
- If your lab machine was not started in MacOS when you went to use it, reboot it to MacOS.
- Follow these procedures to mount your blue home-directory on the local machine and create a PyCharm project (call it Lab11). You will use PyCharm in the next part, but, it is important that you set up your project now so that if you run into any issue, we can resolve them quickly.
- Right-click or control-click to download these 2 files and place them into the directory for this lab:
Practice
In this lab, we explore different ways of taking the elements of a list and sortingascending order. That is, L[0] <= L[1]
will be True
, as will L[1] <= L[2]
, etc.
For numbers, this means that the smaller numbers will be at the beginning of the list. For strings, the list will be alphabetically sorted from A-Z as long as we are never comparing uppercase to lowercase letters. Note that the techniques we use can be easily modified to support descending order (where L[0] >= L[1]
, etc.).
Note: Python provides a sort()
function, but we will not use this in Lab 11. Instead, you will code to sort the list yourself using two well-known algorithms: selection sort and merge sort.
- Answer Question 1 in your writeup.
Selection sort
Selection sort is an algorithm for sorting lists. It works by making multiple passes over the list. In each pass, it does two things:
- Finds the index of the minimum element in the unsorted portion of list.
- Swaps that element into its correct position.
For example, consider the list
['E', 'G', 'A', 'C', 'D']
In the first pass through the list, selection sort would identify position 2 as the index of the minimum element (since 'A' is the minimum element). Then, it would swap 'A' with the current first element:
['A', 'G', 'E', 'C', 'D']
In the next pass, we would ignore 'A', since we know it's in its final position already. The index of the minimum element in the remainder of the list is 3 (the position of 'C'). Then we would swap 'C' into place:
['A', 'C', 'E', 'G', 'D']
and so on.
- Answer Questions 2 and 3 in your writeup.
Merge sort
Merge sort is a more efficient, although more complicated, sorting algorithm. We will be doing a bottom-up merge sort, which works as follows:
- Start by sorting pairs of adjacent elements. After this step, our list will consist of a bunch of 2-element sorted sub-lists.
- Merge pairs of adjacent 2-element sorted sublists. Now we will have a bunch of adjacent 4-element sorted sublists.
- Keep doubling the size of the sublists to be merged until the entire list is in sorted order.
For the list
['B', 'F', 'H', 'A', 'E', 'G', 'D', 'C']
, this algorithm will work as follows:- Sort pairs of elements. We would look at 'B' and 'F' (which are already in order); 'H' and 'A' (which are not); 'E' and 'G' (which are); and 'D' and 'C' (which are not). After we put each of these 4 pairs in order, we have the list
['B', 'F', A', 'H', 'E', 'G', 'C', 'D']
- Merge pairs of two-element sub-lists. This is a relatively fast operation since each two-element list is guaranteed to be in order thanks to the previous step. So we merge
['B', 'F']
with['A', H']
and['E', 'G']
with['C', 'D']
. This gives us['A', 'B', 'F', 'H', 'C', 'D', 'E', 'G']
- Merge the 4-element sublists
['A', 'B', 'F', 'H']
and['C', 'D', 'E', 'G']
, which will again be fast since each sublist is sorted. This gives us the final list['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
.
- Answer Question 4 in your writeup.
Part A: Finding the minimum element of a list
We are going to write our own sorting function using an algorithm called selection sort. In selection sort, for a list of length N, we make (N-1) passes over the list:
- On the first pass, we find the minimum element and swap it with the element at position 0.
- On the second pass, we find the second-smallest element and swap it with the element in position 1
- And so on...
Minimum element of L
A key part of selection sort is finding the minimum element of the list. For lists of strings of the same case, the minimum element is the one that comes first alphabetically. In this part of the lab, you will write a function to find the index of the minimum element of your list of cities. Here is the specification for that function:
def find_index_of_min(L): '''Returns the index of the minumum element of L. (We assume L contains no duplicate elements, for simplicity). Args: L (list): A list of strings. Returns: int: The index of the minimum element of L. '''
For example, the function call
find_index_of_min(['BBB', 'AAA', 'CCC'])
should produce, as the return value, the integer 1
, since this is the index of the minimum element, 'AAA'
.
Instructions
- Create a new file called lab11a.py.
- Copy the definitions of the functions
readfile
andprint_list
from your Lab 10 code into your new file. - Based on your Lab 10 code, define a
main
function that does the following:- Asks the user for the name of the input file
- Calls
readfile
to read the file contents into a list of lines - Calls
print_list
to print the resulting list
Test your code using the following sample input:
Name of input file: cities-small.txt The original list of cities is: 0: Santa Rosa 1: Petaluma 2: Rohnert Park 3: Windsor 4: Healdsburg
Add the following empty function definition to your code:
def find_index_of_min(L): '''Returns the index of the minumum element of L. (We assume L contains no duplicate elements, for simplicity). Args: L (list): A list of strings. Returns: int: The index of the minimum element of L. ''' pass
- If you feel ready to write this function on your own, go ahead! Write the function, skip the next step, and go straight to the test cases.
- To write this function, follow these steps:
- Begin by immediately returning
None
if the listL
is empty. - You'll need a variable -- essentially an accumulator variable -- to keep track of the index of the minimum element you've seen so far. Decide on the appropriate datatype, and initialize this variable to an appropriate value.
- Write a loop to look at the elements of L:
- Each iteration, compare an element of L with the minimum value you've seen so far.
- If the current element of L is smaller than the minimum element you've seen so far, update your accumulator variable.
- Remember that this variable should hold a numeric value!
- After you have seen all of the elements of L, return the value of the variable holding the index of the minimum element.
- Begin by immediately returning
To test your code, add the following lines of code to the end of
main()
:print(find_index_of_min([])) print(find_index_of_min([3, 2, 1, 0])) print(find_index_of_min(['A', 'Z', 'Y', 'B'])) print(find_index_of_min(['B', 'A', 'Z', 'Y']))
- Run your program, and supply any valid filename when prompted. When the last 4 lines of output match your answers to Question 2 of the write-up, remove these 4 lines of code.
- Be sure you are not using Python's built-in
index()
function anywhere in your code! It is less efficient than the code you are writing, and it will actually mess you up in subsequent parts of this lab. - Continue to Part B.
Part B: Restricting the search
Next, you will modify find_index_of_min()
to find the index of the minimum element starting at an index you specify. For example, the call
find_index_of_min(['B', 'C', 'Z', 'D', 'E'], 2)
would return 3 because you told it to start at index 2, and so it ignores the first two elements of the list.
Instructions
- Make a new copy of your lab11a.py file, and name it lab11b.py.
- Modify the
def
line forfind_index_of_min
to take an additional parameter calledstart_index
. - Modify the first if-statement in
find_index_of_min
to returnNone
if the list does not have enough elements to start atstart_index
. - Modify your code inside the definition so that it does not consider any elements before
start_index
. In particular, look for places in your code where you assumed you were starting at index 0, and change those tostart_index
. Test your code by adding the following 4 lines to
main()
:print(find_index_of_min([], 0)) print(find_index_of_min([3, 2, 1, 0], 3)) print(find_index_of_min([3, 2, 1, 0], 4)) print(find_index_of_min(['A', 'Z', 'Y', 'B'], 1)) print(find_index_of_min(['B', 'Z', 'A', 'Y'], 2))
Answer Question 5 in your writeup, and then remove these 5 lines of code.- Update the docstring for
find_index_of_min
to reflect your changes (i.e., the new parameter and the new behavior). - Continue to Part C.
Part C: Selection sort
Next, you will extend your program to implement selection sort.Instructions
- Make a copy of your lab11b.py file, and name it lab11c.py.
Add the following function definition for
selection_sort
:def selection_sort(L): '''Uses the selection sort algorithm to sort a list. Sorts the original list that was passed to it (has no return value). Args: L (list): The unsorted list. ''' pass
- Inside the definition of
selection_sort
, add a line of code that calls yourfind_index_of_min
function to find the index of the minimum element in the entire list and saves it to a variable. - Inside the definition of
selection_sort
, add a line of code that swaps the first element of the list with the minimum element of the list. - At the end of
main
, add a call toselection_sort
followed by a second call toprint_list
:Name of input file: cities-small.txt The original list of cities is: 0: Santa Rosa 1: Petaluma 2: Rohnert Park 3: Windsor 4: Healdsburg The new list of cities is: 0: Healdsburg 1: Petaluma 2: Rohnert Park 3: Windsor 4: Santa Rosa
- Inside the definition of
selection_sort
, add a line of code that prints the two elements that were swapped:Name of input file: cities-small.txt The original list of cities is: 0: Santa Rosa 1: Petaluma 2: Rohnert Park 3: Windsor 4: Healdsburg Swapped elements 0 and 4 -- Santa Rosa and Healdsburg The new list of cities is: 0: Healdsburg 1: Petaluma 2: Rohnert Park 3: Windsor 4: Santa Rosa
- Inside
selection_sort
, move your code into a loop that makes N-1 passes through the list instead of just one pass (where N is the number of elements). In each iteration, you should modify your code to do these three things:- Call
find_index_of_min
to find the smallest element in the unsorted part of the list. - Swap that element with the first element in the unsorted part of the list.
- Print the two elements that were swapped.
Name of input file: cities-small.txt The original list of cities is: 0: Santa Rosa 1: Petaluma 2: Rohnert Park 3: Windsor 4: Healdsburg Swapped elements 0 and 4 -- Santa Rosa and Healdsburg Swapped elements 1 and 1 -- Petaluma and Petaluma Swapped elements 2 and 2 -- Rohnert Park and Rohnert Park Swapped elements 3 and 4 -- Windsor and Santa Rosa The new list of cities is: 0: Healdsburg 1: Petaluma 2: Rohnert Park 3: Santa Rosa 4: Windsor
- Call
- Answer Question 6 in your writeup.
- Run your program on cities.txt, and answer Question 7 in your writeup.
- Continue to Part D.
Part D: Merge sort
Next, you will extend your program to implement merge sort.Instructions
- Make a copy of your lab11c.py, and name it lab11d.py.
- In
main
, comment out all of your code for now... - Add the following function definition to your code:
def merge(L, start_index, sublist_size): '''Merges two sublists, or two chunks of the larger list L. The left chunk is L[start_index] to L[start_index + sublist_size - 1] The right chunk is L[start_index + sublist_size] to L[start_index + 2 * sublist_size - 1] Modifies the original list that was passed to it (has no return value). Args: L (list): The list. start_index (int): The index of the first element to be merged. sublist_size (int): The size of the chunks to be merged. ''' index_left = start_index left_stop_index = start_index + sublist_size index_right = start_index + sublist_size right_stop_index = min(start_index + 2 * sublist_size, len(L)) print('Merging sublists:', L[index_left:left_stop_index], 'and', L[index_right:right_stop_index]); L_tmp = [] while (index_left < left_stop_index and index_right < right_stop_index): if L[index_left] < L[index_right]: L_tmp.append(L[index_left]) index_left += 1 else: L_tmp.append(L[index_right]) index_right += 1 if index_left < left_stop_index: L_tmp.extend(L[index_left : left_stop_index]) if index_right < right_stop_index: L_tmp.extend(L[index_right : right_stop_index]) L[start_index : right_stop_index] = L_tmp print('Merged sublist:', L_tmp, '\n')
- In
main
, add the following call tomerge
at the end:merge(['Healdsburg', 'Windsor', 'Rohnert Park', 'Santa Rosa'], 0, 2)
This code will do the following to the 4-element list that we passed to it:
- Starting at index 0 (the second parameter), look at a pair of adjacent 2-element chunks (the 2 comes from the third parameter).
- Assuming that each of the individual chunks is in sorted order, merge them into a single, sorted 4-element chunk.
- Answer Questions 8 and 9 in your writeup.
- Now, we'll use this code to help us sort an entire list. Here's how:
- Start by merging pairs of adjacent elements (1-element chunks). Now we will have a bunch of adjacent 2-element sorted lists.
- Merge pairs of adjacent 2-element chunks. Now we will have a bunch of adjacent 4-element sorted lists.
- Keep doubling the size of the lists to be merged until the entire list is in sorted order.
def merge_sort(L): '''Sorts a list L using the merge sort algorithm. Sorts the original list that was passed to it (has no return value). Args: L (list): The list. ''' chunksize = 1 # Start by dividing the list into N sub-lists of 1 element each # TODO: This starter code doesn't fully sort the list. You will finish it. print("\n*** Sorting sublists of size", chunksize) # Divide the list into pairs of chunks left_start_index = 0 # Start of left chunk in each pair # While we still have chunks to merge while left_start_index + chunksize < len(L): merge(L, left_start_index, chunksize) # Move to next pair of chunks left_start_index += 2 * chunksize print('List is now', L)
- In
main
, remove your call tomerge
, and put the rest of your code back in. Comment out the call to selection sort, and callmerge_sort
instead for now. - To continue sorting the list, modify your code as follows:
- Move all of this code (except for the initialization statement) into an outer loop. The loop should continue as long as your chunk size is less than the size of the list.
- At the end of each outer loop iteration, double chunksize.
Name of input file: cities-small.txt The original list of cities is: 0: Santa Rosa 1: Petaluma 2: Rohnert Park 3: Windsor 4: Healdsburg *** Sorting sublists of size 1 Merging sublists: ['Santa Rosa'] and ['Petaluma'] Merged sublist: ['Petaluma', 'Santa Rosa'] Merging sublists: ['Rohnert Park'] and ['Windsor'] Merged sublist: ['Rohnert Park', 'Windsor'] List is now ['Petaluma', 'Santa Rosa', 'Rohnert Park', 'Windsor', 'Healdsburg'] *** Sorting sublists of size 2 Merging sublists: ['Petaluma', 'Santa Rosa'] and ['Rohnert Park', 'Windsor'] Merged sublist: ['Petaluma', 'Rohnert Park', 'Santa Rosa', 'Windsor'] List is now ['Petaluma', 'Rohnert Park', 'Santa Rosa', 'Windsor', 'Healdsburg'] *** Sorting sublists of size 4 Merging sublists: ['Petaluma', 'Rohnert Park', 'Santa Rosa', 'Windsor'] and ['Healdsburg'] Merged sublist: ['Healdsburg', 'Petaluma', 'Rohnert Park', 'Santa Rosa', 'Windsor'] List is now ['Healdsburg', 'Petaluma', 'Rohnert Park', 'Santa Rosa', 'Windsor'] The new list of cities is: 0: Healdsburg 1: Petaluma 2: Rohnert Park 3: Santa Rosa 4: Windsor
- Test your code on the larger cities.txt file, and answer Questions 10 and 11 in your writeup.
- Add code to
main
that lets the user select whether to sort using selection sort or merge sort. You can assume that the user's input will always be 's', 'S', 'm', or 'M':Name of input file: cities-small.txt Type S for selection sort and M for merge sort: m The original list of cities is: 0: Santa Rosa 1: Petaluma 2: Rohnert Park 3: Windsor 4: Healdsburg ...etc.
- Demo. Demo your program.
- Continue to the next part to submit your program. Note: we refer back to this code in future labs, so keep a copy in a safe place.
Assignment Submission
Instructions
- Answer the last question (#12) in your Moodle writeup. Review your answers, and then click the "Next" button at the bottom of the quiz. Once you do that, you should see a "Summary of Attempt" screen.
- Click the "Submit all and finish" button.
Warning: You must hit "Submit all and finish" so that your writeup can be graded! It is not submitted until you do this.
Once you have submitted your quiz, you should see something similar to this at the top of your Moodle window. The important part is that the State shows up as Finished.
Please leave this tab open in your browser. - Click on the "Lab 11 code" link in Moodle and open in a new tab. Follow the instructions to upload your source code (lab11d.py) for Lab11. You could either browse for your code or, using a finder window, drag and drop your lab11d.py from your cs115/Lab11 folder to Moodle. You should subsequently see a dialog box which indicates 'Submission Status' as 'Submitted for grading'. Leave this tab open in your browser.
- With these confirmation pages open in your browser, you may call an instructor over to verify that you have completed every part of the lab. Otherwise, you are done!