Searching and Sorting Exercises


Two of the most common tasks embedded in larger programs are searching and sorting. These are so common, and so important, that multiple ways have been developed to do each of them.

Search Performance

Consider an array containing 1000 Art Catalog entries. Each entry in the catalog includes the title of the artwork, the artist name, and a link to a picture of the artwork. For example, one entry might contain "The Starry Night", "Vincent Van Gogh", and a link to a picture of that painting.

    Linear Search for Title:
  1. In the best (luckiest) case, how many catalog entries would a program have to look at if it were searching for a particular title using a linear search? For example, if you were looking for The Starry Night by Vincent Van Gogh?
  2. In the worst (least lucky) case, how many catalog entries would a program have to look at if it were searching for a particular title using a linear search? For example, if you were looking for The Starry Night? (Assume the entries are in no particular order.)
  3. On average, how many catalog entries would a program have to look at if it were searching for a particular title using a linear search? (Assume the entries are in no particular order.)
    Binary Search for Title:
  4. In the best (luckiest) case, how many catalog entries would a program have to look at if it were searching for a particular title using a binary search? (Assume the entries are sorted by title.)
  5. In the worst (least lucky) case, how many catalog entries would a program have to look at if it were searching for a particular title using a binary search? (Assume the entries are sorted by title.)
    Variations on the Theme:
  6. How many catalog entries would a program have to look at in order to return the title of the 30th entry in the catalog?
  7. How many catalog entries would a program have to look at in order to determine how many items by a particular artist are in the catalog?
Hint: "1" is the correct answer for 3 questions.   "1000" is the correct answer for 2 questions.

Identifying Sorting Algorithms

In the following exercises, you will be exploring 4 different sorting algorithms: Selection Sort, Insertion Sort, Merge Sort, and QuickSort.

To help you with this exploration, you have 3 resources available to you.

In the following questions, assume that N is the number of items in an array that you want to sort. For example, if you were going to sort an array of 1000 Art Catalog entries, N would be 1000. Each question describes an algorithm in two ways: how it works, and its performance characteristics. The small-scale, step-by-step animations are likely to be the most helpful way to learn how each algorithm works. The other two resource pages will be more helpful in seeing how the algorithms compare in terms of their run-time speeds and efficiency.

  1. This algorithm steps through the data set N times (in other words, makes N passes). In each pass, it finds the next smallest (or largest) item and swaps it with the first (or last) item that hasn't been sorted yet. After 4 passes, the first (or last) 4 items are in their final, sorted locations; after 10 passes, the first (or last) 10 items are in their final locations; after all N passes, all N items are in sorted order.

    On average, the "finds the next smallest (largest) item" computation in each pass has to look at N/2 still-unsorted items; since that is done in all N passes, the algorithm takes approximately N * N/2 steps to go from completely unsorted data to sorted data.

  2. This algorithm steps through the data set N times (in other words, makes N passes). In each pass, it takes the next unsorted item in the array and figures out where it should go among the items that are already sorted. After 4 passes, the first 4 items are in sorted order; after 10 passes, the first 10 items are in sorted order; after all N passes, all N items are in sorted order.

    On average, the "figures out where it should go" computation in each pass has to look at N/4 previously sorted items; since that is done in all N passes, the algorithm takes approximately N * N/4 steps to go from completely unsorted data to sorted data.

  3. This algorithm makes multiple passes through the data set. In the first pass, it looks at adjacent pairs of data (the first and second, the third and fourth, the fifth and sixth, etc) and puts them in little sorted bundles of size 2. In the next pass, it looks at adjacent bundles (the first 2 and the next 2, the third bundle of 2 and the fourth bundle of 2, etc.) and orders them into sorted bundles of size 4. In the third pass, it again looks at adjacent bundles (this time of size 4) and orders them into sorted bundles of size 8. It continues this process until there are two bundles of size N/2; it orders them into a single bundle of size N, at which point the entire array is sorted.

    In each pass, it orders all N items from one size bundle to a bundle that is twice as large, so in each pass it looks at all N items. How many passes does it need? Well, the number of bundles is cut in half in each pass (for example, from 1000 to 500, to 250, to 125, to 63, to 32, to 16, to 8, to 4, to 2, to 1), so the number of passes is lg(N), also known as log2(N).
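The pass-by-pass bundling described above corresponds to a bottom-up merge sort; a minimal sketch (function name ours) might look like:

```python
def merge_sort(items):
    """Bottom-up merge sort: each pass merges adjacent sorted bundles
    into bundles twice as large, so it needs about lg(N) passes."""
    result = list(items)          # the extra copy is the "double memory"
    width = 1                     # current bundle size
    while width < len(result):
        merged = []
        for start in range(0, len(result), 2 * width):
            left = result[start:start + width]
            right = result[start + width:start + 2 * width]
            # Merge two sorted bundles into one sorted bundle.
            out, i, j = [], 0, 0
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    out.append(left[i]); i += 1
                else:
                    out.append(right[j]); j += 1
            merged.extend(out + left[i:] + right[j:])
        result = merged
        width *= 2                # bundles double in size each pass
    return result
```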

  4. This algorithm makes multiple passes through the data set. In the first pass, it uses the first item in the list as a "pivot", and then it separates all the items into two groups on either side of the pivot: all the items that are less than the pivot (in no particular order), then the pivot item, then all the items that are greater than the pivot (again, in no particular order). The two groups might be the same size, as in the diagram below, but they might not be.

    [ group 1: items less than pivot 0 ]  pivot 0  [ group 2: items greater than pivot 0 ]
    In the next pass, it applies the same algorithm to each of the groups created in the previous pass, so we now have 4 groups and 3 pivots. Again, the groups are not guaranteed to be equal in size, and often aren't.

    [ group 1a ]  pivot 1  [ group 1b ]  pivot 0  [ group 2c ]  pivot 2  [ group 2d ]
    This process continues, creating more but smaller groups, until the groups contain just a single item and do not have to be processed further.

    In each pass, this algorithm parcels all items to one side or another of its current pivot, comparing and swapping roughly N items in each pass. How many passes does it need? Well, if the data start out sufficiently random, each group is cut roughly in half in each pass. Ideally, the group sizes go from 1000 to 500, to 250, to 125, to 63, to 32, to 16, to 8, to 4, to 2, to 1, so the number of passes would be lg(N), also known as log2(N). The actual performance is likely to be somewhat worse because the groups do not divide exactly equally. (The worst case is if the data are sorted before the process begins; the group sizes go from 1000 to 999, to 998, to 997, etc., requiring N passes.)
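A minimal recursive sketch of this algorithm (function name ours; for simplicity, items equal to the pivot go into the greater group):

```python
def quicksort(items):
    """Use the first item as the pivot, split the rest into a
    less-than group and a greater-than group, then recurse."""
    if len(items) <= 1:
        return items              # a single item needs no further processing
    pivot = items[0]
    less    = [x for x in items[1:] if x < pivot]
    greater = [x for x in items[1:] if x >= pivot]
    return quicksort(less) + [pivot] + quicksort(greater)
```

You can see the worst case directly in this sketch: if the data are already sorted, `less` is always empty, `greater` shrinks by only one item per call, and the recursion is N levels deep instead of lg(N).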


Sorting Performance

In the following questions, assume that you have been asked to sort an array of 1000 randomly ordered Art Catalog entries. (The ~ symbol means "roughly" or "approximately.")

If you use the small-scale, step-by-step animations page for this exercise, remember that the number of comparisons and copies it reports depends on the number of arrays you tell it to run on. If you say to run it on 10 arrays with 1000 entries each, the number of comparisons will be 10 times greater than if you run it on just a single array of that size.
  1. Which option best describes the performance of an Insertion Sort algorithm applied to this data set?
  2. Which option best describes the performance of a Merge Sort algorithm applied to this data set?
  3. Which option best describes the performance of a QuickSort algorithm applied to this data set?
  4. Which option best describes the performance of a Selection Sort algorithm applied to this data set?
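To sanity-check your answers, the approximate step counts from the algorithm descriptions above can be computed for N = 1000. These are rough estimates, not exact operation counts:

```python
import math

N = 1000
print("Selection Sort ~", N * N // 2)               # ~500,000 steps
print("Insertion Sort ~", N * N // 4)               # ~250,000 steps
print("Merge Sort     ~", round(N * math.log2(N)))  # ~10,000 steps
print("QuickSort      ~", round(N * math.log2(N)))  # ~10,000 (random data)
```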

Factors for Making Design & Implementation Decisions

Identify the algorithm that best matches each of the design considerations below.

  1. Relatively easy to implement; fine for small data sets but inefficient for large data sets.
  2. Relatively easy to implement; good for small data sets; excellent choice when the data set is already close to being sorted.
  3. Fast algorithm for large data sets, but requires double the memory space.
  4. Complicated algorithm to implement, but fast for large data sets. Poor choice, though, if the data set is (close to) already sorted, or if reverse-sorted.