← back to the schedule


Assignment #3
Parallel Programing with OpenMP




For this assignment you are required to implement a parallel version of the Count sort algorithm. Count sort is a simple sorting algorithm that can be implemented sequentially (in C++) as follows:


void CountSort (int *a, int n)
{
  int count;
  int *temp = new int[n];
    
  for (int i = 0; i < n; ++i)
  {
    count = 0;
    for (int j = 0; j < n; ++j)
    {
      if (a[j] < a[i])
        count++;
      else if (a[j] == a[i] && j < i)
        count++;
    }
        
    temp[count] = a[i];
  }
    
  std::copy(temp, temp+n, a);
  delete[] temp; 
}

The basic idea is that for each element a[i] in the list a, we count the number of elements in the list that are less than a[i]. then we insert a[i] into a temporary list temp using the index determined by the count. There’s a slight problem with this approach when the list contains equal elements, since they could get assigned to the same position in the temp list. The code above deals with this by incrementing the count for equal elements on the basis of the indices i and j. If both a[j] == a[i] and j < i, then we count a[j] as being less than a[i].

After the algorithm has completed, we override the original array with the content of temp using the function std::copy from the C++ algorithm library.

Note: in C++, if you place the code using namespace std just below your #include statemets, you don’t need to add std:: in front of std::copy, std::cout, std::vector, etc.

Download the sequential code called count_sort.cpp if you wish not to code yourself the count sort algorithm.


Instructions for the Report

As part of this assignment you will write a report that will contain the following:

  • Describe the design of your algorithm.
  • Provide an execution time analysis for the parallel algorithm and create a XY graph to show the sorting time (Y axis) as the number of processing units (in this case threads) increases (X axis). The figure only needs to show results for n = 8, 64, 512, and 4096 elements. A line chart like in Figure 1 will work better.
  • You can use a table like Table 1 to tabulate your time results.
  • Calculate speedup S and efficiency E for each value of n, where S and E are the speedups and efficiencies of the algorithm with problem size n running it using t = 2, 4, 8, 16, 32, and 64. Create two more charts to show S and E and how they behave (such as Figure 1 and Figure 2). The charts only need to show results for n = 8, 64, 512, and 4096. Include tables similar to Table 1 that contain all the speedups and efficiencies (for all the values of n and t).
  • Since you are using your resulting TS and TP to calculate S and E, come up with a formula for TP that can be used to explain the behavior of your parallel time.
    • Is there any overhead (TO) that you think needs to be included? What’s the time complexity (big Oh) of the sequential code? Hint: depending on how you design your parallel code, some overhead is created, for example, when you use a critical section (since only one thread at a time can enter).
  • Talk about the scalability of the algorithm. Remember that based on your efficiency E results, as t doubles and n stays fixed you can get an algorithm that is strongly scalable if E remains mostly constant. If you notice that as you double t and n, E remains mostly constant, then your algorithm is weakly scalable.
  • BONUS 5 pts: Is Amdahl’s law having an effect in the speedup of your algorithm? Come up with formulas that can better represent the TP and S for your implementation.
  • 4 pages max.




Instructions for the Code

  • Write the C++ parallel version of Count sort using OpenMP. Name your file para_count_sort.cpp
  • Run this program using a list of size n = 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and 4096. For each list of size n, use a team of threads of size t = 2, 4, 8, 16, 32, and 64 to run it. Also run each size n sequentially using count_sort.cpp (you’ll probably need to modify it a little bit).
  • Collect the sequential and parallel execution times (TS and TP) of the algorithm for each problem size n and team of threads t.
    • Remember to use omp_get_wtime() to time these executions.


Format and Due Date

Submit a PDF version of your report together with the modified count_sort.cpp and your para_count_sort.cpp codes (zip these three files) by Monday of Week 8 by 11:59PM via Kit (there will be a space where you can submit your PDF document). Make sure to include your name, the submission date, and course code both in the PDF and the code files.

Once you are done with the programing and testing of your parallel code, you might want to run it in Jigwe to see the difference between your personal computer and a HPC computer. For your report, you might present the timing results obtained from Jigwe.


EVALUATION

Assignment #3 is worth 15 points and will be submitted in groups of two or individually.

Have fun! And if you have any questions remember my email or contact me via Teams!



← back to the schedule