Get Instant Help From 5000+ Experts For
question

Writing: Get your essay and assignment written from scratch by PhD expert

Rewriting: Paraphrase or rewrite your friend's essay with similar meaning at reduced cost

Editing:Proofread your work by experts and improve grade at Lowest cost

And Improve Your Grades
myassignmenthelp.com
loader
Phone no. Missing!

Enter phone no. to receive critical updates and urgent messages !

Attach file

Error goes here

Files Missing!

Please upload all relevant files for quick & complete assistance.

Guaranteed Higher Grade!
Free Quote
wave
Parallel Computing Problems
Answered

Assume we have the following task flow graph where every node is a task and an arrow from a task to another means dependencies. For example, task B cannot start before task A is done.

flow graph

The following table shows the time taken by each task in nano seconds.

Task

Time Taken

A

3

B

2

C

5

D

5

E

5

F

20

G

25

H

5

I

10

  1. Withoutusing the DAG method, that is without using T1/span, determine the minimum number of cores needed to get the highest performance [2 points]? Then, specify which tasks will execute on which core to get the best performance [6 points]. Finally, calculate the speedup given the number of cores you calculated [2 points]?

  2. Calculate the span [2 points]. Which tasks form the span [5 points]? If there are more than one solution, please state all of them.

  3. [8 points] Fill-up the following table given the number of cores to execute the above DAG.

#cores ?

1

2

4

8

speedup

efficiency


Answer the following questions about technology and hardware design.

  1. [4 points] Superscalar capability can make some instructions execute out-of-order (for example, instruction 3 may execute before instruction 2), why is that?

  2. [4 points] If we have a multicore processor with four cores, is there a possibility to execute more than four threads at the same time? If yes, explain how. If no, explain why not.

  3. [4 points] We have shared-memory systems and distributed-memory systems. With shared memory, it is easier for threads to exchange data. Then, why do we build distributed-memory systems?

  4. [4 points] In shared-memory system, as the number of nodes increases, how does this affect the overhead of coherence protocols? Justify your answer.

  5. [4 points] We saw four different categories of architecture in Flynn s taxonomy. What is the classification of single core with hyperthreading capability? Justify your choice.

  6. [4 points] In distributed-memory system, as the number of nodes increases, how does this affect the overhead of coherence protocols? Justify your answer.

Answer the following questions regarding parallel programming techniques.

  1. [8 points] If we look at a distributed-memory system with four nodes and found four threads executing in parallel (i.e., at the same time), how many processes do we have in the whole system? List all the possibilities and justify your choices. Assume each node consists of s single core with no superscalar or hyperthreading capabilities. That is, each core in the node executes one instruction at a time only.

  2. [6 points] Having too many threads (i.e. way more than the available cores) in an application may not be a good idea. State three reasons for that.

  3. [6 points] Having too few threads, less than or equal to the number of available cores, in an application is not a good idea either. State three reasons explaining why. Assume the problem size is big enough.

A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively).

The first code sequence has 10 instructions:  2 of A, 6 of B, and 2 of C

The second sequence has 8 instructions: 4 of A, 2 of B, and 2 of C.

  1. [4 pt] What is the CPI of the first sequence? Show all your calculations. b. [4pt] What is the CPI of the second sequence? Show all your calculations. c. [2 pt] Which sequence is faster based on CPI?

  2. [5pts] Which sequence is faster based on ET? Show all the steps.

Answer the following questions regarding MPI.

  1. [6 points] If each process in a communicator call MPI_Reduce twice. That is, two reduction operations one after the other. Can some of the processes finish the first reduction operation and move to the next one before the other processes finish theirs? Justify.
  2. [4 points] Why do MPI processes not suffer from cache coherence overhead?
  3. [6 points] Do Reduction operations in MPI take care of critical sections on the programmer s behalf? Justify.

support
close