(h) [5 points] Suppose we want to apply the same convolution kernel to many independent input sequences. Can we take advantage of multithreading to increase performance? Please describe how such multithreading can be done, or explain why it cannot be done.

(j) [5 points] Let us estimate the processor performance with the roofline model shown in the figure below. If you know the arithmetic intensity of a computing kernel, then you know that its attainable performance cannot be higher than the roofline. For example, Kernel 1 in the figure can attain no more than 8 GFLOPS (giga floating-point operations per second), and Kernel 2 can attain up to 16 GFLOPS. Please calculate the arithmetic intensity of the convolution kernel and estimate its attainable performance when there is no data cache. Then, discuss what would happen to the attainable performance of our convolution kernel when a data cache is added to the processor. Furthermore, discuss what would happen to the roofline and to the attainable performance of our convolution kernel if a vector unit is added to the processor to provide 4 times the attainable performance.
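
For reference, the roofline bound described above can be written as a one-line function: attainable performance is the smaller of the compute roof and the memory roof (arithmetic intensity times memory bandwidth). The sketch below is only an illustration. The function name `attainable_gflops` and the machine parameters `PEAK_GFLOPS` and `BANDWIDTH_GB_PER_S` are assumptions chosen for the example, not values read from the figure.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gb_per_s):
    """Roofline bound: the lower of the compute roof and the memory roof.

    arithmetic_intensity is in FLOPs per byte moved to or from memory.
    """
    memory_roof = arithmetic_intensity * bandwidth_gb_per_s
    return min(peak_gflops, memory_roof)


# Hypothetical machine parameters (assumptions, NOT taken from the figure).
PEAK_GFLOPS = 16.0
BANDWIDTH_GB_PER_S = 16.0

if __name__ == "__main__":
    for ai in (0.25, 0.5, 1.0, 2.0):
        base = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GB_PER_S)
        # A 4x higher compute roof, with memory bandwidth left unchanged.
        wider = attainable_gflops(ai, 4 * PEAK_GFLOPS, BANDWIDTH_GB_PER_S)
        print(f"AI={ai}: base={base} GFLOPS, 4x compute roof={wider} GFLOPS")
```

Under these assumed parameters, raising the compute roof changes the bound only for kernels whose arithmetic intensity already places them at or beyond the ridge point. A kernel on the sloped, memory-bound part of the roofline keeps the same bound.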
