Please analyze the hazards in the assembly code that may cause the pipeline to stall, assuming all instructions and data
are in the instruction and data caches and do not cause stalls in the IF and MEM stages.
Essay question
(h) [5 points] Suppose we want to apply the same convolution kernel to many independent input sequences. Can we take
advantage of multithreading to increase performance? Describe how such multithreading can be done, or explain why it
cannot be done.