Cradicle Explorer

/ docs / miscellany / exceptions-summary.md

exceptions-summary.md

1 # Summary of papers about interrupts
2
3 ## Introduction
4
5 This summary is a result of analysis of journal articles which I made as a preparation to implement support for
6 interrupts, exceptions and speculation to core blocks. It looks like the choice of primary interrupts related to
7 structure determines the interrupt handling procedure. We have chosen a ROB, so there is a one classical implementation
8 of the ROB interrupt handling procedure. This procedure is used as a basis for the improvements, but most of the papers
9 are pretty old (1993-2001). Currently there is no much research going on about CPU interrupt handling and it is
10 considered that this problem is solved. Instead of that, there are some works, which try to implement precise interrupts
11 on GPUs, but due to different characteristics of the CPU and GPU, this research can not be applied in an easy way to our
12 project.
13
14 When I have prepared this overview, I have decided to look at articles from different times to check on what people were
15 working at that time. So there is probably a lot of other works which can be worth checking out. Specifically:
16 - W. Walker and H. G. Cragon. “Interrupt processing in concurrent processors.” IEEE Computer, vol. 28, no. 6, June 1995
17 - M. Moudgill and S. Vassiliadis. “Precise interrupts.” IEEE Micro, vol. 16, no. 1, pp. 58–67, February 1996
18 These two works present a survey of the topic of interrupts as a time of writing.
19
20 ## Interrupt handling in old PCs
21
22 - CDC 6600 - interrupt handling is done by inserting a jump instruction into the interrupt handler
23 - IBM360 - stop fetching and wait for all fetched instructions to be committed, then jump to interrupt handler
24 - CRAY-1 - similar to IBM360, but here latency can be even bigger due to vector instructions
25
26 ## Interrupt Handling for Out-of-Order Execution Processors
27
28 > Interrupt Handling for Out-of-Order Execution Processors
29 > H. C. Torng and Martin Day, 1993
30
31 This article describes one of the first probes for the implementation of exceptions in out-of-order processes. In that
32 time, ROB was a very new idea. The authors of this paper introduced Instruction Window (IW) as their proposition for the
33 implementation of interrupts in out-of-order processors. IW will store all dispatched instructions, which didn't
34 complete. In the case of an interrupt it will be a part of the context and will be copied to memory by the interrupt handler.
35 After restoring the context, all instructions from IW will be restarted so that the state of the CPU will be precise.
36
37 The idea of IW is similar to that of ROB, but there are few differences:
38 - ROB is not a part of the context
39 - ROB removes instructions in-order, IW allows to remove instructions out-of-order
40 - Tags in IW are one-hot-encoded, which will in current implementation cause big overhead.
41
42 So in its original form, IW is unfeasible for our processor, because it would require to double a job. Additionally,
43 this context of the ROB size will have to be stored on each entry to the interrupt handler, which can be costly
44 operations. But:
45 - maybe it is possible to reduce cost of IW saving by cooperation of CPU and OS?
46 - maybe cost of restoring IW is smaller than cost of re-fetching and scheduling one more time for old instructions?
47
48 Some interesting ideas from the paper:
49 - they propose NRP (No Return Point) implementation - a point in the pipeline after which an instruction can be removed,
50 it should allow instructions which are ending to save its results and remove itself from IW, to don't waste cycles on
51 context restore for executing this instruction one more time, NRP can be implemented for different interrupts and
52 instructions in different places to allow different interrupt latency
53 - for vector instructions IW remember how many elements are left to be processed, so after this context restore allows
54 to restore vector operation in the middle of the vector
55
56
57 ## In-Line Interrupt Handling for Software-Managed TLBs
58
59 > In-Line Interrupt Handling for Software-Managed TLBs
60 > Aamer Jaleel and Bruce Jacob, 2001
61
62 In that time, ROB was already a standard, so there was "classic" interrupt handling procedure. But in that time
63 there is still a lot of software managed TLBs, which cause an interrupt on each page miss. This of course causes a big
64 context switch overhead, so the authors of this paper try to reduce this penalty. As a base architecture they use Alpha and MIPS. They
65 concentrate on improving TLB miss efficiency and this has a property that interrupt handlers are very short (10-30
66 instructions).
67
68 The main idea is to not flush the whole pipeline on an interrupt, but instead to inline a handler code between
69 instructions of the user space program. They observed that the most important problem is that there can not be enough
70 resources to execute the interrupt handler without flushing the pipeline (e.g. in case when ROB is full there can be
71 live-lock).
72 - they assume that interrupt handler has known length
73 - they check if ROB, RS and RF have enough free resources to inline handler
74 - if the handler can be inlined, they do that, else they flush the pipeline
75 - in fly instruction return instruction is swapped to NOP and excepted instruction is reexecuted
76 - each executed instruction has one connected bit indicating the privilege level
77
78 They use some properties of the Alpha and MIPS architectures where they have interrupts vectors so the OS can insert
79 short, specific, handlers to correct the interrupt vector addresses. In contrast to that in current design it is a
80 tendency to have one interrupt handler in OS, which next decides which handling functions should be invoked. This makes
81 the handler longer, so it can be hard to make inlining (e.g. for risk V in Linux the first step is to save
82 each of general purpose registers, so on the start we have already 32 instructions in handler)
83
84 Ideas from the paper:
85 - Pipelines don't have to be flushed
86 - Additional HW resources reserved for only privileged mode can allow to execute privileged instruction without boring
87 with stopping user space program
88 - Interrupts and exceptions can be treated as branches and can be speculated
89
90
91 ## Hardware/software cost analysis of interrupt processing strategies:
92
93 > Hardware/software cost analysis of interrupt processing strategies
94 > Mansur H. Samadzadeh, Loai E. Garalnabi, 2001
95
96 Present overview for all main structures to handle exceptions, so:
97 - Instruction Window
98 - Checkpoint repair
99 - History file
100 - Reorder buffer
101 - Future file
102
103 ## iGPU: Exception Support and Speculative Execution on GPUs
104
105 > iGPU: Exception Support and Speculative Execution on GPUs Jaikrishnan Menon, Marc de Kruijf, Karthikeyan
106 > Sankaralingam, 2012
107
108 They try to introduce exceptions to GPU. To do that, they observe that:
109 - it is possible to find points where it is low number of live registers (e.g., boundaries of kernels)
110 - GPU execution has no side effects, so it can be safely rewritten
111 - GPU program can be recompiled in runtime
112
113 They introduce to the GPU program regions and subregions. Regions are parts of code which start at the beginning and end
114 with a small number of live registers. Subregion is a part of a region which has short length (not more than 32
115 instructions). In each subregion there is no instruction which overrides the output of the instruction from the previous
116 subregion. Each subregion end is an instruction barrier.
117
118 Exceptions and interrupts are handled by restarting execution of warp from the beginning of region. Wrong speculation is
119 handled by restarting execution of current subregion. In case when there are two exceptions in region, code is
120 dynamically recompiled and split into two regions to prevent live-locks.
121
122
123 ## Efficient Exception Handling Support for GPUs
124
125 > Efficient Exception Handling Support for GPUs
126 > Ivan Tanasic, Isaac Gelado, Marc Jorda, Eduard Ayguade, Nacho Navarro, 2017
127
128 One more time it is analysed the problem of precise interrupts for GPU. They observe that the only problematic
129 instructions are those related to memory access. All other instructions are guaranteed to end successfully or they kill
130 the program. This time there is more hardware modification and there are presented three propositions:
131 - stall warp until previous global load is solved - on GPU it is not so problematic because usually there is a lot of
132 other warp which wait to be executed.
133 - save instructions which can fail (global loads) to next reply them, input registers are not allowed to be
134 modified until this instruction don't claim that it doesn't fail
135 - operand logging - replay queue + storing operations
136
137
138 ## Others
139
140 > Reducing Exception Management Overhead with Software Restart Markers
141 > Mark Jerome Hampton, 2008
142
143 Mention a paper of Alli and Bailey [AB04] where there is no ROB. Instead of that in RAT there are FIFO-s which store
144 mapping between physical and logical registers and the age of instruction. If there is a need to raise an exception it
145 is checked if all younger instruction have already ended if not an exception wait for some cycles and repeat the check.
146
147
148 ## Summary
149
150 GPU research don't help us much, because the main assumption is that operations don't cause side effects
151 except of communication with main memory by load/store instructions. From research about CPUs, it looks like,
152 there is a "canonical" implementation of ROB that can be eventually introduced some small improvements, but
153 there aren't any very different procedures.