exceptions-summary.md
1 # Summary of papers about interrupts 2 3 ## Introduction 4 5 This summary is a result of analysis of journal articles which I made as a preparation to implement support for 6 interrupts, exceptions and speculation to core blocks. It looks like the choice of primary interrupts related to 7 structure determines the interrupt handling procedure. We have chosen a ROB, so there is a one classical implementation 8 of the ROB interrupt handling procedure. This procedure is used as a basis for the improvements, but most of the papers 9 are pretty old (1993-2001). Currently there is no much research going on about CPU interrupt handling and it is 10 considered that this problem is solved. Instead of that, there are some works, which try to implement precise interrupts 11 on GPUs, but due to different characteristics of the CPU and GPU, this research can not be applied in an easy way to our 12 project. 13 14 When I have prepared this overview, I have decided to look at articles from different times to check on what people were 15 working at that time. So there is probably a lot of other works which can be worth checking out. Specifically: 16 - W. Walker and H. G. Cragon. “Interrupt processing in concurrent processors.” IEEE Computer, vol. 28, no. 6, June 1995 17 - M. Moudgill and S. Vassiliadis. “Precise interrupts.” IEEE Micro, vol. 16, no. 1, pp. 58–67, February 1996 18 These two works present a survey of the topic of interrupts as a time of writing. 19 20 ## Interrupt handling in old PCs 21 22 - CDC 6600 - interrupt handling is done by inserting a jump instruction into the interrupt handler 23 - IBM360 - stop fetching and wait for all fetched instructions to be committed, then jump to interrupt handler 24 - CRAY-1 - similar to IBM360, but here latency can be even bigger due to vector instructions 25 26 ## Interrupt Handling for Out-of-Order Execution Processors 27 28 > Interrupt Handling for Out-of-Order Execution Processors 29 > H. C. Torng and Martin Day, 1993 30 31 This article describes one of the first probes for the implementation of exceptions in out-of-order processes. In that 32 time, ROB was a very new idea. The authors of this paper introduced Instruction Window (IW) as their proposition for the 33 implementation of interrupts in out-of-order processors. IW will store all dispatched instructions, which didn't 34 complete. In the case of an interrupt it will be a part of the context and will be copied to memory by the interrupt handler. 35 After restoring the context, all instructions from IW will be restarted so that the state of the CPU will be precise. 36 37 The idea of IW is similar to that of ROB, but there are few differences: 38 - ROB is not a part of the context 39 - ROB removes instructions in-order, IW allows to remove instructions out-of-order 40 - Tags in IW are one-hot-encoded, which will in current implementation cause big overhead. 41 42 So in its original form, IW is unfeasible for our processor, because it would require to double a job. Additionally, 43 this context of the ROB size will have to be stored on each entry to the interrupt handler, which can be costly 44 operations. But: 45 - maybe it is possible to reduce cost of IW saving by cooperation of CPU and OS? 46 - maybe cost of restoring IW is smaller than cost of re-fetching and scheduling one more time for old instructions? 47 48 Some interesting ideas from the paper: 49 - they propose NRP (No Return Point) implementation - a point in the pipeline after which an instruction can be removed, 50 it should allow instructions which are ending to save its results and remove itself from IW, to don't waste cycles on 51 context restore for executing this instruction one more time, NRP can be implemented for different interrupts and 52 instructions in different places to allow different interrupt latency 53 - for vector instructions IW remember how many elements are left to be processed, so after this context restore allows 54 to restore vector operation in the middle of the vector 55 56 57 ## In-Line Interrupt Handling for Software-Managed TLBs 58 59 > In-Line Interrupt Handling for Software-Managed TLBs 60 > Aamer Jaleel and Bruce Jacob, 2001 61 62 In that time, ROB was already a standard, so there was "classic" interrupt handling procedure. But in that time 63 there is still a lot of software managed TLBs, which cause an interrupt on each page miss. This of course causes a big 64 context switch overhead, so the authors of this paper try to reduce this penalty. As a base architecture they use Alpha and MIPS. They 65 concentrate on improving TLB miss efficiency and this has a property that interrupt handlers are very short (10-30 66 instructions). 67 68 The main idea is to not flush the whole pipeline on an interrupt, but instead to inline a handler code between 69 instructions of the user space program. They observed that the most important problem is that there can not be enough 70 resources to execute the interrupt handler without flushing the pipeline (e.g. in case when ROB is full there can be 71 live-lock). 72 - they assume that interrupt handler has known length 73 - they check if ROB, RS and RF have enough free resources to inline handler 74 - if the handler can be inlined, they do that, else they flush the pipeline 75 - in fly instruction return instruction is swapped to NOP and excepted instruction is reexecuted 76 - each executed instruction has one connected bit indicating the privilege level 77 78 They use some properties of the Alpha and MIPS architectures where they have interrupts vectors so the OS can insert 79 short, specific, handlers to correct the interrupt vector addresses. In contrast to that in current design it is a 80 tendency to have one interrupt handler in OS, which next decides which handling functions should be invoked. This makes 81 the handler longer, so it can be hard to make inlining (e.g. for risk V in Linux the first step is to save 82 each of general purpose registers, so on the start we have already 32 instructions in handler) 83 84 Ideas from the paper: 85 - Pipelines don't have to be flushed 86 - Additional HW resources reserved for only privileged mode can allow to execute privileged instruction without boring 87 with stopping user space program 88 - Interrupts and exceptions can be treated as branches and can be speculated 89 90 91 ## Hardware/software cost analysis of interrupt processing strategies: 92 93 > Hardware/software cost analysis of interrupt processing strategies 94 > Mansur H. Samadzadeh, Loai E. Garalnabi, 2001 95 96 Present overview for all main structures to handle exceptions, so: 97 - Instruction Window 98 - Checkpoint repair 99 - History file 100 - Reorder buffer 101 - Future file 102 103 ## iGPU: Exception Support and Speculative Execution on GPUs 104 105 > iGPU: Exception Support and Speculative Execution on GPUs Jaikrishnan Menon, Marc de Kruijf, Karthikeyan 106 > Sankaralingam, 2012 107 108 They try to introduce exceptions to GPU. To do that, they observe that: 109 - it is possible to find points where it is low number of live registers (e.g., boundaries of kernels) 110 - GPU execution has no side effects, so it can be safely rewritten 111 - GPU program can be recompiled in runtime 112 113 They introduce to the GPU program regions and subregions. Regions are parts of code which start at the beginning and end 114 with a small number of live registers. Subregion is a part of a region which has short length (not more than 32 115 instructions). In each subregion there is no instruction which overrides the output of the instruction from the previous 116 subregion. Each subregion end is an instruction barrier. 117 118 Exceptions and interrupts are handled by restarting execution of warp from the beginning of region. Wrong speculation is 119 handled by restarting execution of current subregion. In case when there are two exceptions in region, code is 120 dynamically recompiled and split into two regions to prevent live-locks. 121 122 123 ## Efficient Exception Handling Support for GPUs 124 125 > Efficient Exception Handling Support for GPUs 126 > Ivan Tanasic, Isaac Gelado, Marc Jorda, Eduard Ayguade, Nacho Navarro, 2017 127 128 One more time it is analysed the problem of precise interrupts for GPU. They observe that the only problematic 129 instructions are those related to memory access. All other instructions are guaranteed to end successfully or they kill 130 the program. This time there is more hardware modification and there are presented three propositions: 131 - stall warp until previous global load is solved - on GPU it is not so problematic because usually there is a lot of 132 other warp which wait to be executed. 133 - save instructions which can fail (global loads) to next reply them, input registers are not allowed to be 134 modified until this instruction don't claim that it doesn't fail 135 - operand logging - replay queue + storing operations 136 137 138 ## Others 139 140 > Reducing Exception Management Overhead with Software Restart Markers 141 > Mark Jerome Hampton, 2008 142 143 Mention a paper of Alli and Bailey [AB04] where there is no ROB. Instead of that in RAT there are FIFO-s which store 144 mapping between physical and logical registers and the age of instruction. If there is a need to raise an exception it 145 is checked if all younger instruction have already ended if not an exception wait for some cycles and repeat the check. 146 147 148 ## Summary 149 150 GPU research don't help us much, because the main assumption is that operations don't cause side effects 151 except of communication with main memory by load/store instructions. From research about CPUs, it looks like, 152 there is a "canonical" implementation of ROB that can be eventually introduced some small improvements, but 153 there aren't any very different procedures.