/ docs / miscellany / exceptions-summary.md
exceptions-summary.md
  1  # Summary of papers about interrupts
  2  
  3  ## Introduction
  4  
  5  This summary is a result of analysis of journal articles which I made as a preparation to implement support for
  6  interrupts, exceptions and speculation to core blocks. It looks like the choice of primary interrupts related to
  7  structure determines the interrupt handling procedure. We have chosen a ROB, so there is a one classical implementation
  8  of the ROB interrupt handling procedure. This procedure is used as a basis for the improvements, but most of the papers
  9  are pretty old (1993-2001). Currently there is no much research going on about CPU interrupt handling and it is
 10  considered that this problem is solved. Instead of that, there are some works, which try to implement precise interrupts
 11  on GPUs, but due to different characteristics of the CPU and GPU, this research can not be applied in an easy way to our
 12  project.
 13  
 14  When I have prepared this overview, I have decided to look at articles from different times to check on what people were
 15  working at that time. So there is probably a lot of other works which can be worth checking out. Specifically:
 16  - W. Walker and H. G. Cragon. “Interrupt processing in concurrent processors.” IEEE Computer, vol. 28, no. 6, June 1995
 17  - M. Moudgill and S. Vassiliadis. “Precise interrupts.” IEEE Micro, vol. 16, no. 1, pp. 58–67, February 1996
 18  These two works present a survey of the topic of interrupts as a time of writing.
 19  
 20  ## Interrupt handling in old PCs
 21  
 22  - CDC 6600 - interrupt handling is done by inserting a jump instruction into the interrupt handler
 23  - IBM360 - stop fetching and wait for all fetched instructions to be committed, then jump to interrupt handler
 24  - CRAY-1 - similar to IBM360, but here latency can be even bigger due to vector instructions
 25  
 26  ## Interrupt Handling for Out-of-Order Execution Processors
 27  
 28  > Interrupt Handling for Out-of-Order Execution Processors
 29  > H. C. Torng and Martin Day, 1993
 30  
 31  This article describes one of the first probes for the implementation of exceptions in out-of-order processes. In that
 32  time, ROB was a very new idea. The authors of this paper introduced Instruction Window (IW) as their proposition for the
 33  implementation of interrupts in out-of-order processors. IW will store all dispatched instructions, which didn't
 34  complete. In the case of an interrupt it will be a part of the context and will be copied to memory by the interrupt handler.
 35  After restoring the context, all instructions from IW will be restarted so that the state of the CPU will be precise.
 36  
 37  The idea of IW is similar to that of ROB, but there are few differences:
 38  - ROB is not a part of the context
 39  - ROB removes instructions in-order, IW allows to remove instructions out-of-order
 40  - Tags in IW are one-hot-encoded, which will in current implementation cause big overhead.
 41  
 42  So in its original form, IW is unfeasible for our processor, because it would require to double a job. Additionally,
 43  this context of the ROB size will have to be stored on each entry to the interrupt handler, which can be costly
 44  operations. But:
 45  - maybe it is possible to reduce cost of IW saving by cooperation of CPU and OS?
 46  - maybe cost of restoring IW is smaller than cost of re-fetching and scheduling one more time for old instructions?
 47  
 48  Some interesting ideas from the paper:
 49  - they propose NRP (No Return Point) implementation - a point in the pipeline after which an instruction can be removed,
 50      it should allow instructions which are ending to save its results and remove itself from IW, to don't waste cycles on
 51      context restore for executing this instruction one more time, NRP can be implemented for different interrupts and
 52      instructions in different places to allow different interrupt latency
 53  - for vector instructions IW remember how many elements are left to be processed, so after this context restore allows
 54      to restore vector operation in the middle of the vector
 55  
 56  
 57  ## In-Line Interrupt Handling for Software-Managed TLBs
 58  
 59  > In-Line Interrupt Handling for Software-Managed TLBs
 60  > Aamer Jaleel and Bruce Jacob, 2001
 61  
 62  In that time, ROB was already a standard, so there was "classic" interrupt handling procedure. But in that time
 63  there is still a lot of software managed TLBs, which cause an interrupt on each page miss. This of course causes a big
 64  context switch overhead, so the authors of this paper try to reduce this penalty. As a base architecture they use Alpha and MIPS. They
 65  concentrate on improving TLB miss efficiency and this has a property that interrupt handlers are very short (10-30
 66  instructions).
 67  
 68  The main idea is to not flush the whole pipeline on an interrupt, but instead to inline a handler code between
 69  instructions of the user space program. They observed that the most important problem is that there can not be enough
 70  resources to execute the interrupt handler without flushing the pipeline (e.g. in case when ROB is full there can be
 71  live-lock).
 72  - they assume that interrupt handler has known length
 73  - they check if ROB, RS and RF have enough free resources to inline handler
 74  - if the handler can be inlined, they do that, else they flush the pipeline
 75  - in fly instruction return instruction is swapped to NOP and excepted instruction is reexecuted
 76  - each executed instruction has one connected bit indicating the privilege level
 77  
 78  They use some properties of the Alpha and MIPS architectures where they have interrupts vectors so the OS can insert
 79  short, specific, handlers to correct the interrupt vector addresses. In contrast to that in current design it is a
 80  tendency to have one interrupt handler in OS, which next decides which handling functions should be invoked. This makes
 81  the handler longer, so it can be hard to make inlining (e.g. for risk V in Linux the first step is to save
 82  each of general purpose registers, so on the start we have already 32 instructions in handler)
 83  
 84  Ideas from the paper:
 85  - Pipelines don't have to be flushed
 86  - Additional HW resources reserved for only privileged mode can allow to execute privileged instruction without boring
 87    with stopping user space program
 88  - Interrupts and exceptions can be treated as branches and can be speculated
 89  
 90  
 91  ## Hardware/software cost analysis of interrupt processing strategies:
 92  
 93  > Hardware/software cost analysis of interrupt processing strategies
 94  > Mansur H. Samadzadeh, Loai E. Garalnabi, 2001
 95  
 96  Present overview for all main structures to handle exceptions, so:
 97  - Instruction Window
 98  - Checkpoint repair
 99  - History file
100  - Reorder buffer
101  - Future file
102  
103  ## iGPU: Exception Support and Speculative Execution on GPUs
104  
105  > iGPU: Exception Support and Speculative Execution on GPUs Jaikrishnan Menon, Marc de Kruijf, Karthikeyan
106  > Sankaralingam, 2012
107  
108  They try to introduce exceptions to GPU. To do that, they observe that:
109  - it is possible to find points where it is low number of live registers (e.g., boundaries of kernels)
110  - GPU execution has no side effects, so it can be safely rewritten
111  - GPU program can be recompiled in runtime
112  
113  They introduce to the GPU program regions and subregions. Regions are parts of code which start at the beginning and end
114  with a small number of live registers. Subregion is a part of a region which has short length (not more than 32
115  instructions). In each subregion there is no instruction which overrides the output of the instruction from the previous
116  subregion. Each subregion end is an instruction barrier.
117  
118  Exceptions and interrupts are handled by restarting execution of warp from the beginning of region. Wrong speculation is
119  handled by restarting execution of current subregion. In case when there are two exceptions in region, code is
120  dynamically recompiled and split into two regions to prevent live-locks.
121  
122  
123  ## Efficient Exception Handling Support for GPUs
124  
125  > Efficient Exception Handling Support for GPUs
126  > Ivan Tanasic, Isaac Gelado, Marc Jorda, Eduard Ayguade, Nacho Navarro, 2017
127  
128  One more time it is analysed the problem of precise interrupts for GPU. They observe that the only problematic
129  instructions are those related to memory access. All other instructions are guaranteed to end successfully or they kill
130  the program. This time there is more hardware modification and there are presented three propositions:
131  - stall warp until previous global load is solved - on GPU it is not so problematic because usually there is a lot of
132      other warp which wait to be executed.
133  - save instructions which can fail (global loads) to next reply them, input registers are not allowed to be
134      modified until this instruction don't claim that it doesn't fail
135  - operand logging - replay queue + storing operations
136  
137  
138  ## Others
139  
140  > Reducing Exception Management Overhead with Software Restart Markers
141  > Mark Jerome Hampton, 2008
142  
143  Mention a paper of Alli and Bailey [AB04] where there is no ROB. Instead of that in RAT there are FIFO-s which store
144  mapping between physical and logical registers and the age of instruction. If there is a need to raise an exception it
145  is checked if all younger instruction have already ended if not an exception wait for some cycles and repeat the check.
146  
147  
148  ## Summary
149  
150  GPU research don't help us much, because the main assumption is that operations don't cause side effects
151  except of communication with main memory by load/store instructions. From research about CPUs, it looks like,
152  there is a "canonical" implementation of ROB that can be eventually introduced some small improvements, but
153  there aren't any very different procedures.