The goal of this document is to give an overview of the exception handling
options in Breakpad.

# Basics

Exception handling is a mechanism designed to handle the occurrence of
exceptions: special conditions that change the normal flow of program execution.
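
On Windows, when Breakpad is enabled it calls `SetUnhandledExceptionFilter` to
replace the process's top-level exception filter, so Breakpad receives every
exception that is not otherwise handled. TODO: More on first- and second-chance
exceptions, and on vectored exception handling vs. try/catch.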

There are two main types of exception handling across all platforms: in-process
and out-of-process.

# In-Process

In-process exception handling is relatively simple, since the crashing process
handles crash reporting itself. However, it is generally considered unsafe to
write a minidump from a crashed process: key data structures could be corrupted,
or the stack on which the exception handler runs could have been overwritten.
For this reason, all platforms also support some level of out-of-process
exception handling.
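The overwritten-stack risk has a standard POSIX mitigation: reserve memory for an alternate signal stack with `sigaltstack()` and install the handler with `SA_ONSTACK`. The following is a minimal sketch of that technique, not Breakpad's code; `CrashHandler`, `InstallAltStackHandler`, and `HandlerRanOnAltStack` are hypothetical names. It only protects the handler's stack; corrupted heap data remains a risk, which is what out-of-process handling addresses.

```cpp
#include <signal.h>

#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Memory reserved up front for the handler, so it does not depend on the
// (possibly overflowed or corrupted) stack of the crashing thread.
static char* g_alt_stack = nullptr;
static std::size_t g_alt_stack_size = 0;

// Written by the handler and read after the signal has been delivered.
static volatile sig_atomic_t g_handler_ran = 0;
static volatile std::uintptr_t g_handler_frame = 0;

static void CrashHandler(int /*signo*/) {
  char marker = 0;  // a local variable lives on the stack the handler runs on
  g_handler_frame = reinterpret_cast<std::uintptr_t>(&marker);
  g_handler_ran = 1;
  // A real handler would hand off to the dump writer here, using only
  // async-signal-safe operations.
}

// Installs CrashHandler for `signo` and makes it run on the alternate stack.
bool InstallAltStackHandler(int signo) {
  g_alt_stack_size = static_cast<std::size_t>(SIGSTKSZ) * 4;
  g_alt_stack = static_cast<char*>(std::malloc(g_alt_stack_size));
  if (g_alt_stack == nullptr) return false;

  stack_t ss{};
  ss.ss_sp = g_alt_stack;
  ss.ss_size = g_alt_stack_size;
  ss.ss_flags = 0;
  if (sigaltstack(&ss, nullptr) != 0) return false;

  struct sigaction sa {};
  sa.sa_handler = CrashHandler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_ONSTACK;  // deliver this signal on the alternate stack
  return sigaction(signo, &sa, nullptr) == 0;
}

// True if the most recent handler invocation ran inside the alternate stack.
bool HandlerRanOnAltStack() {
  const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(g_alt_stack);
  const std::uintptr_t frame = g_handler_frame;
  return g_handler_ran != 0 && frame >= lo && frame < lo + g_alt_stack_size;
}
```

In a real crash handler the signal would be `SIGSEGV` or similar; `HandlerRanOnAltStack()` exists only to confirm where the handler ran.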

## Windows

For in-process exception handling, Breakpad creates a 'handler thread' at
startup that waits indefinitely on a semaphore. When this thread is woken, it
writes the minidump and signals the excepting thread that it may continue. The
exception filter then tells the OS to terminate the process if the minidump was
written successfully; otherwise, it lets exception handling continue.
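The handshake between the excepting thread and the handler thread can be modeled portably, assuming a mutex and condition variable in place of the Win32 semaphore. This is a sketch, not Breakpad's implementation: `HandlerThread`, `DecideAfterDump`, and the `write_dump` callback are hypothetical, and `FilterDecision` only mirrors what a Windows unhandled-exception filter expresses by returning `EXCEPTION_EXECUTE_HANDLER` (the process is terminated) or `EXCEPTION_CONTINUE_SEARCH` (handling continues).

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// What the exception filter tells the OS after the dump attempt.
enum class FilterDecision { kTerminateProcess, kContinueSearch };

FilterDecision DecideAfterDump(bool dump_written) {
  return dump_written ? FilterDecision::kTerminateProcess
                      : FilterDecision::kContinueSearch;
}

// A thread started up front that sleeps until a dump is requested.
class HandlerThread {
 public:
  explicit HandlerThread(std::function<bool()> write_dump)
      : write_dump_(std::move(write_dump)), thread_([this] { Run(); }) {}

  ~HandlerThread() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutting_down_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }

  // Runs on the excepting thread: wake the handler thread, then block
  // until it reports whether the dump was written.
  bool RequestDump() {
    std::unique_lock<std::mutex> lock(mutex_);
    requested_ = true;
    done_ = false;
    cv_.notify_all();
    cv_.wait(lock, [this] { return done_; });
    return result_;
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      // Stands in for waiting indefinitely on the semaphore.
      cv_.wait(lock, [this] { return requested_ || shutting_down_; });
      if (shutting_down_) return;
      requested_ = false;
      result_ = write_dump_();  // hypothetical minidump writer
      done_ = true;
      cv_.notify_all();  // let the excepting thread continue
    }
  }

  std::function<bool()> write_dump_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool requested_ = false;
  bool done_ = false;
  bool result_ = false;
  bool shutting_down_ = false;
  std::thread thread_;  // declared last: starts after the members above exist
};
```

Starting the thread at startup, rather than at crash time, presumably means the dump is written on an undamaged stack and without creating a thread inside a damaged process.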

# Out-of-Process

Out-of-process exception handling is more complicated than in-process exception
handling because of the need to set up a separate process that can read the
state of the crashing process.

## Windows

Breakpad uses two abstractions around the exception handler to make things work:
`CrashGenerationServer` and `CrashGenerationClient`. The constructor of each
takes the name of a named pipe.

During server startup, the server creates the named pipe and registers
callbacks for client connections. The named pipe is used for registration, and
all I/O on the pipe is done asynchronously. `OnPipeConnected` is called when a
client attempts to connect (that is, calls `CreateFile` on the pipe).
`OnPipeConnected` drives the state machine from `Initial` to `Connecting` and on
through `Reading`, `Reading_Done`, `Writing`, `Writing_Done`, `Reading_ACK`, and
`Disconnecting`.
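The sequence of states can be written down as a plain transition function. The enum below reuses the state names from the text (Breakpad's own identifiers differ), and `NextState` is a hypothetical helper that covers only the success path. Two assumptions: `Reading`, `Writing`, and `Reading_ACK` presumably correspond to reading the client's registration request, writing the server's reply, and reading the client's acknowledgement; and after `Disconnecting` the pipe returns to `Initial` to wait for the next client.

```cpp
// States named after the list in the text, in the same order.
enum class PipeState {
  kInitial,
  kConnecting,
  kReading,
  kReadingDone,
  kWriting,
  kWritingDone,
  kReadingAck,
  kDisconnecting,
};

// Returns the state that follows `state` once its asynchronous I/O step has
// completed successfully. Assumption: Disconnecting loops back to Initial.
PipeState NextState(PipeState state) {
  switch (state) {
    case PipeState::kInitial:       return PipeState::kConnecting;
    case PipeState::kConnecting:    return PipeState::kReading;
    case PipeState::kReading:       return PipeState::kReadingDone;
    case PipeState::kReadingDone:   return PipeState::kWriting;
    case PipeState::kWriting:       return PipeState::kWritingDone;
    case PipeState::kWritingDone:   return PipeState::kReadingAck;
    case PipeState::kReadingAck:    return PipeState::kDisconnecting;
    case PipeState::kDisconnecting: return PipeState::kInitial;
  }
  return PipeState::kInitial;  // not reached for valid enum values
}
```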

When registering callbacks, the client passes in two pointers:

1. A pointer to the `EXCEPTION_INFO` pointer.
1. A pointer to the `MDRawAssertionInfo`, which handles various non-exception
   failures like assertions.

The essence of registration is adding a `ClientInfo` object, which contains the
handles used for synchronization with the crashing process, to an array
maintained by the server. This is how the server keeps track of all the clients
on the system that have registered for minidumps. These handles are:

* `server_died` (mutex)
* `dump_requested` (event)
* `dump_generated` (event)
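A portable sketch of the server-side bookkeeping follows. `Handle` is a hypothetical stand-in for a Win32 `HANDLE`, and `ClientRegistry` with its methods is illustrative rather than Breakpad's API. Each `ClientInfo` is heap-allocated so its address stays stable while the array grows, which matters when that address is the callback context for the server's asynchronous waits.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a Win32 HANDLE.
using Handle = std::uintptr_t;

struct ClientInfo {
  std::uint32_t pid;
  Handle server_died;     // mutex
  Handle dump_requested;  // event
  Handle dump_generated;  // event
};

class ClientRegistry {
 public:
  // Stores a copy and returns a pointer that stays valid until Remove().
  ClientInfo* Add(const ClientInfo& info) {
    std::lock_guard<std::mutex> lock(mu_);
    clients_.push_back(std::make_unique<ClientInfo>(info));
    return clients_.back().get();
  }

  ClientInfo* Find(std::uint32_t pid) {
    std::lock_guard<std::mutex> lock(mu_);
    for (auto& client : clients_)
      if (client->pid == pid) return client.get();
    return nullptr;
  }

  // Drops the client, e.g. after it disconnects or its process exits.
  bool Remove(std::uint32_t pid) {
    std::lock_guard<std::mutex> lock(mu_);
    for (auto it = clients_.begin(); it != clients_.end(); ++it) {
      if ((*it)->pid == pid) {
        clients_.erase(it);
        return true;
      }
    }
    return false;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mu_);
    return clients_.size();
  }

 private:
  mutable std::mutex mu_;
  std::vector<std::unique_ptr<ClientInfo>> clients_;
};
```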

The server registers asynchronous waits on these events, with the `ClientInfo`
object as the callback context. When the client sets the `dump_requested`
event, the server's `OnDumpRequested()` callback is called. The server uses the
handles inside `ClientInfo` to communicate with the client process. Once the
client has set the event, it waits for either of two objects:

1. the `dump_generated` event
1. the `server_died` mutex

In the end, the handles are duplicated ("duped") into the client process. The
client uses `SetEvent` on `dump_requested` to request a dump, then waits on the
`dump_generated` event or the `server_died` mutex.
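The client's wait for whichever of the two objects fires first can be modeled with one mutex and condition variable guarding three flags. This is a sketch under that assumption; `SharedSignals`, `RequestDumpAndWait`, `ServeOneRequest`, `MarkServerDied`, and `Simulate` are hypothetical. On Windows the `server_died` case would presumably surface as an abandoned-mutex result from the wait if the server exits while owning the mutex.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Stand-ins for the three synchronization objects held in ClientInfo.
struct SharedSignals {
  std::mutex mu;
  std::condition_variable cv;
  bool dump_requested = false;  // event set by the client
  bool dump_generated = false;  // event set by the server
  bool server_died = false;     // models the server_died mutex firing
};

enum class DumpOutcome { kDumpGenerated, kServerDied };

// Client: set dump_requested, then wait until either object is signaled.
DumpOutcome RequestDumpAndWait(SharedSignals& s) {
  std::unique_lock<std::mutex> lock(s.mu);
  s.dump_requested = true;
  s.cv.notify_all();
  s.cv.wait(lock, [&s] { return s.dump_generated || s.server_died; });
  return s.dump_generated ? DumpOutcome::kDumpGenerated
                          : DumpOutcome::kServerDied;
}

// Server: the registered wait fires and the OnDumpRequested() work runs.
void ServeOneRequest(SharedSignals& s, const std::function<void()>& write_dump) {
  {
    std::unique_lock<std::mutex> lock(s.mu);
    s.cv.wait(lock, [&s] { return s.dump_requested; });
    s.dump_requested = false;
  }
  write_dump();  // hypothetical minidump writer, called without the lock held
  {
    std::lock_guard<std::mutex> lock(s.mu);
    s.dump_generated = true;
  }
  s.cv.notify_all();
}

// Models the server going away before it answers.
void MarkServerDied(SharedSignals& s) {
  {
    std::lock_guard<std::mutex> lock(s.mu);
    s.server_died = true;
  }
  s.cv.notify_all();
}

// Runs client and server on separate threads. Returns which object woke the
// client and reports through `dump_written` whether the writer ran.
DumpOutcome Simulate(bool server_alive, bool* dump_written) {
  SharedSignals s;
  std::thread server([&] {
    if (server_alive) {
      ServeOneRequest(s, [&] { *dump_written = true; });
    } else {
      MarkServerDied(s);
    }
  });
  DumpOutcome outcome = RequestDumpAndWait(s);
  server.join();
  return outcome;
}
```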

## Linux

### Current Status

As of July 2011, Linux had a minidump generator that was not entirely
out-of-process. The minidump was generated from a separate process, but one that
shared an address space, file descriptors, signal handlers, and much else with
the crashing process. It worked by using the `clone()` system call to duplicate
the crashing process, and then used `ptrace()` and the `/proc` file system to
retrieve the information required to write the minidump. Since then, Breakpad
has updated Linux exception handling to provide more of the benefits of
out-of-process report generation.
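Part of that information comes from text files under `/proc/<pid>/`, such as the memory map in `/proc/<pid>/maps`, where each line gives an address range, permissions, file offset, device, inode, and an optional path. A minimal Linux-only parsing sketch follows; `Mapping`, `ParseMapsLine`, and `ReadMappings` are hypothetical names, not Breakpad's.

```cpp
#include <cstdint>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct Mapping {
  std::uint64_t start = 0;   // first address of the mapping
  std::uint64_t end = 0;     // one past the last address
  std::string perms;         // e.g. "r-xp"
  std::uint64_t offset = 0;  // offset into the mapped file
  std::string dev;           // "major:minor"
  std::uint64_t inode = 0;
  std::string path;          // empty for anonymous mappings
};

// Parses one line such as
// "00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat".
std::optional<Mapping> ParseMapsLine(const std::string& line) {
  std::istringstream in(line);
  Mapping m;
  char dash = 0;
  // Addresses and offset are hexadecimal; the inode is decimal.
  in >> std::hex >> m.start >> dash >> m.end >> m.perms >> m.offset >> m.dev >>
      std::dec >> m.inode;
  if (!in || dash != '-') return std::nullopt;
  in >> std::ws;             // skip spaces before the optional path
  std::getline(in, m.path);  // rest of the line; the path may contain spaces
  return m;
}

// Reads every mapping of `pid` ("self" for the current process).
std::vector<Mapping> ReadMappings(const std::string& pid) {
  std::vector<Mapping> result;
  std::ifstream maps("/proc/" + pid + "/maps");
  std::string line;
  while (std::getline(maps, line)) {
    if (auto m = ParseMapsLine(line)) result.push_back(*m);
  }
  return result;
}
```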

### Proposed Design

#### Overview

Breakpad would use a per-user daemon to write out minidumps; the daemon would
not share an address space with, interact with, or depend on the crashing
process. Starting a new, separate process every time a user launches a
Breakpad-enabled process is undesirable. A single daemon per machine is
unacceptable for security reasons: one user could initiate minidump generation
for another user's process.
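On Linux, a per-user daemon could enforce this boundary by checking each connecting peer's credentials with the `SO_PEERCRED` socket option, in addition to keeping its socket in a directory only that user can write to. The `breakpad-<uid>.sock` naming scheme and both helpers are hypothetical; this is a sketch of one possible check, not part of the proposal.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <string>

// Hypothetical naming scheme: one socket per user, inside a directory that
// only that user can write to (for example the user's runtime directory).
std::string PerUserSocketPath(const std::string& runtime_dir, uid_t uid) {
  return runtime_dir + "/breakpad-" + std::to_string(uid) + ".sock";
}

// Returns true only if the process on the other end of the connected UNIX
// domain socket `fd` runs as `expected_uid`. Linux-specific (SO_PEERCRED).
bool PeerIsUser(int fd, uid_t expected_uid) {
  struct ucred cred {};
  socklen_t len = sizeof(cred);
  if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0) return false;
  return cred.uid == expected_uid;
}
```

A daemon would call `PeerIsUser()` on each descriptor returned by `accept()` and close the connection if it returns false.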

#### Client/Server Communication

On Breakpad initialization in a process, the initializer would check whether the
daemon is running and, if not, start it. The race condition between the check
and the initialization is not a problem, because each daemon can check whether
the IPC endpoint already exists and whether a server is listening on it. Even if
multiple copies of the daemon try to `bind()` the socket to the same filesystem
name, all but one will fail and can terminate.

Checking for a listening server, and not only for the endpoint, matters for
error handling. Linux does not remove the filesystem representation of a UNIX
domain socket even after both endpoints terminate, so checking for existence is
not strong enough. However, checking the process list or sending a ping on the
socket can handle this.
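Daemon start-up can combine both points: probe the endpoint first, treat a `connect()` failure with `ECONNREFUSED` as a stale entry left by a dead daemon, and rely on `bind()` failing with `EADDRINUSE` to settle any remaining race. The sketch below assumes hypothetical `ProbeSocket` and `ClaimEndpoint` helpers and uses the connection attempt itself as the ping; a fuller version could also exchange a message with the server.

```cpp
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <string>

enum class Endpoint { kMissing, kStale, kLive, kError };

// Fills `addr` for `path`; false if the path does not fit in sun_path.
static bool MakeAddress(const std::string& path, sockaddr_un* addr) {
  std::memset(addr, 0, sizeof(*addr));
  addr->sun_family = AF_UNIX;
  if (path.size() >= sizeof(addr->sun_path)) return false;
  std::memcpy(addr->sun_path, path.c_str(), path.size() + 1);
  return true;
}

// The "ping": try to connect. A socket file with no listener behind it
// makes connect() fail with ECONNREFUSED.
Endpoint ProbeSocket(const std::string& path) {
  sockaddr_un addr;
  if (!MakeAddress(path, &addr)) return Endpoint::kError;
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd < 0) return Endpoint::kError;
  int rc = connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
  int err = errno;
  close(fd);
  if (rc == 0) return Endpoint::kLive;
  if (err == ENOENT) return Endpoint::kMissing;
  if (err == ECONNREFUSED) return Endpoint::kStale;
  return Endpoint::kError;
}

// Daemon start-up. Returns a listening fd if this instance should serve,
// or -1 if another daemon already serves `path` (or on error).
int ClaimEndpoint(const std::string& path) {
  Endpoint state = ProbeSocket(path);
  if (state == Endpoint::kLive || state == Endpoint::kError) return -1;
  // Caveat: unlinking a stale entry reopens a small race between daemons
  // that both saw it as stale; a real daemon would need to guard this step.
  if (state == Endpoint::kStale) unlink(path.c_str());

  sockaddr_un addr;
  if (!MakeAddress(path, &addr)) return -1;
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  // When several daemons race to this point, bind() succeeds for exactly
  // one of them; the rest get EADDRINUSE and can exit.
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      listen(fd, 16) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```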

Breakpad uses UNIX domain sockets because they support full-duplex
communication (unlike Windows named pipes, named pipes on Linux are
half-duplex), and the kernel automatically creates a private channel between the
client and server once the client calls `connect()`.
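Both properties show up with a single listening socket: each `connect()` produces its own descriptor from `accept()`, and either end can read and write on it. In this sketch the helpers (`OpenSocket`, `EchoOnce`, `DemoPrivateChannels`) are hypothetical and the one-byte messages are purely illustrative; two clients each get back exactly the byte they sent.

```cpp
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#include <cstring>
#include <string>

// Fills `addr` for `path`; false if the path does not fit in sun_path.
static bool MakeAddress(const std::string& path, sockaddr_un* addr) {
  std::memset(addr, 0, sizeof(*addr));
  addr->sun_family = AF_UNIX;
  if (path.size() >= sizeof(addr->sun_path)) return false;
  std::memcpy(addr->sun_path, path.c_str(), path.size() + 1);
  return true;
}

// Server: bind and listen at `path`. Client: connect to it. Returns fd or -1.
static int OpenSocket(const std::string& path, bool server) {
  sockaddr_un addr;
  if (!MakeAddress(path, &addr)) return -1;
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  auto* sa = reinterpret_cast<sockaddr*>(&addr);
  bool ok = server ? bind(fd, sa, sizeof(addr)) == 0 && listen(fd, 8) == 0
                   : connect(fd, sa, sizeof(addr)) == 0;
  if (!ok) {
    close(fd);
    return -1;
  }
  return fd;
}

// Server side: accept one client, read one byte and send it back on the same
// descriptor (full duplex), then close that client's private channel.
static bool EchoOnce(int listen_fd) {
  int conn = accept(listen_fd, nullptr, nullptr);
  if (conn < 0) return false;
  char byte = 0;
  bool ok = read(conn, &byte, 1) == 1 && write(conn, &byte, 1) == 1;
  close(conn);
  return ok;
}

// Two clients share one listening socket, yet each reads back only the byte
// it sent, because every connect() gets its own kernel channel.
bool DemoPrivateChannels(const std::string& path) {
  int listener = OpenSocket(path, /*server=*/true);
  if (listener < 0) return false;
  int a = OpenSocket(path, /*server=*/false);
  int b = OpenSocket(path, /*server=*/false);
  char got_a = 0;
  char got_b = 0;
  bool ok = a >= 0 && b >= 0 && write(a, "A", 1) == 1 &&
            write(b, "B", 1) == 1 && EchoOnce(listener) &&
            EchoOnce(listener) && read(a, &got_a, 1) == 1 &&
            read(b, &got_b, 1) == 1 && got_a == 'A' && got_b == 'B';
  if (a >= 0) close(a);
  if (b >= 0) close(b);
  close(listener);
  unlink(path.c_str());
  return ok;
}
```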

#### Minidump Generation

Breakpad could reuse the current system, based on `ptrace()` and `/proc`, within
the daemon executable.

Overall, the operations look like this:

1. Signal from the OS indicating a crash.
1. The signal handler suspends all threads except itself.
1. The signal handler sends a `CRASH_DUMP_REQUEST` message to the server and
   waits for a response.
1. The server inspects the crashing process.
1. The server writes the minidump to disk asynchronously.
1. The server responds, indicating that inspection is done.
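The request/response part of this sequence can be sketched over a connected socket pair. Assumptions: the wire format, the reply types `CRASH_DUMP_INSPECTED` and `CRASH_DUMP_FAILED`, and all function names are hypothetical (only `CRASH_DUMP_REQUEST` comes from the design); `SOCK_SEQPACKET` is chosen so that each `read()` returns one whole message; thread suspension and the real `ptrace()`/`/proc` inspection are not shown.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical wire format. Only CRASH_DUMP_REQUEST is named in the design;
// the two reply types are made up for this sketch.
enum MessageType : std::uint32_t {
  CRASH_DUMP_REQUEST = 1,
  CRASH_DUMP_INSPECTED = 2,
  CRASH_DUMP_FAILED = 3,
};

struct Message {
  std::uint32_t type;
  std::int32_t pid;  // crashing process
  std::int32_t tid;  // crashing thread
};

// SOCK_SEQPACKET keeps message boundaries, so one read() returns one Message.
static bool SendMsg(int fd, const Message& m) {
  return write(fd, &m, sizeof(m)) == static_cast<ssize_t>(sizeof(m));
}
static bool ReceiveMsg(int fd, Message* m) {
  return read(fd, m, sizeof(*m)) == static_cast<ssize_t>(sizeof(*m));
}

// Client: called from the signal handler once the other threads are
// suspended. write() and read() are async-signal-safe.
bool RequestCrashDump(int fd, std::int32_t pid, std::int32_t tid) {
  Message request{CRASH_DUMP_REQUEST, pid, tid};
  Message reply{};
  return SendMsg(fd, request) && ReceiveMsg(fd, &reply) &&
         reply.type == CRASH_DUMP_INSPECTED;
}

// Server: `inspect` stands in for the ptrace() and /proc work.
bool ServeCrashDump(
    int fd, const std::function<bool(std::int32_t, std::int32_t)>& inspect) {
  Message request{};
  if (!ReceiveMsg(fd, &request) || request.type != CRASH_DUMP_REQUEST)
    return false;
  bool ok = inspect(request.pid, request.tid);
  // A real server would now start writing the minidump asynchronously.
  Message reply{ok ? CRASH_DUMP_INSPECTED : CRASH_DUMP_FAILED, request.pid,
                request.tid};
  return SendMsg(fd, reply);
}

// Runs both ends over a socket pair with the server on its own thread.
// True if the client got a success reply and the server saw its pid/tid.
bool SimulateCrashDumpExchange(std::int32_t pid, std::int32_t tid,
                               bool inspection_ok) {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) != 0) return false;
  std::int32_t seen_pid = -1;
  std::int32_t seen_tid = -1;
  std::thread server([&] {
    ServeCrashDump(fds[1], [&](std::int32_t p, std::int32_t t) {
      seen_pid = p;
      seen_tid = t;
      return inspection_ok;
    });
  });
  bool replied_ok = RequestCrashDump(fds[0], pid, tid);
  server.join();
  close(fds[0]);
  close(fds[1]);
  return replied_ok && seen_pid == pid && seen_tid == tid;
}
```

Replying as soon as inspection finishes, while the dump is written in the background, presumably lets the crashing process exit without waiting for disk I/O.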

## Mac OS X

Out-of-process exception handling is fully supported on Mac.