# Breakpad Processor Library

## Objective

The Breakpad processor library is an open-source framework to access the
information contained within crash dumps for multiple platforms, and to use that
information to produce stack traces showing the call chain of each thread in a
process. After processing, this data is made available to users of the library.

## Background

The Breakpad processor is intended to sit at the core of a comprehensive
crash-reporting system that does not require debugging information to be
provided to those running applications being monitored. Some existing
crash-reporting systems, such as [GNOME](http://www.gnome.org/)’s Bug-Buddy and
[Apple](http://www.apple.com/)’s
[CrashReporter](http://developer.apple.com/technotes/tn2004/tn2123.html),
require symbolic information to be present on the end user’s computer; in the
case of CrashReporter, the reports are transmitted only to Apple, not to
third-party developers. Other systems, such as
[Microsoft](http://www.microsoft.com/)’s
[Windows Error Reporting](http://msdn.microsoft.com/isv/resources/wer/) and
SupportSoft’s Talkback, transmit only a snapshot of a crashed process’ state,
which can later be combined with symbolic debugging information without the need
for it to be present on end users’ computers. Because symbolic debugging
information consumes a large amount of space and is otherwise not needed during
the normal operation of software, and because some developers are reluctant to
release debugging symbols to their customers, Breakpad follows the latter
approach.

We know of no currently-maintained crash-reporting systems that meet our
requirements, which are to:

*   allow for symbols to be separate from the application,
*   handle crash reports from multiple platforms,
*   allow developers to operate their own crash-reporting platform, and to
*   be open-source.

Windows Error Reporting only functions for Microsoft products, and requires the
involvement of Microsoft’s servers. Talkback, while cross-platform, has not been
maintained and at this point does not support Mac OS X on x86, which we consider
to be a significant platform. Talkback is also closed-source commercial
software, and has very specific requirements for its server platform.

We are aware of Windows-only crash-reporting systems that leverage Microsoft’s
debugging interfaces. Such systems, even if extended to support dumps from other
platforms, are tied to using Windows for at least a portion of the processor
platform.

## Overview

The Breakpad processor itself is written in standard C++ and will work on a
variety of platforms. The dumps it accepts may also have been created on a
variety of systems. The library is able to combine dumps with symbolic debugging
information to create stack traces that include function signatures. The
processor library includes simple command-line tools to examine dumps and
process them, producing stack traces. It also exposes several layers of APIs
enabling crash-reporting systems to be built around the Breakpad processor.

## Detailed Design

### Dump Files

In the processor, the dump data is of primary significance. Dumps typically
contain:

*   CPU context (register data) as it was at the time the crash occurred, and an
    indication of which thread caused the crash. General-purpose registers are
    included, as are special-purpose registers such as the instruction pointer
    (program counter).
*   Information about each thread of execution within a crashed process,
    including:
    *   The memory region used for each thread’s stack.
    *   CPU context for each thread, which for various reasons is not the same
        as the crash context in the case of the crashed thread.
*   A list of loaded code segments (or modules), including:
    *   The name of the file (`.so`, `.exe`, `.dll`, etc.) which provides the
        code.
    *   The boundaries of the memory region in which the code segment is visible
        to the process.
    *   A reference to the debugging information for the code module, when such
        information is available.

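The module boundaries listed above are what allow an address taken from a
thread's context or stack to be attributed to a code module. A minimal sketch of
that lookup follows; `ModuleInfo` and `FindModule` are hypothetical names chosen
for illustration, not Breakpad classes, and each module is assumed to occupy the
half-open range `[base, base + size)`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one loaded module, as listed in a dump.
struct ModuleInfo {
  std::string code_file;  // e.g. "app.exe" or "libfoo.so"
  uint64_t base;          // first address occupied by the module
  uint64_t size;          // length of the mapped region, in bytes
};

// Returns the module whose region contains `address`, or nullptr if no
// module in the list covers it.
const ModuleInfo* FindModule(const std::vector<ModuleInfo>& modules,
                             uint64_t address) {
  for (const ModuleInfo& module : modules) {
    if (address >= module.base && address - module.base < module.size) {
      return &module;
    }
  }
  return nullptr;
}
```

A linear scan is enough for illustration; a real processor would more likely
keep the modules in a structure ordered by base address.
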
Ordinarily, dumps are produced as a result of a crash, but other triggers may be
set to produce dumps at any time a developer deems appropriate. The Breakpad
processor can handle dumps in the minidump format, either generated by a
[Breakpad client “handler”](client_design.md) implementation, or by another
implementation that produces dumps in this format. The
[DbgHelp.dll!MiniDumpWriteDump](http://msdn2.microsoft.com/en-us/library/ms680360.aspx)
function on Windows produces dumps in this format, and is the basis for the
Breakpad handler implementation on that platform.

The [minidump format](http://msdn.microsoft.com/en-us/library/ms679293%28VS.85%29.aspx) is
essentially a simple container format, organized as a series of streams. Each
stream contains some type of data relevant to the crash. A typical “normal”
minidump contains streams for the thread list, the module list, the CPU context
at the time of the crash, and various bits of additional system information.
Other types of minidump can be generated, such as a full-memory minidump, which
in addition to stack memory contains snapshots of all of a process’ mapped
memory regions.

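To make the container layout concrete, the sketch below reads a minidump header
and its stream directory from an in-memory buffer. The field layout follows
Microsoft's documented `MINIDUMP_HEADER` and `MINIDUMP_DIRECTORY` structures,
but the names `RawHeader`, `RawDirectoryEntry`, and `ListStreams` are
hypothetical, the code is not Breakpad's own reader, and it assumes a
little-endian host with natural struct alignment.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative mirror of the documented 32-byte minidump header.
struct RawHeader {
  uint32_t signature;             // 'MDMP', i.e. 0x504d444d
  uint32_t version;
  uint32_t stream_count;          // number of directory entries
  uint32_t stream_directory_rva;  // file offset of the stream directory
  uint32_t checksum;
  uint32_t time_date_stamp;
  uint64_t flags;
};

// Illustrative mirror of one stream directory entry.
struct RawDirectoryEntry {
  uint32_t stream_type;  // e.g. 3 = thread list, 4 = module list
  uint32_t data_size;    // size of the stream's data, in bytes
  uint32_t rva;          // file offset of the stream's data
};

constexpr uint32_t kMinidumpSignature = 0x504d444d;

// Returns the stream directory, or an empty vector if the buffer is too
// small, has the wrong signature, or the directory runs past the end.
std::vector<RawDirectoryEntry> ListStreams(const std::vector<uint8_t>& file) {
  std::vector<RawDirectoryEntry> streams;
  RawHeader header;
  if (file.size() < sizeof(header)) return streams;
  std::memcpy(&header, file.data(), sizeof(header));
  if (header.signature != kMinidumpSignature) return streams;

  const uint64_t directory_bytes =
      uint64_t{header.stream_count} * sizeof(RawDirectoryEntry);
  if (uint64_t{header.stream_directory_rva} + directory_bytes > file.size()) {
    return streams;
  }
  streams.resize(header.stream_count);
  if (!streams.empty()) {
    std::memcpy(streams.data(), file.data() + header.stream_directory_rva,
                directory_bytes);
  }
  return streams;
}
```

Each returned entry says only where a stream lives and what type it has;
interpreting the stream's contents is a separate step for each stream type.
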
The minidump format was chosen as Breakpad’s dump format because it has an
established track record on Windows, and it can be adapted to meet the needs of
the other platforms that Breakpad supports. Most other operating systems use
“core” files as their native dump formats, but the capabilities of core files
vary across platforms, and because core files are usually presented in a
platform’s native executable format, there are complications involved in
accessing the data contained therein without the benefit of the header files
that define an executable format’s entire structure. Because minidumps are
leaner than a typical executable format, a redefinition of the format in a
cross-platform header file, `minidump_format.h`, was a straightforward task.
Similarly, the capabilities of the minidump format are understood, and because
it provides an extensible container, any of Breakpad’s needs that could not be
met directly by the standard minidump format could likely be met by extending it
as needed. Finally, using this format means that the dump file is compatible
with native debugging tools at least on Windows. A possible future avenue for
exploration is the conversion of minidumps to core files, to enable this same
benefit on other platforms.

We have already provided an extension to the minidump format that allows it to
carry dumps generated on systems with PowerPC processors. The format already
allows for variable CPUs, so our work in this area was limited to defining a
context structure sufficient to represent the execution state of a PowerPC. We
have also defined an extension that allows minidumps to indicate which thread of
execution requested a dump be produced for non-crash dumps.

Often, the information contained within a dump alone is sufficient to produce a
full stack backtrace for each thread. Certain optimizations that compilers
employ in producing code frustrate this process. Specifically, the “frame
pointer omission” optimization of x86 compilers can make it impossible to
produce useful stack traces given only a stack snapshot and CPU context. In
these cases, however, compiler-emitted debugging information can aid in
producing useful stack traces. The Breakpad processor is able to take advantage
of this debugging information as supplied by Microsoft’s C/C++ compiler, the
only compiler to apply such optimizations by default. As a result, the Breakpad
processor can produce useful stack traces even from code with frame pointer
omission optimizations as produced by this compiler.

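For contrast, when frame pointers are present, an x86 stack can be walked with
nothing but the stack snapshot and the CPU context: under the conventional
frame layout, the word at `%ebp` holds the caller's saved `%ebp` and the word at
`%ebp + 4` holds the return address. The sketch below illustrates that
traversal. `StackSnapshot` and `WalkFramePointers` are hypothetical names, and
this is a simplified illustration rather than Breakpad's `Stackwalker`; it
cannot recover frames from code compiled with frame pointer omission.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical view of one thread's stack memory as captured in a dump.
struct StackSnapshot {
  uint32_t base;               // address of the first byte in `bytes`
  std::vector<uint8_t> bytes;  // raw stack contents

  // Reads a little-endian 32-bit word; returns false if out of range.
  bool Read32(uint32_t address, uint32_t* value) const {
    if (address < base) return false;
    const uint64_t offset = uint64_t{address} - base;
    if (offset + 4 > bytes.size()) return false;
    *value = 0;
    for (int i = 3; i >= 0; --i) *value = (*value << 8) | bytes[offset + i];
    return true;
  }
};

// Returns instruction addresses for the call chain: the context's eip
// first, followed by each return address found via the %ebp chain.
std::vector<uint32_t> WalkFramePointers(const StackSnapshot& stack,
                                        uint32_t eip, uint32_t ebp) {
  std::vector<uint32_t> frames{eip};
  while (true) {
    uint32_t saved_ebp = 0;
    uint32_t return_address = 0;
    if (!stack.Read32(ebp, &saved_ebp) ||
        !stack.Read32(ebp + 4, &return_address)) {
      break;  // ran off the captured stack memory
    }
    if (return_address == 0) break;
    frames.push_back(return_address);
    // The stack grows downward, so a caller's frame must sit at a higher
    // address; anything else means the chain has ended or is corrupt.
    if (saved_ebp <= ebp) break;
    ebp = saved_ebp;
  }
  return frames;
}
```

With frame pointer omission, `%ebp` no longer forms this chain, which is why
the compiler-emitted debugging information described above becomes necessary.
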
### Symbol Files

The [symbol files](symbol_files.md) that the Breakpad processor accepts allow
for frame pointer omission data, but this is only one of their capabilities.
Each symbol file also includes information about the functions, source files,
and source code line numbers for a single module of code. A module is an
individually loadable chunk of code: these can be executables containing a main
program (`.exe` files on Windows) or shared libraries (`.so` files on Linux,
`.dylib` files, frameworks, and bundles on Mac OS X, and `.dll` files on
Windows). Dumps contain information about which of these modules were loaded at
the time the dump was produced, and given this information, the Breakpad
processor attempts to locate debugging symbols for the module through a
user-supplied function embodied in a “symbol supplier.” Breakpad includes a
sample symbol supplier, called `SimpleSymbolSupplier`, that is used by its
command-line tools; this supplier locates symbol files by pathname.
`SimpleSymbolSupplier` is also available to other users of the Breakpad
processor library. This allows for the use of a simple reference implementation,
but preserves flexibility for users who may have more demanding symbol file
storage needs.

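As an illustration of what locating symbol files by pathname can look like, the
sketch below builds a path from a module's debug file name and debug
identifier. The function name `SymbolPathFor` is hypothetical, and the
directory layout (`<root>/<debug file>/<debug id>/<name>.sym`, with a trailing
`.pdb` dropped from the final name) is an assumption modeled on the layout
commonly used for Breakpad symbol stores; consult the library sources before
relying on it.

```cpp
#include <string>

// Builds a symbol file path of the assumed form
//   <root>/<debug_file>/<debug_id>/<debug_file minus ".pdb">.sym
std::string SymbolPathFor(const std::string& root,
                          const std::string& debug_file,
                          const std::string& debug_id) {
  std::string stem = debug_file;
  const std::string pdb = ".pdb";
  // Drop a trailing ".pdb" so that "app.pdb" maps to "app.sym".
  if (stem.size() > pdb.size() &&
      stem.compare(stem.size() - pdb.size(), pdb.size(), pdb) == 0) {
    stem.erase(stem.size() - pdb.size());
  }
  return root + "/" + debug_file + "/" + debug_id + "/" + stem + ".sym";
}
```

A supplier with more demanding storage needs could replace this path
computation with, for example, a database or network lookup keyed on the same
two values.
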
Breakpad’s symbol file format is text-based, and was defined to be fairly
human-readable and to encompass the needs of multiple platforms. The Breakpad
processor itself does not operate directly with native symbol formats
([DWARF](http://dwarf.freestandards.org/) and
[STABS](http://sourceware.org/gdb/current/onlinedocs/stabs.html)
on most Unix-like systems,
[.pdb files](http://msdn2.microsoft.com/en-us/library/yd4f8bd1(VS.80).aspx)
on Windows), because of the complications in accessing potentially complex
symbol formats with slight variations between platforms, stored within different
types of binary formats. In the case of `.pdb` files, the debugging format is
not even documented. Instead, Breakpad’s symbol files are produced on each
platform, using specific debugging APIs where available, to convert native
symbols to Breakpad’s cross-platform format.

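Because the format is line-oriented text, individual records are simple to
parse. The sketch below handles one record type, assuming the basic form
`FUNC address size parameter_size name` with hexadecimal numeric fields, as
described in Breakpad's symbol file documentation. `FuncRecord` and
`ParseFuncRecord` are hypothetical names, and any optional fields beyond this
basic form are not handled.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical in-memory form of one FUNC record.
struct FuncRecord {
  uint64_t address;         // module-relative start address
  uint64_t size;            // length of the function, in bytes
  uint64_t parameter_size;  // bytes of stack used by parameters
  std::string name;         // may contain spaces, e.g. "f(int, char)"
};

// Parses "FUNC <address> <size> <parameter_size> <name>"; returns false
// for any line that does not match this basic form.
bool ParseFuncRecord(const std::string& line, FuncRecord* out) {
  std::istringstream in(line);
  std::string keyword;
  if (!(in >> keyword) || keyword != "FUNC") return false;
  in >> std::hex >> out->address >> out->size >> out->parameter_size;
  if (!in) return false;
  out->name.clear();
  // The name is everything after the numeric fields, spaces included.
  std::getline(in >> std::ws, out->name);
  return !out->name.empty();
}
```
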
### Processing

Most commonly, a developer will enable an application to use Breakpad by
building it with a platform-specific [client “handler”](client_design.md)
library. After building the application, the developer will create symbol files
for Breakpad’s use using the included `dump_syms` or `symupload` tools, or
another suitable tool, and place the symbol files where the processor’s symbol
supplier will be able to locate them.

When a dump file is given to the processor’s `MinidumpProcessor` class, it will
read it using its included minidump reader, contained in the `Minidump` family
of classes. It will collect information about the operating system and CPU that
produced the dump, and determine whether the dump was produced as a result of a
crash or at the direct request of the application itself. It then loops over all
of the threads in a process, attempting to walk the stack associated with each
thread. This process is achieved by the processor’s `Stackwalker` components, of
which there is a slightly different implementation for each CPU type that the
processor is able to handle dumps from. Beginning with a thread’s context, and
possibly using debugging data, the stackwalker produces a list of stack frames,
each identifying an instruction that was executing in the call chain. These
instructions are matched up with the modules that contributed them to a process,
and the `SymbolSupplier` is invoked to locate a symbol file. The symbol file is
given to a `SourceLineResolver`, which matches the instruction up with a
specific function name, source file, and line number, resulting in a
representation of a stack frame that can easily be used to identify which code
was executing.

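The resolution step at the end of that pipeline amounts to a range lookup: given
a module-relative instruction offset, find the function whose address range
covers it. A minimal sketch follows, using a hypothetical `FunctionTable` class
that is not Breakpad's `SourceLineResolver` and that omits source file and line
lookup.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical table of functions for one module, keyed by start offset.
class FunctionTable {
 public:
  void Add(uint64_t start, uint64_t size, const std::string& name) {
    functions_[start] = Entry{size, name};
  }

  // Returns the name of the function covering `offset`, or an empty
  // string if no known function contains it.
  std::string Lookup(uint64_t offset) const {
    // upper_bound finds the first function starting after `offset`; the
    // candidate is the entry just before it.
    auto it = functions_.upper_bound(offset);
    if (it == functions_.begin()) return "";
    --it;
    return (offset - it->first < it->second.size) ? it->second.name : "";
  }

 private:
  struct Entry {
    uint64_t size;
    std::string name;
  };
  std::map<uint64_t, Entry> functions_;
};
```

The offset passed to `Lookup` is the instruction address minus the base address
of the module that contains it.
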
The results of processing are made available in a `ProcessState` object, which
contains a vector of threads, each containing a vector of stack frames.

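The shape of that result, a vector of threads each holding a vector of frames,
can be modeled as below. The types `FrameInfo`, `ThreadInfo`, and
`ProcessResult` and the function `FormatThread` are hypothetical stand-ins for
illustration only; they are not the fields or methods of Breakpad's
`ProcessState`.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the processor's result types.
struct FrameInfo {
  uint64_t instruction;     // address of the instruction in this frame
  std::string function;     // empty when no symbol was found
  std::string source_file;
  int line;
};
struct ThreadInfo {
  std::vector<FrameInfo> frames;  // innermost frame first
};
struct ProcessResult {
  std::vector<ThreadInfo> threads;
};

// Renders one thread's stack, one frame per line. Frames without symbols
// fall back to the raw instruction address.
std::string FormatThread(const ProcessResult& result, size_t thread_index) {
  if (thread_index >= result.threads.size()) return "";
  std::ostringstream out;
  const std::vector<FrameInfo>& frames = result.threads[thread_index].frames;
  for (size_t i = 0; i < frames.size(); ++i) {
    const FrameInfo& frame = frames[i];
    out << i << "  ";
    if (frame.function.empty()) {
      out << "0x" << std::hex << frame.instruction << std::dec;
    } else {
      out << frame.function << " [" << frame.source_file << ":" << frame.line
          << "]";
    }
    out << "\n";
  }
  return out.str();
}
```
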
For small-scale use of the Breakpad processor, and for testing and debugging,
the `minidump_stackwalk` tool is provided. It invokes the processor and displays
the full results of processing, optionally allowing symbols to be provided to
the processor by a pathname-based symbol supplier, `SimpleSymbolSupplier`.

For lower-level testing and debugging, the processor library also includes a
`minidump_dump` tool, which walks through an entire minidump file and displays
its contents in somewhat readable form.

### Platform Support

The Breakpad processor library is able to process dumps produced on Mac OS X
systems running on x86, x86-64, and PowerPC processors, on Windows and Linux
systems running on x86 or x86-64 processors, and on Android systems running on
ARM or x86 processors. The processor library itself is written in standard C++,
and should function properly in most Unix-like environments. It has been tested
on Linux and Mac OS X.

## Future Plans

There are currently no firm plans or timetables to implement any of these
features, although they are possible avenues for future exploration.

The symbol file format can be extended to carry information about the locations
of parameters and local variables as stored in stack frames and registers, and
the processor can use this information to provide enhanced stack traces showing
function arguments and variable values.

On Mac OS X and Linux, we can provide tools to convert files from the minidump
format into the native core format. This will enable developers to open dump
files in a native debugger, just as they are presently able to do with minidumps
on Windows.