/ doc / policy / mempool-design.md
mempool-design.md
  1  # Mempool design and limits
  2  
  3  ## Definitions
  4  
  5  We view the unconfirmed transactions in the mempool as a directed graph,
  6  with an edge from transaction B to transaction A if B spends an output created
  7  by A (i.e., B is a **child** of A, and A is a **parent** of B).
  8  
  9  A transaction's **ancestors** include, recursively, its parents, the parents of
 10  its parents, etc.  A transaction's **descendants** include, recursively, its
 11  children, the children of its children, etc.
 12  
 13  A **cluster** is a connected component of the graph, i.e., a set of
 14  transactions where each transaction is reachable from any other transaction in
 15  the set by following edges in either direction. The cluster corresponding to a
 16  given transaction consists of that transaction, its ancestors and descendants,
 17  and the ancestors and descendants of those transactions, and so on.
 18  
 19  Each cluster is **linearized**, or sorted, in a topologically valid order (i.e.,
 20  no transaction appears before any of its ancestors). Our goal is to construct a
 21  linearization where the highest feerate subset of a cluster appears first,
 22  followed by the next highest feerate subset of the remaining transactions, and
 23  so on[1]. We call these subsets **chunks**, and the chunks of a linearization
 24  have the property that they are always in monotonically decreasing feerate
 25  order.
 26  
 27  Given two or more linearized clusters, we can construct a linearization of the
 28  union by simply merge sorting the chunks of each cluster by feerate.
 29  
 30  For any set of linearized clusters, then, we can define the **feerate diagram**
 31  of the set by plotting the cumulative fee (y-axis) against the cumulative size
 32  (x-axis) as we progress from chunk to chunk. Given two linearizations for the
 33  same set of transactions, we can compare their feerate diagrams by
 34  comparing their cumulative fees at each size value. Two diagrams may be
 35  **incomparable** if neither contains the other (i.e., there exist size values at
 36  which each one has a greater cumulative fee than the other). Or, they may be
 37  **equivalent** if they have identical cumulative fees at every size value; or
 38  one may be **strictly better** than the other if they are comparable and there
 39  exists at least one size value for which the cumulative fee is strictly higher
 40  in one of them.
 41  
 42  For more background and rationale, see [2] and [3] below.
 43  
 44  ## Mining/eviction
 45  
 46  As described above, the linearization of each cluster gives us a linearization
 47  of the entire mempool. We use this ordering for both block building and
 48  eviction, by selecting chunks at the front of the linearization when
 49  constructing a block template, and by evicting chunks from the back of the
 50  linearization when we need to free up space in the mempool.
 51  
 52  ## Replace-by-fee
 53  
 54  Prior to the cluster mempool implementation, it was possible for replacements
 55  to be prevented even if they would make the mempool more profitable for miners,
 56  and it was possible for replacements to be permitted even if the newly accepted
 57  transaction was less desirable to miners than the transactions it was
 58  replacing. With the ability to construct linearizations of the mempool, we're
 59  now able to compare the feerate diagram of the mempool before and after a
 60  proposed replacement, and only accept the replacement if it makes the feerate
 61  diagram strictly better.
 62  
 63  In simple cases, the intuition is that a replacement should have a higher
 64  feerate and fee than the transaction(s) it replaces. But for more complex cases
 65  (where some transactions may have unconfirmed parents), there may not be a
 66  simple way to describe the fee that is needed to successfully replace a set of
 67  transactions, other than to say that the overall feerate diagram of the
 68  resulting mempool must improve somewhere and not be worse anywhere.
 69  
 70  ## Mempool limits
 71  
 72  ### Motivation
 73  
 74  Selecting chunks in decreasing feerate order when building a block template
 75  will be close to optimal when the maximum size of any chunk is small compared
 76  to the block size. And for mempool eviction, we don't wish to evict too much of
 77  the mempool at once when a single (potentially small) transaction arrives that
 78  takes us over our mempool size limit. For both of these reasons, it's desirable
 79  to limit the maximum size of a cluster and thereby limit the maximum size of
 80  any chunk (as a cluster may consist entirely of one chunk).
 81  
 82  The computation required to linearize a transaction grows (in polynomial time)
 83  with the number of transactions in a cluster, so limiting the number of
 84  transactions in a cluster is necessary to ensure that we're able to find good
 85  (ideally, optimal) linearizations in a reasonable amount of time.
 86  
 87  ### Limits
 88  
 89  Transactions submitted to the mempool must not result in clusters that would
 90  exceed the cluster limits (64 transactions and 101 kvB total per cluster).
 91  
 92  ## References/Notes
 93  [1] This is an instance of the maximal-ratio closure problem, which is closely
 94  related to the maximal-weight closure problem, as found in the field of mineral
 95  extraction for open pit mining.
 96  
 97  [2] See
 98  https://delvingbitcoin.org/t/an-overview-of-the-cluster-mempool-proposal/393
 99  for a high level overview of the cluster mempool implementation (PR#33629,
100  since v31.0) and its design rationale.
101  
102  [3] See https://delvingbitcoin.org/t/mempool-incentive-compatibility/553 for an
103  explanation of why and how we use feerate diagrams for mining, eviction, and
104  evaluating transaction replacements.