table_format.md
1 leveldb File format 2 =================== 3 4 <beginning_of_file> 5 [data block 1] 6 [data block 2] 7 ... 8 [data block N] 9 [meta block 1] 10 ... 11 [meta block K] 12 [metaindex block] 13 [index block] 14 [Footer] (fixed size; starts at file_size - sizeof(Footer)) 15 <end_of_file> 16 17 The file contains internal pointers. Each such pointer is called 18 a BlockHandle and contains the following information: 19 20 offset: varint64 21 size: varint64 22 23 See [varints](https://developers.google.com/protocol-buffers/docs/encoding#varints) 24 for an explanation of varint64 format. 25 26 1. The sequence of key/value pairs in the file are stored in sorted 27 order and partitioned into a sequence of data blocks. These blocks 28 come one after another at the beginning of the file. Each data block 29 is formatted according to the code in `block_builder.cc`, and then 30 optionally compressed. 31 32 2. After the data blocks we store a bunch of meta blocks. The 33 supported meta block types are described below. More meta block types 34 may be added in the future. Each meta block is again formatted using 35 `block_builder.cc` and then optionally compressed. 36 37 3. A "metaindex" block. It contains one entry for every other meta 38 block where the key is the name of the meta block and the value is a 39 BlockHandle pointing to that meta block. 40 41 4. An "index" block. This block contains one entry per data block, 42 where the key is a string >= last key in that data block and before 43 the first key in the successive data block. The value is the 44 BlockHandle for the data block. 45 46 5. At the very end of the file is a fixed length footer that contains 47 the BlockHandle of the metaindex and index blocks as well as a magic number. 48 49 metaindex_handle: char[p]; // Block handle for metaindex 50 index_handle: char[q]; // Block handle for index 51 padding: char[40-p-q];// zeroed bytes to make fixed length 52 // (40==2*BlockHandle::kMaxEncodedLength) 53 magic: fixed64; // == 0xdb4775248b80fb57 (little-endian) 54 55 ## "filter" Meta Block 56 57 If a `FilterPolicy` was specified when the database was opened, a 58 filter block is stored in each table. The "metaindex" block contains 59 an entry that maps from `filter.<N>` to the BlockHandle for the filter 60 block where `<N>` is the string returned by the filter policy's 61 `Name()` method. 62 63 The filter block stores a sequence of filters, where filter i contains 64 the output of `FilterPolicy::CreateFilter()` on all keys that are stored 65 in a block whose file offset falls within the range 66 67 [ i*base ... (i+1)*base-1 ] 68 69 Currently, "base" is 2KB. So for example, if blocks X and Y start in 70 the range `[ 0KB .. 2KB-1 ]`, all of the keys in X and Y will be 71 converted to a filter by calling `FilterPolicy::CreateFilter()`, and the 72 resulting filter will be stored as the first filter in the filter 73 block. 74 75 The filter block is formatted as follows: 76 77 [filter 0] 78 [filter 1] 79 [filter 2] 80 ... 81 [filter N-1] 82 83 [offset of filter 0] : 4 bytes 84 [offset of filter 1] : 4 bytes 85 [offset of filter 2] : 4 bytes 86 ... 87 [offset of filter N-1] : 4 bytes 88 89 [offset of beginning of offset array] : 4 bytes 90 lg(base) : 1 byte 91 92 The offset array at the end of the filter block allows efficient 93 mapping from a data block offset to the corresponding filter. 94 95 ## "stats" Meta Block 96 97 This meta block contains a bunch of stats. The key is the name 98 of the statistic. The value contains the statistic. 99 100 TODO(postrelease): record following stats. 101 102 data size 103 index size 104 key size (uncompressed) 105 value size (uncompressed) 106 number of entries 107 number of data blocks