/ AKFS
AKFS
1 AKFS - Arching Kaos File System 2 =============================== 3 4 > (previously named SMFS - Sha(512) Merkle File System) 5 6 1. Introduction 7 --------------- 8 This filesystem follows the merkle tree architecture. Given a SHA512 hash root, 9 we can climb the branches, towards the leafs of the tree, keeping the order of 10 the leaves, intact. 11 12 In our case, leaves represent chunks of a splitted file. 13 14 2. Analysis 15 ----------- 16 Each file is encoded in base64, splitted in chunks of 4KB-4MB (chunk size will 17 vary depending on the size of the original file) and then for each chunk, the 18 SHA512 hash of it calculated. 19 20 Then, starting from the first's chunk hash, we store it with the next's chunk 21 hash in a text file. We move to the third and we repeat the process. The figure 22 next, will help -hopefully- in understanding this process. 23 24 25 ___ 26 | |\ 27 .---------->|RUT'-|<---------. 28 | |HASH | | 29 | |_____| | 30 | | 31 | | 32 | | 33 ___ ___ 34 | |\ | |\ 35 .----->|BR0'-| |BR1'-|<-------. 36 | |HASH | |HASH | | 37 | |_____| |_____| | 38 | ^ ^ | 39 | | | | 40 | | | | 41 ___ | ___ ___ | ___ 42 | |\ | | |\ | |\ | | |\ 43 |CH1'-| '---|CH2'-| |CH3'-| | |CH4'-| 44 |HASH | |HASH | |HASH |------' |HASH | 45 |_____| |_____| |_____| |_____| 46 47 Each "packet" of chunk hashes is called a "branch". Effectively, we repeat the 48 process above for the branches as well, matching the pattern 1-2,3-4,..,(N-1)-N. 49 50 We repeat this process until we have one branch as a result. This is called the 51 "root" (depicted as RUT in the figure). The only note we take, is that if N is 52 an odd number, we simply duplicate the last hash, for example, 1-2,3-3 and we 53 ignore them when reconstructing the original file. 54 55 To "merge" the leaves back to a file, we follow the process in reverse order. 56 57 3. Benefits 58 ----------- 59 Although storage nowadays is not a big concern, connectivity issues still are. 60 Having small pieces of information about the file to get small and verifiable 61 pieces of that file, helps with low-bandwidth connectivity and frequent 62 disconnects. 63 64 4. Influence 65 ------------ 66 The structure is heavily influenced by torrents and bitcoin. 67 68 5. Implementation 69 ----------------- 70 Currently, there are two bash scripts in the `bin` directory doing the file to 71 hash and the hash to file operations, respectively: 72 - `ak-sm-merkle-tree` -> `ak fs --add` 73 - `ak-sm-merkle-tree-to-file` -> `ak fs --get` 74 75 6. Specifications 76 ----------------- 77 As part of arching-kaos, the tree is stored under the `~/.arching-kaos` 78 directory. 79 80 The initial file is converted to base64. 81 The file is splitted depending on its size in chunks under the `ftr` directory. 82 Branches are consisting of 2 hashes as strings, separated with a '\n'. The same 83 applies for the root. 84 85 7. Networking 86 ------------- 87 Although networking capabilities are not part of the current implementation of 88 the system, it's fairly easy to exchange branches and root hashes as 'metadata' 89 and chunks/leaves as 'data'. A DHT-like implementation is under the works so 90 nodes requesting whichever hash can discover nodes that have those. 91 92 8. Bouquets of leaves (maps) 93 ---------------------------- 94 Based on the structure of the merkle trees we are producing here, another way is 95 possible to share information about the leaves (chunks) of a file. Instead of 96 packing those as two hashes of leaves in one branch, we would get all the leaves 97 hashes ordered as leaf01,leaf02,...,leafN. 98 99 For this kind of work see the following bash scripts: 100 ./bin/ak-sm-filejoiner 101 ./bin/ak-sm-filesplitter 102 103 9. Storage 104 ---------- 105 In current implementations storage is as follows: 106 - $AK_WORKDIR/fmrk branches, 107 - $AK_WORKDIR/ftr leaves and 108 - $AK_WORKDIR/fmp maps 109 110 The files are NOT organized for now. 111 112 To update this non structured into something more handy, we can use each digit 113 of a given SHA512 hash as a directory name. 114 115 For example given the hash: 116 0d1e5cd136e004be8c9625b1037e2e908304f03998b94cd201f6ca8a125bab03385f3c9c11b3c7cb280fb6b6f1fcbf9b2877e48dad09c81fa0ff6e5e7412ad0e 117 118 We should search or store or read in the following file: 119 0/d/1/e/5/c/d/1/3/6/e/0/0/4/b/e/8/c/9/6/2/5/b/1/0/3/7/e/2/e/9/0/8/3/0/4/f/0/3/9/9/8/b/9/4/c/d/2/0/1/f/6/c/a/8/a/1/2/5/b/a/b/0/3/3/8/5/f/3/c/9/c/1/1/b/3/c/7/c/b/2/8/0/f/b/6/b/6/f/1/f/c/b/f/9/b/2/8/7/7/e/4/8/d/a/d/0/9/c/8/1/f/a/0/f/f/6/e/5/e/7/4/1/2/a/d/0/e 120 121 In this way we would be storing up to 16 files per directory. 122 123 Now, doubling the digits would get us 256 files per directory: 124 0d/1e/5c/d1/36/e0/04/be/8c/96/25/b1/03/7e/2e/90/83/04/f0/39/98/b9/4c/d2/01/f6/ca/8a/12/5b/ab/03/38/5f/3c/9c/11/b3/c7/cb/28/0f/b6/b6/f1/fc/bf/9b/28/77/e4/8d/ad/09/c8/1f/a0/ff/6e/5e/74/12/ad/0e 125 126 Finally, a 65536 files per directory approach would be with 3 digits per 127 directory, like: 128 0d1e/5cd1/36e0/04be/8c96/25b1/037e/2e90/8304/f039/98b9/4cd2/01f6/ca8a/125b/ab03/385f/3c9c/11b3/c7cb/280f/b6b6/f1fc/bf9b/2877/e48d/ad09/c81f/a0ff/6e5e/7412/ad0e 129 130 But that is too much files/directory ratio. 131 132 We can also reuse the original hash like: 133 0d/1e/5c/d1/36/e0/04/be/8c/96/25/b1/03/7e/2e/90/83/04/f0/39/98/b9/4c/d2/01/f6/ca/8a/12/5b/ab/03/38/5f/3c/9c/11/b3/c7/cb/28/0f/b6/b6/f1/fc/bf/9b/28/77/e4/8d/ad/09/c8/1f/a0/ff/6e/5e/74/12/ad/0d1e5cd136e004be8c9625b1037e2e908304f03998b94cd201f6ca8a125bab03385f3c9c11b3c7cb280fb6b6f1fcbf9b2877e48dad09c81fa0ff6e5e7412ad0e 134 135 But, on the other hand, we already know the hash from the path we walked into 136 and we can recalculate the hash of the file we found, in case we want to verify 137 the correctness of the path/file. 138 139 To take care of all the above, a "driver" should be implemented, that it would 140 be given a hash and it would return the location of the file. We can go for the 141 2digit per directory approach. 142 143 Update 144 ------ 145 146 Driver was made but grouped for 1digit per directory per hash. 147 148 - `lib/_ak_fs` is introduced 149 - `src/akfs.c` is introduced 150 151 The C file is for future development and usage while `lib/_ak_fs` is the one 152 actively being used in the project.