/ AKFS
AKFS
  1  AKFS - Arching Kaos File System
  2  ===============================
  3  
  4  > (previously named SMFS - Sha(512) Merkle File System)
  5  
  6  1. Introduction
  7  ---------------
  8  This filesystem follows the merkle tree architecture. Given a SHA512 hash root,
  9  we can climb the branches, towards the leafs of the tree, keeping the order of
 10  the leaves, intact.
 11  
 12  In our case, leaves represent chunks of a splitted file.
 13  
 14  2. Analysis
 15  -----------
 16  Each file is encoded in base64, splitted in chunks of 4KB-4MB (chunk size will
 17  vary depending on the size of the original file) and then for each chunk, the
 18  SHA512 hash of it calculated.
 19  
 20  Then, starting from the first's chunk hash, we store it with the next's chunk
 21  hash in a text file. We move to the third and we repeat the process. The figure
 22  next, will help -hopefully- in understanding this process.
 23  
 24  
 25                                   ___   
 26                                  |   |\ 
 27                      .---------->|RUT'-|<---------.
 28                      |           |HASH |          |
 29                      |           |_____|          |
 30                      |                            |
 31                      |                            |
 32                      |                            |
 33                    ___                          ___   
 34                   |   |\                       |   |\ 
 35            .----->|BR0'-|                      |BR1'-|<-------.
 36            |      |HASH |                      |HASH |        |
 37            |      |_____|                      |_____|        |
 38            |         ^                            ^           |
 39            |         |                            |           |
 40            |         |                            |           |
 41          ___         |    ___         ___         |         ___   
 42         |   |\       |   |   |\      |   |\       |        |   |\ 
 43         |CH1'-|      '---|CH2'-|     |CH3'-|      |        |CH4'-|
 44         |HASH |          |HASH |     |HASH |------'        |HASH |
 45         |_____|          |_____|     |_____|               |_____|
 46  
 47  Each "packet" of chunk hashes is called a "branch". Effectively, we repeat the
 48  process above for the branches as well, matching the pattern 1-2,3-4,..,(N-1)-N.
 49  
 50  We repeat this process until we have one branch as a result. This is called the
 51  "root" (depicted as RUT in the figure). The only note we take, is that if N is
 52  an odd number, we simply duplicate the last hash, for example, 1-2,3-3 and we
 53  ignore them when reconstructing the original file.
 54  
 55  To "merge" the leaves back to a file, we follow the process in reverse order.
 56  
 57  3. Benefits
 58  -----------
 59  Although storage nowadays is not a big concern, connectivity issues still are.
 60  Having small pieces of information about the file to get small and verifiable
 61  pieces of that file, helps with low-bandwidth connectivity and frequent
 62  disconnects.
 63  
 64  4. Influence
 65  ------------
 66  The structure is heavily influenced by torrents and bitcoin.
 67  
 68  5. Implementation
 69  -----------------
 70  Currently, there are two bash scripts in the `bin` directory doing the file to
 71  hash and the hash to file operations, respectively:
 72  - `ak-sm-merkle-tree` -> `ak fs --add`
 73  - `ak-sm-merkle-tree-to-file` -> `ak fs --get`
 74  
 75  6. Specifications
 76  -----------------
 77  As part of arching-kaos, the tree is stored under the `~/.arching-kaos`
 78  directory.
 79  
 80  The initial file is converted to base64.
 81  The file is splitted depending on its size in chunks under the `ftr` directory.
 82  Branches are consisting of 2 hashes as strings, separated with a '\n'. The same
 83  applies for the root.
 84  
 85  7. Networking
 86  -------------
 87  Although networking capabilities are not part of the current implementation of
 88  the system, it's fairly easy to exchange branches and root hashes as 'metadata'
 89  and chunks/leaves as 'data'. A DHT-like implementation is under the works so
 90  nodes requesting whichever hash can discover nodes that have those.
 91  
 92  8. Bouquets of leaves (maps)
 93  ----------------------------
 94  Based on the structure of the merkle trees we are producing here, another way is
 95  possible to share information about the leaves (chunks) of a file. Instead of
 96  packing those as two hashes of leaves in one branch, we would get all the leaves
 97  hashes ordered as leaf01,leaf02,...,leafN.
 98  
 99  For this kind of work see the following bash scripts:
100  ./bin/ak-sm-filejoiner
101  ./bin/ak-sm-filesplitter
102  
103  9. Storage
104  ----------
105  In current implementations storage is as follows:
106  - $AK_WORKDIR/fmrk branches,
107  - $AK_WORKDIR/ftr leaves and
108  - $AK_WORKDIR/fmp maps
109  
110  The files are NOT organized for now.
111  
112  To update this non structured into something more handy, we can use each digit
113  of a given SHA512 hash as a directory name.
114  
115  For example given the hash:
116  0d1e5cd136e004be8c9625b1037e2e908304f03998b94cd201f6ca8a125bab03385f3c9c11b3c7cb280fb6b6f1fcbf9b2877e48dad09c81fa0ff6e5e7412ad0e
117  
118  We should search or store or read in the following file:
119  0/d/1/e/5/c/d/1/3/6/e/0/0/4/b/e/8/c/9/6/2/5/b/1/0/3/7/e/2/e/9/0/8/3/0/4/f/0/3/9/9/8/b/9/4/c/d/2/0/1/f/6/c/a/8/a/1/2/5/b/a/b/0/3/3/8/5/f/3/c/9/c/1/1/b/3/c/7/c/b/2/8/0/f/b/6/b/6/f/1/f/c/b/f/9/b/2/8/7/7/e/4/8/d/a/d/0/9/c/8/1/f/a/0/f/f/6/e/5/e/7/4/1/2/a/d/0/e
120  
121  In this way we would be storing up to 16 files per directory.
122  
123  Now, doubling the digits would get us 256 files per directory:
124  0d/1e/5c/d1/36/e0/04/be/8c/96/25/b1/03/7e/2e/90/83/04/f0/39/98/b9/4c/d2/01/f6/ca/8a/12/5b/ab/03/38/5f/3c/9c/11/b3/c7/cb/28/0f/b6/b6/f1/fc/bf/9b/28/77/e4/8d/ad/09/c8/1f/a0/ff/6e/5e/74/12/ad/0e
125  
126  Finally, a 65536 files per directory approach would be with 3 digits per
127  directory, like:
128  0d1e/5cd1/36e0/04be/8c96/25b1/037e/2e90/8304/f039/98b9/4cd2/01f6/ca8a/125b/ab03/385f/3c9c/11b3/c7cb/280f/b6b6/f1fc/bf9b/2877/e48d/ad09/c81f/a0ff/6e5e/7412/ad0e
129  
130  But that is too much files/directory ratio.
131  
132  We can also reuse the original hash like:
133  0d/1e/5c/d1/36/e0/04/be/8c/96/25/b1/03/7e/2e/90/83/04/f0/39/98/b9/4c/d2/01/f6/ca/8a/12/5b/ab/03/38/5f/3c/9c/11/b3/c7/cb/28/0f/b6/b6/f1/fc/bf/9b/28/77/e4/8d/ad/09/c8/1f/a0/ff/6e/5e/74/12/ad/0d1e5cd136e004be8c9625b1037e2e908304f03998b94cd201f6ca8a125bab03385f3c9c11b3c7cb280fb6b6f1fcbf9b2877e48dad09c81fa0ff6e5e7412ad0e
134  
135  But, on the other hand, we already know the hash from the path we walked into
136  and we can recalculate the hash of the file we found, in case we want to verify
137  the correctness of the path/file.
138  
139  To take care of all the above, a "driver" should be implemented, that it would
140  be given a hash and it would return the location of the file. We can go for the
141  2digit per directory approach.
142  
143  Update
144  ------
145  
146  Driver was made but grouped for 1digit per directory per hash.
147  
148  - `lib/_ak_fs` is introduced
149  - `src/akfs.c` is introduced
150  
151  The C file is for future development and usage while `lib/_ak_fs` is the one
152  actively being used in the project.