# Description

This library implements machinery for extremely fast multithreaded directory traversal.
Using multiple threads leverages the kernel's ability to schedule I/O requests in an
optimal way.

# How it works

The user creates a 'Gatherer' object which spawns threads listening on a
PriorityQueue. Sending a 'directory' to this queue lets one thread pick it up and traverse the
directory. Each element found is then sent to a custom function/closure which decides
how to process it:
 * Directories can be sent back into the input PriorityQueue where other
   threads may pick them up. This continues until the input queue is exhausted, eventually
   traversing all sub-directories of the directory sent initially.
 * Files and directories can be sent to an output mpmc queue where they can be further
   processed.
 * Errors are sent to the output queue as they happen.
 * Once the input queue becomes empty, a 'Done' message is sent to the output to notify the
   listener there.
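
The flow above can be sketched with the standard library alone. This is a minimal
illustration, not this library's API: `Message`, `State`, `worker` and `gather` are hypothetical
names, `std::sync::mpsc` stands in for the mpmc output queue, the queue is ordered by path
rather than by the real priority, and the per-entry decision is hard-coded where the library
would call the user's closure.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::path::PathBuf;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Condvar, Mutex};
use std::{fs, io, thread};

/// Messages delivered on the output queue (hypothetical names).
pub enum Message {
    Entry(PathBuf),
    Error(io::Error),
    Done,
}

/// Input queue plus the number of directories queued or being traversed.
struct State {
    queue: BinaryHeap<Reverse<PathBuf>>,
    pending: usize,
}

type Shared = Arc<(Mutex<State>, Condvar)>;

fn worker(shared: Shared, out: Sender<Message>) {
    let (lock, cvar) = &*shared;
    loop {
        // Wait for a directory, or exit once nothing is queued or in progress.
        let dir = {
            let mut st = lock.lock().unwrap();
            loop {
                if let Some(Reverse(dir)) = st.queue.pop() {
                    break dir;
                }
                if st.pending == 0 {
                    return;
                }
                st = cvar.wait(st).unwrap();
            }
        };
        match fs::read_dir(&dir) {
            Ok(entries) => {
                for entry in entries {
                    match entry {
                        Ok(e) => {
                            // Hard-coded policy: re-queue directories, report everything.
                            if e.file_type().map(|t| t.is_dir()).unwrap_or(false) {
                                let mut st = lock.lock().unwrap();
                                st.pending += 1;
                                st.queue.push(Reverse(e.path()));
                                cvar.notify_one();
                            }
                            let _ = out.send(Message::Entry(e.path()));
                        }
                        Err(err) => { let _ = out.send(Message::Error(err)); }
                    }
                }
            }
            Err(err) => { let _ = out.send(Message::Error(err)); }
        }
        // This directory is finished; the last one to finish announces Done.
        let mut st = lock.lock().unwrap();
        st.pending -= 1;
        if st.pending == 0 {
            let _ = out.send(Message::Done);
            cvar.notify_all();
        }
    }
}

/// Spawns `threads` workers and sends `root` as the initial directory.
pub fn gather(root: PathBuf, threads: usize) -> Receiver<Message> {
    let (tx, rx) = channel();
    let mut queue = BinaryHeap::new();
    queue.push(Reverse(root));
    let shared: Shared = Arc::new((Mutex::new(State { queue, pending: 1 }), Condvar::new()));
    for _ in 0..threads.max(1) {
        let (shared, tx) = (Arc::clone(&shared), tx.clone());
        thread::spawn(move || worker(shared, tx));
    }
    rx
}

fn main() {
    let mut count = 0;
    for msg in gather(PathBuf::from("."), 4) {
        match msg {
            Message::Entry(_) => count += 1,
            Message::Error(e) => eprintln!("error: {}", e),
            Message::Done => break,
        }
    }
    println!("{} entries found", count);
}
```

The `pending` counter is what makes the 'Done' message reliable in this sketch: an empty
queue alone is not enough, because a thread that is still reading a directory may yet add
sub-directories to it.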

## Queues

A priority queue is chosen for the input to ensure that directories are processed in a
file-handle-preserving order. This is depth first in ascending inode order.
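
One way to express such an ordering is an `Ord` implementation for the queued items. The
sketch below is an assumption about how the two criteria combine (deeper directories first,
then lower inode numbers first); `QueuedDir` and `drain_order` are hypothetical names and the
inode numbers are made up. On Unix, the inode of an entry could be read with
`std::os::unix::fs::DirEntryExt::ino`.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// A directory waiting in the input queue (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct QueuedDir {
    depth: u32,
    inode: u64,
    name: &'static str,
}

impl Ord for QueuedDir {
    fn cmp(&self, other: &Self) -> Ordering {
        // BinaryHeap is a max-heap, so the "greatest" item is popped first:
        // greater depth wins, then the smaller inode, then the name as tie-breaker.
        self.depth
            .cmp(&other.depth)
            .then_with(|| other.inode.cmp(&self.inode))
            .then_with(|| self.name.cmp(other.name))
    }
}

impl PartialOrd for QueuedDir {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Pushes all items into a priority queue and returns the names in pop order.
fn drain_order(items: Vec<QueuedDir>) -> Vec<&'static str> {
    let mut heap: BinaryHeap<QueuedDir> = items.into_iter().collect();
    let mut order = Vec::new();
    while let Some(dir) = heap.pop() {
        order.push(dir.name);
    }
    order
}

fn main() {
    let order = drain_order(vec![
        QueuedDir { depth: 1, inode: 20, name: "root/a" },
        QueuedDir { depth: 1, inode: 10, name: "root/b" },
        QueuedDir { depth: 2, inode: 30, name: "root/a/x" },
        QueuedDir { depth: 2, inode: 25, name: "root/a/y" },
    ]);
    // prints ["root/a/y", "root/a/x", "root/b", "root/a"]
    println!("{:?}", order);
}
```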

## Memory Optimizations

Handling the pathnames of millions of files would need a considerable amount of memory. To
reduce this demand, an ObjectPath implementation encodes any path by its filename and a
reference to its parent directory. Further, all names are interned, so identical names require
memory only once for their storage.
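
The idea can be illustrated with reference-counted nodes. This is a minimal sketch and not
the library's ObjectPath: `Interner` and `PathNode` are hypothetical names, and a plain
`HashSet` serves as the intern table.

```rust
use std::collections::HashSet;
use std::ffi::OsStr;
use std::path::PathBuf;
use std::sync::Arc;

/// Stores every distinct name once and hands out shared references to it.
#[derive(Default)]
struct Interner {
    names: HashSet<Arc<OsStr>>,
}

impl Interner {
    fn intern(&mut self, name: &OsStr) -> Arc<OsStr> {
        if let Some(existing) = self.names.get(name) {
            return Arc::clone(existing);
        }
        let new: Arc<OsStr> = Arc::from(name);
        self.names.insert(Arc::clone(&new));
        new
    }
}

/// A path stored as one interned name plus a reference to the parent directory.
struct PathNode {
    parent: Option<Arc<PathNode>>,
    name: Arc<OsStr>,
}

impl PathNode {
    fn root(name: Arc<OsStr>) -> Arc<Self> {
        Arc::new(PathNode { parent: None, name })
    }

    fn child(parent: &Arc<Self>, name: Arc<OsStr>) -> Arc<Self> {
        Arc::new(PathNode { parent: Some(Arc::clone(parent)), name })
    }

    /// Rebuilds the full path by walking up to the root.
    fn to_path_buf(&self) -> PathBuf {
        let mut parts: Vec<&OsStr> = vec![&*self.name];
        let mut cur = self.parent.as_deref();
        while let Some(node) = cur {
            parts.push(&*node.name);
            cur = node.parent.as_deref();
        }
        parts.into_iter().rev().collect()
    }
}

fn main() {
    let mut names = Interner::default();
    let root = PathNode::root(names.intern(OsStr::new("project")));
    let a = PathNode::child(&root, names.intern(OsStr::new("a")));
    let b = PathNode::child(&root, names.intern(OsStr::new("b")));
    let f1 = PathNode::child(&a, names.intern(OsStr::new("mod.rs")));
    let f2 = PathNode::child(&b, names.intern(OsStr::new("mod.rs")));
    // Both files share one allocation for the name "mod.rs".
    println!("shared name: {}", Arc::ptr_eq(&f1.name, &f2.name));
    println!("{}", f1.to_path_buf().display());
    println!("{}", f2.to_path_buf().display());
}
```

Each node costs two pointers plus its share of an interned name, whatever the depth of the
path, and a full `PathBuf` is only built when one is actually needed.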

# Benchmarking Results

See the 'BENCH.md' file for some tests. The 'gnu find' utility was chosen as the baseline. In
the most extreme case this code can perform directory traversal 20 times faster. With slow
spinning disks and moderate settings (16 threads) it is 1.6 times faster.