# Description

This library implements machinery for extremely fast multithreaded directory traversal.
Using multiple threads keeps several IO requests in flight at once, which lets the kernel
schedule them in an optimal way.

# How it works

The user creates a 'Gatherer' object, which spawns threads listening on an input
PriorityQueue. Sending a directory to this queue lets one thread pick it up and traverse
it. Each element found is then passed to a custom function/closure, which decides how to
process it:

* Directories can be sent back into the input PriorityQueue, where other threads may pick
  them up. This repeats until the input queue is exhausted, eventually traversing all
  subdirectories of the directory sent initially.
* Files and directories can be sent to an output mpmc (multi-producer, multi-consumer)
  queue, where they can be processed further.
* Errors are sent to the output queue as they happen.
* Once the input queue becomes empty, a 'Done' message is sent to the output queue to
  notify its listener.

## Queues

A priority queue was chosen for the input to ensure that directories are processed in an
order that conserves file handles: depth first, in ascending inode order.

## Memory Optimizations

Storing the pathnames of millions of files would take a considerable amount of memory. To
reduce this, an ObjectPath implementation encodes each path as its filename plus a
reference to its parent directory. Furthermore, all names are interned, so identical names
are stored in memory only once.

# Benchmarking Results

See the 'BENCH.md' file for the benchmark runs. The baseline is the GNU 'find' utility. In
the most extreme case, this code traverses directories 20 times faster than 'find'; on slow
spinning disks with moderate settings (16 threads), it is 1.6 times faster.
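The scheme described under 'How it works' and 'Queues' can be sketched with the Rust standard library alone. This is a minimal sketch under stated assumptions, not this library's implementation or API. `gather`, `Action`, `Message` and `Job` are hypothetical names. A `Mutex`/`Condvar`-protected `BinaryHeap` stands in for the input PriorityQueue, and `std::sync::mpsc` (single consumer) stands in for the output queue. The heap key `(depth, Reverse(inode))` is one assumed reading of "depth first in ascending inode order". Inode numbers come from `std::os::unix::fs`, so the sketch is Unix-only. Paths are plain `PathBuf`s, without the ObjectPath/interning optimization.

```rust
// Minimal sketch (std only, Unix only). `gather`, `Action`, `Message` and `Job`
// are hypothetical names for illustration, not this library's API.
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::os::unix::fs::{DirEntryExt, MetadataExt};
use std::path::{Path, PathBuf};
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Condvar, Mutex};
use std::{fs, io, thread};

/// Messages delivered on the output queue.
#[derive(Debug)]
pub enum Message { Entry(PathBuf), Error(PathBuf, io::Error), Done }

/// What the user closure wants done with one directory entry.
pub struct Action {
    pub recurse: bool, // push a directory back into the input queue
    pub emit: bool,    // forward the entry to the output queue
}

/// Derived `Ord` compares fields in order: deeper directories pop first from the
/// max-heap (depth first); at equal depth `Reverse` pops the smaller inode first.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Job { depth: usize, ino: Reverse<u64>, path: PathBuf }

struct Input {
    heap: BinaryHeap<Job>,
    busy: usize, // directories currently being read by some worker
}

pub fn gather<F>(root: &Path, threads: usize, decide: F) -> io::Result<Receiver<Message>>
where
    F: Fn(&Path, &fs::FileType) -> Action + Send + Sync + 'static,
{
    let (tx, rx) = mpsc::channel();
    let root_job = Job { depth: 0, ino: Reverse(fs::metadata(root)?.ino()), path: root.into() };
    let input = Input { heap: BinaryHeap::from(vec![root_job]), busy: 0 };
    let shared = Arc::new((Mutex::new(input), Condvar::new()));
    let decide = Arc::new(decide);
    for _ in 0..threads.max(1) {
        let (shared, decide, tx) = (Arc::clone(&shared), Arc::clone(&decide), tx.clone());
        thread::spawn(move || worker(&shared, &*decide, &tx));
    }
    Ok(rx) // our `tx` drops here; the channel closes once every worker has exited
}

fn worker<F: Fn(&Path, &fs::FileType) -> Action>(
    shared: &(Mutex<Input>, Condvar), decide: &F, tx: &Sender<Message>,
) {
    let (lock, cvar) = shared;
    loop {
        // Pop the highest-priority directory; exit once nothing is queued or in flight.
        let job = {
            let mut input = lock.lock().unwrap();
            loop {
                if let Some(job) = input.heap.pop() {
                    input.busy += 1;
                    break job;
                }
                if input.busy == 0 { return; }
                input = cvar.wait(input).unwrap();
            }
        };
        let mut found = Vec::new();
        if let Err(e) = scan(&job, decide, tx, &mut found) {
            // Errors go out as they happen; an error ends the scan of this directory only.
            let _ = tx.send(Message::Error(job.path.clone(), e));
        }
        let mut input = lock.lock().unwrap();
        input.heap.extend(found);
        input.busy -= 1;
        if input.busy == 0 && input.heap.is_empty() {
            let _ = tx.send(Message::Done); // terminal state, so this fires exactly once
        }
        cvar.notify_all();
    }
}

/// Reads one directory, passes each entry to `decide`, collects subdirectories to queue.
fn scan<F: Fn(&Path, &fs::FileType) -> Action>(
    job: &Job, decide: &F, tx: &Sender<Message>, found: &mut Vec<Job>,
) -> io::Result<()> {
    for entry in fs::read_dir(&job.path)? {
        let entry = entry?;
        let (file_type, path) = (entry.file_type()?, entry.path());
        let action = decide(&path, &file_type);
        if action.emit {
            let _ = tx.send(Message::Entry(path.clone()));
        }
        if action.recurse && file_type.is_dir() {
            found.push(Job { depth: job.depth + 1, ino: Reverse(entry.ino()), path });
        }
    }
    Ok(())
}
```

A listener drains the returned `Receiver` until `Message::Done` arrives. The `busy` counter is the key design choice here: the input heap can be momentarily empty while a worker is still reading a directory that may yield more subdirectories, so an empty queue only means the traversal is finished when no directory is in flight either.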