# Shell Tools

Jump right in with an [example](#example-script).

## `dryrun`

`dryrun` runs a command only when the `$DRYRUN` environment variable is unset or empty. Otherwise it prints the command instead of running it. [^dryrun_name]
Also see [`try`](https://github.com/binpash/try). `dryrun` is comparable to `make -n`.

```shell title="dryrun'ed"
$ echo hi > myfile
$ export DRYRUN=1
$ dryrun rm myfile
rm myfile  # (1)!
$ cat myfile
hi
```

1. this is printed but not run

```shell title="actually run"
$ echo hi > myfile
$ export DRYRUN=
$ dryrun rm myfile # (1)!
$ cat myfile
cat: myfile: No such file or directory
```

1. nothing is printed. `rm` runs silently, as if `dryrun` were not there
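
The behavior in the two examples above can be approximated in a few lines. This is a minimal sketch, not the actual `dryrun` source: the name `dryrun_sketch` is hypothetical, and it assumes the command is echoed to stderr whenever `$DRYRUN` is non-empty.

```shell
# dryrun_sketch is a hypothetical stand-in for dryrun, for illustration only
dryrun_sketch() {
  if [ -n "${DRYRUN:-}" ]; then
    # dry run requested: print the command (to stderr) instead of running it
    echo "$*" >&2
  else
    # DRYRUN unset or empty: run the command unchanged
    "$@"
  fi
}
```

Because `"$@"` is passed through untouched, arguments containing spaces survive when the command actually runs. The printed form is meant for reading, not for pasting back into a shell.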

It's worth noting that bash allows environment variables to be set and scoped to a single command by prefixing the call with `var=val`. For `dryrun`-enabled scripts and functions, this means starting the call with `DRYRUN=1` for the "just print" version.

```shell title="compact"
$ example(){ dryrun rm myfile; }
$ DRYRUN=1 example
rm myfile # (1)!
$ echo $DRYRUN
# (2)!
```

1. `rm myfile` is printed but not run
2. empty line showing `$DRYRUN` is not set now, though it was for the call above (where it was explicitly declared)

[^dryrun_name]: The name `dryrun` is taken from rsync's `--dry-run` option. `perl-rename` accepts `--just-print` as an alias of `--dry-run`.

## `drytee`

`drytee` works like `dryrun`, but for output you want written to a file unless `$DRYRUN` is set. It's like `tee`, except that during a dry run it writes to standard error instead of to the file.

```shell
$ echo hi | drytee myfile
$ cat myfile
hi # (1)!
$ export DRYRUN=1
$ echo bye | drytee myfile
#       bye
# would be written to myfile
$ cat myfile
hi # (2)!
```

1. `myfile` was written ("hi") because `DRYRUN` is not set
2. `myfile` is unchanged. `bye` was not written
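
A rough sketch of the same idea follows. It is not the real `drytee`: the name `drytee_sketch` is hypothetical, and the 5-line preview limit and the exact message format are assumptions modeled on the output shown above.

```shell
# drytee_sketch is a hypothetical stand-in for drytee, for illustration only
drytee_sketch() {
  if [ -n "${DRYRUN:-}" ]; then
    # dry run: show the first 5 lines (an assumed limit) as "#<tab>line" on stderr
    awk 'NR <= 5 { printf "#\t%s\n", $0 }' >&2
    echo "# would be written to $1" >&2
  else
    # real run: stdin goes to the file and nothing is echoed to stdout
    cat > "$1"
  fi
}
```

Keeping stdout silent in both branches matters when the calling function's output is captured with `$(...)`: only what the caller deliberately echoes ends up in the variable.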

## `warn`

`warn` could be written `echo "$@" >&2`. It simply writes its arguments to standard error (file descriptor 2) instead of standard output. This keeps the message from being captured into a variable or a file.

```shell title="avoid capture"
$ a=$(warn "oh no"; echo "results")
oh no # (1)!
$ echo $a
results
```

1. "oh no" is seen on the terminal because it's written to stderr. "results" on stdout is captured into `$a`

A contrived example of giving a warning that doesn't end up in the output (but still potentially notifies the user):

```shell title="no warning in file"
# create a file of n lines sequentially numbered
filelines(){
  n="$1"
  [ "$n" -lt 2 ] && warn "# WARNING: n=$n < 2. limited output"
  printf "%s\n" $(seq 1 "$n")
}
```

```
$ filelines 1 > myfile
# WARNING: n=1 < 2. limited output
$ cat myfile
1
```
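
To try this without the real `warn` installed, the one-liner from the start of this section can be wrapped in a function. The sketch below uses that stand-in (the real `warn` may do more) together with a hypothetical `report` function to show the two streams being routed separately.

```shell
# stand-in based on the one-liner above; the real warn may differ
warn() { echo "$@" >&2; }

# report is a hypothetical function that writes to both streams
report() {
  warn "# WARNING: something looks off"  # goes to stderr
  echo "data"                            # goes to stdout
}

# stdout is captured; stderr is discarded here
captured=$(report 2>/dev/null)
echo "captured: $captured"

# 2>&1 merges the streams, so the warning is captured too
merged=$(report 2>&1)
printf '%s\n' "$merged"
```

Redirecting with `2>/dev/null` silences the warning entirely, while `2>&1` is the explicit opt-in for callers that do want warnings in the captured text.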

## `waitforjobs`

`waitforjobs` tracks the number of forked child processes. It waits `SLEEPTIME` and polls the count until there are fewer than `MAXJOBS` jobs running. It uses shell job control facilities and is useful on local, single-user, or small servers. On HPC, you'd instead submit to a scheduler, e.g. `sbatch` for Slurm or `qsub` for Torque. Other alternatives include [`bq`](https://github.com/sitaramc/bq) and [`task-spooler`](https://github.com/justanhduc/task-spooler). [GNU Parallel](https://blog.ronin.cloud/gnu-parallel/) and Make also have job dispatching facilities.

```shell title="waitforjobs"
for i in {1..20}; do
  sleep 5 & # (1)!
  waitforjobs
done
wait  # (2)!
```

1. `sleep` here is a stand-in for a more useful long-running command to be parallelized
2. `waitforjobs` will exit the final loop iteration with up to `MAXJOBS`-1 jobs still running. This `wait` waits for those (but won't give the notifications every `SLEEPTIME`). Consider `waitforjobs -j 1` instead.
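
The poll-and-sleep loop described above can be sketched with the bash `jobs` builtin. This is an assumption-laden illustration, not the real `waitforjobs`: the function name `waitforjobs_sketch` is hypothetical, it only sees background jobs of the current shell, and the defaults of 10 jobs and 60 seconds are taken from the examples on this page.

```shell
#!/usr/bin/env bash
# waitforjobs_sketch is a hypothetical stand-in, not the real waitforjobs.
# It counts this shell's running background jobs with the bash builtin `jobs`.
waitforjobs_sketch() {
  local maxjobs="${MAXJOBS:-10}" sleeptime="${SLEEPTIME:-60}" running
  while :; do
    # one PID per line for each job still running
    running=$(jobs -pr | wc -l)
    # stop waiting once fewer than maxjobs are still running
    [ "$running" -lt "$maxjobs" ] && break
    sleep "$sleeptime"
  done
}
```

Called after each `&` in a loop, this keeps at most `MAXJOBS` commands running at once. A final `wait` is still needed for the jobs left running when the loop ends.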

When running locally, output looks like:

```
2023-05-24T15:38: sleep 60s on 3: sleep 5;sleep 5;bash /home/foranw/src/work/lncdtools/waitforjobs;
```

### Arguments

```
USAGE:
  waitforjobs [-j numjobs] [-s sleeptimesecs] [-c "auto"]  [-h|--help]
```

`-c auto` is worth exploring in more detail. With this option, a temporary file like `/tmp/host-user-basename.jobcfg` is created. Modifying the sleep and job settings in that file affects the `waitforjobs` process watching it. You can change the number of cores to use in real time!
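
One way a polling loop could pick up such live edits is to re-read the file on every iteration. The sketch below is purely illustrative: the format of the real `.jobcfg` file is not documented here, so the `KEY=value` lines, the `MAXJOBS`/`SLEEPTIME` key names, and the function name `load_jobcfg` are all assumptions.

```shell
# load_jobcfg is hypothetical; it assumes lines like MAXJOBS=4 or SLEEPTIME=30
load_jobcfg() {
  # no readable file: keep whatever settings are already in place
  [ -r "$1" ] || return 0
  while IFS='=' read -r key val; do
    case "$key" in
      MAXJOBS)   MAXJOBS="$val" ;;
      SLEEPTIME) SLEEPTIME="$val" ;;
      # any other key is ignored
    esac
  done < "$1"
}
```

Parsing the keys explicitly, rather than using `source` on the file, means a stray line in the config cannot run arbitrary commands inside the watcher.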

## `iffmain`

In a script where `main_function` is a defined function, use `iffmain` at the end like

```bash
eval "$(iffmain main_function)"
```

Defensive shell scripting calls for `set -euo pipefail`, but running that on the command line (e.g. via `source`) will break other scripts and the normal interactive shell [^sete_break]. `iffmain` is modeled after the Python idiom `if __name__ == "__main__"`. When the script is not sourced, it enables those settings and sets a standard `trap` to notify on error.

### Sourcing

Using `iffmain` makes it easier to write bash scripts that are primarily functions. Scripts styled this way are easy to source and test.

A bash file that can be sourced can be reused and tested. See
[Bash Test Driven Development](https://neuro-programmers.pitt.edu/wiki/doku.php?id=public:bash_tdd)

### Template

`iffmain` generates shell code that looks like

```shell title="iffmain template"
if [[ "$(caller)" == "0 "* ]]; then
  set -euo pipefail
  trap 'e=$?; [ $e -ne 0 ] && echo "$0 exited in error $e"' EXIT
  MAINFUNCNAME "$@"
  exit $?
fi
```
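
For comparison, the same run-only-when-executed guard is often written by hand with `BASH_SOURCE`. The sketch below uses that common idiom rather than the `caller` test in the template, so it is not what `iffmain` emits. `greet` is a hypothetical function standing in for a script's main function.

```shell
#!/usr/bin/env bash
# greet is a hypothetical main function used only for this sketch
greet() { echo "hello ${1:-world}"; }

# BASH_SOURCE[0] equals $0 only when the file is executed, not when it is sourced
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  set -euo pipefail   # strict settings apply only to the executed script
  greet "$@"
fi
```

When this file is sourced, only `greet` is defined, and the calling shell keeps its own settings. When it is executed, the strict settings are enabled before `greet` runs.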

[^sete_break]: `set -e` ("exit on an error") is especially disruptive. One mistyped command and your interactive shell closes itself.

## Example Script

As an example, we'll use `drytee`, `dryrun`, and `waitforjobs` in the script `tat2_all.bash` to

  * run [`tat2`](tat2) (`tat2_single`) on a collection of bold files
  * in parallel (`tat2_parallel`) and
  * do a few checks (in `create_censor`) beforehand.

We'll support

  * printing what the script would do instead of actually doing it (`dryrun` and `drytee`) and
  * using hygienic shell settings (e.g. `set -euo pipefail`) only when run as a file but not when sourced [^sourcetest]

```shell title="tat2_all.bash" linenums="1"
#!/usr/bin/env bash

# create a 1D 0/1 binary censor file based on FD > 0.3mm
create_censor(){
   # keep helper variables out of the caller's shell when sourced
   local mot out
   mot=${1//bold.nii.gz/motion.txt} # sub*rest_motion.txt
   out=${1//bold.nii.gz/fdcen.1D}   # sub*rest_fdcen.1D
   [ ! -r "$mot" ] && warn "no $mot!" && return 1 # (5)!
   fd_calc 1:3 4:6 deg .3 < "${mot}" |
     drytee "$out" # (1)!

   # pass output censor file name so it can be captured
   echo "$out"
}

# run tat2 for a given bold epi
# remove high motion timepoints from calculation
tat2_single(){
   local input out
   input="${1:?input.nii.gz needed}"
   out=$(create_censor "$input")
   dryrun tat2 "$input" -censor "$out" # (2)!
}

# run tat2 for all bold image files in parallel
tat2_parallel(){
  FILES=(sub-*/ses-*/func/*bold.nii.gz)

  for input in "${FILES[@]}"; do
     tat2_single "$input" &
     waitforjobs # (3)!
     # for testing, just run one using:
     # break
  done

  # hold until the final set of jobs finishes
  wait
}

eval "$(iffmain tat2_parallel)" # (4)!
```

1. `drytee` writes to the specified file unless `DRYRUN` is set. When it is set, `drytee` truncates the output and writes it to stderr instead.
2. `dryrun` echoes everything after it to `stderr` if `DRYRUN` is set. Otherwise, it runs the command.
3. `waitforjobs` watches the children of the current process and sleeps until there are fewer than 10 running.
4. `iffmain` generates bash code. It runs `set -euo pipefail` and the specified function only if the file is not sourced -- e.g. when run as `bash tat2_all.bash` or `./tat2_all.bash` [^sourcetest]
5. `warn` sends a message to `stderr` so it doesn't get included in any eval/capture -- `a=$(warn 'oh no'; echo 'yes')` yields `a="yes"`

### In Use

Suppose we have files like

```
sub-1
└── ses-1
    └── func
        ├── sub-1_ses-1_func_task-rest_bold.nii.gz
        └── sub-1_ses-1_func_task-rest_motion.txt
```

If we set `DRYRUN`, we'll see what the script would do: a "dry run".

```shell
DRYRUN=1 ./tat2_all.bash
```

```bash
#       1
#       1
#       1
#       0
#       1 # (1)
# would be written to sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D  # (2)
tat2 sub-1/ses-1/func/sub-1_ses-1_func_task-rest_bold.nii.gz -censor sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D
# (3)!
```

1. output of `fd_calc`, which `drytee` truncated, prefixed with `#\t`, and sent to stderr
2. `drytee` also mentions what file it would have created. This file still does not exist
3. `dryrun` shows but does not run the `tat2` command.

### Source/Debug

Because the bash file is only functions and `iffmain` does not run the main function when sourced, we can debug with `source`.
Here we'll run the `create_censor` function defined in `tat2_all.bash` to check that it does what we expect.

```bash
source tat2_all.bash
create_censor sub-1/ses-1/func/sub-1_ses-1_func_task-rest_bold.nii.gz
cat sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D
```

```text title="sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D"
1
1
1
0
1
1
```

[^sourcetest]: Sourcing a shell script is useful for running same-file tests with bats and/or embedding the current file in other scripts to reuse function definitions. See [Sourcing](#sourcing)