# Shell Tools

Jump right in with an [example](#example-script).

## `dryrun`

`dryrun` runs a command only when the `$DRYRUN` environment variable is unset or empty. [^dryrun_name]
It is comparable to `make -n`. Also see [`try`](https://github.com/binpash/try).

```shell title="dryrun'ed"
$ echo hi > myfile
$ export DRYRUN=1
$ dryrun rm myfile
rm myfile # (1)!
$ cat myfile
hi
```

1. this is printed but not run

```shell title="actually run"
$ echo hi > myfile
$ export DRYRUN=
$ dryrun rm myfile # (1)!
$ cat myfile
cat: myfile: No such file or directory
```

1. nothing is printed. `rm` runs silently as if `dryrun` was not there

It's worth noting that bash allows environment variables to be set and scoped to a single command by prefixing the call with `var=val`. For `dryrun`-enabled scripts and functions, this means starting the call with `DRYRUN=1` for the "just print" version.

```shell title="compact"
$ example(){ dryrun rm myfile; }
$ DRYRUN=1 example
rm myfile # (1)!
$ echo $DRYRUN
# (2)!
```

1. `rm myfile` is printed but not run
2. empty line showing `$DRYRUN` is not set now, although it was for the call above (where it was explicitly declared)

[^dryrun_name]: The name "dryrun" is taken from the rsync `--dry-run` option. `perl-rename` aliases `--dry-run` as `--just-print`.

## `drytee`

`drytee` works like `dryrun`, but for output you want written to a file unless `$DRYRUN` is set. It is like the command `tee`, except that during a dry run it writes to standard error instead of the file.

```shell
$ echo hi | drytee myfile
$ cat myfile
hi # (1)!
$ export DRYRUN=1
$ echo bye | drytee myfile
# bye
# would be written to myfile
$ cat myfile
hi # (2)!
```

1. `myfile` was written ("hi") because `DRYRUN` is not set
2. `myfile` is unchanged. `bye` was not written

## `warn`

`warn` could be written `echo "$@" >&2`.
It simply writes its arguments to standard error (file descriptor 2) instead of standard output. This is useful for keeping messages out of shell capture, whether into a variable or a file.

```shell title="avoid capture"
$ a=$(warn "oh no"; echo "results")
oh no # (1)!
$ echo $a
results
```

1. 'oh no' is seen on the terminal because it's written to stderr. "results" on stdout is captured into `$a`

A contrived example of giving a warning that doesn't end up in the output (but still potentially notifies the user):

```shell title="no warning in file"
# create a file of n lines sequentially numbered
filelines(){
  n="$1"
  [ "$n" -lt 2 ] && warn "# WARNING: n=$n < 2. limited output"
  printf "%s\n" $(seq 1 "$n")
}
```

```
$ filelines 1 > myfile
# WARNING: n=1 < 2. limited output
$ cat myfile
1
```

## `waitforjobs`

`waitforjobs` tracks the number of forked child processes. It waits `SLEEPTIME` and polls the count until there are fewer than `MAXJOBS` jobs running. It uses shell job control facilities and is useful on local, single-user, or small servers. On HPC, you'd use a scheduler instead, e.g. `sbatch` (Slurm) or `qsub` (Torque). Other alternatives include [`bq`](https://github.com/sitaramc/bq) and [`task-spooler`](https://github.com/justanhduc/task-spooler). [GNU Parallel](https://blog.ronin.cloud/gnu-parallel/) and Make also have job dispatching facilities.

```shell title="waitforjobs"
for i in {1..20}; do
  sleep 5 & # (1)!
  waitforjobs
done
wait # (2)!
```

1. `sleep` here is a stand-in for a more useful long-running command to be parallelized
2. `waitforjobs` will return from the final loop iteration with up to `MAXJOBS`-1 jobs still running. This `wait` waits for those (but won't print the notifications every `SLEEPTIME`). Consider `waitforjobs -p 1` instead.

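
The polling idea can be approximated in plain bash. The sketch below is a simplified illustration, not the `waitforjobs` implementation; the function name `throttle` and its two positional arguments are made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block while `maxjobs` or more background jobs are running.
throttle() {
  local maxjobs="${1:-4}" sleeptime="${2:-1}"
  # `jobs -pr` prints one PID per running background job of this shell
  while [ "$(jobs -pr | wc -l)" -ge "$maxjobs" ]; do
    sleep "$sleeptime"
  done
}

for i in 1 2 3 4 5 6; do
  sleep 0.3 &        # stand-in for real work
  throttle 2 0.1     # continue once fewer than 2 jobs are running
done
wait                 # let the last job finish
```

Unlike the real tool, this sketch prints no status messages and cannot be reconfigured while it runs.
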

When running locally, `waitforjobs` output looks like:

```
2023-05-24T15:38: sleep 60s on 3: sleep 5;sleep 5;bash /home/foranw/src/work/lncdtools/waitforjobs;
```

### Arguments

```
USAGE:
 waitforjobs [-j numjobs] [-s sleeptimesecs] [-c "auto"] [-h|--help]
```

`-c auto` is worth exploring in more detail. With this option, a temporary file like `/tmp/host-user-basename.jobcfg` is created. Modifying the sleep and job settings in that file will affect the `waitforjobs` process watching it. You can change the number of cores to use in real time!

## `iffmain`

In a script where `main_function` is a defined function, use `iffmain` at the end like:

```bash
eval "$(iffmain main_function)"
```

Defensive shell scripting calls for `set -euo pipefail`, but running that on the command line (e.g. via `source`) can break other scripts and normal interactive shell use [^sete_break]. `iffmain` is modeled after the Python idiom `if __name__ == "__main__"`. When the script is not sourced, it toggles the ideal settings and sets a standard `trap` to notify on error.

### Sourcing

Using `iffmain` makes it easier to write bash scripts that are primarily functions. Scripts styled this way are easy to source and test.

A bash file that can be sourced can be reused and tested. See [Bash Test Driven Development](https://neuro-programmers.pitt.edu/wiki/doku.php?id=public:bash_tdd).

### Template

`iffmain` generates shell code that looks like:

```shell title="iffmain template"
if [[ "$(caller)" == "0 "* ]]; then
  set -euo pipefail
  trap 'e=$?; [ $e -ne 0 ] && echo "$0 exited in error $e"' EXIT
  MAINFUNCNAME "$@"
  exit $?
fi
```

[^sete_break]: `set -e` ("exit on error") is especially disruptive. One mistyped command and your interactive shell closes itself.

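
The sourced-versus-executed check can also be written by hand. The sketch below uses a different, commonly seen idiom and is not the code `iffmain` emits: it compares `BASH_SOURCE[0]` with `$0`, which match only when the file is executed directly. The function `main_function` and its message are placeholders for illustration.

```shell
#!/usr/bin/env bash
# Placeholder script body: a single function to run when executed directly.
main_function() {
  echo "running with args: $*"
}

# BASH_SOURCE[0] is this file; $0 is the invoked program.
# They match when the file is executed and differ when it is sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  set -euo pipefail
  main_function "$@"
fi
```

Sourcing a file written this way defines `main_function` without running it and without changing the options of the calling shell.
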

## Example Script

As an example, we'll use `drytee`, `dryrun`, and `waitforjobs` (along with `warn` and `iffmain`) in the script `tat2_all.bash` to

* run [`tat2`](tat2) (`tat2_single`) on a collection of bold files
* in parallel (`tat2_parallel`) and
* after preparing and checking the inputs (`create_censor`).

We'll support

* printing what the script would do instead of actually doing it (`dryrun` and `drytee`) and
* using hygienic shell settings (e.g. `set -euo pipefail`) only when run as a file but not when sourced [^sourcetest]

```shell title="tat2_all.bash" linenums="1"
#!/usr/bin/env bash

# create a 1D 0/1 binary censor file based on FD > 0.3mm
create_censor(){
  local mot out
  mot=${1//bold.nii.gz/motion.txt} # sub*rest_motion.txt
  out=${1//bold.nii.gz/fdcen.1D}   # sub*rest_fdcen.1D
  [ ! -r "$mot" ] && warn "no $mot!" && return 1 # (5)!
  fd_calc 1:3 4:6 deg .3 < "${mot}" |
    drytee "$out" # (1)!

  # pass output censor file name so it can be captured
  echo "$out"
}

# run tat2 for a given bold epi
# remove high motion timepoints from calculation
tat2_single(){
  local input out
  input="${1:?input.nii.gz needed}"
  out=$(create_censor "$input")
  dryrun tat2 "$input" -censor "$out" # (2)!
}

# run tat2 for all bold image files in parallel
tat2_parallel(){
  FILES=(sub-*/ses-*/func/*bold.nii.gz)

  for input in "${FILES[@]}"; do
    tat2_single "$input" &
    waitforjobs # (3)!
    # for testing, just run one using:
    # break
  done

  # hold until the final set of jobs finishes
  wait
}

eval "$(iffmain tat2_parallel)" # (4)!
```

1. `drytee` writes to the specified file unless `DRYRUN` is set; in that case it truncates the output and writes it to stderr instead.
2. `dryrun` echoes everything after it to `stderr` if `DRYRUN` is set. Otherwise, it runs the command.
3. `waitforjobs` watches the children of the current process and sleeps until there are fewer than 10 running.
4. `iffmain` generates bash code. It runs `set -euo pipefail` and the specified function only if the file is not sourced -- e.g. `bash tat2_all.bash` or `./tat2_all.bash` [^sourcetest]
5. `warn` sends a message to `stderr` so it doesn't get included in any eval/capture -- `a=$(warn 'oh no'; echo 'yes')` yields `a="yes"`

### In Use

Suppose we have files like:

```
sub-1
└── ses-1
    └── func
        ├── sub-1_ses-1_func_task-rest_bold.nii.gz
        └── sub-1_ses-1_func_task-rest_motion.txt
```

If we set `DRYRUN`, we'll see what the script would do: a "dry run".

```shell
DRYRUN=1 ./tat2_all.bash
```

```bash
# 1
# 1
# 1
# 0
# 1 # (1)
# would be written to sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D # (2)
tat2 sub-1/ses-1/func/sub-1_ses-1_func_task-rest_bold.nii.gz -censor sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D
# (3)!
```

1. output of `fd_calc`, truncated by `drytee`, prefixed with `#\t`, and sent to stderr
2. `drytee` also mentions what file it would have created. This file still does not exist
3. `dryrun` shows but does not run the `tat2` command.

### Source/Debug

Because the bash file contains only functions and `iffmain` does not call the main function when the file is sourced, we can debug with `source`.
Here we'll run the `create_censor` function defined in `tat2_all.bash` to check that it does what we expect.

```bash
source tat2_all.bash
create_censor sub-1/ses-1/func/sub-1_ses-1_func_task-rest_bold.nii.gz
cat sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D
```

```text title="sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D"
1
1
1
0
1
1
```

[^sourcetest]: Sourcing a shell script is useful for running same-file tests with bats and/or embedding the current file in other scripts to reuse function definitions. See [Sourcing](#sourcing).
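
To experiment with the dry-run pattern without the real tools installed, the helpers can be approximated with a few shell functions. These are simplified stand-ins written for illustration and based only on the behavior described on this page. They are not the actual implementations, and the real `drytee` output format may differ.

```shell
#!/usr/bin/env bash
# Simplified stand-ins (assumptions for illustration, not the real tools).

# print arguments to stderr so they escape $(...) capture and > redirection
warn() { echo "$@" >&2; }

# print the command when DRYRUN is non-empty; otherwise run it
dryrun() {
  if [ -n "${DRYRUN:-}" ]; then
    warn "$*"
  else
    "$@"
  fi
}

# write stdin to the file "$1", or describe it on stderr during a dry run
drytee() {
  local line
  if [ -n "${DRYRUN:-}" ]; then
    while IFS= read -r line; do
      printf '#\t%s\n' "$line" >&2
    done
    warn "# would be written to $1"
  else
    cat > "$1"
  fi
}
```

With these defined, `DRYRUN=1 dryrun rm myfile` prints `rm myfile` to stderr and leaves the file alone, while a plain `dryrun rm myfile` removes it.
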