Advanced shell scripting with Bash

Presented by Victor Engmark

Copyright © 2018 Catalyst IT, 2020 Victor Engmark

Creative Commons Attribution-ShareAlike 4.0 International License

Huge thanks to Catalyst IT and Toitū Te Whenua Land Information New Zealand for letting me release this with an open license

Made with reveal.js

Prerequisites

  • Some familiarity with Bash
  • A terminal running Bash for exercises
  • GNU tools (find, grep, sort, etc.)
  • ssh 127.0.0.1 works

Introductions

  • Who am I?
  • Who are you?
  • Got any quick shell horror stories?
  • What do you hope to get out of this course?

Hints & tips

  • Please ask questions anytime
  • Take notes
  • Try things out
  • PS1='\$ ' PS2='> '

Motivation, aka why is shell scripting so weird?

  • Understand
  • Simplify
  • Replace

Context is everything

Count commands with quotes in $username’s history.

  • Syntactic double quotes: count=""
  • Command substitution: count="$()"
  • Word splitting: count="$(grep --count)"
  • Syntactic single quotes: count="$(grep --count '')"
  • Quoted string: count="$(grep --count '"')"
  • Tilde expansion: count="$(grep --count '"' ~)"

Context is everything

Count commands with double quotes in $username’s Bash history.

  • Syntactic double quotes: count="$(grep --count '"' ~"")"
  • Variable expansion: count="$(grep --count '"' ~"${username}")"
  • Double quoted literal: count="$(grep --count '"' ~"${username}/.bash_history")"

👍, right?

Context is everything


$ username="$USER"
                    
$ count="$(grep --count '"' ~"${username}/.bash_history")"
                    
grep: ~victor/.bash_history: No such file or directory
                    
$ ls ~victor/.bash_history
/home/victor/.bash_history
                    
$ wtf
bash: wtf: command not found
                    
The order of expansions is: […] tilde expansion, […] variable expansion, […]

In other words, the username needs to be a literal.

Context is everything

Solution: getent + cut rather than eval, at the cost of some obscure syntax.


$ username="$USER"
$ user_home="$(getent passwd "$username" | cut --delimiter=':' --fields='6')"
$ count="$(grep --count '"' "${user_home}/.bash_history")"
$ echo "$count"
169
                    

Say no to portability

Even POSIX mode does not lead to portable code:


$ type -a [[
[[ is a shell keyword
$ bash --posix
$ [[ 1 ]]
                    
$ echo $?
0
                    

Say no to portability

Readability often suffers — compare


sort -V
                    

with


sort --version-sort
                    

Say no to portability

Bashisms are helpful:

  • read -a
  • done < <(my_script)
  • $'\n'

Say no to portability

  • Focus on the applicable shell + version
  • Avoid premature generalization
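One way to apply this in practice is a version guard at the top of the script, so it fails fast under the wrong interpreter. A minimal sketch (the minimum version 4 is just an example threshold):

```shell
#!/usr/bin/env bash
# Fail fast unless running under Bash 4 or newer (example threshold).
if [[ -z "${BASH_VERSINFO[0]-}" ]] || (( BASH_VERSINFO[0] < 4 ))
then
    echo 'This script requires Bash >= 4' >&2
    exit 1
fi
echo "Running under Bash ${BASH_VERSION}"
```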

Bashing

Speed:


$ wget --quiet https://norvig.com/big.txt
                    
$ wc big.txt
 128457 1095695 6488666 big.txt
                    
$ time grep foobar big.txt
                    
real	0m0.031s
user	0m0.031s
sys	0m0.001s
                    
$ time while read -r -u 9
> do
>    :
> done 9< ./big.txt
                    
real	0m3.532s
user	0m3.113s
sys	0m0.266s
                    

No-op is >100 times slower!

Bashing

Limited data structures:


$ help declare
[Trimmed for brevity]
    Set variable values and attributes.

    Options:
      -f	restrict action or display to function names and definitions

    Options which set attributes:
      -a	to make NAMEs indexed arrays (if supported)
      -A	to make NAMEs associative arrays (if supported)
      -i	to make NAMEs have the `integer' attribute
      -n	make NAME a reference to the variable named by its value
                    

Everything else is a string.

Bashing

No nested data structures:
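There is no code to show here; the usual workaround is to fake one level of nesting by packing the inner structure into string values and splitting on demand. A sketch (the space separator is an assumption that the inner values contain no spaces):

```shell
# Fake a "list of lists": each value is itself a space-joined string.
declare -A hosts=(
    [web]='web1 web2'
    [db]='db1'
)
# Unpack one inner "list" by splitting its string into an array.
read -r -a web_hosts <<< "${hosts[web]}"
printf '%s\n' "${web_hosts[@]}"
```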

Bashing

No exceptions or try/catch/finally, just exit codes.

Bashing

Very limited functions:

  • No return value, only exit code
  • Can’t safely pass by reference
  • Only string arguments
  • Always scoped to file, so “nested functions” aren’t
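The standard workaround for the missing return value is to print the result on standard output and capture it with command substitution:

```shell
# "Return" a string by writing it to standard output.
join_by() {
    local separator="$1"
    shift
    local result="$1"
    shift
    local part
    for part
    do
        result+="${separator}${part}"
    done
    printf '%s' "$result"
}

# Capture the "return value" with command substitution.
joined="$(join_by , a b c)"
echo "$joined"
```

Note that the command substitution runs the function in a subshell, so any variables it sets are lost — only the output comes back.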

Bashing

Too much state to keep in mind at all times:

  • local, file and exported variables (including functions)
  • Local and exec redirects
  • $PWD
  • Settings from set and shopt
  • User, group
  • umask
  • Locale
  • Time zone
  • Probably more, like SELinux context domain/type

Running away from $HOME

Recommended shebang line:


#!/usr/bin/env bash
                    

Works as long as bash is on the $PATH.

Running away from $HOME

The shebang line is only relevant if you call the script directly:

  • foo.bash if it’s on the $PATH
  • ./foo.bash
  • /full/path/to/foo.bash

Otherwise the shebang line is ignored:

  • . foo.bash uses the interpreter of the parent shell
  • sh foo.bash uses sh, which may be Bash, Dash or something else

Running away from $HOME

. foo.bash and bash foo.bash also ignore the executable flag:


$ echo 'echo "$SHLVL"' > test.bash
                    
$ ./test.bash
                    
bash: ./test.bash: Permission denied
                    
$ . ./test.bash
                    
1
                    
$ bash ./test.bash
                    
2
                    

Running away from $HOME — 5m exercise

Can you find any other differences between sourcing and running a script?

Solution


$ bash --noprofile --norc
$ cd "$(mktemp --directory)"
$ echo 'declare -p' > test.bash
$ chmod u+x test.bash
$ . test.bash first second > sourced.log
$ ./test.bash first second > run.log
$ git diff sourced.log run.log
                    

Mostly $BASH_ variables.

Your flight has been redirected

Read left-to-right:


$ { echo info; echo error >&2; } > result.txt 2>&1
                    
$ cat result.txt
                    
info
error
                    

$ { echo info; echo error >&2; } 2>&1 > result.txt
                    
error
                    
$ cat result.txt
                    
info
                    

Your flight has been redirected

One file per redirect:


$ echo foo > foo.txt
$ echo bar > bar.txt
$ echo > ./*.txt
bash: ./*.txt: ambiguous redirect
$ cat foo.txt bar.txt
foo
bar
                    

Your flight has been redirected

cat is only needed in corner cases like combining stdin and a file:


$ echo foo > foo.txt
$ echo bar | cat foo.txt -
foo
bar
                    

Most of the time, cat FILE | COMMAND can be simplified to COMMAND FILE.

Your flight has been redirected

Redirect the rest of the script:


exec > out.log 2> error.log
                    

Your flight has been redirected

Print and save the output with tee:


$ echo foo | tee output.log
foo
$ cat output.log
foo
                    

--append to append to file.

Your flight has been redirected

Redirect a stream to a command and back again:


$ (echo out; echo foo >&2; echo bar >&2) 2> >(grep bar >&2)
                    
out
bar
                    

Your flight has been redirected

Different processes consume input at different rates, so standard output and standard error can get out of sync:


$ { echo first >&2; echo second; echo third >&2; } | tee out.log
                    
first
third
second
                    

Your flight has been redirected — 10m exercise

Running du --summarize /* as a non-root user prints a lot of Permission denied messages. Silence these. Additionally:

  • Don't mix or swap standard output and standard error streams.
  • Don't change output ordering of either stream.
  • Extra challenge: print one error message (still on standard error) representative of all of the silenced ones.

Solution

du --summarize /* 2> >(grep --invert-match 'Permission denied' >&2)
                    

Pipe dreams

Subshell environment for each subsequent command:


$ count=0
$ mount | grep '^tmpfs ' | while read
> do
>     (( ++count ))
> done
$ echo "$count"
                    
0
                    

Context is lost.

Pipe dreams

Bring the important command into the current context:


$ count=0
$ while read -r -u 3
> do
>     (( ++count ))
> done 3< <(mount | grep '^tmpfs ')
$ echo "$count"
                    
7
                    

Using a file descriptor above 2 avoids input being swallowed by tools such as SSH reading standard input inside the loop.

Pipe dreams

All the exit codes in a pipeline:


$ (exit 2) | true | false
                    
$ echo "${PIPESTATUS[@]}"
                    
2 0 1
                    

Pipe dreams

Application specific workarounds to get colour output:


grep --color=always […] | less --RAW-CONTROL-CHARS
                    

Colour codes are characters.
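A common complement (an assumption about your use case, not part of the grep example above) is to decide at runtime whether colour is wanted by testing whether standard output is a terminal:

```shell
# Force colour only when standard output is a terminal.
if [[ -t 1 ]]
then
    color=always
else
    color=never
fi
output="$(grep --color="$color" foo <<< 'foo bar')"
echo "$output"
```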

Pipe dreams — 10m exercise 1

Given a file with an IP per line (you can just use your own IP repeatedly), print the current time on each host.

Solution


$ while read -r -u 3 ip
> do
>     ssh "$ip" date
> done 3< hosts.txt
                    

Pipe dreams — 5m exercise 2

Each command, including a pipeline, can only have one exit code. How is that determined? Hint:


$ false | false | false
$ false | false | true
$ false | true | false
…
                    

NUL is not your friend

Some tools have a flag to separate or terminate entries with NUL.

You cannot store NUL in a variable.

You cannot put NUL in a literal:


$ printf '%q\n' $'foo\0bar\0baz\0'
                    
foo
                    

Bash doesn’t like half the world’s files.
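The same truncation applies to variables — assigning a string containing NUL silently drops everything from the first NUL onwards:

```shell
# Everything from the first NUL byte is lost in the assignment.
var=$'foo\0bar'
echo "${#var}"  # 3, not 7
```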

Use More Quotes™!

Single quotes for any literals without single quote:


$ printf '%s\n' '|\|o e$cape'
                    
|\|o e$cape
                    

Use More Quotes™!

Double quotes for strings with single quotes, command substitutions or variables:


$ subject='this'
$ printf '%s\n' "Can't $(basename /usr/bin/touch) ${subject}"
                    
Can't touch this
                    

Use More Quotes™!

Dollar single quotes for escape sequences:


$ printf '%s\n' $'first\nsecond'
                    
first
second
                    

Use More Quotes™!

You can mix quotes even in a single word:


$ "e"c'h'$'o' '"'"'"
                    
"'
                    

<< HERE-KITTY

<< NAME is almost like a double quoted context:


$ cpu_count=8
$ cat << EOF
> [hardware]
> cpu_count=${cpu_count}
> EOF
                    
[hardware]
cpu_count=8
                    

Does a similar job to envsubst.

<< HERE-KITTY

Backslash in << NAME is literal except unescaped before a newline:


$ cat << EOF
> a\
> b
> EOF
                    
ab
                    

$ cat << EOF
> c\d
> EOF
                    
c\d
                    

$ cat << EOF
> e\\
> f
> EOF
                    
e\
f
                    

<< HERE-KITTY

Quotes are literal in << NAME:


$ cat << EOF
> '"
> EOF
                    
'"
                    

<< HERE-KITTY

<< 'NAME' works like a single quoted context:


$ result=foo
$ cat << 'EOF'
> ${result}
> EOF
                    
${result}
                    

<< HERE-KITTY — 10m exercises

  1. What happens if the here document delimiter is indented?
  2. What happens if there is whitespace after the delimiter?
  3. What happens if a variable expands to the delimiter?

<<< 'here string' — bonus level

Here strings are syntactic sugar for echo WORD | my_command:


$ IFS=/ read -a directories -r <<< "$HOME"
$ printf '%q\n' "${directories[@]}"
                    
''
home
username
                    

Also moves the command into current context.

Breaking news

Never modify a running script — they are read in chunks:


$ printf '%s\n' 'sleep 1' 'echo foo' > test.bash
$ bash test.bash & sed --in-place 's/foo/bar/' test.bash
                    

The result is either foo or bar, but it is not reproducible!

Escape\ from\ Alcatraz

Avoid word splitting in unquoted strings:


$ printf '%s\n' foo bar
                    
foo
bar
                    
$ printf '%s\n' foo\ bar
                    
foo bar
                    

\n is a printf escape sequence, not a shell one.

Escape\ from\ Alcatraz

Literal \n terminators:


$ printf '%s\\n' foo bar
                    
foo\nbar\n
                    

No newline at end of output, so it’s followed immediately by $PS1.

Escape\ from\ Alcatraz

Escaping escape sequences:


$ printf %s\\\\n foo bar
                    
foo\nbar\n
                    

The number of backslashes always doubles, because each character has to be escaped separately in the next context.

Escape\ from\ Alcatraz

Avoid multiple escape levels.

  • Quoting
  • Here documents
  • Files & envsubst
  • printf '%q'

Forecast: $variable

Default value:


PATH="${PATH-/bin:/usr/bin}"
                    

:- also matches empty value.

Use with set -o nounset.

Forecast: $variable

Replacement value:


result="${1+defined}"
                    

:+ only matches non-empty value.

Forecast: $variable

The right hand side can be more complex:


$ csv=
$ entry='x'
$ csv="${csv:+"${csv},"}${entry}"
$ echo "$csv"
                    
x
                    

$ csv=foo,bar
$ entry='x'
$ csv="${csv:+"${csv},"}${entry}"
$ echo "$csv"
                    
foo,bar,x
                    

Forecast: $variable — 10m exercise

Add non-empty $path to $PATH cleanly - there should be no leading or trailing colons even if $PATH started out empty.

Solution


$ PATH="${PATH:+"${PATH}:"}${path}"
                    

$@ is where it’s at

Arguments beyond $9:


$ set -- {a..z}
$ echo ${26}
                    
z
                    

Arrays are zero-indexed, but $0 is special — it usually contains the script name.

$@ is where it’s at

Avoid $*:


$ set -- 'a b' 'c d'
$ for argument in "$*"
> do
>     printf '%s\n' "$argument"
> done
                    
a b c d
                    
$ for argument in $*
> do
>     printf '%s\n' "$argument"
> done
                    
a
b
c
d
                    

All arguments as a single word.

$@ is where it’s at

Use "$@":


$ for argument
> do
>     printf '%s\n' "$argument"
> done
                    
a b
c d
                    

Each argument as a separate word.

Default for loop target, no need for in "$@".

$@ is where it’s at

Named arguments handling skeleton:


set -o errexit
arguments="$(getopt --options='' --longoptions='case-sensitive,case-insensitive,help,exclude:' --name='script-name' -- "$@")"
eval set -- "$arguments"
unset arguments
while true
do
    case "$1" in
        [continued…]
    esac
done
                    

$@ is where it’s at

Boolean options (“flags”):


--case-sensitive)
    case_sensitive=1
    shift
    ;;
--case-insensitive)
    unset case_sensitive
    shift
    ;;
                    

$@ is where it’s at

Usage instructions:


--help)
    echo 'script-name [--case-sensitive|--case-insensitive] [--help] [--exclude=PATTERN ...] [--] FILES'
    exit 0
    ;;
                    

$@ is where it’s at

Repeating key/value arguments:


--exclude)
    excludes+=("$2")
    shift 2
    ;;
                    

$@ is where it’s at

End of options separator & unhandled arguments:


--)
    shift
    break
    ;;
*)
    echo "Unhandled option $(printf '%q' "$1"). Please report to …" >&2
    exit 2
    ;;
                    

$@ is where it’s at — 10m exercise

What does argument handling do in each case?


bash test.bash --case-sensitive . /some/path
                    

bash test.bash --help
                    

bash test.bash --exclude='.git' --exclude='.svn' -- --actual-filename.txt
                    

bash test.bash --blah /some/path
                    

$@ is where it’s at — solution

What does argument handling do in each case?


bash test.bash --case-sensitive . /some/path
case_sensitive=1
@=('.' '/some/path')
                    

bash test.bash --help
[Prints help message and returns from script with exit code 0]
                    

bash test.bash --exclude='.git' --exclude='.svn' -- --actual-filename.txt
excludes=('.git' '.svn')
@=('--actual-filename.txt')
                    

bash test.bash --blah /some/path
[Prints “script-name: unrecognized option '--blah'” and returns from script with exit code 1 from getopt]
                    

Collect all the things!

Collect non-$IFS characters:


$ matches=($(grep --only-matching . <<< $'some foo\nother foo'))
$ echo "${matches[@]}"
                    
s o m e f o o o t h e r f o o
                    

No quotes to enable word splitting.

Collect all the things!

Collect $IFS-separated or -terminated “words”:


$ while IFS=$'\t' read -a cells -r
> do
>     echo Line
>     printf '%s\n' "${cells[@]}"
> done <<< $'column 1\tcolumn 2\nvalue 1\tvalue 2'
                    
Line
column 1
column 2
Line
value 1
value 2
                    

Collect all the things!

Append to an array:


$ characters=({a..z})
$ characters+=({0..9})
$ echo "${characters[@]}"
                    
a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9
                    

Collect all the things!

Can be sparse:


$ characters=(a b c)
$ unset 'characters[1]'
$ characters+=([25]=z)
$ echo "${characters[@]}"
                    
a c z
                    
$ echo "${#characters[@]}"
                    
3
                    

Don’t loop from 0 through $(( "${#name[@]}" - 1 )).
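Iterate over the indices that actually exist instead, using "${!name[@]}":

```shell
characters=(a b c)
unset 'characters[1]'
characters+=([25]=z)
# "${!characters[@]}" expands to only the existing indices: 0 2 25.
for index in "${!characters[@]}"
do
    printf '%s=%s\n' "$index" "${characters[$index]}"
done
```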

Collect all the things! — 10m exercise

Create an array of all the executables on your $PATH.

Solution


$ IFS=: read -a paths -r <<< "$PATH"
$ for path in "${paths[@]}"
> do
>     executables+=("$path"/*)
> done
                    

The value of keys

Associative arrays use:


$ declare -A abbreviations=(['GNU HURD']="GNU's not Unix! HIRD of Unix-replacing daemons" ['sed']='stream editor')
$ declare -p abbreviations
                    
declare -A abbreviations=([sed]="stream editor" ["GNU HURD"]="GNU's not Unix! HIRD of Unix-replacing daemons" )
                    
$ printf '%s\n' "${!abbreviations[@]}"
                    
sed
GNU HURD
                    
$ echo "${abbreviations['sed']}"
                    
stream editor
                    

Not insertion ordering.

Not numerically indexable.

The value of keys

Must be declared:


$ example=([key]='value')
$ declare -p example
                    
declare -a example=([0]="value")
                    

The value of keys — 10m exercise

Print ‘key [length of value]’ for each key:


declare -A nicks=(['Bill Hicks']='William Melvin Hicks' ['Gandhi']='Mohandas Karamchand Gandhi')
[Your code]
Gandhi 26
Bill Hicks 20
                    

Solution


$ for name in "${!nicks[@]}"
> do
>     printf '%s %s\n' "$name" "${#nicks[$name]}"
> done
                    

Conditional surrender

Old style conditionals are easy to break:


$ bash --noprofile --norc -o xtrace
$ [ $foo = 'bar' ]
                    
+ '[' = bar ']'
                    
bash: [: =: unary operator expected
                    

Only two arguments (ignoring “]”) so Bash assumes a unary operation.

Conditional surrender

Old style conditionals have no way of grouping expressions such as “(A or B) and C”.

Conditional surrender

Use command conditionals and grouping:


{ [[ "$foo" = 'a' ]] || [[ "$foo" = 'b' ]]; } && [[ "$bar" = 'c' ]]
                    
  • Harder to break
  • More well known than -a and -o

Conditional surrender

Left associative: foo || bar && baz is equivalent to { foo || bar; } && baz


$ false || echo failure && echo success
                    
failure
success
                    

Conditional surrender

foo && bar || baz is equivalent to { foo && bar; } || baz


$ false && echo success || echo failure
                    
failure
                    

Conditional surrender

Break it up:


if some_command
then
    echo success
else
    echo failure
fi
                    

Conditional surrender — 10m exercise

Write a single expression to check whether $x is between 0 and $x_max or $y is between 0 and $y_max (-ge is ≥ and -le is ≤).

Solution


{
    [[ "$x" -ge 0 ]] && [[ "$x" -le "$x_max" ]];
} || {
    [[ "$y" -ge 0 ]] && [[ "$y" -le "$y_max" ]];
}
                    

Function junction

Run in the same shell as the parent script:


$ shell_pid() {
>     echo $$
> }
$ diff <(shell_pid) <(echo $$)
                    
$ echo $?
0
                    

Function junction

Use return to return an exit code from the function without exiting the script:


$ escape_key_value_pairs() {
>     if [[ $# -eq 0 ]] || [[ $(( $# % 2 )) -ne 0 ]]
>     then
>         echo "Use: ${FUNCNAME} [KEY VALUE]..." >&2
>         return 1
>     fi
>     printf '%s=%q\n' "$@"
> }

$ escape_key_value_pairs PS1 'My prompt
> $ ' PS2

Use: escape_key_value_pairs [KEY VALUE]...

$ echo $?
1
                    

Functions get their own argument list.

Function junction

Use local to declare function scope variables:


$ directory="$(mktemp --directory)"
$ filename='.bashrc_local'
$ save_current_prompt() {
>     local directory="$HOME"
>     escape_key_value_pairs PS1 "$PS1" PS2 "$PS2" >> "${directory}/${filename}"
> }
                    
$ save_current_prompt
$ tail --lines=2 ~/.bashrc_local
                    
PS1=\\\$\␠
PS2=\>\␠
                    
$ echo "$directory"
                    
/tmp/tmp.1ORclnSGb9
                    

Function junction — 5m exercise

This function pollutes the surrounding variable namespace:


reverse() {
    arguments=("$@")
    for index in $(seq $(( $# - 1 )) -1 0)
    do
        printf '%s ' "${arguments[$index]}"
    done
    printf '\n'
}
                    

Change it so that none of the variables it assigns are propagated to the outer scope.

Solution


local arguments index
                    

When is zero equal to one?

Numeric contexts:

  • $(( index++ )) prints result
  • index+=1 only if declare -i index first
  • (( index++ ))
  • [[ "$index" -eq 0 ]]

When is zero equal to one?

Integers only:


$ [[ 1.1 -eq 1 ]]
                    
bash: [[: 1.1: syntax error: invalid arithmetic operator (error token is ".1")
                    

Use tools like bc for more powerful maths.

When is zero equal to one?

0 starts octal:


$ echo $(( 077 ))
                    
63
                    

When is zero equal to one?

0x starts hex:


$ echo $(( 0xff ))
                    
255
                    

Case insensitive.

When is zero equal to one?

N# starts base N (2-64):


$ echo $(( 64#a ))
                    
10
                    

$ echo $(( 64#A ))
                    
36
                    

$ echo $(( 64#@ ))
                    
62
                    

$ echo $(( 64#_ ))
                    
63
                    

Case insensitive if base ≤ 36.

When is zero equal to one?

Comes after variable expansion:


$ msb=BE
$ lsb=EF
$ echo $(( "0x${msb}${lsb}" ))
                    
48879
                   

When is zero equal to one?

Beware leading zeros:


$ month=08
$ (( month++ ))
                    
bash: let: 08: value too great for base (error token is "08")
                   

Find out in August!

When is zero equal to one?

Type coercion:


$ foo=one
$ [[ "$foo" -eq 0 ]] && echo 'equal'
                    
equal
                   

When is zero equal to one? — 10m exercise

Sum an array of hexadecimal number strings without the 0x prefix. For example, (ffff 11) should sum to 65552 (65535 + 17).

Solution


$ declare -i sum=0
$ for number in "${numbers[@]}"
> do
>     sum+="0x${number}"
> done
                    

Chain of command

Build up argument lists or commands using arrays:


$ cat excludes.txt
foo bar
baz
                    
$ while read -r -u 9 exclude
> do
>     excludes+=(--regexp "$exclude")
> done 9< excludes.txt
                    
$ set -o xtrace
                    
$ grep --invert-match "${excludes[@]}" <<< $'foo bar\nfoo baa\n'
                    
+ grep --color=auto --invert-match --regexp 'foo bar' --regexp baz
                    
foo baa
                    

Chain of command

Only a single command and its arguments:

  • No redirects
  • No pipes
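If the command you are building up needs a pipe or a redirect, wrap it in a function instead and pass the variable parts as arguments:

```shell
# An array can hold one command and its arguments, but not a pipeline;
# a function can.
count_matching() {
    grep --count "$1"
}
matches="$(printf '%s\n' foo bar foo | count_matching foo)"
echo "$matches"  # 2
```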

Reading is fun

Read newline-terminated strings:


$ while read -r line
> do
>     echo "$line"
> done < <(printf '%s\n' 'foo' 'bar')
                    
foo
bar
                    

A “line” in *nix operating systems.

Reading is fun

Read newline-separated or -terminated strings:


$ while read -r line || [[ -n "$line" ]]
> do
>     echo "$line"
> done < <(printf $'foo\nbar')
                    
foo
bar
                    

read populates variable then fails on non-newline character at EOF.

Reading is fun

Read NUL-terminated strings:


$ cd "$(mktemp --directory)"
$ touch 'backslash\separated' $'newline\nseparated' 'space separated'
$ while IFS= read -d '' -r filename
> do
>     printf '%q\n' "$filename"
> done < <(find . -mindepth 1 -exec printf '%s\0' {} +)
                    
./backslash\\separated
$'./newline\nseparated'
./space\ separated
                    

Reading is fun

Read $IFS-terminated words:


$ read -r first second rest <<< '   aye   bee   cee   dee   '
$ printf '%q\n' "$first" "$second" "$rest"
                    
aye
bee
cee\ \ \ dee
                    

Trims leading, trailing and separating $IFS characters.

End of the line

How many characters does $result contain?


$ result="$(printf '%s' $'foo\n\n')"
$ echo "${#result}"
                    
3
                    

$() removes trailing newlines.

End of the line

$() workaround:


$ result="$(printf '%s' $'foo\n\n'; printf x)"
$ result="${result%x}"
$ echo "${#result}"
5
                    

End of the line

<<< (here string) is bad in a different way:


$ wc --bytes <<< $'foo\n\n'
6
                    

Unconditionally adds a newline.

End of the line

Newline-preserving redirects:


$ printf $'foo\n\n' | wc --bytes
5
                    
$ printf $'foo\n\n' > result.txt
$ wc --bytes result.txt
5 result.txt
                    
$ wc --bytes < <(printf $'foo\n\n')
5
                    

End of the line

echo vs. printf:


$ echo foo | xxd -cols 1
                    
00000000: 66  f
00000001: 6f  o
00000002: 6f  o
00000003: 0a  .
                    

echo adds newline (0x0a) at end of output.

End of the line

echo vs. printf:


$ printf '%s' foo | xxd -cols 1
                    
00000000: 66  f
00000001: 6f  o
00000002: 6f  o
                    

printf formats arguments.

End of the line — 15m exercise

Save script arguments to a file, and reuse them in another script.

Hint: The only character which can’t be in a string is NUL (\0 in printf).

Hint: read’s -r flag avoids treating backslashes specially.

Solution


for argument
do
    printf '%s\0' "$argument"
done > arguments.bin
                        

set --

while read -d '' -r argument
do
    set -- "$@" "$argument"
done < arguments.bin
                        

Exit in an orderly fashion

Command “success:”


$ if true
> then
>     echo 'Success'
> fi
Success
                    

Defined as exit code 0.

Exit in an orderly fashion

Completely application specific. For example, zero in arithmetic expressions:


$ count=0
                    
$ echo $?
0
                    
$ (( count=0 ))
                    
$ echo $?
1
                    
$ (( count++ ))
                    
$ echo $?
1
                    
$ (( count++ ))
                    
$ echo $?
0
                    
$ printf '%s\n' "$count"
2
                    

Exit in an orderly fashion

Some fairly well-documented numbers in /usr/include/sysexits.h (on NixOS: "$(nix eval --raw nixpkgs.glibc.dev.outPath)"/include/sysexits.h):


$ grep '^#define ' /usr/include/sysexits.h
#define EX_OK		0	/* successful termination */
#define EX__BASE	64	/* base value for error messages */
#define EX_USAGE	64	/* command line usage error */
#define EX_DATAERR	65	/* data format error */
#define EX_NOINPUT	66	/* cannot open input */
#define EX_NOUSER	67	/* addressee unknown */
#define EX_NOHOST	68	/* host name unknown */
#define EX_UNAVAILABLE	69	/* service unavailable */
#define EX_SOFTWARE	70	/* internal software error */
#define EX_OSERR	71	/* system error (e.g., can't fork) */
#define EX_OSFILE	72	/* critical OS file missing */
#define EX_CANTCREAT	73	/* can't create (user) output file */
#define EX_IOERR	74	/* input/output error */
#define EX_TEMPFAIL	75	/* temp failure; user is invited to retry */
#define EX_PROTOCOL	76	/* remote error in protocol */
#define EX_NOPERM	77	/* permission denied */
#define EX_CONFIG	78	/* configuration error */
#define EX__MAX	78	/* maximum listed value */
                    

Exit in an orderly fashion

When a command terminates on a fatal signal whose number is N, Bash uses the value 128+N as the exit status.

$ kill -l INT
2
                    
$ sleep 1d
^C
$ echo $?
                    
130
                    

Exit in an orderly fashion

Don’t exit "$error_count"!


$ bash
$ exit 256
exit
$ echo $?
                    
0
                    

Defensive coding


#!/usr/bin/env bash

set -o errexit

temporary_directory="$(mktemp --directory)"
[…]
mkdir "$temporary_directory"
                    

Caveats galore.
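One of the better-known caveats: errexit is suspended wherever the exit code is being tested, such as an if condition, or any command but the last in a pipeline (without pipefail):

```shell
set -o errexit
# A failing command in a condition does not trigger errexit…
if false
then
    echo 'never reached'
fi
# …and neither does a failure early in a pipeline (without pipefail).
false | true
reached=yes
echo "$reached"
```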

Defensive coding


#!/usr/bin/env bash

set -o errexit -o noclobber

temporary_directory="$(mktemp --directory)"
echo "Start" > "${temporary_directory}/script.log"
[…]
echo "End" > "${temporary_directory}/script.log"
                    
/tmp/[…]/script.log: cannot overwrite existing file

Defensive coding


#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset

temporary_directory="$(mktemp --directory)"
echo "Start" > "${temporary_directry}/script.log"
                    
temporary_directry: unbound variable

Defensive coding


#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail

grep foo "$1" | cut --delimiter=':' --fields=1 | grep --invert-match bar
                    

Easier than $PIPESTATUS.

Defensive coding

Umask:


$ cd "$(mktemp --directory)"
$ umask
0022
$ touch first
$ ls -l first
-rw-r--r-- […] first
                    
$ umask 0077
$ touch second
$ ls -l second
-rw------- […] second
                    

Setting traps

Don’t litter!


cleanup() {
    rm --force --recursive "${temporary_directory-}"
}
trap cleanup EXIT
temporary_directory="$(mktemp --directory)"
                    

mktemp --directory is atomic.

mktemp result is only accessible by owner (umask 0077).

Setting traps

Print debugging information on demand:


trap env USR1
                    

kill -USR1 $!
                    

Triggers after the currently running command.

Setting traps

Reload configuration in long-running process:


trap read_configuration HUP
                    

kill -HUP "$server_pid"
                    

Setting traps — 10m exercise

Start with a script which processes standard input until EOF:


#!/usr/bin/env bash
while read -r line
do
    : # Omitted
done
                    

You have no visibility of how far it has processed. Modify it to react to SIGUSR1 by printing the value of $line.

Solution


trap 'echo "$line"' USR1
                    

An open square bracket by any other name

Commands can be defined in several ways, for example:


$ type -a [
                    
[ is a shell builtin
[ is /usr/bin/[
                    

Presented in order of decreasing precedence.

Don’t use which to determine what will be run!

An open square bracket by any other name

The precedence order:

  1. alias
  2. keyword
  3. function
  4. builtin
  5. file

An open square bracket by any other name

To find documentation:

  • help command for builtins
  • man command for executable files
  • Try command --help if neither work
  • No guarantees

Killing me softly


kill -9 "$!"
                    

Now you have to clean up manually.

Killing me softly

Run a command with a timeout:


$ timeout 1s sleep 2s
                    
$ echo $?
124
                    

Killing me softly


kill "$!"
timeout="$(date --date='now + 1 minute' +%s)"
while [[ "$(date +%s)" -lt "$timeout" ]]
do
    if kill -0 "$!"
    then
        sleep 0.1
    else
        exit 0 # Win
    fi
done
exit 1 # Fail
                    
  • SIGTERM is the default
  • Adjust timeout and interval as necessary
  • Exit early on success
  • If you have to kill -9 something is broken

The opposite of WYSIWYG

What you see is not always what you think you see:


$ printf '%s\n' “foo” ‘foo’
                    
“foo”
‘foo’
                    

“Typographic” quotes are not syntactic.

Usually caused by a WYSIWYG web framework.
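A quick way to spot pasted typographic characters before they bite; the sample line is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: flag typographic quotes and em dashes, which are never
# syntactic in Bash. The "pasted" line stands in for real input.
pasted='echo “foo” ‘foo’'
matches=$(grep --line-number '[“”‘’—]' <<< "$pasted" || true)
printf '%s\n' "$matches"
```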

The opposite of WYSIWYG

What you see is not always what you think you see:


$ grep —fixed-strings foo <<< foobar
                    
grep: foo: No such file or directory
                    

Em dash ≠ double dash.

Usually caused by a WYSIWYG web framework.

The opposite of WYSIWYG

Never copy straight to a terminal; the text you see is not always the text you copy:

git clone /dev/null
echo "Hi! I'm a Trojan horse, what do you not need on this machine?"
git clone git@gitlab.com:engmark/advanced-shell-scripting-with-bash.git

Even command line editors are vulnerable.

Use a graphical editor if at all possible.

To debug or to debug, that is not a question

ShellCheck can find many common issues:


$ shellcheck --shell=bash - <<< 'while read line; do :; done'
                    
In - line 1:
while read line; do :; done
      ^--^ SC2162: read without -r will mangle backslashes.
           ^--^ SC2034: line appears unused. Verify use (or export if used externally).

For more information:
  https://www.shellcheck.net/wiki/SC2034 -- line appears unused. Verify use (...
  https://www.shellcheck.net/wiki/SC2162 -- read without -r will mangle backs...
                    

To debug or to debug, that is not a question


$ bash --noprofile --norc -o xtrace
                    
$ find "$(egrep --only-matching '/usr/[^:]+(:|$)' <<< "$PATH" | head --lines=1 | head --bytes=-2)" -mindepth 1
                    
++ egrep --only-matching '/usr/[^:]+(:|$)'
++ head --bytes=-2
++ head --lines=1
                    
+ find /usr/local/bin -mindepth 1
                    
/usr/local/bin/foo
                    

Standard input is not shown.

"+" ($PS4) is repeated to indicate the level of indirection.

Pipelined commands run simultaneously, so ordering is not guaranteed.

Use set -o xtrace inside scripts.
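$PS4 is expanded for every trace line, so it can carry more context; a sketch using variables Bash maintains itself:

```shell
#!/usr/bin/env bash
# Sketch: a more informative trace prefix. LINENO and FUNCNAME are
# maintained by Bash; only $PS4 changes here. The traced code runs
# in a subshell so xtrace does not leak out.
trace=$(
    {
        PS4='+ line ${LINENO} in ${FUNCNAME[0]:-main}: '
        set -o xtrace
        greet() { echo hello; }   # hypothetical demo function
        greet
    } 2>&1
)
printf '%s\n' "$trace"
```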

To debug or to debug, that is not a question

strace prints system calls and signals:


$ strace -e openat cat /dev/null
                    
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/dev/null", O_RDONLY) = 3
+++ exited with 0 +++
                    
man 1 strace

To debug or to debug, that is not a question

lsof lists files currently open by programs:


$ nc example.org 80 &
[3] 9999
$ lsof -p $!
                    
COMMAND   PID     USER   FD   TYPE  DEVICE SIZE/OFF    NODE NAME
nc       9999 username  cwd    DIR   254,3     4096 8919615 /home/username
nc       9999 username  rtd    DIR   254,2     4096       2 /
nc       9999 username  txt    REG   254,2    39608 2921200 /usr/bin/netcat
nc       9999 username  mem    REG   254,2    84016 2885772 /usr/lib/libresolv-2.27.so
[…]
nc       9999 username    0u   CHR   136,0      0t0       3 /dev/pts/0
nc       9999 username    1u   CHR   136,0      0t0       3 /dev/pts/0
nc       9999 username    2u   CHR   136,0      0t0       3 /dev/pts/0
nc       9999 username    3u  IPv4 4538734      0t0     TCP machine-name:46748->example.org:http (ESTABLISHED)
                    
man 8 lsof

To debug or to debug, that is not a question

netstat prints networking information:


$ sudo netstat --listening --numeric --program --tcp | sed --quiet '1,2p;/ssh/p'
                    
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1234/sshd
tcp6       0      0 :::22                   :::*                    LISTEN      1234/sshd
                    
man 8 netstat

To debug or to debug, that is not a question

/proc has lots of runtime information:


$ journalctl --catalog --follow --unit=sshd | grep "$USER" &
$ ls -lA /proc/$!/fd
                    
total 0
lr-x------ 1 username username 64 May  7 14:32 0 -> 'pipe:[5181666]'
lrwx------ 1 username username 64 May  7 14:32 1 -> /dev/pts/3
lrwx------ 1 username username 64 May  7 14:32 2 -> /dev/pts/3
                    
man 5 proc

To debug or to debug, that is not a question — 10m exercise

Find the maximum number of files your shell (PID $$) can open. Hint: /limits.

Extra exercise: Find the PID of the journalctl command:


$ journalctl --catalog --follow --unit=sshd | grep "$USER" &
                    

Solution

  1. cat /proc/$$/limits
  2. Get pipe inode number from ls -l /proc/$!/fd, then ls -l /proc/*/fd/1 | grep INODE

PhD thesis: looping through files

*.bash in the current directory in current locale’s alphabetical order:


ls *.bash | while read file
do
    something $file
done
                    

Expelled from PhD programme, computer ground to dust and buried using the secret rituals of the church of Stéphane Chazelas.

PhD thesis: looping through files

*.bash in the current directory in current locale’s alphabetical order:


for file in ./*.bash
[…]
                    

👍

PhD thesis: looping through files

*.bash including dotfiles in the current directory in current locale’s alphabetical order:


shopt -s dotglob
for file in ./*.bash
[…]
                    

👍
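A related shopt worth knowing: without nullglob, a pattern with no matches expands to itself, and the loop body runs once with the literal pattern. A sketch:

```shell
#!/usr/bin/env bash
# Sketch: with nullglob set, a non-matching pattern expands to
# nothing instead of the literal './*.bash'.
dir=$(mktemp --directory)   # an empty directory, so nothing matches
shopt -s nullglob
count=0
for file in "$dir"/*.bash   # no matches: the loop body never runs
do
    count=$((count + 1))
done
printf 'iterations: %d\n' "$count"
rmdir "$dir"
```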

PhD thesis: looping through files

*.bash except for foo.bash in the current directory in current locale’s alphabetical order:


shopt -s extglob
for file in ./!(foo).bash
[…]
                    

😅

PhD thesis: looping through files

*.bash in and below the current directory in current locale’s alphabetical order:


shopt -s globstar
for file in ./**/*.bash
[…]
                    

😕

PhD thesis: looping through files

Universal ordering:


export LC_COLLATE='C'
for file in ./*
[…]
                    

Or LC_COLLATE='en_NZ.utf8'

Let me just curl dict://dict.org/d:collate 🧐
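A sketch of what collation changes; note that LC_ALL, when set, overrides LC_COLLATE, so it is the safer knob for one-off commands:

```shell
#!/usr/bin/env bash
# Sketch: in the C locale, sorting is by byte value, so all
# uppercase letters come before all lowercase letters.
c_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort | tr '\n' ' ')
printf 'C collation: %s\n' "$c_order"   # A B a b
```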

PhD thesis: looping through files

Reverse order:


files=(./*)
for (( index = ${#files[@]} - 1; index >= 0; index-- ))
do
    something "${files[$index]}"
done
                    

😟

PhD thesis: looping through files

Non globbable pattern:


while IFS= read -d '' -r -u 9 path
do
    something "$path"
done 9< <(find . \( -type d -regex '^.*/\.git$' -prune -false \) -o -type f -exec printf '%s\0' {} +)
                    

😭

PhD thesis: looping through files — 20m exercise

Combining all of the above 😉

Explain every part of the previous command to someone. For reference:


while IFS= read -d '' -r -u 9 path
do
    something "$path"
done 9< <(find . \( -type d -regex '^.*/\.git$' -prune -false \) -o -type f -exec printf '%s\0' {} +)
                        

Solution

  1. find . finds all files in the current directory and child directories.
  2. \( expression \) overrides find expression precedence.
    1. -type d matches directories.
    2. -regex '^.*/\.git$' matches filenames (actually directories because of the previous expression) ending with ‘/.git’.
    3. -prune stops find from descending into matching directories.
    4. -false makes the entire expression false.
  3. -o means "or". Since the previous expression was false we always process the expressions after this.
  4. -type f matches plain files (not directories).
  5. -exec some command {} + runs a command suffixed with as many filenames as possible, if necessary running the command multiple times with different sets of files.
  6. printf '%s\0' prints any subsequent arguments terminated with \0, a.k.a. NUL.
  7. <(some command) creates a named pipe allowing the command output to be treated as a file.
  8. done 9< file redirects file descriptor 9 to read from the file (here, the named pipe), leaving standard input free for the loop body.
  9. while condition command; do inner commands; done runs inner commands as long as condition command returns a zero exit code.
  10. IFS= some command empties the internal field separator during some command, avoiding any trimming of characters when word splitting.
  11. read options path reads a single piece of the input stream into the variable path. Options:
    1. -d '' sets the input stream separator to NUL.
    2. -r ensures that backslashes are treated literally.
    3. -u 9 sets the input stream to file descriptor 9.
  12. some command "$path" uses the now safe value in the path variable.

Welcome to sow’s ear processing!

  • Automating interactive scripts is the worst
  • You wouldn’t write Vim or Mutt as a shell script

Conclusion: arguments ≫ menus or prompts.

Welcome to sow’s ear processing!

  • Filenames are rarely relevant for the processing of their contents
  • Pipelining commands is several orders of magnitude faster than read loops
  • Redirect hacks are hard to read and therefore error-prone

Conclusion: support standard input and output before files.
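One way to support both, assuming /dev/stdin is available (it is on Linux); line_count is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: process a named file when given, standard input otherwise.
line_count() {
    grep --count '' "${1:-/dev/stdin}"
}

printf 'a\nb\nc\n' | line_count   # counts lines from standard input
```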

Welcome to sow’s ear processing!

  • Error handling is rudimentary
  • Most scripts can do unrecoverable damage with small changes
  • You only have to debug the code you have run

Conclusion: exit at the first sign of a problem.
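A sketch of bailing out early; the options run in a child bash here only so the demo can show the resulting exit status:

```shell
#!/usr/bin/env bash
# Sketch: stop at the first sign of trouble. errexit aborts on
# failing commands, nounset on unset variables, pipefail on a
# failure anywhere in a pipeline.
status=0
bash -c '
    set -o errexit -o nounset -o pipefail
    false | true            # pipefail makes this pipeline fail
    echo "never reached"    # errexit stops the script before this
' || status=$?
printf 'exit status: %d\n' "$status"
```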

Welcome to sow’s ear processing!

  • Dozens of pieces of context for even simple scripts
  • Impossible to test scripts exhaustively
  • Even simple tasks are difficult to do correctly

Conclusion: keep it really simple.

Thank you!