Thursday, April 25, 2013

Bash/Korn Shell Diary - Part 4

The beauty of scripts is that you can craft your work in a persistent way.  From the command line, building a complicated loop or performing large numbers of sequential tasks can be tricky.  The command line is for testing; a script file is for building.  And in order to build properly you need to understand your building blocks, which brings us to flow control.

Normal program flow in a script is sequential: we execute one line at a time, starting from the top and working our way down until we reach the end.  Bash provides us with several ways to alter the order of command execution.

If-Then-Else blocks are the most fundamental type of flow control.  They allow you to branch execution based on testable conditions.  Bash also gives us the “elif” (Else-If) operator so that we don’t have to use nested If/Else blocks unnecessarily.  Since I’ve been forced to work in some professional languages that don’t support this simple feature I’d like to say “Thank you Bash”.

   if [ test_condition ]; then
      :
   elif [ test_condition ]; then
      :
   fi


Test conditions come in a couple of flavors:
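-    File checks: these allow you to do things like check if a regular file exists (“-f MyFile.txt”), see if the file is writable (“-w MyFile.txt”), see if a file is a symbolic link (“-L MyFile.txt”; “-a” only tests that the file exists), or see if a file is newer than another file (“myFile.txt -nt myFile.bak”).
-    Simple string comparisons: ("$MYVAR" = "TESTING").
-    Simple math comparisons: ($MYVAR -gt 12), using -lt, -le, -eq, -ne, -ge, or -gt.  The symbolic form ($MYVAR > 12) only compares numbers inside double parentheses, as in (( MYVAR > 12 )); inside brackets “>” is a string comparison or, unescaped in single brackets, an output redirect.
-    If you use double brackets you can even use and (&&), and or (||) operators.  Double brackets also evaluate octal and hex values in numeric comparisons when they appear in the standard formats (017, 0xf3).  A leading zero means octal, so a value like 08 is an error.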
-    When using multiple compound comparisons you can short-circuit evaluation: (“... || …” the second test will only be executed if the first test fails) and (“… && …” the second test will only be executed if the first test succeeds).  This gets complicated when short-circuiting more than two tests so make sure you test thoroughly.
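-    Inside single brackets “-a” and “-o” can also be used for “and” and “or”.  POSIX marks these as obsolescent, so chaining separate tests with && and || is the safer habit.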
-    The colon operator is actually a valid command in Shell, and will basically do nothing in this case.  Bash expects at least one valid command to be executed in an If statement.
-     Executed command exit values (if grep "xyz" myfiles.* >/dev/null 2>&1; then).  Here the command takes the place of the brackets.  This gives you the same benefit as checking against the exit value stored in the variable ($? -eq 0), but allows you to do it in one place if that kind of thing appeals to you.
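
To make the list above concrete, here is a minimal sketch that exercises a file check, a numeric comparison, and a command exit value.  The function names (classify_path, describe_number, has_token) and the temporary files are made up for illustration; they are not part of any standard tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the test flavors described above.

classify_path() {
   local path="$1"
   if [ -L "$path" ]; then                       # symbolic link (test before -f, which follows links)
      echo "link"
   elif [ -d "$path" ]; then                     # directory
      echo "directory"
   elif [ -f "$path" ] && [ -w "$path" ]; then   # regular file we can write to
      echo "writable file"
   elif [ -f "$path" ]; then
      echo "read-only file"
   else
      echo "missing"
   fi
}

describe_number() {
   local n="$1"
   if [[ "$n" -gt 12 && "$n" -lt 100 ]]; then    # && is allowed inside double brackets
      echo "medium"
   elif (( n >= 100 )); then                     # symbolic comparison needs double parentheses
      echo "large"
   else
      echo "small"
   fi
}

has_token() {
   # The exit value of grep drives the branch directly; no brackets needed.
   if grep -q "$1" "$2" 2>/dev/null; then
      echo "found"
   else
      echo "not found"
   fi
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT                    # clean up the scratch directory on exit
echo "xyz" > "$workdir/data.txt"
ln -s "$workdir/data.txt" "$workdir/data.lnk"

classify_path "$workdir/data.txt"
classify_path "$workdir/data.lnk"
classify_path "$workdir/nope"
describe_number 42
describe_number 0x10
has_token "xyz" "$workdir/data.txt"
```

Note that describe_number accepts 0x10 because double brackets evaluate the operands of -gt and -lt arithmetically.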

Case Statements are a more convenient form of If-Else checks.  Rather than do a long series of elif's you can do simple string matches in one place:

   case $var in
      a* ) ... ;;
      [0-9] | [0-9][0-9] ) ... ;;
      * ) ;;
   esac


A few things to note:
- "case" is ended with "esac" which is "case" spelled backwards.  We saw this before with "if" and "fi".  I have nothing more to say about this.
- We can use shell glob patterns (not full regular expressions) in our string matches: "a*" means a string that starts with an "a", so it would match "alf" as well as "a".  "[0-9]" is a single digit, etc...
- You can put multiple cases on a single line by using the pipe operator ("|") as an or statement.
- Our final case here was a single "*" which will match any string.
- Each case is ended with a double semicolon (";;").  There is no fall-through.
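
Here is the same case statement filled in as a small runnable sketch.  The function name classify and the messages it prints are my own placeholders.

```shell
#!/usr/bin/env bash
# classify is a hypothetical helper wrapping the case statement shown above.
classify() {
   case "$1" in
      a* )                  echo "starts with a" ;;
      [0-9] | [0-9][0-9] )  echo "one or two digits" ;;   # "|" means "or"
      * )                   echo "something else" ;;      # catch-all
   esac
}

classify alf
classify 42
classify 123
classify ""
```

The patterns are tried from top to bottom and only the first match runs, so the catch-all "*" belongs at the end.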

Bash provides a couple flavors of Pre-Check loops (loops that are checked prior to running through the enclosed commands for the first time):

   for VAR in ... ; do
      :
   done

   while [ ... ]; do
      :
   done

   until [ ... ]; do
      :
   done


Our For loop allows us to cycle a variable through many different states.  This is useful if you know all of the states to be evaluated prior to entering the loop.  This is a convenient way to rifle through all of the *.csv files in a directory for instance, or perform the same logic on a pre-set series of strings (tokens to be found in files for instance).

While will rifle through commands as long as a condition is true, and Until does the same thing while the condition is false (not really necessary to have both of these operators since you have the negation test... but whatever).
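
A short sketch of all three loops doing real work may help.  The variable names and the scratch directory with its two .csv files are invented for the example.

```shell
#!/usr/bin/env bash
# Scratch directory with two made-up .csv files to loop over.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
touch "$workdir/a.csv" "$workdir/b.csv"

# for: every state is known before we enter the loop.
CSV_COUNT=0
for FILE in "$workdir"/*.csv; do
   [ -e "$FILE" ] || continue      # skip the literal pattern if nothing matched
   CSV_COUNT=$((CSV_COUNT + 1))
   echo "found $(basename "$FILE")"
done

# while: keep going as long as the test is true.
COUNT=0
while [ "$COUNT" -lt 3 ]; do
   COUNT=$((COUNT + 1))
done

# until: keep going as long as the test is false.
DOWN=3
until [ "$DOWN" -le 0 ]; do
   DOWN=$((DOWN - 1))
done

echo "CSV_COUNT=$CSV_COUNT COUNT=$COUNT DOWN=$DOWN"
```

Note that the loop variable is written without a dollar sign after "for"; the dollar sign is only used when reading the value.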

In a previous lesson I showed you how you could redirect standard output and error out of your scripts, but there is another trick you can use to redirect standard input into a command by using the "<" operator.  By doing the following we can rip through a file one line at a time:
   while read LINE ; do
      :
   done < input_file.txt


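Our $LINE variable now contains the contents of each line of the file, one at a time.  To get each line exactly as written, use "IFS= read -r LINE": a plain "read" trims leading and trailing whitespace and treats backslashes as escape characters.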
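
As a rough sketch of the line-by-line read, the following writes a small made-up input file and reads it back.  The file contents and the names ROWS and LINE_COUNT are assumptions for the example; the "IFS= read -r" form and the extra "-n" test are defensive choices of mine, not something the loop above requires.

```shell
#!/usr/bin/env bash
# Build a throwaway input file: leading spaces, a backslash, and no final newline.
input_file=$(mktemp)
trap 'rm -f "$input_file"' EXIT
printf '  indented line\nback\\slash\nlast line without newline' > "$input_file"

ROWS=()
LINE_COUNT=0
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
# The "|| [ -n ... ]" part picks up a final line that lacks a trailing newline.
while IFS= read -r LINE || [ -n "$LINE" ]; do
   LINE_COUNT=$((LINE_COUNT + 1))
   ROWS+=("$LINE")
   printf '%d: [%s]\n' "$LINE_COUNT" "$LINE"
done < "$input_file"
```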

There is another flow-control operator that is intended for user interaction:
   echo "Please make a selection: "
   select word in quit *.txt ; do
      echo "word is $word"
      echo "REPLY is $REPLY"
      if [ "$word" = "quit" ]; then
        break
      fi
      echo ""
   done


The output looks something like this:

   Please make a selection:
   1) quit              4) combinations.txt
   2) alchemy.txt       5) myout.txt
   3) alchemy2.txt      6) output.txt
   #? 1
   word is quit
   REPLY is 1


This structure deserves a few comments:
- The "#? " prompt comes from the "$PS3" variable.  PS3 is a special variable that provides a prompt specifically for select queries; I did not set it here, so we see the default.
- If you don't provide any choices explicitly the options given will come from the $@ variable (all arguments, quoted arguments can contain spaces).
- The text of the selected choice is stored in the variable you provided ("word" in this case).
- The number of the selection (that the user typed in) is stored in "$REPLY".
- You can also read user input one line at a time with "read MYVAR".  There are some more advanced tricks for reading a single character we'll try to get to later.
- We used the "break" keyword to jump out of one level of flow-control looping.  We could use "break 2" to jump out of two levels of nested loops if desired.
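
The "break 2" form is easier to see in a non-interactive loop, so here is a small sketch using two nested for loops.  The names ROW, COL, and FOUND are just placeholders.

```shell
#!/usr/bin/env bash
# Find the first ROW,COL pair whose product is 6, then leave both loops at once.
FOUND=""
for ROW in 1 2 3; do
   for COL in 1 2 3; do
      if [ $((ROW * COL)) -eq 6 ]; then
         FOUND="$ROW,$COL"
         break 2          # a plain "break" would only leave the inner loop
      fi
   done
done
echo "first pair: $FOUND"
```

With a plain "break" the outer loop would carry on to ROW=3 and overwrite FOUND with a later match.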

Bash also allows the user to use a more traditional For loop:
   for (( a = 1; a <= LIMIT ; a++ )); do
      :
   done


Here we are using roundy-braces.  We talked about these earlier when we were discussing the "let" operator.  They do have their uses.
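
For example, a minimal sketch that sums the numbers from 1 to LIMIT (the value 10 and the name TOTAL are arbitrary choices for the example):

```shell
#!/usr/bin/env bash
LIMIT=10
TOTAL=0
# Inside (( )) variables are read without a dollar sign.
for (( a = 1; a <= LIMIT; a++ )); do
   (( TOTAL += a ))
done
echo "sum of 1..$LIMIT is $TOTAL"
```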

There is one other mechanism I want to discuss here: scoping, or grouping commands inside curly braces.  Although not normally associated with flow-control, it does allow one special trick:
   echo "$MYVAR" >> outfile.txt
   echo "$MYVAR2" >> outfile.txt


can be replaced with:
   {
      echo "$MYVAR"
      echo "$MYVAR2"
   } >> outfile.txt


This may seem rather minor, but by handling all of our redirection in one chunk like this the output file is opened once for the whole group rather than once per command.  Cutting down on file open/close operations adds up over time.  In one test run it took me 6 seconds to do 10000 file redirects.  With the output scoped, it took many millions of such operations to reach six seconds.
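
If you want to try the comparison yourself, a sketch along these lines should work.  The loop counts, variable values, and temporary file are arbitrary; put "time" in front of each loop with a larger count to see the difference on your own machine.

```shell
#!/usr/bin/env bash
outfile=$(mktemp)
trap 'rm -f "$outfile"' EXIT
MYVAR="first"
MYVAR2="second"

# One redirect per command: the file is opened and closed on every pass.
for i in 1 2 3; do
   echo "$MYVAR $i" >> "$outfile"
done

# One redirect for the whole group: the file is opened once.
{
   for i in 1 2 3; do
      echo "$MYVAR2 $i"
   done
} >> "$outfile"

LINES_WRITTEN=$(wc -l < "$outfile" | tr -d ' ')
echo "lines written: $LINES_WRITTEN"
```

The same single-open effect can be had by putting the redirect directly after "done", since a loop is itself a compound command.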

That's enough for now.  We'll pick up again later.
