Thursday, April 25, 2013

Bash/Korn Shell Diary - Part 3

Shell Script Basics

Although the command line can be very powerful, at some point (either due to complexity or a desire for reuse) you will want to move into scripting.  A script is a series of commands stored in a plain-text file that starts with a shebang line (the special characters “#!”) followed by a path to the executable (and optional arguments) that will act as an interpreter for this script.

Script files may either be sourced (run in the current environment or shell) or executed, which starts them as a child process.  Sourced scripts (“source  ./myscript.sh”) run in your current shell, which means that hitting an “exit” condition will exit your current shell.  Further, anything the script changes (variables, working directory, shell options) stays changed after it finishes, so you run the risk of corrupting your current environment.  Sourced scripts also have access to your full environment, including any variables that you have set.
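
To make the difference concrete, here is a minimal sketch.  The throwaway script and the GREETING variable are made up for the demo, and the script is run with "bash file" rather than "./file" only so that no execute permission is needed:

```bash
#!/usr/bin/env bash
# Hypothetical demo script written to a temp file; all it does is set a variable.
demo=$(mktemp)
echo 'GREETING="set by script"' > "$demo"

GREETING="original"

bash "$demo"                       # executed: runs in a child process
after_exec=$GREETING               # the child's change is lost

source "$demo"                     # sourced: runs in the current shell
after_source=$GREETING             # the change sticks

rm -f "$demo"
echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
```

The first echo should still show the original value, the second the value assigned by the script.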

Executed scripts (“./myscript.sh”) will spawn a new shell to run in.  These tasks may be backgrounded if desired (and no user interaction is needed).  But, since this is a new shell it will not share the same environment as your command line shell.  If you want variables to be visible inside the new shell they need to be explicitly exported (“export MYVAR”) prior to running, or you can pass them in as arguments:

   #!/bin/bash
   # on all except for the shebang line a pound sign indicates a "to-end-of-line-comment"
   NUM=0
   while [ $# -gt 0 ]; do
      let "NUM += 1"
      echo "Argument ${NUM} = ${1}"
      shift
   done

If we have called our script with arguments we can parse those arguments out.

   $ ./temp.sh one "two three" four
   Argument 1 = one
   Argument 2 = two three
   Argument 3 = four


The shift command lets us rotate out some of our arguments ($2 becomes $1, $3 becomes $2, etc…).  You can skip over multiple arguments by using “shift 2” or “shift 3”.  The “$#” is a special variable that tells us how many arguments are sitting in the queue for our current scope (functions use the same methodology for handling arguments).  You can also access your arguments directly by using the “${2}”, “${3}” syntax; the curly braces around variable names remove ambiguity, and they are required from the tenth argument on (“${10}”).  You can also access all arguments at once by using “$*”, or the quoted form "$@" when each argument needs to stay a separate word.
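
A small sketch of these pieces working together inside functions.  The names show_args and count_args are hypothetical helpers invented for this example:

```bash
#!/usr/bin/env bash
# show_args (hypothetical) consumes its arguments two at a time.
show_args() {
   echo "received $# argument(s)"
   while [ $# -ge 2 ]; do
      echo "pair: ${1} -> ${2}"
      shift 2                      # drop both members of the pair
   done
   [ $# -eq 1 ] && echo "leftover: ${1}"
   return 0
}

# count_args (hypothetical) reports how many arguments it was handed.
count_args() { echo $#; }

set -- one "two three" four        # load the positional parameters
show_args "$@"
echo "quoted \$@ passes $(count_args "$@") arguments"
echo "quoted \$* passes $(count_args "$*") argument"
```

The last two lines show why the quoted "$@" is usually the safer choice: it hands over three separate arguments, while the quoted "$*" glues everything into one.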

You can even go one step further and either check if expected arguments are set, or provide default values for when they are not set:
   MYVAR=${1:-default}   # adding the colon also treats an empty (NULL) string as not being set
   MYVAR=${1-default}    # the dash operator "-" says to use the default value if the argument is not set


Although there are other ways to deal with unset arguments (such as the “${MYVAR?}” syntax, which stops the script with an error when the variable is not set), either setting default values or forcing the operator to provide the inputs seems to work best.
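
Here is a short sketch of the three behaviours side by side.  The functions greet and need_name are hypothetical and exist only for this demo:

```bash
#!/usr/bin/env bash
# greet (hypothetical): both arguments are optional.
greet() {
   local name=${1:-world}          # unset OR empty -> "world"
   local punct=${2-.}              # only unset -> "."; an empty string is kept
   echo "hello ${name}${punct}"
}

greet                 # hello world.
greet Bob             # hello Bob.
greet "" ""           # hello world   (empty name replaced, empty punct kept)

# need_name (hypothetical) insists on its argument.  The ":?" form aborts the
# shell it runs in, so it is called in a subshell here to keep the demo alive.
need_name() { : "${1:?need a name}"; echo "got $1"; }
( need_name ) 2>/dev/null || echo "missing argument detected"
```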

Variables

And speaking of variables, now that we are using a shell script I can tell you that they are a lot of fun to play with.  For instance if you know ahead of time that you will only use a variable in a certain way you can optimize the performance in the shell and make it easier to work with by using declare.

   declare -i MYINT   # MYINT only holds integers, if you shove a string in there it becomes a 0
   declare -u MYSTR   # MYSTR converts all letters to upper-case when a value is assigned
   declare -l MYSTR   # MYSTR converts all letters to lower-case when a value is assigned
   declare -r MYSTR="YES"   # MYSTR is now read-only and cannot change


Unless otherwise specified all variables can be used interchangeably as Strings or (if they hold an appropriate value) Integers.  To assign a new value to a variable you use the “=” operator with no spaces around it, quoting any value that contains whitespace: MYVAR="a string of text".  To access the variable you use the ${MYVAR} syntax.  In most cases the curly-braces are optional, but they help remove ambiguity from the shell and allow us to use some more advanced string manipulation features… so get used to them.  Things on the right hand side of your assignment are evaluated fully before assignment, so the following is legal and will append to your string: MYVAR="${MYVAR} more text here"
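
A quick sketch of assignment, appending and a couple of the declare attributes in action.  The variable names are just placeholders:

```bash
#!/usr/bin/env bash
MYVAR="a string of text"           # quotes keep the spaces inside the value
MYVAR="${MYVAR} more text here"    # right-hand side is expanded first, then assigned
echo "$MYVAR"                      # a string of text more text here

FILE=report
echo "${FILE}_old"                 # braces mark where the name ends: report_old

declare -i COUNT=5                 # integer attribute
COUNT+=3                           # arithmetic add, not string append
echo "$COUNT"                      # 8

declare -r LOCKED="YES"            # read-only
# Try the forbidden assignment in a subshell so the error stays contained.
( LOCKED="NO" ) 2>/dev/null || echo "LOCKED cannot be changed"
```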

Now that we have some data in our string we can start playing with it.  The shell provides us with a rich set of built-in string manipulation functions.  The following code snippet:
   MYSTRING="ab cd,ef gh,ij kl,mn op"
   typeset -u MYSTRING
   echo "Uppercase: $MYSTRING"

   typeset -l MYSTRING
   echo "Lowercase: $MYSTRING"

   echo "MYSTRING contains ${#MYSTRING} characters."

   echo "Let's change the vowels to x's: ${MYSTRING//[aeiou]/x}"
   echo "How about just the first comma to an underscore: ${MYSTRING/,/_}"

   echo ""
   echo "Characters 11-15: ${MYSTRING:10:5}"
   echo "Characters 11-end: ${MYSTRING:10}"

   echo ""
   echo "Hey is that CSV (Comma-Separated-Values) format?"
   echo "Field 1 = ${MYSTRING%%,*}"
   echo "Field 4 = ${MYSTRING##*,}"
   echo "Field 2,3 and 4 = ${MYSTRING#*,}"
   echo "Field 1,2 and 3 = ${MYSTRING%,*}"

   echo ""
   echo "or how about this neat trick:"
   IFS=","
   set -- ${MYSTRING}
   echo "Field 1 = $1"
   echo "Field 2 = $2"
   echo "Field 3 = $3"
   echo "Field 4 = $4"

   echo ""
   printf "And how about some pretty formatting?  %7s %-3s \n" "$1" "$2"

produces this output:
   Uppercase: ab cd,ef gh,ij kl,mn op
   Lowercase: ab cd,ef gh,ij kl,mn op
   MYSTRING contains 23 characters.
   Let's change the vowels to x's: xb cd,xf gh,xj kl,mn xp
   How about just the first comma to an underscore: ab cd_ef gh,ij kl,mn op

   Characters 11-15: h,ij
   Characters 11-end: h,ij kl,mn op

   Hey is that CSV (Comma-Separated-Values) format?
   Field 1 = ab cd
   Field 4 = mn op
   Field 2,3 and 4 = ef gh,ij kl,mn op
   Field 1,2 and 3 = ab cd,ef gh,ij kl

   or how about this neat trick:
   Field 1 = ab cd
   Field 2 = ef gh
   Field 3 = ij kl
   Field 4 = mn op

   And how about some pretty formatting?    ab cd ef gh
Let’s see what we can learn from this code:
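-    The typeset built-in allows you to do some broad-stroke manipulation of variables, similar to declare.  Note that in bash the upper/lower-case attribute is applied when a value is assigned, so setting it after the assignment (as above) leaves the existing text alone, which is why both of those output lines are still lowercase.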
-    You can count the number of characters in a string with ${#...}, empty variables have 0 characters btw.
-    You can do find and replace with the ${…//find-this/replace-with-this} syntax or even ${…//delete-this/}.
-    You can also find/replace the first matched argument with ${…/find-this/replace-with-this}.
-    Those search terms we just talked about are shell patterns (the same wildcards used for filenames: “*”, “?” and “[…]”), not full regular expressions.
-    You can easily remove up to a pattern either from the left (“#”) or the right (“%”); a single character removes the shortest match, while doubling it (“##”, “%%”) removes the longest.
-    Variables can replace the standard arguments by using the “set -- $VARIABLE” syntax.
-    The $IFS variable is a special variable that determines the field separators the shell uses when it splits words and reads arguments.  It defaults to whitespace (space, tab or newline), but can be overridden for special processing.  The change stays in effect until you set it back.
-    Although not demonstrated here you can also convert matching characters to upper or lowercase by using ${…^^uppercase-this-match} and ${…,,lowercase-this-match} (bash 4 and later).  These match globally; use a single caret (“^”) or comma (“,”) to convert only the first character.
-    The printf built-in matches the special formatting used by the C printf function and the Unix printf command: %7s says I'm expecting a string, right justified in a field at least 7 characters wide, and %-3s says left justify in a field at least 3 characters wide.  Toss the arguments at the end outside of the quotes.
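
The snippet earlier set IFS for the whole script.  As a sketch of a slightly tidier variation (my suggestion, assuming bash, not something from the original run), you can limit the IFS change to a single read and land the fields in an array.  RECORD and FIELDS are placeholder names:

```bash
#!/usr/bin/env bash
RECORD="ab cd,ef gh,ij kl,mn op"

# IFS is changed only for this one read command, so the rest of the
# script keeps the default separators.
IFS="," read -r -a FIELDS <<< "$RECORD"
echo "number of fields: ${#FIELDS[@]}"        # 4
echo "second field:     ${FIELDS[1]}"         # ef gh  (bash arrays start at 0)

# '#' and '%' trim the shortest match, '##' and '%%' the longest.
echo "drop first field: ${RECORD#*,}"         # ef gh,ij kl,mn op
echo "keep last field:  ${RECORD##*,}"        # mn op
echo "drop last field:  ${RECORD%,*}"         # ab cd,ef gh,ij kl
echo "keep first field: ${RECORD%%,*}"        # ab cd

# %7s right-justifies in 7 columns, %-7s left-justifies.
printf '[%7s] [%-7s]\n' "${FIELDS[0]}" "${FIELDS[1]}"   # [  ab cd] [ef gh  ]
```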

By using these special tricks we can replace most of the string manipulation features that would otherwise require us to fork commands.  Forked commands run in their own process and require a lot of overhead; they are the major slowdown for scripts, which can end up running orders of magnitude slower.  With these tricks we can replace most of the uses for external UNIX commands like “sed”, “awk”, “cut”, “tr”, etc…
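
As a rough illustration, here are a few common external-command jobs done with expansions only.  The path is a made-up example value, and this is a sketch rather than a drop-in replacement: edge cases (a path with no slash, for instance) behave differently from the real dirname and basename:

```bash
#!/usr/bin/env bash
# The path below is a made-up example value.
FULLPATH="/var/log/myapp/error.log"

DIR=${FULLPATH%/*}                 # like dirname:     /var/log/myapp
FILE=${FULLPATH##*/}               # like basename:    error.log
STEM=${FILE%.*}                    # strip extension:  error
EXT=${FILE##*.}                    # extension only:   log
DASHED=${FULLPATH//\//-}           # like tr '/' '-':  -var-log-myapp-error.log
UPPER=${FILE^^}                    # like tr a-z A-Z (bash 4+): ERROR.LOG

echo "$DIR | $FILE | $STEM | $EXT | $DASHED | $UPPER"
```

None of these lines starts another process, which is where the speed-up comes from inside a loop.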

Strings can also be used in comparison checks, but more on these later:
   if [ "$MYSTRING" = "$THATSTRING" ]; then …    # are the strings equal?
   if [ ! "$MYSTRING" = "$THATSTRING" ]; then …  # are they not equal?
   if [ "${MYSTRING?}" ]; then …                 # is it non-empty?  (the "?" stops the script with an error if it is not even set)
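
If you want to test for "set" without stopping the script, one common idiom is the "${VAR+x}" expansion.  A small sketch with placeholder variable names:

```bash
#!/usr/bin/env bash
MYSTRING="apple"
THATSTRING="apple"
EMPTY=""
unset MISSING

if [ "$MYSTRING" = "$THATSTRING" ]; then echo "equal"; fi
if [ "$MYSTRING" != "banana" ]; then echo "not equal"; fi

# ${VAR+x} expands to "x" when VAR is set (even to an empty string)
# and to nothing when it is unset, so -n / -z can tell the cases apart.
if [ -n "${EMPTY+x}" ]; then echo "EMPTY is set (but empty)"; fi
if [ -z "${MISSING+x}" ]; then echo "MISSING is not set"; fi
if [ -z "$EMPTY" ]; then echo "EMPTY has zero length"; fi
```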


Integer type variables are basically strings (they can be manipulated just like strings) that happen to contain integer values (-100, 0, 123784, etc…).  This special happenstance gives us some new toys to play with:
   MYVAR=24
   let "MYVAR = 12" "MYVAR += 8" "MYVAR += 16"
   echo "$MYVAR"
   let "MYVAR = (MYVAR / 3) % 7"
   echo "$MYVAR"

   let "MYVAR = 8#20"
   echo "$MYVAR"

gives us:
   36
   5
   16

We can even work with some very large numbers: bash does its math in the platform's widest integer type, which is 64-bit signed on most current systems (some older shells were limited to 32-bit signed values).  But, we cannot work with floats like this.
-    When using the “let” operator we are telling the shell we are going to work with variables, so we don’t need the “$VAR” syntax.  You could also use the “(( … ))” syntax if you prefer.  At one point some timing studies showed me that “let” was evaluated by the shell about 3ns quicker than the round-brace form, and that was important enough to me then to make me stick with “let” for a while.
-    Multiple operations can be done on a single “let” line; each is encased in double quotes.
-    You have a rich variety of operators available to you, basically all of the C++/Java type operators using the standard order of precedence.  You can do bitwise shifting with “>>” and “<<”, you have all the normal math operators plus “%” for the remainder (modulus), and even “**” for exponents.  Use round braces for clarification or to change the normal operator precedence.
-    You can take input in other bases using the “base#number” notation, where the number is written in that base (“8#20” above is octal 20, which is 16 in decimal).
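
For comparison, here is the same kind of math written with the “(( … ))” form, touching each of the operators mentioned above.  A and B are arbitrary sample values:

```bash
#!/usr/bin/env bash
A=7
B=2

(( SUM = A + B ))                  # same job as: let "SUM = A + B"
(( POWER = A ** B ))               # exponent: 49
(( REM = A % B ))                  # remainder: 1
(( SHIFTED = A << 3 ))             # bit shift left: 7 * 8 = 56
(( GROUPED = (A + B) * B ))        # parentheses override precedence: 18

HEX=$(( 16#ff ))                   # base#number: hex ff = 255
BIN=$(( 2#1010 ))                  # binary 1010 = 10

echo "$SUM $POWER $REM $SHIFTED $GROUPED $HEX $BIN"   # 9 49 1 56 18 255 10
```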

Integers are fun… but they still aren’t floaty-numbers :-(  You can however use special techniques to simulate fixed point floats (currency for example) or convert in-and-out of hex values.  I keep special functions in source-able script files for just this type of problem.  I still wish we had better float handling though.
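
These are not the author's actual helper functions (those aren't shown here), just a minimal sketch of the idea: keep currency as a whole number of cents so integer math stays exact.  The names to_cents and from_cents are hypothetical, and the sketch assumes non-negative input written with exactly two decimal digits:

```bash
#!/usr/bin/env bash
# to_cents / from_cents are hypothetical helpers.
# Assumes non-negative input written as DOLLARS.CC (two decimal digits).
to_cents() {
   local dollars=${1%.*}           # text before the decimal point
   local cents=${1#*.}             # text after it
   echo $(( 10#$dollars * 100 + 10#$cents ))   # 10# stops "08" being read as octal
}

from_cents() {
   printf '%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}

TOTAL=$(( $(to_cents 19.99) + $(to_cents 5.08) ))
from_cents "$TOTAL"                # 25.07

printf '%x\n' 255                  # decimal to hex: ff
echo $(( 16#ff ))                  # hex to decimal: 255
```

Keeping helpers like these in a file and pulling them in with “source” makes them available to any script that needs them.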



In Shell you also have access to Arrays.  Let’s dive right in:
   declare -a myChars
   CNT=1
   while [ $CNT -le 12 ] ; do
      myInts[$CNT]=$CNT
      let "CNT += 1"
   done

   CNT=1
   for CHAR in {a..e} ; do
      myChars[$CNT]=$CHAR
      let "CNT += 1"
   done

   echo "how's this work?  ${myInts}"
   echo "nope... let's try something else:"
   I=1

   while [ $I -le ${#myInts[*]} ]; do
      echo "myInts[$I] = ${myInts[${I}]}"
      let "I += 1"
   done

   for CHAR in ${myChars[*]} ; do
      echo ">>> $CHAR"
   done

gives us:
   how's this work?
   nope... let's try something else:
   myInts[1] = 1
   myInts[2] = 2
   myInts[3] = 3
   myInts[4] = 4
   myInts[5] = 5
   myInts[6] = 6
   myInts[7] = 7
   myInts[8] = 8
   myInts[9] = 9
   myInts[10] = 10
   myInts[11] = 11
   myInts[12] = 12
   >>> a
   >>> b
   >>> c
   >>> d
   >>> e

This shows us a couple of things:
-    Although arrays can be declared explicitly it isn’t necessary (notice that myInts above was never declared).  Bash notices when you use array syntax and marks the variables as arrays on-the-fly.  I could prove this to you if there was some way to list all arrays dynamically, oh wait, there is: “typeset -a” will list all arrays in the current shell.  By-the-way, try the “-i” flag for integers, “-r” for readonly and “-f” for functions when you get the time.
-    Arrays are single-dimension so “MYARRAY[12]” is possible, but “MYARRAY[12,6]” will require a few tricks to implement in bash.
-    Arrays can be sparse, so feel free to assign to arrays willy-nilly.  Have an array with only a 3rd index filled in if you’d like.
-    To assign to an array use “MYARRAY[5]=value” or “MYARRAY[$INDEX]=value”.
-    To read from an array use “${MYARRAY[5]}” or “${MYARRAY[$INDEX]}”.
-    To find the number of items in an array use “${#MYARRAY[*]}”.
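-    To list all members of an array at once use “${MYARRAY[*]}”, or the quoted "${MYARRAY[@]}" to keep members that contain spaces in one piece.
-    And if you do just access “${MYARRAY}” you will only get index 0 of that array, which is why the first echo above printed nothing: we started filling myInts at index 1.
-    Oh, and we almost missed the little curly brace expansion trick we pulled up there "{a..e}", this is a special trick that fills in all of the blanks between the two ends ("a b c d e" in this case).  You can use this trick on the command line to generate tons of test files: ("touch myfile{1..5}.{txt,bat}").  This type of expansion only works in certain circumstances though: the end points must be literal values (not variables), and it is not performed in a plain assignment like MYVAR={a..e}, although an array assignment such as myChars=({a..e}) does work in bash.
-    As a side note, earlier when we used the "set --" command we assigned our string to the positional parameters ($1, $2, …), which behave like a special array.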
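
A few of these points in one short sketch, assuming bash.  The array names are placeholders, and the flat-index approach at the end is just one possible trick for faking a second dimension, with COLS as an assumed fixed row width:

```bash
#!/usr/bin/env bash
LETTERS=({a..e})                   # brace expansion inside an array assignment
SPARSE=()
SPARSE[3]="third"                  # only index 3 ...
SPARSE[10]="tenth"                 # ... and index 10 are filled in

echo "count:   ${#SPARSE[@]}"      # 2 (number of elements, not the highest index)
echo "indices: ${!SPARSE[*]}"      # 3 10
echo "first:   ${LETTERS[0]}"      # a  (the same thing ${LETTERS} gives you)

# Quoted "${ARRAY[@]}" keeps each element intact even if it contains spaces.
NAMES=("ab cd" "ef gh")
for NAME in "${NAMES[@]}"; do
   echo ">>> $NAME"
done

# Faking two dimensions: compute a flat index from row and column.
COLS=4
GRID=()
GRID[$(( 2 * COLS + 1 ))]="row 2, col 1"
echo "${GRID[$(( 2 * COLS + 1 ))]}"
```

The “${!ARRAY[*]}” form lists the indices that are actually in use, which is handy for looping over a sparse array.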

That should be enough for one Blog... we'll pick up again in the next entry :-)
