Tuesday, April 23, 2013

Bash/Korn Shell Diary - Part 1

The Cygwin Shell

Over the years I have worked with a lot of shells, and each has its own benefits and drawbacks.  I've been able to work with Perl, sh, Bash, ksh, the C shell, Make, TCL/TK and probably a few others as well.  Currently I am working in a Windows-only environment and I have settled on GNU Bash 4.1+ running in Cygwin as my shell of choice.  Although a lot of the tips and tricks I am using here are specific to this shell (or the "sh" family), there are a few items, such as speeding up your scripts, that will be applicable to any scripting language; even so, if you wish to follow along I would recommend downloading Cygwin.  Cygwin brings a lot of functionality to your Windows environment and is worth learning if you are serious about becoming a Windows power user.

http://www.cygwin.com

Crafting a Re-usable Script

Whenever I create a new shell script I always start by looking at a problem.  Let's say I have information spread across multiple files that I need to pull together into a single Excel file; for me this is a very common scenario.  First, I think about my inputs and my outputs: where can I pull the information I want from?  Do I need to compute or generate any information?  If information needs to be computed, I work out how to do that before anything else.

I copy all of the source files into one directory and then get started on the command line.  I take a few sample cases and practice extracting or generating the desired output.  I test out placing the final output in the correct format, making it pretty or parse-able depending on the end goal.  As each piece is perfected I place the commands into an empty script (a plain-text file, created in Notepad or any text editor, saved with a ".sh" extension and the appropriate first line, "#!/bin/bash").
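
For example, a first cut of such a script might look something like the sketch below (the file names and the "ERROR" text being counted are made up for illustration); it just wraps the commands practiced on the command line and writes a CSV that Excel will open:

  #!/bin/bash
  # combine_reports.sh - first-cut sketch: pull one value out of each source
  # file and write a CSV that Excel can open directly.

  OUTPUT="combined.csv"

  # CSV header row
  echo "File,ErrorCount" > "$OUTPUT"

  # Loop over the source files copied into the working directory
  for SRC in report_*.log; do
    # Count the lines containing "ERROR" in each report
    COUNT=$(grep -c "ERROR" "$SRC")
    echo "${SRC},${COUNT}" >> "$OUTPUT"
  done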

As I move along I test out my script to make sure there are no special cases that will jump out and bite me; special cases need to be handled.  If I am dealing with intermediate files, then once I have them in the correct format I stop removing and regenerating them in the script (I toss those lines into uncalled function blocks so they aren't lost) to speed up testing.
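
A sketch of what I mean by parking those lines in an uncalled function (the file names are again hypothetical):

  #!/bin/bash
  # While testing, the expensive regeneration commands are kept in a function
  # that is never called, so they are skipped but not lost.

  regenerate_intermediates() {
    rm -f merged.tmp
    sort report_*.log > merged.tmp
  }

  # regenerate_intermediates   # re-enable when a fresh merged.tmp is needed

  # The rest of the script keeps working against the existing intermediate file
  grep -c "ERROR" merged.tmp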

Then I start to generalize things to make the script more re-usable.  I start moving variables to optional command-line arguments.  I make the script able to parse more input files.  I remove operator interaction.  And, most importantly, I try to make the script more user-friendly: if the input files are in a known location I start grabbing the originals so the operator doesn't have to copy them in all the time.  Anything that could change from run to run gets encased in variables stored at the top of the script, or driven off of command-line arguments (or an external global-variables script).
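
A rough sketch of that step, assuming a fictional network share and the same illustrative file names as above: the run-to-run settings sit at the top as defaults and can be overridden from the command line.

  #!/bin/bash
  # Defaults gathered in one place so they are easy to find and change
  SOURCE_DIR="//fileserver/reports"   # known location of the originals (made up)
  PATTERN="report_*.log"
  OUTPUT="combined.csv"

  # Optional overrides: -d source directory, -o output file
  while getopts "d:o:" OPT; do
    case "$OPT" in
      d) SOURCE_DIR="$OPTARG" ;;
      o) OUTPUT="$OPTARG" ;;
      *) echo "Usage: $0 [-d source_dir] [-o output_file]" >&2; exit 1 ;;
    esac
  done

  # Grab the originals so the operator never has to copy them in by hand
  cp "$SOURCE_DIR"/$PATTERN .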

Then I look for ways to speed things up.  I start with the heavy hitters: removing as many forks as I can.  If the script will get a lot of usage, or if we might need quick turn-around, I might start gathering timing statistics on how long chunks take to run, and then work on optimizing the logic and applying more advanced speed-up routines (yes, I have timed how long 'let "MYVAR += 1"' takes compared to "MYVAR=$(( $MYVAR + 1 ))").
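
As a quick illustration of why forks are the heavy hitters, a throwaway timing test like the one below (the loop count is arbitrary) shows the difference between forking an external command on every iteration and using the shell's built-in arithmetic:

  #!/bin/bash
  # Both loops count to 10,000; the first forks the external expr command
  # on every pass, the second uses pure shell arithmetic with no forks.

  COUNT=0
  time for (( i = 0; i < 10000; i++ )); do
    COUNT=$(expr $COUNT + 1)    # one fork per iteration
  done

  COUNT=0
  time for (( i = 0; i < 10000; i++ )); do
    COUNT=$(( COUNT + 1 ))      # built-in arithmetic, no fork
  done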

And finally, even though I’ve been commenting my script all along, I do a final scrub of the comments and make sure everything is well-documented and makes perfect sense.  Even though I may be the only person to ever look at the innards of my script I want to make sure that years from now when I need to tweak things I will understand it immediately and can make the changes in seconds as opposed to hours.
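
For what it's worth, the kind of header block I end up with looks something like this (the fields and names are just what I tend to include, not a standard):

  #!/bin/bash
  #
  # combine_reports.sh   (name is illustrative)
  #
  # Purpose : Pull the error counts out of the nightly report logs and
  #           produce a single CSV that Excel can open.
  # Inputs  : report_*.log files in the current directory (or -d <dir>)
  # Outputs : combined.csv (or -o <file>)
  # Usage   : ./combine_reports.sh [-d source_dir] [-o output_file]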

So, in summary:
  1. Take a few minutes to plan things out.
  2. Practice on the command line extracting and formatting the information you need.
  3. Build up your script one chunk at a time until it is perfect.
  4. Run your script in chunks to ensure it is running correctly.
  5. Make the script more generic/reusable.
  6. Optimize for performance.
  7. Comment everything.
