The Cygwin Shell
Over the years I have worked with a lot of shells, and each of them has its own benefits and drawbacks. I’ve worked with Perl, Sh, Bash, Ksh, C Shell, Make, TCL/TK, and probably a few others as well. Currently I am working in a Windows-only environment, and I have settled on GNU Bash 4.1+ running in Cygwin as my shell of choice. Although a lot of the tips and tricks here are specific to this shell (or the “Sh” family), a few items, such as speeding up your scripts, will apply to any scripting language. Even so, if you wish to follow along, I recommend downloading Cygwin (http://www.cygwin.com). Cygwin brings a lot of functionality to your Windows environment and is worth learning if you are serious about becoming a Windows power user.
Crafting a Re-usable Script
Whenever I create a new shell script I start by looking at the problem. Let’s say I have information in multiple files that I need to pull together into a single Excel file; for me this is a very common scenario. First, I think about my inputs and outputs: where can I pull the information I want from? Do I need to compute or generate any information? If something needs to be computed, I work out how to do that. I copy all of the source files into one directory and get started on the command line. I take a few sample cases and practice extracting or generating the desired output. I test placing the final output in the correct format, making it pretty or parse-able depending on the end goal. As each piece is perfected I place the commands into an empty script (a plain-text file with a “.sh” extension and the appropriate first line, “#!/bin/bash”).
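For a concrete (if made-up) example, a first pass at that kind of script might look like the sketch below; the input file names and the column being extracted are placeholders, not from any real project:

    #!/bin/bash
    # First pass: pull one column out of each source file and collect it
    # into a single CSV that Excel can open.
    OUTFILE="combined.csv"
    : > "$OUTFILE"                      # start with an empty output file
    for f in input_*.txt; do
        # grab the second whitespace-delimited field from every line
        awk '{ print $2 }' "$f" >> "$OUTFILE"
    done
    echo "Wrote $(wc -l < "$OUTFILE") lines to $OUTFILE"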
As I move along I test the script to make sure there are no special cases that will jump out and bite me; special cases need to be handled. If I am dealing with intermediary files, once they are in the correct format I stop removing and regenerating them in the script (tossing those lines into uncalled function blocks so they aren’t lost) to speed up my testing.
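Here is roughly what I mean by an uncalled function block; the intermediate files named here are stand-ins:

    # Expensive steps that build the intermediate files.  Once those files
    # look right I stop calling this, but the commands are not lost.
    generate_intermediates() {
        sort -u raw_dump.txt       > sorted.tmp
        join sorted.tmp lookup.txt > merged.tmp
    }
    # generate_intermediates    # <-- re-enable when the inputs change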
Then I start to generalize things to make the script more re-usable. I move variables to optional command-line arguments. I make the script able to parse more input files. I remove operator interaction. And, most importantly, I try to make the script more user-friendly: if the input files are in a known location, I grab the originals automatically so the operator doesn’t have to copy them in every time. Anything that could change from run to run goes into variables stored at the top of the script, or is driven off of command-line arguments (or an external global-variables script).
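In practice that usually means a block like this at the top of the script; the paths and defaults below are placeholders, not a real layout:

    #!/bin/bash
    # Run-to-run settings live up here, or come in from the command line,
    # so the operator never has to edit the body of the script.
    SOURCE_DIR="${1:-//server/share/reports}"   # placeholder default location
    WORK_DIR="${2:-./work}"

    mkdir -p "$WORK_DIR"
    # Grab fresh copies of the originals so nobody has to copy them by hand.
    cp "$SOURCE_DIR"/*.txt "$WORK_DIR"/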
Then I look for ways to speed things up. I start with the heavy hitters: removing as many forks as I can. If the script will get a lot of use, or if we might need quick turn-around, I gather timing statistics on how long chunks take to run, and then work on optimizing the logic and applying more advanced speed-up techniques (yes, I have timed how long ‘let “MYVAR += 1”’ takes compared to “MYVAR=$(( MYVAR + 1 ))”).
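If you want to run that comparison yourself, a crude timing harness looks something like the sketch below (the loop count is arbitrary); the last few lines show a typical fork removal, swapping an external cut for bash parameter expansion:

    # Crude timing: compare two ways of incrementing a counter.
    N=100000

    SECONDS=0
    MYVAR=0
    for ((i = 0; i < N; i++)); do let "MYVAR += 1"; done
    echo "let        : ${SECONDS}s"

    SECONDS=0
    MYVAR=0
    for ((i = 0; i < N; i++)); do MYVAR=$(( MYVAR + 1 )); done
    echo "arithmetic : ${SECONDS}s"

    # Typical fork removal: parameter expansion instead of spawning cut.
    x="name:value"
    first=$(echo "$x" | cut -d: -f1)   # forks a subshell plus an external cut
    first=${x%%:*}                     # pure bash, no fork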
And finally, even though I’ve been commenting my script all along, I do a final scrub of the comments and make sure everything is well documented and makes perfect sense. Even though I may be the only person who will ever look at the innards of the script, I want to make sure that years from now, when I need to tweak things, I will understand it immediately and can make the changes in seconds as opposed to hours.
So, in summary:
- Take a few minutes to plan things out.
- Practice on the command line extracting and formatting the information you need.
- Build up your script one chunk at a time until it is perfect.
- Run your script in chunks to ensure it is running correctly.
- Make the script more generic/reusable.
- Optimize for performance.
- Comment everything.