Sunday, November 17, 2013

Adventures in Compiling - Notepad2

I've been using Notepad2 for a while now at work since it is an approved tool, and it's somewhat useful.  I like the single document interface and the syntax highlighting, but it has one big drawback for me: it doesn't support Bash or Cygwin scripting.

Long story short, I decided to download a copy of the source and make a few modifications.

1. Download a copy of all the source code.
The source can be found on the Notepad2 page.  After unpacking it into a project directory you also have to download a copy of the Scintilla source and place it in the top-level project directory (e.g. Notepad2\scintilla).
Notepad2 Source Code
Scintilla Source Code

2. Choose your development environment.
Pleasant surprise - everything is in C++.  Since I'm using a Windows XP machine, and I'm cheap, I opted for the free Microsoft Visual Studio 2010.  Download, install, register.  According to the instructions a few things need to happen first, though: we need to locate the "lexlink.js" script and run it (double-click).  This modifies the Catalogue.cxx file to remove syntax files from the Scintilla build that we don't care about.

Open up the solution, let Visual Studio convert it, and we should be good to go.

Now before we make any modifications we need to make sure that we can get a clean compile; otherwise we'll be chasing false errors as we make changes.  It's about now that I discovered that my system doesn't have a copy of winres.h.  This isn't such a big deal: a few web searches later I located my Microsoft SDK include directory (for me it's in Program Files) and added a new file named "winres.h" with these contents:

   #include <winresrc.h>
   #ifdef IDC_STATIC
   #undef IDC_STATIC
   #endif
   #define IDC_STATIC (-1)

That was easy; now I'm getting clean compiles without too much trouble.

3. Making some modifications.
Basically what I want to do is add syntax highlighting for Cygwin/Bash.  I want comments gray, strings green, executed commands orange, numbers red, keywords blue, variables light blue, and external commands (from Cygwin) with a gray background.  Kind of like this:

First we need to add support for our Bash files.  That Catalogue.cxx file gives us a pretty good starting point - we want to uncomment this line ("//LINK_LEXER(lmBash);"), and while we are at it we want to add lmBash to lexlink.js so re-running it won't mess it up again.  Then we add the LexBash.cxx file to Scintilla\Lexers in our project.

Now we need to add in support for our new type... searching around we run across the Styles.cxx file; this is where all of our keywords and format defaults are created.  We need to do a few things here:
- Increase the NUMLEXERS variable in the header to support one more type.
- Add in our new KEYWORDLIST.
- Add in our new EDITLEXER.
- Add our new lexer to pLexArray.

I created a script in BASH to generate a list of shell built-ins and common commands.  It was easy enough to do these steps, but I quickly discovered that the Bash lexer doesn't support External Commands... only Keywords.  I added them in as a new type anyway, which involved adding a new ENUM value (SCE_SH_EXTERNAL) in SciLexer.h and tossing it in as an additional case in LexBash.cxx.  We'll get back to this in a minute.

A KEYWORDLIST is an array of strings.  Our Bash Lexer reads these into wordlists for internal use.
    WordList &keywords = *keywordlists[0];
    WordList &keywords2 = *keywordlists[2]; // we added this one.

Pretty straightforward: we add the keywords to the appropriate list.  The list is handled in LexBash.cxx.  We are using the first string for BASH keywords (there are a few special cases in the lexer, so those don't need to be added).

// "if elif fi while until else then do done esac eval for case select "
"alias bg break builtin cd command compgen complete compopt continue declare dirs disown echo enable exec exit export "
"false fg getopts hash help history jobs kill let local logout mapfile popd printf pushd pwd read readarray readonly "
"return set shift shopt source suspend test time times trap true type typeset ulimit umask unalias unset wait fc", "", 

"awk banner clear df diff dirname du egrep env expr fmt fold free ftp g++ gcc grep groups gzip head hostname identify "
"import integer install ipcs join ln login look ls make man man2html mkdir mkgroup more mount mv nice od perl print "
"ps rm rmdir script sed setenv sh since size sleep sort strings stty su tac tail tar telnet tidy top touch tput tr "
"umount uname uniq unix2dos unzip uptime users vmstat watch wc whereis which who whoami xargs yacc yes zip basename "
"bash bc c++ cal cat chgrp chmod chown chroot cksum cpp crontab cut date factor file find flip flock",
"", "", "", "", "", "" };

Next we add in our EDITLEXER.  The first field matches the ENUM token SCLEX_BASH; the second is a string ID from Notepad2.rc (we added: 63022   "Bash Script"); the fourth field lists the file extensions that will default to this type; and then we finally get to our styles.

Each style line gives us the ENUM case (from LexBash.cxx) that we are applying the given style to, a string ID that describes the rule (press Ctrl-F12 to change rules for Bash and this is the identifier that will show up), and our coloring rules in the fourth field.  If you want to apply the same rules to multiple ENUM types you can combine up to four of them using MULTI_STYLE.  The last line is an empty rule.

EDITLEXER lexSH = { SCLEX_BASH, 63022, L"Bash Script", L"sh; bash", L"", &KeyWords_SH, {
{ STYLE_DEFAULT, 63126, L"Default", L"", L"" },

{ SCE_SH_COMMENTLINE, 63127, L"Comment", L"fore:#808080", L""},
{ SCE_SH_WORD, 63128, L"Keyword", L"bold; fore:#0000C0", L"" },
{ SCE_SH_EXTERNAL, 63236, L"External", L"bold; fore:#4040C0; back:#C0C0C0", L"" },
{ SCE_SH_NUMBER, 63130, L"Number", L"fore:#FF0000", L"" },

{ MULTI_STYLE(SCE_SH_STRING,SCE_SH_CHARACTER,0,0), 63131, L"String", L"fore:#008000", L"" },
{ SCE_SH_OPERATOR, 63132, L"Operator", L"fore:#0000C0", L"" },
{ SCE_SH_BACKTICKS, 63229, L"Backtick", L"fore:#FF8000", L"" },
{ SCE_SH_PARAM, 63249, L"Variable", L"fore:#0080C0", L"" },
{ -1, 00000, L"", L"", L"" } } };

OK, so now we need to make some changes to LexBash.cxx to recognize External Commands.
In ColouriseBashDoc() we tell it to save off a copy of our External Keywords:
   WordList &keywords2 = *keywordlists[2];

And we added our new SCE_SH_EXTERNAL as an additional case in the same place we handled SCE_SH_WORD.
   case SCE_SH_WORD:

At this point we can compile and test... all we need to do now is separate the behavior for EXTERNAL and WORD.  We use "sc.ChangeState(_ENUM_VALUE_)" to apply the rule, so we need to handle the two types.  Scrolling down to the very bottom of this case we make a minor modification - where we were going to apply the internal identifier style, first check to see if this is an external command; if so, change to SCE_SH_EXTERNAL, otherwise do what we were going to do originally.

   else if (cmdState != BASH_CMD_START || !(keywords.InList(s) && keywordEnds)) {
      if (keywords2.InList(s) && keywordEnds) {
         sc.ChangeState(SCE_SH_EXTERNAL);    // one of our external commands
      } else {
         sc.ChangeState(SCE_SH_IDENTIFIER);  // the original behavior
      }
   }

Compile, Run, Test... everything looked good.  This was surprisingly easy... I got the whole thing done in about two hours.  I think I've documented all of the stumbling blocks, and I feel comfortable modifying the code now.  I might try to tackle some other things in the code, we'll see.  I decided to throw this together as a tutorial in case anyone wanted to add support for their own favorite languages.  Enjoy.

Thanks to Florian Balmer for his excellent software, and for making it available to the world.

Saturday, September 28, 2013

Complete CMQA Access Dump

I've been doing a lot of work on Access Database Migration lately and a question came up about how we could provide CM/QA level auditing of the current database.  That is what led me to the following bit of code, attached to a little button in our new database.

Private Sub CMQA_Audit_Button_Click()
   On Error GoTo Err_DocDatabase
   Dim dbs As DAO.Database
   Dim cnt As DAO.Container
   Dim doc As DAO.Document

   Set dbs = CurrentDB()
   Dim OutDir As String
   OutDir = CurrentProject.Path & "\" & Format(Now(), "ddmmmyy-hhmmss")
   MkDir OutDir

   Dim Tbl As TableDef
   For Each Tbl In dbs.TableDefs
      If Tbl.Attributes = 0 Then ' Ignore System Tables
         Application.ExportXML acExportTable, Tbl.Name, , OutDir & "\tbl_" & Tbl.Name & ".xsd"
      End If
   Next Tbl

   Set cnt = dbs.Containers("Forms")
   For Each doc In cnt.Documents
      Application.SaveAsText acForm, doc.Name, OutDir & "\form_" & doc.Name & ".txt"
   Next doc

   Set cnt = dbs.Containers("Reports")
   For Each doc In cnt.Documents
      Application.SaveAsText acReport, doc.Name, OutDir & "\rep_" & doc.Name & ".txt"
   Next doc

   Set cnt = dbs.Containers("Scripts")
   For Each doc In cnt.Documents
      Application.SaveAsText acMacro, doc.Name, OutDir & "\scr_" & doc.Name & ".txt"
   Next doc

   Set cnt = dbs.Containers("Modules")
   For Each doc In cnt.Documents
      Application.SaveAsText acModule, doc.Name, OutDir & "\mod_" & doc.Name & ".txt"
   Next doc

   Dim Qry As QueryDef
   For Each Qry In dbs.QueryDefs
      If Not (Qry.Name Like "~sq_*") Then
         Application.SaveAsText acQuery, Qry.Name, OutDir & "\qry_" & Qry.Name & ".txt"
      End If
   Next Qry

   Set doc = Nothing
   Set cnt = Nothing
   Set dbs = Nothing

Exit_DocDatabase:
   Exit Sub

Err_DocDatabase:
   Select Case Err
      Case Else
         MsgBox Err.Description
         Resume Exit_DocDatabase
   End Select
End Sub         

I'm still not 100% happy with the output of the raw SQL code, but it does allow us to run Beyond Compare on the output, and it gives us all of the components (of our rather complex database) including VB code, SQL and form changes.

Thursday, June 27, 2013

Automated Storytelling

One of the things that always fascinated me about AI was Automated Storytelling, or Emergent Narrative.  This is the idea that a computer can tell a compelling story.  There are a couple of approaches to this:

1. Fill in the blanks on a template-story; Think of this option as the Mad-Libs approach.  A lot of webpages use a simplified version of this for automatic plot generators, etc... Very hit-or-miss and the computer has no real understanding of the story it is creating.

2. Using grammar-type rules construct a legitimate story the same way you would construct a sentence; pick a hero, pick a compatible mission, pick a compatible obstacle, etc...

3. Create a world and characters to inhabit the world, set things loose and look for an interesting story to develop.  Think of this option as a Soap-Opera, or a Reality TV Show.  Things are very open-ended.

4. Create a world and characters, pick compelling plot-arcs and then force the characters into situations to fulfill the plot requirements.  This option can use sub-plots and back-seeding (changing previous portions of the story to support changes in the plot).  An AI construct known as fate can be used to keep the story on track, while another AI construct works on making the story dramatic, or funny, or whatever the theme of the story is.  I like to think of this option as a card game between two players (at least that's the way I'd implement it).

I've thought a lot about this topic over the years and I think I am ready to start trying a few things out.  I have started by looking over my notes and simplifying the ideas; one thing I've learned from working on Decision Aid Systems, Expert Systems, Natural Language Recognition, and Automatic Report Generation is that complexity rarely leads to a better result.

So, I am consolidating my 46 traits in seven spheres:
MIND: crazy romantic sensitive clever creative funny logical critical
ACTS: lazy trusting careful perceptive secretive suspicious controlling sleazy
BODY: weak klutz lazy strong tense abusive
TALK: quiet follower gossip charismatic
WORK: reckless selfish ambitious honest careful sacrificing
LOVE: unselfish-love friends-love logical-love game-love possessive-love romantic-love
LOOK: overweight dowdy plain athletic cute sexy glamorous exotic
into this:
MIND: ambitious, honest, romantic, player, possessive, planning
BODY: lazy, sexy, sleazy, reckless, abusive, addict
SOUL: crazy, leader, follower, gossip

I think that this offers sufficient variety; it also means Actors will have a limit of (up to) three traits each.
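
To make that concrete, here is a minimal sketch of how an Actor with up to three traits might be represented (the names and the trait-selection rule are illustrative assumptions; I haven't settled on an implementation):

```python
# A minimal sketch of the consolidated trait model described above.
import random

SPHERES = {
    "MIND": ["ambitious", "honest", "romantic", "player", "possessive", "planning"],
    "BODY": ["lazy", "sexy", "sleazy", "reckless", "abusive", "addict"],
    "SOUL": ["crazy", "leader", "follower", "gossip"],
}

def make_actor(name, sex, job, rng=random):
    # Up to three traits total: at most one drawn from each sphere,
    # and each sphere contributes a trait with 75% probability.
    traits = [rng.choice(pool) for pool in SPHERES.values() if rng.random() < 0.75]
    return {"name": name, "sex": sex, "job": job, "traits": traits[:3]}

print(make_actor("________", "female", "Engineer", rng=random.Random(1)))
```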

I'm narrowing in on a theme too: I'm going to go with a deserted island (think Lost or Gilligan's Island) since it provides an easy sand-box (limited number of locations, and we don't have to worry about outside actors if we don't want to).

There are a number of other simplifications I'm making as well.  My hope is that this will simplify the rules engine that guides behavior.

PS - Here is an old-style set of Actors; there are some Stats hidden behind them as well:
Name: ____________   Sex: female
DOB: 21 JUN    AGE: YOUNG   JOB: judge (second-job)   Income: comfortable
     lucky sensitive perceptive tense
     healthy     dowdy        LOVE-STYLE: logical-love

Name: ____________   Sex: female
DOB: 21 JAN    AGE: OLD   JOB: accountant (cold-as-ice)   Income: average
      logical secretive strong quiet
     healthy     cute        LOVE-STYLE: logical-love

Name: ____________   Sex: male
DOB: 21 AUG    AGE: YOUNG   JOB: bar-tender (ugly-betty)   Income: comfortable
      sensitive careful  quiet
     healthy     plain        LOVE-STYLE: friends-love

And, here is the new version:
Name: _____________  Sex: female   Job: Engineer
Planning, Gossip

I'll post when I get a little further.

Wednesday, June 12, 2013

The Art of the Meeting

In any business there are a number of soft skills that are incredibly useful; a lot of people have talked about communication and technical writing, but I wanted to talk for a few minutes about another area that is often overlooked: Meetings.

A long time ago (when I was at Lockheed) everyone in our group was required to participate in Facilitator training (not just those running meetings); the belief was that if everyone can facilitate meetings, they will run smoother.  My experience with meetings was very positive when everyone had the training.

The idea was that the facilitator should do everything they could to prepare the meeting to run smoothly:
- scrub the list of required vs. optional attendees. (don't waste people's time)
- ensure all required attendees are available for the meeting. (avoid rescheduling)
- send out meeting notices well in advance (1 week+).
- include the agenda prior to the meeting (e.g. with the meeting notice). (keep the meeting focused and on-track)

Everyone in the meeting was expected to work on keeping the meeting on-track:
- fill in for any role as needed (e.g. note taker, or facilitator if you have the expertise and the facilitator isn't available at the start of the meeting)
- keep good, thorough notes for the meeting; the note taker should share the minutes with all attendees after the meeting (by the next business day).
- during the meeting keep the discussion on the agenda items; other issues should immediately be relegated to off-line discussion.
- once an item has been decided by group consensus be prepared to move on.

- Be on time for the meeting.
- Action Items should be re-capped at the end of the meeting to ensure they are accurate.
- Action Items should be descriptive enough such that someone not at the meeting will immediately understand what needs to be done.
- Minutes should be descriptive enough such that someone not at the meeting will understand what was discussed.
- Keep the meeting on-track, distractions such as side-bars or phone calls should not take place (or at the very least be moved outside of the room).

- ensure all resources are available and you know how to use them prior to the meeting (projectors, podium, etc...)
- don't schedule meetings if an alternative is available and appropriate. (don't waste people's time)
   Phone Call (no documentation, quick)
   In-Person Talk (no documentation, quick)
   Email Chain (provides documentation)
   Scheduled Meeting (slowest option, meeting invite and distributed minutes/action items provide documentation)

I don't mean to champion etiquette, but I've been at companies where some of these items weren't followed and it makes it very difficult to be productive.  Imagine discussing an Action Item from a month ago that says "Have Jake talk to the EP Group." or having a meeting scheduled and everyone preparing for it and then in the first five minutes of the meeting being told "we aren't going to be discussing that".

I've had it both ways; Facilitator training gets my vote.

Monday, June 10, 2013

A look at the Destroyers

Just a real quick post to a few links... many of you know that I have been working on Destroyers for a little over a decade now; first with Aegis and now with the MCS components.  Here are a couple of links showing off these ships:


They focus more on Aegis (acquiring targets and shooting missiles) and the big guns than MCS (electric plant, propulsion, damage control, fuel control, etc...), but I'm still proud of the work I do.

Monday, May 27, 2013


I ran into a bit of a tricky problem the other day and I wanted to share it with you.  I am using a program called ATRT (Automated Test and Re-Test) to test our software remotely, and we ran into a problem where two OCR-extracted (optical character recognition) time stamps need to be compared to find a time delta.  Unfortunately, in this version of the software we are unable to convert the text time stamp back into a time object, and the software does not currently have a full math library (ex. a modulus function) or token splitting capabilities to determine the delta in-house.  Since it will be at least a few weeks before we could get these features we decided to find a work-a-round; and, since ATRT can spawn child processes with custom arguments and read in the results, we figured a simple Dos Shell script was the way to go (note - powershell, unix tools, etc... will not be available on the target systems).

Simple idea; of course it's been a few years since I've been forced to use the Windows Command Prompt, and the idiosyncrasies can definitely lead to some frustrations.  It took about a minute to double-check some syntax and write up the script, and then came the testing... first let's look at some code:

   REM > time_delta.bat <start> <stop>
   REM > time_delta.bat 2:03:04 2:05:06
   REM 122

   set MY_START=%1
   set MY_STOP=%2

   REM grab the hours, minutes and seconds from the start time fields 1 through 3
   REM grab the hours, minutes and seconds from the stop time  fields 4 through 6
   FOR /f "tokens=1-6 delims=:" %%a IN ( "%MY_START%:%MY_STOP%" ) DO (
     set H1=%%a
     set M1=%%b
     set S1=%%c
     set H2=%%d
     set M2=%%e
     set S2=%%f
     echo "%H2% %M2% %S2% - %H1% %M1% %S1%"
   )

I'm going to stop right here... I ran this from the command line about a dozen times with different arguments, and it didn't work.  The idea is really simple: place the timestamp into a string of colon delimited tokens and extract each of the fields, assigning them sequentially to the loop variables %%a through %%f (six fields).  If I gave it the same arguments twice in a row it would be wrong the first time, and right each subsequent time.  If I ran it fresh from the command line the variables would not be set.

Once I understood what was going on it was easy enough to fix.  Moving my echo outside of the loop caused it to behave properly... it took about an hour to finish up a script that I thought would take five minutes, because of this behavior.  Convinced that there was a pointless bug in Dos Shell I went off to consult the inter-webs to find out the logic behind this; what I discovered is a bit disturbing.

Dos Shell has something called "Delayed vs. Immediate Variable Expansion".  The default behavior is for the Shell interpreter to read in all of the lines between "(" and ")" at once and apply any variable expansion prior to executing any of these lines.  So if you are doing an IF/ELSE construct and have a series of assignments they will be expanded prior to evaluating the first expression:

   IF %H1% GTR %H2% (
      REM we have had a clock rollover since we started, add 24 hours
      set /a TEMP=%H2% + 24
      set /a HOUR2=%TEMP% * 3600
   )

In this example the TEMP variable will not be equal to "%H2% + 24", it will either be set to a previous value or not at all.  The disturbing thing is that Microsoft considers this a "feature" and not a "bug"; the documentation gives an example of how you could use a value stored in a variable and reset it to the original value all in one fell swoop... but since the value would be the same for both actions I don't see how this could ever be useful, or even do what they are saying cleanly.  The good news is that there are work-a-rounds... the bad news is that they are ugly.  The simplest method is to enable delayed expansion by setting an environment variable and then use alternate variable syntax (!VARIABLE!) in the instances where you want delayed expansion.  For me, I decided to leave my code as-is, and took a silent vow to avoid using Dos Shell in the future if any other alternative existed.

For completeness the code for the time delta is included below:

   REM > time_delta.bat <start> <stop>
   REM > time_delta.bat 2:03:04 2:05:06
   REM 122

   set MY_START=%1
   set MY_STOP=%2

   REM grab the hours, minutes and seconds from the start time fields 1 through 3
   REM grab the hours, minutes and seconds from the stop time  fields 4 through 6
   FOR /f "tokens=1-6 delims=:" %%a IN ( "%MY_START%:%MY_STOP%" ) DO (
     set H1=%%a
     set M1=%%b
     set S1=%%c
     set H2=%%d
     set M2=%%e
     set S2=%%f
   )

   REM convert the start time into raw seconds
   set /a MIN1=%M1% * 60
   set /a HOUR1=%H1% * 3600
   set /a TOTAL_1=%HOUR1% + %MIN1% + %S1%

   REM convert the stop time into raw seconds
   REM NOTE: the next line needs to be outside of the IF block in order to work due to
   REM       immediate variable expansion - variables are expanded when read, but the
   REM       commands are not executed until the ending brace is reached ")".  There
   REM       are other work-arounds for this, but none of them are clean.
   set /a TEMP=%H2% + 24

   IF %H1% GTR %H2% (
      REM we have had a clock rollover since we started, add 24 hours
      set /a MIN2=%M2% * 60
      set /a HOUR2=%TEMP% * 3600
   ) ELSE (
      REM no clock rollover, so convert directly into seconds
      set /a MIN2=%M2% * 60
      set /a HOUR2=%H2% * 3600
   )

   set /a TOTAL_2=%HOUR2% + %MIN2% + %S2%
   set /a DELTA=%TOTAL_2% - %TOTAL_1%
   echo %DELTA%
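
For comparison, the same delta logic is trivial in a language with a real math library.  Here is a quick Python cross-check of the arithmetic above (a hypothetical helper, not part of ATRT or the target systems):

```python
# Cross-check of the batch time-delta arithmetic: same rollover rule.
def time_delta(start, stop):
    h1, m1, s1 = (int(f) for f in start.split(":"))
    h2, m2, s2 = (int(f) for f in stop.split(":"))
    if h2 < h1:              # clock rollover since we started: add 24 hours
        h2 += 24
    return (h2 * 3600 + m2 * 60 + s2) - (h1 * 3600 + m1 * 60 + s1)

print(time_delta("2:03:04", "2:05:06"))   # 122, matching the batch script
```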

Sunday, May 19, 2013


I was reading an article and it got me thinking about the subject, which is near and dear to my heart.

Over the years I've worked on a lot of projects, some for only a few days others spread out over years.  I've produced a lot of helper programs, implemented huge honking scripts and programs in a wide variety of languages over a wide range of platforms.  I've even managed large chunks of code in legacy systems spanning thousands of files.  I've worked in C, Objective-C, C++, Java, SQL, Bash, Ksh, Shell, Batch, Visual Basic, Tcl/Tk, Clearcase triggers, etc...  I've worked on Windows NT through Windows 7, Several Flavors of Linux including ones with Real-Time, Solaris, HP-UX, Powermax, Powerhawk, Lynx, some older flavors of Macintosh, VXWorks, etc...

I remember a day when a coworker was modifying a script to run on a new platform and asked me for some help.  It took a few minutes to figure out what the script was doing, spot a few places for improvement, figure out the bug she was running into, and work out a solution.  As I was reading through the code I commented that it was a good script and she should consider sharing it with the department, to which she responded "You do know you wrote this script, don't you?"  It occurred to me that if it wasn't already in my coding style it would have been very difficult for me to figure out what the script was doing.  It was truly unfair to have created this huge program and then expect someone else to maintain it after I had lost interest (usually within a day or so).  I made sure that I left the program well commented before handing it back.

The idea that well-written code doesn't need comments is laughable.  This might fly if you are writing a tiny program, but not once you start a project of any real scale - a few hundred thousand lines of code, written in various languages and coding styles, maintained over a few decades by different companies.  Even if you are writing a small program that someone else is going to use, and probably modify a few years down the line when your favorite language is no longer the de facto standard around the office and an intern needs to make a change... it is your duty to make sure that the code is well documented with sprinkled comments.

I consider comments to be bread crumbs that will aid a developer in understanding my code.  I like to strictly follow a coding standard; it doesn't really matter too much which one, but it should be consistent.  If you don't have one, I like to do the following:
   - each file should have a header section that describes what the file is doing at a high level.
   - provide a revision history section; each entry is on one line
      // DSM  3Dec2012 added support for new signal types including COUNTS and 2-Byte floats
      // DSM  7Dec2012 moved signal information to a separate class

   - provide a list of features you would like to add at some point, along with priority for implementation
      // TO-DO LIST
      // HIGH   move initialization data to an external file
      // LOW    move printing functionality to a separate class

   - functions should contain header information that describe what the function is going to do
   - code should frequently comment what you are trying to do (at a high level)
   - if you are not using a tool to do versioning of your code you should provide change tracking in your code (which lines were touched by change number 5?)

If your compile environment supports any kind of documentation you should be using that as well... so for Visual Studio you should be using the XML style documentation that will allow for tool-tips when you mouse over a class or variable.

A lot of people dislike comments because they feel comments can lie.  This is a maintenance issue; when discovered, the comments should be updated appropriately.  This is not a reason to do away with comments entirely.  If anything, it's a call for better coding standards on updating comments.

It doesn't matter how clean you think your code is, some day an intern will attempt to make a change and if you don't give him a few breadcrumbs to follow he will fail.  When he fails it will not be his fault for not understanding your idiosyncratic coding style.  It will be your fault for not living up to the unspoken contract all paid developers have with their employers to write good and maintainable code.

Look at my coding samples for further examples of the minimum level of coding that should be acceptable.  Your code needs to be maintainable.

Tuesday, May 14, 2013

Finding Your Passion

During commencement addresses we often get the advice to “Follow your Passions” in order to be successful.  On the surface this sounds like good, sage advice; but many people have no clue as to what their passions may be, or even how to discover them.

Recently, I heard an interview on NPR where a high school student started down this path in order to figure out what he wanted to do with his life.  He contacted leading academics and ended up doing lunch with several important economists in order to find out “what dreams should I follow, if I have no dreams of my own.”  The interview was very dissatisfying; the economists asked a lot of questions but ended up offering no real advice except to follow the money.  It got me thinking about what kind of advice I would offer if someone approached me.

In life it has been my experience that many people are not introspective.  Everyone is capable of being happy, everyone has experienced joy in their lives; but if you ask people “What do you enjoy doing?” many cannot find an answer.  I wouldn’t say this is bad, but without understanding yourself better you are denying yourself the ability to pursue happiness.  Passion is the same thing; I believe we all have the capability of becoming passionate about something, but discovering that passion can be difficult.

As a starting point to introspection I recommend something along the lines of the Rokeach Values.  The idea is very simple: print out the 18 terminal values (below), cut them into separate cards and place them on a table.  Move them around to identify which ones you consider important, and which ones you consider less important.  Once you have selected your “important” pile, sort the cards in order from most important (to you) to least important.  Then do the same thing with the 18 instrumental values (also below).

This is the first step to finding your passion.  The terminal values represent how you would like life to turn out; when you die what would you like to have accomplished.  The instrumental values represent the path you would like to take to get there; Would you like to be acknowledged by your peers for your imagination? Your Helpfulness?  Your Intellect?  With this information under your belt you can start to look into fields that will help you to fulfill your goals.

My next piece of advice would be to pursue a Bachelor of Arts (which forces you to take a lot of non-core classes) as opposed to a more focused degree, and to start getting involved in lots of different activities.  Most activities will probably fail to hold your interest for more than a week or so, but that is good.  The goal is to expose yourself to lots of different experiences and viewpoints and… stuff.   By doing this you will start to uncover what you enjoy, what you hate, what drives you.  And you will start to uncover who you really are.

Along the way you will, hopefully, discover your passion.  And if you don’t, then you can follow the money.

Monday, May 6, 2013

Design - Color Theory

Color Terminology

I am fascinated by User Interface design.  Over the years I have developed web pages and applications, I’ve done some rudimentary Graphic Design to create some custom backgrounds or logos.  When creating a design I like to think it through for usability, simplicity and elegance; and I have found that it is the little details that make the difference between an award-winning website and something that looks like a high-school student threw it together for an assignment.  As such, I find Color Theory a fundamental skill; and not just for Website Design or Interface Design, but as a transferable skill that can be used for presentations and gaining a deeper understanding of human psychology and physiology at some level.

To start, colors have subtle yet distinct meanings, in the same way that two synonyms have different flavors.  When picking a color you should think about the kind of message you want to send:

Black – strength, professional, definite
Gray – balanced, calm
Pink – romantic, feminine, friendship, affection
Red – exciting, bold, youthful, danger
Orange – cheerful, friendly, confident, adaptability
Yellow – optimism, clarity, warmth, concentration
Green – natural, organic, calming, money
Blue – calming, dependable, clean, sincerity
Violet – loyalty, wisdom
Purple – creative, imaginative, tension, wealth

Or possibly your target audience:
Men’s Favorite Colors – Blue, Green.
Women’s Favorite Colors – Blue, Purple, Green, Red.

Colors are generally viewed two different ways: Additive – created by the mixing of light (adding red, green and blue together gives us white light), and Subtractive – created by the mixing of paints and objects (mixing cyan, magenta and yellow together gives us black, subtracting each color gets us back to white).  Additive colors (RGB) are used to describe colors on a computer monitor or website.  Subtractive colors are used to describe real world physical objects such as a poster.

If we were to set up a color-wheel with Red at the top (let’s call it the 12-O’Clock or 0° position) and travel through our rainbow through Green, then Blue and back again to Red, we would see that Red, Green and Blue are equidistant from each other (0°, 120° and 240°).  These are often known as our Primary Colors and can be combined on your monitor to make white; in all honesty any three equidistant colors would work here, but we like these because they line up more naturally with the way our eyes perceive colors.  Cyan, Magenta and Yellow are often referred to as Secondary colors; they can be combined to give us black ink and were chosen by the printing industry due to dye availability and cost.  Other colors are often referred to as Tertiary colors.

Warmer colors (the top half of our wheel) feel as if they are moving toward you, while Cooler colors (the bottom half) feel as if they are receding.  Warmer colors work well for foreground objects while Cooler colors work better for the background.  As an interesting side note, there are certain types of 3D glasses that can amplify this effect and work right-out-of-the-box with many cartoons penned by traditional artists, since those artists use rudimentary Color Theory when drawing.


Computers understand RGB (how much Red, Green and Blue to use to make our color); the computer usually stores these as hex values (a range from 0% to 100% becomes 0x00 through 0xFF (0123456789ABCDEF)).  Two hex digits are used for each color (0xRRGGBB).  Color Theory works off of Hue, Saturation and Value though.  Let’s look at an example - the 216 Web-Safe colors come from creating all of the different combinations of the (0%, 20%, 40%, 60%, 80%, 100%) values:

    Decimal   Hex      Percent
    0         0x00     0%
    51        0x33     20%
    102       0x66     40%
    153       0x99     60%
    204       0xCC     80%
    255       0xFF     100%

As you can imagine, it is difficult to determine which color is 30° away from Red (0xFF0000 or #FF0000).  In order to determine which colors to look at we need to convert to HSV (Hue, Saturation and Value). 

Hue is defined as the perceptible color corresponding to a specific wavelength of light (eg. Bright Red Violet “#CC00FF” – Violet “#9933FF” – Blue Violet “#6633FF”).  This is normally represented as a value between 0° and 360°, we will treat Red as 0°, Green as 120° and Blue as 240°.

Value is either the Lightness or Darkness of a color; depending on the conversion formula this is referred to as Tint (combined with white), Tone (combined with white/black) or Shade (combined with Black).  The exact formula you use for conversions will cause some minor variations in much the same way that choosing a Black vs. a White Movie Theater screen will cause variations (eg. Red “#FF0000” – Carnation Pink “#FF99CC” – Empire Ruby Red “#CC3333” – Deep Burnt Sienna “#660000”).

Saturation (Chroma) is the relative intensity of a chromatic color, or how far away from Gray the color is (eg. Pure Cyan “#00FFFF” – Caribbean Blue “#33CCCC” – Mediterranean Blue “#339999” – Deep Blue-Green “#336666” – Charcoal Gray “#333333”).  A color is considered pure Gray-Scale if Red==Green==Blue.

So, to translate we are going to switch to some pseudo-code:
      Min = min(R, G, B)
      Max = max(R, G, B)
      Value = Max         // this is our brightness
      Delta = Max - Min
      If (Max != 0)
         Saturation = Delta / Max
         If (R == Max)      Hue = (G - B)/Delta
         Else If (G == Max) Hue = 2 + (B - R)/Delta
         Else               Hue = 4 + (R - G)/Delta

         Hue *= 60
         If (Hue < 0) Hue += 360
      Else                  // pure Gray-Scale: no dominant color
         Saturation = 0
         Hue = -1           // Hue is undefined here

Basically we are using Brightness for our Value (how close are we to white).  We could also add our R + G + B together here and get slightly different results.  It can be fun tweaking these formulas to see how the results change.  Hue gives us a value between 0° and 360°; if Red is our predominant color we initially get a value between -60° and 60° before normalizing.  Saturation is a measure of how close the color is to Gray-Scale, or how closely the three colors match in intensity.
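The pseudo-code above translates into a working shell function; here is a sketch using only Bash integer math (so Saturation and Value come out as whole percents rather than fractions), with the function name rgb2hsv being my own choice:

```shell
#!/bin/bash
# Convert an "#RRGGBB" color to approximate HSV using integer math only.
# Hue is in degrees (-1 for pure gray), Sat/Val are whole percents.
rgb2hsv() {
   local HEX=${1#\#}                      # strip a leading "#" if present
   local R=$((16#${HEX:0:2}))
   local G=$((16#${HEX:2:2}))
   local B=$((16#${HEX:4:2}))
   local MAX=$R MIN=$R
   (( G > MAX )) && MAX=$G ; (( B > MAX )) && MAX=$B
   (( G < MIN )) && MIN=$G ; (( B < MIN )) && MIN=$B
   local DELTA=$(( MAX - MIN ))
   local VAL=$(( MAX * 100 / 255 ))       # Value is our brightness
   local SAT=0 HUE=-1
   if (( MAX != 0 )); then
      SAT=$(( DELTA * 100 / MAX ))
   fi
   if (( DELTA != 0 )); then              # a dominant color exists
      if   (( R == MAX )); then HUE=$((       ( (G - B) * 60 ) / DELTA ))
      elif (( G == MAX )); then HUE=$(( 120 + ( (B - R) * 60 ) / DELTA ))
      else                      HUE=$(( 240 + ( (R - G) * 60 ) / DELTA ))
      fi
      (( HUE < 0 )) && HUE=$(( HUE + 360 ))
   fi
   echo "Hue=${HUE} Sat=${SAT}% Val=${VAL}%"
}

rgb2hsv "#FF0000"   # pure Red sits at 0 degrees
rgb2hsv "#9933FF"   # the Violet from our Hue example lands at 270 degrees
```

The integer division loses a little precision compared to a floating-point conversion, but it is plenty accurate for picking neighbors on the wheel.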

Now that we have our Hue values we can start talking about color combinations:
Picking three colors that are side-by-side on the Color Wheel (eg. Yellow, Spring and Green at 0°, +30°, -30°) provides a harmonious balance, these are called Analogous colors.  Analogous colors are relaxing and grounding.  Neutral combinations are a little more closely related so more subtle, these are 0°, +15° and -15°.

Complementary colors on the other hand represent a contrast; they provide energy with some discord, and are located directly across from each other on the wheel (0°, +180°).  Split-Complementary colors are a little easier to work with (0°, +150°, -150°) and open up a third color choice.

If you were feeling more confident you could try a Triadic combination (0°, +120°, -120°) which provides a lot of tension, a Tetradic combination (0°, +90°, +180°, -90°), or even a 4-Tone (0°, +60°, +180°, +240°).
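Since all of these combinations are just fixed rotations around the wheel, they are easy to compute.  A quick sketch (the helper name harmony is my own invention):

```shell
#!/bin/bash
# Print a few color-harmony hue sets for a base hue given in degrees.
# Negative offsets are expressed as their positive equivalents mod 360.
harmony() {
   local H=$1
   echo "Analogous: $H $(( (H + 30) % 360 )) $(( (H + 330) % 360 ))"
   echo "Complementary: $H $(( (H + 180) % 360 ))"
   echo "Split-Complementary: $H $(( (H + 150) % 360 )) $(( (H + 210) % 360 ))"
   echo "Triadic: $H $(( (H + 120) % 360 )) $(( (H + 240) % 360 ))"
}

harmony 0   # Red as the base color
```

Feed it the Hue you got back from an RGB-to-HSV conversion and it will hand you the candidate hues for your palette.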

Tonal Color Harmony can be obtained by adjusting the base colors Tint, Tone or Shade.  Chromatic Color Harmony is based on adjusting the Hue and Brightness while keeping the Saturation the same.

Monochromatic Color Harmony can be achieved by playing with the Hue, Tint, Tone, and Shade individually.


So, what does all of this actually mean?  If you want to design something (presentation, webpage, software, painting, etc…) you should start by picking a base color that enhances your message.  After you have picked that color you should choose a few more colors to round out your palette.  It is best to limit yourself to four or fewer colors; I’d recommend Analogous or Split-Complementary colors unless you are feeling extra bold.

With this information under your belt you should be ready to create an attractive looking… whatever.

And now for some links to sites that probably describe color theory far better than I just did:

Thursday, April 25, 2013

Bash/Korn Shell Diary - Part 4

The beauty of Scripts is that you can craft data in a persistent way.  From the command line, building a complicated loop or performing large numbers of sequential tasks can be tricky.  The Command Line is for testing; a Script file is for building.  And in order to build properly you need to understand your building-blocks, which brings us to flow-control.

Normal program flow in a script is iterative, we execute one line at a time starting from the top and working our way down until we reach the end.  Bash provides us with several ways to alter the command execution.

If-Then-Else blocks are the most fundamental type of flow control.  They allow you to branch execution based on testable conditions.  Bash also gives us the “elif” (Else-If) operator so that we don’t have to use nested If/Else blocks unnecessarily.  Since I’ve been forced to work in some professional languages that don’t support this simple feature I’d like to say “Thank you Bash”.

   if [ test_condition ]; then
      commands
   elif [ test_condition ]; then
      commands
   else
      commands
   fi

Test conditions come in a couple of flavors:
-    File Checks, these allow you to do things like check if a file exists (“-f MyFile.txt”), see if the file is writable (“-w MyFile.txt”), see if an inode (file) is a symbolic link (“-h MyFile.txt”), or see if a file is newer than another file (“myFile.txt -nt myFile.bak”).
-    Simple string comparisons: (“$MYVAR” = “TESTING”).
-    Simple math comparisons: (“$MYVAR” -gt 12) with the operators -lt, -le, -eq, -ne, -ge and -gt; the symbolic form (“$MYVAR” > 12) only works inside double brackets.
-    If you use double brackets you can even use and (&&), and or (||) operators.  Double brackets also expand octal and hex values when it runs across them in the standard formats (08, 0xf3).
-    When using multiple compound comparisons you can short-circuit evaluation: (“... || …” the second test will only be executed if the first test fails) and (“… && …” the second test will only be executed if the first test succeeds).  This gets complicated when short-circuiting more than two tests so make sure you test thoroughly.
-    “-a” and “-o” can also be used for “and” and “or”.
-    The colon operator is actually a valid command in Shell, and will basically do nothing in this case.  Bash expects at least one valid command to be executed in an If statement.
-     Executed command exit values (`grep "xyz" myfiles.* 2>/dev/null`).  This gives you the same benefit as checking against the exit value stored in the variable ($? -eq 0), but allows you to do it in one place if that kind of thing appeals to you.
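Here is a small sketch pulling a few of these test flavors together; the scratch file is created on the spot just so the file checks have something real to look at:

```shell
#!/bin/bash
# Demonstrate file checks, compound math tests and short-circuiting.
SCRATCH=$(mktemp)                  # make a real file for the -f/-w checks
if [ -f "$SCRATCH" ] && [ -w "$SCRATCH" ]; then
   echo "scratch file exists and is writable"
fi

COUNT=5
if [[ $COUNT -gt 3 && $COUNT -lt 10 ]]; then   # double brackets allow &&
   echo "COUNT is in range"
elif [ "$COUNT" = "0" ]; then
   echo "COUNT is zero"
fi

rm -f "$SCRATCH"
```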

Case Statements are a more convenient form of If-Else checks.  Rather than do a long series of elif's you can do simple string matches in one place:

   case $var in
      a* ) ... ;;
      [0-9] | [0-9][0-9] ) ... ;;
      * ) ;;
   esac

A few things to note:
- "case" is ended with "esac" which is "case" spelled backwards.  We saw this before with "if" and "fi".  I have nothing more to say about this.
- We can use simple regular expressions in our string matches: "a*" means a string that starts with an "a", so it would match "alf" as well as "a".  "[0-9]" is a single digit, etc...
- You can put multiple cases on a single line by using the pipe operator ("|") as an or statement.
- Our final case here was a single "*" which will match any line.
- Each case is ended with a double semicolon (";;").  There is no fall-through.
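A small sketch of those match rules in action (the classify function is just for demonstration):

```shell
#!/bin/bash
# Classify a few strings using case patterns.
classify() {
   case $1 in
      a* )                 echo "starts with a" ;;
      [0-9] | [0-9][0-9] ) echo "one or two digits" ;;
      * )                  echo "something else" ;;
   esac
}

classify "alf"    # matches a*
classify "42"     # matches the two-digit pattern
classify "hello"  # falls through to the catch-all
```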

Bash provides a couple flavors of Pre-Check loops (loops that are checked prior to running through the enclosed commands for the first time):

   for VAR in ... ; do
      ...
   done

   while [ ... ]; do
      ...
   done

   until [ ... ]; do
      ...
   done

Our For loop allows us to cycle a variable through many different states.  This is useful if you know all of the states to be evaluated prior to entering the loop.  This is a convenient way to rifle through all of the *.csv files in a directory for instance, or perform the same logic on a pre-set series of strings (tokens to be found in files for instance).

While will rifle through commands as long as a condition is true, and Until does the same thing while the condition is false (not really necessary to have both of these operators since you have the negation test... but whatever).
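A quick sketch showing a For loop walking a pre-set series of tokens, and an Until loop counting a condition down:

```shell
#!/bin/bash
# for: the states are known up front; until: condition-driven.
for WORD in alpha beta gamma ; do
   echo "token: $WORD"
done

CNT=3
until [ $CNT -eq 0 ]; do
   echo "countdown: $CNT"
   let "CNT -= 1"
done
```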

In a previous lesson I showed you how you could redirect standard output and error out of your scripts, but there is another trick you can use to redirect standard input into a command by using the "<" operator.  By doing the following we can rip through a file one line at a time:
   while read LINE ; do
      echo "$LINE"
   done < input_file.txt

Our $LINE variable now contains the full contents of each line of the file, one at a time.
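A self-contained sketch of that pattern; the input file here is generated on the spot and cleaned up afterwards:

```shell
#!/bin/bash
# Read a file line-by-line without forking any external commands.
printf 'alpha\nbeta\ngamma\n' > input_file.txt

N=0
while read LINE ; do
   let "N += 1"
   echo "line $N: $LINE"
done < input_file.txt

rm -f input_file.txt
```

Because the redirection feeds the loop directly (no pipe), the loop runs in the current shell and $N keeps its value after the loop ends.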

There is another flow-control operator that is intended for user interaction:
   PS3="Please make a selection: "
   select word in quit *.txt ; do
      echo "word is $word"
      echo "REPLY is $REPLY"
      if [ "$word" = "quit" ]; then
         break
      fi
      echo ""
   done

looks something like this:

   Please make a selection:
   1) quit              4) combinations.txt
   2) alchemy.txt       5) myout.txt
   3) alchemy2.txt      6) output.txt
   #? 1
   word is quit
   REPLY is 1

This structure deserves a few comments:
- Here I used the "$PS3" variable.  PS3 is a special variable that provides a prompt specifically for select queries.
- If you don't provide any choices explicitly the options given will come from the $@ variable (all arguments, quoted arguments can contain spaces).
- The text of the selected choice is stored in the variable you provided ("word" in this case).
- The number of the selection (that the user typed in) is stored in "$REPLY".
- You can also read user input one line at a time with "read MYVAR".  There are some more advanced tricks for reading a single character we'll try to get to later.
- We used the "break" keyword to jump out of one level of flow-control looping.  We could use "break 2" to jump out of two levels of nested loops if desired.

Bash also allows the user to use a more traditional For loop:
   LIMIT=5
   for (( a = 1; a <= LIMIT ; a++ )); do
      echo "pass $a"
   done

Here we are using roundy-braces.  We talked about these earlier when we were discussing the "let" operator.  They do have their uses.

There is one other mechanism I want to discuss here: scoping.  Although not normally associated with flow-control they do allow one special trick:
   echo "$MYVAR" >> outfile.txt
   echo "$MYVAR2" >> outfile.txt

can be replaced with:
   {
      echo "$MYVAR"
      echo "$MYVAR2"
   } >> outfile.txt

This may seem rather minor, but by handling all of our redirection in one chunk like this we can cut down on forks and file open/close operations, and this adds up over time.  In one test run it took me 6 seconds to do 10000 individual file redirects; with the output scoped it took many millions of such writes to add up to the same six seconds.
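A minimal sketch of the difference; both loops land in the same file, but the scoped version opens it exactly once:

```shell
#!/bin/bash
# Compare per-line redirects against a single scoped redirect.
OUT=$(mktemp)

for I in 1 2 3 ; do
   echo "unscoped line $I" >> "$OUT"   # opens and closes the file every pass
done

{
   for I in 1 2 3 ; do
      echo "scoped line $I"            # one file-open for the whole block
   done
} >> "$OUT"

COUNT=$(wc -l < "$OUT")
echo "total lines written: $COUNT"
rm -f "$OUT"
```

For three lines you will never notice; for hundreds of thousands of lines the open/close overhead of the first form is what shows up in the timings.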

That's enough for now.  We'll pick up again later.

Bash/Korn Shell Diary - Part 3

Shell Script Basics

Although the command line can be very powerful, at some point (either due to complexity or a desire for reuse) you will want to move into scripting.  A script is a series of commands stored in a plain-text file that starts with a shebang line (the special characters “#!”) followed by a path to the executable (and optional arguments) that will act as an interpreter for this script.

Script files may either be sourced (run in the current environment or shell) or executed, which starts them as a child process.  Sourced scripts (“source ./”) run in your current shell; this means that hitting an “exit” condition will exit your current shell.  Further, you run the risk of changing (and possibly corrupting) the current environment as your script runs.  These scripts will also have access to your full environment, including any variables that you have set.

Executed scripts (“./”) will spawn a new shell to run in.  These tasks may be backgrounded if desired (and no user interaction is needed).  But, since this is a new shell it will not share the same environment as your command line shell.  If you want variables to be visible inside the new shell they need to be explicitly exported (“export MYVAR”) prior to running, or you can pass them in as arguments:

   #! bash
   # on all except for the shebang line a pound sign indicates a "to-end-of-line-comment"
   while [ $# -gt 0 ]; do
      let "NUM += 1"
      echo "Argument ${NUM} = ${1}"
      shift
   done

If we have called our script with arguments we can parse those arguments out.

   $ ./ one "two three" four
   Argument 1 = one
   Argument 2 = two three
   Argument 3 = four

The shift operator allows us to rotate out some of our arguments ($2 becomes $1, $3 becomes $2, etc…).  You can skip over multiple arguments by using “shift 2” or “shift 3”.  The “$#” is a special variable that tells us how many arguments are sitting in the queue for our current scope (functions use the same methodology for handling arguments).  You can also access your arguments directly by using the “${2}”, “${3}” syntax – curly braces around our variable names used to remove ambiguity.  You can also access all arguments at once by using “$*”.
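These argument tools can be played with directly; here is a sketch that uses set -- (a trick that comes up again later) to fake a set of arguments, then walks them with shift:

```shell
#!/bin/bash
# Simulate script arguments and rotate through them with shift.
set -- one "two three" four

echo "Total arguments: $#"
echo "First: ${1}"
shift
echo "After shift: $# left, first is now ${1}"
echo "All at once: $*"
```

Note that the quoted "two three" survives as a single argument, which is exactly why $# is a safer guide than counting words yourself.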

You can even go one step further and either check if expected arguments are set, or provide default values for when they are not set:
   MYVAR=${1:-default}   # the colon operator treats an empty (NULL) string as not being set
   MYVAR=${1-default}    # the dash operator "-" says to use the default argument if not set

Although there are other ways to deal with unset arguments (such as checking if a variable is set using the “${MYVAR?}” syntax) either setting default values or forcing the operator to provide the inputs seem to work best. 
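A short sketch of default values in action (the greet function name is just for illustration):

```shell
#!/bin/bash
# Fall back to a default when an argument is missing or empty.
greet() {
   local NAME=${1:-world}   # the ":-" form also catches an empty string
   echo "Hello, ${NAME}!"
}

greet          # no argument: the default kicks in
greet "Bash"   # explicit argument wins
greet ""       # empty string still falls back because of the colon
```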


And speaking of variables, now that we are using a shell script I can tell you that they are a lot of fun to play with.  For instance if you know ahead of time that you will only use a variable in a certain way you can optimize the performance in the shell and make it easier to work with by using declare.

   declare -i MYINT   # MYINT only holds integers, if you shove a string in there it becomes a 0
   declare -u MYSTR   # MYSTR converts all letters to upper-case
   declare -l MYSTR   # MYSTR converts all letters to lower-case
   declare -r MYSTR="YES"   # MYSTR is now read-only and cannot change

Unless otherwise specified all variables can be used interchangeably as Strings or (if they hold an appropriate value) Integers.  To assign a new value to a variable you use the “=” operator: MYVAR="a string of text".  And to access the variable you use the ${MYVAR} syntax.  In most cases the curly-braces are optional, but they help remove ambiguity from the shell and allow us to use some more advanced string manipulation features… so get used to them.  Things on the right hand side of your assignment are evaluated fully before assignment, so the following is legal and will append to your string: MYVAR="${MYVAR} more text here"

Now that we have some data in our string we can start playing with them; Shell provides us with a rich set of string manipulation functions – built-in.  The following code snippet:
   MYSTRING="ab cd,ef gh,ij kl,mn op"
   typeset -u MYSTRING
   echo "Uppercase: $MYSTRING"

   typeset -l MYSTRING
   echo "Lowercase: $MYSTRING"

   echo "MYSTRING contains ${#MYSTRING} characters."

   echo "Let's change the vowels to x's: ${MYSTRING//[aeiou]/x}"
   echo "How about just the first comma to an underscore: ${MYSTRING/,/_}"

   echo ""
   echo "Characters 11-15: ${MYSTRING:10:5}"
   echo "Characters 11-end: ${MYSTRING:10}"

   echo ""
   echo "Hey is that CSV (Comma-Separated-Values) format?"
   echo "Field 1 = ${MYSTRING%%,*}"
   echo "Field 4 = ${MYSTRING##*,}"
   echo "Field 2,3 and 4 = ${MYSTRING#*,}"
   echo "Field 1,2 and 3 = ${MYSTRING%,*}"

   echo ""
   echo "or how about this neat trick:"
   IFS=','          # split on commas rather than whitespace for this trick
   set -- ${MYSTRING}
   echo "Field 1 = $1"
   echo "Field 2 = $2"
   echo "Field 3 = $3"
   echo "Field 4 = $4"

   echo ""
   printf "And how about some pretty formatting?  %7s %-3s \n" $1 $2

produces this output:
   Uppercase: ab cd,ef gh,ij kl,mn op
   Lowercase: ab cd,ef gh,ij kl,mn op
   MYSTRING contains 23 characters.
   Let's change the vowels to x's: xb cd,xf gh,xj kl,mn xp
   How about just the first comma to an underscore: ab cd_ef gh,ij kl,mn op

   Characters 11-15: h,ij
   Characters 11-end: h,ij kl,mn op

   Hey is that CSV (Comma-Separated-Values) format?
   Field 1 = ab cd
   Field 4 = mn op
   Field 2,3 and 4 = ef gh,ij kl,mn op
   Field 1,2 and 3 = ab cd,ef gh,ij kl

   or how about this neat trick:
   Field 1 = ab cd
   Field 2 = ef gh
   Field 3 = ij kl
   Field 4 = mn op

   And how about some pretty formatting?    ab cd ef gh
Let’s see what we can learn from this code:
-    The typeset built-in allows you to do some broad-stroke manipulation of variables, similar to declare.
-    You can count the number of characters in a string with ${#...}, empty variables have 0 characters btw.
-    You can do find and replace with the ${…//find-this/replace-with-this} syntax or even ${…//delete-this/}.
-    You can also find/replace the first matched argument with ${…/find-this/replace-with-this}.
-    Those search terms we just talked about support some level of regular expressions.
-    You can easily remove up to a pattern either from the left (“#”) or the right (“%”).
-    Variables can replace the standard arguments by using the “set -- VARIABLE” syntax.
-    The $IFS variable is a special variable that determines the default field separator for reading arguments.  It defaults to whitespace (space, tab or newline), but can be overridden for special processing.
-    Although not demonstrated here you can also make individual characters upper or lowercase by using the ${…^^uppercase-this-match} and ${…,,lowercase-this-match} forms; these match globally, so use a single caret (“^”) or comma (“,”) to convert only the first match.
- The printf built-in provides the same special formatting used by the C printf family and the Unix printf command: %-3s says I'm expecting a string padded to at least 3 characters, left-justified.  Toss the arguments at the end, outside of the quotes.

By using these special tricks we can replace most of the string manipulation features that would otherwise require us to fork commands.  Forked commands run in their own shell and require a lot of overhead; they are the major reason scripts end up running orders of magnitude slower.  With these tricks we can replace most of the uses for external UNIX commands like “sed”, “awk”, “cut”, “tr”, etc…
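For example, here are a couple of common cut/tr forks next to their fork-free built-in replacements:

```shell
#!/bin/bash
# Parameter expansion standing in for forked commands.
CSV="name,age,city"

FIELD1=${CSV%%,*}        # instead of: echo "$CSV" | cut -d, -f1
echo "$FIELD1"

SPACED=${CSV//,/ }       # instead of: echo "$CSV" | tr ',' ' '
echo "$SPACED"

NOCOMMAS=${CSV//,/}      # instead of: echo "$CSV" | tr -d ','
echo "$NOCOMMAS"
```

Each forked version costs a subshell and a pipe; each built-in version costs essentially nothing, which is why this matters inside loops.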

Strings can also be used in comparison checks, but more on these later:
   if [ "$MYSTRING" = "$THATSTRING" ]; then ...    # are the strings equal?
   if [ ! "$MYSTRING" = "$THATSTRING" ]; then ...  # are they not equal?
   if [ "${MYSTRING}" ]; then ...                  # is this thing even set (non-empty)?

Integer type variables are basically strings (they can be manipulated just like strings) that happen to contain integer values (-100, 0, 123784, etc…)  This special happenstance gives us some new toys to play with:
   let "MYVAR = 12" "MYVAR += 8" "MYVAR += 16"
   echo "$MYVAR"
   let "MYVAR = (MYVAR / 3) % 7"
   echo "$MYVAR"

   let "MYVAR = 8#20"
   echo "$MYVAR"

gives us:
   36
   5
   16
We can even work with some very large numbers, up to 32-bit signed values!  But, we cannot work with floats like this.
-    When using the “let” operator we are telling the shell we are going to work with variables, so we don’t need the “$VAR” syntax.  You could also use the “(( … ))” syntax if you prefer; at one point some timing studies showed me the “let” operator was evaluated by the shell quicker than the round-brace operators by 3ns, and at one point that was important enough to me to make me stick with the “let” operator for awhile.
-    Multiple operations can be done on a single “let” line, each is encased in double quotes.
-    You have a rich variety of operators available to you, basically all of the C++/Java type operators using the standard order of precedence.  You can do bitwise shifting with “>>” and “<<”, you have all the normal math operators plus “%” for the remainder (modulus), and even “**” for exponents.  Use round braces for clarification or to change the normal operator precedence.
-    You can take input in other bases using the “base#value” notation for conversion.

Integers are fun… but they still aren’t floaty-numbers :-(  You can however use special techniques to simulate fixed point floats (currency for example) or convert in-and-out of hex values.  I keep special functions in source-able script files for just this type of problem.  I still wish we had better float handling though.
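As a sketch of that fixed-point trick, here is a currency adder that works purely in cents; it is my own helper and assumes its inputs always carry exactly two decimal places:

```shell
#!/bin/bash
# Add two "dollars.cents" amounts using integer math only.
add_money() {
   local A=$(( ${1%.*} * 100 + 10#${1#*.} ))   # 12.50 -> 1250 cents
   local B=$(( ${2%.*} * 100 + 10#${2#*.} ))   # "10#" keeps "05" from reading as octal
   local SUM=$(( A + B ))
   printf "%d.%02d\n" $(( SUM / 100 )) $(( SUM % 100 ))
}

add_money 12.50 0.75    # 13.25
add_money 1.05 2.95     # 4.00
```

The same shift-the-decimal idea works for any fixed number of decimal places; only true floating point needs an external tool like bc or awk.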

In Shell you also have access to Arrays.  Let’s dive right in:
   declare -a myChars
   CNT=1
   while [ $CNT -le 12 ] ; do
      myInts[$CNT]=$CNT
      let "CNT += 1"
   done

   CNT=1
   for CHAR in {a..e} ; do
      myChars[$CNT]=$CHAR
      let "CNT += 1"
   done

   echo "how's this work?  ${myInts}"
   echo "nope... let's try something else:"

   I=1
   while [ $I -le ${#myInts[*]} ]; do
      echo "myInts[$I] = ${myInts[${I}]}"
      let "I += 1"
   done

   for CHAR in ${myChars[*]} ; do
      echo ">>> $CHAR"
   done

gives us:
   how's this work?
   nope... let's try something else:
   myInts[1] = 1
   myInts[2] = 2
   myInts[3] = 3
   myInts[4] = 4
   myInts[5] = 5
   myInts[6] = 6
   myInts[7] = 7
   myInts[8] = 8
   myInts[9] = 9
   myInts[10] = 10
   myInts[11] = 11
   myInts[12] = 12
   >>> a
   >>> b
   >>> c
   >>> d
   >>> e

This shows us a couple of things:
-    Although arrays can be declared explicitly it isn’t necessary.  Bash notices when you use array syntax and marks the variables as arrays on-the-fly.  I could prove this to you if there was some way to list all arrays dynamically, oh wait, there is: “typeset -a” will list all arrays in the current shell, and by-the-way try the “-i” flag for integers, “-r” for readonly, “-f” for functions when you get the time.
-    Arrays are single-dimension so “MYARRAY[12]” is possible, but “MYARRAY[12,6]” will require a few tricks to implement in bash.
-    Arrays can be staggered, you can feel free to assign to arrays willy-nilly.  Have an array with only a 3rd index filled in if you’d like.
-    To assign to an array use “MYARRAY[5]=value” or “MYARRAY[$INDEX]=value”.
-    To read from an array use “${MYARRAY[5]}” or “${MYARRAY[$INDEX]}”.
-    To find the number of items in an array use “${#MYARRAY[*]}”.
-    To list all members of an array at once use “${MYARRAY[*]}”.
-    And if you do just access “${MYARRAY}” you will only get the first index of that array.
-    Oh, and we almost missed the little curly brace expansion trick we pulled up there "{a..e}", this is a special trick that fills in all of the blanks between the two ends ("a b c d e" in this case).  You can use this trick on the command line to generate tons of test files: ("touch myfile{1..5}.{txt,bat}").  This type of expansion only works in certain circumstances though, so we can't use it to cleanly assign our variables.
-    As a side note, earlier when we used the "set --" command we assigned our string to a special array.

That should be enough for one Blog... we'll pick up again in the next entry :-)

Tuesday, April 23, 2013

Bash/Korn Shell Diary - Part 2

Understanding the Command Line

Bash (I will refer to the entire family of “Ksh/Bash/Sh” as “Bash” from now on to make things simpler for me) provides two environments – an interactive command-line and a scripting engine which can parse commands from a user provided shell script file.

When running in command-line mode, Bash provides the user with a running environment.  Everything that takes place is within a single process with additional (non-built-in) commands running as child processes.  The shell provides the user with special variables that can customize the environment, and like all unix piped commands it also provides three streams (0 – standard input which defaults to the keyboard, 1 – standard output and 2 – standard error which both default to the terminal screen). 

To redirect these streams we can use the following formats:
    Commands  > output.txt    # redirect standard-output to a file
    Commands >> output.txt    # redirect to a file, appending to the end
    Commands  < inputs.txt    # take standard-input from a file
    Commands 2> /dev/null     # redirect standard-error to nothing (dump it)
    Commands  > out.txt 2>&1  # send standard-out and error to a file
    Commands | tee -a out.txt # split the output stream, append to a file
    Commands |& tee out.txt   # split the output and error stream

Bash also allows you to temporarily redirect a sequence of commands by using the following notation:
    {
       Commands
       Commands
    } > output.txt

So: “>” indicates an overwrite, “>>” indicates an append, and “|” indicates a pipe – which passes the standard-output from one command to the standard-input of the next command.  Please note that using a Pipe (“|”) will fork a new process, this takes significantly longer than using shell built-in commands and is the major cause of slow-downs in scripts.
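A tiny sketch to see the overwrite vs. append behavior for yourself (the scratch file is just for demonstration):

```shell
#!/bin/bash
# Overwrite, then overwrite again, then append, then count survivors.
OUT=$(mktemp)

echo "first"  > "$OUT"     # overwrite: file holds one line
echo "second" > "$OUT"     # overwrite again: "first" is gone
echo "third"  >> "$OUT"    # append: "second" and "third" both survive

LINES=$(wc -l < "$OUT")
echo "file holds $LINES line(s)"
rm -f "$OUT"
```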

Each command returns a value.  This value can be tested against either directly in an “if” check or indirectly by looking at the return value.
    $ if `echo "DATE" | grep "DAY" | tr 'A-Z' 'a-z'`; then
    >    echo "Success"
    > else
    >    echo "Failure"
    > fi
    Success

    $ echo "DATE" | grep "DAY" | tr 'A-Z' 'a-z'
    $ echo "$?"
    0

A return value of 0 indicates success, while a different value indicates failure.  This is because there is only one way for a command to succeed, but there can be multiple ways for a command to fail.  Looking at the above example we see that we received a Successful exit status from our piped command, even though we failed to produce any output, this is because piped commands (or functions) return the value of the last command executed.  By looking a little deeper we can see that our second command is the one that actually failed:

    $ echo "DATE" | grep "DAY" | tr 'A-Z' 'a-z'
    $ echo "${PIPESTATUS[*]}"
    0 1 0

In these last couple of examples we’ve seen a few of the environment variables of importance to our shell:
    $? – exit status of the last executed command
    ${PIPESTATUS[*]} – exit status of each command in the last executed pipeline

Some other useful variables for setting up your shell:
    $ PS1=" \[\e[33m\]\w\[\e[0m\]\n\$"
    $ PS2=">   "

The PS1 variable sets up your command line prompt; in this case we are using special codes to change the color to brown “\[\e[33m\]” and back to white “\[\e[0m\]”, an escape sequence to display the path “\w”, and a final newline “\n” followed by the traditional Bash-type prompt “$”.  PS2 is a special prompt that is displayed when you are entering a multi-line subroutine (such as our compound “if” statement above).  To assign new values to our variables we use this format; to display a variable we can either dump all values (“env”) or display them using the special $variable syntax:
    $PWD      - current directory (as determined by the shell)
    $OLDPWD   - previous directory
    $PATH     - the search path for any command that is kicked out to the external shell

Some characters need to be escaped out (“\$”) because those characters can have special meaning in our shell.  When we are inside of double quotes (“”) the shell is allowed to pass over the input and do variable expansion ($MYVAR shows the value held in the variable “MYVAR”).  Other quotes have other special behaviors: Single Quotes (‘ – on the same key as a double quote) do not expand variables, and Backticks (` - on the same key as the tilde ~) will execute the contents.


    #     |--- evaluate/execute between these ticks ----|
    WORD4=`echo "this is $TMP test" | awk '{ print $4 }'`
    #           |-- expand vars --|       |-- don't ---|
    #                 in here               expand here
    echo "$WORD4"   # <-- "test"

I’m going to assume you know the basics of moving around and manipulating your environment: cd, pwd, ls, cp, mv, rm.  As a fun side-note if you ever wanted to write your own shell the “cd” and “pwd” commands are the only things you really have to implement since the shell needs to know where you are, all other commands could be forked to the outer shell (a bare minimum shell can be written in about a page of C++ code and makes for a fun topic when people ask you if you’ve ever done any shell programming).  Fortunately, Bash provides significantly more than that.

From the command line you have the ability to remember previous commands:

    $ cd ~/SCRIPTS/TEST01/RUN01
    $ history                     # <-- list previously executed commands
    1 cd ~/SCRIPTS/TEST01/RUN01
    2 history

    $ !cd                         # <-- repeat the last command starting with "cd"
    $ ^01^02                      # <-- substitute the 1st "02" for "01" in the last command I've run
    cd ~/SCRIPTS/TEST02/RUN01
    $ ^!!:gs/0/3                  # <-- substitute "3" for "0" globally in the previous command
    cd ~/SCRIPTS/TEST32/RUN31
    $ !h                          # <-- repeat the last command starting with this "h"
    1 cd ~/SCRIPTS/TEST01/RUN01
    2 history
    6 history

    $ !!                          # <-- execute the last command again
    1 cd ~/SCRIPTS/TEST01/RUN01
    6 history
    7 history

Bash also gives us some level of Job control: the ability to place jobs into the background (“command &”), examine what jobs are in the background (“jobs”), and manipulate jobs by job ID (“%1”, “%2”…) as opposed to the unique process ID for that job (“$$” gives us our own PID, “$!” the PID of the most recent backgrounded child).  We can bring jobs to the foreground (“fg %1”; only one job can run in the foreground at once, and this will pause your current shell environment until it has completed), and we can kill jobs (“kill %1”).

Bash/Korn Shell Diary - Part 1

The Cygwin Shell

Over the years I have worked with a lot of shells and each of them have their own benefits and drawbacks.  I’ve been able to work with Perl, Sh, Bash, Ksh, CShell, Make, TCL/TK and probably a few others as well.  Currently I am working in a Windows-Only environment and I have settled on Gnu Bash 4.1+ running in Cygwin as my shell of choice.  Although a lot of the tips and tricks I am using here are specific to this shell (or the “Sh” family) there are a few items, such as speeding up your scripts, that will be applicable to any scripting language; even so, if you wish to follow along I would recommend downloading Cygwin.  Cygwin brings a lot of functionality to your Windows environment and is worth learning if you are serious about becoming a Windows Power-User.

Crafting a Re-usable Script

Whenever I create a new shell script I always start by looking at a problem.  Let’s say I have information in multiple files I need to pull together to create a single Excel file; for me this is a very common scenario.   First, I try to think about my inputs and my outputs; where can I pull the information I want?  Do I need to compute or generate any information?  If information needs to be computed I need to work out how to do this…

I copy all of the source files into one directory and then get started on the command line.  I take a few sample cases and practice extracting or generating the desired output.  I test out placing the final output in the correct format, making it pretty or parse-able depending on the end goal.  As each piece is perfected I place the commands into an empty script (a plain-text file with a “.sh” extension and the appropriate first line, “#!/bin/bash”).
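A sketch of what that empty script grows into; the “DATA:” record format and the file names here are invented for the example:

```shell
#!/bin/bash
# A starter skeleton built up one tested command-line chunk at a time.
set -u                                  # catch typos in variable names early

INPUT_DIR="$(mktemp -d)"                # demo inputs; normally these are copied in by hand
printf 'DATA:alpha\nDATA:beta\n' > "$INPUT_DIR/sample.txt"

# each chunk that works on the command line gets pasted in as a function
extract_fields() {
    # pull the second colon-delimited field from every DATA: line
    grep -h '^DATA:' "$INPUT_DIR"/*.txt | cut -d: -f2
}

extract_fields                          # prints "alpha" and "beta", one per line
rm -rf "$INPUT_DIR"
```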

As I move along I test the script to make sure there are no special cases that will jump out and bite me; special cases need to be handled.  If I am dealing with intermediary files, then once they are in the correct format I stop removing and regenerating them in the script (I toss those lines into uncalled function blocks so they aren’t lost), which speeds up testing.
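The uncalled-function trick looks like this: the expensive commands stay in the script, defined inside a function that simply is not invoked while the intermediate files are known-good (the file names are made up):

```shell
#!/bin/bash
# Defined but deliberately never called during testing, so the slow
# regeneration steps are preserved without being re-run every time.
regenerate_intermediates() {
    sort big_input.txt > sorted.tmp        # hypothetical slow step
    uniq -c sorted.tmp > counts.tmp
}
# regenerate_intermediates                 # uncomment to rebuild the .tmp files

echo "continuing with the existing counts.tmp..."
```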

Then I start to generalize things to make the script more re-usable.  I start moving variables to optional command-line arguments.  I make the script able to parse more input files.  I remove operator interaction.  And, most importantly, I try to make the script more user-friendly: if the input files are in a known location I start grabbing the originals so the operator doesn’t have to copy them in all the time.  I encase information that could change from run to run in variables stored at the top of the script, or drive it off of command-line arguments (or an external global-variables script).
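Moving hard-coded values to optional command-line arguments is usually a job for the getopts builtin; the flag names and defaults below are just an example:

```shell
#!/bin/bash
# Run-to-run values live at the top where they are easy to find.
INPUT_DIR="./inputs"
VERBOSE=0

while getopts "d:v" opt; do
    case "$opt" in
        d) INPUT_DIR="$OPTARG" ;;            # -d DIR overrides the default
        v) VERBOSE=1 ;;                      # -v turns on extra chatter
        *) echo "usage: $0 [-d dir] [-v]" >&2; exit 1 ;;
    esac
done
shift $((OPTIND - 1))                        # "$@" now holds only the remaining arguments

if [ "$VERBOSE" -eq 1 ]; then
    echo "reading from: $INPUT_DIR"
fi
```

If no flags are given the defaults apply unchanged, so existing invocations of the script keep working.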

Then I look for ways to speed things up.  I start with the heavy hitters: removing as many forks as I can.  If the script will get a lot of usage, or if we might need quick turn-around, I might start gathering timing statistics on how long chunks take to run, and then work on optimizing the logic and applying more advanced speed-up routines (yes, I have timed how long ‘let “MYVAR += 1”’ takes compared to “MYVAR=$(( $MYVAR + 1 ))”).
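The two increment forms mentioned above can be compared with the time keyword; both are pure builtins (no forks), so the loop count has to be large before the difference shows up:

```shell
#!/bin/bash
N=100000

MYVAR=0
time for ((i = 0; i < N; i++)); do
    let "MYVAR += 1"
done

MYVAR=0
time for ((i = 0; i < N; i++)); do
    MYVAR=$(( MYVAR + 1 ))
done

# The bigger wins come from eliminating forks entirely, e.g. preferring
# the builtin ${#string} over piping through an external command like wc.
```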

And finally, even though I’ve been commenting my script all along, I do a final scrub of the comments and make sure everything is well-documented and makes perfect sense.  Even though I may be the only person to ever look at the innards of my script I want to make sure that years from now when I need to tweak things I will understand it immediately and can make the changes in seconds as opposed to hours.

So, in summary:
  1. Take a few minutes to plan things out.
  2. Practice on the command line extracting and formatting the information you need.
  3. Build up your script one chunk at a time until it is perfect.
  4. Run your script in chunks to ensure it is running correctly.
  5. Make the script more generic/reusable.
  6. Optimize for performance.
  7. Comment everything.