Sunday, November 17, 2013

Adventures in Compiling - Notepad2

I've been using Notepad2 for a while now at work, since it is an approved tool and it's somewhat useful.  I like the single document interface and the syntax highlighting, but it has a few drawbacks; the main one for me is that it doesn't highlight Bash or Cygwin scripts.

Long story short, I decided to download a copy of the source and make a few modifications.

1. Download a copy of all the source code.
The source can be found on the Notepad2 page.  After unpacking it into a project directory, you also have to download a copy of the Scintilla source and place it in the top-level project directory (e.g. Notepad2\scintilla).
Notepad2 Source Code
Scintilla Source Code

2. Choose your development environment.
Pleasant surprise - everything is in C++.  Since I'm using a Windows XP machine and I'm cheap, I opted for the free Microsoft Visual Studio 2010.  Download, install, register.  According to the instructions, a few things need to happen first, though: we need to locate the "lexlink.js" script and run it (double-click).  This modifies the Catalogue.cxx file to remove lexers from the Scintilla build that we don't care about.

Open up the solution, let Visual Studio convert it, and we should be good to go.

Now, before we make any modifications, we need to make sure that we can get a clean compile; otherwise we'll be chasing false errors as we make changes.  It's about now that I discovered that my system doesn't have a copy of winres.h.  This isn't such a big deal: a few web searches later I located my Microsoft SDK Include directory (for me it's in Program Files) and added a new file named "winres.h" with these contents:

   #include <winresrc.h>
   #ifdef IDC_STATIC
   #undef IDC_STATIC
   #endif
   #define IDC_STATIC (-1)


That was easy; now I'm getting clean compiles without too much trouble.

3. Making some modifications.
Basically, what I want to do is add syntax highlighting for Cygwin/Bash.  I want comments gray, strings green, executed commands (backticks) orange, numbers red, keywords blue, variables light blue, and external commands (from Cygwin) with a gray background.


First we need to add support for our Bash files.  That Catalogue.cxx file gives us a pretty good starting point: we want to uncomment this line ("//LINK_LEXER(lmBash);"), and while we are at it we want to add lmBash to lexlink.js so that re-running it won't comment the line out again.  Then we add the LexBash.cxx file to the Scintilla\Lexers folder in our project.

Now we need to add in support for our new type... searching around, we run across the Styles.cxx file.  This is where all of our keywords and format defaults are created.  We need to do a few things here:
- Increase NUMLEXERS in the header by one to make room for the new type.
- Add in our new KEYWORDLIST.
- Add in our new EDITLEXER.
- Add our new lexer to pLexArray.
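
The four steps above can be sketched in miniature.  This is not the real Notepad2 code: EditLexerSketch, NUMLEXERS_SKETCH, pLexArraySketch, findLexer and the numeric IDs are all hypothetical stand-ins I made up for illustration.  It mainly shows why the count and the array have to change together: if the count is bumped but the array entry is forgotten, the extra slot is silently left null.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for Notepad2's EDITLEXER record.
struct EditLexerSketch {
    int lexerId;          // Scintilla lexer number (SCLEX_* in the real code)
    const wchar_t *name;  // display name
};

// Placeholder IDs for illustration; the real values come from SciLexer.h.
constexpr int kLexNull = 1;
constexpr int kLexCpp  = 3;
constexpr int kLexBash = 62;

// Step 1: the count goes up by one (it would have been 2 before adding Bash).
constexpr std::size_t NUMLEXERS_SKETCH = 3;

// Steps 2 and 3 (keyword list and lexer record), reduced to the bare record.
EditLexerSketch lexDefault = { kLexNull, L"Default Text" };
EditLexerSketch lexCPP     = { kLexCpp,  L"C/C++ Source Code" };
EditLexerSketch lexSH      = { kLexBash, L"Bash Script" };

// Step 4: the new lexer is appended to the array.
EditLexerSketch *pLexArraySketch[NUMLEXERS_SKETCH] = { &lexDefault, &lexCPP, &lexSH };

// Find a registered lexer by ID; returns nullptr if it is not in the array.
const EditLexerSketch *findLexer(int lexerId) {
    for (std::size_t i = 0; i < NUMLEXERS_SKETCH; ++i) {
        // A null slot means the count was raised without adding an entry.
        assert(pLexArraySketch[i] != nullptr);
        if (pLexArraySketch[i]->lexerId == lexerId) return pLexArraySketch[i];
    }
    return nullptr;
}
```

With this layout, findLexer(kLexBash) hands back a pointer to lexSH, and an ID that was never registered comes back as nullptr.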

I created a script in Bash to generate a list of shell built-ins and common commands.  These steps were easy enough, but I quickly discovered that the Bash lexer doesn't support external commands... only keywords.  I added them as a new style anyway, which involved adding a new style constant (SCE_SH_EXTERNAL) to SciLexer.h and tossing it in as an additional case in LexBash.cxx.  We'll get back to this in a minute.

A KEYWORDLIST is an array of strings.  Our Bash Lexer reads these into wordlists for internal use.
    WordList &keywords = *keywordlists[0];
    WordList &keywords2 = *keywordlists[2]; // we added this one.
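
To make the idea concrete, here is a minimal sketch of how a space-separated keyword string can be turned into a lookup list.  WordListSketch is a hypothetical stand-in and not Scintilla's actual WordList class; kKeywordListSketch and loadSlot are likewise invented names, and the word lists are heavily abbreviated.  The only thing taken from the post is the layout: slot 0 for built-ins and slot 2 for external commands.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for a lexer word list; not the real Scintilla class.
class WordListSketch {
public:
    // Split a space-separated keyword string into individual words.
    void Set(const std::string &spaceSeparated) {
        words_.clear();
        std::istringstream in(spaceSeparated);
        std::string word;
        while (in >> word) words_.insert(word);
    }
    // True if the word is one of the stored keywords.
    bool InList(const std::string &word) const { return words_.count(word) != 0; }
    std::size_t Size() const { return words_.size(); }
private:
    std::set<std::string> words_;
};

// Abbreviated keyword slots, laid out like the post describes.
const char *const kKeywordListSketch[3] = {
    "alias bg break cd echo",  // [0] shell built-ins
    "",                        // [1] unused
    "awk grep sed tar"         // [2] external commands
};

// Build a word list from one slot of the keyword table.
WordListSketch loadSlot(int slot) {
    assert(slot >= 0 && slot < 3);
    WordListSketch list;
    list.Set(kKeywordListSketch[slot]);
    return list;
}
```

Here loadSlot(0).InList("echo") is true while loadSlot(0).InList("grep") is false, which is exactly the distinction the two lists are meant to capture.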


Pretty straightforward: we add the keywords to the appropriate list.  The lists are consumed in LexBash.cxx.  We are using the first string for Bash keywords (the structural keywords in the commented-out line are special cases inside the lexer, so they don't need to be added) and the third string for external commands.

KEYWORDLIST KeyWords_SH = {
// "if elif fi while until else then do done esac eval for case select "
"alias bg break builtin cd command compgen complete compopt continue declare dirs disown echo enable exec exit export "
"false fg getopts hash help history jobs kill let local logout mapfile popd printf pushd pwd read readarray readonly "
"return set shift shopt source suspend test time times trap true type typeset ulimit umask unalias unset wait fc", "", 

"awk banner clear df diff dirname du egrep env expr fmt fold free ftp g++ gcc grep groups gzip head hostname identify "
"import integer install ipcs join ln login look ls make man man2html mkdir mkgroup more mount mv nice od perl print "
"ps rm rmdir script sed setenv sh since size sleep sort strings stty su tac tail tar telnet tidy top touch tput tr "
"umount uname uniq unix2dos unzip uptime users vmstat watch wc whereis which who whoami xargs yacc yes zip basename "
"bash bc c++ cal cat chgrp chmod chown chroot cksum cpp crontab cut date factor file find flip flock",
"", "", "", "", "", "" };

Next we add in our EDITLEXER.  The first field is the lexer ID SCLEX_BASH, the second is a string ID from Notepad2.rc (we added: 63022   "Bash Script"), the fourth field lists the file extensions that will default to this type, and then we finally get to our styles.

Each style line gives us the ENUM cases (from LexBash.cxx) that we are applying the given style to, a string ID that describes the rule (press Ctrl+F12 to change the rules for Bash and this is the identifier that will show up), and our coloring rules in field 4.  If you want several ENUM types to share one rule, you can combine up to four of them using MULTI_STYLE.  The last line is an empty rule that marks the end of the list.
 

EDITLEXER lexSH = { SCLEX_BASH, 63022, L"Bash Script", L"sh; bash", L"", &KeyWords_SH, {
{ STYLE_DEFAULT, 63126, L"Default", L"", L"" },

{ SCE_SH_COMMENTLINE, 63127, L"Comment", L"fore:#808080", L""},
{ SCE_SH_WORD, 63128, L"Keyword", L"bold; fore:#0000C0", L"" },
{ SCE_SH_EXTERNAL, 63236, L"External", L"bold; fore:#4040C0; back:#C0C0C0", L"" },
{ SCE_SH_NUMBER, 63130, L"Number", L"fore:#FF0000", L"" },

{ MULTI_STYLE(SCE_SH_STRING,SCE_SH_CHARACTER,0,0), 63131, L"String", L"fore:#008000", L"" },
{ SCE_SH_OPERATOR, 63132, L"Operator", L"fore:#0000C0", L"" },
{ SCE_SH_BACKTICKS, 63229, L"Backtick", L"fore:#FF8000", L"" },
{ SCE_SH_PARAM, 63249, L"Variable", L"fore:#0080C0", L"" },
{ -1, 00000, L"", L"", L"" } } };
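
The MULTI_STYLE entry in the table above puts several style numbers into a single field.  A minimal sketch of how that kind of packing can work, assuming one style number per byte with the first style in the lowest byte: that layout is my assumption, so check the macro in the Notepad2 headers before relying on it.  multiStyle, styleAt and the two style values are hypothetical names and placeholders.

```cpp
#include <cassert>
#include <cstdint>

// Assumed packing: one style number per byte, first style in the lowest byte.
constexpr std::uint32_t multiStyle(std::uint32_t a, std::uint32_t b,
                                   std::uint32_t c, std::uint32_t d) {
    return a | (b << 8) | (c << 16) | (d << 24);
}

// Extract the style number stored in slot 0..3 of a packed value.
inline int styleAt(std::uint32_t packed, int slot) {
    assert(slot >= 0 && slot < 4);
    return static_cast<int>((packed >> (8 * slot)) & 0xFFu);
}

// Placeholder style numbers for illustration; the real ones are in SciLexer.h.
constexpr std::uint32_t kStyleString    = 5;
constexpr std::uint32_t kStyleCharacter = 6;

// Two styles sharing one rule, with the unused slots left at 0.
constexpr std::uint32_t kStringRule = multiStyle(kStyleString, kStyleCharacter, 0, 0);
```

Under this assumption the limit of four styles falls out naturally: a 32-bit value has room for four bytes, and unused slots are simply left at 0.
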

OK, so now we need to make some changes to LexBash.cxx so that it recognizes external commands.
In ColouriseBashDoc() we grab a reference to our external keyword list:
   WordList &keywords2 = *keywordlists[2];

And we added our new SCE_SH_EXTERNAL as an additional case in the same place where SCE_SH_WORD is handled:
   case SCE_SH_WORD:
   case SCE_SH_EXTERNAL:


At this point we can compile and test... all we need to do now is separate the behavior for EXTERNAL and WORD.  The lexer uses "sc.ChangeState(_ENUM_VALUE_)" to apply a style, so that is where we handle the two types.  Scrolling down to the very bottom of this case, we make a minor modification: at the point where the lexer was going to apply the IDENTIFIER style, we first check whether the word is an external command.  If so, we change to SCE_SH_EXTERNAL; otherwise we do what the lexer was going to do originally.

   else if (cmdState != BASH_CMD_START || !(keywords.InList(s) && keywordEnds)) {
      // not a keyword at the start of a command:
      // either an external command or a plain identifier
      if (keywords2.InList(s) && keywordEnds) {
         sc.ChangeState(SCE_SH_EXTERNAL);
      } else {
         sc.ChangeState(SCE_SH_IDENTIFIER);
      }
   }
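
The decision made by that branch can be pulled out into a small standalone function, which is handy for reasoning about edge cases without rebuilding the editor.  This is only a sketch: classifyWord, WordStyle, WordSet and the sample sets are hypothetical, atCommandStart stands in for the cmdState == BASH_CMD_START test, and the other branches of the real lexer (construct keywords, test expressions and so on) are ignored.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical result type standing in for the SCE_SH_* style constants.
enum class WordStyle { Keyword, External, Identifier };

using WordSet = std::set<std::string>;

// Tiny sample lists for illustration only.
const WordSet kSampleKeywords = { "cd", "echo" };
const WordSet kSampleExternal = { "awk", "grep" };

// Mirrors the modified branch: a word stays a keyword only at the start of a
// command; otherwise it becomes an external command if it is in the second
// list, and a plain identifier if it is in neither.
WordStyle classifyWord(const std::string &s, bool atCommandStart, bool keywordEnds,
                       const WordSet &keywords, const WordSet &keywords2) {
    assert(!s.empty());  // this sketch expects a non-empty word
    const bool isKeyword = atCommandStart && keywordEnds && keywords.count(s) != 0;
    if (isKeyword) return WordStyle::Keyword;
    if (keywordEnds && keywords2.count(s) != 0) return WordStyle::External;
    return WordStyle::Identifier;
}
```

One thing the sketch makes visible: external commands are matched even when they are not at the start of a command, so "grep" used as an argument gets the external style too, while "echo" in argument position falls back to a plain identifier.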

Compile, run, test... everything looked good.  This was surprisingly easy... I got the whole thing done in about two hours.  I think I've documented all of the stumbling blocks, and I feel comfortable modifying the code now.  I might try to tackle some other things in the code; we'll see.  I decided to throw this together as a tutorial in case anyone wants to add support for their own favorite languages.  Enjoy.

Thanks to Florian Balmer for his excellent software, and for making it available to the world.