Sunday, April 14, 2013

Optimizing the Genetic Algorithm

Well, I took some time to optimize the GA I described earlier, and the results were a little disappointing.  First, let's look at the non-optimized results:

Pool Size: 11          Repro Rate: 40     Mutate Rate: 40
Return Rate: 20       Best Fits: 7       Num Mutations: 5
Churn Rate: 1/11       Run for 300 seconds
Target String:   >>> Hello, World! <<<

>>>    1   Generation: 0        ( 204 to 447 )   Prodigy: > dahm=YLrwwvep < with a score of 204
>>>    3   Generation: 5        ( 181 to 436 )   Prodigy: > cd&dT433R!2?3 < with a score of 181
>>>    6   Generation: 15       ( 170 to 431 )   Prodigy: > jbgJjX/&tandW < with a score of 170
>>>   28   Generation: 75       ( 161 to 475 )   Prodigy: > mgOIi=U#!Wh~1 < with a score of 161
>>>   31   Generation: 85       ( 158 to 404 )   Prodigy: > mWlyIz2rjVKB8 < with a score of 158
>>>   36   Generation: 98       ( 148 to 422 )   Prodigy: > mWlyIz2rjomG  < with a score of 148
>>>   38   Generation: 105      ( 137 to 457 )   Prodigy: > HPltpzZRuWvF; < with a score of 137
>>>   72   Generation: 200      ( 127 to 372 )   Prodigy: > IeGhzy QNalbZ < with a score of 127
>>>   94   Generation: 261      ( 126 to 445 )   Prodigy: > IeGVo/7whHHbM < with a score of 126
>>>  107   Generation: 297      ( 123 to 460 )   Prodigy: > IeGVo9+pPgOn  < with a score of 123
>>>  119   Generation: 333      ( 114 to 479 )   Prodigy: > IeGVo9+pOKMdP < with a score of 114
>>>  277   Generation: 772      ( 111 to 399 )   Prodigy: > IeGVo91SL2Oaq < with a score of 111
>>>  301   Generation: 838      ( 262 to 502 )   Prodigy: > IeGVo91SL2Oaq < with a score of 111
     0     262   >>> TuPNI52Hdh#Id <<<
     1     272   >>> IZGRHhIzukqrP <<<
     2     291   >>> WgNdL>,A;MBuc <<<
     3     301   >>> =AwF03qv&nBPJ <<<
     4     314   >>> D 1xMZmgqB&x2 <<<
     5     333   >>> 0^We=,70jUixb <<<
     6     338   >>> j5VWZ/510f 9E <<<
     7     341   >>> n<v~FYdygWyun <<<
     8     357   >>> XZK1vAGc@HqK& <<<
     9     360   >>> #r_7d#39IF 4= <<<
    10     502   >>> ,.V/+B8&y+=vv <<<


The closest we could come to "Hello, World!" in five minutes was "IeGVo91SL2Oaq", which is a bit disappointing, especially since we have an 11-member pool run for 838 generations... over 9,000 candidate solutions evaluated to get here.
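As a quick refresher, the "score" here is the fitness value: lower is better, and 0 would be an exact match. The actual scoring function is in the earlier post; a minimal Bash sketch of the same idea (assuming a simple sum of per-character ASCII distances to the target, which may not reproduce the exact numbers above) looks something like this:

    # Sketch of a distance-to-target fitness: lower is better, 0 is a perfect match.
    # Assumes a sum of per-character ASCII differences, which is illustrative and
    # not necessarily the exact scoring behind the runs in this post.
    TARGET="Hello, World!"

    fitness() {
        local candidate="$1" score=0 i t c
        for (( i = 0; i < ${#TARGET}; i++ )); do
            printf -v t '%d' "'${TARGET:i:1}"       # ASCII code of the target char
            printf -v c '%d' "'${candidate:i:1}"    # ASCII code of the candidate char
            (( score += t > c ? t - c : c - t ))    # add the absolute difference
        done
        echo "$score"
    }

    fitness "IeGVo91SL2Oaq"    # prints this candidate's distance from the target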

To optimize, I held all of the values steady except one at a time, modified that remaining variable slowly, and after five runs at each level I ended up with some more optimized values... I then did this once again using the new values and ended up with my somewhat optimized results (a sketch of that kind of sweep follows the results below):

Pool Size: 5          Repro Rate: 62     Mutate Rate: 20
Return Rate: 75       Best Fits: 1       Num Mutations: 1
Churn Rate: 5/5       Run for 300 seconds
Target String:   >>> Hello, World! <<<

>>>    1   Generation: 0        ( 286 to 439 )   Prodigy: > 6gmp3Iw|lq;uX < with a score of 286
>>>    1   Generation: 1        ( 249 to 428 )   Prodigy: > aoX3UVw|lqKzp < with a score of 249
>>>    1   Generation: 3        ( 248 to 336 )   Prodigy: > agmp3IWzU+Kzp < with a score of 248
...

>>>  268   Generation: 2467        ( 4 to 84 )   Prodigy: > HeMlo. WoSld1 < with a score of 4
>>>  283   Generation: 2602        ( 3 to 74 )   Prodigy: > HeMlo. WoSld! < with a score of 3
>>>  300   Generation: 2762        ( 2 to 64 )   Prodigy: > Hello. WoSld! < with a score of 2
>>>  301   Generation: 2764        ( 2 to 55 )   Prodigy: > Hello. WoSld! < with a score of 2
     0       2   >>> Hello. WoSld! <<<
     1      25   >>> HeMlo. WoSwd! <<<
     2      26   >>> Tello. WoSld! <<<
     3      28   >>> HeMloy WoSld! <<<
     4      55   >>> 9ello. WoSld! <<<
 
A bit better... we still can't hit our target string in five minutes, but we got pretty darn close.  And with 5 members over 2,764 generations it clocks in at over 13,000 candidates evaluated... I have nothing to say.
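As promised, here is roughly what the one-variable-at-a-time sweep looked like as a wrapper script. This is only a sketch: it assumes a hypothetical driver called ga.sh that takes the parameters as positional arguments and prints the best score it reached, which is not exactly how the real script from the earlier post is invoked.

    # One-at-a-time tuning sketch: hold every parameter fixed, sweep one, average
    # a few runs at each setting, and keep the best value.  "./ga.sh" is a
    # hypothetical driver (POOL REPRO MUTATE RETURN BESTFITS NUMMUT SECONDS) that
    # prints the best score it reached; the real script differs.
    POOL=11 REPRO=40 MUTATE=40 RETURN=20 BESTFITS=7 NUMMUT=5 RUNTIME=300
    RUNS=5

    best_avg=999999 best_pool=$POOL
    for pool in 3 5 7 9 11 15; do              # sweep just the pool size
        total=0
        for (( run = 0; run < RUNS; run++ )); do
            score=$(./ga.sh "$pool" "$REPRO" "$MUTATE" "$RETURN" \
                            "$BESTFITS" "$NUMMUT" "$RUNTIME")
            (( total += score ))
        done
        (( avg = total / RUNS ))
        echo "Pool Size $pool -> average best score $avg"
        if (( avg < best_avg )); then
            best_avg=$avg
            best_pool=$pool
        fi
    done
    echo "Keeping Pool Size $best_pool; now repeat the sweep for the next knob."

Multiply that by seven parameters, several settings each, and five runs per setting, and the tuning time adds up very fast.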

So, what lessons did I learn through all this?

Pool Size and Return Rate seem to be the most important variables to tune at the beginning.  Favoring more generations seems to bring the most benefit early on, but as you keep tuning, other variables start to matter more.  Tuning is very difficult, and based on some quick experience with the Traveling Salesman Problem, the tuning for one problem has no bearing on another.  It also appears that the better combinations were the ones that resulted in more candidate evaluations.

The amount of tuning required to get even marginal results is absurd... perhaps I need to create a GA to tune the GA... but based on the number of computations involved, I'm not overly impressed.  For most problems, if you already know the fitness algorithm, you are better off just iterating your way to a solution.
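To make that last point concrete, here is one way "just iterating" can look when the fitness function is known. It reuses the assumed per-character distance fitness from the sketch above (so it is illustrative, not the real scoring) and simply settles each position in turn by trying every printable character and keeping whatever scores best: a bit over a thousand evaluations and no tuning at all.

    # "Just iterate" sketch: no pool, no mutation rates, no tuning.  For each
    # position, try every printable character and keep whichever one the fitness
    # function likes best.  This works because the assumed fitness (sum of
    # per-character ASCII distances, same as the earlier sketch) scores each
    # position independently.
    TARGET="Hello, World!"
    CHARS=' !"#$%&'\''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~'

    fitness() {
        local cand="$1" score=0 i t c
        for (( i = 0; i < ${#TARGET}; i++ )); do
            printf -v t '%d' "'${TARGET:i:1}"
            printf -v c '%d' "'${cand:i:1}"
            (( score += t > c ? t - c : c - t ))
        done
        echo "$score"
    }

    candidate=$(printf 'A%.0s' $(seq ${#TARGET}))    # start from "AAAAAAAAAAAAA"
    best=$(fitness "$candidate")

    for (( pos = 0; pos < ${#TARGET}; pos++ )); do
        for (( j = 0; j < ${#CHARS}; j++ )); do
            trial="${candidate:0:pos}${CHARS:j:1}${candidate:pos+1}"
            score=$(fitness "$trial")
            if (( score < best )); then
                candidate="$trial" best="$score"
            fi
        done
        echo "position $pos settled: >>> $candidate <<<  ( score $best )"
    done

Obviously this only works because the assumed fitness is separable by character; the point is just that when you already understand the fitness function, a dumb direct search can beat a GA that has to be babysat.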

All of this has been interesting... but I'm still not convinced that it is useful. 

Next up, onto something new.

2 comments:

  1. wow, this is very impressive, have you tried any other algorithms in bash?

  2. Hey, my first comment!!! Even my own momma wouldn't comment.

    To answer your question I've done a lot of stuff in Bash. I prefer C++/C# or Java normally, but Bash is good for quick work. I've probably done as much coding in Bash at work as I have in any other language.

    One of my favorites was an expert system that would follow a distributed (multi-computer) build along, fix problems encountered during the build (including checking out and modifying source code, and dropping nodes from the list if they failed too often), and generate reports on the build afterwards.

    Bash seems like an under-appreciated language, so I've kind of decided to champion it ;-)

    David
