Efficiency of Genetic-Algorithm Optimization vs Purely Random Search

As an intuitive argument against biological evolution, some argue that the organisms present in our world today are too impossibly complex to have been optimized by “random” variations in the DNA of their ancestors. But what if evolution is orders of magnitude more efficient than pure random search?

Genetic-Algorithm (GA) optimization is a computer method for finding optimal solutions to complex problems using a somewhat simplistic simulation of the evolutionary processes of natural selection, genetic crossover, and mutation. The following paragraphs describe an experiment to measure the efficiency of GA optimization versus purely random search.

The Traveling Salesman Problem (TSP) is a well-defined combinatorially difficult optimization problem commonly used to compare the efficiency of computerized optimization and search methods. The TSP attempts to find the shortest path among a set of cities, visiting each city once and returning to the starting city. The X & Y location coordinates of each city are used to determine path distances. For a TSP with thirty cities, there are approximately 265,000,000,000,000,000,000,000,000,000,000 different routes.

In the implementation of GA optimization of a TSP described here, a population of 1000 purely random individuals is selected, representing path sequences among the 30 cities. Each population member is composed of 30 randomly selected integers between 1 and 1,000,000. After the initial population is selected, each population member’s path length is evaluated by sorting the 30 random numbers from low to high, with the resulting sort order representing the sequence of the cities visited. Path distance is found by accumulating the air distances between cities in the chosen order. Fitness is defined as an inverse function of path distance, (10/(1+PathDistance))*2 in this study. Selection of parents for reproduction is random, but proportional to fitness. Thus relatively fit individuals have a higher probability of reproducing. Each pair of parents selected for reproduction has half of their genetic material exchanged, at multiple randomly selected cross-over points in the string of their 30 numbers. Both of the offspring resulting from each cross-over operation are kept and subjected to mutation; in this study, 15% of the numbers of each offspring member are randomly selected and replaced with purely random values between 1 and 1,000,000. Next, the newly crossed and mutated offspring are evaluated for fitness. The parents and the children are all kept in the population until 1000 pairs of new offspring have been added to the population. Then the population is sorted on fitness, keeping only the most-fit 1000 individual members. Selection, cross-over, mutation, and evaluation cycles continue until either a pre-set number of generations have occurred, all population members have converged to the same fitness value, or a specified shortest-possible tour length has been attained. The GA method does not guarantee to find the optimum solution every time, but the search operation can be repeated a number of times, to provide high likelihood of reaching the minimum-distance route at least once.

The above-described method for optimizing the traveling salesman problem was programmed in Dyalog APL language and run in Windows on a 2.4 GHz Intel Core i7 processor.

Over many cases, the shortest possible path was found, on average, in 21.6 seconds. The shortest path, as shown in the attached diagram, is 11005.34 miles long. (The next-shortest path is only .45 miles longer, connecting St. Louis to Kansas City to Omaha to Cedar Rapids.)

A purely random “trial-and-error” method was also programmed in the same computer environment. In this case, random sequences were selected (“dealt” from the set of numbers between 1 and 30), evaluated for path length of the sequence, the shortest path saved at each step, and the process timed for 10,000,000 cycles. On average, the best solution will be found when half of the total number of unique paths have been evaluated, or about 132,600,000,000,000,000,000,000,000,000,000 paths. The purely random program runs at a rate of 143,400 paths/second. Thus the purely random program takes about 924,900,000,000,000,000,000,000,000 seconds on average to find the shortest path, and the GA method is approximately 42,800,000,000,000,000,000,000,000 times as efficient as purely random search for finding the shortest route among 30 cities, as measured by this study.

How can we understand this huge difference in efficiency? The primary factor, as explained by John Holland, important proponent of computer-based genetic-algorithm optimization, is that there is a vast set of potential offspring that never have to be evaluated, since only the fittest parents are heavily selected for reproduction. The un-sampled individuals of each generation can be visualized as some fraction of a binary-progression: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, …. After an average of 537 generations to the best solution of the 30-city TSP, there is a huge number of un-sampled individuals.

A side note about GA efficiency in solving the TSP is that the efficiency ratio (to purely random search) increases as problem complexity increases. The GA efficiency ratio on a 30-city problem is about 150 billion times the efficiency ratio on a 22-city problem.

This study was performed as repeated trials on a single serial computer. In nature, there is as much parallelism as there are pairs of individuals in the population. So the implied conclusion is that the opportunity for optimization is beyond our intuition.

Charles E. Smith

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.