Color Recognition and Neural Networks
** You can download the demo and source code accompanying this tutorial here: color-rec.zip **
I have only recently begun to do much with Neural Networks. A few months ago, I first read about them. The idea of such a simple structure being able to learn intrigued me. The first group of articles I read about it included an example program, simulating the training of an XOR network. So, I accepted that it works to some degree. Then I asked myself to what extent could they be pushed? If they can discern whether or not A XOR B is true, and if it could be used to identify circles and other things, could it learn to "speak," (both in thesense of making audible sounds and coming up with intelligent things to say) identify colors, learn facts and not just patterns? I have not found very many things anywhere on any of these things. So, I decided to run some experiments on my own. I chose what I thought would be a simple one: The problem of color identification.
In considering this problem, I decided on the following. Rather than have the program determine exactly what color something was, I would have the computer say whether or not something was a given color. The color of choice happened to be blue. Having it state what color is actually represented is a simple extension of determining if a numeric representation of a color is or is not a particular color. I decided to write the program in Visual Basic, so that I could make a nice looking program without focusing on the graphics interface. So for those of you who know VB, I have included the source code for the final working version.
The consideration of how I should represent colors also was a difficult choice. I came up with two plans I could choose from in the end: I could either divide up the color into its red, green, and blue components and pass these components into a network employing perceptrons that can handle three inputs, or I could simply pass the number representing the color into the network as one parameter and have the other parameter be zero, and only have two inputs. I decided to try the former first, but I would end up implementing the other version as well in the second and third versions. I also determined that I would begin with a three-node network. The algorithm I chose to allow the network to improve (i.e. evolve, learn) is a genetic algorithm: I chose it because it is fairly simple, allows for easy coding and is intriguing in itself. I used a population of 5 networks that were evaluated by a fitness function, and the ones that had better fitness had a better chance of breading.
I did three versions of the same program, all three of which are included here. The first version uses the method described above. It is called "gann.exe." I will describe the other two versions as I arrive at them.
I have, in this essay, reported my failures as well as successes, for many reasons. First of all, you learn just as much when you fail as when you succeed, sometimes you learn more when you fail. Also, I hope to inform people what not to waste their time in trying.
Part I: Experimenting with The Programs
In the end, I made three different versions of the program. One has the long integer representation of the color divided by the maximum representation (the value representing white) and zero passed in as the two values to a three node network. The second has the components of the color (all divided by 255) passed into a single perceptron that takes in three values. The final version has the three components (again, all divided by 255) passed into each of the three first layer networks, which then each pass a value into the output perceptron. (The reason for the division is because it is easier to input values between 0 and 1, and dividing them by the maximum input value allows one to do this easily.)
The first version did not work very well. It ends up measuring how close the inputted number is to blue. The long integers representing white and pink are very close to the long integer representing blue, but are of course hardly blue. It was ok at measuring whether or not a color had a high blue component, not whether or not something was blue. The network was able to get close to having a zero error after enough iteration, according to the fitness function, meaning the multi-layered method was appropriate in this case. It required many iterations, however, in order to get any kind of decent accuracy, and favored in the side of the negative (with too few iterations, it would say blue was not blue). All in all, it did not work very well, and I shall not spend much time discussing it here. I have included the first program, even if it is kind of a dud, so you the reader can play around with it.
The second version only used one perceptron but did divide it up into the components. It approaches an error of zero much faster than the previous program. (One sample run of the first program had an error of about 29% after 1000 iterations, whereas the second program had an error about 20% after 1000 iterations.) It favored a too loose definition of blue; without enough iterations (1000 is too few), it would say everything is blue. Again, it did not work well, yet I have included it, so that you the reader can experiment with it. Depending on the exact fitness function (again, the fitness function will be described in more detail later) used, it would sometimes say that pink and purple and white were all shade of blue.
The third version works very well, however. First of all, it approaches an error of zero faster than either of the other ones, usually having an error of about 2.5% after 1000 iterations (in my high school science classes, the rule of thumb was that any percent error of 5% or less was acceptable), compared to the 29% and 20% of the others (both of which, according to the above rule, are unacceptable). Furthermore, when tested, this version seems to correct all the pitfalls of the other versions. It correctly identifies colors having high values of red as not being blue, even if there is a high blue component. Other versions would also say that white was a shade of blue, for similar reasons, but this version seems to correct that. It accepts higher amounts of green, however, as green when combined with blue makes green-blue which is kind of like blue, whereas red and blue just make purple. (This has something to do with the fitness function, which is described in more detail below.) The only group of colors that it is shaky at are the ones that are borderline blue. Then again, however, humans would have a certain amount of difficulty saying where the border between “blue” and “not blue” is. The computer, in this case, is acting very "human." So, what seems to be a weakness of the program, therefore, is actually a strength. The goal of artificial intelligence, is, after all, to create human like programs.
Part II: Notes on the Genetic Algorithm Approach
The Genetic Algorithm I used is pretty simple, and I will address it first, because in the three versions I did, I did not have to change it at all. In the first iteration, nodes were given random weights, and networks were composed of an appropriate amount of nodes. Five networks were created in this fashion. Their fitness function was then evaluated using a fitness function, measuring how accurately a particular network does at determining whether or not a color is what it should be. An example of a fitness function from the final, working version, looks like this:
Function EvalFitnessBlueNet(n As Network) EvalFitnessBlueNet = _ CDbl(Abs(EvalNetwork(n, 0, 0, 255) - 1) + _ Abs(EvalNetwork(n, 0, 0, 200) - 1) + _ Abs(EvalNetwork(n, 0, 0, 75) - 0) + _ Abs(EvalNetwork(n, 0, 100, 100) - 0) + _ Abs(EvalNetwork(n, 100, 0, 200) - 0) + _ Abs(EvalNetwork(n, 0, 0, 200) - 1) + _ Abs(EvalNetwork(n, 50, 100, 255) - 1) + _ Abs(EvalNetwork(n, 0, 0, 150) - 1) + _ Abs(EvalNetwork(n, 0, 255, 0) - 0) + _ Abs(EvalNetwork(n, 255, 0, 0) - 0) + _ Abs(EvalNetwork(n, 255, 255, 255) - 0) + _ Abs(EvalNetwork(n, 0, 0, 0) - 0)) / CDbl(12) End Function
Functions like these determined the error. The fitness was determined by (1-error). (In the program, I used the word fitness when I really referred to error, however "Error" is a reserved word in Visual Basic, and the two are closely related, so I could get away with calling "error" "fitness.") The fitnesses of all the networks were added together to get a fitness sum, and the probability of each becoming a parent was its fitness divided by the total fitness. Having a low error was beneficial, so therefore it became beneficial for it in some cases to evaluate to zero, and in others to evaluate to one. If in a given case, it is supposed to evaluate close to one, then one is subtracted, otherwise zero is subtracted (which actually does not do anything, but it makes the function look more complete to write it as such).
In a loop of four iterations, an array was created containing parents, and the chance of a particular network becoming a parent in each iteration was its probability. (Thus, a network might take all 4 parent positions, which mean it would have a bunch of children with itself. While biologically, this does not make sense, but fortunately, we are only using biology as a model, not as the rule.) Each network in the parent array would be either a "mother" or "father" for two "children," and so there were four children. These four children and the best of the old networks then replaced the old networks in the array, and the process was repeated a certain amount of iterations, provided by the user.
As I said earlier I chose the genetic algorithm method of evolving weights because it was both interesting and simple to code. I never realized exactly how simple to code, however. Back propagation requires that the node is evaluated for how well it does an individual problem and fixes the weights accordingly. The genetic algorithm method enables the programmer to write an "EvaluateFitness" function that does just that it determines how well the network performs across the board. Problems can arise here, however. Let us take the example of trying to get an incredibly precise machine such as a computer to "understand" the abstract and loosely defined term "blue." As when you teach a person anything, the right information must be provided, otherwise the person, or in this case the computer, may have a very warped view of what it is you are trying to teach it, as happened in my first version. It may kind of understand what "blue" is, but at the same time have an incomplete view. So, my program would often tell me that pink and purple were blue, which of course, they are not.
This problem stems from my own error: I figured that by telling a computer that blue is blue, a particular shade of blue is blue, green is not blue, red is not blue, and black is not blue, the computer would be able to figure out the rest. And, considering how little information it has, it does an excellent job for a computer, proving the potential power of neural networks. It turns out therefore that the teacher is a critical part of this process; if the teacher (user) does not do a good job of telling the computer what the computer should do, the computer can not work correctly. This sounds a little bit like programming: a low-quality programmer will get a computer program working correctly only with great difficulty, if at all. The difference here is that it turns out there is no need to tell a computer everything, as one would have to do while programming. All that is required is enough, if you will allow me to use such a vague term. Determining how much is enough, and what kind of information is enough is the difficult part.
Another concern is the amount of iterations. To train simple neural networks, like XOR, it is very easy to clearly define exactly what you want the network to do. With colors and shades of colors, it is not as easy. What is blue and gray can even be a matter of opinion, whereas TRUE XOR TRUE is always false. With XOR, increasing the amount of iterations does make it more accurate. For color determination, increasing the amount of iterations only makes the networks better at fitting the mold given by the fitness function.
This approach, however, enabled incredibly flexible programs. I reused much of this code from some code I wrote a couple months ago when I first started using this genetic algorithm. I started out simple, having a single perceptron that modeled the OR function. By changing the fitness function, I was able to easily make the same program have a perceptron that modeled the AND function, changing only two lines of code. So, by changing nothing more than the fitness function in the sample code included, one can change what it is the networks in the program identify or do. A similar changing of the fitness function could cause the network to be trained to do anything as long as the architecture could provide for it.
All in all, I like this approach. It was interesting to look at the report files that were created after a run and see how the different network evolved over several thousands of iterations, something that could not happen with back propagation. Back propagation also requires to be rewritten every time the architecture changes, or else the program must be designed in a very meticulous way. The genetic algorithm approach was flexible without the careful planning.
Part III: Notes on the Appropriate Network Architecture
So, we have established that the fitness test is crucial to getting the network to functioning as desired. However, that is not all that is required. The other requirement is a properly selected network architecture. If you try to train a network to evaluate the “XOR” function and you try to use a single node, it will never get better then 75% right. I ran into a similar problem here. When I tried a new fitness function, one that took into consideration the problems with pink and white that I noted earlier, the error never got below 40% for a particular fitness function, regardless of the number of iterations (in some cases, neural networks that use the genetic algorithm I used in these programs will hit plateaus, known as local optima, and hang around a given value for a while, and then it will break past that barrier and improve until it hits another barrier, repeating the process that is not occurring in this case, as the error would not decrease at all over the course of several thousand iterations). Just as in the single node XOR network, this likely stems from an architectural problem.
Thus, in the case of color identification, it appears to be necessary to divide a color up into its red, green, and blue components. So instead of having a user input a color, we must have a user input color components. (You can also use a simple algorithm that uses integer division and modulo division to divide up the components if it is necessary to read colors from the screen or a bitmap or whatever else.)
Also, we must re-evaluate whether or not we still need the three nodes. Consider the problem at hand: we want high amounts of blue and low amounts of green and red. This sounds as though it could be linearly separable. However, it turns out it is not, as I found out in experimenting using gann2.exe, a program that demonstrates that its architecture cannot evaluate shades of blue the way we want. This means that the single node architecture is out, meaning some kind of multi-node architecture is required.
Further trials into a particular multi-node architecture that I have tried have succeeded. The structure is each node takes in three inputs, and there are two layers: a layer of three that feeds into the output layer consisting of one node. This structure works very well, as it remedies many problems found with the single node and the non-component versions of the program. It works very well for the extreme conditions, as the single node version did, but it also handles the "gray area" much better. It does not work perfectly, however, which is probably the most desirable way of working. After all, there are a bunch of colors that might be called blue but are also very similar to green or purple, depending if green or red is added. A human would have a difficult time in figuring out if such colors are blue or not, so this program acts in a very human like manner. The program implementing this approach is gann3.exe.
Part IV: Conclusions
From the results of the tests runs, we can determine if a numerical representation of a color, we know what is necessary: to divide the number representing a color into its components and run it through some kind of multi-layered network. Trying to simplify it any other way does not work. The multi-node architecture I proposed here works well, but there may be better architectures or architectures that work equally as well. Another possible network that could be interesting is one that has a similar layout, except it has multiple output nodes, each one corresponding to a color.
One of the most interesting things about this experiment was how well it performed even though it was given so little information. The fitness function was based upon 12 colors about .00007152% of all possible test items. Yet, the only problems it runs into identifying simple colors, such as blue (as well as green, red, purple. see below) is when humans would have a hard time determining if a certain color could be called a shade of a color. This shows the power of genetic algorithms being used to evolve networks. One problem, however, is the memory demands. Using back-propagation requires only enough memory for the network itself, as well as loop variables, etc. The genetic algorithm requires a population of networks (the larger, the better my population of 5 is pretty small). Depending on the algorithm, arrays containing parents, children, and so on may also be required. This increase in memory usage is acceptable, however, because most computers can spare the extra few bytes, and increasing the size of a problem does not dramatically increase the amount of memory used.
Explanation of the Programs Included
The programs have two windows, one that allows you to create a network file (and the accompanying experiment files, described below), and one that allows you to load and test the network or node file. In the window used to create the files in the first and the final version, there is a button next to the directory input box. Clicking on this will bring up the save dialog box. Saving it in the directory you wish with whatever file name will put appropriate values in the directory box and the run name box. In the second version, an input box comes up, and you must type in a valid directory, making sure to end it with a backslash(\).
There are three files that result from each run: a file with the prefix "iter" which shows just the average fitness at each iteration, a file with the prefix "res" which shows the results in more detail, including the error for each network (or node). The final file is the neural network file, which is made by the program and contains the "best" (lowest percent error according to fitness function) network produced on the last iteration of the genetic algorithm, and it can be loaded using the tester window and can be used to test if a particular set of values represents blue. Included for use in the final, working version (gann3.exe) is a few extra networks that have been trained to determine if something is red, green, orange, or purple, as well as blue. Some of them work better than others. The source code for the fitness functions that led to each is included in the source code. I have included these to show how easily this method can be adapted to recognizing other colors
Remember you can visit the Message Store to discuss this tutorial.