A technology tutorial by Jon Tait
DotA AutoScript uses the Neuroph Java Neural Network Framework to perform image recognition from the video buffer in real-time to monitor a player’s inventory items and hero abilities while playing DotA. This information is used by DotA AutoScript to dynamically create hotkey binds and macros. Neural network technology was chosen for this task because of its ability to identify the game icons regardless of what screen resolution or graphics mode a player chooses to use. I wanted to discuss some of the technical details of this implementation for the benefit of people in the artificial intelligence community interested in working with neural networks and Java.
There are two types of neural networks used by DotA AutoScript. The first is a simple binary image classifier. It uses Neuroph's "Multi-Layer Perceptron" class to model a network with an input layer, one hidden layer, and an output layer of neurons. Exposing an image to the input layer causes each output neuron to produce the probability of a match against the image it was trained to identify; one trained image per output neuron. The output is "binary" in the sense that an image is considered either a complete match or no match at all; DotA AutoScript uses a threshold of 80% probability to confirm a match. Showing the network a picture of Paris Hilton will likely register near-zero matches on every output neuron, but showing it a picture of Aghanim's Scepter may register as much as a 90% match on the Dagon output neuron. Teaching the network the difference between the two icons brings that score back below the 80% threshold.
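The thresholding step after the classifier runs can be sketched in a few lines. This is a minimal illustration, not the actual DotA AutoScript code; the class and method names here are hypothetical, and the `outputs` array stands in for whatever the perceptron's output layer produces.

```java
import java.util.ArrayList;
import java.util.List;

public class IconMatcher {
    // 80% confidence threshold, as described in the article.
    static final double MATCH_THRESHOLD = 0.80;

    // labels: one hypothetical icon name per output neuron, in training order.
    // Returns the labels whose output probability meets the threshold.
    static List<String> matches(double[] outputs, String[] labels) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < outputs.length; i++) {
            if (outputs[i] >= MATCH_THRESHOLD) {
                hits.add(labels[i]);
            }
        }
        return hits;
    }
}
```

In the Paris Hilton example above, every output stays near zero and the list comes back empty; the untrained Aghanim's Scepter case would put "Dagon" in the list until the network is taught otherwise.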
The second type of neural network is an analog Multi-Layer Perceptron network with two hidden layers and only a single neuron in the output layer. This network identifies the state of "cooldown" on any inventory item or hero ability that has passed the 80% threshold of the first neural network. An item that is cooling down cannot be used, and so should be rejected until it has completely cooled down. This network is considered "analog" because its output is a range between 0 and 1: an output close to 1 means the game icon has quite a bit of cooldown time remaining, while a ready game icon produces an output of 0. The following animated GIF shows Phase Boots cooling down:
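Interpreting that single analog output is then a matter of deciding how close to 0 counts as "ready". The article doesn't state the exact cutoff DotA AutoScript uses, so the epsilon below is an assumption for illustration:

```java
public class CooldownGate {
    // Hypothetical readiness cutoff: a ready icon outputs 0 and a fresh
    // cooldown outputs close to 1, so anything near zero is treated as ready.
    static final double READY_EPSILON = 0.05;

    // cooldownOutput: the single output neuron's value, in [0, 1].
    static boolean isReady(double cooldownOutput) {
        return cooldownOutput < READY_EPSILON;
    }
}
```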
Amazingly, this second type of neural network needs only about a dozen item or ability training samples like the Phase Boots shown above to learn what cooldown behavior looks like, and it can then recognize cooldown on items and abilities it hasn't seen before. Through trial and error, I have found that the second hidden layer of neurons is vital to the network's ability to "understand" cooldown images. I attribute this to the "analog" output we demand from it. Remember, the first type of neural network only needs a yes-or-no "boolean" answer about whether a game icon was recognized, so a single hidden layer proved adequate in topology experiments.
Since a player can choose from many different screen resolutions, DotA AutoScript uses percentage coordinates rather than pixel coordinates for reading game icons on the screen, and each screen observation is down-sampled to a common 16×16 pixel resolution thumbnail.
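The two steps above can be sketched with the standard library's image classes. This is a simplified illustration of the approach, not DotA AutoScript's actual capture code; the interpolation choice is an assumption:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class IconCapture {
    // Convert a resolution-independent percentage coordinate (0.0-1.0)
    // into a pixel coordinate for the player's actual screen size.
    static int toPixel(double percent, int screenDimension) {
        return (int) Math.round(percent * screenDimension);
    }

    // Down-sample a captured icon region to the common 16x16 thumbnail.
    static BufferedImage toThumbnail(BufferedImage region) {
        BufferedImage thumb = new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = thumb.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(region, 0, 0, 16, 16, null);
        g.dispose();
        return thumb;
    }
}
```

Because the same percentage coordinates resolve correctly at 800×600 or 1024×768, and every capture collapses to 16×16, the networks never need to know which resolution the player picked.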
A 16×16 thumbnail contains 256 pixels, and each pixel describes 3 color channels (red, green, blue), making a total of 768 individual pieces of color information, each within a range of 0-255 (768 bytes). Thus, each image-processing neural network must have an input layer composed of 768 neurons. This may sound like an excessively large network, but the other layers are much smaller, and in practical use 768 input neurons is not prohibitively large for "real-time" calculations, especially once a caching scheme is added to the implementation. Each input neuron maps to a specific pixel and color channel, so that it inspects the same "location" in any image the network is exposed to. The order in which the color information is mapped to the input neurons is unimportant; what is important is that the mapping is consistent for every image exposed to the network.
Each input neuron requires that input values be formatted as a decimal falling in the range of 0-1. DotA AutoScript converts each byte of color information into this format by taking the color channel value between 0 and 255 and dividing it by 255, resulting in a decimal “fraction”, with each possible value consuming an equal range in the 0-1 spectrum.
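Flattening the thumbnail and normalizing it can be combined into a single pass. This sketch uses a fixed row-major, R-then-G-then-B ordering; as noted above, the particular ordering is arbitrary, and DotA AutoScript's may differ, as long as it is applied consistently:

```java
import java.awt.image.BufferedImage;

public class InputVector {
    // Flatten a 16x16 thumbnail into 768 inputs in [0, 1]: for each pixel,
    // in row-major order, emit its red, green, and blue channels divided
    // by 255. The ordering must be identical for every image.
    static double[] toInputs(BufferedImage thumb) {
        double[] inputs = new double[16 * 16 * 3];
        int i = 0;
        for (int y = 0; y < 16; y++) {
            for (int x = 0; x < 16; x++) {
                int rgb = thumb.getRGB(x, y);
                inputs[i++] = ((rgb >> 16) & 0xFF) / 255.0; // red
                inputs[i++] = ((rgb >> 8) & 0xFF) / 255.0;  // green
                inputs[i++] = (rgb & 0xFF) / 255.0;         // blue
            }
        }
        return inputs;
    }
}
```

The resulting 768-element array lines up one-to-one with the 768 input neurons described above.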
A library of labeled training images was constructed using the following specifications for each game icon that needed to be recognized:
- A stock game icon from the official website
- An image of the game icon taken from the game while running at 1024×768, 32-bit color, high texture detail
- An image of the game icon taken from the game while running at 800×600, 16-bit color, medium texture detail
Further, game icons that were not needed by DotA AutoScript, but were likely to be encountered, were also included in the library as negative samples. This prevented most false-positive identifications of a hero power or inventory item. In other words, the neural networks were also taught what not to identify.
The image libraries were then loaded into Neuroph's "Tile Classification Suite" and used to create and train the appropriate neural networks. The smaller libraries may train in an hour or less, whereas the largest image library can take 10 hours to train using momentum backpropagation, and even longer with regular backpropagation. After satisfactory connection weights have been found across the entire network, it is serialized to a .ann file. Once all the needed networks have been created this way, they are packed into a jar file placed on the classpath for DotA AutoScript. DotA AutoScript deserializes the neural networks at launch and can use them, with the learned connection weights perfectly preserved, for real-time image recognition.
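The save-and-restore cycle rests on standard Java object serialization, which Neuroph's network classes support by implementing `java.io.Serializable`. The sketch below shows the general mechanism with plain streams; it is an illustration of the round trip, not Neuroph's own persistence helpers:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NetworkStore {
    // Write any Serializable object (such as a trained network) to disk,
    // preserving its learned state byte-for-byte.
    static void save(Serializable net, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(net);
        }
    }

    // Read the object back at launch; the caller casts it to the network type.
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }
}
```

Because serialization captures the full object graph, every connection weight discovered during those long training runs comes back exactly as it was saved.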