Machine Learning: Reinforcement & Prediction

This week I will be discussing reinforcement learning and predictive learning. This marks a departure from the non-learning algorithms covered so far and a step into the world of machine learning. There are three types of machine learning: supervised, unsupervised and reinforcement. We will be looking at two of those, namely reinforcement learning and predictive learning, which is another name for the unsupervised model.

Reinforcement learning is a machine learning method that uses rewards to fine-tune an algorithm and its output without providing it with ready-made input-output data. Fine-tuning an algorithm this way takes time, which is why you don’t see it being used on an AI in real time. If it were used in a games application, the optimisation would instead have been done before release. Some algorithms can run in real time, but they tend to be limited in scope.

First, we will look at Q-learning, which is a type of reinforcement learning algorithm. Q-learning uses quite a “natural” way of learning, as it rewards good behaviour and punishes bad behaviour, which is how humans and many animals learn. The algorithm tries to achieve the highest score possible, so it learns to avoid undesired behaviour and figures out which good behaviour works best.

The way this works is by reducing everything in the project to a state. In the case of a game this would be the world, the positions of objects, the health of characters and any other information the developer finds relevant to the algorithm. Interestingly, the algorithm does not need a model of the world and instead relies only on the information in the states. That means not only that it does not need to understand the individual states (as long as each state has a unique identifier it will work), but that it can function entirely independently of the rest of the project. This means that even in a large-scale project the Q-learning implementation can remain simple and easy to understand.

Each state has some information about its value as a reward (normally called the Q-value). This usually ranges from -1 to 1 for simplicity but can be any values the developer chooses. When a state is triggered, feedback is given to the algorithm and this value is used to score the current run. SARSA (State-Action-Reward-State-Action) is a modified version of Q-learning. Where a traditional Q-learning update assumes the best available action will be taken in the next state, SARSA updates using the action that is actually chosen next, which makes it what is known as an on-policy method.
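To make the Q-value updates concrete, here is a minimal sketch of tabular Q-learning (with a SARSA-style update alongside for comparison) on a made-up toy problem: a five-state corridor where reaching the right end is rewarded and falling off the left end is punished. The corridor, state numbering and learning parameters are all assumptions for illustration, not something from a particular game; note the states really are just unique integer identifiers, as described above.

```python
import random

# Hypothetical toy problem: a 1-D corridor of 5 states. The agent starts in
# the middle, gets +1 for reaching the right end and -1 for the left end.
# States are plain integers -- the algorithm needs no model of the world,
# only unique state identifiers.
N_STATES = 5
ACTIONS = [-1, +1]                      # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def choose_action(state):
    """Epsilon-greedy: mostly exploit the best known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, reward, s_next):
    """Q-learning is off-policy: it assumes the *best* next action is taken."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])

def sarsa_update(s, a, reward, s_next, a_next):
    """SARSA is on-policy: it uses the action actually chosen in the next state."""
    Q[(s, a)] += ALPHA * (reward + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])

def run_episode():
    s = N_STATES // 2
    while 0 < s < N_STATES - 1:
        a = choose_action(s)
        s_next = s + a
        reward = 1.0 if s_next == N_STATES - 1 else (-1.0 if s_next == 0 else 0.0)
        q_learning_update(s, a, reward, s_next)
        s = s_next

random.seed(0)
for _ in range(200):
    run_episode()

# After training, the middle state should value stepping right over stepping left.
```

The only difference between the two update rules is whose Q-value of the next state is used, which is exactly the Q-learning/SARSA distinction described above.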

On the other side, we have predictive learning, which, as the name suggests, tries to predict what the next action will be. N-grams (the N stands for the number of elements being looked at) are a popular technique used in natural language processing. An N-gram model uses previous inputs to analyse a sequence of text and tries to predict the next element, just like the autocomplete functions we see on computers. The size of N must be chosen carefully: something too small will match too often and produce many false positives within the previous text, while something too large might not get any hits at all.
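A minimal sketch of the idea, assuming N = 3 (two elements of context plus the prediction) and a made-up training sentence: the model counts which token follows each two-token context and predicts the most frequent follower.

```python
from collections import Counter, defaultdict

N = 3  # trigram model: predict the 3rd element from the previous 2

def train(tokens):
    """Count how often each token follows each (N-1)-token context."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - N + 1):
        context = tuple(tokens[i:i + N - 1])
        model[context][tokens[i + N - 1]] += 1
    return model

def predict(model, context):
    """Return the most frequent follower of the context, or None if the
    context was never seen -- the 'no hits at all' case when N is too large."""
    followers = model.get(tuple(context))
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical training text, purely for illustration.
corpus = "the cat sat on the mat the cat ran on the grass".split()
model = train(corpus)
print(predict(model, ["the", "mat"]))   # prints "the"
```

With a larger N the contexts become more specific and predictions more accurate, but `predict` returns `None` far more often; with a smaller N almost everything matches, which is the false-positive problem mentioned above.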

Finally, Bayesian inference is a system developed to deal with partial knowledge. By looking at the causes and effects in a system, it tries to predict what the next chosen node will be. It is surprisingly simple to implement; however, its need for long training and a previous history tends to make it unsuitable for games development.
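As a sketch of how such a predictor might look, here is Bayes’ rule applied to a hypothetical play history (the situations, actions and counts below are invented for illustration): given what the player did in past situations, estimate P(action | situation) ∝ P(situation | action) × P(action).

```python
from collections import Counter

# Hypothetical play history: (situation, action) pairs -- the "previous
# history" such a system needs before it can make useful predictions.
history = [
    ("low_health", "retreat"), ("low_health", "retreat"),
    ("low_health", "attack"),
    ("high_health", "attack"), ("high_health", "attack"),
    ("high_health", "retreat"),
]

action_counts = Counter(a for _, a in history)   # prior P(action)
joint_counts = Counter(history)                  # counts of (situation, action)

def posterior(situation):
    """P(action | situation) for each action via Bayes' rule, normalised."""
    # likelihood P(situation | action) = count(situation, action) / count(action)
    unnorm = {
        a: (joint_counts[(situation, a)] / action_counts[a])
           * (action_counts[a] / len(history))
        for a in action_counts
    }
    total = sum(unnorm.values())
    return {a: p / total for a, p in unnorm.items()}

print(posterior("low_health"))
# retreat comes out about twice as likely as attack for a low-health player
```

Even this toy version shows why the approach needs history: with no observations for a situation, the posterior is undefined, and with few observations it is unreliable, which matches the training-cost caveat above.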

