Download: MLBF - Project & Builds

For my bachelor's thesis I wanted to try something with AI and machine learning. I decided to create a fairly complex AI for a boss fight entirely with machine learning. Furthermore, I wanted to test whether real-world teaching techniques could be a viable strategy for training the AI, since teaching a child or a dog a new behaviour or skill is quite similar to reinforcement learning. Last but not least, I wanted to find out whether creating an AI in this manner is reasonable when working in a small team.

TL;DR: I am satisfied with the project itself, but the final AI is a hilarious disaster. Still, the project produced some interesting results that could be developed further.

In the following, I give a short overview of what was done to create the project. If you want more detail, feel free to download and read the Bachelor thesis.

Keeping the game simple

To build the game itself, we tried to mimic the well-established formula of Dark Souls. Since my partner Daniel Schellhaas and I were concerned with efficiency rather than great gameplay in our thesis work, we wanted to invest as little time as possible in the game itself. So we decided to create a boss fight.

The boss fight allowed us to concentrate on a small aspect of what would normally be a bigger game. There is no inventory or levelling system to take into account, as there would be when creating a full souls-like. We could not find anyone to create sound files for us, so we decided to make the game without sound.

In this game, everything is done through animations. Each controller button starts an animation. While the animation plays, the next action can be queued, and damage colliders are activated (when the knight or the monster attacks) or deactivated (when the knight dodges an attack).
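The animation-driven input handling described above can be sketched as a small state machine. This is a minimal Python sketch, not the project's actual code. The `Action` and `Character` names, the animation durations, and the one-slot input buffer (the latest press during an animation wins) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    name: str
    duration: float               # animation length in seconds (hypothetical)
    hitbox_active: bool = False   # weapon damage collider on while playing
    hurtbox_active: bool = True   # False = body collider off (dodge i-frames)


ACTIONS = {
    "attack": Action("attack", 0.8, hitbox_active=True),
    "dodge": Action("dodge", 0.5, hurtbox_active=False),
}


class Character:
    def __init__(self) -> None:
        self.current: Optional[Action] = None  # animation currently playing
        self.elapsed = 0.0
        self.queued: Optional[Action] = None   # one-slot input buffer (assumption)

    def press(self, button: str) -> None:
        action = ACTIONS[button]
        if self.current is None:
            self._start(action)
        else:
            self.queued = action  # queue it; the latest press wins

    def _start(self, action: Action) -> None:
        self.current, self.elapsed = action, 0.0

    def update(self, dt: float) -> None:
        """Advance the animation; when it ends, start the queued action (if any)."""
        if self.current is None:
            return
        self.elapsed += dt
        if self.elapsed >= self.current.duration:
            nxt, self.queued = self.queued, None
            self.current = None
            if nxt is not None:
                self._start(nxt)  # leftover frame time is simply dropped

    @property
    def hitbox_active(self) -> bool:
        return self.current is not None and self.current.hitbox_active

    @property
    def hurtbox_active(self) -> bool:
        return self.current is None or self.current.hurtbox_active
```

A single buffer slot keeps button mashing from stacking up a long chain of actions. A real implementation might also restrict the part of the animation during which queuing is allowed.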

When a character is hit, the weapon deals damage that is scaled by the multiplier of the first collider it enters. The hit also triggers a lerp (linear interpolation) in the UI so the viewer can see what is going on.
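Here is a minimal Python sketch of the damage rule and the UI lerp, assuming the lerp drives a health-bar display. The collider names and values in `MULTIPLIERS`, the `HealthBar` class, and its `speed` parameter are hypothetical. Only the rule that the first collider entered decides the multiplier comes from the description above.

```python
from typing import List

# Hypothetical per-collider multipliers; the real values are not given here.
MULTIPLIERS = {"head": 1.5, "torso": 1.0, "legs": 0.75}


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation from a to b, with t clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return a + (b - a) * t


def damage_for_hit(base_damage: float, colliders_entered: List[str]) -> float:
    """Scale weapon damage by the multiplier of the FIRST collider entered;
    colliders touched later in the same swing are ignored."""
    if not colliders_entered:
        return 0.0
    return base_damage * MULTIPLIERS[colliders_entered[0]]


class HealthBar:
    """Hypothetical UI element: the displayed value lerps toward the real one."""

    def __init__(self, health: float, speed: float = 2.0) -> None:
        self.actual = health
        self.displayed = health
        self.speed = speed  # per-second rate; speed * dt is the per-frame lerp factor

    def take_damage(self, amount: float) -> None:
        self.actual = max(0.0, self.actual - amount)

    def update(self, dt: float) -> None:
        self.displayed = lerp(self.displayed, self.actual, self.speed * dt)
```

Calling `update` every frame closes a fixed fraction of the remaining gap. As a result, the bar slides quickly at first and then eases into the new value.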

To let the AI control the character, a controller is simulated. Essentially, the machine learning algorithm is trained to press the game's buttons in response to the information presented to it. As a result, a human player can take over the game at any time.
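The simulated controller can be sketched as a single interface that the game polls, with interchangeable input sources behind it. Several parts of this Python sketch are hypothetical: the button set, the `AgentInput` and `HumanInput` classes, and the assumption that the policy picks one discrete action per step.

```python
from typing import Callable, Dict, List, Protocol, Set

# Hypothetical button set; the real controller layout is not listed in the text.
BUTTONS = ("attack", "dodge", "move_forward", "turn_left", "turn_right")


class InputSource(Protocol):
    def read(self, observation: List[float]) -> Dict[str, bool]: ...


class AgentInput:
    """Turns the policy's discrete action (an index into BUTTONS) into button states."""

    def __init__(self, policy: Callable[[List[float]], int]) -> None:
        self.policy = policy

    def read(self, observation: List[float]) -> Dict[str, bool]:
        action = self.policy(observation)
        return {b: i == action for i, b in enumerate(BUTTONS)}


class HumanInput:
    """Button states set by a real gamepad handler (here just a set of names)."""

    def __init__(self) -> None:
        self.pressed: Set[str] = set()

    def read(self, observation: List[float]) -> Dict[str, bool]:
        return {b: b in self.pressed for b in BUTTONS}


class VirtualController:
    """The game reads only this object, so the source can be swapped at any time."""

    def __init__(self, source: InputSource) -> None:
        self.source = source

    def poll(self, observation: List[float]) -> Dict[str, bool]:
        return self.source.read(observation)
```

The game only ever sees button states. Because of that, switching `ctrl.source` from an agent to a human (or back) needs no change to the game logic.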

The ability to “see” was also an important factor. For this, a raycast-based vision system collects variables such as the distance to the walls, the enemy's health and transformation, its attacks, and so on. A predefined number of rays is cast over 360° around the agent to set these variables.
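Below is a simplified 2D sketch of such a raycast vision system in Python. Some assumptions do not come from the project: the arena is an axis-aligned box, the enemy is a circle, and each ray is encoded as `[hit_enemy, hit_wall, normalized distance]`. A game engine would normally provide raycasts through its physics system rather than through these hand-written intersection tests.

```python
import math
from typing import List, Optional, Tuple

Vec = Tuple[float, float]


def ray_directions(n_rays: int) -> List[Vec]:
    """n_rays unit vectors spread evenly over 360 degrees."""
    return [(math.cos(2 * math.pi * i / n_rays), math.sin(2 * math.pi * i / n_rays))
            for i in range(n_rays)]


def wall_distance(origin: Vec, d: Vec, arena: Vec) -> float:
    """Distance along d from a point inside the box [0, w] x [0, h] to its edge."""
    ts = []
    for o, di, limit in ((origin[0], d[0], arena[0]), (origin[1], d[1], arena[1])):
        if di > 1e-12:
            ts.append((limit - o) / di)
        elif di < -1e-12:
            ts.append(-o / di)
    return min(ts)


def circle_distance(origin: Vec, d: Vec, center: Vec, radius: float) -> Optional[float]:
    """Distance along unit vector d to a circle, or None if the ray misses it."""
    fx, fy = origin[0] - center[0], origin[1] - center[1]
    b = fx * d[0] + fy * d[1]
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    if t < 0:  # origin inside the circle: use the far intersection
        t = -b + math.sqrt(disc)
    return t if t >= 0 else None


def observe(origin: Vec, arena: Vec, enemy: Vec, enemy_radius: float,
            n_rays: int = 8, max_dist: float = 20.0) -> List[float]:
    """Per ray: [hit_enemy, hit_wall, distance / max_dist] (hypothetical encoding)."""
    obs: List[float] = []
    for d in ray_directions(n_rays):
        wall = wall_distance(origin, d, arena)
        hit = circle_distance(origin, d, enemy, enemy_radius)
        if hit is not None and hit < wall:
            obs += [1.0, 0.0, min(hit, max_dist) / max_dist]
        else:
            obs += [0.0, 1.0, min(wall, max_dist) / max_dist]
    return obs
```

The vector has a fixed length of three values per ray. This is what makes it usable as a neural-network input.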

Creating reinforcement learning trainings from real life examples

In my research I took the time to read about the way children and dogs learn, with the aim of deriving a training for the machine learning algorithm from it. After reading quite a few academic and non-academic sources, I settled on two essential training methods for skills that children and dogs learn.

Long before children go to school, they are exposed to written language and learn to recognize logos. In school, they learn to break these logos, which they read as a whole, into smaller pieces by learning to pronounce each letter individually. Then they learn to put the pieces back together to form words and, finally, to read whole texts.

From this, I split the training into different phases in which the machine learning agent (the AI being trained) learns the game. In the first phase, the agent plays the game as a whole for some time (this phase was removed from the final training, since it had no inherent learning value). The second phase is the most interesting part.
Every action the agent can take is separated and trained individually. The agent is only rewarded for the specific actions it has to learn.
Then some of these actions are combined into more complex tasks until, finally, the full game is played.
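The phased training could be organized as a curriculum like the following Python sketch. The text above only states that each phase rewards a specific set of actions and that the phases build up to the full game. Everything else here is an assumption: the phase names, the reward events, the +1 per rewarded event, and the rule for advancing (the mean reward over the last `window` episodes reaching a threshold).

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Phase:
    name: str
    rewarded: FrozenSet[str]  # events that earn reward in this phase
    threshold: float          # mean episode reward needed to advance (hypothetical)


# Hypothetical curriculum: single skills first, then a combination, then the full game.
CURRICULUM = [
    Phase("rotate", frozenset({"faced_enemy"}), 0.8),
    Phase("move", frozenset({"approached_enemy"}), 0.8),
    Phase("attack", frozenset({"hit_enemy"}), 0.8),
    Phase("rotate+move+attack",
          frozenset({"faced_enemy", "approached_enemy", "hit_enemy"}), 2.0),
    Phase("full game", frozenset({"hit_enemy", "won_fight"}), float("inf")),
]


def reward(phase: Phase, events: List[str]) -> float:
    """+1 for each event the current phase rewards; all other events earn nothing."""
    return float(sum(1 for e in events if e in phase.rewarded))


class CurriculumRunner:
    def __init__(self, phases: List[Phase], window: int = 100) -> None:
        self.phases = phases
        self.index = 0
        self.window = window
        self.history: List[float] = []

    @property
    def phase(self) -> Phase:
        return self.phases[self.index]

    def end_episode(self, episode_reward: float) -> None:
        """Record an episode; advance once the recent mean reaches the threshold."""
        self.history.append(episode_reward)
        recent = self.history[-self.window:]
        if (len(recent) == self.window
                and sum(recent) / self.window >= self.phase.threshold
                and self.index < len(self.phases) - 1):
            self.index += 1
            self.history.clear()  # judge the new phase on fresh episodes only
```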


The result was a sort of dojo where multiple agents learn the basics of the game at the same time. This is possible because every spawned agent feeds into the same brain, which is ultimately used in the final product.
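The shared-brain idea can be illustrated with a deliberately tiny stand-in for the neural network. In this Python sketch, `SharedBrain` keeps a table of average rewards per (observation, action) pair and picks actions epsilon-greedily. This tabular, bandit-style learner is a swapped-in simplification, not the algorithm used in the project. It does, however, show the key point: every `Agent` holds a reference to the same brain, so one agent's experience changes how all the others act.

```python
import random
from collections import defaultdict
from typing import Callable, DefaultDict, Hashable, List, Tuple

Key = Tuple[Hashable, str]


class SharedBrain:
    """Tabular, epsilon-greedy stand-in for the neural-network brain (simplification)."""

    def __init__(self, actions: List[str], epsilon: float = 0.1, seed: int = 0) -> None:
        self.actions = actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value: DefaultDict[Key, float] = defaultdict(float)  # mean reward
        self.count: DefaultDict[Key, int] = defaultdict(int)

    def act(self, obs: Hashable) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)  # explore
        return max(self.actions, key=lambda a: self.value[(obs, a)])  # exploit

    def learn(self, obs: Hashable, action: str, reward: float) -> None:
        key = (obs, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]  # running mean


class Agent:
    def __init__(self, brain: SharedBrain) -> None:
        self.brain = brain  # a reference, not a copy: every agent shares one brain

    def step(self, obs: Hashable, reward_fn: Callable[[Hashable, str], float]) -> str:
        action = self.brain.act(obs)
        self.brain.learn(obs, action, reward_fn(obs, action))
        return action
```

Suppose one agent is punished for attacking in a given situation. A second agent, facing the same situation for the first time, then already prefers the other action.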

The other example is house-training a dog. The main principle is to reward desired behaviour while gradually taking away room for mistakes. When a dog is introduced into its new home, the place is set up with pads so that it can do its business anywhere. Over time these pads are taken away, and the dog is taken outside with a pad to do its business on. Finally, only one pad is left, which is used to lure the dog outside when going for a walk.

Accordingly, I built a training in which everything is rewarded at the beginning and the rewards are taken away over time. The agent faces the whole game from the start, fighting against a really simple AI.
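Here is a minimal Python sketch of such a decaying reward. It assumes that all shaping rewards decay linearly to zero while one core reward stays, the counterpart of the last remaining pad. The reward terms, their values, and `decay_steps` are hypothetical.

```python
from typing import Dict

# Hypothetical reward terms. The shaping terms play the role of the pads that are
# taken away; the core term is the last pad that stays for the whole training.
SHAPING = {"moved": 0.1, "turned": 0.1, "attacked": 0.2, "dodged": 0.2}
CORE = {"hit_enemy": 1.0}


def shaping_weight(step: int, decay_steps: int) -> float:
    """1.0 at the start of training, falling linearly to 0.0 at decay_steps."""
    return max(0.0, 1.0 - step / decay_steps)


def house_training_reward(events: Dict[str, int], step: int,
                          decay_steps: int = 1_000_000) -> float:
    """Reward for one time step, given how often each event happened in it."""
    w = shaping_weight(step, decay_steps)
    shaped = sum(SHAPING.get(e, 0.0) * n for e, n in events.items())
    core = sum(CORE.get(e, 0.0) * n for e, n in events.items())
    return w * shaped + core
```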

Failing is a part of the process

These trainings were then run for seven days, which is a very short training time compared to other ambitious projects. The result was unusable. The house-training AI behaved as if it were on a rampage, without any regard for the position or even the presence of the enemy. The child training, on the other hand, had an interesting effect: the agent learned exactly what was trained, but unfortunately it only got through three trainings. So it learned to rotate toward a certain direction, move in that general direction, and finally attack. While this cannot be used as is, it could yield great results if applied properly and developed further.