YouTube has recently been suggesting videos to me of people training NEAT neural networks to play video games. I've noticed that the training process is often quite slow (for example, in this Trackmania example).
Is there a way (an algorithmic approach or an idea) to simulate video games without actually rendering the pixels on screen, so that training becomes much quicker? If you also know of a tool that does this, please share it with me.
When the environments have been set up via a shared library, like OpenAI's Gym, the internal logic for deciding whether to render will differ between environments, but there may be a standardised config or method argument that determines whether the game engine renders to the screen. That could apply to nearly all the environments available in the library. If, for example, you are trying out NEAT on Atari games that come pre-packaged for use with computer agents, check the documentation.
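As a rough illustration of that kind of switch, here is a minimal sketch of a toy environment whose game logic always runs but whose drawing only happens when asked. `ToyRaceEnv`, `run_episode` and `frames_drawn` are hypothetical names made up for this example, not part of any library; recent Gym and Gymnasium releases use a similar `render_mode` argument at construction time, but check the documentation of the environment you actually use.

```python
class ToyRaceEnv:
    """Hypothetical environment with a Gym-style render_mode switch."""

    def __init__(self, render_mode=None):
        self.render_mode = render_mode  # None means headless
        self.position = 0
        self.frames_drawn = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # The game logic always runs, whether or not anything is drawn.
        self.position += action
        if self.render_mode == "human":
            self._draw()
        done = self.position >= 10
        reward = 1.0 if done else 0.0
        return self.position, reward, done

    def _draw(self):
        # Stand-in for the expensive part: pushing pixels to a window.
        self.frames_drawn += 1


def run_episode(env):
    """Drive forward one unit per step until the episode ends."""
    env.reset()
    done = False
    steps = 0
    while not done:
        _, _, done = env.step(1)
        steps += 1
    return steps


headless = ToyRaceEnv()                     # training: no drawing at all
visible = ToyRaceEnv(render_mode="human")   # debugging: draw every step
print(run_episode(headless), headless.frames_drawn)
print(run_episode(visible), visible.frames_drawn)
```

Both runs take the same number of steps; only the visible one pays for drawing, which is where the speed-up would come from in a real engine.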
It is not uncommon for game mods to hijack or replace DLLs, including those of the graphics API. If you cannot manipulate the engine itself, then perhaps you can inject an alternative graphics API implementation that consists of stubs and mockups.
Trackmania (2020) is a racing game, created by Nadeo and released by Ubisoft, in which players drive a car from a start to a finish through a fixed number of checkpoints, and aim to do so as quickly as possible. The largest regular event in the game is Cup of the Day (CotD), in which several thousand players are given fifteen minutes to learn a map and set the fastest time they can. Divisions are created based on the best times recorded by each player, with Division 1 hosting the fastest 64 players, Division 2 hosting the next fastest 64, and so on. Within each division, players race on the track for multiple rounds, and the slowest players each round get eliminated. The last man standing wins the division, and the winner of Division 1 wins that Cup of the Day.
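The division rule described above is simple enough to write down. This is a small sketch, assuming ranks start at 1 and every division holds exactly 64 players; `cotd_division` is a name invented for this example.

```python
def cotd_division(rank, division_size=64):
    """Return the division number for a 1-based qualifying rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return (rank - 1) // division_size + 1


for rank in (1, 64, 65, 129):
    print(rank, cotd_division(rank))
```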
The earliest attempt we know of at a self-driving car in Trackmania was from 2017, by a German software developer named Andreas Rottach (going by Rottaca), who published the code on GitHub. The idea was supervised learning: take examples of cars driving well, take screenshots throughout the game, label the screenshots with the keys (left, right, or neither) which were being hit at that moment, and train the network to hit the appropriate key when given the appropriate screenshot.
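To make the idea concrete, here is a toy sketch of learning keys from labelled frames. It is not Rottaca's code: the screenshot is replaced by two made-up features (how much track is visible in the left and right halves of the image), and the network is swapped for a much simpler nearest-centroid classifier. All names are hypothetical.

```python
from collections import defaultdict


def train_centroids(examples):
    """Average the feature vectors recorded for each key label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict_key(centroids, features):
    """Pick the key whose average frame is closest to this frame."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


# Made-up training data: (left-half track share, right-half track share), key.
examples = [
    ([0.9, 0.1], "left"), ([0.8, 0.2], "left"),
    ([0.1, 0.9], "right"), ([0.2, 0.8], "right"),
    ([0.5, 0.5], "none"), ([0.45, 0.55], "none"),
]
centroids = train_centroids(examples)
print(predict_key(centroids, [0.7, 0.3]))
```

The structure is the same as in the real thing: recorded frames paired with the key held at that moment go in, and a function from frame to key comes out.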
Learning from existing gameplay is nice, but even better is unsupervised learning, so that the program can teach itself without needing human guidance on whether or not a given run is good. To that end, Yann Bouteiller and Edouard Geze, from Polytechnique Montréal in Canada, created not just a single program but a full unsupervised learning pipeline for Trackmania 2020, called TMRL. Like Rottaca, Yann and Edouard created a convolutional neural network to parse input data - but this time, without directly training that network from human runs. Instead, a single human run was recorded as a reference trajectory, giving rough estimates of where the track was and where the car needed to go. The car drove on its own, and was rewarded based on how much of the track it covered in a given amount of time.
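A reward of this kind can be sketched in a few lines. This is an illustration under our own assumptions, not TMRL's actual implementation: the reference trajectory is a list of 2D points, the car's progress is the index of the nearest point ahead of where it last was, and the reward is the number of points gained. The names and the search window are invented for this example.

```python
import math


def nearest_index(trajectory, position, start, window=20):
    """Find the closest reference point at or ahead of `start`."""
    end = min(len(trajectory), start + window + 1)
    best, best_dist = start, math.inf
    for i in range(start, end):
        d = math.dist(trajectory[i], position)
        if d < best_dist:
            best, best_dist = i, d
    return best


def progress_reward(trajectory, prev_index, position):
    """Reward is the number of reference points passed since last step."""
    new_index = nearest_index(trajectory, position, prev_index)
    return new_index - prev_index, new_index


# A straight reference line with one point per metre (made-up data).
reference = [(float(i), 0.0) for i in range(100)]
reward, index = progress_reward(reference, 0, (3.2, 0.5))
print(reward, index)
```

Searching only forward from the previous index means that driving backwards earns nothing rather than a negative reward, which is one design choice among several.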
And in the spirit of human-computer rivalry, Mr. Bouteiller and Mr. Geze were featured on the French show Underscore_, in which their program - trained for 200 hours in the LIDAR environment - competed on a track against professional players. To quote the results:
Not exactly superhuman performance, quite yet - but certainly better than playing randomly, or smashing into walls. And while their Trackmania Roborace League has not yet gotten much traction, the framework they created was built on and improved on by others.
Neinders concluded that this was due to the difference in the information available. Sophy had information about the track curvature of the upcoming 6 seconds of track, based on the current speed. TMRL, however, only had distance measurements from the LIDAR. While the TMRL program could plan for the next turn, it could not plan two turns ahead, and this fundamentally limited the program to mere safe driving, avoiding walls and crashes, but never optimizing.
So, he implemented that same track curvature lookahead in Trackmania. By driving a track and using LIDAR measurements to find the track edges, he created a pair of Bezier curves representing the two borders of the track, and then trained the neural network to take a segment of that curve as the input for the car.
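For readers who have not met Bezier curves, the sketch below evaluates one cubic segment and samples a stretch of it, the way a lookahead input might be built. The curve degree, the control points and the sampling scheme are all assumptions made for this example; the source only tells us that a pair of Bezier curves was used for the two track borders.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Bernstein basis weights for the four control points.
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return tuple(
        b0 * a + b1 * b + b2 * c + b3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def lookahead_points(control_points, t_start, t_end, samples):
    """Sample evenly spaced points on the curve between two parameters."""
    if samples < 2:
        raise ValueError("need at least two samples")
    step = (t_end - t_start) / (samples - 1)
    return [
        cubic_bezier(*control_points, t_start + i * step)
        for i in range(samples)
    ]


# Made-up control points for one border of a gentle corner.
left_border = ((0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0))
print(lookahead_points(left_border, 0.25, 0.75, 3))
```

Doing the same for the right border and concatenating the two lists would give a fixed-length description of the road ahead.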
Programmer AndrejGobeX, from Slovenia, also built on the TMRL library, but improved it in a different direction. Where Neinders added track curvature lookahead, AndrejGobeX kept the LIDAR-type input, but changed the learning algorithm. Andrej used supervised learning (like Rottaca five years earlier) and the Soft Actor-Critic (SAC) algorithm that TMRL used, and also implemented a genetic algorithm called NEAT (without much success), the Proximal Policy Optimization (PPO) algorithm, and the Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithm.
Using deep Q-learning, PedroAI set up an impressive training loop. Three collector agents drive tracks, each using a neural network that takes the game state (both the current screenshot and data such as position and velocity) as input and produces, for each possible keystroke, an estimate of the total reward the car is expected to collect over the rest of the run. The reward is the speed of the car at each moment (every tenth of a second) in the direction of a reference human-driven trajectory, plus a bonus for actually reaching the finish line.
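The two pieces described here, the reward and the choice of keystroke from the network's outputs, can be sketched as follows. This is our own illustration, not PedroAI's code: the finish bonus value, the epsilon-greedy exploration and all function names are assumptions, and the network itself is replaced by a hand-written dictionary of Q-values.

```python
import math
import random


def directional_speed_reward(velocity, tangent, finished=False,
                             finish_bonus=100.0):
    """Speed along the reference trajectory, plus a bonus at the finish."""
    norm = math.hypot(*tangent)
    if norm == 0:
        raise ValueError("tangent must be non-zero")
    # Project the velocity onto the unit tangent of the reference line.
    along = sum(v * t for v, t in zip(velocity, tangent)) / norm
    return along + (finish_bonus if finished else 0.0)


def choose_action(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the best estimate."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


# Made-up Q-values standing in for the network's output on one frame.
q_values = {"left": 0.1, "forward": 0.9, "right": 0.3}
print(choose_action(q_values, 0.0, random.Random(0)))
print(directional_speed_reward((3.0, 4.0), (1.0, 0.0)))
```

Note that sideways speed contributes nothing to the reward: a car moving at (3, 4) along a reference line pointing in the x direction is only credited with 3.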
But, even though the training has thus far consistently plateaued on all the various attempts at and tweaks of training, PedroAI proved that the concept of a generalized Trackmania-playing program can work. A neural network can play better than randomly across more than a hundred tracks. His program is the best covered thus far, and as of the start of 2024 remains by far the best generalized Trackmania-playing program.
A year later, he released a new program, implemented very differently from what everyone else had done. Instead of processing the image with LIDAR or with convolutions, he processed it with a technique known as variational auto-encoding. Resizing the screenshots to 64x64 allowed him not only to store the images in color, but also to feed the last eight frames into a neural network, rather than just the most recent frame by itself; this got a program almost to a gold medal on a three-lap endurance track, stymied only by a bad reward value. By 2021, he had moved to TM2020, where his program could come within 0.3 seconds of the world record on a training map, without access to any game information besides the screen and the car speed - no information about gear, surfaces, tire contact, acceleration, or even where on the map it was.
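Feeding the network the last eight frames rather than one is commonly done with a small rolling buffer. The sketch below shows only that bookkeeping, with strings standing in for the 64x64 images; it does not attempt the auto-encoder, and `FrameStack` is a name invented for this example rather than anything from his code.

```python
from collections import deque


class FrameStack:
    """Keep the most recent `size` frames, oldest first."""

    def __init__(self, size=8):
        self.size = size
        self.frames = deque(maxlen=size)

    def reset(self, first_frame):
        # At the start of a run, pad the history with the first frame.
        self.frames.clear()
        for _ in range(self.size):
            self.frames.append(first_frame)
        return list(self.frames)

    def push(self, frame):
        # The deque silently drops the oldest frame once it is full.
        self.frames.append(frame)
        return list(self.frames)


stack = FrameStack(size=8)
observation = stack.reset("frame0")
for i in range(1, 10):
    observation = stack.push(f"frame{i}")
print(observation)
```

With several frames in view, the network can in principle infer motion, such as whether the car is sliding, which a single still image cannot show.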
Bluemax666 eventually set aside the project, having achieved what was (in 2021) the best results in Trackmania machine learning for both TM2 and (in limited test tracks) TM2020; results that were competitive with human amateurs. Over the next two years, two others would take the next step forward, getting results not just competitive with human amateurs, but better altogether.
The Linesight project, on Trackmania Nations Forever, is at the start of 2024 the most advanced Trackmania machine learning project made public. Using a convolutional neural network to interpret the screen as a 160x120 greyscale image, and with a small twist on the by-now standard reward function (the distance traveled along a reference trajectory over the next seven seconds), the network is able to train on any map, with no preprogrammed knowledge of track curvature or geometry.
With that kind of generality, the Linesight developers did not want to merely run their program on custom-built tutorial tracks. They wanted a challenge. And the challenge that presented itself was ESL-Hockolicious, a map created in 2008 that has since become one of the most hunted maps in all of TMNF. One minute long, ludicrously optimized, with a variety of tricks, turns, and jumps: anybody, whether human or machine, who got a good time on this map, would need a broad understanding of tech maps and low-speed maneuvering in Trackmania. So, how well did the program do?
After eighty hours of training (training at 9x speed, so roughly a month of equivalent playing for a human player), the program achieved a time of 54.06, a time not achieved by human players until the map was six years old. This placed it at what would be (in the no-shortcuts category) a tie for 20th place on the global leaderboard - out of millions of attempts by thousands of skilled racers, in a map far from trivial to set a good time in. The Linesight project proved that reinforcement learning can get nearly professional-level performance on real maps. And, as we enter the scene in 2024, that is the state-of-the-art.
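The "roughly a month" figure follows directly from the numbers above, assuming the equivalent human played around the clock:

```latex
\[
80\ \text{h} \times 9 = 720\ \text{h},
\qquad
\frac{720\ \text{h}}{24\ \text{h/day}} = 30\ \text{days}.
\]
```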