You must import gym_super_mario_bros before trying to make an environment. This is because gym environments are registered at runtime. By default, gym_super_mario_bros environments use the full NES action space of 256 discrete actions. To constrain this, gym_super_mario_bros.actions provides three action lists (RIGHT_ONLY, SIMPLE_MOVEMENT, and COMPLEX_MOVEMENT) for the nes_py.wrappers.JoypadSpace wrapper. See gym_super_mario_bros/actions.py for a breakdown of the legal actions in each of these three lists.
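As a minimal sketch of the setup described above, the following wraps an environment with JoypadSpace and one of the provided action lists. The helper name make_mario_env and the constants NES_BUTTONS and FULL_ACTION_COUNT are illustrative, not part of the package. The imports sit inside the helper only so that the snippet can be loaded without the packages installed; in a script they would normally go at the top of the file.

```python
# The NES controller has 8 buttons (A, B, Select, Start, Up, Down, Left, Right),
# so the unconstrained action space covers 2**8 = 256 button combinations.
NES_BUTTONS = 8
FULL_ACTION_COUNT = 2 ** NES_BUTTONS


def make_mario_env(env_id="SuperMarioBros-v0"):
    """Build an environment restricted to the SIMPLE_MOVEMENT action list.

    Hypothetical helper for illustration only.
    """
    # Importing gym_super_mario_bros is what registers its environments,
    # so this import has to run before make() is called.
    import gym_super_mario_bros
    from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
    from nes_py.wrappers import JoypadSpace

    env = gym_super_mario_bros.make(env_id)
    # JoypadSpace replaces the 256-action space with the given action list.
    return JoypadSpace(env, SIMPLE_MOVEMENT)
```

Swapping SIMPLE_MOVEMENT for RIGHT_ONLY or COMPLEX_MOVEMENT changes which actions the agent may choose from; everything else stays the same.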
These environments allow 3 attempts (lives) to make it through the 32 stages in the game. The environments only send reward-able game-play frames to agents; no cut-scenes, loading screens, etc. are sent from the NES emulator to an agent, nor can an agent perform actions during these instances. If a cut-scene cannot be skipped by hacking the NES's RAM, the environment will lock the Python process until the emulator is ready for the next action.
The random stage selection environment randomly selects a stage and allows a single attempt to clear it. Upon a death and subsequent call to reset, the environment randomly selects a new stage. This is only available for the standard Super Mario Bros. game, not Lost Levels (at the moment). To use these environments, append RandomStages to the SuperMarioBros id. For example, to use the standard ROM with random stage selection, use SuperMarioBrosRandomStages-v0. To seed the random stage selection, use the seed method of the env, e.g., env.seed(222), before any calls to reset. Alternatively, pass the seed keyword argument to the reset method directly, like reset(seed=222).
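The id convention and seeding order described above can be sketched as follows. The helpers random_stages_id and make_random_stage_env are hypothetical names introduced for illustration; only the id format, env.seed, and reset(seed=...) come from the text above.

```python
def random_stages_id(base_id):
    """Turn e.g. 'SuperMarioBros-v0' into 'SuperMarioBrosRandomStages-v0'.

    Hypothetical helper: it only encodes the naming rule described above.
    """
    name, sep, version = base_id.partition("-")
    # Random stage selection is only offered for the standard game,
    # so any other base name is rejected.
    if name != "SuperMarioBros" or not sep:
        raise ValueError(f"random stages are not available for {base_id!r}")
    return f"{name}RandomStages-{version}"


def make_random_stage_env(seed=222, base_id="SuperMarioBros-v0"):
    """Create a random-stage environment and seed it before the first reset."""
    # Importing the package registers the environments.
    import gym_super_mario_bros

    env = gym_super_mario_bros.make(random_stages_id(base_id))
    # Seed before any call to reset; alternatively, skip this line and
    # call env.reset(seed=seed) instead.
    env.seed(seed)
    return env
```

Seeding once before the first reset is enough to make the sequence of sampled stages repeatable across runs.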
In addition to sampling from all 32 original stages, the random stage selector can be limited to a user-defined subset of stages. For example, it could be restricted to sample only castle stages, water levels, or underground levels.
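A sketch of restricting the selector to castle stages might look like the following. It assumes the environment accepts a stages keyword argument holding 'world-stage' strings such as '1-4'; check that assumption against the installed version. The helpers castle_stages and make_castle_only_env are hypothetical.

```python
def castle_stages(worlds=range(1, 9)):
    """List the castle stage of each world as 'world-stage' strings.

    The game has 8 worlds of 4 stages each (32 in total), and the fourth
    stage of every world is its castle.
    """
    return [f"{world}-4" for world in worlds]


def make_castle_only_env():
    """Create a random-stage environment that only samples castle stages."""
    # Importing the package registers the environments.
    import gym_super_mario_bros

    # Assumption: 'stages' is the keyword used to pass the subset.
    return gym_super_mario_bros.make(
        "SuperMarioBrosRandomStages-v0",
        stages=castle_stages(),
    )
```

Any other subset, such as water or underground levels, would be passed the same way as an explicit list of stage strings.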
The reward function assumes the objective of the game is to move as far right as possible (increase the agent's x value), as fast as possible, without dying. To model this game, three separate variables compose the reward: