10  Introduction to Gym

11 Gymnasium

  • Standard library for RL tasks

  • Abstracts complexity of RL problems

  • Provides a plethora of RL environments, from classic control tasks to more complex ones like Atari games.

11.1 Key Gymnasium environments

Some key environment examples are:

  • CartPole

  • MountainCar

  • FrozenLake

  • Taxi
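
As a quick sketch (assuming the currently registered versioned ids, e.g. CartPole-v1, MountainCar-v0, FrozenLake-v1 and Taxi-v3), each of these environments can be created with gym.make and inspected through its observation and action spaces:

import gymnasium as gym

# Versioned ids are assumed here; check the Gymnasium registry for the
# versions available in your installation.
for env_id in ['CartPole-v1', 'MountainCar-v0', 'FrozenLake-v1', 'Taxi-v3']:
    env = gym.make(env_id)
    print(env_id, env.observation_space, env.action_space)
    env.close()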

11.2 Gymnasium interface

No matter the environment, the Gymnasium library offers a unified interface for interaction. This interface includes functions and methods to:

  • Initialize environment
  • Visually represent environment
  • Execute actions
  • Observe outcomes

Let’s explore them using the CartPole environment.

Interacting with Gymnasium environments

In the following snippet we:

  1. Create the environment by calling the gym.make function with the environment id CartPole-v1 and render_mode='rgb_array', which lets us visualize the states using matplotlib. Other render modes exist.
  2. Initialize the environment and obtain the initial observation along with some auxiliary information by calling the env.reset method. The seed argument can be used to ensure reproducibility.
  3. Print the initial observation.
import gymnasium as gym

env = gym.make('CartPole-v1', render_mode='rgb_array')
state, info = env.reset(seed=42)
print(state)
[ 0.0273956  -0.00611216  0.03585979  0.0197368 ]

The observation is an array representing the environment state, including the position and velocity of both the cart and the pole.

For other environments, we’ll need to consult the Gymnasium Documentation to understand the details of their states.
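
In practice, the spaces attached to every environment already encode much of this information. A small sketch for CartPole, using the standard observation_space and action_space attributes:

# The observation space describes the bounds and dtype of each state component,
# while the action space describes the valid actions.
print(env.observation_space)        # a Box with 4 components (positions and velocities)
print(env.observation_space.shape)  # (4,)
print(env.action_space)             # Discrete(2)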

Visualizing the state

To get a visual representation of the state, the env.render method returns the state image, which we can visualize using the plt.imshow function. Then, by calling plt.show(), a snapshot of the environment is displayed.

import matplotlib.pyplot as plt

def render():
    # Grab the current frame as an RGB array and display it
    state_image = env.render()
    plt.imshow(state_image)
    plt.show()

render()
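
With render_mode='rgb_array', env.render returns the frame as a NumPy array, which is why plt.imshow can display it directly. A quick sanity check (the exact shape shown below is indicative, not guaranteed):

frame = env.render()
print(type(frame))  # <class 'numpy.ndarray'>
print(frame.shape)  # e.g. (400, 600, 3): height, width, RGB channels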

Performing actions

In the CartPole environment there are two possible actions:

  • \(0:\) moving the cart to the left
  • \(1:\) moving the cart to the right

To execute an action we call env.step and pass the chosen action. This method returns five values:

  1. The next state
  2. The reward received
  3. A terminated signal, indicating whether the agent has reached a terminal state, such as achieving the goal or losing
  4. A truncated signal, indicating whether a condition like a time limit has been met
  5. Info, which provides auxiliary diagnostic information useful for debugging

For simplicity, we omit printing truncated and info, and only show the first three return values after moving the cart to the right.

action = 1
state, reward, terminated, truncated, info = env.step(action)

print("State: ", state)
print("Reward: ", reward)
print("Terminated: ", terminated)
State:  [ 0.02727336  0.18847767  0.03625453 -0.26141977]
Reward:  1.0
Terminated:  False

We see the state changed and the agent received a reward of \(1\), since it hasn’t reached a terminal state yet.
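
Instead of hard-coding the action, we can also let the environment sample a random valid action via env.action_space.sample, a pattern often used for quick smoke tests. A minimal sketch:

# Sample a random valid action (0 or 1 for CartPole) and apply it
random_action = env.action_space.sample()
state, reward, terminated, truncated, info = env.step(random_action)
print("Random action: ", random_action, " -> terminated: ", terminated)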

Interaction loops

Suppose we want to keep pushing the cart to the right until a termination condition is met, monitoring the environment as we go. We wrap the previous code in a while loop and render the environment at each iteration.

while not terminated:
    action = 1  # Move to the right
    state, reward, terminated, _, _ = env.step(action)
    render()

Here are a few selected frames showing the cart moving to the right.
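
For completeness, a full episode loop would typically check both the terminated and truncated signals, accumulate the rewards, and close the environment when done. A sketch of such a loop, restarting from a fresh reset, might look like this:

# Start a new episode and keep pushing right until it ends
state, info = env.reset(seed=42)
terminated = truncated = False
total_reward = 0.0

while not (terminated or truncated):
    action = 1  # keep pushing the cart to the right
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

print("Episode return: ", total_reward)
env.close()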

11.3 References