AI with Python - Reinforcement Learning

In this section, you will learn in detail about the concepts of reinforcement learning in AI with Python.

Basics of Reinforcement Learning

This type of learning is used to reinforce or strengthen the network based on critic information. That is, a network being trained under reinforcement learning receives some feedback from the environment. However, the feedback is evaluative rather than instructive, unlike in supervised learning: it tells the network how good its output was, not what the correct output should have been. Based on this feedback, the network adjusts its weights to obtain better critic information in the future.
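The idea of evaluative feedback can be illustrated with a small sketch. It is not taken from the text above: it assumes a hypothetical two-action problem and, for brevity, uses a table of value estimates instead of a network with weights. The function `pull`, the payout probabilities and the parameter values are all assumptions made for the illustration. The learner is only told how good its chosen action was (a reward), never which action was correct.

```python
import random


def pull(action, rng):
    """Hypothetical environment: returns a reward of 1.0 or 0.0 for the chosen action."""
    payout_prob = [0.2, 0.8]  # assumed values, hidden from the learner
    return 1.0 if rng.random() < payout_prob[action] else 0.0


def learn(steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    estimates = [0.0, 0.0]  # learner's running estimate of each action's value
    counts = [0, 0]         # how often each action has been tried
    for _ in range(steps):
        # Explore occasionally, otherwise exploit the best current estimate.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: estimates[a])
        reward = pull(action, rng)  # evaluative feedback only
        counts[action] += 1
        # Incremental mean: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


estimates = learn()
print(estimates)
```

After enough steps the estimate for the better-paying action should be the larger of the two, even though the learner was never shown a correct answer.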

This learning process is similar to supervised learning, but much less information may be available. The following figure gives the block diagram of reinforcement learning −


Building Blocks: Environment and Agent

Environment and agent are the principal building blocks of reinforcement learning in AI. This section discusses them in detail −


An agent is anything that can perceive its environment through sensors and act upon that environment through effectors.

  • A human agent has sensory organs such as eyes, ears, nose, tongue and skin that serve as sensors, and other organs such as hands, legs and mouth that serve as effectors.

  • A robotic agent has cameras and infrared range finders for sensors, and various motors and actuators for effectors.

  • A software agent has encoded bit strings as its programs and actions.

Agent Terminology

The following terms are frequently used in reinforcement learning in AI −

  • Performance Measure of Agent − It is the criterion that determines how successful an agent is.

  • Behavior of Agent − It is the action that the agent performs after any given sequence of percepts.

  • Percept − It is the agent’s perceptual input at a given instant.

  • Percept Sequence − It is the history of all that an agent has perceived to date.

  • Agent Function − It is a map from the percept sequence to an action.
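The terms above can be sketched in a few lines of Python. This is only an illustration under assumed names: the class `TableDrivenAgent`, the percepts `"dirty"` and `"tidy"`, the actions `"suck"`, `"move"` and `"wait"`, and the scoring rule are all hypothetical. The agent function is implemented as a lookup table keyed by the percept sequence.

```python
class TableDrivenAgent:
    """Hypothetical agent whose agent function is a lookup table."""

    def __init__(self, table, default_action):
        self.table = table                    # maps a tuple of percepts to an action
        self.default_action = default_action  # used when the sequence is not in the table
        self.percept_sequence = []            # everything perceived so far

    def act(self, percept):
        # Record the new percept, then apply the agent function
        # to the whole percept sequence.
        self.percept_sequence.append(percept)
        return self.table.get(tuple(self.percept_sequence), self.default_action)


def performance_measure(percepts, actions):
    """Assumed criterion: number of 'dirty' percepts answered with 'suck'."""
    return sum(1 for p, a in zip(percepts, actions) if p == "dirty" and a == "suck")


table = {
    ("dirty",): "suck",
    ("dirty", "tidy"): "move",
    ("dirty", "tidy", "dirty"): "suck",
}
agent = TableDrivenAgent(table, default_action="wait")
percepts = ["dirty", "tidy", "dirty"]
actions = [agent.act(p) for p in percepts]  # the agent's behavior
score = performance_measure(percepts, actions)
print(actions, score)
```

A lookup table grows with every possible percept sequence, so it is practical only for toy problems. It is used here because it makes the mapping from percept sequence to action explicit.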


Some programs operate in an entirely artificial environment confined to keyboard input, databases, computer file systems and character output on a screen.

In contrast, some software agents, such as software robots or softbots, exist in rich and unlimited softbot domains. The simulator has a very detailed and complex environment. The software agent needs to choose from a long array of actions in real time.

For example, a softbot designed to scan the online preferences of a customer and display interesting items to that customer works in a real as well as an artificial environment.

Properties of Environment

The environment has several properties, as discussed below −

  • Discrete/Continuous − If there is a limited number of distinct, clearly defined states of the environment, the environment is discrete; otherwise it is continuous. For example, chess is a discrete environment and driving is a continuous environment.

  • Observable/Partially Observable − If it is possible to determine the complete state of the environment at each time point from the percepts, it is observable; otherwise it is only partially observable.

  • Static/Dynamic − If the environment doesn't change while an agent is acting, then it is static; else it is dynamic.

  • Single agent/Multiple agents − The environment may contain other agents, which may be of the same kind as the agent or of a different kind.

  • Accessible/Inaccessible − If the agent’s sensory apparatus can have access to the total state of the environment, then the environment is accessible to that agent; else it is inaccessible.

  • Deterministic/Non-deterministic − If the next state of the environment is totally determined by the current state and the actions of the agent, then the environment is deterministic; else it is non-deterministic.

  • Episodic/Non-episodic − In an episodic environment, each episode consists of the agent perceiving and then acting. The quality of its action depends just on the episode itself. Subsequent episodes don't depend on the actions in the previous episodes. Episodic environments are much simpler because the agent doesn't need to think ahead.
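Some of these properties can be made concrete with a toy environment. The sketch below is an illustration only: the class `LineWorld` and its `slip_prob` parameter are hypothetical. The environment is discrete (a handful of positions), observable (the full state is returned to the agent) and static (nothing changes unless the agent acts). Setting `slip_prob` to zero makes it deterministic; any larger value makes it non-deterministic.

```python
import random


class LineWorld:
    """Hypothetical discrete environment: positions 0..size-1, goal at the right end."""

    def __init__(self, size=5, slip_prob=0.0, seed=None):
        self.size = size
        self.slip_prob = slip_prob  # 0.0 means deterministic, above 0.0 non-deterministic
        self.rng = random.Random(seed)
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the full state is visible to the agent

    def step(self, action):
        """action is -1 (left) or +1 (right); with probability slip_prob it is reversed."""
        if self.rng.random() < self.slip_prob:
            action = -action
        self.position = min(self.size - 1, max(0, self.position + action))
        done = self.position == self.size - 1
        return self.position, done


def run(env, actions):
    """Apply a fixed list of actions and return the positions visited."""
    env.reset()
    return [env.step(a)[0] for a in actions]


plan = [1, 1, -1, 1, 1]
deterministic_path = run(LineWorld(slip_prob=0.0), plan)
noisy_path = run(LineWorld(slip_prob=0.3, seed=1), plan)
print(deterministic_path)  # [1, 2, 1, 2, 3]
print(noisy_path)          # may differ from the deterministic path
```

In the deterministic case the same plan always produces the same positions, so an agent can predict the outcome of its actions exactly. In the non-deterministic case it can only reason about likely outcomes.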


Constructing an Environment with Python

For building a reinforcement learning agent, we will use the OpenAI Gym package, which can be installed with the following command −

pip install gym

There are different environments in OpenAI Gym which can be used for different purposes. A few of them are CartPole-v0, Hopper-v1, and MsPacman-v0. They require different engines. The detailed documentation of OpenAI Gym can be found on

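The following code shows an example of Python code for the CartPole-v0 environment −

import gym
env = gym.make('CartPole-v0')
env.reset()                               # start a new episode
for _ in range(1000):
   env.render()                           # draw the current frame
   env.step(env.action_space.sample())    # take a random action
env.close()
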
You can construct different environments in a similar way.

Constructing a Learning Agent with Python

To build a reinforcement learning agent, we will use the OpenAI Gym package as shown −

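import gym
env = gym.make('CartPole-v0')
for _ in range(20):
   # Classic Gym API (assumed: gym < 0.26), where reset() returns the observation
   observation = env.reset()
   for i in range(100):
      env.render()
      action = env.action_space.sample()   # random action; nothing is learned yet
      observation, reward, done, info = env.step(action)
      if done:
         print("Episode finished after {} timesteps".format(i+1))
         break                             # stop stepping a finished episode
env.close()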


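The output shows how many timesteps each episode lasted. Because the actions are sampled at random rather than learned, the pole usually falls within a few dozen timesteps.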
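The code above samples actions at random, so it does not yet learn anything. As one possible next step, the sketch below applies tabular Q-learning, a technique not covered in the text above. To keep it runnable without Gym, it uses a hypothetical `Corridor` class that imitates the classic `reset()`/`step()` interface. The class, the reward values and the hyperparameters are all assumptions made for the illustration.

```python
import random


class Corridor:
    """Hypothetical stand-in for a Gym environment: walk right to reach the goal."""

    n_states = 6
    n_actions = 2  # 0 = left, 1 = right

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(self.n_states - 1, max(0, self.state + move))
        done = self.state == self.n_states - 1
        reward = 1.0 if done else -0.01  # small cost per step, reward at the goal
        return self.state, reward, done, {}


def q_learning(env, episodes=300, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # One value per (state, action) pair, initialised to zero.
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _episode in range(episodes):
        state = env.reset()
        for _step in range(100):  # cap the episode length
            # Epsilon-greedy choice: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                action = rng.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=lambda a: q[state][a])
            next_state, reward, done, info = env.step(action)
            # Q-learning update toward reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q = q_learning(Corridor())
# Greedy action in every non-terminal state after training.
policy = [max(range(Corridor.n_actions), key=lambda a: q[s][a]) for s in range(5)]
print(policy)
```

After training, the greedy policy should choose action 1 (right) in every non-terminal state. The same update rule cannot be applied to CartPole directly, because its observations are continuous and would first have to be discretised or handled by a function approximator.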
