Unit 11: Reinforcement Learning 2 - Blocked vs. Motion (Intermediate Level)

Objective

This unit explains how the neural network learns through the reinforcement learning method, and what effect the weights on the neural network connections have.  Students need to be able to multiply numbers.

This unit is an immediate continuation of the Elementary Level of Reinforcement Learning 1 - Blocked vs. Motion.  It is assumed students have just completed Experiment 3 of the Elementary Level.

Revision - Reinforcement Learning

The principle of reinforcement learning (self-learning by the robot) is to encourage the robot to take certain actions by giving it rewards and punishments.  The robot then modifies its behaviour in order to maximize its level.

The faster the robot goes, the more reward it gets, but if it stops or goes backward, it is punished.  To maximize its level, the robot should go straight as often as possible and stop or go backward as little as possible.
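The reward idea above can be sketched in a few lines of Python. This is only an illustration, not the robot software's actual rule: the function names, the reward being equal to the speed, the fixed punishment of -1, and the level being the sum of all rewards are all assumptions made for this sketch.

```python
def reward(speed):
    """Hypothetical reward rule: forward speed is rewarded, stopping or reversing is punished."""
    if speed > 0:
        return speed   # assumed: faster forward motion earns a larger reward
    return -1.0        # assumed: stopped (0) or going backward (< 0) gets a fixed punishment


def total_level(speeds):
    """Assumed: the level is the sum of the rewards collected over all steps."""
    return sum(reward(s) for s in speeds)


steady = [1.0, 1.0, 1.0, 1.0]        # always going straight
stop_start = [1.0, 0.0, 1.0, -1.0]   # stops once and goes backward once

print(total_level(steady))       # 4.0
print(total_level(stop_start))   # 0.0
```

Under these assumed numbers, the robot that keeps going straight ends with the higher level, which is the behaviour the rewards are meant to encourage.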

Neural Network and its components

Look at the neural network in your Experiment 3.  You should see something like the picture below.

Stop the Learning when Level 50 is reached

- switch OFF the <learning> button

- switch OFF the <self drive> button

- keep the <exploration> button ON

Moving: click the <single step> button until the Input "Moving" is ON (with value = 1.00)

As the <learning> button is switched OFF, the weights no longer change.

The weights now represent the intelligence that the robot has learned.

- click the <single step> button until the Input "Moving" is ON (with value = 1.00)

Where are the input neurons, connections and weights?

- Identify the values for I1, I2, and the weights.

- Fill them into the (         ) in the pictures below.

Moving: use the weights to calculate the values of the output neurons

- Calculate

 O1 = I1 x w11 + I2 x w21

 O2 = I1 x w12 + I2 x w22

 O3 = I1 x w13 + I2 x w23

 O4 = I1 x w14 + I2 x w24

 O5 = I1 x w15 + I2 x w25

 

- Which output has the highest value?  What is it?

- Compare your calculated result with the output selected on the neural network screen.

- Do the two results match?

- The robot will use the one with the highest value as the next action for input state = "Moving".
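The "Moving" calculation above can be sketched in Python as a way to check your hand calculation. This is a minimal sketch, not the software's own code. The weight values are made up for illustration, so replace them with the ones on your screen. It also assumes that I1 is the "Moving" input, I2 is the "Blocked" input, and the input that is not ON has value 0.00. The names W, output_values and highest_output are hypothetical.

```python
# Hypothetical weights, for illustration only; use the values from your own screen.
# W[0] holds w11 .. w15 (from I1) and W[1] holds w21 .. w25 (from I2).
W = [
    [0.5, 0.25, 0.75, 0.125, 0.0],
    [0.25, 0.5, 0.125, 0.75, 0.0],
]


def output_values(i1, i2, weights):
    """Return [O1, ..., O5], where Oj = I1 x w1j + I2 x w2j."""
    return [i1 * w1 + i2 * w2 for w1, w2 in zip(weights[0], weights[1])]


def highest_output(values):
    """Return the number (1 to 5) of the output neuron with the highest value."""
    return values.index(max(values)) + 1


# Assumed "Moving" state: I1 = 1.00, I2 = 0.00
moving = output_values(1.0, 0.0, W)
print(moving)                           # [0.5, 0.25, 0.75, 0.125, 0.0]
print("O%d" % highest_output(moving))   # O3
```

With these made-up weights, O3 has the highest value, so it would be the action chosen for the "Moving" state.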

 

Blocked: click the <single step> button until the Input "Blocked" is ON (with value = 1.00)

- click the <single step> button until the Input "Blocked" is ON (with value = 1.00)

- Identify the values for I1, I2, and the weights.

- Fill them into the (       ) in the pictures below.

Blocked: use the weights to calculate the values of the output neurons

- Calculate

 O1 = I1 x w11 + I2 x w21

 O2 = I1 x w12 + I2 x w22

 O3 = I1 x w13 + I2 x w23

 O4 = I1 x w14 + I2 x w24

 O5 = I1 x w15 + I2 x w25

- Which output has the highest value?  What is it?

- Compare your calculated result with the output selected on the neural network screen.

- Do the two results match?

- The robot will use the one with the highest value as the next action for input state = "Blocked".
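A short Python sketch can show what happens in the "Blocked" calculation. It is an illustration under stated assumptions, not the software's own code: the weights are made-up numbers, I2 is assumed to be the "Blocked" input, and the input that is not ON is assumed to have value 0.00. Under those assumptions every "I1 x ..." term is zero, so each output simply equals the weight coming from I2.

```python
# Hypothetical weights, for illustration only; use the values from your own screen.
w_from_i1 = [0.5, 0.25, 0.75, 0.125, 0.0]   # w11 .. w15
w_from_i2 = [0.25, 0.5, 0.125, 0.75, 0.0]   # w21 .. w25

# Assumed "Blocked" state: I1 = 0.00, I2 = 1.00
i1, i2 = 0.0, 1.0

# Oj = I1 x w1j + I2 x w2j for j = 1 .. 5
blocked_outputs = [i1 * a + i2 * b for a, b in zip(w_from_i1, w_from_i2)]

# Because I1 is 0 and I2 is 1, the outputs equal the weights from I2.
same_as_i2_weights = blocked_outputs == w_from_i2

# Number (1 to 5) of the output neuron with the highest value
chosen = blocked_outputs.index(max(blocked_outputs)) + 1

print(blocked_outputs)      # [0.25, 0.5, 0.125, 0.75, 0.0]
print(same_as_i2_weights)   # True
print("O%d" % chosen)       # O4
```

With these made-up weights, the "Blocked" state selects O4. The same weights can therefore lead to a different action for each input state.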

Discussion

When does learning happen for the Reinforcement Learning method?

Why do we switch OFF the <learning> button when the robot has reached Level 50?

After switching OFF the <learning> button, do the weights change any more?

Do you agree the weights represent the intelligence the robot has learned during the self-learning?