Reinforcement Learning Specialisation
I started the Reinforcement Learning Specialisation to finally study reinforcement learning formally; until now I had been learning it on my own. With no dedicated courses in Robotics at my college, I had to make up for it with online courses.
I had started studying Robotics material in my 3rd semester with minimal exposure to statistics and probability theory, and was never able to fully follow the formal proofs. I quickly moved on to applications and learnt from there. In hindsight, that was a superb decision: relearning the material now makes me focus on what matters in practice, while also picking up the maths in a more formal manner.
Fundamentals of Reinforcement Learning
The first course in this specialisation is Fundamentals of Reinforcement Learning, taught mainly by Martha White and Adam White. (I just noticed that they share a surname.)
Week 1
After a brief introduction to what RL is, the course moved on to the k-armed bandit problem. It closely followed Chapter 2 of Sutton and Barto's book (the Bible of RL), which was assigned as the weekly reading. The material was fairly simple: action-value estimation, sample-average methods, the epsilon-greedy technique, the incremental implementation, and finally upper-confidence-bound (UCB) action selection. The week closed with a video by John Langford on Contextual Bandits for Real World Reinforcement Learning, in which he discussed the reality gap and the difficulties of transferring problems from the simulator to the real world.
The bandit problem can largely be viewed as a special case of the full reinforcement learning problem, so the first week served as an introduction to it.
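For reference, the UCB rule from that chapter picks the action that maximises the value estimate plus an exploration bonus:

$$A_t = \arg\max_a \left[ Q_t(a) + c \sqrt{\frac{\ln t}{N_t(a)}} \right]$$

where N_t(a) is the number of times action a has been selected so far and c controls the degree of exploration.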
The quiz was pretty easy and focused heavily on the update rule below. Apart from that, there were some simple questions on the exploration vs. exploitation tradeoff.
$$q_{n+1} = q_n + \alpha_n [R_n - q_n]$$
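To make the rule concrete, here is a minimal sketch of my own (not the course notebook) of an epsilon-greedy agent on a k-armed Gaussian bandit, using the sample-average case of the update, i.e. alpha_n = 1/n:

```python
import numpy as np

def run_bandit(k=10, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a k-armed Gaussian bandit (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0.0, 1.0, k)   # true action values, hidden from the agent
    q_est = np.zeros(k)                # action-value estimates q_n
    counts = np.zeros(k)               # pulls per arm, n
    total = 0.0
    for _ in range(steps):
        # explore with probability epsilon, otherwise act greedily
        if rng.random() < epsilon:
            a = int(rng.integers(k))
        else:
            a = int(np.argmax(q_est))
        r = rng.normal(q_true[a], 1.0)  # noisy reward for the chosen arm
        counts[a] += 1
        # incremental sample-average update: q <- q + (1/n)(R - q)
        q_est[a] += (r - q_est[a]) / counts[a]
        total += r
    return total / steps

print(run_bandit())
```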
Finally, the assignment was a notebook on Bandits and Exploration/Exploitation, which was simple to solve. I ran into some issues with np.random.seed, but changing the seed given in the notebook got me the correct answer.
"A larger step size moves us more quickly toward the true value, but can make our estimated values oscillate around the expected value. A step size that reduces over time can converge to close to the expected value, without oscillating. On the other hand, such a decaying stepsize is not able to adapt to changes in the environment. Nonstationarity—and the related concept of partial observability—is a common feature of reinforcement learning problems and when learning online."
Week 2
Weekly reading assignment: Chapter 3, up to Section 3.3 (pages 47-56), of Sutton and Barto's book.
Thanks to my half-assed attempt at IME625 (Stochastic Processes), I already knew Markov Decision Processes in much more detail than before. I glided through the reading in 20 minutes and started watching the lectures.
Dynamics of the MDP: $$p(s', r \mid s, a) = \Pr(S_t = s', R_t = r \mid S_{t-1} = s, A_{t-1} = a)$$
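One simple way to hold these dynamics in code (a hypothetical toy example of mine, not from the course) is a dictionary keyed by (s, a), with a sanity check that each conditional distribution over (s', r) pairs sums to 1:

```python
# Dynamics p(s', r | s, a) for a toy 2-state MDP, stored as
# {(s, a): {(s_next, reward): probability}}. Hypothetical example.
dynamics = {
    ("s0", "stay"): {("s0", 0.0): 1.0},
    ("s0", "go"):   {("s1", 1.0): 0.8, ("s0", 0.0): 0.2},
    ("s1", "stay"): {("s1", 0.0): 1.0},
    ("s1", "go"):   {("s0", -1.0): 1.0},
}

# p(. | s, a) must be a proper probability distribution over (s', r)
for (s, a), dist in dynamics.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, (s, a)
```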
The module talked about the reward hypothesis, and also discussed how not every kind of system works well with a single scalar reward function.
A special lecture by Littman was included, in which he talked about how this hypothesis has taken shape over the years and given birth to different avenues in reinforcement learning.
The week ended with discussions on MDPs: their strengths, their flexibility, and their extensive application across domains. The final assignment was peer-reviewed; we had to write descriptions of three MDPs of our own.
I am very interested in the stock market as well, and would definitely like to explore how Markov Decision Processes (and stochastic processes in general) can be applied to it.
Week 3
It started off with an outline of what this module would entail: policies, value functions, and Bellman equations. I already know these, so I can go through them quickly.
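For reference, the Bellman equation for the state-value function of a policy, written with the same dynamics function p as in Week 2:

$$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \big[ r + \gamma v_\pi(s') \big]$$

And a minimal iterative policy evaluation sketch (my own example, reusing the hypothetical toy dynamics from above), sweeping this equation until the values stop changing:

```python
# Toy 2-state MDP and a uniform random policy pi(a|s) = 0.5. Hypothetical example.
dynamics = {
    ("s0", "stay"): {("s0", 0.0): 1.0},
    ("s0", "go"):   {("s1", 1.0): 0.8, ("s0", 0.0): 0.2},
    ("s1", "stay"): {("s1", 0.0): 1.0},
    ("s1", "go"):   {("s0", -1.0): 1.0},
}
states, actions, gamma = ["s0", "s1"], ["stay", "go"], 0.9
policy = {s: {a: 0.5 for a in actions} for s in states}

# Iterative policy evaluation: repeatedly apply the Bellman equation as an update.
v = {s: 0.0 for s in states}
for _ in range(1000):
    delta = 0.0
    for s in states:
        new_v = sum(
            policy[s][a] * p * (r + gamma * v[s_next])
            for a in actions
            for (s_next, r), p in dynamics[(s, a)].items()
        )
        delta = max(delta, abs(new_v - v[s]))
        v[s] = new_v
    if delta < 1e-8:  # stop once a full sweep barely changes the values
        break

print(v)
```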
Review of Specialisation
- They had very interesting games each week, which helped us understand the concepts in a really fun way.
- The videos were well edited, each with an outline and a summary.
- The assignments were detailed as well. These courses are certainly a notch above other regular Coursera online courses.