RL and Final Project Proposals

Monday July 23, 2018 at 02:07 pm CDT

RL IRL

In supervised learning, a model is given labeled data, trained on that dataset, and expected to make accurate predictions about new data based on the patterns it learned. During training, the model adjusts itself so that its predictions more closely match the actual labels for the training data.

In reinforcement learning, there is an agent (i.e. the model), an environment the agent can make observations of, and actions the agent is able to perform. There’s no answer key to which the model can refer. Instead, during training, the agent learns which actions to take to maximize the rewards it receives. When the agent does something that helps it achieve some goal, it gets positive rewards; when it performs actions that work against the goal, it gets negative rewards. The agent eventually learns a policy, i.e. the rules that govern its behavior in the environment.

For instance, one might train an agent to play Space Invaders. The agent would receive rewards based on how many points it earned, i.e. the more points it gets, the more positive rewards you give it. Conversely, it would receive a negative reward every time an alien blew up its laser cannon.

The cycle that governs the agent’s behavior could be described like so: the agent performs an action -> the state of the environment changes -> the agent receives a reward (positive or negative) and an observation of the environment’s new state -> the agent performs its next action based on the received reward and the environment’s state -> etc. etc. -> Skynet
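To make that cycle concrete, here’s a minimal sketch of the loop using OpenAI Gym (more on Gym below), assuming the Gym API in which env.step() returns an observation, a reward, a done flag, and an info dict. The agent here just samples random actions, as a stand-in for a learned policy:

```python
import gym

env = gym.make("CartPole-v0")
obs = env.reset()          # initial observation of the environment's state
total_reward = 0.0
done = False

while not done:
    action = env.action_space.sample()           # stand-in for a learned policy
    obs, reward, done, info = env.step(action)   # act, observe, collect reward
    total_reward += reward

print("episode reward:", total_reward)
```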

It should be noted that the policy learned by the agent need not be deterministic, that is, some of the agent’s decisions may be random. You wouldn’t want an agent to behave in a totally random way, but there are situations in which some built-in randomness can be helpful. If you want an agent to be able to explore an unknown space, for example, it may be useful to have it “wander” a bit. To implement this, a value ϵ is chosen which represents the probability of an agent performing a random action.
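As a rough sketch of how that might look in code: the snippet below assumes the agent keeps a list of estimated values, one per action (q_values is a hypothetical name; how those estimates get learned is a separate question), and with probability ϵ ignores them entirely:

```python
import random

def select_action(q_values, epsilon):
    """Epsilon-greedy selection: with probability epsilon, explore by
    choosing a random action; otherwise exploit the current estimates."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```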

Final Project Proposal

My original goal was to do a final project in the area of reinforcement learning. I set aside weeks 7 and 8 to focus on RL. This has turned out to be far too ambitious to pull off. I picked up a copy of Maxim Lapan’s Deep Reinforcement Learning Hands-On with the intent of reading and implementing as much as possible from it in the fourteen days I gave myself. It’s unlikely that I’d be able to learn enough about RL to make an interesting final project in the short amount of time I allotted to studying it.

I did get a chance to play around with OpenAI Gym¹. Specifically, I gained some familiarity with Cart-Pole v0. In this environment, there’s a cart that can move left or right, and atop the cart is a pole that can rotate around a joint in the cart. The goal is to train the model to move the cart in whatever way keeps the pole upright. A simple implementation, for instance, could have the cart move left when the pole is leaning left and right when the pole leans right. Of course, this policy isn’t super successful (I couldn’t get more than about 70 steps without the pole tilting over too far).
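That naive policy fits in a few lines. Here’s one way to write it, assuming the standard CartPole-v0 observation vector, in which the third element is the pole angle (negative when the pole leans left), and the discrete actions 0 (push the cart left) and 1 (push it right):

```python
import gym

env = gym.make("CartPole-v0")

def naive_policy(obs):
    # obs = [cart position, cart velocity, pole angle, pole angular velocity]
    pole_angle = obs[2]
    return 0 if pole_angle < 0 else 1   # push left if leaning left, else right

obs = env.reset()
steps, done = 0, False
while not done:
    obs, reward, done, _ = env.step(naive_policy(obs))
    steps += 1

print("pole stayed up for", steps, "steps")
```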

Project proposals are due at the end of the month, so I’ll need to devote some extra time to thinking about potential projects that are in areas I’m more comfortable with.

¹: If you have any issues rendering animations in your notebook, I would check out this notebook by Aurélien Geron. I managed to get things working after installing the packages recommended in the OpenAI Gym documentation: apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig. I’m not sure which package solved my issue.


Photo by Wym Aris on Unsplash