Personal Blog

Project MineRL

Project MineRL: Sample-efficient reinforcement learning using human priors. MineRL is a challenge to develop a system that can obtain a diamond in Minecraft using a limited amount of training time. Since the full task is very hard, the organizers also created smaller problems such as chopping trees, navigating to a point, and obtaining an iron pickaxe. In this post I am going to share my experience solving the navigation problem. For those who don't know what Minecraft is, let me tell you about it briefly. Minecraft is a sandbox game set in a 3D world with a block structure, where every object is built from a combination of square blocks. I suggest you take a quick look at the game trailer on YouTube here to get an idea of what the game is about.
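To give a feel for the setup, here is a minimal rollout of the navigation environment with a random policy. The environment id MineRLNavigateDense-v0 and the loop below are just a sketch assuming the minerl and gym packages are installed; this is not code from my actual solution.

```python
import gym
import minerl  # importing minerl registers the MineRL environments with gym

# Environment id assumed from the MineRL competition docs; check your installed version.
env = gym.make("MineRLNavigateDense-v0")

obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # random policy, just to show the interaction loop
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("Episode reward:", total_reward)
```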

Inventory Optimization: MDP vs RL

Inventory optimization is the task of maximizing revenue while taking into account capital investment, warehouse capacity, supply and demand of stock, lead time, and backordering. This problem has been well researched and is usually formulated as a Markov Decision Process (MDP). The (s, S) policy is proven to be an optimal solution for such problems [s: reorder stock level, S: target stock level]. A Markov Decision Process provides a framework for modelling decision making where outcomes are partly random and partly under the control of the decision maker. The learner or decision maker is called the agent, and the agent interacts with the environment, which comprises everything except the agent.
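To make the (s, S) rule concrete, here is a toy single-item simulation. All the numbers (demand model, costs, zero lead time) are made up for illustration and are not part of the original problem.

```python
import random

# Illustrative parameters (not from the post)
s, S = 20, 60             # reorder level and target stock level
price, unit_cost = 5.0, 3.0
holding_cost = 0.05       # per unit per day
random.seed(0)

stock, profit = S, 0.0
for day in range(365):
    demand = random.randint(0, 10)   # toy demand model
    sold = min(stock, demand)        # unmet demand is simply lost in this sketch
    stock -= sold
    profit += sold * price - stock * holding_cost
    if stock <= s:                   # (s, S) rule: when stock falls to s, order up to S
        order = S - stock
        profit -= order * unit_cost
        stock = S                    # zero lead time assumed for simplicity
print(f"Simulated annual profit: {profit:.2f}")
```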

Deep Q-Learning

Deep Q-Learning introduces deep neural networks into Q-Learning, hence the name. Instead of calculating a Q-value for each state-action pair, the network takes a state as input, outputs Q-values for all actions at once, and we select the action with the maximum Q-value. This concept was first introduced in the Playing Atari with Deep Reinforcement Learning paper, where the authors show that deep neural networks allowed them to surpass human experts on three of the seven Atari games tested.
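As a rough sketch of the idea, the network below maps a state vector to one Q-value per action; the layer sizes and dimensions are arbitrary placeholders, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),  # one output per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Greedy action selection: pick the action with the maximum Q-value.
q_net = QNetwork(state_dim=4, n_actions=2)
state = torch.randn(1, 4)
action = q_net(state).argmax(dim=1).item()
```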

Q-Learning

Q-Learning is a value-based reinforcement learning algorithm. The idea is to build a Q-table with states as rows and actions as columns; for each state we then select the action with the maximum value (Q-value). This means we do not hand-craft the policy our agent will follow; instead we keep improving the Q-table so that it always points to the best possible action. Let's take the Frozen Lake game as an example. The environment is shown in the following image.
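Here is a minimal sketch of the tabular update on Gym's FrozenLake. The environment id, hyperparameters, and episode count are illustrative and may need adjusting for your Gym version.

```python
import numpy as np
import gym

env = gym.make("FrozenLake-v1")            # id may differ across gym versions
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1     # illustrative hyperparameters

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy: mostly exploit the table, sometimes explore
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, info = env.step(action)
        # Q-Learning update rule
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
```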

Reinforcement Learning

Reinforcement learning is a type of machine learning in which problems are solved using the agent's past experiences and interactions with the world. In other words, reinforcement learning is about mapping situations to the available actions so as to maximize a numerical reward signal. Usually rewards are positive if the action taken is desirable and negative if it is undesirable. The agent is never told which action to take or which action is optimal in a given situation; instead it must discover on its own which actions yield the maximum reward. In many cases the action taken in a situation affects not only the immediate reward but also the next situation, and therefore future rewards as well. These two characteristics, trial-and-error search and delayed reward, are the distinguishing features of reinforcement learning.

Toxic Comment Classifier

This project is based on the Kaggle competition: Toxic Comment Classification Challenge. The challenge was to build a multi-headed model capable of detecting different types of toxicity, such as threats, obscenity, insults, and identity-based hate, better than Perspective's current models. Comments from Wikipedia's talk page edits were used as the dataset to train the models. This was my first NLP competition on Kaggle, and since everything was new to me, I learned a lot of new concepts, terms, and techniques during the competition.
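A common baseline for this kind of multi-label ("multi-headed") setup is TF-IDF features with one logistic-regression head per toxicity type. The sketch below uses made-up example comments and labels; it is not the model I used in the competition.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Toy data standing in for Wikipedia talk-page comments (labels: toxic, insult)
comments = ["you are great", "you are an idiot", "I will find you", "thanks for the edit"]
labels = np.array([[0, 0], [1, 1], [1, 0], [0, 0]])

# One binary classifier per label on top of shared TF-IDF features
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(comments, labels)
print(model.predict(["what a stupid comment"]))
```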