From Theory to Practice: Exploring Markov Decision Processes for Reinforcement Learning in MiniGrid
College:
The Dorothy and George Hennings College of Science, Mathematics, and Technology
Major:
Mathematical Sciences
Faculty Research Advisor(s):
Israel Curbelo
Abstract:
This project aims to bridge the gap between theoretical reinforcement learning (RL) frameworks and practical application, focusing on improving RL algorithms formulated as Markov Decision Processes (MDPs) within MiniGrid, a collection of grid-world environments that follows the OpenAI Gym interface. MiniGrid's modular and challenging scenarios provide a versatile platform for exploring the capabilities and limitations of RL algorithms on complex navigation tasks. We use MDPs to model the environment's dynamics and study how agents can develop optimal strategies for task completion through iterative interaction, without prior knowledge of the environment's details.
The project begins with a foundational discussion of RL and MDPs, followed by an introduction to the MiniGrid environment and the specific challenges it poses for RL agents. It then describes the selection and implementation of several RL algorithms, namely Policy Iteration, Value Iteration, Q-learning, and Deep Q-Networks (DQN), presenting the theory behind each and the motivation for choosing it.
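As a rough illustration of the simplest of these methods, the following is a minimal sketch of Value Iteration, which repeatedly applies the Bellman optimality backup until the state values stop changing. It runs on a hypothetical 4x4 deterministic grid rather than an actual MiniGrid environment; the grid size, reward of 1 at the goal, discount factor, and all function names are assumptions made for the example, not details taken from the project.

```python
import numpy as np

# Hypothetical 4x4 grid (an assumption, not a MiniGrid environment):
# the goal sits in the bottom-right corner and is absorbing.
SIZE = 4
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GAMMA = 0.9  # assumed discount factor


def step(state, action):
    """Deterministic transition: move if the target cell is inside the grid,
    otherwise stay put. Entering the goal yields reward 1, everything else 0."""
    if state == GOAL:
        return state, 0.0
    nr, nc = state[0] + action[0], state[1] + action[1]
    if not (0 <= nr < SIZE and 0 <= nc < SIZE):
        nr, nc = state
    next_state = (nr, nc)
    return next_state, (1.0 if next_state == GOAL else 0.0)


def backups(V, state):
    """One-step lookahead value of every action from `state`."""
    results = (step(state, a) for a in ACTIONS)
    return [reward + GAMMA * V[ns] for ns, reward in results]


def value_iteration(tol=1e-8):
    """Sweep all states, replacing V(s) with the best one-step lookahead,
    until the largest change in a sweep falls below `tol`."""
    V = np.zeros((SIZE, SIZE))
    while True:
        delta = 0.0
        for r in range(SIZE):
            for c in range(SIZE):
                if (r, c) == GOAL:
                    continue
                best = max(backups(V, (r, c)))
                delta = max(delta, abs(best - V[r, c]))
                V[r, c] = best
        if delta < tol:
            return V


def greedy_policy(V):
    """Map each non-goal state to the index of its best action under V."""
    return {
        (r, c): int(np.argmax(backups(V, (r, c))))
        for r in range(SIZE)
        for c in range(SIZE)
        if (r, c) != GOAL
    }
```

Calling `value_iteration()` followed by `greedy_policy(V)` gives a value table and a policy for this toy grid. Value Iteration needs the full transition model (here, the `step` function), which is why model-free methods such as Q-learning and DQN are also considered when the agent can only learn by interaction.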
A significant portion of the project is dedicated to experimental analysis, in which we evaluate the algorithms' performance across diverse MiniGrid scenarios. This analysis benchmarks success rates and efficiency and also examines how varying reward structures and state representations affect learning outcomes. The project further explores and implements improvements to these algorithms, aiming to enhance their adaptability, convergence speed, and overall effectiveness within the MiniGrid framework.
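To make the idea of varying the reward structure concrete, here is a minimal sketch of tabular Q-learning in which a per-step penalty can be switched on or off. It again uses a hypothetical 4x4 grid instead of MiniGrid, and the hyperparameters (learning rate, discount factor, exploration rate, episode count) and the `step_penalty` parameter are assumptions chosen for the example, not the settings used in the project's experiments.

```python
import numpy as np

# Hypothetical 4x4 grid (an assumption, not a MiniGrid environment).
SIZE = 4
START = (0, 0)
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action_idx, step_penalty):
    """Move within the grid (walls clamp the position). Reaching the goal
    gives reward 1 and ends the episode; other moves cost `step_penalty`."""
    dr, dc = ACTIONS[action_idx]
    nr = min(max(state[0] + dr, 0), SIZE - 1)
    nc = min(max(state[1] + dc, 0), SIZE - 1)
    next_state = (nr, nc)
    done = next_state == GOAL
    reward = 1.0 if done else -step_penalty
    return next_state, reward, done


def q_learning(step_penalty=0.0, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, max_steps=100, seed=0):
    """Epsilon-greedy tabular Q-learning; returns the learned Q-table."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((SIZE, SIZE, len(ACTIONS)))
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = int(rng.integers(len(ACTIONS)))
            else:
                # Break ties at random so an untrained agent still explores.
                best = np.flatnonzero(Q[state] == Q[state].max())
                action = int(rng.choice(best))
            next_state, reward, done = step(state, action, step_penalty)
            # Bootstrap from the next state unless the episode has ended.
            target = reward if done else reward + gamma * Q[next_state].max()
            Q[state + (action,)] += alpha * (target - Q[state + (action,)])
            state = next_state
            if done:
                break
    return Q


def greedy_rollout(Q, max_steps=50):
    """Follow the greedy policy from START; return the number of steps taken."""
    state, steps = START, 0
    while state != GOAL and steps < max_steps:
        state, _, _ = step(state, int(np.argmax(Q[state])), 0.0)
        steps += 1
    return steps
```

Comparing, for example, `greedy_rollout(q_learning(step_penalty=0.0))` with `greedy_rollout(q_learning(step_penalty=0.01))` is one simple way to observe how a reward change affects the learned behavior on this toy grid.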
Our findings contribute to the RL field by offering a detailed comparison of algorithmic performance, together with insights into the factors that most influence the success of RL agents in discrete, challenging environments. By proposing and evaluating algorithmic enhancements, this study also advances the understanding of how theoretical RL principles can be tuned and applied to practical, complex decision-making tasks.
In summary, this project highlights the practical applications of MDP-based reinforcement learning in the MiniGrid environment and lays groundwork for future research aimed at refining RL algorithms for broader, more complex applications. It is a step toward more sophisticated and efficient artificial intelligence systems capable of tackling real-world challenges.