Finding and simulating a set of policies for transfer learning in multi-agent reinforcement learning settings
Course of study:
Artificial Intelligence
Kind of thesis:
Theoretical analysis and numerical simulation
Programming languages:
Python
Keywords:
Multi-agent Reinforcement Learning, Transfer Learning, Successor features, Multi-objective environments
Problem:
The VDN algorithm [1] tackles cooperative multi-agent reinforcement learning with a joint reward signal, a setting made difficult by the vast joint action and observation spaces. Some of these problems can be segmented into tasks with distinct reward functions, each of which fits naturally into the standard reinforcement learning framework.
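To make the decomposition concrete, here is a minimal PyTorch sketch of VDN's central idea (illustrative only, not the authors' implementation; class and variable names are hypothetical): the joint action value is the sum of per-agent utilities, so each agent can act greedily on its own utility while the team is trained on the shared reward.

```python
import torch
import torch.nn as nn

class VDNMixer(nn.Module):
    """Value decomposition as in VDN: the joint action value is the sum
    of per-agent values, Q_tot(s, a_1..a_n) = sum_i Q_i(o_i, a_i)."""

    def forward(self, agent_qs: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) chosen-action values, one per agent
        return agent_qs.sum(dim=1, keepdim=True)  # (batch, 1) joint value

# Training regresses Q_tot toward the shared-reward TD target
# r + gamma * max_a' Q_tot(s', a'); the max decomposes across agents
# because Q_tot is monotone in each Q_i.
```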
Successor features (SF) decouple environment dynamics from the reward, while generalized policy improvement (GPI) derives a new policy from a set of existing ones. Together, they enable the transfer of knowledge across tasks within the RL framework [2].
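Concretely, under the standard assumption of linear rewards r(s, a, s') = phi(s, a, s') . w, each policy pi_i is summarized by its successor features psi^{pi_i}(s, a), so that Q^{pi_i}_w(s, a) = psi^{pi_i}(s, a) . w for any task w, and GPI acts greedily with respect to the best stored policy. A minimal numpy sketch (all names illustrative):

```python
import numpy as np

def gpi_action(psi: np.ndarray, w: np.ndarray) -> int:
    """Generalized policy improvement over a set of stored policies.

    psi: (n_policies, n_actions, d) successor features psi^{pi_i}(s, a)
         at the current state, i.e. expected discounted feature sums.
    w:   (d,) reward weights of the current task, r = phi . w.
    """
    # Q^{pi_i}_w(s, a) = psi^{pi_i}(s, a) . w -- every stored policy is
    # evaluated on the new task with no further environment interaction.
    q = psi @ w                       # (n_policies, n_actions)
    # GPI: act greedily w.r.t. the best stored policy at this state.
    return int(q.max(axis=0).argmax())
```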
The UneVEn algorithm integrates SF with VDN to improve policy exploration [3]. In certain environments, SFs make it possible to combine a set of policies into an optimal solution without additional environment interactions [4], highlighting the parallels between transfer learning via SFs and multi-objective optimization in RL.
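Schematically, the set-construction idea can be pictured as the following loop (a loose sketch inspired by [4], not the paper's actual algorithm; `train_sf_policy` is a hypothetical stand-in for an RL training subroutine): repeatedly pick the task weight the current set covers worst under GPI, train a new SF policy on it, and add its successor features to the set.

```python
import numpy as np

def build_policy_set(candidate_ws, train_sf_policy, n_iters):
    """Grow a set of SF policies covering many linear-reward tasks.

    candidate_ws: list of (d,) task weight vectors to cover.
    train_sf_policy: callable w -> (n_actions, d) successor features at
        the start state of a policy trained on task w (hypothetical).
    """
    psis = []  # successor features of the policies found so far

    def gpi_value(w):
        # Best value the current set achieves on task w under GPI.
        if not psis:
            return -np.inf
        return max(float((psi @ w).max()) for psi in psis)

    for _ in range(n_iters):
        # Train a new policy on the task the current set covers worst.
        w = min(candidate_ws, key=gpi_value)
        psis.append(train_sf_policy(w))
    return psis
```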
This project aims to transfer knowledge between cooperative multi-agent reinforcement learning tasks, specifically by determining the optimal policy set as in [4]. An initial approach might adapt the VDN algorithm [1], drawing inspiration from [2], [3], and [4].
In the top figure, two agents must choose the path that yields the higher return, depending on the rewards given for collecting triangles of different colors. The bottom figure shows a schematic representation of the algorithm, suggesting how new policies are added to the set of optimal policies by solving different tasks.
Goal:
Propose and simulate a new cooperative multi-agent reinforcement learning algorithm with a joint reward signal that combines the ideas of VDN, successor features, and the strategy for finding a set of optimal policies described in [4].
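One way the combination could look, as a purely hypothetical starting point to be refined during the thesis: learn per-agent successor features and mix them additively in analogy to VDN, then let GPI and the policy-set strategy of [4] operate on the resulting joint quantity.

```python
import torch
import torch.nn as nn

class VDNSuccessorMixer(nn.Module):
    """Hypothetical building block: mix per-agent successor features
    additively, in analogy to VDN's value decomposition,
    psi_tot(s, a_1..a_n) = sum_i psi_i(o_i, a_i)."""

    def forward(self, agent_psis: torch.Tensor) -> torch.Tensor:
        # agent_psis: (batch, n_agents, d) per-agent successor features
        return agent_psis.sum(dim=1)  # (batch, d) joint successor features

def joint_q(psi_tot: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Joint task value Q_tot = psi_tot . w; GPI and the policy-set
    # construction of [4] would then operate on this quantity.
    return psi_tot @ w  # (batch,)
```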
Preliminary work:
[1] proposes an algorithm for solving cooperative multi-agent reinforcement learning problems. [2] investigates how to perform transfer learning using SF and GPI, and [4] focuses on an algorithm for finding the set of policies that delivers the optimal solution when using SF and GPI.
Tasks:
This project can include the following tasks:
Read the literature on SF, GPI, MARL, and multi-objective environments.
Choose a multi-agent environment and run some simulations using VDN.
Slightly modify a multi-agent environment to treat it as a multi-objective problem (see the sketch after this list).
Propose an algorithm that combines VDN, SF, and the strategy used in [4] to identify the set of optimal policies.
Run simulations with the proposed algorithm and assess the results.
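For the environment-modification task above, here is a minimal sketch of a wrapper that exposes a reward-feature vector phi alongside the scalar reward, assuming a Gymnasium-style interface; the `features` entry in `info` is a hypothetical convention, and the feature definition itself is task-specific (e.g. which triangle color was collected).

```python
import numpy as np
import gymnasium as gym

class FeatureRewardWrapper(gym.Wrapper):
    """Expose a reward-feature vector phi so the task becomes
    multi-objective: r = phi . w for some weight vector w."""

    def __init__(self, env, n_features: int):
        super().__init__(env)
        self.n_features = n_features

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Hypothetical: the underlying env reports per-objective events
        # (e.g. collected triangle colors) in `info["features"]`.
        phi = np.asarray(info.get("features", np.zeros(self.n_features)))
        info["phi"] = phi
        return obs, reward, terminated, truncated, info
```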
The final tasks will be discussed with the supervisor. Please feel free to get in contact.